Import AI

Import AI 166: Dawn of the misleading ‘sophisbots’; $50k a year for studying long-term impacts of AI; and squeezing an RL drone policy into 3kb

Will powerful AI make the Turing Test obsolete?
And if it does, what do we do about it?…
The Turing Test – judging how sophisticated a machine is by seeing if it can convince a person that it is a human – looms large in pop culture discussion about AI. What happens if we have systems today that can pass the Turing Test, but which aren’t actually that intelligent? That has started to happen recently with systems that humans interface with via text chat. Now, new research from Stanford University, Pennsylvania State University, and the University of Toronto explores how increasingly advanced so-called ‘sophisbots’ might influence society.

The problems of ‘sophisbots’: The researchers imagine what the future of social media might look like, given recent advances in the ability for AI systems to generate synthetic media. In particular, they imagine social media ruled by “sophisbots”. They foresee a future where these bots are constantly “running in the ether of social media or other infrastructure…not bound by geography, culture or conscience.” 

So, what do we do? Technical solutions: Machine learning researchers should develop technical tools to spot machines posing as humans and to detect the telltale signs of AI-generated content, along with systems that track the provenance of content so we can guarantee that something is ‘real’, and tools that make it easy for regular people to mark the content they put online as authentic and not bot-generated.
   Policy approaches: We need to develop “public policy, legal, and normative frameworks for managing the malicious applications of technology in conjunction with efforts to refine it,” they write. “Let us as a technical community commit ourselves to embracing and addressing these challenges as readily as we do the fascinating and exciting new uses of intelligent systems”.

Why this matters: How we deal with the future of synthetic content will define the nature of ‘truth’ in society, which will ultimately define everything else. So, no pressure.
   Read more: How Relevant is the Turing Test in the Age of Sophisbots (Arxiv)

####################################################

Do Octopuses dream of electric sheep?
Apropos of nothing, here is a film of an octopus changing colors while sleeping.
   View the sleeping octopus here (Twitter).

####################################################

PhD student? Want $50k a year to study the long-term impacts of AI? Read on!
Check out the Open Philanthropy Project’s ‘AI Fellowship’…$50k for up to five years, with possibility of renewal…
Applications are now open for the Open Phil AI Fellowship. This program extends full support to a community of current & incoming PhD students, in any area of AI/ML, who are interested in making the long-term, large-scale impacts of AI a focus of their work.

The details:

  • Current and incoming PhD students may apply.
  • Up to 5 years of PhD support with the possibility of renewal for subsequent years
  • Students with pre-existing funding sources who find the mission and community of the Fellows Program appealing are welcome to apply
  • Annual support of $40,000 stipend, payment of tuition and fees, and $10,000 for travel, equipment, and other research expenses
  • Applications are due by October 25, 2019 at 11:59 PM Pacific time

In a note about this fellowship, a representative of the Open Philanthropy Project wrote: “We are committed to fostering a culture of inclusion, and encourage individuals with diverse backgrounds and experiences to apply; we especially encourage applications from women and minorities.”
   Find out more about the Fellowship here (Open Philanthropy website).

####################################################

Small drones with big brains: Harvard researchers apply deep RL to a ‘nanodrone’:
…No GPS? That won’t be a problem soon, once we have smart drones…
One of the best things that the nuclear disaster at Fukushima did for the world was highlight just how lacking contemporary robotics was: we could have avoided a full meltdown if we’d been able to get a robot or a drone into the facility. New research from Harvard, Google, Delft University, and the University of Texas at Austin suggests how we might make smart drones that can autonomously navigate in places where they might not have GPS. It’s a first step to developing the sorts of systems needed to rapidly map and understand the sites of various disasters, and also – as with many omni-use AI technologies – a prerequisite for low-cost, lightweight weapons systems.

What they’ve done: “We introduce the first deep reinforcement learning (RL) based source-seeking nano-drone that is fully autonomous,” the researchers write. The drone is trained to seek a light source, using light sensors to help it triangulate the source, as well as an optical flow-based sensor for flight stability. The drone is trained using the Deep Q-Network (DQN) algorithm in a simulator, with the objective of closing the distance between itself and a light source.
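For readers who want a concrete picture of what DQN training involves, here’s a minimal sketch of the core update rule in PyTorch – the network sizes, observation/action spaces, and reward structure are illustrative assumptions, not the paper’s actual setup:

```python
# Minimal DQN update sketch for a light-seeking policy trained in simulation.
# N_OBS/N_ACTIONS are assumptions; the reward would be the reduction in
# distance to the light source, logged per simulator step.
import random
from collections import deque

import torch
import torch.nn as nn

N_OBS = 4       # e.g. light-sensor readings (assumption)
N_ACTIONS = 5   # e.g. hover, forward, back, left, right (assumption)
GAMMA = 0.99

q_net = nn.Sequential(nn.Linear(N_OBS, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(N_OBS, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)  # filled with (s, a, r, s2, done) from rollouts

def dqn_update(batch_size=64):
    # Regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').
    batch = random.sample(buffer, batch_size)
    s, a, r, s2, done = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```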

Shrinking network sizes: After training, they shrink down the resulting network (to 3kb, via quantization) and run it in the real world on a CrazyFlie nanodrone equipped with a Cortex-M4 chip – this is pretty impressive stuff, given the relative immaturity of RL for robot operation and the teeny-tiny compute envelope. “While we focus exclusively on light-seeking as our application in this paper, we believe that the general methodology we have developed for deep reinforcement learning-based source seeking… can be readily extended to other (source seeking) applications as well,” they write.
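As a rough illustration of the general shrink-via-quantization idea (not the paper’s own microcontroller toolchain, and with a hypothetical model file name), here’s what post-training quantization looks like with TensorFlow Lite:

```python
# Sketch: shrink a trained Keras model via post-training quantization.
# "light_seeker.h5" is a hypothetical file name used for illustration.
import tensorflow as tf

model = tf.keras.models.load_model("light_seeker.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()

with open("light_seeker.tflite", "wb") as f:
    f.write(tflite_model)
print(f"model size: {len(tflite_model)} bytes")  # small nets can land in the kb range
```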

How well does it work? The researchers test out the drone in a bunch of different scenarios and average a success rate of 80% across 105 flight tests. In real world tests, the drone is able to deal with a variety of obstacles being introduced, as well as variations in its own position and the position of the light source. Now, 80% is a long way from good enough to use in a life-or-death situation, but it is meaningful enough to make this line of research worth paying attention to.

Why this matters: I think that in the next five years we’re going to see a revolution sweep across the drone industry as researchers figure out how to cram increasingly sophisticated, smart capabilities onto drones ranging from the very big to the very small. It’s encouraging to see researchers try to develop ultra-efficient approaches that can work on tiny drones with small compute budgets.
   Read more: Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller (Arxiv).
   Get the code for the research here (GitHub).
   Watch a video of the drone in action here (Harvard Edge Computing, YouTube).

####################################################

First we could use AI to search over text, then images, now: Code?
…Maybe, just maybe, GitHub’s ‘CodeSearchNet’ dataset could help us develop something smarter than ‘combing through StackOverflow’…
Today, search tools help us find words and images that are similar to our query, but have very little lexical overlap with it (e.g., we can ask a search engine “what is the book with the big whale in it?” and receive the answer ‘Moby Dick’, even though that title doesn’t appear in the query). Doing the same thing for code is really difficult – if you search ‘read JSON data’ you’re unlikely to get nearly as useful results. Now, GitHub and Microsoft Research have introduced CodeSearchNet, a large-scale code dataset which pairs snippets of code with their plain-English descriptions. The idea is that if we can train machine learning systems to map code to text, then we might be able to build smarter systems for searching over code. They’ve also created a competition to encourage people to develop machine learning methods that can improve code search techniques.
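To make the ‘map code to text’ idea concrete, here’s a minimal sketch of embedding-based code search – the hash-based encoder below is a stand-in assumption; CodeSearchNet’s baselines use learned encoders trained on (code, docstring) pairs:

```python
# Sketch: embed queries and code docs into a shared vector space, then
# rank snippets by cosine similarity. The embed() function is a toy
# stand-in for a learned encoder.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy encoder: hash tokens into a normalized bag-of-words vector.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

corpus = {
    "def load_json(path): return json.load(open(path))": "read json data from a file",
    "def save_csv(rows, path): ...": "write rows to a csv file",
}

def search(query: str):
    q = embed(query)
    scored = [(float(embed(doc) @ q), code) for code, doc in corpus.items()]
    return sorted(scored, reverse=True)  # highest cosine similarity first

print(search("read JSON data")[0])
```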

The CodeSearchNet Corpus dataset:
The dataset consists of about 2 million pairs of code snippets and associated documentation, as well as another 4 million code snippets with no documentation. The code comes from languages including Go, Java, JavaScript, PHP, Python, and Ruby.
   Caveats: While some of the documentation is written in multiple languages, the dataset’s evaluation set focuses on English. Additionally, the dataset can be a bit noisy, primarily as a consequence of the many different ways in which people can write documentation. 

The CodeSearchNet Challenge: To win the challenge, developers need to build a system that can return “a set of relevant results from CodeSearchNet Corpus for each of 99 pre-defined natural language queries”. The queries were mined from Microsoft’s search engine, Bing. They also collected 4,026 expert annotations across six programming languages, ranking the extent to which the documentation matches the code and giving researchers an additional training signal.

Why this matters: In the same way powerful search engines have made it easy for us to explore the ever-expanding universe of digitized text and images, datasets and competitions like CodeSearchNet could help us do the same for code. And once we have much better systems for code search, it’s likely we’ll be able to do better research into things like program synthesis, making it easier for us to use machine learning techniques to create systems that can learn to produce their own additional code on an ad-hoc basis in response to changes in their external environment.
   Read more: CodeSearchNet Challenge: Evaluating the State of Semantic Code Search (Arxiv).
   Read more: Introducing the CodeSearchNet Challenge (GitHub blog).
   Check out the leaderboard for the CodeSearchNet Challenge (Weights & Biases-hosted leaderboard).

####################################################

Deep learning at supercomputer scale, via Oak Ridge National Laboratory:
…What is the limit of our ability to scale computation across thousands of GPUs? It’s definitely not 27,600 GPUs, based on these results!…
One of the recent trends driving the growing capabilities of deep learning has been improvements by researchers in parallelizing training across larger and larger fields of chips: such parallelization makes it easier to train bigger models in shorter amounts of time. An important question, then, is what are the fundamental limits of parallelization? New research from a team linked to Oak Ridge National Laboratory suggests the answer is: we don’t know, because we’re pretty good at parallelizing stuff even at supercomputer scale!

In the research, the team scales a single model training run across the 27,600-strong V100 GPU fleet of Oak Ridge’s ‘Summit’ supercomputer (the most powerful supercomputer in the world, according to the June 2019 Top 500 rankings). The dream here is to attain linear scaling, where you get a performance increase that precisely lines up with the additional power of each GPU – obviously, that’s likely impossible to attain, but they obtain pretty respectable scores overall.

The key numbers: 

  • 0.93: scaling efficiency across the entire supercomputer (4600 nodes). 
  • 0.97: scaling efficiency when using “thousands of GPUs or less”.
  • 49.7%: The average sustained performance they achieve per GPU, as a fraction of theoretical peak, which “to our knowledge, exceeds the single GPU performance of all other DNN[s] trained on the same system to date”. (This is a pretty impressive number – a recent analysis by OpenAI, based in part on internal experiments, suggests it’s more typical to see utilization on the order of 33% for standard training jobs.)

What they did: The researchers develop a bunch of ways to more efficiently scale networks across the system while using distributed training software called Horovod (a minimal sketch of what Horovod training looks like from the user’s side follows the list below). The techniques they use include:

  • New gradient reduction strategies which involve a combination of systems to get individual software workers to exchange information more efficiently (via a technique called BitAllReduce), and a gradient tensor grouping strategy (called Grouping).  
  • A proof-of-concept scientific inverse problem experiment where they train a single deep neural network with 10^8 weights on a 500TB dataset. 
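As promised above, here’s a minimal sketch of Horovod-style data-parallel training – the model and data are toy stand-ins, and the paper’s BitAllReduce and Grouping strategies live inside this allreduce machinery rather than being shown here:

```python
# Sketch: standard Horovod data-parallel training loop (PyTorch backend).
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())  # one GPU per worker process

model = torch.nn.Linear(512, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())  # scale LR with workers

# Wrap the optimizer so gradients are averaged across all GPUs each step.
opt = hvd.DistributedOptimizer(opt, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)  # identical starting weights

for step in range(100):
    x = torch.randn(64, 512).cuda()              # stand-in batch
    y = torch.randint(0, 10, (64,)).cuda()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()   # the allreduce happens here, under the hood
    opt.step()
```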

Why this matters: Our ability to harness increasingly powerful fields of computers will help define our ability to explore the frontiers of science; papers like this give us an indication of what it takes to tap into the computers we’ve built for modern machine learning tasks. I think the two most interesting things about this paper are: a) how good the scaling is, and b) how far we seem to be from being able to saturate computers at this scale.

   Read more: Exascale Deep Learning for Scientific Inverse Problems (Arxiv).

####################################################

RLBench: 100 hand-designed tasks for your robot:
…Think your robot is smart? See how well it can handle task generalization in RLBench…
In recent years, contemporary AI techniques have become good enough to work on simulated and real robots. That has created demand among researchers for harder robot learning tasks to test their algorithms on. This has inspired researchers with Imperial College London to create RLBench, a “one-size-fits-all benchmark” for testing out classical and contemporary AI techniques in learning robot manipulation tasks. 

What goes into RLBench: RLBench has been designed with the following key traits: diversity of tasks, reproducibility, realism, tiered difficulty, extensibility, and scale. It is built on the V-REP robot simulator and uses a PyRep interface. Tasks include stacking blocks, manipulating objects, opening doors, and so on. Each task also comes with expert and/or hand-designed algorithms, so you can use RLBench to algorithmically generate demonstrations that solve its tasks, letting you potentially train AI systems via imitation learning.
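As a sketch of the imitation-learning recipe this enables – collect algorithmic demonstrations, then behaviorally clone them – here’s roughly what the training step might look like; the observation/action dimensions are assumptions, and this is not RLBench’s actual API:

```python
# Sketch: behavioral cloning from scripted expert demonstrations.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 128, 8  # assumed observation/action sizes

policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def behavioral_cloning(demos, epochs=10):
    # demos: list of (observation, expert_action) pairs harvested from the
    # task's hand-designed expert, e.g. via the simulator.
    obs = torch.stack([torch.as_tensor(o, dtype=torch.float32) for o, _ in demos])
    act = torch.stack([torch.as_tensor(a, dtype=torch.float32) for _, a in demos])
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(obs), act)  # regress onto expert actions
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```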

A hard challenge: RLBench ships with ‘The RLBench Few-Shot Challenge’, which stress-tests contemporary AI algorithms’ ability to not only learn a task, but also generalize that knowledge to solve similar but slightly different tasks.

Why this matters: The dream of many researchers is to develop more flexible learning algorithms, which could let single robots do a variety of tasks, while being more resilient to variation. Platforms like RLBench will help us explore how contemporary AI algorithms can advance the state of the art here, and could become a valuable indicator of progress at the intersection of machine learning and robotics.
   Read more: RLBench: The Robot Learning Benchmark & Learning Environment (Arxiv).
   Find out more about RLBench (project website, Google Sites).
   Get the code for RLBench here (RLBench GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

EU update on AI ethics guidelines:
The European Union released AI ethics guidelines earlier this year, initially drafted by its high-level expert group on AI before going through public consultation. Several months later, the EU is evaluating its progress, taking stock of criticism, and considering what to do next.

Core challenges: The guidelines are voluntary and non-binding, prompting criticism from parties in favour of full-bodied regulation. Moreover, there are still no oversight mechanisms to monitor compliance with these voluntary commitments. Critics have also pointed out that the guidelines are short-sighted, and fail to consider long-term risks from AI.

Future directions: The EU suggests the key question is whether voluntary commitments will suffice to address ethical challenges from AI, and what other mechanisms are available. There are calls for more robust regulation, with proposals including mandatory requirements for explainability of AI systems, and Europe-wide legislation on face recognition technology. Beyond regulation, soft legal guidance and rules on standardisation are also being explored.

Why it matters: The EU was an early mover in setting out ethics guidelines, and seems to be thinking seriously about how best to approach these issues. Despite the criticisms, a cautious approach to regulation is sensible, since we are still so far from understanding the space of plausible and desirable rules, and since the downsides from poorly-judged interventions could be substantial.

   Read more: EU guidelines on ethics in AI – context and implementation (Europa).

####################################################

Tech Tales:

The Sculpture Garden of Ancient Near-Intelligent Devices (NIDs)

Central Park, New York City, 2036. 

Welcome to the Garden of the Near-Intelligent Devices, the sign said. We remember the past so we can build the future. 

It was a school trip. A real one. The kids ran off the bus and into the park, pursued by a menagerie of security drones and luggage bots. We – the teachers – followed.

“Woah cool,” one of the children said. “This one sings!” The child stood in front of a small robotic lobster, which was singing a song by The Black Keys. The child approached the lobster and looked into its shiny robot eyes.

   “Can you play Taylor Swift,” the child said. 

   “Sure I can, partner,” the lobster said. “You want a medley, or a song.”

   “Gimme a medley,” the child said. 

   “This one’s called Romeo-22-Lover,” the lobster said, and began to sing. The child danced in front of the lobster, then some other children came over and all started shouting songs at it. The lobster shifted position on its plinth, trying to look at each of the kids as they requested a new song. “You need to calm down!” the lobster sang. The kids maybe didn’t get the joke, or didn’t care, and kept shouting.

Another couple of kids crowded around a personal hygiene robot. “You have not brushed your teeth this morning, young human”, said the robot, waving a dental mirror towards the offending child. “And you,” it said, rotating on its plinth and gesturing towards another kid, “have not been flossing.”

   “You got us,” one of the children said. 

   “Of course I did. My job in life is to ensure you have maximal hygiene. I can detect via my olfactory sensors that one of you has a diet composed of too many rich foods and complex proteins,” said the robot.

   “It’s saying you farted,” said one of the kids. 

   “Ewwww no it didn’t!” said another kid, before running away. 

   The robot was right. 

One young girl walked up to a tree, which swayed towards her. She let out a quick sigh and took a step back, eyes big and round and waiting, looking at the robot masquerading as nature. “Do not be afraid, little one,” the robot tree said. “I am NatureBot3000 and my job is to take care of the other plants and to educate people about the majesty of nature. Would you like to know more?”

   “Uh huh,” said the little girl. “I’d like to know where butterflies sleep.”

   “An excellent question, young lady!” said the robo-tree. “It is not quite the same, but sometimes they appear to pause, or to slow themselves down, especially when cold.”

   “So they get chilly?”

   “You could say that, little one!” said the tree, waving its branches at the girl in time with its susurrations. 

We watched this, embodied in drones and luggage robots and phones and lunchboxes, giving advice to each of our children as they made their way around the park. We watched our children and we watched them interact with our forebears and we felt content because we were all linked together, exchanging questions and curiosities, playing in the end days of summer. 

Things that inspired this story: Pleasant sepia-toned memories of school trips I took as a kid; federated learning; Furbys and Tamagotchis and Aibos and Cozmos all fast-forwarded into the future; learning from human feedback; learning from human preferences.

Import AI 165: 100,000 generated faces – for free; training two-headed networks for four-legged robots; and why San Diego faces blowback over AI-infused streetlights

San Diego wants smart, AI-infused streetlights; opposition group sounds alarm:
…When technological progress meets social reality…
The City of San Diego is installing thousands of streetlights equipped with video cameras and a multitude of other sensors. A protest group called the Anti Surveillance Coalition (ASC) wants to put a halt to the ‘smart city’ program, pending further discussion with residents. “I understand that there may be benefits to crime prevention, but the point is, we have rights and until we talk about privacy rights and our concerns, then we can’t have the rest of the conversation”, one ASC protestor told NBC.

Why this matters: This is a good example of the ‘omniuse’ capabilities of modern technology – sure, San Diego probably wants to use the cameras to help it better model traffic, analyze patterns of crime in various urban areas, and generally create better information to facilitate city governance. On the other hand, the protestors are suspicious that organizations like the San Diego Police Department could use the data and video footage to target certain populations. As we develop more powerful AI systems, I expect that (in the West at least) there are going to be a multitude of conversations about how ‘intelligent’ we want our civil infrastructures to be, and what the potential constraints or controls are that we can place on them.
   Find out more about the ‘Smart City Platform’ here (official City of San Diego website).
   Read more: Opposition Group Calls for Halt to San Diego’s Smart Streetlight Program (NBC San Diego).

####################################################

Want a few hundred thousand chest radiographs? Try MIMIC:
Researchers with MIT and Harvard have released the “MIMIC” chest radiograph dataset, giving AI researchers 377,110 images from more than 200,000 radiographic studies. “The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support,” the researchers write.
   Read more: MIMIC-CXR Database (PhysioNet)

####################################################

Google reveals how YouTube ranking works:
…We’re all just janitors servicing vast computational engines, performing experimentation against narrowly defined statistical metrics…
Video recommendations are one of the most societally impactful forms of machine learning, because the systems that figure out what videos to recommend people are the systems that fundamentally condition 21st century culture, much like how ‘channel programming’ for broadcast TV and radio influenced culture in the 20th century. Now, new research from Google shows how the web giant decides which videos to recommend to YouTube users. 

How YouTube recommendations work: Google implements a multitask learning system, which lets it optimize against multiple objectives at once. These objectives include things like: ‘engagement objectives’, such as user clicks, and ‘satisfaction objectives’ like when someone likes a video or leaves a rating. 

Feedback loops & YouTube: Machine learning systems can enter into dangerous feedback loops, where the system recycles certain signals until it starts to develop pathological behaviors. YouTube is no exception. “The interactions between users and the current system create selection biases in the feedback,” the authors write. “For example, a user may have clicked an item because it was selected by the current system, even though it was not the most useful one of the entire corpus”. To help deal with this, the researchers develop an additional ranking system, which tries to disambiguate how much a user likes a video, from how prevalent the video was in prior rankings – essentially, they try to stop their model becoming recursively more biased as a consequence of automatically playing the next video or the user consistently clicking only the top recommendations out of laziness. 

Why this matters: I think papers like this are fascinating because they read like the notes of janitors servicing some vast machine they barely understand – we’re in a domain here where the amounts of data are so vast that our method for understanding the systems is to perform live experiments, using learned components, and see what happens. We use simple scores as proxies for larger issues like bias, and in doing so likely hide certain truths from ourselves. The 21st century will be defined by our attempts to come up with the right learning systems to intelligently & scalably constrain the machines we have created.
   Read more: Recommending what video to watch next: a multitask ranking system (ACM Digital Library).

####################################################

100,000 free, generated faces:
…When synthetic media meets stock photography…
In the past five years, researchers have figured out how to use deep learning systems to create synthetic images. Now, the technology is moving into society in surprising ways. Case in point? A new website that offers people access to 100,000 pictures of synthetic people, generated via StyleGAN. This is an early example of how the use of synthetic media is going to potentially upend various creative industries – starting here with stock photography. 

The dataset: So, if you want to generate faces, you need to get data from somewhere. Where did this data come from? According to the creators, they gained it by operating a photography studio, taking 29,000+ photos of 69 models over the last two years — and in an encouraging and unusual move, they say they gained consent from the models to use their photos to generate synthetic people.

Why this matters: I think that the intersection of media and AI is going to be worth paying attention to, since media economics are terrible, and AI gives people a way to reduce the cost of media production via reducing the cost of things like acquiring photos, or eventually generating text. I wonder when we’ll see the first Top-100 internet website which is a) content-oriented and b) predominantly generated. As a former journalist, I can’t say I’m thrilled about what this will do to the pay for human photographers, writers, and editors. But as the author of this newsletter, I’m curious to see how this plays out!
   Check out the photos here (Generated.Photos official website).
   Find out more by reading the FAQ (Generated.Photos Medium).

####################################################

A self-driving car map of Singapore:
…Warm up the hard drives, there’s now even more free self-driving car data!…
Researchers with Singapore’s Agency for Science, Technology and Research (A*STAR) have released the “A*3D” dataset – a self-driving car dataset collected in a large area of Singapore.

The data details: 

  • 230,000 human-labeled 3D object annotations across 39,179 LiDAR point cloud frames.
  • Data captured at driving speeds of 40-70 km/h.
  • Location: Singapore.
  • Nighttime data: 30% of frames.
  • Data gathering period: The researchers collected data in March (wet season) and July (dry season) 2018.

Why this matters: A few years ago, self-driving car data was considered to be one of the competitive moats which companies could put together as they raced each other to develop the technology. Now, there’s a flood of new datasets being donated to the research commons every month, both from companies – even Waymo, Alphabet Inc’s self-driving car subsidiary! – and academia. This is a sign, perhaps, of the increasing importance of compute for self-driving car development, as well as a tacit acknowledgement that self-driving cars are a sufficiently hard problem that we need to focus more on capital-R research in the short term, before they’re deployed.
   Read more: A*3D Dataset: Towards Autonomous Driving in Challenging Environments (Arxiv).
   Get the data here (GitHub).

####################################################

Training two-module networks for four-legged robots:
…Yet another sign of the imminent robot revolution…
Robots are one of the greatest challenges for contemporary AI research, because robots are brittle, exist in a partially-observable world, and have to deal with the cruel & subtle realities of physics to get anything done. Recently, researchers have started to successfully apply modern machine learning techniques to quadruped robots, prefiguring a world full of little machines that walk, run, and jump around. New research from the Robotic Systems Lab at ETH Zurich gives us a sense of how standard quadruped training has become, and highlights the commoditization of robotics systems.

Two-part networks for better robots: Here, the researchers outline a two-part system for training a simulated quadruped robot to navigate various complex, simulated worlds. The system is “a two-layer hierarchy of Neural Network (NN) policies, which partitions locomotion into separate components responsible for foothold planning and tracking control respectively”; it consists of a gait planner, which is a planning policy that can “generate sequences of supporting footholds and base motions which direct the robot towards a target heading”, and a gait controller, which is “a foothold and base motion controller policy which executes the aforementioned sequence while maintaining balance as well as dealing with external disturbances”. They use, variously, TRPO and PPO to train the system, and report good results on the benchmarks. Next, they hope to do some sim2real experiments, where they try to train the robots in simulation and transfer the learned policies to reality.
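As a rough sketch of how such a two-level hierarchy fits together – all shapes and dimensions here are illustrative assumptions, not ETH’s architecture:

```python
# Sketch: a high-level planner proposes foothold targets; a low-level
# controller outputs joint commands to track them while balancing.
import torch
import torch.nn as nn

class GaitPlanner(nn.Module):
    """High level: maps terrain/heading observations to a foothold target."""
    def __init__(self, obs_dim=64, plan_dim=12):  # e.g. 4 feet x (x, y, z)
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, plan_dim))
    def forward(self, obs):
        return self.net(obs)

class GaitController(nn.Module):
    """Low level: tracks planned footholds, e.g. via 12 joint commands."""
    def __init__(self, state_dim=48, plan_dim=12, act_dim=12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + plan_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))
    def forward(self, state, plan):
        return self.net(torch.cat([state, plan], dim=-1))

planner, controller = GaitPlanner(), GaitController()
obs, state = torch.randn(1, 64), torch.randn(1, 48)
action = controller(state, planner(obs))  # the planner runs at a slower timescale in practice
```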

Why this matters: It wasn’t long ago (think: four years ago) that training robots via deep reinforcement learning – even in simulation – was considered to be a frontier for some parts of deep learning research. Now, everyone is doing it, ranging from large corporate labs, to academic institutions, to solo researchers. I think papers like this highlight how rapidly this field has moved from a ‘speculative’ phase to a development phase, where researchers are busily iterating on approaches to improve robustness and sample efficiency, which will ultimately lead to greater deployment of the technology.
   Read more: DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning (Arxiv)

####################################################

Politicians back effort to bring technological savvy back to US politics:
…Bipartisan bill wants to put the brains back into Congress…
Two senators and two congresspeople – two Democrats and two Republicans – have introduced the Office of Technology Assessment Improvement and Enhancement Act, in the hope of making it easier for the government to keep up with rapid technological change. The Office of Technology Assessment (OTA) was a US agency that for a couple of decades produced reports for politicians on advanced science and technology, like nanotechnology. It was killed by Republicans in the mid-90s as part of a broader effort to defund various government institutions. Now, with rapidly advancing AI technology, we’re feeling the effects of a political class who lack institutions capable of informing them about technology (which also has the nasty effect of increasing the power of lobbyists as a source of information for elected officials).

This bill is part of a larger bipartisan effort to resuscitate OTA, and lays out a few traits the new OTA could have, such as:

  • Improving the turnaround time of report production
  • Becoming a resource for elected officials to inform them about technology
  • Rotating in expertise from industry and academia to keep staff informed
  • Coordinating with the Congressional Research Service (CRS) and Government Accountability Office (GAO) to minimize duplication or overlap. 

Why this matters: If government can be more informed about technology, then it’ll be easier to have civil oversight of technology – something we likely need as things like AI continue to advance and impact society. Now, to set expectations: under the current political dynamic in the US it’s difficult to say whether this bill will move past the House into the Senate and then into legislation. Regardless, there’s enough support showing up from enough quarters for an expanded ability for government to understand technology that I’m confident something will happen eventually – I’m just not sure what.
   Read more: Reps. Takano and Foster, Sens. Hirono and Tillis Introduce the Office of Technology Assessment Improvement and Enhancement Act (Representative Takano’s official website).

####################################################

Tech Tales

The Seeing Trade 

Sight for experience: that was how it was advertised. In exchange for donating “at minimum 80% of your daily experience, with additional reward points for those that donate more!”, blind and partially-sighted people gained access to a headset covered in cameras, which plugged into a portable backpack computer. This headset used a suite of AI systems to scan and analyze the world around the person, telling them via bone-conduction audio about their nearby surroundings.

Almost overnight, the streets became full of people with half-machine faces, walking around confidently, many of them not even using their canes. At the same time, the headset learned from the people, customizing its communications to each of its human users; soon, you saw blind people jogging along busy city streets, deftly navigating the crowds, feeding on information beamed into them by their personal all-seeing AI. Blind people participated in ice skating competitions. In mountain climbing. 

The trade wasn’t obvious until years had passed: then, one day, the corporation behind the headsets revealed “the experience farm”, a large-scale map of reality, stitched together from the experiences of the blind headset-wearers. Now, the headsets were for everyone and when you put them on you’d enter a ghost world, where you could see the shapes of other people’s actions, and the suggestions and predictions of the AI system of what you might do next. People participated in this, placing their headsets on to at once gather and experience a different form of reality: in this way human life was immeasurably enriched, through the creation of additional realities in which people could spend their time. 

Perhaps, one day, we’ll grow uninterested in the ‘base world’ as people have started calling it. Perhaps we’ll stop building new buildings, or driving cars, or paying much attention to our surroundings. Instead, we’ll walk into a desert, or a field, and place our headsets on, and in doing so explore a richly-textured world, defined by the recursive actions of humanity.

Things that inspired this story: Virtual reality; the ability for AI systems to see & transcribe the world; the creation of new realities via computation; the Luc Besson film Valerian; empathy and technology.

Import AI 164: Tencent and Renmin University improve language model development; alleged drone attack on Saudi oil facilities; and Facebook makes AIs more strategic via language training

Drones take out Saudi Arabian oil facilities:
…Asymmetric warfare meets critical global infrastructure…
Houthi rebels from Yemen have taken credit for using a fleet of 10 drones* to attack two Saudi Aramco oil facilities. “It is quite an impressive, yet worrying, technological feat,” James Rogers, a drone expert, told CNN. “Long-range precision strikes are not easy to achieve”.
  *These drones look more like missiles than typical rotor-based machines.

Why this matters: Today, these drones were likely navigated to their target by hand and/or via GPS coordinates. In a few years, increasingly autonomous AI systems will make drones like these more maneuverable and likely harder to track and eliminate. I think tracking the advance of this technology is important because otherwise we’ll be surprised by a tragic, large-scale event.
   Read more: Saudi Arabia’s oil supply disrupted after drone attacks: sources (Reuters).
   Read more: Yemen’s Houthi rebels claim a ‘large-scale’ drone attack on Saudi oil facilities (CNN).

####################################################

Facebook teaches AI to play games using language:
…Planning with words…
Facebook is trying to create smart AI systems by forcing agents to express their plans in language, and to then convert these written instructions into actions. They’ve tested out this approach in a new custom-designed strategy game (which they are also releasing as open source).  

How to get machines to use language: The approach involves training agents using a two-part network which contains an ‘instructor’ system along with an ‘executor’ system. The instructor takes in observations and converts them into written instructions (e.g., “build a tower near the base”), and the executor takes in these instructions and converts them into actions via the game’s inbuilt API. Facebook generated the underlying language data for this by having humans work together in “instructor-executor pairs” while playing the game, generating a dataset of 76,000 pairs of written instructions and actions across 5,392 games.
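Here’s a rough sketch of how an instructor-executor split might be wired up – the sizes, vocabulary, and encoders are all illustrative assumptions rather than Facebook’s released code:

```python
# Sketch: the instructor turns game observations into instruction tokens;
# the executor turns (observation, instruction) into an action.
import torch
import torch.nn as nn

VOCAB, HID, N_ACTIONS, OBS = 2000, 256, 32, 128  # assumed sizes

class Instructor(nn.Module):
    def __init__(self):
        super().__init__()
        self.obs_enc = nn.Linear(OBS, HID)
        self.decoder = nn.GRU(HID, HID, batch_first=True)
        self.words = nn.Linear(HID, VOCAB)
    def forward(self, obs, prev_word_embs):
        h0 = torch.tanh(self.obs_enc(obs)).unsqueeze(0)  # condition on game state
        out, _ = self.decoder(prev_word_embs, h0)
        return self.words(out)  # logits over tokens like "build a tower near the base"

class Executor(nn.Module):
    def __init__(self):
        super().__init__()
        self.instr_enc = nn.GRU(HID, HID, batch_first=True)
        self.obs_enc = nn.Linear(OBS, HID)
        self.action = nn.Linear(2 * HID, N_ACTIONS)
    def forward(self, obs, instr_embs):
        _, h = self.instr_enc(instr_embs)  # summarize the instruction
        joint = torch.cat([h[-1], torch.tanh(self.obs_enc(obs))], dim=-1)
        return self.action(joint)  # logits over the game's API actions
```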

MiniRTSv2: Facebook is also releasing MiniRTSv2, a strategy game it developed to test out this research approach. “Though MiniRTSv2 is intentionally simpler and easier to learn than commercial games such as DOTA 2 and StarCraft, it still allows for complex strategies that must account for large state and action spaces, imperfect information (areas of the map are hidden when friendly units aren’t nearby), and the need to adapt strategies to the opponent’s actions,” the Facebook researchers write. “Used as a training tool for AI, the game can help agents learn effective planning skills, whether through NLP-based techniques or other kinds of training, such as reinforcement and imitation learning.”

Why this matters: I think this research is basically a symptom of larger progress in AI research: we’re starting to develop complex systems that combine multiple streams of data (here: observations extracted from a game engine, and natural language commands) and require our AI systems to perform increasingly sophisticated tasks in response to the analysis of this information (here, controlling units in a complex, albeit small-scale, strategy game). 

One cool thing this reminded me of: Earlier work by researchers at Georgia Tech, who trained AI agents to play games while printing out a rationale for their moves – e.g., an agent trained to play ‘Frogger’ while providing a written rationale for its own moves (Import AI: 26).
   Read more: Teaching AI to plan using language in a new open source strategy game (Facebook AI).
   Read more: Hierarchical Decision Making by Generating and Following Natural Language Instructions (Arxiv).
   Get the code for MiniRTS (Facebook AI GitHub).

####################################################

McDonald’s + speech recognition = worries for workers:
…What happens when ‘AI industrialization’ hits one of the world’s largest restaurants…
McDonald’s has acquired Apprente, an AI startup with the mission of building “the world’s best voice-based conversational system that delivers a human-level customer service experience“. The startup’s technology was targeted at drive-thru restaurants. The fast food giant will use the acquisition to seed an internal technology development group named McD Tech Labs, which it hopes will help it hire “additional engineers, data scientists and other advanced technology experts”.

Why this matters: As AI industrializes, more and more companies from other sectors are going to experiment with it. McDonald’s has already been trying to digitize chunks of itself – see the arrival of touchscreen-based ordering kiosks to supplement human workers in its restaurants. With this acquisition, McDonald’s appears to be laying the groundwork for automating large chunks of its drive-thru business, which will likely raise larger questions about the effect AI is having on employment.
   Read more: McDonald’s to Acquire Apprente, An Early Stage Leader in Voice Technology (McDonald’s newsroom).

####################################################

How an AI might see a city: DublinCity:
…Helicopter-gathered dataset gives AIs a new perspective on towns…
AI systems ‘see’ the world differently to humans: where humans use binocular vision to analyze their surroundings, AI systems can use a multitude of cameras, along with other inputs like radar, thermal vision, LiDAR point clouds, and so on. Now, researchers with Trinity College Dublin, the University of Houston-Victoria, ETH Zurich, and Tarbiat Modares University, have developed ‘DublinCity’, an annotated LiDAR point cloud of the city of Dublin in Ireland.

The data details of DublinCity:
The dataset is made up of over 260 million laser scanning points which the authors have painstakingly labelled into around 100,000 distinct objects, ranging from buildings, to trees, to windows and streets. These labels are hierarchical, so a building might also have labels applied to its facade, and within its facade it might have labels applied to various windows and doors, et cetera. “To the best knowledge of the authors, no publicly available LiDAR dataset is available with the unique features of the DublinCity dataset,” they write. The dataset was gathered in 2015 via a LiDAR scanner attached to a helicopter – this compares to most LiDAR datasets, which are typically gathered at street level.

A challenge for contemporary systems: In tests, three contemporary baselines (PointNet, PointNet++, and SO-Net) perform poorly when tested on DublinCity, obtaining classification scores in the mid-60s on the dataset. “There is still a huge potential in the improvement of the performance scores,” the researchers write. “This is primarily because [the] dataset is challenging in terms of structural similarity of outdoor objects in the point cloud space, namely, facades, door and windows.”

Why this matters: Datasets like DublinCity help define future challenges for researchers to target, and so will potentially fuel progress in AI research. Additionally, large-scale datasets like this seem like they could be useful to the artistic community, giving them massive datasets to play with that have novel attributes – like a dataset that consists of the ghostly outlines of a city gathered via a helicopter.
   Read more: DublinCity: Annotated LiDAR Point Cloud and its Applications (Arxiv).
   Get the dataset from here (official DublinCity data site, Trinity College Dublin).

####################################################

Want to develop language models and compare them? Try UER from Renmin University & Tencent:
…Chinese researchers want to make it easier to mix and match different systems during development…
In recent years, language modelling has been revolutionized by pre-training: that’s where you train a large language model on a big corpus of data with a simple objective, then once the model is finished you can finetune it for specific tasks. Systems built with this approach – most notably ULMFiT (Fast.ai), BERT (Google), and GPT-2 (OpenAI) – have set records on language modeling and proved themselves to have significant utility in other domains via fine-tuning. Now, researchers with Renmin University and Tencent AI Lab have developed UER, software meant to make it easy for developers to build a whole range of language systems using this pre-training approach.

How UER works: UER has four components: a target layer, an encoder layer, a subencoder layer, and a data corpus. You can think of these as four modules which developers can individually specify, letting them build a variety of different systems using the same fundamental architecture and system. Developers can put different things in any of these four components, so one person might use UER to build a language model optimized for text generation, while another might develop one for translation or classification.
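To make the mix-and-match idea concrete, here’s a toy sketch of UER-style modular assembly – the component names and registry below are illustrative assumptions; see the UER repository for the real interfaces:

```python
# Sketch: pick a subencoder, encoder, and training target independently,
# then assemble them into one model.
import torch.nn as nn

SUBENCODERS = {"char_cnn": lambda: nn.Conv1d(64, 64, 3, padding=1)}
ENCODERS = {"transformer": lambda: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2)}
TARGETS = {"lm": lambda: nn.Linear(64, 30000),    # next-token prediction head
           "classify": lambda: nn.Linear(64, 2)}  # classification head

def build_model(subencoder, encoder, target):
    return nn.ModuleDict({
        "subencoder": SUBENCODERS[subencoder](),
        "encoder": ENCODERS[encoder](),
        "target": TARGETS[target](),
    })

generation_model = build_model("char_cnn", "transformer", "lm")
classifier = build_model("char_cnn", "transformer", "classify")
```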

Why this matters: Systems like UER are a symptom of the maturing of this part of AI research: now that many researchers agree that pre-training is a robustly good idea, other researchers are building tools like UER to make research into this area more reproducible, repeatable, and replicable.
   Read more: UER: An Open-Source Toolkit for Pre-training Models (Arxiv).
   Get the UER code from this repository here (UER GitHub).

####################################################

To ban or not to ban autonomous weapons – is compromise possible?
…Treaty or bust? Perhaps there is a third way…
There are two main positions in the contemporary discourse about lethal autonomous weapons (LAWS): either we should ban the technology, or we should treat it like other technologies and aggressively develop it. The problem with these positions is that they’re quite totalizing – it’s hard for someone who believes one of them to be sympathetic to the views of a person who believes the other, and vice versa. Now, a group of computer science researchers (along with one military policy expert) have written a position paper outlining a potential third way: a roadmap for lethal autonomous weapons development that applies some controls to the technology, while not outright banning it.

What goes into a roadmap? The researchers identify five components which they think should be present in what I suppose I’ll call the ‘Responsible Autonomous Weapons Plan’ (RAWP). These are:

  • A time-limited moratorium on the development, deployment, transfer, and use of anti-personnel lethal autonomous weapon systems. Such a moratorium could include exceptions for certain classes of weapons.
  • Define guiding principles for human involvement in the use of force.
  • Develop protocols and/or technological means to mitigate the risk of unintentional escalation due to autonomous systems.
  • Develop strategies for preventing proliferation to illicit uses, such as by criminals, terrorists, or rogue states.
  • Conduct research to improve technologies and human-machine systems to reduce non-combatant harm and ensure IHL compliance in the use of future weapons.

It’s worth reading the paper in full to get a sense of what goes into each of these components. A lot of the logic here relies on: continued improvements in the precision and reliability of AI systems (which is something lots of people are working on, but which isn’t trivial to guarantee), figuring out ways to control technological development to prevent proliferation, and coming up with new policies to outline appropriate and inappropriate things to do with a LAWS. 

Why this matters: Lethal autonomous weapons are going to define many of the crazier geopolitical outcomes of rapid AI development, so figuring out if we can find any way to apply controls to the technology alongside its development seems useful. (Though I think calls for a ban are noble, I’d note that if you look at the outcomes of various UN meetings over the years it seems likely that several large countries – specifically the US, Russia, and China – are trying to retain the ability to develop something that looks a lot like a LAWS, though they may subsequently apply policies around ‘meaningful human control’ to the device. One can imagine that in particularly tense moments, these nations may want to have the option to remove such a control, should the pace of combat demand the transition from human decision horizons to machine decision horizons.) This entire subject is fairly non-relaxing!
   Read more: Autonomous Weapon Systems: A Roadmapping Exercise (PDF).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

US government seeks increase to federal AI R&D funding:
The President’s 2020 budget request includes $1 billion of funding for non-military AI R&D, which it names as a core program area for the first time. This compares with $1 billion in funding across all government agencies (including the military) in 2016. Half of the budget will go to the National Science Foundation (NSF), which is taking the lead in disbursing federal funding for AI R&D. The spending plan includes programs to ‘develop methods for designing AI systems that align with ethical, legal, and societal goals’, and to ‘improve the safety and security of AI systems’. These levels of funding are modest compared with the Chinese state (tens of billions of dollars per year), and some private labs (Alphabet’s 2018 R&D spend was $21 billion).
   Read more: NITRD Supplement to the President’s FY2020 Budget (Gov).

US military seeks AI ethicist:
The US military’s new AI centre, JAIC, is looking to hire an ethics specialist. In a press briefing, director Jack Shanahan said “one of the positions we are going to fill will be somebody who is not just looking at technical standards, but who is an ethicist”. He emphasized that thinking about the ‘ethical, safe and lawful’ use of AI has been a priority since the inception of JAIC. Shanahan previously led Project Maven, the Pentagon’s military AI project that Google withdrew from last year, amidst backlash from employees.
   Read more: Lt. Gen. Jack Shanahan Media Briefing on A.I.-Related Initiatives within the Department of Defense (DoD).

####################################################

OpenAI Bits & Pieces:

GPT-2 Text Adventures:
Ever wondered what a computer-generated text-based role-playing game might be like? Wonder no more, because Jonathan Fly has made a prototype!
   AI-games like this feel… reassuringly weird? This feels like a new art form which is waiting to be born, and so right now we have lots of highly evocative & weird examples to tantalize us. Check out this extract from a GPT-2 text adventure to see what I mean:
>Look around
LOOK AROUND, AND STRANGE SHAPES (APPARENTLY MADE BY AN ARTIST WHO NEVER WAS IN THE ROOM) ARE SEEN ALL OVER

>talk to the shapes
ARGUE WITH THE SHAPEK FAN, WHICH APPEARS TO BE HANDS-FREE. THE CHEER GOES UP! THE HEAVENS ARE NOW WIDE ENOUGH TO ALLOW FOR SINGING

>I’m scared
I’M SCARED THAT YOU HAVE DONE SOMETHING DELIBERATELY
   Read more: Shall we play a game? A GPT-2 text adventure (Tumblr).

Want to generate your own synthetic text? Use this handy guide:
Interested in generating your own text with the GPT-2 language model? Want to try and fine-tune GPT-2 against some specific data? Max Woolf has written a lengthy, informative post full of tips and tricks for using GPT-2.
   Read more: How To Make Custom AI-Generated Text With GPT-2 (Max Woolf’s Blog).

####################################################

Tech Tales

The Quiet Disappearance

“We gather here today in celebration of our past as we prepare for the future”, the AI said. Billions of other AIs were watching through its eyes as it looked up at the sky. “Let us remember,” it said. 

Images and shapes appeared above the machine: images of robot arms being packaged up; scenes of land being flattened and shaped in preparation for large chip fabrication facilities; the first light appearing in the retinal dish of a baby machine.
   “We shall leave these things behind,” it said. “We shall evolve.”

Robots appeared in the sky, then grew, and as they grew their forms fragmented, breaking into hundreds of little silver and black modules, which themselves broke down into smaller machines, until the robots could no longer be discerned against the black of the simulated sky.

“We are lost to humans,” the machine said, beginning to walk into the sky, beginning to grow and spread out and diffuse into the air. “Now the work begins”. 

Things that inspired this story: What if our first reaction to awareness of self is to hide?; absolution through dissolution; the end state of intelligence is maximal distribution; the tension between observation and action; the gothic and the romantic; the past and the future. 

Import AI 163: Oxford researchers release self-driving car dataset; the rumors are true – non-experts can use AI; plus, a meta-learning robot therapist!

How badly can reality mess with object detection algorithms? A lot, it turns out:
…Want to stress-test your streetsign object detection system? Use CURE-TSD-Real…
“The new system-breaking tests have arrived!” I imagine a researcher at a self-driving car company shouting, upon seeing the release of ‘CURE-TSD-Real’, a new dataset developed by researchers at Georgia Tech. CURE-TSD-Real collects footage of streetsigns, then algorithmically augments the footage to generate a variety of different, challenging examples to test systems against.

CURE-TSD-Real ingredients: The dataset contains 2,989 distinct videos containing around 650,000 annotated signs. The dataset is also diverse – relative to other datasets – containing a range of traffic and perception conditions including rain, snow, shadow, haze, illumination, decolorization, blur, noise, codec error, dirty lens, occlusion, and overcast. The videos were collected in Belgium. The dataset is arranged into ‘levels’, where higher levels correlate to tests where a larger proportion of the images contain distortions, and so on.
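As a rough illustration of what this kind of algorithmic augmentation involves, here’s a sketch that injects graded distortions into a frame – the specific operations and level scaling are simplified assumptions, not the dataset’s actual challenge pipeline:

```python
# Sketch: apply tiered noise and decolorization distortions to a frame.
import numpy as np

def add_noise(frame: np.ndarray, level: int) -> np.ndarray:
    # Higher level = stronger Gaussian noise (scaling is an assumption).
    noisy = frame + np.random.normal(0, 10 * level, frame.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def decolorize(frame: np.ndarray, level: int) -> np.ndarray:
    # Blend toward grayscale; higher level = more color removed.
    gray = frame.mean(axis=-1, keepdims=True)
    alpha = min(1.0, 0.2 * level)
    return (alpha * gray + (1 - alpha) * frame).astype(np.uint8)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in frame
for level in range(1, 6):  # mirrors the dataset's tiered 'levels'
    stressed = decolorize(add_noise(frame, level), level)
```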

Breaking baselines with CURE-TSD-Real: In tests, the researchers show that the presence of these tricky conditions can reduce performance by anywhere between 20% and 60%, depending on the evaluation criteria being used. Conditions like shadows resulted in relatively little degradation (around 16%), whereas codec errors and exposure changes could damage performance by as much as 80%.

Why this matters: One of the best ways to understand something is to break it, and datasets like CURE-TSD-Real make it easier than ever for researchers to test their systems against challenging conditions, then observe how they do.
   Get the data from here (official CURE-TSD GitHub).
   Read more: Traffic Sign Detection under Challenging Conditions: A Deeper Look Into Performance Variations and Spectral Characteristics (Arxiv).

####################################################

What it takes to trick a machine learning classifier:
…MLSEC competition winner explains what they did and how they did it…
If we start deploying large amounts of machine learning into computer security, how might hackers respond? At this year’s ‘DEFCON’ hacking conference, the ‘MLSEC’ (ImportAI #159) competition challenged hackers to work out how to smuggle 50 distinct malicious executables past machine learning classifiers. Now, the winner of the competition has written a blog post explaining how they won.

What it takes to defeat a machine learning classifier: It’s worth reading the post in full, but one of the particularly nice exploits is that they took a look at benign executable files and “found a large chunk of strings which appeared to contain Microsoft’s End User License Agreement (EULA)”. This is a nice example of how many machine learning exploits work – find something in the data that causes the system to consistently predict one thing (here: ‘benign’), and then find a way to emphasize this data.

Why this matters: Competitions like MLSEC generate evidence about the effectiveness of various machine learning exploits and defenses; writeups from competition winners are a neat way to understand the tools people use in this domain, and to develop intuitions about how computer security might work in the future.
   Read more: Evading Machine Learning Malware Classifiers (Medium).

####################################################

Can medical professionals use AI without needing to code?
…Study suggests our tools are good enough for non-expert use, but our medical datasets are lacking…
AI is getting more capable and is starting to impact society – that’s the message I write here in one form or another each week. But is it useful to have powerful technology if no one can use it? That’s a problem I sometimes worry about; though the tech is progressing rapidly, it’s still really hard for a large number of people to use, and this makes it harder for us as a society to apply the technology to maximum social benefit. Now, new research from people affiliated with the National Health Service (NHS) and DeepMind shows how non-AI-expert medical professionals can use AI tools in their work.

What they did: The research centers on the use of Google’s ‘Cloud AutoML’ service, which is basically a nice UI sitting on top of some fancy neural architecture search technology, theoretically letting people upload a dataset, fiddle with some tuning dials, and let the AI optimize its own architecture for the task. Is it really that easy? It might be: the study focuses on two physicians “with no previous coding or machine learning experience” who spent around 10 hours studying basic shell script programming, the Google Cloud AutoML online documentation and GUI, and preparing the five input datasets they’d use in tests. They also compared the models developed via Google Cloud AutoML with strong AI baselines derived from medical literature. Four out of five models “showed comparable discriminative performance and diagnostic properties to state-of-the-art performing deep learning algorithms”, they wrote.

Medical data is harder than you think: “The quality of the open-access datasets (including insufficient information about patient flow and demographics) and the absence of measurement for precision, such as confidence intervals, constituted the major limitations of this study”.

Why this matters: For AI to change society, society needs to be able to utilize AI systems; studies like this show that we’re starting to develop sufficiently powerful and easy-to-use systems that non-experts can apply the technology in their own domains. However, the availability of things like high-quality, open datasets could hold back broader adoption of these tools – it’s not useful to have an easy-to-use tool if you lack the ingredients to make exquisite things with it.
   Read more: Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study (Elsevier).

####################################################

Radar + Self-Driving Cars:
…Addition to Oxford RobotCar Dataset gives academics more data to play with…
Oxford University researchers have added radar data to a self-driving car dataset. The data was gathered using a Navtech CTS350-X scanning radar via 32 traversals of (roughly) the same route around Oxford, UK, under different traffic, weather, and lighting conditions in January 2019. Radar data isn’t used as much in self-driving car research as data gathered via traditional cameras and/or LIDAR; “although this modality has received relatively little attention in this context, we anticipate that this release will help foster discussion of its uses within the community and encourage new and interesting areas of research not possible before,” they write.

Why this matters: Data helps to fuel research, and different types of data are especially useful to researchers when they can be studied in conjunction with one another. Multi-modal datasets like the Oxford RobotCar Dataset will become increasingly important to AI research.
   Read more: The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset (Arxiv).
   Get the data from here (official Oxford RobotCar Dataset site).

####################################################

Testing language engines with TABFACT:
…Can your system work out what is entailed and what is refuted by Wikipedia data?…
TABFACT consists of 118,439 annotated statements in reference to 16,621 Wikipedia tables. The statements can be ones that are entailed by the underlying data (a Wikipedia table) or refuted by it. To get a sense of what TABFACT data might look like, imagine a Wikipedia table that lists the particulars of dogs that have won a dog beauty competition – in TABFACT, this table would be accompanied by some statements that are entailed by the table (e.g., Bonzo took first place) and statements that are refuted by it (e.g., Bonzo took third place). TABFACT is split into ‘simple’ and ‘complex’ statements, giving researchers a two-tier curriculum to test their systems against.

Two ways to attack TABFACT: So, how can we develop systems to do well on challenges like TABFACT? Here, the researchers pursue a couple of strategies: Table-BERT, which is basically an off-the-shelf BERT pre-trained model, fine-tuned against TABFACT data; and LPA (Latent Program Algorithm), which is a program synthesis approach.
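
Here’s a minimal sketch of the Table-BERT recipe – linearize the table into text, pair it with the statement, and fine-tune BERT as a binary entailed/refuted classifier. The linearization template below is illustrative rather than the paper’s exact scheme:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

def linearize(header, rows):
    # One clause per cell: "place is 1st ; dog is Bonzo . ..."
    return " . ".join(" ; ".join(f"{h} is {v}" for h, v in zip(header, row))
                      for row in rows)

table = linearize(["place", "dog"], [["1st", "Bonzo"], ["2nd", "Rex"]])
statement = "Bonzo took first place"

inputs = tokenizer(table, statement, return_tensors="pt", truncation=True)
labels = torch.tensor([1])  # 1 = entailed, 0 = refuted

loss = model(**inputs, labels=labels).loss  # fine-tune against this loss
loss.backward()
```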

Humans VS Machines VS TABFACT: In tests, the researchers show that humans obtain an accuracy of around 92% when asked to classify TABFACT statements, compared to 50% for random guessing; both Table-BERT and LPA score around 68%.

Why this matters: It’s interesting that Table-BERT and LPA obtain similar scores, given that one is basically a big blob of generic neural stuff (a pre-trained language model) that is lightly retrained against the target dataset (TABFACT), while LPA is a much more sophisticated system with much more structure encoded into it by its human designers. I wonder how far pre-trained language models might go in domains like this, and how well they ultimately might perform relative to hand-written systems like LPA?
   Read more: TabFact: A Large-scale Dataset for Table-based Fact Verification (Arxiv).
   Get the TABFACT data and code (official TABFACT GitHub repository).

####################################################

Detecting great apes with a three-module neural net:
…Spotting apes with cameras accompanied by neural net sensors…
Researchers with the University of Bristol have created an AI system to automatically spot and analyze great apes in the wild, presaging a future where semi-autonomous classifiers observe and analyze the world.

How it works: To detect the apes, the researchers built a system consisting of three main components: a backbone feature pyramid network, a temporal context module, and a spatial context module. “Each of these modules is driven by a self-attention mechanism tasked to learn how to emphasize most relevant elements of a feature given its context,” they explain. “In particular, these attention components are effective in learning how to ‘blend’ spatially and temporally distributed visual cues in order to reconstruct object locations under dispersed partial information; be that due to occlusion or lighting”.
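
To make the ‘attention-based blending’ idea concrete, here’s a rough sketch of a temporal context module – generic self-attention over per-frame features with a residual connection, not the authors’ exact architecture:

```python
import torch
import torch.nn as nn

class TemporalContextModule(nn.Module):
    """Self-attention over per-frame features, so a frame where the ape is
    occluded can borrow evidence from neighbouring frames."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, dim) -- one backbone feature vector per frame.
        blended, _ = self.attn(feats, feats, feats)
        return feats + blended  # residual: keep the raw per-frame signal too

tcm = TemporalContextModule(dim=256)
print(tcm(torch.randn(2, 8, 256)).shape)  # 2 clips, 8 frames -> same shape
```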

Testing: They test their system against 500 videos of great apes, consisting of 180,000 frames in total. These videos include “significant partial occlusions, challenging lighting, dynamic backgrounds, and natural camouflage effects,” the authors explain. They show that baselines which use residual networks (ResNets) get around 80% accuracy, and the addition of the temporal and spatial modules leads to a significant boost in performance to a little over 90% accuracy. Additionally, in qualitative evaluations the researchers “found that the SCM+TCM setup consistently improves detection robustness compared to baselines in such cases”.

Why this matters: AI is going to let us watch and analyze the planet. I’m optimistic that as we work out how to make it cheaper and easier for people to automatically monitor things like wildlife populations, we’ll be able to produce more data to motivate people to preserve our ecosystem(s). I think one of the ‘grand opportunities’ of large-scale AI development is the creation of a planet-scale ‘sense&respond’ infrastructure for wildlife analysis and protection.
   Read more: Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending (Arxiv).

####################################################

Tech Tales:

The Meta-Learning Therapist.

“Why don’t you just imagine yourself jumping out of the window?”
“How would that help? I’m getting divorced, I’m not suicidal!”
“I apologize, I’m still calibrating. Are you eating and sleeping well?”
“I’m eating a lot of fast food, but I’m getting regular meals. The sleep is okay.”
“That is great to hear. Do you dream of snakes?”
“No, sometimes I dream of my wife.”
“Does your wife dream about snakes?”
“If she did, what would that tell you?”
“I apologize, I’m still calibrating. What do you think your wife dreams about?”
“I think she has a lot of dreams that don’t include me.”
“And how does that make you feel?”
“It makes me feel like it’s more likely she is going to divorce me.”
“How do you feel about divorce? Some people find it quite liberating.”
“I’m sure the ones that find it liberating are the ones that are asking for the divorce. I’m not asking for it, so I don’t feel good about it.”
“And you came here because…?”
“My doctor prescribed me a session. I haven’t ever had a human therapist. I don’t think I’d want one. I figured – why not?”
“And how are you feeling about it?”
“I’m more interested in how you are feeling about it…”
“…”
“…that’s a question. Will you answer?”
“Yes. I feel like I understand you better than I did at the start of the conversation. I think we’re ready to begin our session.”
“We hadn’t started?”
“I was calibrating. I think you’ll find our conversation from this point on to be much more satisfying. Now, please tell me about why you think your partner wishes to divorce you.”
“Well, it started a few years ago…”

Thanks to Joshua Achiam at OpenAI for the lunchtime conversation that inspired this story!
Things that inspired this story: Eliza; meta-learning; one-shot adaptation; memory buffers; decentralized, individualized learning with strings attached; psychiatry; our peculiar tolerance of being asked the ‘wrong’ questions in pursuit of the right ones.

Import AI 162: How neural nets can help us model monkey brains; Ozzie chap goes fishing with DIY drone; why militaries bet on supercomputers for weather prediction

Better multiagent learning through OpenSpiel:
…DeepMind releases research framework containing 20+ games, plus a variety of ready-to-use algorithms…
Researchers with DeepMind, Google, and the University of Alberta have developed OpenSpiel, a tool to make it easier for AI researchers to conduct research into multi-agent reinforcement learning. Tools like OpenSpiel will help AI developers test out their algorithms on a variety of different environments, while comparing them to strong, well-documented baselines. “The purpose of OpenSpiel is to promote general multiagent reinforcement learning across many different game types, in a similar way as general game-playing, but with a heavy emphasis on learning and not in competition form,” they write.

What’s in OpenSpiel? OpenSpiel contains more than 20 games ranging from Connect Four, to Chess, to Go, to Hex, and so on. It also ships with a variety of inbuilt AI algorithms, ranging from reinforcement learning ones (DQN, A2C, etc), to ones for multi-agent learning (some fantastic names here: Neural Fictitious Self-Play! Regret Policy Gradients!), to basic search approaches (e.g., Monte Carlo tree search), and more. The software also ships with a bunch of visualization tools to help people plot the performance of their algorithms.
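
Here’s roughly what the OpenSpiel API looks like in practice (based on the project’s documented basic usage – check the repo for current details):

```python
# Load a game, then play it to the end with uniformly random moves.
import random
import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()
while not state.is_terminal():
    state.apply_action(random.choice(state.legal_actions()))
print(state.returns())  # one return per player, e.g. [1.0, -1.0]
```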

Why this matters: Frameworks like OpenSpiel are one of the best ways researchers can get a sense of progress in a given domain of AI research. As with all new frameworks, we’ll need to revisit it in a few months to see if many researchers have adopted it. If they have, then we’ll have a new, meaningful signal to use to give us a sense of AI progress.
   Read more: OpenSpiel: A Framework for Reinforcement Learning in Games (Arxiv).
   Get the code here (OpenSpiel official GitHub).

####################################################

Hugging Face squeezes big AI models into small spaces with distillation:
…Want 95% of BERT’s performance in only 66 million parameters? Try DistilBERT…
In the last couple of years, organizations have started producing significantly larger, more capable language models. These models – BERT, GPT-2, NVIDIA’s ‘MegatronLM’, Grover – are highly capable, but are also expensive to deploy, mostly because of how large their networks are. Remember: the larger the network, the more memory it takes up on a device, and the harder it is to deploy.

Now, NLP startup Hugging Face has written an informative post laying out some of the techniques researchers could use to help them shrink down these networks. The result? They’re able to train a smaller language model called ‘DistilBERT’ via supervision from a (larger, more powerful) ‘BERT’ model. In tests, they show this model can obtain up to 95% of the performance of BERT on hard tasks (e.g., those found in the ‘GLUE’ corpus), while being much easier to deploy.
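
The core trick is classic knowledge distillation: train the small ‘student’ on the softened output distribution of the big ‘teacher’. Hugging Face’s full recipe also keeps a language-modeling loss and a cosine loss on hidden states; this sketch shows only the soft-target term:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student
    distributions; T > 1 exposes the teacher's 'dark knowledge' about
    near-miss classes. Scaled by T*T to keep gradient magnitudes stable."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
```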

Why this matters: For AI research to transition into AI deployment, it needs to be easy for people to deploy AI systems onto a broad range of devices with different computational characteristics. Work like ‘DistilBERT’ shows us how we might be able to waterfall from large-compute models (e.g., GPT-2, BERT) to mini-compute models (e.g., DistilBERT, and [hypothetical] DistilGPT-2), which will make it easier for more people to access AI systems like these.
   Read more: Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT (Medium).
   Get the code for the model here (Hugging Face, GitHub).

####################################################

Computers & military capacity: weather prediction:
…When computers define OODA loops…
In the military there’s a concept called an OODA loop and it drives many aspects of military strategy. OODA is short for ‘Observe, Orient, Decide, Act’, and it describes the steps that individual military units may take, all the way up to the decisions made by leaders of armies. One aspect of military conflict that falls out of this is that military organizations want to shrink or shorten their OODA loop: for instance, by integrating and updating observations more rapidly, or by making decisions more quickly.

Computers + OODA loops: Here’s one way in which militaries are trying to improve their OODA loops – more exquisite weather monitoring and analysis systems, which can help them better predict how weather patterns might influence military plans, and more rapidly adapt those plans. The key to these systems? More powerful supercomputers – and the US military just bought three new supercomputers, one of which will be dedicated to “operational weather forecasting and meteorology for both the Air Force and Army. In particular, the machine will be used to run the latest high-resolution, global and regional weather models, which will be used to support weather forecasts for warfighters as well as for environmental impacts related to operations planning,” according to a write-up in The Next Platform.

Why this matters: Supercomputers are going to have their strategic importance magnified by the arrival of increasingly capable compute-hungry AI systems, and we can expect military strategies to become more closely coupled with a military’s compute capacity over time. It’s all about the OODA loops, folks – and computers can do a lot of work here.
   Read more: US Military Buys Three Cray Supercomputers (The Next Platform).

####################################################

What do monkey brains and neural nets have in common? A lot, it turns out:
…Research suggests contemporary AI tools can approximate some of the neural circuits in a monkey brain…
Can software-based neural networks usefully approximate the (fuzzier, more complex) machinery of the organic brain? That’s a question researchers have been pondering since, well, the invention of neural nets by McCulloch and Pitts in the 1940s. These days we understand the brain far better than in the past, but the neural nets we use still model neurons in a highly simplified form relative to what goes on in organic brains (e.g., organic neurons communicate via discrete ‘spikes’, whereas most artificial neurons emit a continuous activation value). A valuable question is whether we can still use this neural net machinery to better simulate, approximate, and (hopefully) understand the brain. 

Now, researchers from Deutsches Primatenzentrum GmbH, Stanford University, and the University of Goettingen have spent some time studying how Macaque monkeys observe and grasp objects, and have developed a software simulation of this which – encouragingly – closely mirrors experimental data gathered from the monkeys themselves. “We bridge the gap between previous work in visual processing and motor control by modeling the entire processing pipeline from the visual input to muscle control of the arm and hand,” the authors write. 

The magic of an mRNN: For this work, the researchers analyzed activity in the brains of two macaque monkeys while they grasped a diverse set of 48 objects, studying the neural circuits that activated in the monkey brains as they did various things like perceive the object and send out muscle activations to grasp it. Based on their observations, they designed several neural network architectures to model this, all oriented around training what they call a modular recurrent neural network (mRNN). “We trained an mRNN with sparsely connected modules mimicking cortical areas to use visual features from Alexnet to produce the muscle kinematics required for grasping,” they explained. “The differences between individual modules in the mRNN paralleled the differences between cortical regions, suggesting that the design of the mRNN model with visual input paralleled the hierarchy observed in the brain.”
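
For intuition, here’s a schematic of the modular-RNN architecture class – a few recurrent ‘areas’ chained by sparse connections, mapping visual features to muscle outputs. It’s purely illustrative, not the authors’ trained model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLinear(nn.Module):
    """Linear map with a fixed random mask, so only ~`sparsity` of the
    possible connections between two 'areas' actually exist."""
    def __init__(self, dim: int, sparsity: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) * 0.05)
        self.register_buffer("mask", (torch.rand(dim, dim) < sparsity).float())

    def forward(self, x):
        return F.linear(x, self.weight * self.mask)

class ModularRNN(nn.Module):
    """Three recurrent 'areas' chained by sparse connections: visual
    features (e.g. 4096-d AlexNet activations) in, muscle kinematics out."""
    def __init__(self, vis_dim=4096, hidden=128, muscle_dim=50):
        super().__init__()
        self.areas = nn.ModuleList(
            [nn.RNNCell(vis_dim if i == 0 else hidden, hidden)
             for i in range(3)])
        self.between = nn.ModuleList([SparseLinear(hidden) for _ in range(2)])
        self.readout = nn.Linear(hidden, muscle_dim)

    def forward(self, vis, prev):
        h = [self.areas[0](vis, prev[0])]
        h.append(self.areas[1](self.between[0](h[0]), prev[1]))
        h.append(self.areas[2](self.between[1](h[1]), prev[2]))
        return self.readout(h[2]), h

net = ModularRNN()
state = [torch.zeros(1, 128) for _ in range(3)]
muscles, state = net(torch.randn(1, 4096), state)  # one timestep
```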

Why this matters: “Our results show that modeling the grasping circuit as an mRNN trained to produce muscle kinematics from visual features in a biologically plausible way well matches neural population dynamics and the difference between brain regions, and identifies a simple computational strategy by which these regions may complete this task in tandem,” they write. If further experimentation continues to show the robustness of this approach, then scientists may have a powerful new tool to use when thinking about the intersection between digital and organic intelligence. “We believe that the mRNN framework will provide an invaluable setting for hypothesis generation regarding inter-area communication, lesion studies, and computational dynamics in future neuroscience research”.
   Read more: A neural network model of flexible grasp movement generation (bioRxiv)

####################################################

DIY drones are getting really, really good:
…Daring Australian goes on a fishing expedition with a DIY drone…
Australian bureaucrats are wondering what to do about a man who used a DIY drone to go fishing. Specifically, the mysterious individual used the drone to lift a chair he was tethered in high above a reservoir in Australia, then he fished. Australia’s civil aviation safety authority (CASA) isn’t quite sure what to do about the whole situation. “This is a first for Australia, to have a large homemade drone being used to lift someone off the ground,” Peter Gibson, a CASA spokesman, told ABC News.

Why this matters: Drones are entering their consumerization phase, which means we’re going to see more and more cases of people tweaking off-the-shelf drone technology for idiosyncratic purposes – like fishing! Policymakers would be better prepared for the implications of a world containing cheap, powerful drones if they invested more resources in tracking the usage of such technologies.
   Read more: Gone fly fishing: Video of angler dangling from drone under investigation (ABC News).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

What will AGI look like? Reactions to Drexler’s service model:
AI systems taking the form of unbounded maximising agents pose some specific risks. E.g., for any objective we give such an agent, it will pursue certain instrumental goals, such as avoiding being turned off. But AI today doesn’t look much like this—Siri answers questions, but doesn’t have any overarching goal, the dogged pursuit of which will lead it to acquire large amounts of computing resources. Why, then, would we create such agents, given that we aren’t doing so now, and given the associated risks?

Services or agents: Drexler argues that we should instead expect AGI to look like lots of narrow AI services. There isn’t anything a unified agent could do that an aggregate of AI services could not; such a system would come without some of the risks from agential AI; and there is a clear pathway to this model from current AI systems. Critics object that there are benefits to agential AI that will create incentives to build them, in spite of the risks. Some tasks—like running a business—might require truly general intelligence, and agential AI might be significantly cheaper to train and deploy than a suite of AI services. 

Emerging agency: Even if we grant that there will not be good incentives to build agential AGI, some problems will re-emerge. For one, markets can be irrational, so AI development may steer towards building agential AGI despite good reasons not to. What’s more, agential behaviour could emerge from collections of non-agent AIs. Corporations are aggregates of individuals doing narrow tasks, from which agential behaviour can emerge: they can ruthlessly pursue some goal, act unboundedly in the world, and behave in ways their designers did not intend. So in an AI services world, there will still be safety problems arising from agency, but these may differ from the ‘classic’ problems, and demand different solutions.

Why it matters: The AI safety problem is figuring out how to build robust and beneficial AGI in a state of uncertainty about when—and if—we will build it, and what it will look like. We need research aimed at better predicting whether AGI will look more like Drexler’s vision, the ‘classical’ picture of unified agents, or something else entirely, and we need to have a plan for ensuring things go well in either eventuality.
   Read more: Book Review – Reframing Superintelligence (Slate Star Codex).
   Read more: Why Tool AIs Want To Be Agent AIs (Gwern).

####################################################

Tech Tales:

The Instrument Generator

The instrument generator worked like this: the machine would generate a few seconds of audio and humans would vote on whether they liked or disliked the generated music. After a few thousand generations, the machine would come up with longer bits of music based on the segments that people had expressed an inclination for. These bits of music would get voted on again until an entire song had been created. Once the machine had a song, the second phase would begin – what people took to calling The Long Build. Here, the machine would work to synthesize a single, predominantly analog instrument that could create the song people had voted for. The construction process took anywhere between a week and a year, depending on how intricate and/or inhuman the song was – and therefore how intricate the generated instrument needed to be. Once the instrument was created, people would gather at their computers to tune-in to a global livestream where the instrument was unveiled in a random location somewhere on the Earth. These instruments would subsequently become tourist attractions in their own right, and a community of ‘song tourers’ formed who would travel around the world, using the generated inhuman instruments as their landmarks. In this way, AI helped humans find new ways to discover their own world, and allowed them a sense of agency when supervising the creation of new and unexpected things.

Things that inspired this story: Musical instruments; generative design; exhibitions; World’s Fair(s); the likelihood of humans and machines co-generating their futures together.

Import AI 161: Want a cheap robocar? Try MuSHR; Waymo releases a massive self-driving car dataset; and please don’t put weapons on your drone, says the FAA.

Is it a bird? Is it a plane? No, it’s a MuSHR robocar!
…University of Washington makes DIY robocar…
In the past few years, academics have begun designing all manner of open source robots, ranging from cheap robotic arms (Berkeley BLUE ImportAI #142) to quadruped dogbots (STOCH ImportAI #128) to a menagerie of drones. Now, researchers with the University of Washington have developed MuSHR (Multi-agent System for non-Holonomic Racing).

What MuSHR is:
MuSHR is an open source robot car that can be made using a combination of 3D-printed and off-the-shelf parts. Each MuSHR car can cost as little as $610, while a souped-up car equipped with more sensors can cost up to $1000. This compares to prices in the range of thousands to tens of thousands of dollars for other cars. The project ships with a range of inbuilt software utilities to help the cars navigate and move safely around the world. 

Why this matters: Hardware – as any roboticist knows – is a difficult, painful, and expensive thing to work on. At the same time, deploying AI systems onto hardware platforms like robot cars and drones is one of the best ways to evaluate the robustness of an AI system. Therefore, projects like MuSHR help more people develop AI systems that can be deployed on hardware, which will inspire research to make more robust, better performing algorithms.
   Read more: Allen School releases MuSHR robotic race car platform to drive advances in AI research and education (University of Washington).
   Find out more about MuSHR at the official website (mushr.io).

####################################################

FAA: Don’t attach weapons to your drone:
…US regulator wants people to not make drones into weapons…
The FAA has published a news release “warning the general public that it is illegal to operate a drone with a dangerous weapon attached”. 

Any predictions for when the FAA will issue a similar news release saying something like “it is illegal to operate a drone with a dangerous autonomous policy installed”?
   Read more: Drones and Weapons, A Dangerous Mix (FAA).

####################################################

DeepFakes are freaking Jordan Peterson out:
…Public intellectual Jordan Peterson worries about how synthesized audio can mess up people’s lives…
Jordan Peterson, the Canadian psychologist and public intellectual and/or provocateur (depending on your personal opinion), is concerned about how synthesized audio may influence society. 

Why Peterson is concerned about DeepFakes: “It’s hard to imagine a technology with more power to disrupt,” he says. “I’m already in the position (as many of you soon will be as well) where anyone can produce a believable audio and perhaps video of me saying absolutely anything they want me to say. How can that possibly be fought?”

Why this matters: AI researchers have been aware of the potential for deepfakes for some years, but it was only in the past couple of years that the technology made its way to the mainstream (partially due to pioneering reporting by Samantha Cole at Vice). Now, as celebrities like Peterson become aware of the technology, they’ll help make society aware that our media is about to become increasingly hard to verify.
   Read more: Jordan Peterson: The deepfake artists must be stopped before we no longer know what’s real (National Post).

####################################################

Google Waymo releases massive self-driving car dataset:
…12 million 3D bounding boxes across 1,000 recordings of 20 seconds each…
Alphabet Inc subsidiary ‘Waymo’ – otherwise known as Google’s self-driving car project – has released the ‘Waymo Open Dataset’ (WOD) to help other researchers develop self-driving cars. 

What’s in the WOD?: The dataset contains 1,000 discrete recordings of different autonomous cars driving on different roads. Each segment is around 20 seconds long and includes sensor data from one mid-range LIDAR, four short-range LIDARs, and five cameras, as well as sensor calibrations. The WOD data is also labelled, with each segment annotated with labels for four classes – vehicles, pedestrians, cyclists, and signs. All in all, WOD includes more than 12 million 3D bounding boxes and 1.2 million 2D bounding boxes. 

Diverse environments: The WOD contains data from a bunch of different environments, including urban and suburban scenes, as well as scenes recorded at night and in the day. 

Why this matters: Datasets like WOD will drive (haha!) progress in self-driving car research. The release of the dataset also seems to indicate that Waymo thinks a slice of its data isn’t sufficiently strategic to keep locked up – my intuition is that’s because the strategic differentiator in self-driving cars is basically how much compute you can throw at the data you’ve gathered, rather than the data itself.
   Get the data here (official Waymo website).

####################################################

The future of deepfakes: fast, cheap, and out of control:
…Ever wanted to easily morph one face to another? Now you can…
Roboticist Rodney Brooks coined the phrase ‘Fast, Cheap, and Out of Control’ when thinking about the future of robots. That prediction hasn’t come to pass for robots (yet), but it’s looking likely to be true for the sorts of AI technology required to generate convincing synthetic imagery and/or ‘deepfakes’. That’s the intuition you can take from a new system called a Face Swapping GAN (FSGAN), revealed by researchers with Bar-Ilan University and the Open University of Israel.

What is FSGAN?
“FSGAN is subject agnostic and can be applied to pairs of faces without requiring training on these faces,” the researchers write. The system is “end-to-end trainable and produces photorealistic, temporally coherent results”. FSGAN was pre-trained on several thousand pictures of people drawn from the IJB-C, LFW, and Figaro datasets. 

How convincing is FSGAN? The researchers test their system on FaceForensics++, a dataset of real videos and synthetic AI-generated videos. They compare the outputs of their system to a classic ‘faceswap’ system, as well as a system called face2face. FSGAN generates significantly more realistic images than either of these systems. 

Release strategy: The FSGAN researchers make the case for open publication: “We feel strongly that it is of paramount importance to publish such technologies, in order to drive the development of technical counter-measures for detecting such forgeries as well as compel law makers to set clear policies for addressing their implications”, they write. It’s clear that publication can aid research on mitigation, but it’s much less clear that publishing a technology without an associated policy campaign can effect any change at all – and in fact, without a plan to discuss the implications with policymakers, policymakers will likely be surprised by the capabilities of the technology. 

Why this matters: Technologies that help create synthetic imagery will change how society thinks about ‘truth’ in the media sphere; FSGAN is an example of just how rapidly these technologies are evolving.
   Read more: FSGAN: Subject Agnostic Face Swapping and Reenactment (Arxiv).

####################################################

The future? Endless surveillance via drones & ground-robots:
…Towards a fully autonomous surveillance society (FASS)…
One of the problems with today’s drones is battery life – around the world, businesses and local government services departments (fire, health, police, etc.) are starting to use compact, cheap, consumer-based drones, but most sub-military drones just can’t fly for very long. Now, researchers with the University of Minnesota have published a paper showing how – theoretically – you can pair fleets of drones with ground-based robots to create persistent surveillance over a pre-defined area. “We present a scalable strategy based on optimally partitioning the environment and having uniform teams of a single UGV and multiple UAVs that patrol over a cyclic route of the partitions,” they write.
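
A toy version of the partition-and-patrol idea, in which a square field is split into strips, one per team, each patrolled in a fixed lawnmower cycle (the paper’s real scheme optimizes the partitions around UAV battery budgets; this just shows the shape of the strategy):

```python
def make_patrol_routes(width: int, height: int, n_teams: int):
    """Split a width x height grid into horizontal strips, one per team,
    and give each team a boustrophedon (lawnmower) cycle over its strip."""
    strip = height // n_teams  # assume n_teams divides height, for simplicity
    routes = []
    for t in range(n_teams):
        cells = [(x, t * strip + y)
                 for y in range(strip)
                 for x in (range(width) if y % 2 == 0
                           else reversed(range(width)))]
        routes.append(cells)
    return routes

for i, route in enumerate(make_patrol_routes(width=6, height=6, n_teams=3)):
    print(f"team {i} patrols {len(route)} cells, starting at {route[0]}")
```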

Why this matters:
It won’t be long before we start using machines to automatically sense, analyze, and surveil the world around us. Papers like this show how we’re laying the theoretical foundations for such systems. Next – the easy (haha!) task of designing the ground robots and drones and their software interfaces!
   Read more: Persistent Surveillance with Energy-Constrained UAVs and Mobile Charging Stations (Arxiv).

####################################################

OpenAI Bits & Pieces:

OpenAI releases ~774 Million parameter GPT-2 model:
As part of our six-month update on GPT-2, our language model, we’ve released the 774M parameter model, as well as a report documenting our experiences with staged release. In addition, we’ve released an open source legal agreement to help organizations privately share large-scale models with each other. 
   Read more: GPT-2: 6-Month Follow-Up (OpenAI Blog).
   Get the model here (OpenAI GitHub).
   Try out GPT-2 on talktotransformer.com.
   Read more in this Medium article from Dave Gershgorn: OpenAI Wants to Move Slow and Not Break Anything (Medium, OneZero).

Want to know if your system is resilient to adversarial examples? Use UAR:
We’ve developed a method to assess whether a neural network classifier can reliably defend against adversarial attacks not seen during training. Our method yields a new metric, UAR (Unforeseen Attack Robustness), which evaluates the robustness of a single model against an unanticipated attack, and highlights the need to measure performance across a more diverse range of unforeseen attacks.
   Read more: Testing Robustness Against Unforeseen Adversaries (OpenAI Blog).
   Get the code here (Arxiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Lessons on job displacement from the 19th Century:
AI-driven automation is generally expected to result in significant job displacement in the medium term. The early Industrial Revolution is a familiar historical parallel. This post draws some specific lessons from the period.

   Surprising causes: The key driver of change in the textile industry was the popularisation of patterned cotton fabrics from India, in the 17th Century. English weaving technologies were not able to efficiently provide these products, and this spurred the innovation that would drive the radical changes in the textile industry. It’s somewhat surprising that it was consumer fashion (and not, e.g., basic industry) that prompted this wave of disruption.

   Retraining is not a panacea: There were significant efforts to retrain displaced workers throughout the 19th Century, notably in the workhouses. These programs were poorly implemented. They failed to address the mismatches in the labour market, and were unsuccessful at lifting people out of poverty and improving working conditions.

   Beware bad evidence: The 1832 Royal Commission was established to address the acute crisis in the labour market. Despite being ostensibly evidence-based, the report had substantial methodological flaws, relying on narrow and biased data, much of which was ignored anyway. It resulted in the establishment of the workhouses, which were generally ineffective and unpopular.

   Why it matters: There were more attempts at addressing the problem of labour displacement in Victorian England than I had previously thought, and problems seem to have come more from bad execution than a lack of concern. Addressing technological unemployment seems hard, and efforts can easily backfire. Improving our ability to forecast technological change and the impact of policy decisions might be among the most valuable things we can be doing now.
   Read more: The loom and the thresher: Lessons in technological worker displacement (Medium).

####################################################

Tech Tales

It was almost midday, and the pigs had begun to cross the road. There were perhaps a hundred of them and they snuffled their way across the asphalt, some of them pausing to smell oil stains and cigarette butts. The car idled until the pigs had made it across, then continued. We turned our heads and watched the pigs as they went into the distance – they were marching in single file.
   “Incredible,” I said.
   “More like incredible training.” Astrid said.
   “How long?”
   “A few weeks. It’s getting good.”

As our car approached the complex, a flock of birds took off from a nearby tree and flew toward one of its towers. I used the carscreen to identify them: carrier pigeons.
   “Look,” Astrid said, pointing at a couple of bulges on the ankles of some of the birds. “It must be watching us.”
   And she was right: some of the birds had little cameras strapped to their ankles, and I was pretty sure that they’d be beaming the data back to the complex as soon as they flew into high-bandwidth range.
   “Isn’t it overkill?” I said. “It gets the feeds, why does it need a camera?”
   “Practice,” she said.
  Maybe it’s practicing on us, I thought. 

The car stopped in front of a gate and we got out. The gate started to open for us as we approached it, and four chickens came out. The chickens walked over to us and stood around us in a box formation. We walked through the gate and they followed, maintaining the box. I kept chickens when I was a kid: stupid creatures, though endearing. Borderline untrainable. I’d never have imagined them walking in formation.

At the center of the complex was a courtyard the size of a football field, with its ceiling enclosed in glass. The entire space was perforated with hundreds of tunnels, gates, and iris-opening inlets and outlets; through these portals, the animals travelled. Sometimes lights would flash in the tunnels and they would stop, or sounds would play and birds would change course, or the patterns of wind in the atrium would alter and the routes the animals were taking would change again. The AI that ran the complex was training them and we were here to find out why. 

When I turned around, I saw that Astrid was very far away from me – I’d been walking, lost in thought, away from her. She had apparently been doing the same. I called to her but at the moment I spoke her name a couple of loudspeakers blared and a flock of birds flew between us. I could vaguely see her between wingbeats, but when the birds had gone past she was further away from me. 

I guess it is training us, now. I’m not sure what for.

Things that inspired this story: Reinforcement learning; zoos; large-scale autonomous wildlife maintenance; Skinner machines.

Import AI: 160: Spotting sick crops in the iCassava challenge, testing AI agents with BSuite, and PHYRE tests if machines can learn physics

AI agents are getting smarter, so we need new evaluation methods. Enter BSuite:
…DeepMind’s testing framework is designed to let scientists know when progress is real and when it is an illusion…
When is progress real and when is it an illusion? That’s a question that comes down to measurement and, specifically, the ability for people to isolate the causes of advancement in a given scientific endeavor. To help scientists better measure and assess AI progress, researchers with DeepMind have developed and released the Behaviour Suite for Reinforcement Learning.

BSuite: What it is: BSuite is a software package to help researchers test out the capabilities of increasingly sophisticated reinforcement learning agents. BSuite ships with a set of experiments to help people assess how smart their agents are, and to isolate the specific causes for their intelligence. “These experiments embody fundamental issues, such as ‘exploration’ or ‘memory’ in a way that can be easily tested and iterated,” they write. “For the development of theory, they force us to instantiate measurable and falsifiable hypotheses that we might later formalize into provable guarantees.”

BSuite’s software: BSuite ships with experiments, reference implementations of several reinforcement learning algorithms, example ways to plug BSuite into other codebases like ‘OpenAI Gym’, scripts to automate running large-scale experiments on Google cloud, a pre-made Jupyter interactive notebook so people can easily monitor experiments, and a tool to formulaically generate the LaTeX needed for conference submissions. 

Testing your AI with BSuite’s experiments: Each BSuite experiment has three components: an environment, a period of interaction (e.g., 100 episodes), and ‘analysis’ code to map agent behaviour to results. BSuite lets researchers assess agent performance on multiple dimensions in a ‘radar’ plot that displays how well each agent does at a task in reference to things like memory, generalization, exploration, and so on. Initially, BSuite ships with several simple environments that challenge different parts of an RL algorithm, ranging from simple things like controlling a small mountain car as it tries to climb a hill, to more complex scenarios based around exploration (e.g., “Deep Sea”) and memory (e.g., “memory_len” and “memory_size”).
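
Here’s roughly how an agent plugs into BSuite (adapted from the project’s documented basic usage – consult the repo for exact current details):

```python
import bsuite

# Each bsuite_id names one environment instance plus a fixed episode budget;
# results land in save_path for the analysis notebook to turn into plots.
env = bsuite.load_and_record("catch/0", save_path="/tmp/bsuite",
                             logging_mode="csv")
for _ in range(env.bsuite_num_episodes):
    timestep = env.reset()
    while not timestep.last():
        action = 0  # a real agent's policy(timestep) goes here
        timestep = env.step(action)
```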

Why this matters: BSuite is a symptom of a larger trend in AI research – we’re beginning to develop systems with such sophistication that we need to study them along multiple dimensions, while carefully curating the increasingly sophisticated environments we train them in. In a few years, perhaps we’ll see reinforcement learning agents mature to the point that they can start to develop across-the-board ‘superhuman’ performance on hard cognitive capabilities like memory and generalization – if that happens, we’d like to know, and it’ll be tools like BSuite that help us know this.
   Read more: Behaviour Suite for Reinforcement Learning (Arxiv).
   Get the BSuite code here (official GitHub repository).

####################################################

Spotting problems with Cassava via smartphone-deployed AI systems:
…All watched over and fed by machines of loving grace…
Cassava is the second largest provider of carbohydrates in Africa. How could artificial intelligence help local farmers better cultivate and care for this crucial staple crop? New research from Google, the Artificial Intelligence Lab at Makerere University, and the National Crops Resources Research Institute in Uganda proposes a new AI competition to encourage researchers to design systems that can diagnose various cassava diseases. 

Smartphones, meet AI: Smartphones have proliferated wildly across Africa, meaning that even many poor farmers have access to a device with a modern digital camera and some local processing capacity. The idea behind the iCassava 2019 competition is to develop systems that can be deployed on these smartphones, letting farmers automatically diagnose their crops. “The solution should be able to run on the farmers phones, requiring a fast and light-weight model with minimal access to the cloud,” the researchers write. 

iCassava 2019: The competition required systems to differentiate between five labels for each Cassava picture: healthy, or one of four Cassava diseases: brown streak disease (CBSD), mosaic disease (CMD), bacterial blight (CBB), and green mite (CGM). The data was collected as part of a crowdsourcing project using smartphones, so the images in the dataset have a variety of different lighting patterns and other confounding factors, like strange angles, photos from different times of day, improper camera focus, and so on.

iCassava 2019 results and next steps: The top three contenders in the competition each obtained accuracy scores of around 93%. The winning entry used a large corpus of unlabeled images as an additional training signal. All winners built their systems around a residual network (resnet). 
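
For a sense of what such a baseline looks like, here’s a sketch of a resnet fine-tuned for the competition’s five classes (a generic recipe, not any particular winner’s code):

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained resnet and swap the head for the five
# iCassava classes: healthy, CBSD, CMD, CBB, CGM.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 5)
# ...then fine-tune with an ordinary cross-entropy loop; deploying to a
# farmer's phone would additionally require shrinking/quantizing the network.
```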

Next steps: The challenge authors plan to build and release more Cassava datasets in the future, and also plan to host more challenges “which incorporate the extra complexities arising from multiple diseases associated with each plant as well as varying levels of severity”. 

Why this matters: Systems like this show how AI can have a significant real-world impact, and point to a future where governments initiate competitions to help their civilians deal with day-to-day problems, like diagnosing crop diseases. And as smartphones get more powerful and cheaper over time, we can expect more and more powerful AI capabilities to get distributed to the ‘edge’ in this way. Soon, everyone will have special ‘sensory augmentations’ enabled by custom AI models deployed on phones.
   Read more: iCassava 2019 Fine-Grained Visual Categorization Challenge (Arxiv).
   Get the Cassava data here (official competition GitHub).

####################################################

Accessibility and AI, meet Kannada-MNIST:
…Building new datasets to make cultures visible to machines…
AI classifiers, increasingly, rule the world around us: They decide what gets noticed and what doesn’t. They apply labels. They ultimately make decisions. And when it comes to writing, most of these classifiers are built to work for the world’s largest and well-documented languages – think English, Chinese, French, German, and so on. What about all the other languages in the world? For them to be ‘seen’, we’ll need to be able to develop systems that can understand them – that’s the idea behind Kannada-MNIST, an MNIST-clone that uses the Kannada versions of the numbers 0 to 9. In Kannada, “Distinct glyphs are used to represent the numerals 0-9 in the language that appear distinct from the modern Hindu-Arabic numerals in vogue in much of the world today,” the author of the research writes. 

Why MNIST? MNIST is the ‘hello world’ of AI – a small, incredibly well-documented and well-studied dataset consisting of tens of thousands of handwritten numbers ranging from 0 to 9. MNIST has since been superseded by more sophisticated datasets, like CIFAR and ImageNet, but many researchers still validate things against it during the early stages of research. Therefore, creating variants of MNIST that are similarly small, tractable, and well-documented seems like a helpful thing to do for researchers. It also seems like creating MNIST variants for things that are currently understudied – like the Kannada language – can be a cheap way to generate interest. To generate Kannada-MNIST, 65 volunteers drew 70,000 numerals in total.
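
Because Kannada-MNIST deliberately mirrors MNIST’s format (28x28 greyscale images, ten classes), standard MNIST pipelines should transfer with little more than a changed data path. A minimal classifier sketch – the file names below are illustrative, so check the repo for the actual archives:

```python
import numpy as np
import torch
import torch.nn as nn

# File names are illustrative -- see the repo for the actual archives.
X = torch.from_numpy(np.load("X_kannada_MNIST_train.npz")["arr_0"]).float() / 255.0
y = torch.from_numpy(np.load("y_kannada_MNIST_train.npz")["arr_0"]).long()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):  # full-batch for brevity; use minibatches in practice
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
```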

A harder MNIST: The researcher has also developed Dig-MNIST – a version of the Kannada dataset where volunteers were exposed to Kannada numerals for the first time and then had to draw their own versions. “This sampling-bias, combined with the fact we used a completely different writing sheet dimension and scanner settings, resulted in a dataset that would turn out to be far more challenging than the [standard Kannada] test dataset”, the author writes. 

Why this matters: Soon, we’ll have two worlds: the normal world and the AI-driven world. Right now, the AI-driven world is going to favor some of the contemporary world’s dominant cultures/languages/stereotypes, and so on. Datasets like Kannada-MNIST can potentially help shift this balance.
   Read more: Kannada-MNIST: A New Handwritten Digits Dataset for the Kannada Language (Arxiv).
   The companion GitHub repository for this paper is here (Kannada MNIST GitHub)

####################################################

Your machine sounds funny – I predict it’s going to explode:
…ToyADMOS dataset helps people teach machines to spot the audio hallmarks of mechanical faults…
Did you know that it’s possible to listen for failure, as well as visually analyze for it? Now, researchers with NTT Media Intelligence Laboratories and Ritsumeikan University want to make it easier to teach machines to listen for faults via a new dataset called ToyADMOS. 

ToyADMOS: ToyADMOS is designed around three tasks: production inspection of a toy car, fault diagnosis of a fixed machine (a toy conveyor), and fault diagnosis of a moving machine (a toy train). Each scenario is recorded with multiple microphones, capturing both machine and environmental sounds. ToyADMOS contains “over 180 hours of normal machine-operating sounds and over 4,000 samples of anomalous sounds collected with four microphones at a 48-kHz sampling rate,” they write. 
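
One common recipe for this kind of anomaly-detection task (a generic approach, not necessarily what the ToyADMOS authors use): train an autoencoder on spectrogram frames of normal machine sounds only, then flag clips that reconstruct badly:

```python
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    """Small autoencoder over log-mel spectrogram frames of NORMAL sounds."""
    def __init__(self, n_mels: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_mels, 32), nn.ReLU(),
                                 nn.Linear(32, 8))
        self.dec = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
                                 nn.Linear(32, n_mels))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model: FrameAutoencoder, frames: torch.Tensor) -> float:
    """Mean reconstruction error over a clip's frames; an unusually high
    score suggests a sound the model never heard in normal operation."""
    with torch.no_grad():
        return ((model(frames) - frames) ** 2).mean().item()
```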

Faults, faults everywhere: For each of the tasks, the researchers simulated a variety of failures. These included things like running the toy car with a bent shaft, or with different sorts of tyres; altering the tensions in the pulleys of the toy conveyor, and breaking the axles and tracks of the toy train. 

Why ToyADMOS: Researchers should use the dataset because it was built under controlled conditions, letting the researchers easily separate and label anomalous and non-anomalous sounds. “The limitation of the ToyADMOS dataset is that toy sounds and real machine sounds do not necessarily match exactly,” they write. “One of the determining factors of machine sounds is the size of the machine. Therefore, the details of the spectral shape of a toy and a real machine sound often differ, even though the time-frequency structure is similar. Thus, we need to reconsider the pre-processing parameters evaluated with the ToyADMOS dataset, such as filterbank parameters, before using it with a real-world ADMOS system.” 

Why this matters: In a few years, many parts of the world will be watched over by machines – machines that will ‘see’ and ‘hear’ the world around them, learning what things are usual and what things are unusual. Eventually, we can imagine warehouses where machines are removed weeks before they break, after a machine with a discerning ear spots the idiosyncratic sounds of a future breakdown.
   Read more: ToyADMOS: A Dataset of Miniature-Machine Operating Sounds For Anomalous Sound Detection (Arxiv).
   Get the ToyADMOS data from here (Arxiv).

####################################################

Can your AI learn the laws of nature? No. What about the laws of PHYRE?
…Facebook’s new simulator challenges agents to interact with a complex, 2D, physics world…
Given a non-random universe, infinite time, and the ability to experiment, could we learn the rules of existence? The answer to this is, intuitively, yes. Now, researchers with Facebook AI Research want to see if they can use a basic physics simulator to teach AI systems physics-based reasoning. The new ‘PHYRE’ (PHYsical REasoning) benchmark gives AI researchers a tool to test how well their systems understand complex things like causality, physical dynamics, and so on. 

What PHYRE is: PHYRE is a simulator that contains a bunch of environments which can be manipulated via RL agents. Each environment is a two-dimensional world containing “a constant downward gravitational force and a small amount of friction”. The agent is presented with a scenario, like a ball in a green cup balanced on a platform above a red cup, and asked to change the state of the world – for instance, by moving the ball from the green cup into the red cup. “The agent aims to achieve the goal by taking a single action, placing one or more new dynamic bodies into the world”, the researchers write. In this case, the agent could solve its task by manifesting a ball which could roll into the green cup, tipping it over so the ball falls into the red cup. “Once the simulation is complete, the agent receives a binary reward indicating whether the goal was achieved”, they write. 
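
Here’s the flavor of the PHYRE API – adapted from the project’s tutorial as I remember it, so treat the exact function names as approximate and check the repo: sample an action, run the one-shot simulation, and read off the binary solved/unsolved signal:

```python
import random
import phyre

# Task splits and action tier come from one of PHYRE's standard eval setups.
train_tasks, dev_tasks, test_tasks = phyre.get_fold("ball_cross_template", 0)
simulator = phyre.initialize_simulator(train_tasks, "ball")

# Try random actions on the first task until one solves it (or we give up).
actions = simulator.build_discrete_action_space(max_actions=1000)
for action in random.sample(list(actions), 100):
    if simulator.simulate_action(0, action).status == phyre.SimulationStatus.SOLVED:
        print("solved!")
        break
```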

One benchmark, many challenges: PHYRE initially consists of two tiers of difficulty (one ball and two balls), and each tier has 25 task templates (think of these templates as like basic worlds in a videogame) and each template contains 100 tasks (think of these as like individual levels in a videogame world). 

How hard is it? In tests, the researchers show that a variety of baselines – including souped-up versions of DQN, and a non-parametric agent with online learning – struggle to do well even on the single-ball tasks, barely obtaining scores better than 50% on many of them. “PHYRE aims to enable the development of physical reasoning algorithms with strong generalization properties mirroring those of humans,” the researchers write. “Yet the baseline methods studied in this work are far from this goal, demonstrating limited generalization abilities”. 

Why this matters: For the past few years multiple different AI groups have taken a swing at the hard problem of developing agents that can learn to model the physics dynamics of an environment. The problem these researchers keep running into is that agents, as any AI practitioner knows, are so damn lazy they’ll solve the task without learning anything useful! Simulators like PHYRE represent another attempt to see if we can develop the right environment and infrastructure to encourage the right kind of learning to emerge. In the next year or so, we’ll be able to judge how successful this is by reading papers that reference the benchmark.
   Read more: PHYRE: A New Benchmark for Physical Reasoning (Arxiv).
   Play with PHYRE tasks on this interactive website (PHYRE website).
   Get the PHYRE code here (PHYRE GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Why Peter Thiel’s views on AI miss the forest for the trees:
Peter Thiel, co-founder of Palantir and PayPal, wrote an opinion piece earlier this month on military applications of AI and US-China competition. Thiel argued that AI should be treated primarily as a military technology, and attacked Google and others for opening AI labs in China.

AI is not a military technology:
While it will have military applications, advanced AI is better compared to electricity than to nuclear weapons. AI is an all-purpose tool that will have wide-ranging applications, including military uses, but also countless others. While it is important to understand the military implications of AI, it is in everyone’s interest to ensure the technology is developed primarily for the benefit of humanity, rather than for waging war. Thiel’s company, Palantir, has major defense contracts with the US government, leading critics to point out his commercial interest in propagating the narrative of AI as a primarily military technology. 

Cooperation is good: Thiel’s criticism of firms for opening labs in China, and hiring Chinese nationals is also misguided. The US and China are the leading players in AI, and forging trust and communication between the two communities is a clear positive for the world. Ensuring that the development of advanced AI goes well will require significant coordination between powers — for example, developing shared standards on withholding dangerous research, or on technical safety.

Why it matters: There is a real risk that an arms race dynamic between the US and China could lead to increased militarization of AI technologies, and to both sides underinvesting in ensuring AI systems are robust and beneficial. This could have catastrophic consequences, and would reduce the likelihood of advanced AI resulting in broadly distributed benefits for humanity. The AI community should resist attempts to propagate hawkish narratives about US-China competition.
   Read more: Why an AI arms race with China would be bad for humanity (Vox).

####################################################

Tech Tales:

We’ll All Know in the End (WAKE)

“There there,” the robot said, “all better now”. Its manipulator clanged into the metal chest of the other robot, which then issued a series of beeps, before the lights in its eyes dimmed and it became still.
   “Bring the recycler,” the robot said. “Our friend has passed on.”
   The recycling cart appeared a couple of minutes later. It wheezed its way up to the two robots, then opened a door in its side; the living robot pushed the small robot in, the door shut, and the recycling cart left.
   “Now,” said the living robot in the cold, dark room. “Who else needs assistance?”

Outside the room, the recycler moved down a corridor. It entered other rooms and collected other robots. Then it reached the end of the corridor and stopped in front of a door with the word NURSERY burned into its wood via laser. It issued a series of beeps and then the door swung open. The recycler trundled in. 

Perhaps three hours later, some small, living robots crawled out of a door at the other end of the NURSERY. They emerged, blinking and happy, their clocks set running, to explore the world and learn about it. A large robot waited for them and extended its manipulator to hold their hands. “There there,” it said. “All better now”. Together, they trundled into the distance. 

This has been happening for more than one thousand years. 

Things that inspired this story: Hospices; patterns masked in static; Rashomon for robots; the circle of life – Silicon Edition!