Import AI: #65: Berkeley teaches robots to predict the world around them, AlphaGo Zero’s intelligence explosion, and Facebook reveals a multi-agent approach to language translation

by Jack Clark

Welcome to Import AI, subscribe here.

Facebook’s translators of the future could be little AI agents that teach each other:
…That’s the idea behind new research where, instead of having one agent try to learn correspondences between languages from a large corpus of text, you have two agents that each know a different language attempt to describe images to one another. The approach works in simple environments today but, as with most deep learning techniques, can and will be scaled up rapidly for larger experiments now that it has shown promise.
…The experimental setup: “We let two agents communicate with each other in their own respective languages to solve a visual referential task. One agent sees an image and describes it in its native language to the other agent. The other agent is given several images, one of which is the same image shown to the first agent, and has to choose the correct image using the description. The game is played in both directions simultaneously, and the agents are jointly trained to solve this task. We only allow agents to send a sequence of discrete symbols to each other, and never a continuous vector.”
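…To make the setup concrete, here is a minimal sketch of one direction of such a referential game – my own illustration, not the authors’ code – with made-up module names, toy image features, and a Gumbel-softmax channel standing in for whatever discretization trick the real system uses; the real system also trains both directions jointly:

# Illustrative sketch only: one round of the image-description referential game.
# Module names, dimensions, and the Gumbel-softmax channel are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MSG_LEN, FEAT = 100, 8, 64

class Speaker(nn.Module):        # sees one image, emits a sequence of discrete symbols
    def __init__(self):
        super().__init__()
        self.to_logits = nn.Linear(FEAT, MSG_LEN * VOCAB)
    def forward(self, img_feat):
        logits = self.to_logits(img_feat).view(-1, MSG_LEN, VOCAB)
        # straight-through Gumbel-softmax keeps the message discrete but differentiable
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

class Listener(nn.Module):       # scores candidate images against the received message
    def __init__(self):
        super().__init__()
        self.msg_enc = nn.Linear(MSG_LEN * VOCAB, FEAT)
    def forward(self, message, candidates):
        m = self.msg_enc(message.view(message.size(0), -1))    # (batch, FEAT)
        return torch.einsum('bf,bcf->bc', m, candidates)       # similarity per candidate image

speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)

candidates = torch.randn(32, 5, FEAT)              # fake image features: 5 candidates per game
target_idx = torch.zeros(32, dtype=torch.long)     # the first candidate is the one the speaker saw

scores = listener(speaker(candidates[:, 0]), candidates)
loss = F.cross_entropy(scores, target_idx)         # the listener must pick the speaker's image
opt.zero_grad(); loss.backward(); opt.step()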
…The results: For sentence-level translation, they train on the MS COCO dataset, which contains numerous English<>image pairs, and on STAIR, which provides Japanese captions for the same images, along with a set of German<>English phrase pairs and associated images, where the German phrases were produced by a professional translator. The results are encouraging, with systems trained in this way attaining BLEU scores competitive with or higher than those of baseline systems.
…This points to a future where we use multiple, distinct learning agents within larger AI software, delegating increasingly complicated tasks to smart, adaptable components that are able to propagate information between and across each other. (Good luck debugging these!)
…Read more: Emergent Translation in Multi-Agent Communication.

Sponsored: What does Intelligent Automation Adoption in US Business Services look like as of September 2017? The Intelligent Automation New Orleans Team is here to provide you with real-time data on the global IA landscape for business services, gathered from current IA customers and vendors by SSON Analytics.
Explore the interactive report.
…One stat from the report: 66.5% of recorded IA pilots/implementations are by large organizations with annual revenue >$10 Billion USD.

History is important, especially in AI:
Recently I asked the community of AI practitioners on Twitter what papers I should read that are a) more than ten years old and b) don’t directly involve Bengio/Hinton/Schmidhuber/LeCun.
…I was fortunate to get a bunch of great replies, spanning giants of the field like Minsky and Shannon, to somewhat more recent works on robotics, apprenticeship learning, and more.
Take a gander at the replies to my tweet here.
…(These papers will feed my suspicion that about half of the ‘new’ things covered in modern AI papers are just somewhat subtle reinventions and/or parallel inventions of ideas already devised in the past. Time is a recurrent network, etc, etc.)

Intelligence explosions: AlphaGo Zero & self-play:
…DeepMind has given details on AlphaGo’s final form – a software system trained without human demonstrations, entirely from self-play, with few handcrafted reward functions. The software, named AlphaGo Zero, is able to beat all previous versions of itself and, at least based on Elo scores, develop a far greater Go capability than any other preceding system (or recorded human). The most intriguing part of AlphaGo Zero is how rapidly it goes from nothing to something via self-play. OpenAI observed a similar phenomenon with the Dota 2 project, in which self-play catapulted our system from sub-human to super-human in a few days.
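…To give a flavor of why self-play can bootstrap a system from nothing, here is a heavily simplified sketch of the general recipe (my own toy illustration with placeholder game functions, not DeepMind’s code): play against the current network, use search to produce better move targets than the raw network would, then train the network toward those targets and repeat.

# Schematic self-play loop; every function below is a toy placeholder, not DeepMind's code.
import random

def legal_moves(state): return [0, 1, 2]            # toy stand-in for the game rules
def play(state, move):  return state + [move]
def finished(state):    return len(state) >= 10
def outcome(state):     return random.choice([-1, 1])

def search(state, net):
    # stand-in for MCTS guided by the network's policy and value heads
    return {move: random.random() for move in legal_moves(state)}

def self_play_game(net):
    state, examples = [], []
    while not finished(state):
        visit_counts = search(state, net)            # search yields an improved policy
        examples.append((list(state), visit_counts))
        state = play(state, max(visit_counts, key=visit_counts.get))
    z = outcome(state)
    return [(s, pi, z) for s, pi in examples]        # (state, target policy, target value)

def train(net, replay_buffer):
    pass                                             # gradient steps toward the (pi, z) targets

net, buffer = None, []
for iteration in range(3):
    buffer.extend(self_play_game(net))
    train(net, buffer)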
Read more here at the DeepMind blog.

Love AI? Have some spare CPUs? Want some pre-built AI algorithms? Then Intel has a framework for you!
…Intel has released Coach, an open source AI development framework. It does all the standard things you’d expect, like letting you define a single agent and then run it on many separate environments, with inbuilt analytics and visualization.
…It also provides support for Neon (an AI framework developed by Intel following its acquisition of startup Nervana) as well as the Intel-optimized version of TensorFlow. Intel says it’s relatively easy to integrate new algorithms.
…Coach ships with 16 pre-made AI algorithms spread across policy optimization and value optimization approaches, including classics like DQN and Actor-Critic, as well as newer ones like Distributional DQN and Proximal Policy Optimization. It also supports a variety of different simulation environments, letting developers test out approaches on a variety of challenges to protect against overfitting to a particular target domain. Good documentation as well.
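…To illustrate the general pattern such frameworks wrap – this is not Coach’s actual API, just a generic sketch using OpenAI Gym and a stub agent – the idea is to write the agent once and evaluate it across several environments with simple logging:

# Generic illustration of 'one agent definition, many environments'; not Coach's API.
import gym

class RandomAgent:                          # stand-in for a DQN / PPO / etc. preset
    def act(self, observation, action_space):
        return action_space.sample()

def run(agent, env_name, episodes=5):
    env = gym.make(env_name)
    for ep in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, info = env.step(agent.act(obs, env.action_space))
            total += reward
        print(env_name, "episode", ep, "return", total)

agent = RandomAgent()
for env_name in ["CartPole-v0", "MountainCar-v0"]:   # same agent, different environments
    run(agent, env_name)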
Read more about Coach and how it is designed here.

Training simulated self-driving cars (and real RC trucks) with conditional imitation learning:
…Imitation learning is a technique used by researchers to get AI systems to improve their performance by imitating expert actions, usually by studying demonstration datasets. Intuitively, this seems like the sort of approach that might be useful for developing self-driving cars – the world has a lot of competent drivers, so if we can capture their data and imitate good behaviors we can potentially build smarter self-driving cars. But the problem is that, when driving, a lot of the information needed to make correct decisions is implicit in the context, rather than made explicit through signage or devices like traffic lights.
…New research from Intel Labs, King Abdullah University of Science and Technology, and the University of Barcelona suggests one way around these problems: conditional imitation learning. In conditional imitation learning you explicitly condition the behavior to imitate on input commands, such as ‘turn left’, ‘turn right’, ‘straight at the next intersection’, and ‘follow the road’. By factoring in this knowledge, the researchers show you can learn flexible self-driving car policies that appear to generalize well.
…Adding in this kind of command structure isn’t trivial – in one experiment the researchers tried to have the imitation learning policy factor the commands into its larger learning process, but this didn’t work reliably because there was no guarantee the system would always condition on the commands. To fix this, the researchers structure the system so it is fed a list of all the possible commands it may encounter and initiates a new branch of itself for dealing with each command, letting it learn separate policies for things like driving forward, turning left, and so on.
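…A rough sketch of the branched idea – my own simplification with made-up layer sizes, not the authors’ architecture – is a shared perception trunk feeding one output head per command, with the command selecting which head gets used and trained:

# Illustrative command-conditioned branching; sizes and modules are assumptions, not the paper's network.
import torch
import torch.nn as nn

COMMANDS = ["follow", "left", "right", "straight"]        # one branch per high-level command

class BranchedDrivingPolicy(nn.Module):
    def __init__(self, feat_dim=128, n_actions=2):         # e.g. steering + throttle
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(64, feat_dim), nn.ReLU())   # stand-in for a conv net
        self.branches = nn.ModuleList([nn.Linear(feat_dim, n_actions) for _ in COMMANDS])
    def forward(self, obs, command_idx):
        feats = self.trunk(obs)
        all_out = torch.stack([b(feats) for b in self.branches], dim=1)  # (batch, n_cmds, n_actions)
        return all_out[torch.arange(obs.size(0)), command_idx]           # pick the commanded branch

policy = BranchedDrivingPolicy()
obs = torch.randn(8, 64)                                   # fake image features
cmd = torch.randint(0, len(COMMANDS), (8,))
expert_actions = torch.randn(8, 2)
loss = nn.functional.mse_loss(policy(obs, cmd), expert_actions)   # imitation loss on the chosen branch
loss.backward()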
Results: The system works well in the test set of simulated towns. It also does well on a one-fifth-scale remote-controlled car deployed in the real world (brand: Traxxas Maxx, using an NVIDIA TX2 chip for onboard inference, and Holybro Pixhawk flight controller software to handle the command setting and inputs).
Evocative AI of the week: the paper includes a wryly funny description of what would happen if you trained expert self-driving car policies without an explicit command structure. “Moreover, even if a controller trained to imitate demonstrations of urban driving did learn to make turns and avoid collisions, it would still not constitute a useful driving system. It would wander the streets, making arbitrary decisions at intersections. A passenger in such a vehicle would not be able to communicate the intended direction of travel to the controller, or give it commands regarding which turns to take,” they write.
…Read more here: End-to-End Driving via Conditional Imitation Learning.

Basic Income trial launches in Stockton, California:
…Stockton is launching a Basic Income trial that will give $500 a month, no strings attached, to a group of residents of the struggling Californian town.
…One of the main worries of the AI sector is that its innovations will lead to a substantial amount of short-term pain and disruption for those whose jobs are automated. Many AI researchers and engineers put forward basic income as a solution to the changes AI will bring to society. But a major problem with the discourse around basic income is the lack of data. Pilots like the Stockton one will change that (though let’s be clear: the average rent for a one bedroom apartment in Stockton is around $900 a month, so this scheme is relatively small beer compared to the costs most residents will face).
…Read more here at BuzzFeed: Basic Income Isn’t Just About Robots, Says Mayor Who Just Launched Pilot Program.

Faking that the robots are already among us with the ‘Wizard of Oz’ human feedback technique:
Research from the US Army Research Lab outlines a method for collecting human feedback on a given task in a form that is eventually amenable to AI techniques. It uses a Wizard of Oz (WoZ) methodology (called this because the participants don’t know who is ‘behind the curtain’ – whether human or machine). The task involves a person giving instructions to a WoZ dialogue interface, which relays instructions to a WoZ robot, which carries out the instructions and reports back.
…In this research, both components of the WoZ system were performed by humans. The main contribution of this type of research is that it a) provides us with ways to design systems that can eventually be automated once we’ve developed sufficiently powerful AI algorithms, and b) generates the sorts of data ultimately needed to build systems with these capabilities.
…Read more here: Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue.
AI archeological curiosity of the week: This system “was adapted from a design used for WoZ prototyping of a dialogue system in which humans can engage in time-offset interaction with a WWII Holocaust survivor (Artstein et al. 2015). In that application, people could ask a question of the system, and a pre-recorded video of the Holocaust survivor would be presented, answering the question.”

Follow the birdie! Berkeley researchers build better predictive models for robots:
…Prediction is an amazingly difficult problem in AI, because once you try to predict something you’re essentially trying to model the world and roll it forward – when we do this as humans we implicitly draw on much of the powerful cognitive machinery we’re equipped with, including imagination, object modeling and disentanglement, intuitive models of physics, and so on. Our AI algorithms mostly lack these capabilities. That means when we try to do prediction we either have to train on sets of data large enough that we can deal with unseen situations that are still within the (large) distribution carved out by our training set, or we need to invent smarter algorithms to help us perform certain cognitively difficult tasks.
…Researchers with the University of California at Berkeley and the Technical University of Munich have devised a way to get robots to not only identify objects in a scene but also remember roughly where they are, letting them learn long-term correspondences that are robust to distractors (aka: crowded scenes) and also to the actions of the robot itself (which can sometimes clutter up the visual space and confuse traditional classifiers). The approach relies on what they call a ‘Skip-Connection Neural Advection’ (SNA) model.
…The results: “Visual predictive models trained entirely with videos from random pushing motions can be leveraged to build a model-predictive control scheme that is able to solve a wide range of multiobjective pushing tasks in spite of occlusions. We also demonstrated that we can combine both discrete and continuous actions in an action-conditioned video prediction framework to perform more complex behaviors, such as lifting the gripper to move over objects.”
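…Roughly, the model-predictive control loop works by sampling many candidate action sequences, rolling each through the learned video-prediction model, scoring how close the predicted outcome gets the designated objects to their goals, and executing only the first action of the best sequence before replanning. Here is a generic sketch under my own assumptions – predict_video and goal_cost are placeholders, not the released SNA code:

# Generic visual-MPC sketch; the 'model' and cost below are toy placeholders, not the paper's code.
import numpy as np

def predict_video(state, actions):
    # placeholder dynamics: pretend each action nudges the tracked object positions
    return state + actions.sum(axis=0)

def goal_cost(predicted_state, goal_state):
    return np.linalg.norm(predicted_state - goal_state)

def plan(state, goal, horizon=5, n_samples=100, action_dim=2):
    candidates = np.random.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    costs = [goal_cost(predict_video(state, seq), goal) for seq in candidates]
    return candidates[int(np.argmin(costs))][0]      # execute only the first action, then replan

state, goal = np.zeros(2), np.array([1.0, 0.5])
for step in range(10):
    action = plan(state, goal)
    state = state + action                           # placeholder for the real robot step
print("final state:", state)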
…Systems using SNA outperform previous systems, and fall within one standard deviation of the scores of a prior system augmented with a planning cost device.
…Further research is required to let this approach handle more complex tasks, including ones that require multiple discrete steps of work, they note.
…Read more here: Self-Supervised Visual Planning with Temporal Skip-Connections.

PlaidML hints at a multi-GPU, multi-CPU AI world:
AI startup Vertex.ai has released PlaidML, a programming middleware stack that lets you run Keras on pretty much anything that runs OpenCL. This means the dream of ‘write once, run anywhere’ programming for AI has got a little bit closer. Vertex claims that PlaidML only adds a relatively small amount of overhead to programming operations compared to stock TensorFlow. At launch it only supports Keras – a programming framework that many AI developers use because of its expressivity and simplicity. Support for TensorFlow, PyTorch, and deeplearning4j is coming eventually, Vertex says.
Read more here on the Vertex.ai blog.
Get the code here.
…Want to run and benchmark it right now?
sudo pip install plaidml plaidml-keras
git clone https://github.com/plaidml/plaidbench
cd plaidbench
pip install -r requirements.txt
python plaidbench.py mobilenet
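…And if you’d rather try it from your own Python code than via the benchmark script, the project’s documentation describes swapping the Keras backend over to PlaidML before importing keras – roughly like this (a minimal sketch based on the README at the time; the toy model is my own):

# Point Keras at the PlaidML backend before importing keras itself (per the PlaidML README).
import plaidml.keras
plaidml.keras.install_backend()

import keras
from keras.layers import Dense

# tiny toy model just to confirm the backend works
model = keras.models.Sequential([Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()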

Google releases AVA video dataset for action recognition:
…Google has released AVA, the Atomic Visual Actions dataset, consisting of 80 individual actions represented by ~210,000 distinct labels that cover ~57,000 distinct video clips.
…Video analysis is the new frontier of AI research, following the success of general image recognition techniques on single, static images; given enough data, it’s usually possible to train a highly accurate image classifier, so the current research challenge is more about scaling up techniques and improving their sample efficiency than about getting to something capable of interesting (aka reasonably high-scoring) behavior.
…Google is also able to perform analysis on this dataset to discover actions that are frequently combined with one another; because each video clip tends to be sliced from a larger 15-minute segment of a single video, the dataset features numerous co-occurrences that could be used by researchers in the future to model even longer-range temporal dynamics.
…No surprises that some of the most frequently co-occurring action labels include ‘hitting’ and ‘martial arts’; ‘shovel’ and ‘digging’; ‘lift a person’ and ‘play with kids’; and ‘hug’ and ‘kiss’, among others (aww!).
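…For a worked example of how you might surface those pairs from the annotations yourself – the (clip, label) input format below is a simplifying assumption, not necessarily AVA’s exact schema:

# Count which pairs of action labels get applied to the same clip.
# The annotation format here is an assumption for illustration, not AVA's exact CSV schema.
from collections import defaultdict
from itertools import combinations

annotations = [                      # rows of (clip_id, action_label)
    ("clip_001", "stand"), ("clip_001", "talk to"),
    ("clip_002", "martial arts"), ("clip_002", "hitting"),
    ("clip_003", "hug"), ("clip_003", "kiss"),
]

labels_per_clip = defaultdict(set)
for clip_id, label in annotations:
    labels_per_clip[clip_id].add(label)

pair_counts = defaultdict(int)
for labels in labels_per_clip.values():
    for a, b in combinations(sorted(labels), 2):
        pair_counts[(a, b)] += 1

for pair, count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    print(pair, count)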
Read more here on the Google Research Blog.
Arxiv paper about AVA here.
Get the data directly from Google’s AVA website here.

AI regulation proposals from AI Now:
AI Now, a research organization founded by Meredith Whittaker of Google and Kate Crawford of Microsoft Research, has released its second annual report.
…One concrete proposal is that “core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (e.g. “high stakes” domains) should no longer use ‘black box’ AI and algorithmic systems.” If this sort of proposal got picked up it would lead to a significant change in the way that AI algorithms are programmed and deployed, making it more difficult for people to deploy deep learning based solutions unless they are able to satisfy certain criteria relating to the interpretability of deployed systems.
…There are also calls for more continuous testing of AI systems both during development and following deployment, along with recommendations relating to the care, handling, and inspection of data. It also calls for more teeth in the self-regulation of AI, arguing that the community should develop accountability mechanisms and enforcement techniques to ensure people have an incentive to follow standards.
…Read a summary of the report here.
Or skip to the entire report in full (PDF).
Another specific request is that companies, conferences, and academic institutions should “release data on the participation of women, minorities and other marginalized groups within AI research and development”. The AI community has a tremendous problem with diversity. At the NIPS AI conference this year there is a workshop called ‘Black in AI’, which has already drawn critical comments from (personal belief: boneheaded, small-minded) people who aren’t keen on events like this and seem to refuse to admit there’s a representation problem in AI.
Read more about the controversy in this story from Bloomberg News.
Read more about the workshop at NIPS here.

Universities try to crack AI’s reproducibility crisis:
AI programs are large, interconnected, modular bits of software. And because of how their main technical components work they have a tendency to fail silently and subtly. This, combined with a tendency among many researchers to either not release code, or release hard-to-understand ‘researcher code’, makes it uniquely difficult to reproduce the results found in many papers. (And that’s before we even get to the tendency for the random starting seed to significantly determine the performance of any given algorithm.)
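…As a toy illustration of the seed problem (my own sketch, not any particular paper’s code): the same configuration run with different random seeds can produce scores spread widely enough to flip the conclusion of a single-seed comparison, which is why reproduction attempts tend to report means and deviations across several seeds.

# Toy illustration of seed variance; train_and_evaluate is a stand-in for a full training run.
import random
import statistics

def train_and_evaluate(seed):
    rng = random.Random(seed)
    return 100 + rng.gauss(0, 15)    # pretend final score, heavily seed-dependent

scores = [train_and_evaluate(seed) for seed in range(5)]
print("mean", round(statistics.mean(scores), 1),
      "+/-", round(statistics.stdev(scores), 1),
      "over", len(scores), "seeds")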
…Now, a coalition of researchers from a variety of universities is putting together a ‘reproducibility challenge’, which will ask participants to take papers submitted to the International Conference on Learning Representations (ICLR) and try to reproduce their results.
…”You should select a paper from the 2018 ICLR submissions, and aim to replicate the experiments described in the paper. The goal is to assess if the experiments are reproducible, and to determine if the conclusions of the paper are supported by your findings. Your results can be either positive (i.e. confirm reproducibility), or negative (i.e. explain what you were unable to reproduce, and potentially explain why).”
…My suspicion is that the results of this competition will be broadly damning for the AI community, highlighting just how hard it is to reproduce systems and results – even when (some) code is available.
…Read more here: ICLR 2018 Reproducibility Challenge.

OpenAI Bits&Pieces:

Randomness is your friend…sometimes:
If you randomize your simulator enough then you may be able to train models that rapidly generalize to real-world robots. Along with randomizing the visual appearance of the scene it’s also worth randomizing the underlying dynamics – torques, frictions, mass, and so on – to build machines that can adjust to unanticipated forces encountered in the real world. Worth doing if you don’t mind spending the additional compute budget to run the simulation(s).
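…A minimal sketch of what dynamics randomization can look like in practice – the parameter names, ranges, and simulator calls below are made up for illustration, not OpenAI’s actual configuration – is to sample fresh physical parameters at the start of every simulated episode so the policy can’t overfit to one setting:

# Toy dynamics-randomization sketch; parameters, ranges, and the simulator hooks are assumptions.
import random

def sample_dynamics(rng):
    return {
        "friction":   rng.uniform(0.5, 1.5),
        "link_mass":  rng.uniform(0.8, 1.2),   # multiplier on the nominal mass
        "motor_gain": rng.uniform(0.7, 1.3),
        "latency_ms": rng.uniform(0.0, 40.0),
    }

rng = random.Random(0)
for episode in range(3):
    dynamics = sample_dynamics(rng)
    # simulator.reset(**dynamics)     # hypothetical hook into your simulator
    # run_episode(policy, simulator)  # then train/evaluate the policy as usual
    print("episode", episode, dynamics)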
…Read more here: Generalizing from Simulation.

Why AI safety matters:
…Op-ed from OpenAI’s Ilya Sutskever and Dario Amodei in The Wall Street Journal about AI safety, safety issues, and why intelligence explosions from self-play can help us reason about futuristic AI systems.
…Read the op-ed here: Protecting Against AI’s Existential Threat.

Tech Tales:

Administrative Note: A few people have asked me this lately so figured I should make clear: I am the author of all the sci-fi shorts in tech tales unless otherwise indicated (I’ve run a couple of ‘guest post’ stories in the past). At some point I’m going to put together an e-book / real book, if people are into that. If you have comments, criticisms, or suggestions, I’d love to hear from you: jack@jack-clark.net

[2025: Boston, MIT, the glassy Lego-funded cathedral of the MIT Media Lab, a row of computers.]

So I want to call it Forest Virgil
Awful name. How about NatureSense
Sounds like a startup
ForestFeel?
Closer.
What’s it doing today?
Let’s see.

Liz, architect of the software that might end up being called ForestFeel, opens the program. On screen appears a satellite view of a section of the Amazon rainforest. She clicks on a drop-down menu that says ‘show senses’. The view lights up. The treetops become stippled with pink. Occasional blue blobs appear in the gaps between the tree canopy. Sometimes these blobs grow in size, and others blink in and out rapidly, like LED lights on dance clothing. Sometimes flashes of red erupt, spreading a web of sparkling veins over the forest, defining paths – but for what is only known to the software.

Liz can read the view, having spent enough time developing intuitions to know that the red can correspond to wild animals and the blue to flocks of birds. Bright blues are the births of things. Today the forest seems happy, Liz thinks, with few indications of predation. ForestFeel is a large-scale AI system trained on decades of audio-visual data, harvested from satellites and drones and sometimes (human) on-the-ground inspectors. All of this data is fed into the ForestFeel AI stack, which fuses it together and tries to learn correspondences and patterns too deep for people to infer on their own. Following recent trials, ForestFeel is now equipped with neurological data gleaned from brain-interface implants installed in some of the creatures of the forest.

Call it an art project or call it a scientific experiment, but what everyone agrees on is that Liz, like sea captains who can smell storms at a distance or military types who intuit the differences between safe and dangerous crowds, has developed a feel for ForestFeel, able to read its analysis more deftly than anything else – either human or software.

So one month when Liz texts her labmates: SSH into REDACTED something bad is happening in the forest, she gets a big audience. Almost a hundred people from across the commingled MIT/Harvard/Hacker communities tune in. And what they see is red and purple and violet fizzing across the forest scene, centered around the yellow industrial machines brought in by illegal loggers. ForestFeel automatically files a report with the local police, but it’ll take them hours to reach the site, and there’s a high chance they have been bribed to be either late or lost.

No one needs Liz to narrate this scene for them. The reds and purples and blues and their coruscating connections, flickering in and out, are easy to decode: pain. Anguish. Fear. A unified outcry from the flora and fauna of the forest, of worry and anxiety and confusion. The trees are logged. A hole is ripped in the world.

ForestFeel does do some good, though: Liz is able to turn the playbacks from the scene of destruction into posters and animated gifs and movies, which she seeds across the ‘net, hoping that the outcries of an alien and separate mind are sufficient to stir the world into action. When computers can feel a pain that’s deeper and more comprehensive than that of humans, can they lead to a change in human behavior? Will the humans listen? Liz thinks, finding her emotions more evocative of those found in the software she has built than those of her fellow organic kin.