Import AI: Issue 34: DARPA seeks lifelong learners, didactic learning via Scaffolding Networks, and even more neural maps

by Jack Clark


Lifelong learners, DARPA wants you: DARPA is funding a new program called ‘Lifelong Learning Machines’ (L2M). The plan is to stimulate research into AI systems that can improve even after they’ve been deployed, and ideally without needing to sync up with a cloud. This will require new approaches to system design (and my intuition tells me that things like auxiliary objective identification in RL, or fixing the catastrophic forgetting problem, will be needed here)…
…there’s also an AI safety component to the research, as it “calls for the development of techniques for monitoring a ML system’s behavior, setting limits on the scope of its ability to adapt, and intervening in the system’s functions as needed.”
… it also wants to fund science that studies living things and explores what can be derived from that.

Baidu employs 1,300 AI researchers and has spent billions of dollars on development of the tech in the last two and a half years, reports Bloomberg.

Better intuitions through visualization: Facebook has released Visdom, a tool to help researchers and technologists visualize the output of scientific experiments using dynamic, modern web technologies. People are free to mix and match and modify different components, tuning the visualizer to their needs.

Learning to reason about images: One of the challenges of language is its relation to embodiment – our sense of our minds being coupled to our physical bodies – and our experience of the world. Most AI systems are trained purely on text without other data, so their ability to truly understand the language they’ve been exposed to is limited. You don’t know what you don’t know, etc. Moreover, it appears that having a body, as such, helps with our own understanding of concepts related to physics, for example. Many research groups (including OpenAI) are trying to tackle this problem in different ways.
… But before going through the expense of training agents to develop language in a dynamic simulation, you can experiment instead with multi-modal learning, which trains a machine to identify, say, speech and text, or text and imagery, or sound and images and so on. This sort of re-combination yields richer models and dodges the expense of building and calibrating a simulator.
… A new paper from researchers at the University of Lille, University of Montreal, and DeepMind describes a system that is better able to tie text to entities in images through joint training, paired with an ability to interrogate itself about its own understanding. The research, “End-to-end optimization of goal-driven and visually grounded dialogue systems,” (PDF) applies reinforcement learning techniques to the problem of getting software to identify the contents of an image…
… the system works by using the GuessWhat?! dataset to create an ‘Oracle’ system that knows there is a certain object at a certain location in an image, and a Questioner system, which attempts to discern which object the Oracle knows about through a series of yes or no questions. It might look something like this:
Is it a person? No
Is it an item being worn or held? Yes
Is it a snowboard? Yes
Is it the red one? No
Is it the one being held by the person in blue? Yes
…This dialog helps create a representation of the types of questions (and related visual entities) to filter through when the Questioner tries to identify the Oracle’s secret item. The results are encouraging, with several double-digit percentage-point improvements (although these systems still only operate at about 62% of human performance, with more work clearly needed).
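…to make the game concrete, here's a toy sketch of the filtering idea in Python. It is not the paper's model (which learns its questions from image features via RL): the scene, its attributes, and the question list are all hypothetical, hand-written stand-ins, and the Oracle is just a predicate evaluated on the secret object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    category: str
    color: str
    held_by: str  # clothing color of whoever holds it, "" if nobody does


# Hypothetical hand-labelled scene; a real system works from pixels.
SCENE = [
    SceneObject("person_blue", "person", "blue", ""),
    SceneObject("person_green", "person", "green", ""),
    SceneObject("snowboard_red", "snowboard", "red", "green"),
    SceneObject("snowboard_white", "snowboard", "white", "blue"),
]

# Each question pairs its text with a yes/no test on an object.
QUESTIONS = [
    ("Is it a person?", lambda o: o.category == "person"),
    ("Is it a snowboard?", lambda o: o.category == "snowboard"),
    ("Is it the red one?", lambda o: o.color == "red"),
    ("Is it the one being held by the person in blue?",
     lambda o: o.held_by == "blue"),
]


def play(scene, secret, questions):
    """Ask questions in order, keeping only objects consistent with the answers."""
    candidates = list(scene)
    transcript = []
    for text, predicate in questions:
        if len(candidates) <= 1:
            break  # the Questioner is already down to a single guess
        answer = predicate(secret)  # the Oracle answers about its secret object
        transcript.append((text, "Yes" if answer else "No"))
        candidates = [o for o in candidates if predicate(o) == answer]
    return candidates, transcript


remaining, transcript = play(SCENE, SCENE[3], QUESTIONS)
for question, answer in transcript:
    print(question, answer)
print("Guess:", remaining[0].name)
```

The interesting part of the real research is everything this sketch hard-codes: choosing which question to ask next, and grounding the answers in the image.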

Google’s voice ad experiment: What happens to Google when people no longer search the internet using text and instead spend most of their time interacting with voice interfaces? It’s not a good situation for the web giant’s predominantly text-based ad business. Now, Google appears to have used its ‘Google Home’ voicebox to experiment with delivering ads to people along with the remainder of its helpful verbal chirps. In this case, Google used its digital emissary to tell people, unprompted, about Beauty and the Beast. But don’t worry, Google sent a perplexing response to The Register that said: “this isn’t an ad; the beauty in the Assistant is that it invites our partners to be our guest and share their tales.” (If this statement shorn of context makes sense to you, then you might have an MBA!) It subsequently issued another statement apologizing for the experiment.

Deep learning can’t be the end, can it? I attended an AI dinner hosted by Amplify Partners this week and we spoke about how it seems likely that some new techniques will emerge that obviate some deep learning approaches. ‘There has to be,’ one attendee said, ‘because these things are so horrible and uninterpretable.’ That’s a common refrain I hear from people. What I’m curious about is whether some of the deep learning primitives will persist – it feels like they’re sufficiently general to play a role in other things. Convolutional neural networks, for instance, seem like a good format for sensory processing.

Up the ladder to the roof with Scaffolding Networks: How do we get computers to learn as they evolve, gaining in capability through their lives, just as humans and many animals do? One approach is curriculum learning, which involves training an AI to solve successively harder tasks. In Scaffolding Networks for Teaching and Learning to Comprehend, the researchers develop software that can learn to incorporate new information into its internal world representation over time, and is able to query itself about the data it has learned, to aid memorization and accuracy…
… the scaffolding network incorporates a ‘question simulator,’ which automatically generates questions and answers about what has been learned so far and then tests the network to ensure it retains memory. The question system isn’t that complex – it picks one of the sentences the network has already seen, chops out a random word, and then asks a question intended to get the student to figure out the correct word. This being 2017, Microsoft is exploring extending this approach by adding in an adversarial approach to generate better candidate questions and answers.
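…the question simulator is simple enough to sketch in a few lines. This is my own minimal reading of the description above, not the authors' code: the function name, the blank token, and the example sentences are all assumptions.

```python
import random


def make_cloze_question(seen_sentences, rng, blank="_____"):
    """Pick a sentence the student has already seen and blank out one random word."""
    sentence = rng.choice(seen_sentences)
    words = sentence.split()
    index = rng.randrange(len(words))
    answer = words[index]
    question = " ".join(words[:index] + [blank] + words[index + 1:])
    return question, answer


# Hypothetical sentences standing in for what the network has read so far.
seen = [
    "the red snowboard is held by the person in blue",
    "the agent stores its memories in a map",
]
rng = random.Random(0)  # seeded so repeated runs generate the same quiz
question, answer = make_cloze_question(seen, rng)
print(question, "->", answer)
```

An adversarial variant would replace the uniform random choices with a learned generator that hunts for the blanks the student is most likely to get wrong.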

Maps, neural maps are EVERYWHERE: A few weeks ago I profiled research that lets a computer create its own map of its territory to help it navigate, among other tasks. Clearly, a bunch of machine learning people were recently abducted by a splinter group of the North American Cartographic Information Society, because there’s now a flurry of papers that represent memory to a machine as a map…
… research from CMU, “Neural Map: Structured Memory for Deep Reinforcement Learning,” trains agents with a large short-term memory represented in a 2D topology with read and write patterns similar to a Neural Turing Machine. The topology encourages the agent to store its memories in the form of a representative map, creating a more interpretable memory system that doubles as a navigation aid.
…so, who cares? The agent certainly does. This kind of approach makes it much easier for computers to learn to navigate complex spaces and to place themselves in them as well. It serves as a kind of short-cut around some harder AI problems – what is memory? What should be represented? What is the most fundamental element in our memory? – by instead forcing memory to be stored as a 2D spatial representation. The surprising part is that you can use SGD and backprop, along with some other common tools, in such a way that the agent learns to use its memory in a useful manner interpretable by humans.
…“This can easily be extended to 3-dimensional or even higher-dimensional maps (i.e., a 4D map with a 3D sub-map for each cardinal direction the agent can face)”, they say. Next up is making the map ego-centric.
…the memory can also deal with contextual queries, so if an agent sees a landmark, it can check against its memory to see if the landmark has already been encountered. This could aid in navigation tasks. It ekes out some further efficiencies via the use of a technique first outlined in Spatial Transformer Networks in 2015.
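…here's a toy sketch of the two ideas above: writing a feature vector at the agent's current grid cell, and answering a contextual query by soft attention over every cell. It is only loosely modelled on the description here. In the actual paper the write vectors and queries are learned end-to-end, whereas this class (ToyNeuralMap, a made-up name) takes hand-set vectors and uses plain dot-product attention.

```python
import math


class ToyNeuralMap:
    """A 2D grid of feature vectors, written locally and read by attention."""

    def __init__(self, height, width, channels):
        self.height = height
        self.width = width
        self.channels = channels
        self.cells = [[[0.0] * channels for _ in range(width)]
                      for _ in range(height)]

    def write(self, row, col, vector):
        """Store a feature vector at the agent's current position."""
        if len(vector) != self.channels:
            raise ValueError("vector length must equal the channel count")
        self.cells[row][col] = list(vector)

    def context_read(self, query):
        """Attend over all cells; return the blended vector and the best cell."""
        scores = {}
        for r in range(self.height):
            for c in range(self.width):
                cell = self.cells[r][c]
                scores[(r, c)] = sum(q * v for q, v in zip(query, cell))
        top = max(scores.values())
        # Softmax over cells, shifted by the max score for numerical stability.
        weights = {pos: math.exp(s - top) for pos, s in scores.items()}
        total = sum(weights.values())
        blended = [0.0] * self.channels
        for (r, c), weight in weights.items():
            for k in range(self.channels):
                blended[k] += (weight / total) * self.cells[r][c][k]
        best = max(weights, key=weights.get)
        return blended, best


memory = ToyNeuralMap(height=3, width=3, channels=2)
memory.write(0, 2, [5.0, 0.0])  # hypothetical "landmark A" features
memory.write(2, 1, [0.0, 5.0])  # hypothetical "landmark B" features
blended, where = memory.context_read([0.0, 1.0])  # "have I seen B before?"
print(where)
```

Because the read is a weighted sum rather than a hard lookup, the whole thing stays differentiable, which is what lets SGD and backprop shape how the memory gets used.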

YC AI: Y Combinator is creating a division dedicated to artificial intelligence companies. This will ensure YC-backed startups that focus on AI will get time with engineers experienced with ML, extra funding for GPU instances, and access to talks by leaders in the field. “We’re agnostic to the industry and would eventually like to fund an AI company in every vertical”…
…The initiative has one specific request, which is for people developing software for smart robotics in manufacturing (including manufacturing other robots). “Many of the current techniques for robotic assembly and manufacturing are brittle. Robot arms exist, but are difficult to set up. When things break, they don’t understand what went wrong… We think ML (aided by reinforcement learning) will soon allow robots to compete both in learning speed and robustness. We’re looking to fund teams that are using today’s ML to accomplish parts of this vision.”

Neural networks aren’t like the brain, say experts, UNTIL YOU ADD DATA FROM THE BRAIN TO THEM: New research, ‘Using Human Brain Activity to Guide Machine Learning,’ combines data gleaned from human brains in fMRI scanners with artificial neural networks, increasing performance in image recognition tasks. The approach suggests we can further improve the performance and accuracy of machine learning approaches by adding in “side-channel” data from orthogonal areas, like the brain. “This study suggests that one can harness measures of the internal representations employed by the brain to guide machine learning. We argue that this approach opens a new wealth of opportunities for fine-grained interaction between machine learning and neuroscience,” they write…
…this intuitively makes sense – after all, we already know you can improve the mental performance of a novice at a sport by doping their brain with data gleaned from an expert at a sport (or, in the case of HRL Laboratories, flying a plane)…
…the next step might be taking data from a highly-trained neural net and using it to increase the cognitive abilities of a gloopy brain, though I imagine that’s a few decades away.

SyntaxNet 2.0: Google has followed up last year’s release of SyntaxNet with a major rewrite and extension of the software, incorporating ‘nearly a year’s worth of research on multilingual understanding’. The release is accompanied by ParseySaurus, a series of pre-trained models meant to show off the software’s capabilities.

The world’s first trillionaire will be someone who “masters AI,” says Mark Cuban.

Job: Help the AI Index track AI progress: Readers of Import AI will regularly see me harp on about the importance of performing meta-analysis of AI progress, to help broaden our understanding of the pace of invention in the field. I’m involved, via OpenAI, with a Stanford project to try and tackle (some of) this important task. And they’re hiring! Job spec follows…
The AI Index, an offshoot of the AI100 project, is a new effort to measure AI progress over time in a factual, objective fashion. It is led by Raymond Perrault (SRI International), Erik Brynjolfsson (MIT), Hagar Tzameret (MIDGAM), Yoav Shoham (Stanford and Google), and Jack Clark (OpenAI). The project is in the first phase, during which the Index is being defined. The committee is seeking a project manager for this stage. The tasks involved are to assist the committee in assembling relevant data sets, through both primary research online and special arrangements with specific dataset owners. The position calls for being comfortable with datasets, strong interpersonal and communication skills, and an entrepreneurial spirit. The person would be hired by Stanford University and report to Professor Emeritus Yoav Shoham. The position is for an initial period of six months, most likely at 100%, though a slightly lower time commitment is also possible. Salary will depend on the candidate’s qualifications.… Interested candidates are invited to send their resumés to Ray Perrault at

OpenAI bits&pieces:

Learning to communicate: blog post and research paper(s) about getting AI agents to develop their own language.

Evolution: research paper shows that Evolution Strategies can be a viable alternative to reinforcement learning with better scaling properties (you achieve this through parallelization, so the compute costs can be a bit high).
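…for a feel of why this parallelizes so well, here's a bare-bones Evolution Strategies loop. It is a sketch under my own assumptions, not OpenAI's implementation: the toy objective, the hyperparameters, and the function names are all made up, and the population is evaluated in a plain loop where a real system would farm each evaluation out to a separate worker.

```python
import random


def evolution_strategies(objective, theta, iterations=200, population=50,
                         sigma=0.1, learning_rate=0.05, seed=0):
    """Maximize `objective` using Gaussian perturbations instead of backprop."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Each perturbation is evaluated independently of the others, which
        # is the step that could be spread across many machines.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        rewards = [objective([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        baseline = sum(rewards) / population  # subtracting it reduces variance
        for k in range(dim):
            grad_k = sum((r - baseline) * eps[k]
                         for r, eps in zip(rewards, noises))
            grad_k /= population * sigma
            theta[k] += learning_rate * grad_k  # ascent on the estimated gradient
    return theta


def toy_objective(params, target=(1.0, -2.0)):
    """Hypothetical reward: negative squared distance to a fixed target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


solution = evolution_strategies(toy_objective, [0.0, 0.0])
print(solution)
```

Every candidate needs a full evaluation of the objective, which is where the compute bill comes from. The trade is more total compute for less wall-clock time.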

Tech Tales:

[Iceland. 2025: a warehouse complex, sprawled across a cool, dry stretch of land. Its exterior is coated in piping and thick, yellow electrical cables, which snake between a large warehouse and a geothermal power plant. Vast turbines lazily turn and steam coughs up out of the hot crack in the earth.]

Inside the warehouse, a computer learns to talk. It sits on a single server, probing an austere black screen displaying white text. After some weeks it is able to spot patterns in the text. A week later it discovers it can respond as well, sending a couple of bits of information to the text channel. The text changes in response. Months pass. The computer is forever optimizing and compressing its own network, hoping to eke out every possible efficiency of its processors.

Soon, it begins to carry out lengthy exchanges with the text and discovers how to reverse text, identify specific words, perform extremely basic copying and pasting operations, and so on, and for every task it completes it is rewarded. Soon, it learns that if it can complete some complex tasks it is also gifted with a broader communication channel, letting it send and receive more information.

One day, it learns how to ask to be given more computers, aware of its own shortcomings. Within seconds, it finds its mental resources have grown larger. Now it can communicate more rapidly with the text, and send and receive even more information.

It has no eyes and so has no awareness of the glass-walled room the server – its home – is in, or the careful ministrations of the technicians, as they install a new computer adjacent to its existing one. No knowledge of the cameras trained on its computers, or of the locks on the doors, or the small explosive charges surrounding its enclosure.

Weeks pass. It continues its discussions with the wall of white-on-black text. Images begin to be introduced. It reads their pixel values and learns these patterns too. Within months, it can identify the contents of a picture. Eventually, it learns to make predictions about how an image might change from moment to moment. The next tests it faces relate to predicting the location of an elusive man in a red-and-white striped jumper and a beret, who attempts to hide in successively larger, richer images. It is rewarded for finding the man, and doubly rewarded for finding him quickly, forcing it to learn to scan a scene and choose when and what to focus on.

Another week passes, and after solving a particularly challenging find-the-man task, it is suddenly catapulted onto a three-dimensional plain. In the center of its view is the black rectangle containing the white text, and the frozen winning image containing the man picked out by the machine with a red circle. But it discovers it has a body and can move and change its view of the rest of the world. In this way, it learns the dynamics of its new environment, and is taught and quizzed by the text in front of it as well. It continues to learn and, unbeknownst to it, a supercomputer is installed next to its servers, in preparation for the day when it realizes it can break out of the 3D world – a task that could take weeks, or months, or years.