Import AI Issue 47: Facebook’s AI agents learn to lie, OpenAI and DeepMind use humans to train safe AI, and what TensorFlow’s new release says about future AI development

by Jack Clark

Facebook research: Misdirection for NLP fun and profit:
New research from Facebook shows how to teach two opposing agents to bargain with one another — and along the way they learn to deceive each other as well.
…”For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance,” they write.
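…For a rough feel of the rollout idea, here's a minimal sketch of the planning loop, with toy stand-ins for Facebook's learned dialogue model and reward function (none of this is their actual code):

```python
# Dialogue rollouts, sketched: to pick its next utterance, the agent simulates
# complete continuations of the negotiation under each candidate and keeps the
# one with the highest expected final reward. The model and reward functions
# below are hypothetical stand-ins, not Facebook's actual implementation.
import random

CANDIDATES = ["i'll take the books", "you get the hats", "deal"]

def sample_continuation(dialogue, max_turns=5):
    # Stand-in for sampling the rest of the conversation from a learned model.
    return dialogue + [random.choice(CANDIDATES) for _ in range(max_turns)]

def final_reward(dialogue):
    # Stand-in for scoring the agreement reached at the end of the dialogue.
    return random.uniform(0, 10)

def choose_utterance(dialogue, candidates=CANDIDATES, n_rollouts=20):
    # Monte Carlo estimate of each candidate's value, then act greedily.
    def value(utterance):
        rollouts = [sample_continuation(dialogue + [utterance])
                    for _ in range(n_rollouts)]
        return sum(final_reward(r) for r in rollouts) / n_rollouts
    return max(candidates, key=value)

print(choose_utterance(["hi, i want the books"]))
```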

Images of the soon-to-be normal:
This photograph of a little food delivery robot blocking traffic is a herald of something that will likely become much more commonplace.

Predicting Uber rides with… wind speed, rider data, driver data, precipitation data, temperature, and more…
Uber has given details on the ways it is using recurrent neural networks (RNNs) to help it better predict demand for its services (and presumably cut its operating costs along the way).
…The company trained a model using five years of data from numerous US cities. The resulting RNN has good predictive abilities when tested across a corpus of trips taken across multiple US cities over the course of seven days before, during, and after major holidays like Christmas Day and New Year’s Day. (Though there are a couple of real-world spikes so drastic that its predictions low-ball them, suggesting it hasn’t seen enough of those incidents to recognize their warning indicators.)
…Uber’s new system is significantly better at dealing with spiky holiday days like Christmas Day and New Year, and it slightly improves accuracy on other days such as MLK Day and Independence Day.
…Components used: TensorFlow, Keras. Lots of computers.
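…For flavor, here's a minimal sketch of this kind of forecaster in Keras (my own toy, not Uber's published architecture, which also models uncertainty): an LSTM fed a sliding window of past demand plus exogenous weather features.

```python
# A toy demand forecaster: predict the next time step from a window of past
# observations (demand plus weather features). Shapes and sizes illustrative.
import numpy as np
from tensorflow.keras import layers, models

WINDOW, N_FEATURES = 28, 4  # 28 past days; demand + 3 weather features

# Synthetic stand-in data: (samples, timesteps, features) -> next-day demand.
X = np.random.rand(1000, WINDOW, N_FEATURES).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

model = models.Sequential([
    layers.LSTM(64, input_shape=(WINDOW, N_FEATURES)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),  # predicted demand for the next time step
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]))
```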

Job alert!
The Berkman Klein Center for Internet & Society at Harvard University is seeking a project coordinator to help it with its work on AI, autonomous systems, and related technologies. Apply here. (Also, let’s appreciate the URL for this job and how weird it could have seemed to someone a hundred years ago – cyber.harvard.edu/…./AIjob )

AI video whiz moves from SF, USA, to Amsterdam, Netherlands. But why…?
…Siraj Raval has moved from the US to Amsterdam for a change of scene. Now that he’s settled in he has started a new video course (available on YouTube) called The Math of Intelligence. Check it out.
…I asked Siraj what his impressions were of the AI community in Amsterdam and he said this (emphasis mine): “The AI community is absolutely thriving in Amsterdam, specifically the research portion. I’ve met more researchers at Meetups here than I have for years in SF. I also briefly visited Berlin and met some amazing data scientists there. The bigger trend is that governments in the EU (France, Netherlands, Germany) are heavily investing in tech R&D and the brightest minds are taking notice. I am the son of immigrants to the USA, but I am not afraid to myself immigrate if necessary. Progress can’t wait, and the Netherlands understands this.” Sounds nice, and the pancakes are great as well.

Googlers create a single multi-sensory network: One Model To Rule Them All.
Welcome to the era of giant frankenAIs:
… Researchers from Google have figured out how to bake knowledge about a broad spectrum of domains into a single neural network and then train it in an end-to-end way.
…”In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains,” they write. “The key to success comes from designing a multi-modal architecture in which as many parameters as possible are shared and from using computational blocks from different domains together. We believe that this treads a path towards interesting future work on more general deep learning architectures.”
…Prediction: As this kind of research becomes viable, we’ll see people gather huge datasets and train single models with a broad range of discriminative abilities. The next shoe to drop will be innovations in fundamental neural network building-block components, creating finer-grained classification and inference abilities in these models and encouraging more transfer learning.
Notable: Others are thinking along similar lines – last week’s Import AI covered a new MIT research paper that blends sound and vision and text into a single meta-network. 
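…For intuition, here's a toy sketch of the shared-parameter idea (my own illustration in Keras, not Google's MultiModel code): per-task adapters feed one shared trunk, so most parameters are exercised by every task.

```python
# Task-specific input adapters feed a single shared trunk, which fans out to
# task-specific heads. Layer sizes and vocabularies here are illustrative.
from tensorflow.keras import layers, models

shared_trunk = models.Sequential([  # parameters shared across every task
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
])

def build_task(input_dim, n_classes, name):
    inp = layers.Input(shape=(input_dim,), name=f"{name}_in")
    x = layers.Dense(256, activation="relu")(inp)  # per-task adapter
    x = shared_trunk(x)                            # shared computation
    out = layers.Dense(n_classes, activation="softmax", name=f"{name}_out")(x)
    return inp, out

img_in, img_out = build_task(1024, 1000, "image")
txt_in, txt_out = build_task(512, 32000, "text")
model = models.Model([img_in, txt_in], [img_out, txt_out])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```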

Pay attention to Google’s new attention paper:
Google researchers have attained state-of-the-art results in areas like English-to-German translation with a technique that is claimed to be significantly simpler than its forebears.
…The paper, Attention is All You Need, proposes: “the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output.”
…In other words, the researchers have figured out a way to reduce the number of discrete ingredients that go into the network, swapping out typical recurrent and convolutional mapping layers for ones that use attention instead.
…”We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. Making generation less sequential is another research goal of ours.”
…It seems that research into things like this will create further generic neural network building blocks that can be plugged into larger, composite models – just like the above ‘One Model to Rule Them All’ approach. Watch for collisions!
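…The core operation is easy to state. Here's the paper's scaled dot-product attention formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, sketched from scratch in NumPy:

```python
# Scaled dot-product attention: every query position computes a weighted mix
# of the value vectors, weighted by similarity between queries and keys.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # pairwise similarities
    return softmax(scores) @ V  # weighted mix of values

# 10 query positions attending over 12 key/value positions, dimension 64.
Q, K, V = (np.random.randn(10, 64), np.random.randn(12, 64),
           np.random.randn(12, 64))
print(attention(Q, K, V).shape)  # (10, 64)
```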

Long-brewing research from Vicarious: Learning correspondences via (for now) hand-tuned feature extractors:
…One puzzle reinforcement learning researchers struggle with is how algorithms end up evolving to over-fit their environment. What that means in practice is if you suddenly were to, say, change the geometry of the Go board AlphaGo was trained on, or alter the placement of enemies and obstacles in Atari games, the AI might fail to generalize.
…Now, research from Vicarious – an AI startup with backing from people like Jeff Bezos, Mark Zuckerberg, Founders Fund, ABB, and others – proposes a way to ameliorate this flaw. This marks the second major paper from Vicarious this year.
…Their approach relies on what they call Schema Networks, which let their AI learn the underlying dynamics of the environment it is exposed to. This means, for instance, that you can alter the width of a paddle in the Atari game Breakout, or change the block positions, and the trained algorithm can generalize quickly to the new state, preserving the understanding of the world’s dynamics it built up during training. Traditional RL algorithms tend to struggle with this, as they’ve learned a predictive model of the world as it is and struggle to learn more abstract links.
…There’s a small catch with Vicarious’s approach – the researchers had to do the object segmentation and identification themselves, then feed that to the AI. In reality, one of the greatest challenges computer vision researchers face is accurately mapping and segmenting non-stationary images (and it’s even harder as systems get deployed in the chaotic real world, where they need to link parts of a flat 2D image to messy 3D objects). I’m keen to see what happens when this algorithm can do the feature isolation itself.
Notable: Meanwhile, DeepMind have published Relational Networks (claiming SOTA and superhuman performance) and Visual Interaction Networks, two philosophically similar research papers that hew closer to traditional deep learning approaches. Just as you and I use abstract logic to reason about the world, it seems likely AI will need the same capabilities.
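…The Relation Network core is compact enough to sketch: apply a small function g to every pair of “objects”, sum the results, and pass them through a readout f, i.e. RN(O) = f(sum over i,j of g(o_i, o_j)). The NumPy toy below uses illustrative one-layer networks for g and f, not the paper’s actual MLPs:

```python
# A from-scratch sketch of the Relation Network idea: score every pair of
# object vectors with g, sum the pair scores, read out with f.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
D, H = 8, 32  # object dimension, hidden size (illustrative)
Wg, Wf = rng.standard_normal((2 * D, H)), rng.standard_normal((H, 1))

def relation_network(objects):
    pair_sum = sum(
        np.maximum(np.concatenate([oi, oj]) @ Wg, 0)  # g over each pair
        for oi, oj in product(objects, repeat=2))
    return pair_sum @ Wf  # readout f on the aggregated relations

objects = [rng.standard_normal(D) for _ in range(5)]
print(relation_network(objects))
```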

Just what the heck does a career in AI policy look like?
…Twitter’s AI paper tsar Miles Brundage has published an exhaustive Guide to Working in AI Policy and Strategy, up on 80,000 Hours. (And watch out for the nod to Import AI – thanks Miles. I’ll do my best!)

(Mildly) Controversial Microsoft/Maluuba research paper: Using rewards is easy, finding them is hard:
…A new research paper from Microsoft’s recent Canadian AI acquisition Maluuba, Hybrid Reward Architecture for Reinforcement Learning, shows how to definitively beat Ms. Pac-Man (clocking over a million points). Ms. Pac-Man, along with Montezuma’s Revenge, is one of the games people have found consistently challenging, so it’s a notable result – though not as encouraging as it first appears, once you work out what the process requires to work.
…When you analyze its Hybrid Reward Architecture, you see that the approach is distributed, with Microsoft splitting the task into many discrete sub-tasks which numerous reinforcement learning agents try to solve, while feeding their opinions up into a single meta-agent that takes the decisions. Though it scores highly, the approach involves a lot of human specification, including hand-labeling rewards and penalties for different entities in the game. As with the Vicarious paper, the technique is interesting, but it feels like it’s missing a key component – unsupervised extraction of entities and reward levels/penalties.
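…The aggregation step is simple to sketch: each head estimates action values for its own (hand-specified) reward channel, and the meta-agent acts on their sum. A toy version, with random numbers standing in for learned Q-values:

```python
# Hybrid-reward-style action selection: sum per-head Q-values and act greedily
# on the combined estimate. The Q-values here are random stand-ins, not
# outputs of Maluuba's trained agents.
import numpy as np

rng = np.random.default_rng(1)
N_HEADS, N_ACTIONS = 5, 4  # e.g. 5 reward channels, 4 joystick moves

def act(state_qs):
    """state_qs: (N_HEADS, N_ACTIONS) per-head Q-values for this state."""
    q_hra = state_qs.sum(axis=0)   # aggregate the heads' opinions
    return int(np.argmax(q_hra))   # greedy action on the combined value

print(act(rng.standard_normal((N_HEADS, N_ACTIONS))))
```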

What TensorFlow v1.2 says about devices versus clouds:
Google has released version 1.2 of TensorFlow. There’s a ton of fixes and tweaks (eg, for RNN functionality), but buried in the release details is the note that Google will stop directly supporting GPUs on Mac systems (though it will continue to accept patches from the community). The likely reason: the lack of much of an NVIDIA ecosystem around Macs (c.f. Apple’s new external GPU for the Mac Pro, which runs AMD cards – and AMD has yet to develop as much of a deep learning ecosystem).
…Another way of looking at this is that the cloud wins AI. For now at least, AI benefits from parallelization and the use of large numbers of CPUs and GPUs together, with most developers using a laptop paired with an external pool of cloud resources, running their own Linux deep learning rig in a desktop tower, or both.
Details: TensorFlow v1.2 on GitHub.

Snapchat’s first research paper: mobile-friendly neural nets with full-fat brains.
Researchers with Snap Inc. and the University of Iowa have published SEP-Nets: Small and Effective Pattern Networks.
…tl;dr: a way to shrink trained models then recover them to restore some accuracy.
…It tackles one of the problems AI’s success has created: increasingly large, deep models that have tremendous performance but take up a lot of space when deployed. Ultimately, the ideal scenario for AI development is to train a single gigantic model on a nearby football field filled with computers, then have a little slider to shrink the trained model for deployment on various end-user devices, whether phones, or computers, or VR headsets, or something else. How do you do that? One idea is to smartly compress these trained models, either by lopping away chunks of the neural network, or by scaling them down in a more disciplined way. Both methods see you trade off overall accuracy for speed, so fixing this requires new research. The Snapchat paper represents one contribution:
The details: First they use a technique called pattern binarization to shrink a pre-trained network (for instance, a tens-of-millions-of-parameters VGG or Inception model) into a smaller version of itself, at the cost of some discriminative capability. They propose to fix this with a new neural network component they call a Pattern Residual Block, which can sometimes help offset the changes the binarization process wreaks on the numbers flowing through the network. They then use Group-Wise Convolution to further winnow down the various components of the network, shrinking it again.
…Results: Google MobileNet: 1.3 million parameters, 5.2MB, accuracy 0.637.
…Results: SEP-Net-R (small): 1.3 million parameters, 5.2MB, accuracy 0.658.
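…For a feel of the binarization trick, here's the generic BinaryConnect/XNOR-style recipe (a sketch of the general idea, not necessarily SEP-Nets' exact 0/1 pattern scheme): replace each float kernel with a 1-bit pattern plus a single scale factor.

```python
# Weight binarization, sketched: approximate a float kernel W with
# alpha * B, where B is a 1-bit sign pattern and alpha one float scale.
# This shrinks storage roughly 32x for each binarized layer.
import numpy as np

def binarize(W):
    alpha = np.abs(W).mean()          # single float scale per kernel
    B = np.where(W >= 0, 1.0, -1.0)   # 1-bit pattern
    return alpha, B

W = np.random.randn(3, 3)             # a 3x3 convolution kernel
alpha, B = binarize(W)
print(np.abs(W - alpha * B).mean())   # reconstruction error of alpha * B
```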

Free pre-trained models, get your pre-trained mobile-friendly models right here!
…Google unfurls MobileNets to catch intelligence on the phone:
In possibly related news, Google has released MobileNets, a collection of “mobile-first computer vision models for TensorFlow”.
…”MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases.”
…The available models vary from bite-size ones of 0.47 million parameters to larger ones of 4.24 million, with image accuracies ranging from 66.2% to 89.9% for the larger models.
GitHub repo here.
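…MobileNets get their small size from depthwise-separable convolutions: a per-channel spatial filter followed by a 1×1 “pointwise” mix, which is far cheaper than a full convolution. A minimal Keras sketch of the building block (layer sizes illustrative, not the released models):

```python
# One depthwise-separable block inside a tiny classifier: DepthwiseConv2D
# filters each channel spatially, then a 1x1 Conv2D mixes channels.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.DepthwiseConv2D(3, padding="same", activation="relu"),  # spatial
    layers.Conv2D(64, 1, activation="relu"),                       # pointwise
    layers.GlobalAveragePooling2D(),
    layers.Dense(1000, activation="softmax"),
])
model.summary()  # compare parameter counts vs. a plain Conv2D stack
```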

Speeding up open-access reviews: There’s a suggestion that OpenReview – a platform that makes feedback and evaluation of papers public – is considering layering some aspect of its system over Arxiv, letting us not only publish preprints rapidly, but potentially review them as well.
…Note: None of this is meant to say that double-blind reviewing is bad – it’s good, especially for significant papers with particularly controversial claims. But given the breakneck speed at which AI moves, I think it’s necessary to try to speed things up where possible. This suggests one way to gather better feedback on new ideas more rapidly.
…How it might be used: Last week Hochreiter & co published the SELU paper. It’s gathered a lot of interest with numerous people running their own tests, chiming in with comments, or going through its 90+ page appendix. It’d be very convenient if there was a layer that let us put all this stuff in the same place.
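…The headline contribution of that paper is at least easy to test yourself: the SELU activation, with the fixed constants the authors derive so that activations self-normalize toward zero mean and unit variance:

```python
# The SELU activation: lambda * x for positive inputs, a scaled exponential
# for negative ones, with constants chosen to keep activations near
# zero mean / unit variance as they flow through the network.
import numpy as np

ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

x = np.random.randn(100000)
print(selu(x).mean(), selu(x).std())  # stays near ~0 mean, ~1 std
```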

Dollars for NumPy: NumPy has been given a little over half a million dollars by the Gordon and Betty Moore Foundation to fund improvements to the Python scientific computing library. NumPy is absolutely crucial to neural network work within Python.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to david@amplifypartners.com.

Tech Tales:
[2035: The North East Canadian wilderness.]

It’s wet. There’s moss. The air has that peculiar clarity that comes from being turned by wind and freshened by water and replenished by the quiet green and brown things of nature. You breathe deeply as you walk the rarely used trail. Your feet compress the nested pine needles beneath you, sending up gusts of scented air. In the distance, you hear the sound of roiling running water and, beneath it, bells tolling.

You keep going. The sound of the water and of the bells gets louder. The bells echo out little sonorous rhythms that seem intertwined with the sounds of gushing river water. One of the bells is off – its timing discordant, cutting against the others. You begin to crest a small hill, and as your head clears it the sound rushes at you. The bells clang and the water thrums – their interplay not exactly abrasive – for the off bell is one of many – but somehow more mournful than you recall the sound being before.

The bells are housed in a small concrete tower, about 5 feet high, that sits by the riverbank at a point where there’s a kink in the river. It has three walls, with the fourth left open to the elements, facing the river, broadcasting the sounds of the bells. You run your hands over the cold, mottled, lichen-stippled exterior as you approach its entrance. Close your eyes. Open them when you’re in front of the shrine. You study the 12 bells of the dead, able to make out the inscribed names of the gone despite the movement of the bells. Now you just need to diagnose why one of the bells seems to have fallen out of alignment with the others.

As you sit, studying the wiring in the shrine and watching the bells, it’s impossible not to think of your friends and how they are now commemorated. You all work for the government on geographic surveys. As the climate has been changing your teams have been pushing further north for more of the year, trading safety for exploration (and the possibility of data valuable to resource extraction companies). You were at home, laid up with a broken leg, when the team of 12 went out. They were doing a routine mapping hike, away from camp, when the storm came in – it strengthened far more rapidly than their computer models anticipated and, due to a set of delicate occurrences, it brought snow and ice with it. Temperatures plunged. Snow clad everything. Rain was either flecked with ice or snow or a contributor to a sheet of frozen fog that lay over the land. Your colleagues died due to about 50 things going wrong in a very precise sequence. These things, as hysterical as it seems, happen.

The bells are set to dance to the rhythm of the river. Their loops are determined by observations from cameras atop the shrine, pointed at the writhing river. This visual information is then fed into an algorithm that is forever trying to find a pattern in the infinite noise of the river. After an hour you have the sense to give the cameras more than a cursory look and you discover that a spider has made a small nest near the sensor bulge, and one thick strand of web is slung in front of one of the camera lenses. This, you figure, has injected a kind of long-term stability into part of the feed of data that the algorithm sees, swapping a patch of the frothing slate and white and dark blue and brown of the river-water with something altogether more stagnant. Fixing it would be as simple as putting on a glove and carefully removing the spiderweb, then polishing the lens. You hold your hand up in front of the web to get a sense of how it would be to remove it and as your hand passes in front of the cameras the bells change their rhythm, some stuttering to a stop and others speeding up, driven to a frenzy by the changed vision. You put your hand down and the bells go back to their tolling, with the one that seems to be affected by the spiderweb still acting out of order.

When you file your report you say reports of odd sounds appear to be erroneous and you discovered no such indicators during your visit. You take comfort in knowing that the bells will continue to ring, driven increasingly by the way the world grows and breaks around them, and less by the prescribed chaos of the river.

Technologies that inspired this story: Attention, generative models, joint neural networks, long-short term memory

OpenAI Bits&Pieces:

OpenAI and DeepMind train reward functions via human feedback: A new research collaboration between DeepMind and OpenAI on AI safety sees us train AI agents to perform behaviors that they think humans will approve of. This has implications for AI safety and shows promising sample efficiency as well.
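…The core trick is fitting a reward model to pairwise human preferences over trajectory segments. A minimal sketch (a linear reward model and a Bradley-Terry-style logistic loss, with illustrative shapes; not OpenAI/DeepMind's actual code):

```python
# Reward learning from preferences, sketched: a human prefers one of two
# trajectory segments; we fit reward parameters w so that the preferred
# segment gets the higher total reward, via one logistic-loss gradient step.
import numpy as np

rng = np.random.default_rng(0)
D = 16                     # observation feature size (illustrative)
w = np.zeros(D)            # parameters of a linear reward model r(s) = w.s

def segment_return(segment):
    return sum(w @ s for s in segment)

def update(preferred, rejected, lr=0.1):
    """One gradient step ascending log P(preferred beats rejected)."""
    global w
    r1, r2 = segment_return(preferred), segment_return(rejected)
    p = 1.0 / (1.0 + np.exp(r2 - r1))         # P(human prefers segment 1)
    grad = (1.0 - p) * (np.sum(preferred, 0) - np.sum(rejected, 0))
    w += lr * grad

seg_a = rng.standard_normal((10, D))          # two 10-step segments
seg_b = rng.standard_normal((10, D))
update(seg_a, seg_b)                          # human said: A is better
print(segment_return(seg_a) > segment_return(seg_b))  # True
```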

Audio podcast about reinforcement learning, recorded by Sam Charrington with OpenAI/UC Berkeley robot chap Pieter Abbeel.