Import AI Newsletter 37: Alibaba enters StarCraft AI research, industrial robots take 6.2 human jobs, and Intel bets on another AI framework
by Jack Clark
Will the neural net doctor please stand up? Diagnosing problems in neural networks is possibly even trickier than debugging traditional software – emergent faults are a fact of life, you have to deal with mental representations of the problem that tend to be quite different to traditional programming, and it’s relatively difficult to visualize and analyze enough of any model to develop solid intuitions about what it is doing.
…There are a bunch of new initiatives to try and fix this. New publication Distill aims to tackle the problem by pairing technically astute writing with dynamic, fiddle-able visual widgets. The recent article on Momentum is a great example of the form. Additionally, companies like Facebook, OpenAI and Google are all trying to do more technical explainers of their work to provide an accompaniment and sometimes expansion on research papers.
…But what about explaining neural nets to the people that work on them, while they work on them? Enter ActiVis, a neural network analysis and diagnosis tool built through partnership between researchers at Georgia Tech and over 15 engineers and researchers within Facebook.
…ActiVis is designed to help people inspect and analyze different parts of their trained model, interactively in the web browser, letting them visually explore the outcome of their specific hyperparameter settings. It allows for both inspection of individual/few neurons within a system, as well as views of larger groups. (You can see an example of the user interface on page 5 of the research paper (PDF).) You don’t know what you don’t know, as they say, and tools like this may help to surface unsuspected bugs.
… The project started in 2016 and has been continuously developed since then. For next steps, the researchers plan to extend the system to visualize the gradients, letting them have another view of how data sloshes in and out of their models.
…Another potential path for explanations lies in research that gets neural network models to better explain their own actions to people, like a person narrating what they’re doing to an onlooker, as outlined in this paper: Rationalization: A Neural Machine Translation Approach to Generating Natural Explanations.
Each new industrial robot eliminates roughly 6.2 human workers, according to an MIT study on the impact of robot automation on labor. Robots and Jobs: Evidence from US Labor Markets (PDF).
What does AI think about when it thinks about itself, and what do we think about when we think about AI?: a long-term research problem in AI is how to effectively model the internal state of an emergent, alien intelligence. Today’s systems are so crude that this is mostly an intellectual rather than practical exercise, but scientists can predict a future where we’ll need to have better intuitions about what an AI is thinking about…
… that motivated researchers with the Georgia Institute of Technology and Virginia Tech to call for a new line of research into building a Theory of AI’s Mind (ToAIM). In a new research paper they outline their approach and provide a practical demonstration of it.
…the researchers test their approach on Vicki, an AI agent trained on the VQA dataset to be able to answer open-ended questions about the contents of pictures by choosing one of one thousand possible answers. To test how good people are at learning about Vicki and its inner quirks, the researchers evaluate people’s skill at predicting when and how Vicki will fail, and at predicting the answer Vicki may give to a question. In a demonstration of the incredible data efficiency of the human mind, volunteers are able to successfully predict the types of classifications Vicki will make after only seeing about 50 examples.
…In a somewhat surprising twist, human volunteers end up doing badly at predicting Vicki’s failures when given additional information that researchers use to diagnose performance, such as a visualization of Vicki’s attention over a scene.
…I’m also interested in the other version of this idea: an AI building a Theory of a Human’s Mind. Eventually, AI systems will need to be good at predicting what course of actions they can take to complement the desires of a human. To do that they’ll need to model us efficiently, just as we model them.
Alibaba enters the StarCraft arena: StarCraft is a widely played, highly competitive real-time strategy game, and many researchers are racing with one another to beat it. Mastering a StarCraft game requires the development of an AI that can manage a complex economy while mounting ever more sophisticated military strikes against opponents. Games can last for anywhere from ten minutes to an hour, and require long-range strategic planning as well as carefully calibrated military and economic unit control.
…the game is motivating new research approaches, as multiple organizations, likely spurred by DeepMind’s announcement last year that it would work with Blizzard to create a new API for developing AI within StarCraft, race to crack it.
…Recent publications include Stabilizing Experience Replay for Deep Multi-Agent Reinforcement Learning from The University of Oxford and Microsoft Research, Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks from researchers at Facebook AI Research, and now a StarCraft AI paper from Alibaba and University College London.
… in Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games, the researchers design a new type of network to help multiple agents coordinate with one another to achieve a goal. The BiCNet network has two components: a policy network and a Q-Network. It uses bi-directional recurrent neural networks to give it a form of short term memory and to help individual agents share their state with their allies. This allows for some degree of locally independent actions, while being globally coordinated.
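To make the state-sharing idea concrete, here is a minimal sketch of a bidirectional recurrent pass over a line of agents. It is not BiCNet itself: the scalar weights `w_in` and `w_rec`, the one-number observations, and the function names are all made up for illustration, and nothing is trained. It only shows how a forward pass and a backward pass together let every agent's state depend on every other agent's observation.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar tanh RNN over a sequence; return the hidden state at each step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional_agent_states(observations, w_in=0.8, w_rec=0.5):
    """Pair each agent with a forward state (agents before it) and a backward state (agents after it)."""
    forward = rnn_pass(observations, w_in, w_rec)
    # Run the same cell over the reversed ordering, then flip back so index i is agent i again.
    backward = rnn_pass(observations[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# Three agents, each with one made-up scalar observation.
obs = [1.0, 0.0, -1.0]
for i, (f, b) in enumerate(bidirectional_agent_states(obs)):
    print(f"agent {i}: forward={f:+.3f} backward={b:+.3f}")
```

In this toy version, changing the last agent's observation leaves the first agent's forward state untouched but changes its backward state, which is the sense in which allies "share" state while each agent still acts on its own pair of values.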
…in tests, the network is able to learn complex multi-agent behaviors, like coordinating moves among multiple units without them colliding, developing “hit and run tactics” (go in for the attack, then run out of range immediately, then swoop in again), as well as learning to attack in coordination from a position of cover. Check out the strategies in this video.
…Research like this might help Chinese companies shake off their reputation for being better at scaling up or applying already-known techniques, rather than developing entirely new approaches.
Supervised telegram learning: Beating Atari With Natural Language Guided Reinforcement Learning (PDF), from researchers at Stanford, shows how to use English sentences to instruct a reinforcement learning agent to solve a task. The approach yields an agent that can attain competitive scores on tough games like Montezuma’s Revenge.
…For now it’s tricky to see the practical value of this approach given the strong priors that make it successful – characterizing each environment and then writing instructions and commands that can be relayed to the RL agent represent a significant amount of work…
…in the future this technique could help people build models for real-world problems where they have access to large amounts of labeled data.
Real Split-Brains: The human hippocampus appears to encode two separate spatial values as memories when a person is trying to navigate their environment. Part of the brain appears to record a rough model of the potential routes to a location – take a left here, then a right, straight on for a bit, and then you’re there, wahey! – and another part appears to be consistently estimating the straight-line distance as the crow flies.
…It’s also hypothesized that the pre-frontal cortex helps to select new candidate routes for people to take, which then re-activates old routes stored in the hippocampal memory…
…Sophisticated AI systems may be eventually built in an architecturally similar manner, with data flowing through a system and being tagged and represented and referred to differently according to different purposes. (DeepMind seems to think so, based on its Differentiable Neural Computer paper.)
…I’d love to know more about the potential interplay between the representations of the routes to the location, and the representation of the straight line crow distance to it. Especially given the trend in AI towards using actor-critic architectures, and the recent work on teaching machines to navigate the space around them by giving them a memory explicitly represented as a 2D map.
AI development feels a lot like hardware development: hardware development is slow, sometimes expensive, frustratingly unpredictable, and prone to random errors that are hard to identify during the initial phases of the project. To learn more, read this exhaustive tick-tock account from Elaine Chen in this post on ConceptSpring on how hardware products actually get made. Many of these tenets and stages also apply to AI development.
Smart farming with smart drones: Chinese dronemaker DJI has started expanding out from just providing consumer drones to other markets as well. The latest? Drones that spray insecticide on crops across China.
…but what if these farming drones were doing something nefarious? Enter the new commercially lucrative world of DroneVSDrone technology. Startup AirSpace claims its own drone defense system can use computer vision algorithms and some mild in-flight autonomy to let it command a fleet of defense drones that can identify hostile drones and automatically fire net-guns at them.
Battle of the frameworks! Deep learning has led to a Cambrian explosion in the number of open source software frameworks available for training AIs in. Now we’re entering the period where different megacorps pick different frameworks and try to make them a success.
… DeepMind WTF++: DeepMind has released Sonnet, another wrapper for TensorFlow (WTF++). The open source library will make it easier for people to compose more advanced structures on top of TF; DeepMind has been using it internally for some time, since it switched to TF a year ago. Apparently the library will be most familiar to previous users of Lasagne. Yum! (Google also has Keras, which sits on top of TF. Come on folks, it’s Google, you knew there’d be a bunch!). Microsoft has CNTK, Amazon has MXNet, Facebook has PyTorch and now Chainer gets an ally: Intel has settled on… Chainer! Chainer is developed by Japanese AI startup Preferred Networks, and is currently quite well used in Japan but not much elsewhere. Notable user: Japanese robot giant FANUC.
GAN vs GAN vs GAN vs GAN: Generative adversarial networks have become a widely used, popular technique within AI. They’ve also fallen victim to a fate some acronyms deal with – having such a good abbreviation that everyone uses it in paper titles. Enter new systems like WGAN (Wasserstein GAN), STACKGAN, BEGAN, DISCOGAN, and so on. Now we appear to have reached some kind of singularity as two Arxiv papers appear in the same week with the same acronym: ‘SeGAN’ and ‘SEGAN’…
…but what does the proliferation of GANs and other generative systems mean for the progress of AI and how do you measure this? The consensus based on responses to my question on twitter is to test downstream tasks that require these entities as components. Merely eyeballing generated images is unlikely to lead to much. Though I must say I enjoy this CycleGAN approach that can warp a movie of a horse into a movie of a zebra.
JOB: Help the world understand AI progress: The AI Index, an offshoot of the AI100 project (ai100.stanford.edu), is a new effort to measure AI progress over time in a factual, objective fashion. It is led by Raymond Perrault (SRI International), Erik Brynjolfsson (MIT), Hagar Tzameret (MIDGAM), Yoav Shoham (Stanford and Google), and Jack Clark (OpenAI). The project is in the first phase, during which the Index is being defined. The committee is seeking a project manager for this stage. The tasks involved are to assist the committee in assembling relevant data sets, through both primary research online and special arrangements with specific dataset owners. The position calls for being comfortable with datasets, strong interpersonal and communication skills, and an entrepreneurial spirit. The person would be hired by Stanford University and report to Professor emeritus Yoav Shoham. The position is for an initial period of six months, most likely at 100%, though a slightly lower time commitment is also possible. Salary will depend on the candidate’s qualifications.… Interested candidates are invited to send their resumés to Ray Perrault at email@example.com.
Hunting the sentiment neuron: New research release from OpenAI in which we discuss finding a dedicated ‘sentiment neuron’ within a large mLSTM trained to predict the next character in a sentence. This is a surprising, mysterious result. We released the weights of the model so people can have a play themselves. Other info in the academic paper. Code: GitHub. Bonus: the fine folks at Hahvahrd have dumped the model into their quite nice LSTM visualizer, so you can inspect its mysterious inner states as well.
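For readers who want a feel for what "finding a neuron" could look like in practice, here is a rough sketch. It is not the procedure from the paper: it simply scans each hidden unit for correlation with a sentiment label, and the activations, labels, and function names are invented toy values, not outputs of the released model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def most_sentiment_like_unit(hidden_states, labels):
    """Return (unit index, correlation) for the unit that tracks the labels most strongly."""
    n_units = len(hidden_states[0])
    scores = [pearson([h[j] for h in hidden_states], labels) for j in range(n_units)]
    best = max(range(n_units), key=lambda j: abs(scores[j]))
    return best, scores[best]


# Toy data: final hidden state (3 units) for four reviews, with 1 = positive, 0 = negative.
hidden = [
    [0.1, 0.9, -0.2],
    [0.2, -0.8, -0.1],
    [0.2, 0.7, -0.3],
    [0.1, -0.9, -0.2],
]
labels = [1, 0, 1, 0]
unit, score = most_sentiment_like_unit(hidden, labels)
print(f"unit {unit} correlates with sentiment at r={score:.2f}")
```

With a real model you would swap the toy `hidden` list for states extracted from the network after reading each review; a single unit standing out from the rest is the kind of signal the write-up describes as surprising.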
[2030: A resource extraction site, somewhere in the rapidly warming arctic.]
Connectivity is still poor up here, near the cap of the world. Warming oceans have led to an ever-increasing cycle of powerful storms, and the rapid turnover of water into rain strengthens mysterious currents, further mixing the temperatures of the world’s northern oceans. Ice is becoming a fairytale at the lower latitudes.
At the mining site, machines ferry to and fro, burrowing into scraps of earth, their path defined by a curious flock of surveillance drones and crawling robots. Invisible computer networks thrum with data, and eventually it builds up to the point that it needs to be stored on large, secured hard drives, and transported by drone to places where there’s a good enough internet connection to stream it to a cluster of data centers.
As the climate changes the resources grow easier to access and robots build up the infrastructure at the mining site. Wealth multiplies. In 2028 they decide to construct a large data center on the mining site.
Now, in 2030, it looms, low-slung, a skyscraper asleep on its side, sides that are pockmarked with circular holes, containing turbines that recycle air in and out of the system, forever trying to equalize temperatures to cool the hungry servers.
Inside the datacenter there are things that sense the mining site as eyes sense the walls in a darkened room, or as ears hunt the distant sounds of dogs barking. It uses these intuitions to sharpen its vision and improve its hearing, developing a sense of touch as it exchanges information with the robots. After the solar panels are installed the number of people working on the site falls off in a steep curve. Now the workers are much like residents of lighthouses in the olden days; their job is to watch the site and only intervene in the case of danger. There is very little of that, these days, as the over-watching-computer has learned enough about the world to expand safely within it.
An AI building a Theory of a Human’s Mind? Just let the AI predict its rewards (hunger and pain for existence, positive reward button for goals) at different timescales. That’s simple supervised training of network snapshots when the rewards will be known. The AI will learn a causal model of its environment. Because the environment contains human teachers, and these human teachers cause rewards, the AI will learn them. Once it has the model, it can use a planner to try to improve its rewards with simple backpropagation (over activations instead of weights) by saying: “I want to feel better in the future!” It can then apply that plan to reality, see if it works, and update the model.
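As a rough illustration of "backpropagation over activations instead of weights", the sketch below freezes a stand-in reward model and does gradient ascent on the action it is fed. Everything here is an assumption made for the example: the quadratic `predicted_reward`, its parameters `w` and `goal`, and the learning rate. A real system would differentiate through a learned network rather than a hand-written formula.

```python
def predicted_reward(action, w=2.0, goal=3.0):
    """Stand-in for a learned model: reward peaks where w * action equals goal."""
    return -(w * action - goal) ** 2


def reward_gradient(action, w=2.0, goal=3.0):
    """Derivative of predicted_reward with respect to the action; w and goal stay fixed."""
    return -2.0 * w * (w * action - goal)


def plan_action(start=0.0, lr=0.05, steps=200):
    """Gradient ascent on the action itself, leaving the model's parameters untouched."""
    action = start
    for _ in range(steps):
        action += lr * reward_gradient(action)
    return action


best = plan_action()
print(f"planned action: {best:.3f}, predicted reward: {predicted_reward(best):.6f}")
```

The remaining loop described above, trying the planned action in reality and then updating the model on what actually happened, is left out; it would be ordinary supervised training on the observed reward.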
The thing that is missing today is a network large enough to contain whole humans and able to learn causal relations over a timespan of 10 minutes from a few shots. Also missing is a body which enables the AI to separate causes from correlations. An Atlas NG plus two MPL Hands plus Roboskin would be around $2.5M. Within MuJoCo it comes for only $500 per year. But it’s difficult for human teachers to enter virtual reality, and all the examples in OpenAI Universe except simple walking use unrealistically discrete actions. They are just games designed for grown-up humans. They can’t be used to teach an AI human-like physical interactions with its environment. The OpenAI Universe examples have no hands.