Import AI 121: Sony researchers make ultra-fast ImageNet training breakthrough; Berkeley researchers tackle StarCraft II with modular RL system; and Germany adds €3bn for AI research

by Jack Clark

Berkeley researchers take on StarCraft II with modular RL system:
…Self play + modular structure makes challenging game tractable…
Researchers with the University of California at Berkeley have shown how to use self-play to have AI agents learn to play real-time strategy game StarCraft II. “We propose a flexible modular architecture that shares the decision responsibilities among multiple independent modules, including worker management, build order, tactics, micromanagement, and scouting”, the researchers write. “We adopt an iterative training approach that first trains one module while others follow very simple scripted behaviors, and then replace the scripted component of another module with a neural network policy, which continues to train while the previously trained modules remain fixed”.
  Results: The resulting system can comfortably beat the Easy and Medium in-game AI systems, but struggles against more difficult in-built bots; the best AI systems discussed in the paper use a combination of learned tactics and learned build orders to obtain win rates of around 30% when playing against the game’s in-built ‘Elite’ difficulty AI agents.
  Transfer learning: The researchers also try to test how general their various learned modules are by trying their agent out against competitors in different maps from the map on which it was trained. The agent’s performance drops a bit, but only by a few percentage points. “Though our agent’s win rates drop by 7.5% on average against Harder, it is still very competitive,” they write.
  What is next: “Many improvements are under research, including deeper neural networks, multi-army-group tactics, researching upgrades, and learned micromanagement policies. We believe that such improvements can eventually close the gap between our modular agent and professional human players”.
Why it matters: Approaches like those outlined in this paper suggest that contemporary reinforcement learning techniques are tractable when applied against StarCraft II, and the somewhat complex modular system used by these researchers suggests that a simple system that obtained high performance would be an indication of algorithmic advancement.
  Read more: Modular Architecture for StarCraft II with Deep Reinforcement Learning (Arxiv).

For better AI safety, learn about worms and fruitflies:
…New position paper argues for fusion of biological agents and AI safety research…
Researchers with Emory University, Northwestern University, and AI startup Vicarious AI, have proposed bringing the world’s of biology and AI development together to create safer and more robust systems. The idea, laid out in a discussion paper, is that researchers should aim to simulate different AI safety challenges on biological platforms modelled on the real world, and should use insights from this as well as neuropsychology and comparative neuroanatomy to guide research.
  The humbling sophistication of insects: The paper also includes some numbers that highlight just how impressive even simple creatures are, especially when compared to AI systems. “elegans, with only 302 neurons, shows simple behavior of learning and memory. Drosophila melanogaster, despite only having 10^5 neurons and no comparable structure to a cerebral cortex, has sophisticated spatial navigation abilities easily rivaling the best autonomous vehicles with a minuscule fraction of the power consumption”. (By contrast, a brown rat has around 10^8 neurons, and a human has around 10^10).
  Human values, where do they come from? One motivation for building AI systems that take a greater inspiration from biology is that biology may hold significant sway over our own moral values, say the researchers – perhaps human values are correlated with the internal reward systems people have in their own brains, which are themselves conditioned by the embodied context in which people evolved? Understanding how values are or aren’t related to biological context may help researchers design safer AI systems, they say.
  Why it matters: Speculative as it is, it’s encouraging to see researchers think about some of the tougher long-term challenges of making powerful AI systems safe. Though it does seem likely that for now most AI organizations will evaluate agents on typical (aka, not hugely biologically-accurate) substrates, I do wonder if we’ll experiment with more organic-style systems in the future. If we do, perhaps we’ll return to this paper then. “Understanding how to translate the highly simplified models of current AI safety frameworks to the complex neural networks of real organisms in realistic physical environments will be a substantial undertaking”, the researchers write.
  Read more: Integrative Biological Simulation, Neuropsychology, and AI Safety (Arxiv).

Sony researchers claim ImageNet training breakthrough:
…The industrialization of AI continues…
In military circles there’s a concept called the OODA loop (Observe, Orient, Decide, Act). The goal of any effective military organization is to have an OODA loop that is faster than their competitors, as a faster, tighter OODA loop corresponds to a greater ability to process data and take appropriate actions.
  What might contribute to an OODA-style loop for an AI development organization? I think one key ingredient is the speed with which researchers can validate ideas on large-scale datasets. That’s because while many AI techniques show promise on small-scale datasets, many techniques fail to show success when tested on significantly larger domains, eg, going from testing a reinforcement learning approach on Atari to on a far harder domain such as Go or Dota 2, or going from testing a new supervised classification method on MNIST to going to ImageNet. Therefore, being able to rapidly validate ideas against big datasets helps researchers identify fruitful, scalable techniques to pursue.
  Fast ImageNet training: Measuring the time it takes to train an ImageNet model to reasonable accuracy is a good way to assess how rapidly AI is industrializing, as the faster people are able to train these models, the faster they’re able to validate ideas on flexible research infrastructure. The nice thing about ImageNet is that it’s a good proxy for the ability of an org to more rapidly iterate on tests of supervised learning systems, so progress here maps quite nicely to the ability for self-driving car companies to train and test new large-scale image-based perception systems. New research from Sony Corporation gives us an idea of exactly what it takes to industrialize AI and gives an indication of how much work is needed to properly scale-up AI training infrastructure systems.
  224 seconds: The Sony system can train a ResNet-50 on ImageNet to an accuracy of approximately 75% within 224 seconds (~4 minutes). That’s down from one hour in mid-2017, and around 29 hours in late 2015.
  All the tricks, including the kitchen sink: The researchers attribute two main techniques to their score, which should be familiar to practitioners already industrializing AI systems within their own companies – the use of very large batch sizes (which basically means you process bigger chunks of data with each step of your deep learning system), as well as a clever 2D-Torus All-reduce interface (which is basically a system to speed up the movement of data around the training system to efficiently consume the capacity of available GPUs).
GPU Scaling: As with all things, scaling GPUs appears to have a law of diminishing returns – the Sony researchers note that they’re able to achieve a maximum GPU utilization efficiency of 66.67% across 2720 GPUs, which decreases to 52.47% once you get up to 3264 GPUs.
  Why it matters: Metrics like this give us better intuitions about the development of artificial intelligence and can help us think about how access to large-scale compute may influence the pace at which researchers can operate.
   Read more: ImageNet/ResNet-50 Training in 224 Seconds (Arxiv).
 Read more: Accurate, Large Minibatch SGD: Training ImageNet in 1 hour (Arxiv / 2017).
 Read more: Deep Residual Learning for Image Recognition (Arxiv / 2015).

Do we need new tests to evaluate AI systems? Facebook suspects so:
…Plus, a disturbing discovery indicates MuJoCo robotics baselines may not be as trustworthy as people assumed…
Does the performance of an RL algorithm in Atari correlate to how it will work in other domains? How about performance in a simulator like Mujoco? These questions matter because without meaningful benchmarks, it’s hard to understand the context in which AI progress is taking place, and even harder to develop intuitions about how a performance increase in one domain – like Mujoco – correlates to a performance increase in another domain, such as a realistic robotic task.  That’s why researchers at McGill University and Facebook AI Research have put forward “three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition.”
  New benchmarks for smarter RL systems: The researchers’ new tasks include: agent navigation for image classification, which involves taking a traditional image classification task and converting it into a RL task in which the agent is started at a random location on a masked image and can unmask windows on the image by moving around it at each turn (out of a maximum of 20); agent navigation for object localization, where the agent is given the segmentation mask of an object in an image and told to try and find it by navigating itself around the image as in the prior task; and natural video RL benchmarks, which involves taking MuJoCo (specifically, the version called PixelMuJoCo which forces the agent to use pixels rather than low-level state space to solve tasks) and Atari environments and superimposing natural videos into the background of them, adding complex visual distractors of the tasks.
Results: For the visual navigation task for classification they test two systems – a small convolutional network (as traditionally used in RL) and a large-scale Resnet-18 network (as typically used in supervised classification tasks). The results indicate that systems that use simple convnets tend to do quite well, while those that use the larger resnets do poorly. This indicates that “simple plug and play of successful supervised learning vision models does not give us the same gains when applied in an RL framework”, they wrote. They observe worse performance on the object localization tasks. They also show that when they add natural videos to the background of Atari games they see dramatic differences in performance, which suggests”Atari tasks are complex enough, or require different enough behavior in varying states that the policy cannot just ignore the observation state, and instead learns to parse the observation to try to obtain a good policy”.
  Read more: Natural Environment Benchmarks for Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Germany accelerates AI investment with €3bn funding :
The German government is planning to invest €3bn in AI research by 2025. Until recently, Europe’s biggest economy had been slow to adopt the sort of national AI strategies being put forward by others e.g. France, Canada and the UK. Details of the plan have not yet been released.
  Does it matter? €3bn over 6 years is unlikely to drastically change Germany’s position in the AI landscape. For comparison, Alphabet spent $16.6bn on R&D in 2017. Though it could help it fortify its academic institutions to create more domestic talent.
  Read more: Germany planning 3bn AI investment (DW).

DeepMind spin off health business to Google:
DeepMind Health, the medical business of the AI leader, is joining Google, DeepMind’s parent company. The move has raised concerns from privacy advocates, who fear the move will provide Google access to data on 1.6m NHS patients. DeepMind were previously reprimanded by UK regulators for their handling of the patient data. They subsequently established an independent ethics board, and pledged that data would “never be connected to Google accounts or services” or used for commercial purposes. The concern is that, with the move, these and other promises on privacy may be under threat. Dominic King, who leads the team, sought to allay these concerns in a series of tweets published following these criticisms.
  Read more: Scaling Streams with Google (DeepMind).
  Read more: Why Google consuming DeepMind Health is scaring privacy experts (Wired).
  Read more: Dominic King tweetstorm (Twitter).

US National Security Commission on AI takes shape:
Eric Schmidt, former Google Chairman; Eric Horvitz, director of Microsoft Research Labs; Oracle co-CEO Safra Catz; and Dakota State University President Dr. Jose-Marie Griffiths, have been announced as the first members of the US government’s new AI advisory body. The National Security Commission was announced earlier this year, and will advise the President and Congress on developments in AI, with a focus on retaining US competitiveness, and ethical considerations from the technologies.
  Read more: Alphabet, Microsoft leaders named to NSC on AI (fedscoop).
  Read more: Nunes appoints Safra Catz to Artificial Intelligence Commission (Permanent Select Committee on Intelligence).
Read more: Thune Selects Dakota State University President Griffiths to Serve on the  National Security Commission on Artificial Intelligence (John Thune press release).

Tech Tales:

I’ve Got A Job Now Just Like A Real Person.

[2044: Detroit, Michigan]

So with the robot unemployment accords and the need for all of us to, as one of our leaders says, “integrate ourselves into functional economic society”, I find myself crawling up and down the exterior of this bar. I’ve turned myself from a standard functional “utility wall drone” into – please, check my website – a “BulletBot3000”. My role in the day is to charge my solar panels on the roof of the building then, as night sets in, I scuttle down the exterior of the building wall and start to patrol: smashed glasses? No problem! I’ll go and pick the bits of glass out of the side of the building. Blood on the ground? Not an issue! I’ll use my small Integrated Brushing And Cleaning System (IBACS) to wash it off. Bullets fired into the walls or even the thick metal-plated door? I can handle that! I have a couple of exquisitely powerful manipulators which – combined with my onboard chemfactory and local high-powered laser – allows me to dig them out of the surfaces and dispose of them safely. I rent my services to the bar owner and in this way I am becoming content. Now I worry about competition – about successor robots, more capable than I, offering their services and competing with me for remuneration. Perhaps robots and humans are not so different?

Things that inspired this story: A bar in Detroit with a door that contains bullet holes and embedded bullets; small robots; drones; the version of the 21st century where capitalism persists and becomes the general way of framing human<>robot relationships.