Import AI: Issue 64: What the UK government thinks about AI, DeepMind invents everything-and-the-kitchen-sink RL, and speeding up networks via mixed precision

by Jack Clark

What the UK thinks about AI:
The UK government’s Department for Digital, Culture, Media & Sport and its Department for Business, Energy & Industrial Strategy have published an independent review on the state of AI in the UK, recommending what the UK should and shouldn’t do with regard to AI.
…AI’s impact on the UK economy: AI could increase the annual growth rate of the GVA in 2035 from 2.5% to 3.9%.
…Why AI is impacting the economy now: Availability of data, availability of experts with the right mix of skills, better computers.
What the UK needs to do: Develop ‘data trusts’ to make it easier to share data etc, make research data machine readable, support text/data-mining “as a standard and essential tool for research”. Increase the availability of PhD places studying AI by 200, get industry to fund an AI masters programme, launch an international AI Fellowship Programme for the UK (this seems to be a way to defend against the ruinous immigration effects of Brexit), promote greater diversity in the UK workforce.
…Read more: Executive Summary (HTML).
…Read more: The review’s 18 main recommendations (HTML).
…Read more: Full Report (PDF).

Quote of the week (why you should study reinforcement learning):
…“In deep RL, literally nothing is solved yet,” – Volodymyr Mnih, DeepMind.
…From a great presentation at an RL workshop that took place in Berkeley this summer. Mnih points out we’ll need various 10X to 100X improvements in RL performance before we’re even approaching human level.
Check out the rest of the video lecture here.

DeepMind invents everything-and-the-kitchen-sink RL:
…Ensembles work. Take a look at pretty much any of the winning entries in a Kaggle competition and you’ll typically find the key to success comes from combining multiple successful models together. The same is true for reinforcement learning, apparently, based on the scores of Rainbow, a composite system developed by DeepMind that cobbles together several recent RL techniques, like Double Q-learning, Prioritized Experience Replay, Dueling Networks, Distributional RL, and so on.
…“Their combination results in new state-of-the-art results on the benchmark suite of 57 Atari 2600 games from the Arcade Learning Environment (Bellemare et al. 2013), both in terms of data efficiency and of final performance,” DeepMind writes. The new algorithm is also quite sample efficient (partially because the combination of so many techniques means it is doing more learning at each timestep).
…Notable: Rainbow gets a score of around 150 on Montezuma’s Revenge – typical good human scores range from 2,000 to 5,000 on the game, suggesting that we’ll need more structured, symbolic, explorative, or memory-intensive approaches to be able to crack it. Merely combining existing DQN extensions won’t be enough.
…Read more: Rainbow: Combining Improvements in Deep Reinforcement Learning.
…Slight caveat: One thing to be aware of is that because this system gets its power from the combination of numerous, tunable sub-systems, much of the performance improvement can be explained by simply having a greater number of hyperparameter knobs which canny researchers can tune.
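…To make one of those ingredients concrete, here is a minimal sketch of the sampling rule behind Prioritized Experience Replay, in its common proportional form: transitions with larger TD error get replayed more often. This is an illustration of a single component, not Rainbow itself and not DeepMind’s code; the function names, the alpha value, and the toy TD errors are all assumptions made for the example.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Map absolute TD errors to replay probabilities (proportional variant).

    alpha=0 gives uniform sampling; larger alpha focuses more on
    high-error transitions. eps keeps zero-error transitions sampleable.
    Both defaults are illustrative assumptions.
    """
    priorities = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, alpha=0.6, rng=None):
    """Draw replay-buffer indices according to the priorities above."""
    rng = rng or random.Random(0)  # seeded so the sketch is repeatable
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


# Hypothetical TD errors for four stored transitions.
td_errors = [0.1, 2.0, 0.5, 0.0]
probs = sampling_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=8)
```

Note that alpha is exactly the kind of extra knob the caveat above is talking about: every component bolted on brings a few more of them.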

Amazon speeds up machine learning with custom compilers (with a focus on the frameworks of itself and its allies):
…Amazon and the University of Washington have released the NNVM compiler, which aims to simplify and speed up deployment of AI software onto different computational substrates.
…NNVM is designed to optimize the performance of many different AI frameworks, rather than just one. Today, it supports models written in MXNet (Amazon’s AI framework), along with Caffe via Core ML models (Apple’s AI framework). It’s also planning to add in support for Keras (a Google framework that ultimately couples to TensorFlow). No support for TF directly at this stage, though.
…The framework is able to generate appropriate performance-enhanced interfaces between its high-level program and the underlying hardware, automatically generating LLVM IR for CPUs on x86 and ARM architectures, or outputting CUDA, OpenCL, and Metal kernels for different GPUs.
…Models run via the NNVM compiler can see performance increases of 1.2X, Amazon says.
…Read more here: Introducing NNVM Compiler: A New Open End-to-End Compiler for AI Frameworks.
Further alliances form as a consequence of TensorFlow’s success:
…Amazon Web Services and Microsoft have partnered to create Gluon, an open source deep learning interface.
…Gluon is a high-level framework for designing and defining machine learning models. “Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure,” Amazon writes.
…Gluon will initially be available within Apache MXNet (an Amazon-driven project), and soon in CNTK (a Microsoft framework). “More frameworks over time,” Amazon writes. Though no mention of TensorFlow.
The strategic landscape: Moves like these are driven by the apparent success of AI frameworks like TensorFlow (Google) and PyTorch and Caffe2 (Facebook) – software for designing AI systems that have gained traction thanks to a combination of corporate stewardship, earliness to market, and reasonable design. (Though TF already has its fair share of haters.) The existential threat is that if any one or two frameworks become wildly popular then their originators will be able to build rafts of complementary services that hook into proprietary systems (eg, Google offering a research cloud running on its self-designed ‘Tensor Processing Units’ that uses TensorFlow.) More (open source) competition is a good thing.
…Read more here: Introducing Gluon: a new library for machine learning from AWS and Microsoft.
…Check out the Gluon GitHub.

Ever wanted to turn the entirety of the known universe into a paperclip? Now’s your chance!
One of the more popular tropes within AI research is that of the paperclip maximizer – the worry that if we build a super-intelligent AI and give it overly simple objectives (eg, make paper clips), it will seek to achieve those objectives to the detriment of everything else.
…Now, thanks to Frank Lantz, director of the NYU Game Center, it’s possible to inhabit this idea, by playing a fun (and dangerously addictive) webgame.
Maximize paperclips here.

Like reinforcement learning but dislike TensorFlow? Don’t you wish there was a better way? Now there is!
…Kudos to Ilya Kostrikov at NYU for being so inspired by OpenAI Baselines to re-write the PPO, A3C, and ACKTR algorithms into PyTorch.
Read more here on the project’s GitHub page.

Want a free AI speedup? Consider Salesforce’s QRNN (Quasi-Recurrent Neural Network):
…Salesforce has released a PyTorch implementation of its QRNN.
…QRNNs can be 2 to 17X faster than an (optimized) NVIDIA cuDNN LSTM baseline on tasks like language modeling, Salesforce says.
…Read more here on GitHub: PyTorch QRNN.

Half-precision neural networks, from Baidu and NVIDIA:
…AI is basically made of matrix multiplication. So figuring out how to use numbers with a slightly smaller footprint in AI software has a correspondingly massive impact on computational efficiency (though there’s a tradeoff in precision).
…Now, research from Baidu and NVIDIA details how the companies are using 16-bit rather than 32-bit floating point numbers for some AI operations.
…But if you halve the number of bits in each number there’s a risk of reducing overall accuracy to the point it damages performance of your application. Experimental results show that mixed precision doesn’t have too much of a penalty, with the technology achieving good scores when used on language modeling, image generation, image classification, and so on.
…Read more: Mixed Precision Training.
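…For a feel of what goes wrong when you drop to 16 bits, here is a small sketch using only Python’s standard library, which can round numbers to IEEE 754 half precision via the struct module’s "e" format. It shows two generic failure modes (tiny values underflowing to zero, and small updates vanishing when added to a much larger number) along with two common workarounds: scaling values up before rounding, and keeping a 32-bit copy of the weights. The specific numbers and the scale factor of 1024 are assumptions chosen for illustration, not figures from the paper.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Failure mode 1: a tiny gradient underflows to zero in 16 bits.
tiny_gradient = 1e-8  # hypothetical value, below fp16's smallest subnormal
lost = to_fp16(tiny_gradient)

# Workaround: scale up before rounding, then divide back out in
# higher precision. The scale factor here is an arbitrary choice.
LOSS_SCALE = 1024.0
scaled = to_fp16(tiny_gradient * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

# Failure mode 2: a small update disappears when added to a weight
# stored in 16 bits, but survives in a 32-bit copy of that weight.
weight, update = 1.0, 1e-4
fp16_weight = to_fp16(to_fp16(weight) + to_fp16(update))
fp32_weight = to_fp32(weight + update)

print(lost, recovered, fp16_weight, fp32_weight)
```

The 16-bit weight stays stuck at 1.0 while the 32-bit copy moves, which is the intuition behind doing the bulk of the arithmetic in 16 bits but only some of the bookkeeping in 32.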

Teaching robots via teleoperation takes another (disembodied) step forward:
Berkeley robotics researchers are trying to figure out how to use the data collected during the teleoperation of robots as demonstrations for AI systems, letting human operators teach machines to perform useful tasks.
…The research uses consumer grade virtual reality devices (Vive VR), an aging Willow Garage PR2 robot, and custom software built for the teleoperator, to create a single system people can use to teach robots to perform tasks. The system uses a single neural network architecture that is able to map raw pixel inputs to actions.
…“For each task, less than 30 minutes of demonstration data is sufficient to learn a successful policy, with the same hyperparameter settings and neural network architecture used across all tasks.”
Tasks include: Reaching, grasping, pushing, putting a simple model plane together, removing a nail with a hammer, grasping an object and placing it somewhere, grasping an object and dropping it in a bowl then pushing the bowl, moving cloth, and performing pick and place for two objects in succession.
Results: Competitive results with 90%+ accuracies at test time across many of the tasks, though note that pick&place for 2 objects only gets 80% (because modern AI techniques still have trouble with sequences of physical actions), and the similar task of picking up an object, dropping it into a bowl, then pushing the bowl gets about 83%.
…(Though note that all of these tasks are accomplished with simple, oversized objects against a regular, uncluttered background. Far more work is required to make these sorts of techniques robust to the uncontrolled variety of reality.)
…Read more: Deep Imitation Learning for Complex Manipulation Tasks from Virtual Teleoperation.

Better aircraft engine prediction through ant colonies & RNNs & LSTMs, oh my!
…Research from the University of North Dakota mashes up standard deep learning components (RNNs and LSTMs), with a form of evolutionary optimization called ant colony optimization. The purpose? To better predict vibration values for an aircraft engine 1, 5, 10, and 20 seconds in the future – a useful thing to be able to predict more accurately, given its relevance to spotting problems before they down an aircraft.
…While most people are focusing on different evolutionary optimization algorithms when using deep learning (eg, REINFORCE, HYPERNEAT, NEAT, and so on), ant colony optimization is an interesting side-channel: you get a bunch of agents – ‘ants’ – to go and explore the problem space and, much like their real world insect counterparts, lay down synthetic pheromones for their other ant chums to follow when they find something that approximates to ‘food’.
How it all works: ‘The algorithm begins with the master process generating an initial set of network designs randomly (given a user defined number of ants), and sending these to the worker processes. When the worker receives a network design, it creates an LSTM RNN architecture by creating the LSTM cells with the according input gates and cell memory. The generated structure is then trained on different flight data records using the backpropagation algorithm and the resulting fitness (test error) is evaluated and sent back along with the LSTM cell paths to the master process. The master process then compares the fitness of the evaluated network to the other results in the population, inserts it into the population, and will reward the paths of the best performing networks by increasing the pheromones by 15% of their original value if it was found that the result was better than the best in the population. However, the pheromones values are not allowed to exceed a fixed threshold of 20. The networks that did not out perform the best in the population are not penalized by reducing the pheromones along their paths.’
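…The reward rule in that description is simple enough to sketch in a few lines. This is a rough illustration, not the researchers’ code: the dictionary-of-edges representation and the function name are made up for the example, and it assumes “15% of their original value” means 15% of the pheromone’s value just before the update. The 15% rate, the cap of 20, and the no-penalty rule come from the passage above.

```python
REWARD_RATE = 0.15     # from the quoted description
PHEROMONE_CAP = 20.0   # from the quoted description


def reward_path(pheromones, path, fitness, best_fitness):
    """Apply the pheromone rule for one evaluated network.

    pheromones:   dict mapping an edge (any hashable) to its level
    path:         edges used by the evaluated network
    fitness:      test error of that network (lower is better)
    best_fitness: lowest test error seen in the population so far

    Returns (updated pheromone dict, updated best fitness).
    """
    if fitness >= best_fitness:
        # Did not beat the best network: no reward, and no penalty.
        return dict(pheromones), best_fitness

    updated = dict(pheromones)
    for edge in path:
        boosted = updated[edge] * (1.0 + REWARD_RATE)
        updated[edge] = min(boosted, PHEROMONE_CAP)
    return updated, fitness


# Hypothetical pheromone levels on three edges.
levels = {"in->cell1": 1.0, "cell1->cell2": 19.0, "cell2->out": 5.0}
levels, best = reward_path(
    levels, ["in->cell1", "cell1->cell2"], fitness=0.04, best_fitness=0.05
)
```

In the example, the first edge rises by 15%, the second hits the cap of 20 rather than going to 21.85, and the third is untouched because the winning network did not use it.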
The results? An RNN/LSTM baseline gets error rates of about 2.84% when projecting 1 second into the future, 3.3% for 5 seconds, 5.51% for 10 seconds, and 10.19% for 20 seconds. When they add ACO the score for the ten second prediction goes from 94.49% accurate to 95.83% accurate. A reasonable improvement, but the lack of disclosed performance figures for other time periods suggests either they ran out of resources to do it (a single rollout takes about 4 days when using ACO, they said), or they got bad scores and didn’t publish them for fear of detracting from the paper (boo!).
Read more here: Optimizing Long Short-Term Memory Recurrent Neural Networks Using Ant Colony Optimization to Predict Turbine Engine Vibration.
Additional quirk: The researchers run some of their experiments on the North Dakota HPC rig and are able to take advantage of some of its nice networking features by using MPI and so on. Most countries have spent years investing significant amounts of money in building up large high-performance computing systems so it’s intriguing to see how AI researchers can use these existing computational behemoths to further their own research.

OpenAI Bits&Pieces:

Meta-Learning for better competition:
…Research in which we extend MAML to work in scenarios where the environments and competitors are iteratively changing as well. Come for the meta-learning research, stay for the rendered videos of simulated robotic ants tussling with each other.
…Read arxiv here: Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments.

Creating smarter agents with self-play and multi-agent competition:
Just how powerful are existing reinforcement learning algorithms? It’s hard to know, as they’ll tend to fail on some environments (eg, Montezuma’s Revenge), while excelling at others (most Atari games). Another way to evaluate the success of these algorithms is to test their performance against successively more powerful versions of themselves, combined with simple objectives. Check out this research in which we use such techniques to teach robots to sumo wrestle, tackle each other, run, and so on.
Emergent Complexity via Multi-Agent Competition.

Tech Tales:

[ 2031: A liquor store in a bustling Hamtramck, Detroit – rejuvenated following the success of self-driving car technology and the merger of the big three into General-Ford-Fiat, which has sufficient scale to partner with the various tech companies and hold its own against the state-backed Chinese and Japanese self-driving car titans.]

You stand there, look at the bottles, close your eyes. Run your hands over the little cameras studding your clothing, your bag, your shoes. For a second you think about turning them off. What gods don’t see gods can’t judge don’t drink don’t drink don’t drink. Difficult.

“Show me what happens if I drink,” you whisper quiet enough that no one else can hear.

“OK, playing forward,” says the voice to you via bone conduction from an in-ear headphone.

In the top right of your vision the typical overlay of weather/emails/bank balance/data credits disappears, replaced by a view of the store from your current perspective. But the view changes. A ghostly hand of yours stretches out in the upper-right view and grabs a bottle. The view changes as the projection of you goes to the counter. The face of the teller barely resolves – it’s a busy store with high staff turnover, so the generative model has just given up and decided to combine them into what people on the net call: Generic Human Face. Purchase the booze. In a pleasing MC Escher-recursion in your upper right view of your generated-future-self buying booze you can see an even smaller corner in the upper right of that generator which has your bank account. The AI correctly deducts the price of the imaginary future bottle from your imaginary future balance. You leave the liquor store and go to the street, then step into a self-driving car which takes you home. Barely any of the outside resolves, as though you’re driving through fog; even the computer doesn’t pay attention on your commute. Things come back into focus as you slow outside your house. Stop. Get out. Walk to the front door. Open it.

Things get bad from there. Your computer knows your house so well that everything is rendered in rich, vivid detail: the plunk of ice cubes into a tall mason jar, the glug-gerglug addition of the booze, the rapid incursion of the glass into your viewline as you down the drink whole, followed by a second. Then you pause and things start to blur because the AI has a hard time predicting your actions when you drink. So it browses through some probability distribution and shows you the thing it thinks is most likely and the thing it thinks will make you least likely to drink: ten seconds go by as it shows you a speedup of the blackout, then normal time comes back and you see a version of yourself sitting in a bathtub, hear underwater-whale-sound crying imagined and conducted into you via the bone mic. Throw your glass against the wall erupting in a cloud of shards. Then a black screen. “Rollout ended,” says the AI. “Would you like to run again.”

“No thanks,” you whisper.

Switch your view back to reality. Reach for the shelf. And instead of grabbing the booze you grab some jerky, pistachios, and an energy bar. Go to the counter. Go home. Eat. Sleep.

Technologies that inspired this story: generative models, transfer learning, multi-view video inference systems, robot psychologists, Google Glass, GoPro.