Import AI: #81: Trading cryptocurrency with deep learning; Google shows why evolutionary methods beat RL (for now); and using iWatch telemetry for AI health diagnosis
by Jack Clark
DeepMind’s IMPALA tells us that transfer learning is starting to work:
…Single reinforcement learning agent with same parameters solves a multitude of tasks, with the aid of a bunch of computers…
DeepMind has published details on IMPALA, a single reinforcement learning agent that can master a suite of 30 3D-world tasks in ‘DeepMind Lab’ as well as all 57 Atari games. The agent displays some competency at transfer learning, which means it’s able to use knowledge gleaned from solving one task to solve another, increasing the sample efficiency of the algorithm.
The technique: The Importance Weighted Actor-Learner Architecture (IMPALA) scales to multitudes of sub-agents (actors) deployed across thousands of machines, which beam their experiences (sequences of states, actions, and rewards) back to a centralized learner; the learner uses GPUs to compute policy updates, which are fed back to the actors. In the background it corrects for the lag between the actors' stale policies and the learner's current policy via a new off-policy actor-critic algorithm called V-trace. The outcome is an algorithm that can be far more sample efficient and performant than traditional RL algorithms like A2C.
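The V-trace correction can be sketched in a few lines: for each trajectory, the learner forms value targets from truncated importance ratios between its current policy and the stale actor policy that generated the data. The code below is a plain-Python sketch with illustrative names, not DeepMind's implementation:

```python
def vtrace_targets(rewards, values, bootstrap, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace value targets for one trajectory.

    rewards[t], values[t]: reward and learner value estimate at step t.
    bootstrap: value estimate for the state after the final step.
    rhos[t]: importance ratio pi(a_t|x_t) / mu(a_t|x_t) between the
             learner policy pi and the actor's behavior policy mu.
    """
    T = len(rewards)
    vs = list(values) + [bootstrap]
    acc = 0.0
    targets = [0.0] * T
    # Work backwards, accumulating truncated-importance-weighted TD errors.
    for t in reversed(range(T)):
        rho = min(rho_bar, rhos[t])   # truncated rho controls the fixed point
        c = min(c_bar, rhos[t])       # truncated c controls contraction speed
        delta = rho * (rewards[t] + gamma * vs[t + 1] - vs[t])
        acc = delta + gamma * c * acc
        targets[t] = vs[t] + acc
    return targets
```

When the policies match (all ratios equal 1) the targets reduce to ordinary n-step returns, and when a ratio is 0 the target collapses back to the current value estimate, which is the sense in which the correction keeps off-policy data usable.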
Datacenter-scale AI training: If you didn’t think compute was the strategic determinant of AI research, then read this paper and consider your assumptions: IMPALA can achieve throughput rates of 250,000 frames per second via its large-scale, distributed implementation, which involves 500 CPUs and 1 GPU assigned to each IMPALA agent. Such systems can achieve a throughput of 21 billion frames a day, DeepMind notes.
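The daily figure is simple arithmetic on the reported per-second throughput:

```python
frames_per_second = 250_000            # reported IMPALA throughput
seconds_per_day = 24 * 60 * 60         # 86,400
frames_per_day = frames_per_second * seconds_per_day
print(frames_per_day)                  # 21,600,000,000, i.e. ~21 billion
```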
Transfer learning: IMPALA agents can be trained on multiple tasks in parallel, attaining a median score across the full Atari-57 suite of up to 59.7% of human performance, roughly comparable to that of simple A3C agents trained on a single game each. There’s obviously a ways to go before IMPALA-style transfer learning approaches can rival fine-tuned single-environment implementations (which regularly far exceed human performance), but the indications are encouraging. Similarly competitive transfer-learning traits show up when they test it on a suite of 30 environments implemented in DeepMind Lab, the company’s Quake-based 3D testing platform.
Why it matters: Big computers are analogous to large telescopes with very fast turn rates, letting researchers probe the outer limits of certain testing regimes while being able to pivot across the entire scientific field of enquiry very rapidly. IMPALA is the sort of algorithm that organizations can design when they’re able to tap into large fields of computation during research. “The ability to train agents at this scale directly translates to very quick turnaround for investigating new ideas and opens up unexplored opportunities,” DeepMind writes.
Read more: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Arxiv).
Dawn of the cryptocurrency AI agents: research paths for trading crypto via reinforcement learning:
…Why crypto could be the ultimate testing ground for RL-based trading systems, and why this will require numerous fundamental research breakthroughs to succeed…
AI chap Denny Britz has spent the past few months wondering what sorts of AI techniques could be applied to learning to profitably trade cryptocurrencies. “It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field,” he writes.
The face-ripping problems of trading: Years ago I worked around one of the main financial trading centers of Europe: Canary Wharf in London, UK. A phrase I’d often hear in the bars after work would be one trader remarking to another something to the effect of: “I got my face ripped off today”. Were these traders secretly involved in some kind of fantastically violent bloodsport, known only to them, my youthful self wondered? Not quite! What that phrase really means is that the financial markets are cruel, changeable, and, even when you have a good hunch or prediction, they can still betray you and destroy your trading book, despite you doing everything ‘right’. In this post former Google Brain chap Denny Britz does a good job of cautioning the would-be AI trader that cryptocurrencies are the same: even if you have the correct prediction, exogenous shocks beyond your control (trading latency, liquidity, etc) can destroy you in an instant. “What is the lesson here? In order to make money from a simple price prediction strategy, we must predict relatively large price movements over longer periods of time, or be very smart about our fees and order management. And that’s a very difficult prediction problem,” he writes. So why not invent more complex strategies using AI tools, he suggests.
Deep reinforcement learning for trading: Britz is keen on the idea of using deep reinforcement learning for trading because it can further remove the human from needing to design many of the precise trading strategies needed to profit in this kind of market. Additionally, it has the promise of being able to operate at shorter timescales than those which humans can take actions in. The catch is that you’ll need to be able to build a simulator of the market you’re trading in and try to make this simulator have the same sorts of patterns of data found in the real world, then you’ll need to transfer your learned policy into a real market and hope that you haven’t overfit. This is non-trivial. You’ll also need to develop agents that can model other market participants and factor predictions about their actions into decision-making: another non-trivial problem.
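To make the simulator point concrete, here is a deliberately toy gym-style market environment. All names and dynamics are hypothetical: a real simulator would replay order-book data and model latency, slippage, and other market participants. Even this toy version shows the structure of the problem, where reward is profit on the held position minus fees on every position change:

```python
import random

class ToyCryptoEnv:
    """A minimal, hypothetical market simulator in the gym style
    (reset/step). A sketch only, not a faithful market model."""

    def __init__(self, fee=0.002, drift=0.0005, vol=0.01, steps=100, seed=0):
        self.fee, self.drift, self.vol, self.max_steps = fee, drift, vol, steps
        self.rng = random.Random(seed)

    def reset(self):
        self.price, self.position, self.t = 100.0, 0, 0
        return self.price

    def step(self, action):  # action in {-1: short, 0: flat, +1: long}
        prev_price = self.price
        # Random-walk price with a small drift standing in for 'signal'.
        self.price *= 1 + self.drift + self.rng.gauss(0, self.vol)
        # Fees are charged on every change of position, which is exactly
        # what eats small predicted edges alive.
        trade_cost = self.fee * prev_price * abs(action - self.position)
        reward = self.position * (self.price - prev_price) - trade_cost
        self.position = action
        self.t += 1
        return self.price, reward, self.t >= self.max_steps

env = ToyCryptoEnv()
env.reset()
total, done = 0.0, False
while not done:
    _, r, done = env.step(1)   # naive always-long policy
    total += r
```

Training a policy against such a simulator and then transferring it to a live market is where the overfitting risk Britz describes comes in: the policy is only as good as the simulator's fidelity.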
Read more here: Introduction to Learning to Trade with Reinforcement Learning.
Google researchers: In the battle between evolution and RL, evolution wins (for now):
…It takes a whole datacenter to raise a model…
Last year, Google researchers caused a stir when they showed that you could use reinforcement learning to get computers to learn how to design better versions of image classifiers. At around the same time, other researchers showed you could use strategies based around evolutionary algorithms to do the same thing. But which is better? Google researchers have used their gigantic compute resources as the equivalent of a big telescope and found us the answer, lurking out there at vast compute scales.
The result: Regularized evolutionary approaches (nicknamed ‘AmoebaNet’) yield a new state-of-the-art on CIFAR-10 image classification, parity with RL approaches on ImageNet, and marginally higher performance on mobile (aka lightweight) ImageNet. Evolution “is either better than or equal to RL, with statistical significance,” when tested in “small-scale” (aka single-CPU) experiments, and it increases its accuracy far more rapidly than RL during the initial stages of search. For large-scale experiments (450 GPUs (!!!) per experiment) they found that evolution and RL converge to about the same final accuracy, with evolution approaching higher accuracies at a faster rate. Additionally, evolved models make drastically more efficient use of compute than their RL-designed counterparts and obtain ever-so-slightly higher accuracies.
The method: The researchers test RL and evolutionary approaches on designing a network composed of two fundamental modules: a normal cell and a reduction cell, which are stacked in feed-forward patterns to form an image classifier. They test two variants of evolution: non-regularized (kill the worst-performing network at each time period) and regularized (kill the oldest network in the population). For RL, they use TRPO to learn to design new architectures. They tested their approach on the small scale (experiments that could run on a single CPU) as well as the large scale (450 GPUs each, running for around 7 days).
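The regularized variant is easy to sketch as a loop: sample a few candidates, copy-and-mutate the best one, and always discard the oldest member of the population rather than the worst. The code below is illustrative only; `evaluate`, `mutate`, and `random_arch` stand in for a real architecture search space, here replaced by a toy bit-string problem:

```python
import random
from collections import deque

def aging_evolution(evaluate, mutate, random_arch,
                    pop_size=20, cycles=200, seed=0):
    """Sketch of regularized ('aging') evolution with tournament selection.
    The callables are placeholders for a real search space."""
    rng = random.Random(seed)
    population = deque()                       # leftmost member is oldest
    for _ in range(pop_size):
        arch = random_arch(rng)
        population.append((arch, evaluate(arch)))
    history = list(population)
    for _ in range(cycles):
        sample = rng.sample(list(population), k=min(5, pop_size))
        parent = max(sample, key=lambda p: p[1])   # tournament selection
        child = mutate(parent[0], rng)
        population.append((child, evaluate(child)))
        population.popleft()                   # kill oldest, not worst
        history.append(population[-1])
    return max(history, key=lambda p: p[1])

# Toy search space: an 'architecture' is a bit-string, fitness counts ones.
best = aging_evolution(
    evaluate=lambda a: sum(a),
    mutate=lambda a, rng: [b ^ (rng.random() < 0.1) for b in a],
    random_arch=lambda rng: [rng.randint(0, 1) for _ in range(16)],
)
```

Killing the oldest rather than the worst means every architecture eventually dies, so lineages must keep re-proving themselves via retraining; that constant turnover is the regularization the paper's title refers to.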
What it means: What all this means in practice is threefold:
– Whoever has the biggest computer can perform the largest experiments to illuminate potentially useful datapoints for developing a better theory of AI systems (e.g. the insight here is that both RL and evolutionary approaches converge to similar accuracies).
– AI research is diverging into distinct ‘low compute’ and ‘high compute’ domains, with only a small number of players able to run truly large (~450 GPUs per run) experiments.
– Dual Use: As AI systems become more capable they also become more dangerous. Experiments like this suggest that very large compute operators will be able to explore potentially dangerous use cases earlier, letting them provide warning signals before Moore’s Law means you can do all this stuff on a laptop in a garage somewhere.
Read more: Regularized Evolution for Image Classifier Architecture Search (Arxiv).
Rise of the iDoctor: Researchers predict medical conditions from Apple Watch apps:
…Large-scale study made possible by a consumer app paired with Apple Watch…
Deep learning’s hunger for large amounts of data has so far made it tricky to apply it in medical settings, given the lack of large-scale datasets that are easy for researchers to access and test approaches on. That may soon change as researchers figure out how to use the medical telemetry available from consumer devices to generate datasets orders of magnitude larger than those used previously, and do so in a way that leverages existing widely deployed software.
New research from heart rate app Cardiogram and the Department of Medicine at the University of California at San Francisco uses data from an Apple Watch, paired with the Cardiogram app, to train an AI system called ‘DeepHeart’ with data donated by ~14,000 participants to better predict medical conditions like diabetes, high blood pressure, sleep apnea, and high cholesterol.
How it works: DeepHeart ingests the data via a stack of neural networks (convnets and resnets) which feed data into bidirectional LSTMs that learn to model the longer temporal patterns associated with the sensor data. They also experiment with two forms of pretraining to try to increase the sample efficiency of the system.
Results: DeepHeart obtains significantly higher predictive accuracy than other AI methods such as multi-layer perceptrons, random forests, decision trees, support vector machines, and logistic regression. However, we don’t get to see comparisons with human doctors, so it’s not obvious how these AI techniques rank against widely deployed flesh-and-blood systems. The researchers report that pretraining has let them further improve data efficiency. Next, they hope to explore techniques like Clockwork RNNs, Phased LSTMs, and Gaussian Process RNNs to see how these systems improve when modeling really large amounts of data (like one year of data per tested person).
Why it matters: The rise of smartphones and the associated fall in cost of generic sensors has effectively instrumented the world so that humans and things that touch humans will generate ever larger amounts of somewhat imprecise information. Deep learning has so far proved to be an effective tool for extracting signal from large quantities of imprecise data. Expect more.
Read more: DeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk Prediction (Arxiv).
‘Mo text, ‘mo testing: Researchers release language benchmarking tool Texygen:
…Evaluation and testing platform ships with multiple open source language models…
Researchers with Shanghai Jiao Tong University and University College London have released Texygen, a text benchmarking platform implemented as a library for Tensorflow. Texygen includes a bunch of open source implementations of language models, including vanilla MLE as well as a menagerie of GAN-based methods (SeqGAN, MaliGAN, RankGAN, TextGAN, GSGAN, LeakGAN). Texygen incorporates a variety of different evaluation methods, including BLEU as well as newer techniques like NLL-oracle. The platform also makes it possible to train with synthetic data as well as real data, so researchers can validate approaches without needing to go and grab a giant dataset.
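As an illustration of the kind of metric such platforms standardize, here is a simplified sentence-level BLEU: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. This is a sketch of the metric's logic, not Texygen's exact implementation (production BLEU also handles multiple references and smoothing):

```python
import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU over token lists.
    Returns 0.0 whenever any n-gram order has zero overlap."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hypothesis[i:i + n])
                             for i in range(len(hypothesis) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # 'Modified' precision: clip each n-gram's count by the reference's.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(1, sum(hyp_ngrams.values()))
        if overlap == 0:
            return 0.0
        log_prec_sum += math.log(overlap / total) / max_n
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = math.exp(min(0.0, 1 - len(reference) / len(hypothesis)))
    return bp * math.exp(log_prec_sum)
```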
Why it matters: Language modelling is a booming area within deep learning so having another system to use to test new approaches against will further help researchers calibrate their own contributions against that of the wider field. Better and more widely available baselines make it easier to see true innovations.
Why it might not matter: All of these proposed techniques incorporate less implicit structure than many linguists know language contains, so while they’re likely capable of increasingly impressive feats of word-cognition, it’s likely that either orders of magnitude more data or significantly stronger priors in the models will be required to generate truly convincing facsimiles of language.
Read more: Texygen: A Benchmarking Platform for Text Generation Models (Arxiv).
Scientists map Chinese herbal prescriptions to tongue images:
…Different cultures mean different treatments which mean different AI systems…
Researchers have used standardized image classification techniques to create a system that predicts a Chinese herbal prescription from the image of a tongue. This is mostly interesting because it provides further evidence of the breadth and pace of adoption of AI techniques in China and the clear willingness of people to provide data for such systems.
Dataset: 9585 pictures of tongues from over 50 volunteers and their associated Chinese herbal prescriptions which span 566 distinct kinds of herb.
Read more: Automatic construction of Chinese herbal prescription from tongue image via convolution networks and auxiliary latent therapy topics (Arxiv).
How’s my driving? Researchers create (slightly) generalizable gaze prediction system:
…Figuring out what a driver is looking at has implications for driver safety & attentiveness…
One of the most useful (and potentially dangerous) aspects of modern AI is how easy it is to take an existing dataset, slightly augment it with new domain-specific data, then solve a new task the original dataset wasn’t considered for. That’s the case for new research from the University of California at San Diego, which proposes to better predict the locations that a driver’s gaze is focused on, by using a combination of ImageNet and new data. The resulting gaze-prediction system beats other baselines and vaguely generalizes outside of its training set.
Dataset: To collect the original dataset for the study the researchers mounted two cameras inside and one camera outside a car; the two inside cameras capture the driver’s face from different perspectives and the external one captures the view of the road. They hand-label seven distinct regions that the driver could be gazing at, providing the main training data for the dataset. This dataset is then composed of eleven long drives split across ten subjects driving two different cars, all using the same camera setup.
Technique: The researchers propose a two-stage pipeline, consisting of an input pre-processing pipeline that performs face detection and then further isolates the face through one of four distinct techniques. These images are then fed into the second stage of the network, which consists of one of four different neural network approaches (AlexNet, VGG, ResNet, and SqueezeNet) for fine-tuning.
Results: The researchers test their approach against a state-of-the-art baseline (a random forest classifier with hand-designed features) and find that it attains significantly better performance at figuring out which of seven distinct gaze zones (forward, to the right, to the left, the center dashboard, the rearview mirror, the speedometer, eyes closed/open) the driver is focused on at any one time. They also tried to replicate another state-of-the-art baseline that used neural networks; this system used the first 70% of frames from each drive for training, the next 15% for validation, and the last 15% for testing. In other words, the system would train on the same person, car, and (depending on how much the external terrain varies) broad context as it was subsequently tested on. When replicating this the researchers got “a very high accuracy of 98.7%. When tested on different drivers, the accuracy drops down substantially to 82.5%. This clearly shows that the network is over-fitting the task by learning driver specific features,” they write.
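The 98.7%-versus-82.5% gap is fundamentally a data-splitting issue: splitting each drive by frames leaks driver-specific features into the test set, whereas holding out whole drivers does not. A minimal sketch of such a grouped, driver-wise split (the data layout and field names here are illustrative, not the paper's):

```python
import random

def split_by_driver(frames, test_fraction=0.3, seed=0):
    """Hold out whole drivers, so the test set contains no frames
    from any driver seen in training. 'frames' is a list of
    (driver_id, features, label) tuples."""
    drivers = sorted({d for d, _, _ in frames})
    rng = random.Random(seed)
    rng.shuffle(drivers)
    n_test = max(1, int(len(drivers) * test_fraction))
    test_drivers = set(drivers[:n_test])
    train = [f for f in frames if f[0] not in test_drivers]
    test = [f for f in frames if f[0] in test_drivers]
    return train, test

# Toy data: 10 drivers, 100 frames each.
frames = [(d, [0.0], "forward") for d in range(10) for _ in range(100)]
train, test = split_by_driver(frames)
assert not ({f[0] for f in train} & {f[0] for f in test})  # no driver leaks
```

Any accuracy measured on a frame-wise split answers "can the network recognize this driver's eyes again?", while a driver-wise split answers the question that actually matters for deployment.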
Results that make you go ‘hmmm’: The researchers found that a ‘SqueezeNet’-based network displayed significant transfer and adaptation capabilities, despite receiving very little prior data about the eyes of the person being studied: ‘the activations always localize over the eyes of the driver’, they write, and ‘the network also learns to intelligently focus on either one or both eyes of the driver’. Once trained, this network attains an accuracy of 92.13% at predicting the driver’s gaze zone, a lower score than those set by other systems, but on a dataset that doesn’t let you test on what is essentially your training set. The system is also fast and reasonably lightweight: “Our standalone system which does not require any face detection, performs at an accuracy of 92.13% while performing real time at 166.7 Hz on a GPU,” they write.
Generalization: The researchers tested their trained system on a completely separate dataset: the Columbia Gaze Dataset. This dataset applies to a different domain, where instead of cars, a variety of people are seated and asked to look at specific points on an opposing wall. The researchers took their best-performing model from the prior dataset, applied it to the new data, and tested its predictive ability. They detected some level of generalization: the model was able to correctly predict basic traits of gaze, such as orientation and direction. This (slight) generalization is another sign that their dataset and testing regime encouraged models that generalize.
Read more: Driver Gaze Zone Estimation using Convolutional Neural Networks: A General Framework and Ablative Analysis (Arxiv).
OpenAI Bits & Pieces:
Discovering Types for Entity Disambiguation:
Ever had trouble disentangling the implied object from the word as written? This system helps with that. Check out the paper, code, and blog post (especially the illustrations, which Jonathan Raiman did along with the research, the talented fellow!).
Read more: Discovering Types for Entity Disambiguation (OpenAI).
CNAS Podcast: The future of AI and National Security:
AI research is already having a significant effect on national security, and research breakthroughs are both influencing future directions of government spending and motivating the deployment of certain technologies for offense and defense. To help inform that conversation, the Open Philanthropy Project’s Helen Toner and I recently recorded a short podcast with the Center for a New American Security to talk through some of the issues raised by recent AI advances.
Listen to the podcast here (CNAS / Soundcloud).
Tech Tales:
They took inspiration from a thing humans once called ‘demoscene’. It worked like this: take all of your intelligence and try to use it to make the most beautiful thing you can in an arbitrary and usually very small amount of space. One kilobyte. Two kilobytes. Four. Eight. And so on. But never really a megabyte or even close. Humans used these constraints to focus their creativity, wielding math and tonal intuition and almost alchemy-like knowledge of graphics drivers to make fantastic, improbable visions of never-enacted histories and futures. They did all of this in the computational equivalent of a Diet, Diet, Diet Coke.
Some ideas last. So now the AIs did the same thing but with entire worlds: what’s the most lively thing you can do in the smallest amount of memory-diamond? What can you fit into a single dyson sphere – the energy of one small and stately sun? No black holes. No gravitational accelerators. Not even the chance of hurling asteroids in to generate more reaction mass. This was their sport and with this sport they made pocket universes that contained pocket worlds on which strode small pocket people who themselves had small pocket computers. And every time_period the AIs would gather around and marvel at their own creations, wearing them like jewels. How smart, they would say to one another. How amazing are the thoughts these creatures in these demo worlds have. They even believe in gods and monsters and science itself. And merely with the power of a mere single sun? How did you do that?
It was for this reason that Planck Lengths gave the occasional more introspective and empirical AIs concern. Why did their own universe contain such a bounded resolution, they wondered, spinning particles around galactic-center blackholes to try and cause reactions to generate a greater truth?
And with only these branes? Using only the energy of these universes? How did you do this? a voice sometimes breathed in the stellar background, picked up by dishes that spanned the stars.