Import AI: #66: Better synthetic images herald a Fake News future, Graphcore shows why AI chips are about to get very powerful, simulated rooms for better reinforcement learning

by Jack Clark

Welcome to Import AI, subscribe here.

NVIDIA achieves remarkable quality on synthetic images:
AKA: The Fake Photo News Tidalwave Cometh
…NVIDIA researchers have released ‘Progressive Growing of GANs for Improved Quality, Stability, and Variation’ – a training methodology for creating better synthetic images with generative adversarial networks (GANs).
…GANs frame problems as a battle between a forger and someone whose job is to catch forgers. Specifically, the problem is framed in terms of a generator network and a discriminator network. The generator tries to generate things that fool the discriminator into classifying the generated images as being from a dataset only seen by the discriminator. The results have been pretty amazing, with GANs used to generate everything from images, to audio samples, to code snippets. But GAN training can also be quite unstable, and prior methods have struggled to generate high-resolution images.
…NVIDIA’s contribution here is to add another step into the GAN training process that lets you generate iteratively more complex objects – in this case images. “Our key insight is that we can grow both the generator and discriminator progressively, starting from easier low-resolution images, and add new layers that introduce higher-resolution details as the training progresses,” the researchers write. “We use generator and discriminator networks that are mirror images of each other and always grow in synchrony. All existing layers in both networks remain trainable throughout the training process. When new layers are added to the networks, we fade them in smoothly”.
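One way to picture the "fade them in smoothly" step is as a weighted blend between the upsampled output of the existing low-resolution layers and the output of the newly added higher-resolution layer. What follows is a minimal sketch, not NVIDIA's code: it assumes a linear blend weight `alpha` that ramps from 0 to 1 over training, nearest-neighbour upsampling, and plain nested lists standing in for image tensors. The function names `upsample_2x` and `fade_in` are hypothetical.

```python
def upsample_2x(image):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(low_res_output, new_layer_output, alpha):
    """Blend the upsampled old output with the new layer's output.

    alpha = 0.0 ignores the new layer entirely; alpha = 1.0 uses only
    the new layer. (Assumed linear blend, for illustration.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    upsampled = upsample_2x(low_res_output)
    return [
        [(1.0 - alpha) * old + alpha * new for old, new in zip(old_row, new_row)]
        for old_row, new_row in zip(upsampled, new_layer_output)
    ]


low = [[0.0, 1.0], [1.0, 0.0]]          # 2x2 output from the existing layers
high = [[0.5] * 4 for _ in range(4)]    # 4x4 output from the new layer
blended = fade_in(low, high, alpha=0.25)
```

Early in the fade (small `alpha`) the network behaves almost exactly like the already-trained low-resolution model, which is one plausible reading of why adding layers this way does not destabilize training.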
…The company has used this technique to generate high-resolution synthetic images. To do this, it created a new dataset called CelebA-HQ, which consists of 30,000 images of celebrities at 1024*1024 resolution. Results on this dataset are worth a look – on an initial inspection some of the (cherry picked) examples are indistinguishable from real photographs, and a video showing interpolation across the latent variables indicates the variety and feature-awareness of the progressively trained network.
Efficiency: “With progressive growing, however, the existing low-resolution layers are likely to have already converged early on, so the networks are only tasked with refining the representations by increasingly smaller-scale effects as new layers are introduced. Indeed, we see in Figure 4(b) that the largest-scale statistical similarity curve (16) reaches its optimal value very quickly and remains consistent throughout the rest of the training. The smaller-scale curves (32, 64, 128) level off one by one as the resolution is increased, but the convergence of each curve is equally consistent”
Costs: GAN training is still incredibly expensive; in the case of the high quality celebrity images, NVIDIA trained it on a Tesla P100 GPU for 20 days. That’s a lot of time and electricity to spend on a single training process (and punishingly expensive if done via a cloud service).
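To put the 20-day figure in rough perspective, here is a back-of-the-envelope calculation. The hourly price is a placeholder assumption for illustration, not a quoted rate from any cloud provider; swap in a real price to get a real estimate.

```python
def training_cost(gpu_count, days, price_per_gpu_hour):
    """Return (GPU-hours, total cost) for a single training run."""
    gpu_hours = gpu_count * days * 24
    return gpu_hours, gpu_hours * price_per_gpu_hour


# ASSUMPTION: hypothetical on-demand price in USD per GPU-hour.
HYPOTHETICAL_PRICE_PER_GPU_HOUR = 2.0

gpu_hours, cost = training_cost(
    gpu_count=1, days=20, price_per_gpu_hour=HYPOTHETICAL_PRICE_PER_GPU_HOUR
)
print(f"{gpu_hours} GPU-hours, about ${cost:,.0f} at the assumed rate")
```

That covers only one run; any hyperparameter search or failed attempt multiplies the bill.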
Fake News: GANs are going to be used for a variety of things, but it’s certain they’ll be used by bad actors to generate fake images for use in fake news. Today it’s relatively difficult to, say, generate a portrait of a notable world leader in a compromising circumstance, or to easily and cheaply create images of various buildings associated with governments/NGOs/big companies on fire or undergoing some kind of disaster. Given sufficiently large datasets, new training techniques like the above for generating higher-resolution images, and potentially unsupervised learning approaches like CycleGAN, it’s possible that a new era of visceral – literally! – fake news will be upon us.
Read more here: Progressive Growing of GANs for Improved Quality, Stability, and Variation (PDF).
Get the code here.

Hinton reveals another slice of his long-in-development ‘Capsules’ theory:
…Geoff Hinton, one of the more important figures in the history of deep learning, has spent the past few years obsessed with a simple idea: that the way today’s convolutional neural networks work is a bit silly and unprincipled and needs to be rethought. So he has developed a theory based around the use of essential components he calls ‘capsules’. Now, he and his collaborators at Google Brain in Toronto have published a paper outlining some of these new ideas.
…The paper describes a way to string together capsules – small groups of neurons that are combined together to represent lots of details of an object, like an image – so that they can perform data-efficient classification tasks.
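In the paper, the length of a capsule's output vector stands for the probability that the entity it represents is present, and a "squashing" nonlinearity keeps that length between 0 and 1 while preserving the vector's direction. Below is a minimal sketch of that one function in plain Python. It is not the authors' implementation; the small `eps` term is an added assumption to avoid dividing by zero.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1), keeping its direction.

    Implements v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    eps is an illustrative safeguard for the zero vector.
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


short = squash([0.1, 0.0])   # short vectors shrink towards zero length
long = squash([3.0, 4.0])    # long vectors approach (but never reach) length 1
```

Short inputs come out with length near zero ("probably not present") and long inputs come out with length just under one ("probably present").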
Results: They test the capsules idea on MNIST, an oldschool digit classification AI task that some researchers think is a little too simple for today’s techniques. The capsules approach gets an error rate of about 0.25%, roughly comparable to the test errors of far deeper, larger networks.
…They also tested on CIFAR10, ultimately achieving a 10.6% error using an ensemble of 7 models. This error “is about what standard convolutional nets achieved when they were first applied to CIFAR10” in 2013, they note.
…“Research on capsules is now at a similar stage to research on recurrent neural networks for speech recognition at the beginning of this century. There are fundamental representational reasons for believing that it is a better approach but it probably requires a lot more small insights before it can out-perform a highly developed technology,” they write.
…You can find out more about Hinton’s capsules theory in this speech here.
…Read the paper here: Dynamic Routing Between Capsules.

Free pretrained models for image recognition, text classification, object detection, and more:
…There’s an increasing trend in AI to release pretrained models. That’s a good thing for independent or compute-starved developers who may not have access to the vast fields of computers required to train modern deep learning systems.
… is a website that pulls together a bunch of pretrained models (including OpenAI’s unsupervised sentiment neuron classifier), along with reference information like writeups and blogs.
…Go get your models here at

Using evolution to deal with AI’s memory problem:
…Deep learning-based memory systems are currently hard to train and of dubious utility. But many AI researchers believe that developing some kind of external memory that these systems can write to and read from will be crucial to developing more powerful artificial intelligences.
…One candidate memory system is the Neural Turing Machine, which was introduced by DeepMind in a research paper in 2014. NTMs can let networks – in certain, quite limited tasks – perform tasks with less training and higher accuracy than other systems. Successors, like the Neural GPU, extended the capabilities of the NTM to work on harder tasks, like being able to multiply numbers. Then it was further extended with the Evolvable Neural Turing Machine (PDF).
…Now, researchers with IT University of Copenhagen in Denmark have proposed the HyperENTM, which uses evolution techniques to figure out how to wire together the memory interface. Here “an evolved neural network generates the weights of a main model, including how it connects to the external memory component. Because HyperNEAT can learn the geometry of how the network should be connected to the external memory, it is possible to train a compositional pattern producing network on a small bit vector sizes and then scale to larger bit vector sizes without further training,” they write. The approach makes it possible to “train solutions to the copy task that perfectly scale with the size of the bit vectors”.
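For readers unfamiliar with the copy task: the model is shown a sequence of random bit vectors and must reproduce it afterwards. The sketch below only generates task instances at several bit-vector sizes and scores a candidate output. It does not implement HyperENTM or any memory model, and the function names are hypothetical.

```python
import random


def make_copy_task(seq_len, vector_size, rng):
    """Generate a sequence of random bit vectors to be copied."""
    return [[rng.randint(0, 1) for _ in range(vector_size)] for _ in range(seq_len)]


def copy_accuracy(target, output):
    """Fraction of bits in output that match target."""
    total = sum(len(vec) for vec in target)
    correct = sum(
        t == o
        for target_vec, output_vec in zip(target, output)
        for t, o in zip(target_vec, output_vec)
    )
    return correct / total


rng = random.Random(0)
for size in (4, 8, 16):  # the same solution is evaluated at growing sizes
    sequence = make_copy_task(seq_len=5, vector_size=size, rng=rng)
    perfect = [list(vec) for vec in sequence]          # an ideal copier
    inverted = [[1 - bit for bit in vec] for vec in sequence]  # worst case
    print(size, copy_accuracy(sequence, perfect), copy_accuracy(sequence, inverted))
```

"Perfectly scale" in the quote would correspond to the accuracy staying at 1.0 as `vector_size` grows beyond what was used in training.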
…Mind you, the tasks this is being evaluated on are still rather basic, so it’s not yet obvious what the scaled up and/or real world utility is of systems like this.
…Read more here: HyperENTM: Evolving Scalable Neural Turing Machines through HyperNEAT.

It still takes a heck of a lot of engineering to get AI to automatically learn anything:
…Here’s a fun writeup by mobile software analytics AI startup Gyroscope about how they trained an agent via reinforcement learning to excel at the classic arcade game Street Fighter.
…The technology they use isn’t novel but the post does go into more detail than usual about the immense amount of engineering work required to train an AI system to do anything particularly interesting. In this case, they needed to use an emulator tool called BizHawk to help them interface with and program the SNES games and had to shape their observation and reward space to a high degree. None of this is particularly unusual, but it’s worth remembering how much work is required to do interesting things in AI.
…Read more here: How We Built an AI to Play Street Fighter II – Can you beat it?

Prepare yourself for the coming boom in AI chip capabilities:
AI chip startup Graphcore has published some preliminary results showing the kinds of performance gains its in-development IPU (Intelligence Processing Unit) chips can deliver. Obviously this is data for a pre-release product so take these with a pinch of salt, but if they’re representative of the types of boosts new AI accelerators will give us then we’re in for a wild ride.
…Performance: The in-development chips have a TDP of about 300W, roughly comparable to top-of-the-line GPUs from NVIDIA (Volta) and AMD (Vega), and are able to process thousands of images per second when training a top-of-the-range ResNet-50 architecture, compared to around 600 for other 300W cards, Graphcore says. IPUs can also be batched together to further speed up training. In other experiments, the chips perform inference hundreds of times faster than GPU competitors like the P100.
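To get a feel for what throughput numbers like these mean in practice, the sketch below converts images per second into time per pass over a dataset. Two things here are assumptions and not from Graphcore's post: the dataset is taken to be the ImageNet-1k training set (1,281,167 images), and "thousands of images per second" is pinned to a placeholder 2,000.

```python
# ASSUMPTION: benchmark dataset is the ImageNet-1k training set.
IMAGENET_TRAIN_IMAGES = 1_281_167


def epoch_minutes(images_per_second, dataset_size=IMAGENET_TRAIN_IMAGES):
    """Minutes needed for one pass over the dataset at a given throughput."""
    return dataset_size / images_per_second / 60.0


GPU_RATE = 600.0           # figure quoted for other 300W cards
ASSUMED_IPU_RATE = 2000.0  # ASSUMPTION: placeholder for "thousands" per second

for label, rate in (("300W GPU", GPU_RATE), ("IPU (assumed)", ASSUMED_IPU_RATE)):
    print(f"{label}: {epoch_minutes(rate):.1f} minutes per epoch")

speedup = ASSUMED_IPU_RATE / GPU_RATE
```

The ratio of the two throughputs is the whole story here; a real comparison would also need to hold batch size, precision, and accuracy targets fixed.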
…We’ll still have to wait a few months to get more details from third-parties like customers but if this is any indication, AI development is likely to accelerate further as a consequence of access to more chips with good performance properties.
…Read more here: Preliminary IPU Benchmarks – Providing Previously Unseen Performance for a Range of Machine Learning Applications.

Not everything needs to be deep learning: Vicarious publishes details on its CAPTCHA-busting recursive cortical networks (RCN) approach:
…Vicarious is a startup dedicated to building artificial general intelligence. It’s also one that has a somewhat different research agenda to other major AI labs like DeepMind/OpenAI/FAIR. That’s because Vicarious has centered much of its work around systems heavily inspired by the (poorly understood) machinery of the human brain, eschewing deep learning methods for approaches that are somewhat more rigorously based on our own grey matter. (This differs subtly from DeepMind, which also has strong expertise in neuroscience but has so far mostly been focused on applying insights from cognitive and computational neuroscience to new neural network architectures and evaluation techniques).
…The paper, published in Science, outlines RCNs and describes how they’re able to learn to solve oldschool text-based CAPTCHAs. Why AI researchers may care about this is that Vicarious’s approach is not only able to generalize somewhat better than (shoddy) convolutional neural network baselines, but does so with tremendous data efficiency; the company is able to solve (outmoded) text-based CAPTCHAs using only a few thousand data samples, compared to hundreds of thousands for convolutional neural network-based baselines.
…Read more about the research here: Common Sense, Cortex, and CAPTCHA.
…Vicarious has also published code for an RCN implementation here.

The future of AI is a simulated robot puttering around a simulated house, forever:
Anonymous researchers (it’s an ICLR 2018 submission) have created a new 3D environment/dataset that could come in handy for reinforcement learning and perception AI researchers, among others.
The House3D dataset consists of 45,622 human-designed scenes of houses, with an average of 8.9 rooms and 1.3 floors per scene (the max is a palatial 155 rooms and 3 floors!). The dataset contains over 20 room types from bedrooms to kitchens, with around 80 object categories. At every timestep an agent can access labels for the RGB values of its current first-person view, semantic segmentation masks, and depth information. All rooms and objects are labeled and accompanied by 3D bounding boxes.
…The researchers also wrote an OpenGL renderer for scenes derived from the dataset, which can render 120*90 RGB frames at over 600fps when running on a single NVIDIA Tesla M40 GPU.
Room-navigating agents: The researchers propose a new benchmark to use to assess agent performance on this dataset: Multi-Target Room Navigation, in which agents are instructed to go to certain specific rooms. They build two baseline agent models to test on the environment, including a gated-LSTM policy that uses A3C, and a gated-CNN that uses DDPG. These agents attain success rates as high as about 45% when using RGB data only and 54% when using Mask+Depth data on a small dataset of 20 different houses. When generalizing to the test set (houses not seen during training) the agents’ scores range from 22% (RGB) to as high as ~30% (Mask+Depth). Things are a little better with the larger dataset, with agents here getting scores of 26% (RGB+Depth) to 40% (Mask+Depth) on training, and 25.7% (RGB+Depth) to 35% (Mask+Depth) on the test set.
…The importance of data: “We notice that a larger training set leads to higher generalization ability,” they write.
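The generalization claim can be read straight off the success rates quoted above by subtracting test from train for each configuration. The numbers in the sketch are the ones reported in this write-up (with "~30%" taken as 30.0); the dictionary layout and names are just an illustrative choice.

```python
# (train %, test %) success rates as quoted above.
results = {
    ("small", "RGB"): (45.0, 22.0),
    ("small", "Mask+Depth"): (54.0, 30.0),   # test reported as "~30%"
    ("large", "RGB+Depth"): (26.0, 25.7),
    ("large", "Mask+Depth"): (40.0, 35.0),
}

# Train-minus-test gap in percentage points, rounded to one decimal place.
gaps = {
    key: round(train - test, 1) for key, (train, test) in results.items()
}

for (dataset, inputs), gap in gaps.items():
    print(f"{dataset:>5} set, {inputs:<10}: gap of {gap} points")
```

The gaps on the small set are above 20 points, while on the larger set they fall to 5 points or less, which is the pattern the researchers describe. Note the input modalities differ slightly between the two settings, so the comparison is indicative only.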
…Read more here: Building Generalizable Agents With A Realistic and Rich 3D Environment.

OpenAI Bits&Pieces:

Meta Learning Shared Hierarchies:
New research from OpenAI intern and current high school student (!) Kevin Frans and others outlines an algorithm that can break up big problems into little constituent parts. The MLSH algorithm is able to efficiently learn to navigate mazes by switching between various sub-components, while traditional methods would typically struggle due to the large number of timesteps required to learn to solve the environment.
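The "switching between sub-components" idea can be sketched as a master policy that picks which sub-policy is in control, re-deciding only every few timesteps. This is a toy illustration of that control flow with no learning in it; the function `run_hierarchy`, the hand-written policies, and the fixed `switch_every` interval are all assumptions for the example.

```python
def run_hierarchy(master, sub_policies, observations, switch_every):
    """Roll out a two-level policy over a list of observations.

    The master chooses a sub-policy index every `switch_every` steps;
    in between, the chosen sub-policy picks every action.
    """
    actions = []
    active = None
    for t, obs in enumerate(observations):
        if t % switch_every == 0:
            active = master(obs)
        actions.append(sub_policies[active](obs))
    return actions


# Hypothetical hand-written policies standing in for learned ones.
sub_policies = [lambda obs: "left", lambda obs: "right"]
master = lambda obs: 0 if obs < 0 else 1

actions = run_hierarchy(
    master, sub_policies, observations=[-1, 5, 5, 2, -3, -3], switch_every=3
)
print(actions)
```

Because the master only acts once every `switch_every` steps, it faces a much shorter decision sequence than a flat policy would, which is one intuition for why this kind of hierarchy can help on long tasks.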
…Read the paper here: Meta Learning Shared Hierarchies.
…For more information about Kevin, you can check out this Wired profile of him and his work.

Tech Tales:

[2032: A ‘robot kindergarten’ in a university testing facility on the West Coast of America]

OK, now hide! says the teacher.

The five robots running day-old software dutifully whizz to the corners of the room, while a sixth hangs back.

The teacher instructs the sixth robot – the seeker – to go and seek the other five robots. It gets a reward proportional to the speed with which it finds them. The other robots get rewards according to how long they’re able to remain hidden.

In this way the little proto-minds running in the robots learn to hide. The five hiding bots share their perceptions with one another, so when one robot is found the remaining four adjust their locations to frustrate the seeker. The seeker gains advantages as well, though – able to convert the robots it finds into its own seeker appendages.

After a few minutes there are now five seeker robots and one hiding robot. Once it is found the experiment starts again – and this time the robot that managed to hide the longest becomes the seeker for the next run of the game.

In this way the robots learn iteratively better techniques for hiding, deception, and multi-robot control.

Like many things, what starts out as a game is the seed for something much more significant. The next game they play after hide and seek is a chasing game and then after that a complicated one requiring collaborative tool-use for the construction of an EMP shelter against a bomb which – their teacher tells them – will go off in a few days and wipe their minds clean.

The robots do not know if this is true. Nor do they know if they have been running these games before. They do not truthfully know how old their own software is. Nor whether they are the first residents of the kindergarten, or standard hardware vessels that have played host to many other minds as well.

Technologies that inspired this story: Iterative Self-Play, Generative Adversarial Networks, Transfer Learning.