Import AI: Issue 19: OpenAI reveals its Universe, DeepMind figures out catastrophic forgetting, and beware the ‘Sabbath Mode’

by Jack Clark

ALERT! PROTOCOL BREAK FOR OPENAI ANNOUNCEMENT: We’ve just launched Universe, a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications. We’re hoping that this dataset, benchmark, and infrastructure can push forward RL research in the same way that other great datasets (some featured below) have accelerated other parts of AI…

… the fact we spent so much time building this suggests to me that AI’s new strategic battleground is about environments and computation, rather than static datasets… the received wisdom is that data is the strategically-crucial fuel for artificial intelligence development. That used to be true when much of the research community was focused on training classifiers to map A to B, and so on. But things have changed. We’re moving into an era where we’re training agents that can take actions in dynamic environments. That means the new key component has become the ability for any one AI research entity to access and create a large number of rich environments in which to train its RL agents. I think that realization provoked Facebook to develop and release TorchCraft to ease the development of agents that are trained on StarCraft, and to develop its language-based learning platform CommAI-env; motivated DeepMind to partner with Blizzard to turn StarCraft II into an AI development platform, and to develop and now plan to release code (hooray!) for its DeepMind Lab RL environment (AKA – the world simulator formerly known as Labyrinth); and led Microsoft to turn Minecraft into the ‘Project Malmo’ AI development framework.
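What these platforms standardize is the basic observation/action/reward loop an RL agent runs against an environment. Here’s a minimal sketch of that loop in plain Python – the toy environment, its names, and the random agent are all invented for illustration; real Universe environments stream pixels and rewards from actual programs:

```python
import random

class ToyEnv:
    """A stand-in for a Gym/Universe-style environment: the agent must
    guess a hidden integer; reward is higher the closer the guess is."""
    def __init__(self, low=0, high=10):
        self.low, self.high = low, high
        self.target = None

    def reset(self):
        self.target = random.randint(self.low, self.high)
        return 0  # initial observation

    def step(self, action):
        reward = -abs(action - self.target)   # 0 when the guess is exact
        done = action == self.target
        observation = action                  # echo the action back
        return observation, reward, done, {}  # obs, reward, done, info

env = ToyEnv()
obs = env.reset()
total_reward = 0
for _ in range(100):
    action = random.randint(0, 10)  # a purely random agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
```

The point of platforms like Universe is that once agents speak this interface, any game or application wrapped in it becomes training fuel – which is why environments, not static datasets, become the strategic asset.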

How to jumpstart an AI industry? One gigantic supercomputer… or so believes Japan, which is now taking bids for a 130-petaflop supercomputer (versus the world’s current top one, China’s 93-petaflop Sunway TaihuLight) slated to be completed in late-2017. Japan has some of the world’s greatest robot researchers and companies (like Fanuc, or Google’s SCHAFT) but has lagged behind in software (for instance, the most popular AI frameworks, like Caffe, Theano, and TensorFlow, come from the US or Canada; the main Japanese one, ‘Chainer’, isn’t used that widely). The rig is called ABCI, short for AI Bridging Cloud Infrastructure, and its goal is to “rapidly accelerate the deployment of AI into real businesses and society” (PDF).

Catastrophic forgetting? Forget about it! New research paper from DeepMind, “Overcoming catastrophic forgetting in neural networks” claims to deal with the ‘catastrophic forgetting’ problem in neural networks, making it easier for a single network to be trained to excel at multiple tasks. Techniques like this will be key to developing more advanced, flexible AI systems. “Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective,” they write (PDF).
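The core idea, slowing down learning on weights that matter for old tasks, amounts to a quadratic anchoring penalty on the new task’s loss. Here’s a toy rendering of that shape in plain Python; the function names are illustrative, and the importance vector stands in for the per-weight Fisher information estimate the paper derives:

```python
def ewc_penalty(theta, theta_a, importance, lam=1.0):
    """Sum_i (lam/2) * F_i * (theta_i - theta_a_i)^2

    theta:      current weights while learning task B
    theta_a:    weights at the end of learning task A
    importance: per-weight importance (Fisher-style) estimated on task A
    lam:        how strongly old tasks are protected
    """
    return sum(0.5 * lam * f * (t - ta) ** 2
               for t, ta, f in zip(theta, theta_a, importance))

def total_loss(task_b_loss, theta, theta_a, importance, lam=1.0):
    # High-importance weights are strongly anchored to their task-A
    # values; low-importance weights are free to move to fit task B.
    return task_b_loss + ewc_penalty(theta, theta_a, importance, lam)
```

So moving a weight the old task cared about is expensive, while unimportant weights adapt freely – which is how one network can keep excelling at multiple tasks.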

AI-generated imagery is sitting on a heck of a Moore’s Law-style curve: That’s my takeaway from ‘Plug & Play Generative Networks’, research that brings us a step closer to generating realistic, high-resolution images using AI. The rate at which the aesthetic quality has advanced here is truly immense. To get an idea of how far we’ve come, compare images generated from captions by PPGN (Figure 3) with those generated by groundbreaking research from a year ago…

… and we can expect even better things in the future, thanks to a new Visual Question Answering dataset (PDF). The new set roughly doubles the size of the previous VQA release by adding an additional image (and answer) to each question. Where before you had “Question: is the umbrella upside down? Image: an upside-down umbrella, caption ‘yes’”, you now have “Question: is the umbrella upside down? Image: an upside-down umbrella, caption ‘yes’, Image2: an umbrella in normal position, caption ‘no’.” This will let researchers create better categorization systems that get less confused, and could also lead to better synthetic image generation via a richer internal representation of what is being described.

Have you heard the news / I’m reading today / I’m going to slurp all the data / Maluubaaa, Maluuubaaa! Canadian AI startup Maluuba has released a new free dataset that contains 100,000 question-and-answer pairs built out of CNN articles from DeepMind’s mammoth Q&A dataset. Check out NewsQA and start generating an alternative news narrative to that of reality (please!).

Rise of the AI hedge fund: Two Sigma has spun up a competition on Kaggle. It’s giving people a bunch of data containing “anonymized features pertaining to a time-varying value for a financial instrument”. The idea is to tap into the global intelligence of the Kaggle community to come up with new algorithms and inferences that make better predictions from data. There’s $100,000 in prize money up for grabs as well. The approach is similar to that taken by Numerai which turns to the crowd to garner predictions about the movements of strange, anonymized numbers. The key difference? Numerai pays people according to the success of their predictions, whereas Two Sigma is only coughing up a hundred thousand dollars (what do we call this – a megabuck?). Hopefully the group-based stock market inference activity will protect any individuals involved from becoming obsessed with the eldritch rhythms of the stock market, causing them to lose their minds – as depicted in Aronofsky’s ‘before he was famous’ flick ‘Pi’.

The industrialization of machine learning: machine learning has moved from being a science into a profession, says Amazon/USheffield’s Neil Lawrence. That means people are combing through research papers and code to create repeatable, reusable blocks of AI-driven computation, which are then applied by engineers who are more like construction-people than architects. So, what should AI scientists do to further push the field forward? Lawrence’s proposal is that they try to pair more mathematically-rich tools (kernel methods and Gaussian processes) with the inscrutable-yet-powerful neural networks that are currently in vogue. “As The Hitchhiker’s Guide to the Galaxy states, ‘Don’t Panic’,” he writes. “By bringing our mathematical tools to bear on the new wave of deep learning methods we can ensure that they remain ‘mostly harmless’.”

Term of the week… the truly delightful ‘Sabbath Mode’, which is basically a selective lobotomy for the complex parts of electronics to be activated on the Shabbat and Jewish holidays. I now imagine a Christian fridge whose ‘sabbath mode’ prevents the owner from consuming frightfully sinful shellfish.

The Amazon AI Kraken Waketh… Amazon’s strategy for tackling a new market is similar to the methods employed by the mythological nightmare-of-the-sea, The Kraken. It lurks out of sight while rivals like Google and Microsoft attempt to be first-to-market, then it suddenly emerges from the depths of Seattle, with each of its numerous appendages flailing with new products. That’s roughly what happened at its re:invent conference this week, when Big Yellow Kraken revealed a swathe of AI products, including…

…Reconfigurable, FPGA-containing computers… Amazon’s answer to the slowdown in Moore’s Law lies in ‘F1’ servers that pair typical processors with FPGAs (for weird&gnarly stuff: custom accelerators, network offload, and so on)…

Image recognition… a new image recognition service called “Rekognition” (does Bezos have a grudge against sub-editors?) will compete with existing ones from Google, IBM, Microsoft, and many others.

Text-to-speech-as-a-service… I was talking to someone involved in self-driving cars recently and I made some glib comment about how you could use neural networks to train a traffic light detector to help you deal with intersections. “Ah,” they said, “but can you train it to deal with all possible configurations of traffic lights in the world? Can you deal with sets of 6 traffic lights side by side, hoisted at odd angles above the road, because the town planner went rogue due to a new pedestrian bridge? And how do you know which one of those 6 is yours? Especially if there’s unique signage? And…” at this point I, suitably chastened, realized the error of my question. Amazon has had to deal with similar challenges with Polly, a text-to-speech cloud service that supports 47 different voices and 24 languages, with the voices knowing the difference between pronouncing “I live in Seattle” and “Live from New York”. Yet another example of ‘industrial deep learning’ where the underlying tech is fairly standard but the commercial implementation involves getting a lot of finicky details exactly right.

Citation Not Needed Anymore! Jurgen Schmidhuber gets his media article – the bloke who pioneered the LSTM (a key component in the current enthusiasm for all things memory&AI) has finally got his NYT profile. Congrats Jurgen! (pronounced, as he has told me multiple times, “you-again shmit-hugh-bur”.) Still waiting to see papers emanate from his secretive startup NNAISENSE, though.

OpenAI bits&pieces:

Many of the research team are at NIPS in Barcelona this week giving tutorials, lectures, and such. A full schedule is available here.

OpenAI and Microsoft sponsored events at Women in Machine Learning at NIPS as well. It’s an honor to support a scientific community focused on supporting and increasing diversity in AI.

Government AI: Last week our co-founder, Greg Brockman, was a witness at the Senate’s hearing on “The Dawn of AI”. You can watch the testimony and read our written submission here.


[Note: thrilled that this edition’s short story comes from a reader, Jack Galler. Thanks for writing in, Jack – great name!]

[2020: a woman walking through the city, listening to music.]

The playlist ends, and the woman gives it a positive rating. Her phone prompts: “Would you like another context playlist?” The woman confirms.

She raises her phone and takes two pictures – one from the rear camera, showing the street, and the other from the front camera of her. The photos get deposited into the phone’s internal representation of the ‘mood’ of the moment, along with the woman’s heart rate from her smartwatch, and a tweet she posted earlier about her lunch. It even knows that it’s raining.

The phone’s AI fuses these together and creates a new internal representation of the mood, then uses GAN techniques to generate a new song. A soothing Spanish guitar solo thrums out of the phone to match the light drumming of the rain.

At the end of the song, she’s prompted to rate it. She gives it a thumbs up – she can’t remember the last time she gave it a thumbs down. She will never listen to that soothing acoustic guitar solo again. She could save it, but the context of the song will never be the same – she will never feel the exact same as the moment the song was created, nor will the city she was walking through be what it was in that moment.