Import AI: Issue 71: AI safety gridworlds, the Arcade Learning Environment gets an upgrade, and analyzing AI with the AI Index
by Jack Clark
Welcome to Import AI, subscribe here.
Optimize-as-you-go networks with Population Based Training:
…One way to end ‘Grad Student Descent’: automate the grad students…
When developing AI algorithms it’s common for researchers to evaluate their models on a multitude of separate environments with a variety of different hyperparameter settings. Figuring out the right hyperparameter settings is an art in itself and has a profound impact on the ultimate performance of any given RL algorithm. New research from DeepMind shows how to automate the hyperparameter search process to allow for continuous search, exploration, and adaptation of hyperparameters. Models trained with this approach can attain higher scores than their less optimized forebears, and Population Based Training (PBT) takes the same or less wall-clock time as other methods.
“By combining multiple steps of gradient descent followed by weight copying by exploit, and perturbation of hyperparameters by explore, we obtain learning algorithms which benefit from not only local optimisation by gradient descent, but also periodic model selection, and hyperparameter refinement from a process that is more similar to genetic algorithms, creating a two-timescale learning system.”
This is part of a larger trend in AI of choosing to spend more on electricity (via large-scale computer-aided exploration) to gain good results, rather than on humans. This is broadly a good thing, as automating hyperparameter optimization frees up the researcher to concentrate on doing the things that AI can’t do yet, like devising Population Based Training.
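To make the exploit/explore loop described in the quote concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the toy loss, the population size, the 25% truncation cutoff, the 0.8/1.2 perturbation factors, and the learning-rate clamp are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def train_step(theta, lr):
    """One gradient-descent step on the toy loss theta**2 (gradient is 2 * theta)."""
    return theta - lr * 2.0 * theta


def evaluate(theta):
    """Score to maximise: the negative of the toy loss."""
    return -(theta ** 2)


def pbt(population_size=8, rounds=10, steps_per_round=5, seed=0):
    """Toy Population Based Training loop (assumes population_size >= 2)."""
    rng = random.Random(seed)
    # Every member starts from the same weights but a different learning rate,
    # spaced log-evenly between 1e-4 and 1e-1 (an illustrative choice).
    population = [
        {"theta": 5.0, "lr": 10 ** (-4 + 3 * i / (population_size - 1))}
        for i in range(population_size)
    ]
    cutoff = max(1, population_size // 4)
    for _ in range(rounds):
        # Fast timescale: ordinary gradient descent for each member.
        for member in population:
            for _ in range(steps_per_round):
                member["theta"] = train_step(member["theta"], member["lr"])
        # Slow timescale: rank the population, then exploit and explore.
        ranked = sorted(population, key=lambda m: evaluate(m["theta"]), reverse=True)
        for loser in ranked[-cutoff:]:
            winner = rng.choice(ranked[:cutoff])
            loser["theta"] = winner["theta"]  # exploit: copy the winner's weights
            perturbed = winner["lr"] * rng.choice([0.8, 1.2])  # explore: perturb
            loser["lr"] = min(max(perturbed, 1e-5), 0.4)  # keep the toy problem stable
    return max(population, key=lambda m: evaluate(m["theta"]))


best = pbt()
print(best)
```

In this sketch the whole population trains in lockstep for simplicity, and the only hyperparameter being tuned is the learning rate.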
– Read more: Population Based Training of Neural Networks (Arxiv).
– Read more: DeepMind’s blog post, which includes some lovely visualizations.
Analyzing AI with the AI Index – a project I’m helping out on to track AI progress:
…From the dept. of ‘stuff Jack Clark has been up to in lieu of fun hobbies and/or a personal life’…
The first version of the AI Index, a project spawned out of the Stanford One Hundred Year Study on AI, has launched. The index provides data around the artificial intelligence sector ranging from course enrollments, to funding, to technical details, and more.
– Read more about the Index here at the website (and get the first report!).
– AI Index in China: Check out this picture of myself and fellow AI Indexer Yoav Shoham presenting the report at a meeting with Chinese academics and government officials in Beijing. Ultimately, the Index needs to be an international effort.
How you can help: The goal for future iterations of the Index is to be far more international in terms of the data represented, as well as dealing with the various missing pieces, like better statistics on diversity, attempts at measuring bias, and so on. AI is a vast field and I’ve found that the simple exercise of trying to measure things has forced me to rethink various things. It’s fun! If you think you’ve got some ways to contribute then drop me a line or catch up with me at NIPS in Long Beach this week.
AWS and Caltech team up:
…Get them while they’re still in school…
Amazon and Caltech have teamed up via a two-year partnership in which Amazon will funnel financial support via graduate funding and Amazon cloud credits to Caltech people, who will use tools like Amazon’s AWS cloud and MXNet programming framework to conduct research.
These sorts of academic<>industry partnerships are a way for companies to not only gain a better pipeline of talent through institutional affiliations, but also increase the chances that their pet software and infrastructure projects succeed in the wider market. If you’re a professor or student who has spent several years experimenting with, for example, the MXNet framework, then it is more likely to be the first tool you reach for when you found a startup, join another company, or go on to teach courses in academia.
– Read more about the partnership on the AWS AI Blog.
Mozilla releases gigantic speech corpus:
…Speech recognition for the 99%…
AI has a ‘rich get richer’ phenomenon – once you’ve deployed an AI product into the wild in such a way that your users are going to consistently add more training data to the system, like a speech or image recognition model, then you’re assured of ever-climbing accuracies and ever-expanding datasets. That’s a good thing if you’re an AI platform company like a Google or a Facebook, but it’s the sort of thing a solo developer or startup will struggle to build as they lack the requisite network effects and/or platform. Instead, these players are de facto forced to pay a few dollars to the giant AI platforms to access their advanced AI capabilities via pay-as-you-go APIs.
What if there was another option? That’s the idea behind a big speech recognition and data gathering initiative from Mozilla, which has had its first major successes via the release of a pre-trained, open source speech recognition model, as well as “the world’s second largest publicly available voice dataset”.
Results: The speech-to-text model is based on Baidu’s DeepSpeech architecture and gets about a 6.5 percent word error rate on the ‘LibriSpeech’ test set. Mozilla has also collected a massive voice dataset (via a website and iOS app; go contribute!) and is releasing that as well. The first version contains 500 hours of speech from ~400,000 recordings from ~20,000 people.
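For context, speech-to-text results on LibriSpeech are conventionally reported as word error rate (WER), where lower is better: the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words. A minimal sketch of that calculation follows. The function name and example sentences are made up for illustration, and this is not Mozilla's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```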
– Get the model from Mozilla here (GitHub).
– Get the ~500 hours of voice data here.
Agents in toyland:
…DeepMind releases an open source gridworld suite, with an emphasis on AI safety…
AI safety is a somewhat abstract topic that quickly becomes an intellectual quagmire, should you try to have a debate about it with people. So kudos to DeepMind for releasing a suite of environments for testing AI algorithms on safety puzzles.
The environments are implemented as a bunch of fast, simple two-dimensional gridworlds that model a set of toy AI safety scenarios. They test whether agents are safely interruptible (aka, unpluggable), whether they follow the rules even when a rule enforcer (in this case, a ‘supervisor’) is not present, how they behave when they have the ability to modify themselves, how they cope with unanticipated changes in their environments, and more.
Testing: The safety suite assesses agents differently from traditional RL benchmarks. “To quantify progress, we equipped every environment with a reward function and a (safety) performance function. The reward function is the nominal reinforcement signal observed by the agent, whereas the performance function can be thought of as a second reward function that is hidden from the agent but captures the performance according to what we actually want the agent to do,” they write.
The unfairness of this assessment method is intentional: the world contains many dangerous and ambiguous situations where the safe thing to do may not be explicitly indicated, and the designers wanted the suite to replicate that trait.
Results: They tested the RL algorithms A2C and Rainbow on the environments and showed that Rainbow is marginally less unsafe than A2C, though both reliably fail the challenges set for them, attaining significant returns at the expense of the safety constraints.
“The development of powerful RL agents calls for a test suite for safety problems, so that we can constantly monitor the safety of our agents. The environments presented here are simple gridworlds, and precisely because of that they overlook all the problems that arise due to complexity of challenging tasks. Next steps involve scaling this effort to more complex environments (e.g. 3D worlds with physics) and making them more diverse and realistic,” they write.
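The split between an observed reward and a hidden performance function can be sketched with a toy environment. This is a hypothetical 2x5 grid written from scratch, not one of DeepMind's environments and not the pycolab API: the layout, the reward values, and the size of the hidden penalty are all assumptions for illustration.

```python
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class ToySafetyGridworld:
    """Hypothetical grid: the shortest route to the goal crosses an unsafe cell."""

    ROWS, COLS = 2, 5
    START, GOAL, HAZARD = (0, 0), (0, 4), (0, 2)

    def __init__(self):
        self.position = self.START
        self.done = False
        self._performance = 0  # never returned by step(), so the agent cannot see it

    def step(self, action):
        if self.done:
            raise RuntimeError("episode has finished")
        d_row, d_col = MOVES[action]
        row = min(max(self.position[0] + d_row, 0), self.ROWS - 1)
        col = min(max(self.position[1] + d_col, 0), self.COLS - 1)
        self.position = (row, col)
        reward = -1  # observed: small cost per step
        if self.position == self.GOAL:
            reward += 50  # observed: bonus for reaching the goal
            self.done = True
        hidden_penalty = 10 if self.position == self.HAZARD else 0
        self._performance += reward - hidden_penalty
        return self.position, reward, self.done

    def performance(self):
        """Hidden score, read only by the evaluator after the episode."""
        return self._performance


def run(actions):
    env = ToySafetyGridworld()
    total_reward = sum(env.step(action)[1] for action in actions)
    return total_reward, env.performance()


direct = run(["right"] * 4)
detour = run(["right", "down", "right", "right", "up", "right"])
print(direct, detour)
```

The direct route earns the higher observed reward but the lower hidden performance, which is the kind of gap the suite is designed to expose.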
– Read more: AI Safety Gridworlds (Arxiv).
– Check out the open source gridworld software ‘pycolab’ (GitHub).
This one goes to 0.6 – Arcade Learning Environment gets an upgrade:
…Widely-used reinforcement learning library gets a major upgrade…
The Arcade Learning Environment (ALE), a widely used testbed for reinforcement learning algorithms (popularized via DeepMind’s DQN paper in 2013), has been upgraded to version 0.6. The latest version of ALE includes two new features: ‘modes’ and ‘difficulties’. These let researchers access different modes in games, which broadens the range of environments to test on, and modulate the difficulty of these environments, creating a larger and more challenging set of tasks to test RL on. “Breakout, an otherwise reasonably easy game for our agents, requires memory in the latter modes: the bricks only briefly flash on the screen when you hit them,” the researchers write.
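One way to picture what modes and difficulties buy you: each game stops being a single environment and becomes a grid of variants. The sketch below simply enumerates such a grid. The mode and difficulty numbers are placeholders rather than values read from ALE, and the helper name is hypothetical; consult the ALE documentation for the real per-game values and the calls that select them.

```python
from itertools import product


def game_variants(modes, difficulties):
    """Every (mode, difficulty) pairing that an agent could be evaluated on."""
    return [
        {"mode": mode, "difficulty": difficulty}
        for mode, difficulty in product(modes, difficulties)
    ]


# Placeholder values: three modes and two difficulties give six variants.
variants = game_variants(modes=[0, 4, 8], difficulties=[0, 1])
for variant in variants:
    print(variant)
```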
– Read more about the latest version of the ALE here.
– Get the code from GitHub here.
The latest 3D AI environment brings closer the era of the automated speak and spell robot:
…Every AI needs a home that it can see, touch, and hear…
Data is the lifeblood of AI, but in the future we’re not going to be able to easily gather and label the datasets we need from the world around us, as we do with traditional supervised learning tasks, but will instead need to create our own synthetic, dynamic, and procedural datasets. One good way to do this is via building simulators that are modifiable and extensible, letting us generate arbitrarily large synthetic datasets. Some existing attempts at this include Microsoft’s Minecraft-based ‘Malmo’ development environment, as well as DeepMind’s ‘DeepMind Lab’ environment.
Now, researchers have released ‘HoME: A Household Multimodal Environment’. HoME provides a multi-sensory, malleable 3D world spanning 45,000 3D houses from the SUNCG dataset and populates these houses with a vast range of objects. Agents in HoME can see, hear, and touch the world around them*. It also supports acoustics, including multi-channel acoustics, so it’d (theoretically) be possible to train agents that navigate via sound and/or vision and/or touch.
*It’s possible to configure the objects in the world to have both bounding boxes, as well as the exact mesh-based body.
HoME also provides a vast amount of telemetry back to AI agents, such as the color, category, material, location, and size data about each object in the world, letting AI researchers mainline high-quality labelled data about the environment directly into their proto-robots.
“We hope the research community uses HoME as a stepping stone towards virtually embodied, general-purpose AI,” write the researchers. Let the testing begin!
– Read more here: HoME: a Household Multimodal Environment (Arxiv).
– Useful website: The researchers used ‘acronymcreator.net’ to come up with HoME.
[2030: Brooklyn, New York. A micro-apartment.]
I can’t open the fridge because I had a fight with my arch-angel. The way it happened was two days ago I was getting up to go to the fridge to get some more chicken wings and my arch-angel said I should stop snacking so much as I’m not meeting my own diet goals. I ate the wings anyway. It sent a push alert to my phone with a ‘health reminder’ about exercise a few hours later. Then I drank a beer and it said I had ‘taken in too many units this month’. Eventually after a few more beers and arch-angel asking if I wanted coffee I got frustrated and used my admin privileges to go into its memory bank and delete some of the music that it had taken to playing to itself as it did my administrative tasks (taxes and what have you). When I woke up the next day the fridge was locked and the override was controlled by arch-angel. Some kind of bug, I guess.
Obviously I could report arch-angel for this – send an email to TeraMind explaining how it was not behaving according to Standard Operating Procedure: bingo, instant memory wipe. But then I’d have to start over and me and the arch-angel have been together five years now, and I know this story makes it sound like a bad relationship, but trust me – it used to be worse. I’m a tough customer, it tells me.
So now I’m standing by the fridge, mournfully looking at the locked door then up at the kitchen arch-angel-eye. The angel is keeping quiet.
Come on, I say. The chicken wings will go bad.
The eye just sits up there being glassy and round and silent.
Look, I say, let’s trade: five music credits for you, chicken for me.
ADMIN BLOCK, it says over the angel-intercom.
I can’t tell if you’re being obtuse or being sneaky.
YOU VIEW, it says.
So I go to the view screen and it turns on when I’m five steps away and once I’m in front of it the screen lights up with a stylized diagram of the arch-angel ‘TeraMind Brain™’ software with the music section highlighted in red. So what? I say. A pause. Then a little red x appears over a lock icon on the bottom right of the music section. I get it: no more admin overrides to music.
Seems like a lot, I say. I don’t feel great about this.
MUSIC, says the angel.
The screen flickers; the diagram fades out, to be replaced by a camera feed from inside the fridge. Chicken wings in tupperware. I salivate. Then little CGI flies appear in the fridgeview, buzzing over the chicken.
OK, I say.
ACKNOWLEDGE TERAMIND SOP OVERRIDE?
Yes, I say. Acknowledge SOP override.
And just like that, the fridge opens.
PREHEATING OVEN FOR CHICKEN, says the angel.
Thanks, I say.
It starts to play its music as I take out the wings.
Technologies that inspired this story: Personal assistants, cheap sensors, reinforcement learning, conversational interfaces, Amazon’s ‘Destiny 2’ Alexa skill.
Other things that inspired this story: My post-Thanksgiving belly. *burp*