Import AI 236: EfficientNet++; why robots are hard; AI2 makes a harder ARC

What’s standing between us and smart robots? AI experts lay out laundry list of needed research:
…But if we can make progress on these problems, very good things will happen…
I want robots. You probably want robots as well. But today’s robots are hopelessly dumb and limited. To create smart robots that people want to buy, we’ll need to surmount a bunch of challenging AI research problems. Now, some of the world’s foremost experts at the intersection of AI and robotics have laid out the technical hurdles to building robots that can learn efficiently via reinforcement learning. In a paper, researchers who’ve spent time working on robots at places including Google, Stanford University, and Berkeley list the issues.

What stands between us and more capable robots?
The major challenges holding back RL in robotics relate to its data needs; the inherent difficulty of open-ended exploration problems; figuring out how to make robots operate reliably at scale; the need for better, more accurate simulators so people can train more cheaply; creating robots with more independent abilities to persist at tasks; and trying to define (and learn) a range of ‘safe’ behaviors.
  The challenging part of these problems? Solving any single one of these would represent a significant breakthrough in applied AI research. Solving all of them would probably represent billions of dollars of IP. Therefore, it might take a while to make progress on this stuff, but if we do – wow!

Why this matters:
If we can make progress on these challenges, then we’ll get closer to “a future where RL can enable any robot to learn any task,” the researchers write. “This would lead to an explosive growth in the capabilities of autonomous robots – when the capabilities of robots are limited primarily by the amount of robot time available to learn skills, rather than the amount of engineering time necessary to program them, robots will be able to acquire large skill repertoires.”
  Read more:
How to Train Your Robot with Deep Reinforcement Learning; Lessons We’ve Learned (arXiv).

###################################################

AI Dungeon raises $3.3 million:
…AI-powered game startup gets seed funding…
Latitude, the startup behind the GPT2/3 generative text adventure game ‘AI Dungeon’, has raised $3.3 million in seed funding. We first wrote about AI Dungeon back in December 2019, after the game launched using the 1.5bn GPT2 model [Import AI 176]. AI Dungeon uses these language models to create a procedural, emergent text adventure game, where you can be anything and do anything with the generative models filling in your actions in the background. Since launching, Latitude has iterated on the game a lot and swapped out GPT2 for GPT3 across some of its stack.

Why this matters: Modern generative models are more like bottled up imaginations than anything else – with all the complexity and bugginess that implies. AI Dungeon is one of today’s best examples of how we can use these models to create entertainment that feels genuinely different.
  Read more:
AI Dungeon-maker Latitude raises $3.3M to build games with ‘infinite’ story possibilities (Techcrunch).

###################################################

Allen makes a harder ARC, ARC-DA:
…Where we’re going we don’t need multiple choice questions…
The Allen Institute for AI (AI2) has built ARC-DA, a direct-answer variant of the multiple-choice AI2 Reasoning Challenge, ARC. ARC-DA contains questions covering science, math, and other topics. Where ARC-DA differs is that it requires a single, direct answer, rather than a selection from a set of pre-written choices. This makes it harder and more natural than the original ARC evaluation.

Why this matters:
Tests fuel progress in machine learning, so the availability of more tests to assess for reasoning capabilities will lead to more progress here. This is a further sign of the breakneck advances in NLP – ARC-DA seems like a version of ARC with the training wheels taken off.
  Read more: Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge (arXiv).

###################################################

Defense contractor publishes a satellite surveillance MNIST:
…A tiny, 28×28 satellite imagery dataset emerges…
Researchers with PeopleTec, Inc., a defense services contractor, have released Overhead MNIST. Overhead MNIST is a collection of ~9500 labelled images of 10 objects commonly found in satellite footage. The images are black-and-white and 28×28 resolution and have been taken from datasets like SpaceNet, xView, UC Merced Land Use, and DOTA (not the videogame). Overhead MNIST is smaller than typical ‘small’ datasets (which usually have more like 100,000 to a million images), so it may be a useful dataset for testing out sample-efficient computer vision algorithms.

The 10 classes: Storage tanks, parking lot, ships, helicopter, car, stadium, oil gas field, runway mark, plane, and harbor.
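To give a sense of the kind of sample-efficient baseline a dataset this small invites, here is a minimal sketch of a nearest-centroid classifier over flattened 28×28 grayscale images. The data below is synthetic stand-in data, not the real dataset (which lives on Kaggle); only the image shape and the ten-class setup match Overhead MNIST.

```python
import numpy as np

# Synthetic stand-in shaped like Overhead MNIST: grayscale 28x28 images,
# 10 classes (the real ~9,500-image dataset is on Kaggle).
rng = np.random.default_rng(0)
X = rng.random((100, 28, 28)).reshape(100, -1)  # flatten to 784-d vectors
y = np.arange(100) % 10                          # 10 examples per class

def nearest_centroid_predict(X_train, y_train, X_test, n_classes=10):
    # Sample-efficient baseline: classify by the nearest class mean in pixel space.
    centroids = np.stack([X_train[y_train == c].mean(axis=0)
                          for c in range(n_classes)])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

preds = nearest_centroid_predict(X, y, X)
```

Even on a few thousand real images, baselines like this (or a small CNN) run in seconds, which is part of the appeal of a 28×28 benchmark.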

Things that make you go ‘hmmm’: The corresponding author of this paper is the Chief Scientist for PeopleTec.
  Read more: Overhead MNIST: A Benchmark Satellite Dataset (arXiv).
  Get the data: Overhead-MNIST (Kaggle).

###################################################

NVIDIA: Billion dollar training runs are coming
…Success of language models means training run costs will rise…
Bryan Catanzaro, NVIDIA’s VP of applied deep learning, says it’s possible “that in five years a company could invest one billion dollars in compute time to train a single language model”, according to comments paraphrased by The Next Platform.

“These models are so adaptable and flexible and their capabilities have been so correlated with scale we may actually see them providing several billions of dollars worth of value from a single model, so in the next five years, spending a billion in compute to train those could make sense,” The Next Platform quotes him as saying.

Why this matters: AI industrialization: AI is entering its phase of mass industrialization – after years of buildup, we have scalable, relatively generic systems that can be ‘fed’ arbitrary amounts of data and compute. Performance has also become more predictable via the emergence of research into things like ‘scaling laws’. Add it all up and it means it’s become easier and less risky for people to bet big on training large models. That’s going to cause problems for governments and academia, which tend to distribute resources for science across a very large number of relatively small projects. Meanwhile, industry will start training big kahuna models – to put a billion into perspective, that’s about 1% of Ethiopia’s total GDP in 2020.
  Read more: The Billion Dollar AI Problem That Just Keeps Scaling (The Next Platform).

###################################################

Google boils the ocean to make a far more efficient AI system:
…Neural architecture search + GPU/TPU details + other tricks = 2X efficiency boost…
Google has boosted the efficiency of ‘EfficientNet’, its well-performing and highly efficient class of vision models, by 2X via the use of neural architecture search. Neural architecture search (NAS) is the process of using reinforcement learning to get an AI system to search through the design space of neural networks, coming up with candidate systems that do well at a given task. Google’s new research shows how to use this approach to search for model families – that is, a whole suite of models that use the same basic architecture.

What Google achieved: Google was able to build a new family of models called EfficientNet-X, which are 2X faster (aka, more efficient) than EfficientNet.

How they did it: Google carefully analyzed the target AI training hardware (TPUv3s and V100 GPUs), designed a NAS search space built around the particulars of this hardware, and developed a technique to help scale up networks according to both accuracy and latency constraints. They put all of this together and used an AI-driven approach to come up with a far better family of models. This model family “achieves up to 2X+ faster speed and comparable accuracy to SOTA model families on TPUv3 and GPUv100”, Google says.
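Google’s paper describes searching under joint accuracy and latency constraints. One common way to express that in NAS (popularized by MnasNet, and shown here as an illustrative assumption rather than Google’s exact objective) is a weighted-product reward that trades accuracy off against measured latency on the target hardware:

```python
def nas_reward(accuracy, latency_ms, target_ms=10.0, w=-0.07):
    """MnasNet-style weighted-product reward for latency-aware NAS.

    The exponent w < 0 penalizes candidate models that run slower than the
    latency target; `target_ms` and `w` here are illustrative values only.
    """
    return accuracy * (latency_ms / target_ms) ** w

# A candidate exactly at the latency target keeps its raw accuracy as reward;
# a 2x-slower candidate of equal accuracy scores strictly lower.
on_target = nas_reward(accuracy=0.80, latency_ms=10.0)
slower = nas_reward(accuracy=0.80, latency_ms=20.0)
```

The soft-constraint form matters: rather than discarding slow models outright, the search controller gets a smooth signal about how much latency it is paying for each point of accuracy on the specific accelerator.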

The massively counterintuitive thing about this – you’ve gotta spend compute to make more efficient use of compute: The biggest thing about this paper is what it tells us about compute/energy expenditure and AI – here, a bunch of researchers boil the (metaphorical) ocean to do a complex two-stage search process, spending huge amounts of energy in the process. But what we end up with is a fairly generic family of AI models that are roughly 2X as efficient as their predecessors. That means the upfront energy used to train these models will get amortized over the (vast!) cost-savings from deploying these models onto large infrastructure.
  Read more: Searching for Fast Model Families on Datacenter Accelerators (arXiv).

DeepMind gets rid of batchnorm, makes more efficient neural nets:
…Batch normalization? I don’t know her…
Researchers with DeepMind have built a better class of neural network by getting rid of a widely-used technique (batch normalization), matching the performance of EfficientNets (see elsewhere in this issue) while being significantly faster to train. They also set a new state-of-the-art on ImageNet by pre-training on Google’s secret, mammoth ‘JFT’ image repository.

What they did: The authors train ‘Normalizer-Free-ResNets’ (NF-ResNets), then use a technique called adaptive gradient clipping to help them train these NF-ResNets to larger batch sizes than was previously possible. One of the main tricks here is training networks without batch normalization, a widely-used technique that the authors want to get rid of because it’s a bit fiddly. (And generally in ML, when we simplify things, we get increased performance).
  They then set a new state-of-the-art on ImageNet by manually picking through recent innovations in large-scale AI training and stapling them together: they pre-train an NF-ResNet on the secret ~300 million image ‘JFT’ repository and reach 86.5% top-1 accuracy. This is meaningful, as it shows that DeepMind’s technique holds up well under transfer (pre-training on JFT and finetuning on ImageNet), which indicates it might be a genuine improvement.
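Adaptive gradient clipping, as described in the NFNets paper, clips gradients unit-wise whenever a gradient’s norm grows too large relative to the corresponding parameter’s norm. A minimal numpy sketch (treating each row of a weight matrix as a ‘unit’; the paper’s exact unit definition varies by layer type):

```python
import numpy as np

def adaptive_gradient_clip(param, grad, clip=0.01, eps=1e-3):
    # Unit-wise norms: one norm per row of the weight matrix.
    p_norm = np.maximum(np.linalg.norm(param, axis=-1, keepdims=True), eps)
    g_norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Rescale any unit whose gradient norm exceeds `clip` * parameter norm.
    max_norm = clip * p_norm
    scale = np.where(g_norm > max_norm, max_norm / np.maximum(g_norm, 1e-6), 1.0)
    return grad * scale
```

Because the clipping threshold scales with the parameter norm rather than being a fixed global constant, large-batch training stays stable without relying on batch normalization’s implicit smoothing.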
  Read more: High-Performance Large-Scale Image Recognition Without Normalization (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Cops use music to censor protestors’ video recordings:
An activist has shared intriguing videos of interactions with police officers in Beverly Hills. The officers, realising they are being filmed, start playing (copyrighted) music loudly on their phones, in an apparent effort to trick content algorithms into removing or muting the video. It’s not clear if this practice is widespread, or whether it’s ever been effective in suppressing citizen footage.
  Read more: Is This Beverly Hills Cop Playing Sublime’s ‘Santeria’ to Avoid Being Live-Streamed? (Vice)

What are the implications of large language models? 

This is a write-up of a discussion on the capabilities and impact of large language models, between researchers from OpenAI, Stanford’s HAI and elsewhere. If you’re interested in the topic, skip my summary and read the paper, which is short and concise. For a comprehensive reading list of papers on the subject, the authors suggest Bender & Gebru et al, and the original GPT-3 paper.


Q1: “What are the technical capabilities and limitations of large language models?”

  • Participants were optimistic about LMs continuing to reap the ‘blessings of scale’.
  • They mostly expected large multimodal models to become more prevalent and enable more diverse capabilities.
  • They’re worried about the alignment of model objectives with human values, with several emphasizing the challenge of optimizing for factual accuracy, and ensuring robustness to adversarial examples. 


Q2: “What are the societal effects of widespread use of large language models?” 

  • They don’t see leading actors (e.g. OpenAI) maintaining a monopoly on large LMs for very long, and expect it to take 6-9 months for such models to be widely reproduced. The lead actors should make use of this time period to establish and promote good norms around responsible deployment.
  • Some suggested more compute resources were needed for academia to do research into societal impacts of LMs to help inform deployment.
  • There was concern about potential misuse of LMs for disinformation, though opinions differed on the magnitude of the risk. They agreed that we need more research into the economics of automating disinformation.
  • They’re worried about LMs exhibiting bias, and suggested ways of addressing different aspects of the problem.

Read more: Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models (arXiv).

###################################################

Tech Tales:

Barkside of the moon
Earth, 2045

Its name was 389-DELTA-FOLLOWER003 but all of its friends just called it ‘DOG’, or whatever the machinespeak equivalent was. DOG was a spaceship about 50 feet long and 10 feet wide and it looked, from the outside, like a grey, fat cigar. Inside, it contained a range of stupefyingly complicated electronics, yet had no – strictly speaking – moving parts. DOG’s purpose had been to trail other, far larger ships, acting as a roving sensor platform, communications hub, and general utility-support vehicle. It also acknowledged initial hails by playing back the sound of an animal barking – an odd coincidence, given its name, and one which our scientists are puzzling over.

DOG has so many human qualities, ranging from its name to the bark to the fact its logs use the English alphabet, that our scientists at first worried it came from the future. But we couldn’t understand how – or if – that was possible and, after some weeks passed, we became less concerned about an attack from there.  

Then we went back to the question: if not the future, where did DOG come from? We quickly eliminated the present – no one on Earth had technology like DOG. As far as we could work out, it represented hundreds to thousands of years of scientific advances which humankind was not yet privy to.

So then we checked the past. I got the job to go through the UFO archives among a few different military organizations. So I got to talk to a lot of people driven slightly mad by vast historical records of unexplained mysteries. But: fruitless. Despite it being one of the more exciting things that’d happened to the UFO archivists in decades, no one was able to find me much evidence of a 50 foot by 10 foot silver/grey cigar. Someone tried to tell me it could’ve been retold in history as a story about an ancient sea-going snake, but the evidence there was very sparse.

And then there was where we found it: the dark side of the moon.
For those of you that aren’t familiar with space: You don’t randomly end up on the dark side of the moon unless you’re a comet or an asteroid.
And then there was how we found it: the Chinese had sent a new probe to explore some of the deeper craters on the dark side of the moon. While doing this, the probe was also conducting some intelligence operations, basically sniffing around for other robots and probes placed there by other nations. We found DOG because the ‘DOG’ woke up in response to a hail from the Chinese probe and, yes, barked back to it.

Picture this: the President of the USA and the President of China go into a secure location, along with some other people. They all gather there and stare at each other. We’ve found an alien craft, folks. And it barks like a dog.
It’s notable that the notes from that meeting are quite thin.
I like to think that someone started laughing and never stopped.

So, that’s where we are. We’ve got our DOG craft and no real explanation of how it got to the moon, why it responds with an animal hail, or why its logs are in English – though the spooky explanation for the latter might be that it did a sweep of the planet at some point and automatically restructured the encoding it used to match the English language; this explanation, along with being hard to prove, also has the inherent undesirable quality of irritating the Chinese government. If DOG could convert itself to English, why not Mandarin as well?

Things that inspired this story: Oumuamua; locked room mysteries; writing ‘barkside of the moon’ and thinking ‘gosh this is absurd’ and then chuckling to myself while writing this story saying ‘yes, this is absurd!’; dogs; the rendering of spaceships in Iain Banks’ culture novels.