03 | February | 2020

Import AI: 183: Curve-fitting conversation with Meena; GANs show us our climate change future; and what compute-data arbitrage means

by Jack Clark

Can curve-fitting make for good conversation?
…Google’s “Meena” chatbot suggests it can…
Google researchers have trained a chatbot with uncannily good conversational skills. The bot, named Meena, is a 2.6 billion parameter language model trained on 341GB of text data, filtered from public domain social media conversations. Meena uses a seq2seq model (the same sort of technology that powers Google’s “Smart Compose” feature in Gmail), paired with an Evolved Transformer encoder and decoder – it’s interesting to see something like this depend so much on a component developed via neural architecture search.

Can it talk? Meena is a pretty good conversationalist, judging by transcripts uploaded to GitHub by Google. It also seems able to invent jokes (e.g., Human: do horses go to Harvard? Meena: Horses go to Hayvard. Human: that’s a pretty good joke, I feel like you led me into it. Meena: You were trying to steer it elsewhere, I can see it.)

A metric for good conversation: Google developed the ‘Sensibleness and Specificity Average’ (SSA) measure, which it uses to evaluate how good Meena is in conversation. This metric evaluates the outputs of language models for two traits – is the response sensible, and is the response specifically tied to what is currently being discussed. To calculate the SSA for a given chatbot, the researchers have a team of crowd workers evaluate some of the outputs of the models, then they use this to create an SSA score.
Humans vs Machines: The best-performing version of Meena gets an SSA of 79%, compared to 86% for an average human. Other state-of-the-art systems such as DialoGPT (51%) and Cleverbot (44%) fare much worse.
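
To make the metric concrete, here is a minimal sketch of how an SSA-style score could be computed from crowd-worker labels. It assumes SSA is the plain average of the per-response sensibleness rate and specificity rate, and that a response only counts as specific if it was also judged sensible. The `Label` schema and the example labels are hypothetical and are not taken from Google's evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Label:
    """One crowd-worker judgement of a single chatbot response (hypothetical schema)."""
    sensible: bool
    specific: bool


def ssa(labels: Iterable[Label]) -> float:
    """Average of the sensibleness rate and the specificity rate."""
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one label")
    n = len(labels)
    sensibleness = sum(l.sensible for l in labels) / n
    # Assumption: a response that is not sensible cannot count as specific.
    specificity = sum(l.sensible and l.specific for l in labels) / n
    return (sensibleness + specificity) / 2


# Made-up labels for four responses, purely for illustration.
labels = [
    Label(sensible=True, specific=True),
    Label(sensible=True, specific=False),
    Label(sensible=False, specific=False),
    Label(sensible=True, specific=True),
]
print(f"SSA = {ssa(labels):.3f}")  # prints SSA = 0.625
```

Here three of four responses are sensible (0.75) and two of four are both sensible and specific (0.5), so the average is 0.625.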

Different release strategy: Along with their capabilities, modern neural language models have also been notable for the different release strategies adopted by the organizations that build them – OpenAI announced GPT-2 but didn’t release it all at once, releasing the model over several months along with research into its potential for misinformation and its tendency toward bias. Microsoft announced DialoGPT but didn’t provide a sampling interface in an attempt to minimize opportunistic misuse, and other companies like NVIDIA have alluded to larger language models (e.g., Megatron), but not released any parts of them.
With Meena, Google is also adopting a different release strategy. “Tackling safety and bias in the models is a key focus area for us, and given the challenges related to this, we are not currently releasing an external research demo,” they write. “We are evaluating the risks and benefits associated with externalizing the model checkpoint, however”.

Why this matters: How close can massively-scaled function approximation get us to human-grade conversation? Can it get us there at all? Research like this pushes the limits of a certain kind of deliberately naive approach to learning language, and it’s curious that we’re developing more and more superficially capable systems, despite the lack of domain knowledge and handwritten systems inherent to these approaches.
Read more: Towards a Human-like Open-Domain Chatbot (arXiv).
Read more: Towards a Conversational Agent that Can Chat About… Anything (Google AI Blog).

####################################################

Chinese government uses drones to remotely police people in coronavirus-hit areas:
…sTaY hEaLtHy CiTiZeN!…
Chinese security officials are using drones to remotely surveil and talk to people in coronavirus-hit areas of the country.

“According to a viral video spread on China’s Twitter-like Sina Weibo on Friday, officials in a town in Chengdu, Southwest China’s Sichuan Province, spotted some people playing mah-jong in a public place.
“Playing mah-jong outside is banned during the epidemic. You have been spotted. Stop playing and leave the site as soon as possible,” a local official said through a microphone while looking at the screen for a drone.
“Don’t look at the drone, child. Ask your father to leave immediately,” the official said to a child who was looking curiously up at the drone beside the mah-jong table.” – via Global Times.

Why this matters: This is a neat illustration of the omni-use nature of technology; here, the drones are being used for a societally-beneficial use (preventing viral transmission), but it’s clear they could be used for chilling purposes as well. Perhaps one outcome of the coronavirus outbreak will be a normalization of a certain form of drone surveillance in China?
Read more: Drones creatively used in rural areas in battle against coronavirus (Global Times).
Watch this video of a drone being used to instruct someone to go home and put on a respirator mask (Global Times, Twitter).

####################################################

Want smarter AI? Train something with an ego!
…Generalization? It’s easier if you’re self-centered…
Researchers with New York University think that there are a few easy ways to improve generalization of agents trained via reinforcement learning – and it’s all about ego! Specifically, their research suggests that if you can make technical tweaks that make a game more egocentric, that is, more tightly gear the observations around a privileged agent-centered perspective, then your agent will probably generalize better. Concretely, they propose “rotating, translating, and cropping the observation around the agent’s avatar”, to train more general systems.
“A local, ego-centric view, allows for better learning in our experiments and the policies learned generalize much better to new environments even when trained on only five environments”, they write.

The secrets to (forced) generalization:
– Self-centered (aka, translation): Warp the game world so that the agent is always at the dead center of the screen – this means it’ll learn about positions relative to its own consistent frame.
– Rotation: Change the orientation of the game map so that it faces the same direction as the player’s avatar. “Rotation helps the agent to learn navigation as it simplifies the task. For example: if you want to reach for something on the right, the agent just rotates until that object is above,” they explain.
– Zooming in (cropping): Crop the observation around the player, which reduces the state space the agent sees and needs to learn about (by comparison, seeing really complicated environments can make it hard for an agent to learn, as it takes it a looooong time to figure out the underlying dynamics).
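
The three transformations above can be sketched on a toy grid world. This is an illustrative sketch only, not the authors' code: the grid encoding, the facing codes, the padding symbol, and the function name `egocentric_view` are all assumptions made for the example.

```python
from typing import List, Tuple

Grid = List[List[str]]

# Hypothetical facing codes: 0 = up, 1 = right, 2 = down, 3 = left.


def egocentric_view(grid: Grid, agent: Tuple[int, int], facing: int,
                    radius: int, pad: str = "#") -> Grid:
    """Return a (2*radius+1)-square view with the agent at the center, facing up.

    Translation: view offsets are measured from the agent's position.
    Rotation: offsets are remapped so the agent's heading points to the top.
    Cropping: only cells within `radius` of the agent are kept; cells that
    fall outside the map are filled with `pad`.
    """
    rows, cols = len(grid), len(grid[0])
    ar, ac = agent
    view: Grid = []
    for u in range(-radius, radius + 1):      # u < 0 means ahead of the agent
        row = []
        for v in range(-radius, radius + 1):  # v > 0 means to the agent's right
            if facing == 0:
                dr, dc = u, v
            elif facing == 1:
                dr, dc = v, -u
            elif facing == 2:
                dr, dc = -u, -v
            elif facing == 3:
                dr, dc = -v, u
            else:
                raise ValueError("facing must be 0, 1, 2, or 3")
            r, c = ar + dr, ac + dc
            row.append(grid[r][c] if 0 <= r < rows and 0 <= c < cols else pad)
        view.append(row)
    return view


# Toy map: agent "A" at (1, 1), a key "K" two cells to its east.
world = [list("...."), list(".A.K"), list("....")]
view = egocentric_view(world, agent=(1, 1), facing=1, radius=2)
for line in view:
    print("".join(line))
```

With the agent facing right, the key that sits to the east on the map shows up directly above the agent in the view (at `view[0][2]`), which matches the quoted intuition that the agent "just rotates until that object is above".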

Testing: They test out their approach on two variants of the game Zelda: the first is a complex Zelda-clone built in the General Video Game AI (GVGAI) framework; the second is a simplified version of the same game. They find that A3C-based agents trained in Zelda with a full set of variations (translation, rotation, cropping) generalize far better than those trained on the game alone (though their test scores of 22% are still pretty poor, compared to what a human might get).

Why this matters: Papers like this show how much tweaking goes on behind the scenes to set up training in such a way you get better or more effective learning. It also gives us some clues about the importance of ego-centric views in general, and makes me reflect on the fact I’ve spent my entire life learning via an ego-centric/world-centric view. How might my mind be different if my eyeballs were floating high above me, looking at me from different angles, with me uncentered in my field-of-vision? What might I have ‘learned’ about the world, then, and might I – similar to RL agents trained in this way – take an extraordinarily long time to learn how to do anything?
Read more: Rotation, Translation, and Cropping for Zero-Shot Generalization (arXiv).

####################################################

Import A-Idea: Reality Trading: Paying Computers to Generate Data:
In recent years, we’ve seen various research groups start using simulators to train their AI agents inside. With the arrival of domain randomization – a technique that lets you vary the parameters of the simulation to generate more data (for instance, data where you’ve altered the textures applied to objects in the simulator, or the physics constants used to govern how objects behave) – people have started using simulators as data generators. This is a pretty weird idea when you step back and think about it – people are paying computers to dream up synthetic datasets which they train agents inside, then they transfer the agents to reality and observe good performance. It’s essentially a form of economic arbitrage, where people are spending money on computers to generate data, because the economics work out better than collecting the data directly from reality.
Some examples:
– AlphaStar: AlphaStar agents play against themselves in an algorithmically generated league that doubles as a curriculum, letting them achieve superhuman performance at the game.
– OpenAI’s robot hand: OpenAI uses a technique called automatic domain randomization “which endlessly generates progressively more difficult environments in simulation”, to let them train a hand to manipulate real-world objects.
– Self-driving cars being developed by a startup named ‘Voyage’ are partially trained in software called Deepdrive (Import AI #173), a simulator for training self-driving cars via reinforcement learning.
– Google’s ‘Minitaur’ robots are trained in simulation, then transferred to reality via the aid of domain randomization (Import AI #93).
– Drones learn to fly in simulators and transfer to reality, showing that purely synthetic data can be used to train movement policies that are subsequently deployed on real drones (Import AI #149).
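
As a rough illustration of domain randomization as a data generator, the sketch below samples a fresh set of simulator settings for each training episode. Everything in it is hypothetical: the `SimParams` fields, the texture names, and the numeric ranges are invented for the example and do not come from any of the systems listed above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimParams:
    """Settings for one simulated episode (hypothetical fields)."""
    texture: str
    friction: float
    gravity: float


# Assumed texture pool and parameter ranges, chosen only for illustration.
TEXTURES = ("wood", "metal", "checker", "noise")


def sample_params(rng: random.Random) -> SimParams:
    """Draw one randomized simulator configuration."""
    return SimParams(
        texture=rng.choice(TEXTURES),
        friction=rng.uniform(0.5, 1.5),
        gravity=rng.uniform(9.0, 10.6),
    )


def generate_episodes(n: int, seed: int = 0) -> List[SimParams]:
    """Generate n randomized configurations; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]


episodes = generate_episodes(5)
for params in episodes:
    print(params)
```

Each configuration would be handed to a simulator to produce one episode of synthetic data; the cost per episode is compute time rather than real-world data collection, which is the arbitrage described above.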

What this means: Today, some AI developers are repurposing game engines (and sometimes entire games) to help them train smarter and more capable machines. As simulators become more advanced – partially as a natural dividend of the growing sophistication of game engines – what kinds of tasks will be “simcomplete”, in that a simulator is sufficient to solve them for real-world deployment, and what kinds of tasks will be “simhard”, requiring you to gather real-world data to solve them? Understanding the dividing line between these two things will define the economics of training AI systems for a variety of use cases. I can’t wait to read an enterprising AI-Economics graduate student’s paper on the topic.

####################################################

Want data? Try Google’s ‘Dataset Search’:
…Google, but for Data…
Google has released Dataset Search, a search engine for almost 25 million datasets on the web. The service has been in beta for about a year and is now debuting with improvements, including the ability to filter according to the type of dataset.

Is it useful for AI? A preliminary search suggests so, as searches for common things like “ImageNet”, “CIFAR-10”, and others, work well. It also generates useful results for broader terms, like “satellite imagery”, and “drone flight”.

Fun things: The search engine can also throw up gems that a searcher might not have been looking for, but which are usually interesting. E.g., when searching for drones it led me to this “Air-to-Air UAV Aerial Refueling” project page, which seems to have been tagged as ‘data’ even though it’s mostly a project overview. Regardless – an interesting project!
Try out the search engine here (Dataset Search).
Read more: Discovering millions of datasets on the web (Google blog).

####################################################

Facebook releases Polygames to help people train agents in games:
…Can an agent, self-play, and a curriculum of diverse games lead to a more general system?…
Facebook has released Polygames, open source code for training AI agents to learn to play strategy games through self-play, rather than training on labeled datasets of moves. Polygames supports games like Hex, Havannah, Connect6, Minesweeper, Nogo, Othello, and more. Polygames ships with an API developers can use to implement support for their own game within the system.

More games, more generality: Polygames has been designed to encourage generality in agents trained within it, Facebook says. “For example, a model trained to work with a game that uses dice and provides a full view of the opposing player’s pieces can perform well at Minesweeper, which has no dice, a single player, and relies on a partially observable board”, Facebook writes. “We’ve already used the frame to tackle mathematics problems related to Golomb rulers, which are used to optimize the positioning of electrical transformers and radio antennae”.

Why this matters: Given a sufficiently robust set of rules, self-play techniques let us train agents purely through trial and error matches against themselves (or sets of agents being trained in chorus). These approaches can reliably generate super-human agents for specific tasks. The next question to ask is if we can construct a curriculum of enough games with enough complicated rulesets that we could eventually train more general agents that can make strategic moves in previously unseen environments.
Read more: Open-sourcing Polygames, a new framework for training AI bots through self-play (Facebook AI Research webpage).
Get the code from the official Polygames GitHub.

####################################################

What might our world look like as the climate changes? Thanks to GANs, we can render this, rather than imagine it:
…How AI can let us externalize our imagination for political purposes…
Researchers with the Montreal Institute for Learning Algorithms (MILA) want to use AI systems to create images of climate change – the hope being that if people are able to see how the world will be altered, they might try and do something to avert our extreme weather future. Specifically, they train generative adversarial networks on a combination of real and simulated data to generate street-level views of how places might be altered by sea-level rise.

What they did: They gather 2,000 real images of flooded and non-flooded street-level scenes taken from publicly available datasets such as Mapillary and Flickr. They use this to train an initial CycleGAN model that can warp new images into being flooded or non-flooded, but discover the results are insufficiently realistic. To deal with this, they use a 3D game simulator (Unity) to create virtual worlds with various levels of flooding, then extract 1,000 pairs of flood/no-flood images from this. With this data they use a MUNIT-architecture network (with a couple of tweaks to a couple of loss functions) to train a system on a combination of simulated and real-world data to generate images of flooded spaces.

Why this matters: One of the weird things about AI is it lets us augment our human ability to imagine and extend it outside of our own brains – instead of staring at an image of our house and seeing in our mind’s eye how it might look when flooded, contemporary AI tools can let us generate plausibly real images of the same thing. This allows us to scale our imaginations in ways that build on previous generations of creative tools (e.g., Photoshop). How might the world change as people envisage increasingly weird things and generate increasingly rich quantities of their own imaginings? And might work like this help us all better collectively imagine various climate futures and take appropriate actions?
Read more: Using Simulated Data to Generate Images of Climate Change (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Reconciling near- and long-term

AI ethics and policy concerns are often carved up into ‘near-term’ and ‘long-term’, but this generally results in confusion and miscommunication between research communities, which can hinder progress in the field, according to researchers at Oxford and Cambridge in the UK.

Better distinctions: The authors suggest we instead consider 4 key dimensions along which AI ethics and policy research communities have different priorities:

– Capabilities: whether to focus on current/near tech or advanced AI.
– Impacts: whether to focus on immediate impacts or much longer run impacts.
– Uncertainty: whether to focus on things that are well-understood/certain, or more uncertain/speculative.
– Extremity: whether to focus on impacts at all scales, or to prioritize those on particularly large scales.

The research portfolio: I find it useful to think about research priorities as a question of designing the research portfolio – what is the optimal allocation of research across problems, and how should the current portfolio be adjusted. Combining this perspective with distinctions from this paper sheds light on what is driving the core disagreements – for example, finding the right balance between speculative and high-confidence scenarios depends on an individual researcher’s risk appetite, whereas assumptions about the difference between near-term and advanced capabilities will depend on an individual researcher’s beliefs about the pace and direction of AI progress and the influence they can have over longer time horizons, etc. It seems more helpful to view these near- and long-term concerns as being situated in terms of various assumptions and tradeoffs, rather than as two sides of a divided research field.
Read more: Beyond Near and Long-Term: towards a Clearer Account of Research Priorities in AI Ethics and Society (arXiv)

Why DeepMind thinks value alignment matters for the future of AI deployment:

Research from DeepMind offers some useful philosophical perspectives on AI alignment, and directions for future research for aligning increasingly complex AI systems with the varied ‘values’ of people.

Technical vs normative alignment: If we are designing powerful systems to act in the world, it is important that they do the right thing. We can distinguish the technical challenge of aligning AI (e.g. building RL agents that don’t resist changes to their reward functions), and the normative challenge of determining the values we should be trying to align it with, the paper explains. It is important to recognize that these are interdependent—how we build AI agents will partially determine the values we can align them with. For example, we might expect it to be easier to align RL agents with moral theories specified in terms of maximizing some reward over time (e.g. classical utilitarianism) than with theories grounded in rights.

The moral and the political: We shouldn’t see the normative challenge of alignment as being to determine the correct moral theory, and loading this into AI. Rather we must look for principles for AI that are widely acceptable by individuals with different moral beliefs. In this way, it resembles the core problem of political liberalism—how to design democratic systems that are acceptable to citizens with competing interests and values. One approach is to design a mechanism that can fairly aggregate individuals’ views—that can take as input the range of moral views and weight them such that the output is widely accepted as fair. Democratic methods seem promising in this regard, i.e. some combination of voting, deliberation, and bargaining between individuals or their representatives.
Read more: Artificial Intelligence, Values, and Alignment (arXiv)

####################################################

Tech Tales:

Indiana Generator

Found it, he said, squinting at the computer. It was nestled inside a backup folder that had been distributed to a cold storage provider a few years prior to the originating company’s implosion. A clean, 14 billion parameter model, trained on the lost archives of a couple of social networks that had been popular sometime in the early 21st century. The data was long gone, but the model that had been trained on it was a good substitute – it’d spit out things that seemed like the social networks it had been trained on, or at least, that was the hope.

Downloading 80%, the screen said, and he bounced his leg up and down while he waited. This kind of work was always in a grey area, legally speaking. 88%. A month ago some algo-lawyer cut him off mid download. 93%. The month before that he’d logged on to an archival site and had to wait till an AI lawyer for his corporation and for a rival duked it out virtually till he could start the download. 100%. He pulled the thumbdrive out, got up from the chair, left the administrator office, and went into the waiting car.

“Wow,” said the billionaire, holding the USB key in front of his face. “The 14 billion?”
“That’s right.”
“With checkpoints?”
“Yes, I recovered eight checkpoints, so you’ve got options.”
“Wow, wow, wow,” he said. “My artists will love this.”
“I’m sure they will.”
“Thank you, once we verify the model, the money will be in your account.”
He thanked the rich person again, then left the building. In the elevator down he checked his phone and saw three new messages about other jobs.

Three months later, he went to the art show. It was real, with a small virtual component; he went in the flesh. On the walls of the warehouse were a hundred different old-style webpages, with their contents morphing from second to second, as different models from different eras of the internet attempted to recreate themselves. Here, a series of smeared cat-memes from the mid-2010s formed and reformed on top of a re-hydrated Geocities. There, words unfurled over old jittering Tumblr backgrounds. And all the time music was playing, with lyrics generated by other vintage networks, laid over idiosyncratic synthetic music outputs, taken from models stolen by him or someone just like him.
“Incredible, isn’t it”, said the billionaire, who had appeared beside him. “There’s nothing quite like the early internet.”
“I suppose,” he said. “Do you miss it?”
“Miss it? I built myself on top of it!”, said the billionaire. “No, I don’t miss it. But I do cherish it.”
“So what is this, then?” he asked, gesturing at the walls covered in the outputs of so many legitimate and illicit models.
“This is history,” said the billionaire. “This is what the new national parks will look like. Now come on, walk inside it. Live in the past, for once.”
And together they walked, glasses of wine in hand, into a generative legacy.

Things that inspired this story: Models and the value of pre-trained models serving as funhouse mirrors for their datasets; models as cultural artefacts; Jonathan Fly’s StyleGAN-ed Reddit; patronage in the 21st century; re-imagining the Carnegies and Rockefellers of old for a modern AI era.
