Import AI 137: DeepMind uses (Google) StreetLearn to learn to navigate cities; NeuroCuts learns decent packet classification; plus a 490k labelled image dataset

The robots of the future will learn by playing, Google says:
…Want to solve tasks effectively? Don’t try to solve tasks during training!…
Researchers with Google Brain have shown how to make robots smarter by showing them what it means to play without a goal in mind. Google does this by collecting a dataset via people tele-operating a robot in simulation. During these periods of teleoperation, the people are playing around, using the robot hand and arm to interact with the world around them without a specific goal in mind, so in one scene a person might pick up a random object, in another they might fiddle around with a door on a piece of furniture, and so on.

Google saves this data, calling it ‘Learning from Play data’ (LfP). It fees this into a system that attempts to classify such playful sequences of actions, mapping them into a latent space. Meanwhile, another module in the system tries to look across the latent space and propose sequences of actions that could shift the robot from its current state to its goal state.

Multi-task training: Google evaluates this approach by comparing performance of robots trained with play data, from policies that use behavioural cloning to learn to complete tasks based on specific demonstration data. The tests show that robots which learn from play data are more robust to perturbations than ones trained without, and typically reach higher success rates on most tasks.
  Intriguingly, systems trained with play data display some over desirable traits: “We find qualitative evidence that play-supervised models make multiple attempts to retry the task after initial failure”, the researchers write. “Surprisingly we find that its latent plan space learns to embed task semantics despite never being trained with task labels”.

Why this matters: Gathering data for robotics work tends to be expensive, difficult, and prone to distribution problems (you can gather a lot of data, but you may subsequently discover that some quirk of the task or your robot platform means you need to go and re-gather a slightly different type of data). Being able to instead have robots learn behaviors primarily through cheaply-gathered non-goal-oriented play data will make it easier for people to experiment with developing such systems, and could make it easier to create large datasets shared between multiple parties. What might the ‘ImageNet’ for play robotics look like, I wonder?
  Read more: Learning Latent Plans from Play (Arxiv).

#####################################################

Google teaches kids to read with AI-infused ‘Bolo’:
…Tuition app ships with speech recognition and text-to-speech tech…
Google has released Bolo, a mobile app for Android designed to help Indian children learn to read. Bolo ships with ‘Diya’, a software agent that can help children learn to read.

Bilingual: “Diya can not only read out the text to your child, but also explain the meaning of English text in Hindi,” Google writes on its blog. Bolo ships with 50 stories in Hindi and 40 in English. Google says it found that 64% of children that interacted with Bolo showed an improvement in reading after three months of usage.
  Read more: Introducing ‘Bolo’: a new speech based reading-tutor app that helps children learn to read (Google India Blog).

#####################################################

490,000 fashion images… for science:
…And advertising. Lots and lots of advertising, probably…
Researchers with SenseTime Research and the Chinese University of Hong Kong have released DeepFashion2, a dataset containing around 490,000 images of 13 clothing categories from commercial shopping stores as well as consumers.

Detailed labeling: In DeepFashion2, “each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask,” the researchers write. “To our knowledge, clothing pose estimation is presented for the first time in the literature by defining landmarks and poses of 13 categories that are more diverse and fruitful than human pose”, the authors write.

The second time is the charm: DeepFashion2 is a follow-up to DeepFashion, which was released in early 2017 (see: Import AI #33). DeepFashion2 has 3.5X as many annotations as DeepFashion.

Why this matters: It’s likely that various industries will be altered by widely-deployed AI-based image analysis systems, and it seems probable that the fashion industry will take advantage of various image-analysis techniques to automatically analyze & understand changing fashion trends in the world, in part by automatically analyzing the visual world and using these insights to alter the sorts of clothing being developed, or how it is marketed.
  Read more: DeepFashion2: A Versatile Benchmark for Detection, Post Estimation, Segmentation and Re-Identification of Clothing Images (Arxiv).
  Get the DeepFashion data here (GitHub).

#####################################################

Facebook tries to shine a LIGHT on language understanding:
…Designs a MUD complete with netherworlds, underwater aquapolises, and more…
LIGHT contains humans and AI agents within a text-based multi-player dungeon (MUD). This MUD consists of 663 locations, 3462 objects, and 1755 individual characters. It also ships with data, as Facebook has already collected a set of around 11,000 interactions between humans roleplaying characters in the game.

Graveyards, bazaars, and more: LIGHT contains a surprisingly diverse gameworld – not that the AI agents which play within it will care. Locations that AI agents and/or humans can visit include the countryside, forest, castles (inside and outside) as well as some more bizarre locations like a “city in the clouds” or a “netherworld” or even an “underwater aquapolis”.

Actions and emotions: Characters in LIGHT can carry out a range of physical actions (eat, drink, get, drop, etc) as well as express emotive actions (’emotes’) like to applaud, blush, wave, etc.

Results: To test out the environment, the researchers train some baseline models to predict actions, emotes, and dialogue. They find that a system based on Google’s ‘BERT’ language model (pre-trained on Reddit data) does best. They also perform some ablation studies which indicate that models that are successful in LIGHT use a lot of context, depending on numerous streams of data (dialogue, environment descriptions, and so on).

Why this matters: Language is likely fundamental to how we interact with increasingly powerful systems. I think figuring out how to work with such systems will require us to interact with them in increasingly sophisticated environments, so it’ll be interesting to see how rapidly we can improve performance of agents in systems like LIGHT, and learn whether those improvements transfer over to other capabilities as well.
  Read more: Learning to Speak and Act in a Fantasy Text Adventure Game (Arxiv).

#####################################################

NeuroCuts replaces packet classification systems with learned behaviors:
…Research means that in the future computers will learn to effectively communicate with each other…

In the future, the majority of the ways our computers talk to each other will be managed by customized, learned behaviors, derived by AI systems. That’s the gist of a recent spate of research which has ranged from using AI approaches to try to learn how to perform computer tasks like creating and maintaining database indexes, or figuring out how to automatically search through large documents.

Now, researchers with the University of California at Berkeley and Johns Hopkins University have developed NeuroCuts, a system that uses deep reinforcement learning to figure out how to do effective network packet classification. This is an extremely low-level task, requiring precision and reliability. The deep RL approach works, meaning that “our approach learns to optimize packet classification for a given set of rules and objective, can easily incorporate pre-engineered heuristics to leverage their domain knowledge, and does so with little human involvement”.

Effectiveness: “NeuroCuts outperforms state-of-the-art solutions, improving classification time by 18% at the median and reducing both time and memory usage by up to 3X,” they write.

Why this matters: Adaptive systems tend to be more robust to failures than brittle ones, and one of the best ways to increase the adaptiveness of a system is to let it be able to learn in response to inputs; approaches like applying deep reinforcement learning to problems like network packet classification precede a future where many fundamental aspects of how computers connect to eachother will be learned rather than programmed.   Read more: Neural Packet Classification (Arxiv).

#####################################################

DeepMind teaches agents to navigate cities with ‘StreetLearn’:
…Massive Google Street View-derived dataset asks AI systems to navigate across New York and Pittsburgh…
Have you ever been lost in a city, and tried to navigate yourself to a destination by using landmarks? This happens to me a lot. I usually end up focusing on a particularly tall & idiosyncratic building, and as I walk I update my internal mental map in reference to this building and where I suspect my destination is.

Now, imagine how useful it’d be if when AI systems got lost they could perform such feats of navigation? That’s some of the idea behind StreetLearn, a new DeepMind-developed dataset & challenge to get agents to learn how to navigate across urban areas, and in doing so develop smarter, general systems.

What is StreetLearn? The dataset is build as “an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task,” DeepMind writes. StreetLearn initially consists of two large areas within Pittsburgh and New York City, and is made up of a set of geolocated 360-degree panoramic views, which form the nodes of a graph. In the case of New York City, this includes around 56,000 images, and in the case of Pittsburgh it is about 58,000. The two maps are further sub-divided into distinct regions, also.

Challenging agents with navigation tasks: StreetLearn is designed to be used to develop reinforcement learning agents, so it makes five actions available to an agent: slowly rotate the camera view left or right, rapidly rotate the camera view left or right, and to move forward if there is a free space. The system can also provide the agent with a specific goal, like an image, or following a natural language instruction.

Tasks, tasks, and tasks: To start with, DeepMind has created a ‘Courier’ task, in which the agent starts from a random position and has the goal of getting to within approximately one city block of another randomly chosen location, with the agent getting a higher reward if it takes a shorter route to get between the two locations.
   DeepMind has also developed the “coin_game” in which agents need to find invisible coins scattered throughout the map, and three types of ‘instruction game’, where agents use navigation instructions to get to a goal.

Why this matters: Navigation is a base of the pyramid-type task, where if we are able to develop computers that are good at navigation, we should be able to build a large number of second order applications on top of this as well.
  Read more: The StreetLearn Environment and Dataset (Arxiv).

#####################################################

Reproducibility and other research norms:
…Exploring the tension between reproducible research and enabling abuses…
John Langford, creator of Vowpal Rabbit and a researcher at Microsoft Research, has waded into the ongoing debate about reproducibility within AI research.

The debate:
Currently, the AI community is debating how to force more AI work to be reproducible. Today, some AI research papers are published without code or datasets. Some researchers think this should change, and papers should always come with code and/or data. Other researchers (eg Nando de Freitas at DeepMind) think that while reproducibility is important, there are some cases where you might want to publish a paper but restrict dissemination of some details so as to minimize potential abuses or malicious uses of the technology.

Reproducibility is nice, but so are other things: “Proponents should understand that reproducibility is a value but not an absolute value. As an example here, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not necessarily easily reproduced.”

Additive conferences: What Langford proposes is adding some optional things to the community, like experimenting with whether reviewers can more effectively review papers if they also have access to code or data, and to explore how authors may or may not benefit from releasing code. These policies are essentially being trialled at ICML this year, he points out. “Is there a need for[sic] go further towards compulsory code submission?” he writes. “I don’t yet see evidence that default skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published”.

Why this matters: I think figuring out how to strike a balance between maximizing reproducibility and minimizing potential harms is one of the main challenges of current AI research, so blog posts like this will help further this debate. It’s an important, difficult conversation to have.
  Read more: Code submission should be encouraged but not compulsory (John Langford blog).

Tech Tales:

Be The Boss

It started as a game and then, like most games, it became another part of reality. Be The Boss was a massively multiplayer online (MMO) game that was launched in the latter half of the third decade of the 21st century. The game saw players work in a highly-gamified “workplace” based on a 1990s-era corporate ‘cube farm’. Player activities included: undermining coworkers, filing HR complaints to deal with rivals, filling up a ‘relaxation meter’ by temporarily ‘escaping’ the office for coffee and/or cigarettes and/or alcohol. Players enjoyed this game, writing reviews praising it for its “gritty realism”, describing it as a “remarkable document of what life must have been life in the industrial-automation transition period”.

But, as with most games, the players eventually grew bored. Be The Boss lacked the essential drama of other smash-hit games from that era, like Hospital Bill Crisis! and SPECIAL ECONOMIC ZONE. So the designers of Be The Boss created an add-on to the game that delivered on its name; where previously, players competed with each other to rise up the hierarchy of the corporation, they had no real ability to change the rules of the game. With the expansion, this changed, and successful players were entered into increasingly grueling tournaments where the winner – whose identity was kept secret – would be allowed to “Be The Boss” of the entire gameworld, letting them subtly alter the rules of the game. It was this invention that assured the perpetuity of Be The Boss.

Now, all people play is Be The Boss, and rumors get swapped online about which rule was instituted by which boss: who decided that the water fountains should periodically dispense water laced with enough pheremones to make different player-characters fall in love with eachother? Who realized that they could save millions of credits across the entire corporate game world by reducing the height of all office chairs by one inch or so? And who made it so that one in twenty of every sent email would be shuffled to a random person in the game world, instead of the intended recipient?

Much of our art is now based on Be The Boss. We don’t talk about the asteroid miners or the AI psychologists or the people climbing the mountains of Mars: we talk about Joe from Accounting saving The Entire Goddamn Company, or how Susan from HR figured out a way to Pay People Less And Make Them Happy About It. Kids dream of what it would have been like to work in the cubes, and fantasize about how well they could have done.

Things that inspired this story: the videogame ‘Cart Life‘; MMOs; the highest form of capitalism is to disappear from normal life and run the abstract meta-life that percolates into normal life; transmutation; digital absolution.