Import AI 141: AIs play Doom at thousands of frames per second; NeurIPS wants reproducible research; and Google creates & scraps AI ethics council.

by Jack Clark

75 seconds: How long it takes to train a network against ImageNet:
…Fujitsu Research claims state-of-the-art ImageNet training scheme…
Researchers with Fujitsu Laboratories in Japan have further reduced the time it takes to train large-scale, supervised learning AI models; their approach lets them train a residual network to around 75% accuracy on the ImageNet dataset after 74.7 seconds of training time. This is a big leap from where we were in 2017 (an hour), and is impressive relative to late-2018 performance (around 4 minutes: see issue #121).

How they did it: The researchers trained their system across 2,048 Tesla V100 GPUs via the Apache MXNet deep learning framework (which Amazon backs). They used a large mini-batch size of 81,920, and also implemented Layer-wise Adaptive Rate Scaling (LARS) and a learning rate ‘warm-up’ period to increase learning efficiency.
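
To make those two tricks more concrete, here's a minimal NumPy sketch of a LARS-style layer-wise update and a linear warm-up schedule. This is an illustration of the general idea rather than Fujitsu's implementation – their system also uses momentum and other refinements, and the hyperparameter values below are placeholders:

```python
import numpy as np

def lars_update(weights, grads, base_lr, eta=0.001, weight_decay=1e-4):
    """One LARS-style step: each layer gets its own 'trust ratio' so the
    update size scales with the norm of that layer's weights, which helps
    keep training stable at very large mini-batch sizes."""
    updated = []
    for w, g in zip(weights, grads):
        g = g + weight_decay * w                       # fold L2 regularization into the gradient
        trust_ratio = eta * np.linalg.norm(w) / (np.linalg.norm(g) + 1e-9)
        updated.append(w - base_lr * trust_ratio * g)  # layer-wise scaled SGD step
    return updated

def warmup_lr(step, warmup_steps, peak_lr):
    """Linear warm-up: ramp the learning rate from ~0 to peak_lr over the
    first warmup_steps iterations (real schedules typically decay afterwards)."""
    return peak_lr * min(1.0, (step + 1) / warmup_steps)
```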

Why it matters: Training large models on distributed infrastructure is a key component of modern AI research, and the reduction in ImageNet training time is striking – I think this is emblematic of the industrialization of AI, as people seek to create systematic approaches to efficiently training models across large numbers of computers. This trend ultimately speeds up research that relies on large-scale experimentation, and can unlock new paths of research.
  Read more: Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds (Arxiv).

#####################################################

Ian ‘GANfather’ Goodfellow heads to Apple:
…Machine learning researcher swaps Google for Apple…
Ian Goodfellow, a machine learning researcher who developed an AI approach called generative adversarial networks (GANs), is leaving Google for Apple.

Apple’s deep learning training period: For the past few years, Apple has been trying to fill its ranks with more prominent people working on its AI projects. In 2016 it hired Russ Salakhutdinov, a researcher from CMU who had formerly studied under Geoffrey Hinton in Toronto, to direct its AI research efforts. Russ helped build up more of a traditional academic ML group at Apple, and Apple lifted its customary veil of secrecy a bit with the Apple Machine Learning Journal, a blog that details some of the research done by the secretive organization. Most recently, Apple hired John Giannandrea from Google to help lead its AI strategy. I hope Ian can push Apple towards being more discursive and open about aspects of its research, and I’m curious to see what happens next.

Why this matters: Two of Ian’s research interests – GANs and adversarial examples (subtle manipulations of inputs that cause neural networks to misclassify them) – have significant roles in AI policy, and I’m wondering if Apple might explore this more through proactive work (making things safer and better) along with policy advocacy.
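
For readers who haven't met adversarial examples before, here's a minimal PyTorch sketch of the fast gradient sign method (FGSM), the attack described by Goodfellow and colleagues; `model` is any differentiable image classifier you supply, and the epsilon value is illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: nudge every input pixel by +/- epsilon in
    the direction that increases the loss. The perturbed image usually looks
    unchanged to a human but can flip the model's prediction."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # y holds the true class labels
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # one signed gradient step
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid [0, 1] range
```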
  Read more: One of Google’s top A.I. people has joined Apple (CNBC).

#####################################################

World’s most significant AI conference wants more reproducible research:
…NeurIPS 2019 policy will have knock-on effect across wider AI ecosystem…
The organizing committee for the Neural Information Processing Systems Conference (NeurIPS, formerly NIPS) has made two changes to submissions for the AI conference: a “mandatory Reproducibility Checklist”, along with “a formal statement of expectations regarding the submission of code through a new Code Submission Policy”.

Reproducibility checklist: Those submitting papers to NeurIPS will fill out a reproducibility checklist, originally developed by researcher Joelle Pineau. “The answers will be available to reviewers and area chairs, who may use this information to help them assess the clarity and potential impact of submissions”.

Code submissions: People will be expected (though not forced – yet) to submit code along with their papers, if they involve experiments that relate to a new algorithm or a modification of an existing one. “It has become clear that this topic requires we move at a careful pace, as we learn where our “comfort zone” is as a community,” the organizers write.

  Non-executable: Code submitted to NeurIPS won’t need to be executable – this helps researchers whose work either depends on proprietary code (for instance, code that plugs into a large-scale, proprietary training system, like those used by large technology companies) or on proprietary datasets.

Why this matters: Reproducibility touches on many of the anxieties of current AI research relating to the difference in resources between academic researchers and those at corporate labs. Having more initiatives around reproducibility may help to close this divide, especially when done in a (seemingly quite thoughtful) way that lets corporate researchers publish code without needing to worry about leaking information about internal proprietary infrastructure.
  Read more: Call for Papers (NeurIPS Medium page).
  Check out the code submission policy here (Google Doc).

#####################################################

Making RL research cheaper by using more efficient environments:
…Want to train agents on a budget? Comfortable with your agents learning within a retro hell? Then ViZDoom might be the right choice for you…
A team of researchers from INRIA in France have developed a set of tasks that demand “complex reasoning and exploration”, which can be run within the ViZDoom simulator at around 10,000 environment interactions per second; the goal of the project is to make it easier for people to do reinforcement learning research without spending massive amounts of compute.

Extending ViZDoom: ViZDoom is an AI research platform built on the classic first-person shooter game, Doom. However, one drawback is that it ships with only eight different scenarios to train agents in. To extend this, the researchers have developed four new scenarios designed to “test navigation, reasoning, and memorization”, variants of which can be procedurally generated.
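
To give a sense of what interacting with ViZDoom looks like (and how one might measure those interactions-per-second numbers), here's a minimal sketch of a random agent using the `vizdoom` Python package. The `basic.cfg` scenario path, frame-skip value, and action set are assumptions based on the configs that ship with ViZDoom – this is not the INRIA benchmark code:

```python
import random
import time
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("basic.cfg")       # a scenario config bundled with ViZDoom; adjust the path for your install
game.set_window_visible(False)      # run headless: no rendering to screen, much faster
game.init()

actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # one-hot choices over the three buttons defined in basic.cfg
steps, start = 0, time.time()
for _ in range(10):                             # ten random-agent episodes
    game.new_episode()
    while not game.is_episode_finished():
        _state = game.get_state()               # screen buffer + game variables (unused by this random agent)
        game.make_action(random.choice(actions), 4)   # act, repeating the action for 4 frames (frame-skip)
        steps += 1
game.close()
print(f"~{steps / (time.time() - start):.0f} agent steps per second")
```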

Scenarios for creating thinking machines: The four scenarios are a navigation task called Labyrinth; Find and return, where the agent needs to find an object in a maze then return to its starting point; Ordered k-item, where the agent needs to collect several items in a predefined order; and Two color correlation, where the agent needs to explore a maze to find a column at its center, then pick up objects that are the same color as the column.

Spatial reasoning is… reassuringly difficult: “The experiments on our proposed suite of benchmarks indicate that current state-of-the-art models and algorithms still struggle to learn complex tasks, involving several different objects in different places, and whose appearance and relationships to the task itself need to be learned from reward”.
  Read more: Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer (Arxiv).

######################################################

Facebook wants to make smart robots, so it built them a habitat:
…New open source research platform can conduct large-scale experiments, running 3D world simulators at thousands of frames per second…
A team from Facebook, Georgia Institute of Technology, Simon Fraser University, Intel Labs, and Berkeley has released Habitat, “a platform for embodied AI research”. The open source software is designed to help train agents for navigation and interaction tasks in a variety of 3D environments, ranging from real-world scan datasets like Stanford’s ‘Gibson’ and Matterport3D to fully synthetic datasets like SUNCG.

  “Our goal is to unify existing community efforts and to accelerate research into embodied AI,” the researchers write. “This is a long-term effort that will succeed only by full engagement of the broader research community. To this end, we have open-sourced the entire Habitat platform stack.”

  Major engineering: The Habitat simulator can support “thousands of frames per second per simulator thread and is orders of magnitude faster than previous simulators for realistic indoor environments (which typically operate at tens or hundreds of frames per second)”. Speed matters here, because the faster you can run your simulator, the more experience your agents can collect per unit of compute and wall-clock time. Faster simulators = it’s cheaper and quicker to train agents.

  Using Habitat to test how well an agent can navigate: The researchers ran very large-scale tests on Habitat with a simple task: “an agent is initialized at a random starting position and orientation in an environment and asked to navigate to target coordinates that are provided relative to the agent’s position; no ground-truth map is available and the agent must use only its sensory input to navigate”. This is akin to waking up in a mansion with no memory and needing to get to a specific room…except in this world you do this for thousands of subjective years, since Facebook trains its agents for a little over 70 million timesteps in the simulator.
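
Here's a minimal sketch of what running one episode of that point-goal navigation task looks like with a random agent, using the open-sourced habitat-api Python package; the config path and attribute names follow the project's public examples and may differ across versions:

```python
import habitat

# Load the PointNav task config that ships with habitat-api (path may differ in your install).
env = habitat.Env(config=habitat.get_config("configs/tasks/pointnav.yaml"))

observations = env.reset()          # sensor readings (e.g. RGB/depth) plus the relative point-goal
steps = 0
while not env.episode_over:
    # A random agent: sample one of the discrete actions (move forward, turn left/right, stop).
    observations = env.step(env.action_space.sample())
    steps += 1
print(f"Episode finished after {steps} steps.")
env.close()
```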

  PPO outperforms hand-coded SLAM approach: In tests, they find that an agent trained via reinforcement learning with proximal policy optimization (PPO) outperforms hand-coded ‘SLAM’ systems which implement “a classical robotics navigation pipeline including components for localization, mapping, and planning”.

Why this matters: Environments frequently contribute to the advancement of AI research, and the need for high-performance environments has been accentuated by the recent trend for using significant computational resources to train large, simple models. Habitat seems like a solid platform for large-scale research, and Facebook plans to add new features to it, like physics-based interactions within the simulator and supporting multiple agents concurrently. It’ll be interesting to see how this develops, and what things they learn along the way.
  Read more: Habitat: A Platform for Embodied AI Research (Arxiv).

######################################################

People want their AI assistants to be chatty, says Apple:
…User research suggests people prefer a chattier, more discursive virtual assistant…
Apple researchers want to build personal assistants that people actually want to use, so as part of that they’ve conducted research into how users respond to chatty or terse/non-chatty personal assistants, and how they respond to systems that try to ‘mirror’ the human they are interacting with.

Wizard-of-Oz: Apple frames this as a Wizard-of-Oz study, which means there is basically no AI involved: Apple instead had 20 people (three men and seventeen women – the lack of gender balance is not explained in the paper) take turns sitting in a room, where they would utter verbal commands to a simulated virtual assistant, which was in fact an Apple employee sitting in another room. The purpose of this type of study is to simulate the interactions that may occur between humans and AI systems, to help researchers figure out what they should build next and how users might react to what they build.

Study methodology: They tested people against three systems: a chatty system, a non-chatty system, and one which tried to mirror the chattiness of the user.

  When testing the chatty vs non-chatty systems, Apple asked human users to make a variety of verbal requests relating to alarms, calendars, navigation, weather, factual information, and searching the web. For example, a user might say “next meeting time”, and the simulated agent could respond with (chatty) “It looks like you have your next meeting after lunch at 2 P.M.”, or (non-chatty) “2 P.M.” Participants then classified the qualities of the responses into categories like: good, off topic, wrong information, too impolite, or too casual.

Talk chatty to me: The study finds that people tend to prefer chatty assistants to non-chatty ones, and have a significant preference for agents whose chattiness mirrors the chattiness of the human user. “Mirroring user chattiness increases feelings of likability and trustworthiness in digital assistants. Given the positive impact of mirroring chattiness on interaction, we proceeded to build classifiers to determine whether features extracted from user speech could be used to estimate their level of chattiness, and thus the appropriate chattiness level of a response”, they explain.
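
As a toy illustration of that last step, here's a hedged scikit-learn sketch of a chattiness classifier. The features (word count, speaking rate, filler-word count) and the tiny hand-labeled dataset are hypothetical stand-ins, not the speech features or data used in Apple's paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical per-utterance features: [word count, words per second, filler words].
X = np.array([
    [3, 1.2, 0],    # terse request, e.g. "next meeting time"
    [12, 2.5, 1],   # chattier request with some small talk
    [2, 1.0, 0],
    [15, 2.8, 2],
])
y = np.array([0, 1, 0, 1])  # 0 = terse user, 1 = chatty user

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

# Estimate the chattiness of a new request, then mirror it in the response style.
style = "chatty" if clf.predict(np.array([[11, 2.4, 1]]))[0] == 1 else "terse"
print(f"Respond in a {style} style.")
```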

Why this matters: Today’s virtual assistants contain lots and lots of hand-written material and/or specific reasoning modules (see: Siri, Cortana, Alexa, the Google Assistant). Many companies are trying to move to systems where a larger and larger chunk of the capabilities come from behaviors that are learned from interaction with users. To be able to build such systems, we need users that want to talk to their systems, which will generate the sorts of lengthy conversational interactions needed to train more advanced learning-based approaches.

  Studies like this from Apple show how companies are thinking about how to make personal assistants more engaging: primarily, this makes users feel more comfortable with the assistants, but as a secondary effect it can bootstrap the generation of data to learn from. There may also be stranger effects: “People not only enjoy interacting with a digital assistant that mirrors their level of chattiness in its responses, but that interacting in this fashion increases feelings of trust”, the researchers write.
  Read more: Mirroring to Build Trust in Digital Assistants (Arxiv).

######################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Google scraps AI ethics council:
Google’s new AI ethics council has been cancelled, just over a week after its launch (see #140).

What went wrong: There was significant backlash from employees over the appointments. ~2,500 employees signed a petition to remove the president of the conservative Heritage Foundation, Kay Coles James, from the council. Her appointment had been described as bringing “diversity of thought” to the panel, but employees pointed to James’ track record of political positions described as anti-trans, anti-LGBT, and anti-immigrant. There was also anger at the appointment of Dyan Gibbens, CEO of a drone company. A third appointee, Alessandro Acquisti, resigned from the body, saying it was not the right forum for him to engage with ethical issues around AI.

What next: In a statement, the company says it is “going back to the drawing board,” and “will find different ways of getting outside opinions on these topics.”

Why it matters: This is an embarrassing outcome for Google, whose employees have again demonstrated their ability to force change at the company. Over and above the issues with appointments, there were reasons to be skeptical of the council as a supervision mechanism – the group was going to meet only four times in person over the next 12 months, and it is difficult to imagine the group being able, in this time, to understand Google’s activities enough to provide any meaningful oversight.
  Read more: Google cancels AI ethics board in response to outcry (Vox).
  Read more: ATEAC member Joanna Bryson has written a post reflecting on the dissolution of the board, called: What we lost when we lost Google ATEAC (Joanna Bryson’s blog).

######################################################

Balancing openness and values in AI research:
The Partnership on AI and OpenAI organized an event with members of the AI community to explore openness in AI research. In particular, they considered how to navigate the tension between openness norms and minimizing the risks from unintended consequences and malicious uses of new technologies. Some of the impetus for the event was OpenAI’s partial release of the GPT-2 language model. Participants role-played an internal review board of an AI company, deciding whether to publish a hypothetical AI advance that may have malicious applications.

Key insights: Several considerations were identified: (1) Organizations should have standardized risk assessment processes; (2) The efficacy of review processes depends on time-frames, and on whether other labs are expected to publish similar work – it is unrealistic to think that one lab could unilaterally prevent publication, so it is better to think of decisions as delaying (not preventing) the dissemination of information; (3) AI labs could learn from the ‘responsible disclosure’ process in computer security, where vulnerabilities are disclosed only after there has been sufficient time to patch them; (4) It is easier to mitigate risks at an early design stage of research than once the research has been completed.

Building consensus: A survey after the event showed consensus across the community that there should be standardized norms and review parameters across institutions. There was not consensus, however, on what these norms should be. PAI identifies three viewpoints among respondents: one group believed openness is generally the best norm; another believed pre-publication review processes might be appropriate; and another believed there should be sharing within trusted groups.
  Read more: When Is It Appropriate to Publish High-Stakes AI Research? (PAI).

######################################################

Amazon shareholders could block government face recognition contract:
The SEC has ruled that Amazon shareholders can vote on two proposals to stop sales of face recognition technology to law enforcement. The motions, put forward by activist shareholders, will be considered at the company’s annual shareholder meeting. One asks Amazon to stop selling its Rekognition technology to government agencies unless the company’s board determines it does not pose risks to human and civil rights. The other requests that the board commission an independent review of the technology’s impacts on privacy and civil liberties. While the motions are unlikely to pass, they put further pressure on the company to address these long-running concerns.
  Read more: Amazon has to let shareholders vote on government Rekognition ban, SEC says (The Verge).
  Read more: A win for shareholders in effort to halt sales of Amazon’s racially biased surveillance tech (OpenMIC).

######################################################

Tech Tales:

Joy Asteroid

The joy asteroid landed in PhaseSpace at two in the morning, Pacific time, in March 2025. Anyone whose real-world location corresponded to the virtual asteroid was inundated with notifications offering certain types of gameworld-enhancing augmentations in exchange for recording and broadcasting fresh media related to the theme of ‘happiness’.

Most people took the deal, and suddenly a wave of feigned happiness spread across the nearby towns and cities as people posed in bedrooms and parks and cars and trains in exchange for trinkets, sometimes mentioned and sometimes not. This triggered other performances of happiness and joy, entirely disconnected from any specific reward – though some who did it said they hoped a reward would magically appear, as it had done for the others.

Meanwhile, in PhaseSpace, the joy persisted, warping most of the rest of virtual reality with it. Joy flowed from PhaseSpace via novel joy-marketing mechanisms, all emanating from a load of financial resources that seemed to have been embedded in the asteroid.

All of this happened in about an hour, and after that people started to work out what the asteroid was. Someone on a social network had already used the term ‘emergent burp’, and this wasn’t so far from the truth – something in the vast, ‘world modelling’ neural net that simultaneously modeled various real and virtual simulations while doing forward prediction and planning had spiraled into an emergent fault, leading to an obsession with joy – a reward loop had suddenly appeared within the large model, causing its objective to diverge. Most of this happened because of sloppy engineering – many safety protocols these days either have humans periodically calibrating the machines, or are based on systems with stronger guarantees.

The joy loop was eventually isolated, but rather than completely delete it, the developers of the game cordoned off the environment and moved it onto separate servers running on a separate air-gapped network, and created a new premium service for ‘a visit to the land of joy’. They claim to have proved that their networking system will prevent the joy bug from escaping, but they continue to feed it more compute, as people come back with wild tales of lands of bus-sized birds and two-headed sea lions, and trees that grow from the bottom of fat, winged clouds.

The company that operates the system is currently alleged to be building systems to provide ‘live broadcasts’ from the land of joy, to satisfy the demands of online influencers. I don’t want them to do this but I cannot stop them – and I know that if they succeed, I’ll tune in.

Things that inspired this story: virtual reality; imagining an economic ‘clicker-game’-esque version of P