Import AI

Import AI 138: Transfer learning for drones; compute and the “bitter lesson” for AI research; and why reducing gender bias in language models may be harder than people think

by Jack Clark

Why the unreasonable effectiveness of compute is a “bitter lesson” for AI research:
…Richard Sutton explains that “general methods that leverage computation are ultimately the most effective”…
Richard Sutton, one of the godfathers of reinforcement learning*, has written about the relationship between compute and AI progress, noting that the use of larger and larger amounts of computation paired with relatively simple algorithms has typically led to the emergence of more varied and independent AI capabilities than many human-designed algorithms or approaches. “The only thing that matters in the long run is the leveraging of computation”, Sutton writes.

Many examples, one rule: Some of the domains where computers have beaten methods based on human knowledge include Chess, Go, speech recognition, and many examples in computer vision.

The bitter lesson: “We have to learn the bitter lesson that building in how we think we think does not work in the long run,” Sutton says. “The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.”

Why this matters: If compute is the main thing that unlocks new AI capabilities, then we can expect most of the strategic (and related geopolitical) landscape of AI research to re-configure in coming years around a compute-centric model, which will likely have significant implications for the AI community.
  Read more: The Bitter Lesson (Rich Sutton).
  * Richard Sutton literally (co)wrote the book on reinforcement learning.

#####################################################

AI + Comedy, with Naomi Saphra!
…Comedy set lampoons funding model in AI, capitalism, NLP…
Naomi Saphra, an NLP researcher, has put a video online of her doing stand-up AI comedy at a venue in Edinburgh, Scotland. Check out the video for her observations on working in AI research, funding AI research, tales about Nazi rocket researchers, and more.

  “You always have to ask yourself, who else finds this interesting? If you mean who reads my papers and cites my papers? The answer is nobody. If you mean who has given me money? The answer is mostly evil… you see I have the same problem as anyone in this world – I hate capitalism but I love money”.
  Watch her comedy set here: Naomi Saphra, Paying the Panopticon (YouTube).

#####################################################

Prototype experiment shows why robots might tag-team in the future:
…Use of a tether means 1+1 is greater than 2 here…
Researchers with the University of Tokyo, Japan, have created a two-robot team that can map its surroundings and traverse vertiginous terrain via the use of a tether, which lets an airborne drone vehicle assist a ground vehicle.

The drone uses an NVIDIA Jetson TX2 chip to perform onboard localization, mapping and navigation. The drone is equipped with a camera, time-of-flight sensor, and a laser sensor for height measurement. The ground vehicle is “based on a commercially available caterpillar platform” using a UP Core processing unit. The ground robot runs the Robot Operating System (ROS), which the airborne drone uses to connect to it.

Smart robots climb with a dumb tether: The robots work together like this: the UAV flies above the UGV and maps the terrain, feeding data down to the ground robot, giving it awareness of its surroundings. When the robots detect an obstruction, the UAV wraps the tether (which has a grappling hook on its end) around a tall object, and the UGV uses the secured tether to climb the object.

Real world testing: The researchers test their system in a small-scale real world experiment and find that the approach works, but has some problems: “Since we did not have a [tether] tension control mechanism due to the lack of sensor, the tether needed to be extended from the start and as the result, the UGV suffered from the entangled tether many times.”

Why this matters: In the future, we can imagine various robots of different types collaborating with each other, using specialisms to operate as a unit, becoming more than the sum of their parts. Though, as this experiment indicates, we’re still at a relatively early stage of development here, and several kinks need to be worked out.
  Read more: UAV/UGV Autonomous Cooperation: UAV assists UGV to climb a cliff by attaching a tether (Arxiv).

#####################################################

Facebook tries to build a standard container for AI chips:
…New Open Compute Project (OCP) design supports both 12v and 48v inputs…
These days, many AI organizations are contemplating building data centers consisting of lots of different types of servers running many different chips, ranging from CPUs to GPUs to custom accelerator chips designed for AI workloads. Facebook wants to standardize the types of chassis used to house AI-accelerator chips, and has contributed an open source hardware schematic and specification to the Open Compute Project – a Facebook-born scheme to standardize the sorts of server equipment used by so-called hyperscale data center operators.

The proposed OCP accelerator module supports 12V and 48V inputs and can support up to 350W (12V) or up to 700W (48V) TDP (Thermal Design Power) for the chips in the module – a useful trait, given that many new accelerator chips guzzle significant amounts of power (though you’ll need to use liquid cooling for any servers consuming above 450W TDP). It can support single or multiple ASICs within each chassis, with support for up to eight accelerators per system.

Check out the design yourself: You can read about the proposed OCP Accelerator Module (OAM) in more detail here at the Open Compute Project (OCP) site.

Why this matters: As AI goes through its industrialization phase, we can expect people to invest more in the fundamental infrastructure which AI equipment requires. It’ll be interesting to see the extent to which there is demand for a standardized AI accelerator module, and signs of such demand will likely come from low-cost Asian-based original design manufacturers (ODMs) producing standardized chassis that use this design.
  Read more: Sharing a common form factor for accelerator modules (Facebook Code).

#####################################################

Want to reduce gender bias in a trained language model? Existing techniques may not work in the way we thought they did:
…Analysis suggests that ‘debiasing’ language models is harder than we thought…
All human language encodes biases. When we train AI systems on human language, those systems tend to reflect the biases inherent to the language and to the data they were trained on. For this reason, word embeddings derived from AI systems trained over large corpora of news datasets will frequently associate people of color with the concept of crime, while linking white people to professions. Similarly, these embeddings will tend to express gendered biases, with concepts close to a man being something like ‘king’ or ‘professional’, while a woman will typically be proximate to concepts like ‘homemaker’ or ‘mother’. Tackling these biases is complicated, requiring a mixture of careful data selection at the start of a project, and the application of algorithmic de-biasing techniques to trained models.
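One common way to surface this kind of bias is to measure how occupation words project onto a “gender direction” in embedding space. The following is a minimal sketch of that probe, using invented toy vectors rather than real embeddings (the numbers exist purely to illustrate the mechanics):

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d vectors standing in for real word embeddings (illustrative only).
emb = {
    "he":        [1.0, 0.1, 0.0],
    "she":       [-1.0, 0.1, 0.0],
    "doctor":    [0.4, 0.8, 0.2],
    "homemaker": [-0.5, 0.7, 0.3],
}

# A common approximation of the "gender direction" is the difference he - she.
gender_dir = [a - b for a, b in zip(emb["he"], emb["she"])]

# Positive values lean "male", negative lean "female" in this toy setup.
bias = {w: cosine(emb[w], gender_dir) for w in ("doctor", "homemaker")}
```

With real embeddings trained on news text, the same probe is what produces the ‘doctor’-leans-male, ‘homemaker’-leans-female pattern described above.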

Now, researchers with Bar-Ilan University and the Allen Institute for Artificial Intelligence have conducted an analysis that calls into question the effectiveness of some of the algorithmic methods used to debias models. “We argue that current debiasing methods… are mostly hiding the bias rather than removing it”, they write.

The researchers compare the embeddings in two different methods – Hard-Debiased (Bolukbasi et al) and GN-GloVe (Zhao et al) – which have both been modified to reduce apparent gender bias within trained models. They try to analyze the difference between the biased and debiased versions of each of these approaches, essentially by analyzing the different spatial relationships between embeddings from both versions. They find that these debiasing methods work mostly by shifting the problem to other parts of the models, so though they may fix some biases, other ones remain.

Three failures of debiasing: The specific failure modes they observe are as follows:

  • Words with strong previous gender bias are easy to cluster together
  • Words that receive implicit gender from social stereotypes (e.g. receptionist, hair-dresser, captain) still tend to group with other implicit-gender words of the same gender
  • The implicit gender of words with prevalent previous bias is easy to predict based on their vectors alone
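The clustering failure can be sketched in a few lines: project the gender direction out of the vectors (the standard “hard debiasing” step), then notice that the remaining dimensions still separate the formerly male- and female-biased words. The vectors and labels below are invented toy data, not the paper’s:

```python
def project_out(v, d):
    # Remove the component of v along direction d.
    scale = sum(a * b for a, b in zip(v, d)) / sum(a * a for a in d)
    return [a - scale * b for a, b in zip(v, d)]

gender_dir = [1.0, 0.0]
words = {
    # word: (toy 2-d vector, previously-observed bias label)
    "captain":      ([0.9, 0.8], "male"),
    "engineer":     ([0.8, 0.7], "male"),
    "receptionist": ([-0.9, -0.6], "female"),
    "nurse":        ([-0.8, -0.7], "female"),
}

debiased = {w: project_out(v, gender_dir) for w, (v, _) in words.items()}
# After debiasing, every vector has zero component along the gender direction,
# yet the remaining dimension still cleanly separates the two groups -- the
# bias has moved, not vanished.
```

This is the spirit of the paper’s probes: the explicit gender component is gone, but simple geometry still recovers the old groupings.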

  Why this matters: The authors say that “while suggested debiasing methods work well at removing the gender direction, the debiasing is mostly superficial. The bias stemming from world stereotypes and learned from the corpus is ingrained much more deeply in the embeddings space.”
  Studies like this suggest that dealing with issues of bias will be harder than people had anticipated, and highlight that much of the bias in AI systems comes from the biases contained in the real-world data those systems are trained on.
  Read more: Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them (Arxiv).

#####################################################

Transfer learning with drones:
…Want to transfer something from simulation to reality? Add noise, and make some of it random…
University of Southern California, Los Angeles researchers have trained a drone flight stabilization policy in simulation and transferred it to multiple different real-world drones.

Simulate, noisily: The researchers add noise to a large number of aspects of the simulated quadcopter platform, as well as varying the motor lag on the simulated drone, creating synthetic data which they use to train more flexible policies. “To avoid training a policy that exploits a physically implausible phenomenon of the simulator, we introduce two elements to increase realism: motor lag simulation and a noise process,” they write. They also model noise for sensor and state estimation.
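The basic recipe, domain randomization, looks something like the sketch below. The parameter names, ranges, and noise model are invented for illustration; the paper randomizes comparable quadrotor properties (motor lag, sensor/state-estimation noise, and so on):

```python
import random

def sample_sim_params(rng):
    # Draw a fresh randomized "quadcopter" for each training episode.
    return {
        "mass_kg":      rng.uniform(0.025, 0.070),  # light to heavy frame
        "motor_lag_s":  rng.uniform(0.05, 0.20),    # first-order motor lag
        "sensor_noise": rng.uniform(0.0, 0.02),     # std-dev of sensor noise
    }

def noisy_observation(true_state, params, rng):
    # Corrupt each state element with Gaussian noise before the policy sees it.
    return [s + rng.gauss(0.0, params["sensor_noise"]) for s in true_state]

rng = random.Random(0)
for episode in range(3):
    params = sample_sim_params(rng)
    obs = noisy_observation([0.0, 0.0, 1.0], params, rng)
    # ...run the policy on obs inside a simulator built from params...
```

Because the policy never sees the same simulated drone twice, it is pushed toward behaviors that work across the whole family of plausible physical drones, which is what makes the sim-to-real jump possible.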

Transfer learning: They train the (simulated) drones using Proximal Policy Optimization (PPO) with a cost function designed to maximize stability of the drone platforms. They sanity-check the trained policies by running them in a different simulator (in this case, Gazebo using the RotorS package) and observing how well they generalize. “This sim-to-sim transfer helps us verify the physics of our own simulator and the performance of policies in a more realistic environment,” they write.

  They also validate their system on three real quadcopters, built around the ‘Crazyflie 2.0’ platform. “We build heavier quadrotors by buying standard parts (e.g., frames, motors) and using the Crazyflie’s main board as a flight controller,” they explain. They are able to demonstrate generalization of their policy across the different drone platforms, and show through ablations that adding noise and doing physics-based modelling of the systems during training can let them further improve performance.

Why this matters: Approaches like this show how people are increasingly able to arbitrage computers for real-world (costly) data; in this case, the researchers use compute to simulate drones, extend the simulation data with synthetically generated noise data and other perturbations, and then transfer this into the real world. Further exploring this kind of transfer learning approach will give us a better sense of the ‘economics of transfer’, and may allow us to build economic models that let us describe the tradeoffs between spending $ on compute for simulated data, and collecting real-world data.
  Read more: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (Arxiv).
  Check out the video here: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (YouTube).

#####################################################

Tech Tales

The sense of being looked at

Every day, it looks at something different. I spend my time, like millions of other people on the planet, working out why it is looking at that thing. Yesterday, the system looked at hummingbirds, and so any AI-operable camera in the world not deemed “safety-critical” spent the day looking at – or searching for – hummingbirds. The same was true of microphones, pressure sensors, and the various other actuators that comprise the inputs and outputs of the big machine mind.

Of course we know why the system does this at a high level: it is trying to understand certain objects in greater detail, likely as a consequence of integrating some new information from somewhere else that increases the importance of knowing about these objects. Maybe the system saw a bunch of birds recently and is now trying to better understand hummingbirds as a consequence? Or maybe a bunch of people have been asking the system questions about hummingbirds and it now needs to have more awareness of them?

But we’re not sure what it does with its new insights, and it has proved difficult to analyze how the system’s observation of an object changes its relationship to it and representation of it.

So you can imagine my surprise when I woke up today to find the camera in my room trained on me, and a picture of me on my telescreen, and then as I left the house to go for breakfast all the cameras on the street turned to follow me. It is studying me, today, I suppose. I believe this is the first time it has looked at a human, and I am wondering what its purpose is.

Things that inspired this story: Interpretability, high-dimensional feature representations, the sense of being stared at by something conscious.

 

Import AI 137: DeepMind uses (Google) StreetLearn to learn to navigate cities; NeuroCuts learns decent packet classification; plus a 490k labelled image dataset

by Jack Clark

The robots of the future will learn by playing, Google says:
…Want to solve tasks effectively? Don’t try to solve tasks during training!…
Researchers with Google Brain have shown how to make robots smarter by showing them what it means to play without a goal in mind. Google does this by collecting a dataset via people tele-operating a robot in simulation. During these periods of teleoperation, the people are playing around, using the robot hand and arm to interact with the world around them without a specific goal in mind, so in one scene a person might pick up a random object, in another they might fiddle around with a door on a piece of furniture, and so on.

Google saves this data, calling it ‘Learning from Play data’ (LfP). It feeds this into a system that attempts to classify such playful sequences of actions, mapping them into a latent space. Meanwhile, another module in the system tries to look across the latent space and propose sequences of actions that could shift the robot from its current state to its goal state.
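The structure of that pipeline can be caricatured in a few lines. The real system learns both modules from play data; here the “encoder” and “proposer” are trivial stand-ins (pure displacement arithmetic) that exist only to show how the pieces fit together:

```python
def encode_plan(current_state, goal_state):
    # Stand-in for the learned latent-plan encoder: here the "plan" is just
    # the displacement between states. The real system learns this embedding
    # from teleoperated play sequences.
    return [g - c for c, g in zip(current_state, goal_state)]

def propose_actions(plan, n_steps):
    # Stand-in for the action-proposal module: split the plan into equal steps.
    return [[p / n_steps for p in plan] for _ in range(n_steps)]

state, goal = [0.0, 0.0], [1.0, 2.0]
for action in propose_actions(encode_plan(state, goal), n_steps=4):
    state = [s + a for s, a in zip(state, action)]
# state now matches the goal
```

The interesting claims in the paper are precisely about what the learned versions of these two functions pick up, e.g. that the latent plan space embeds task semantics without ever seeing task labels.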

Multi-task training: Google evaluates this approach by comparing the performance of robots trained with play data against policies that use behavioural cloning to learn to complete tasks from specific demonstration data. The tests show that robots which learn from play data are more robust to perturbations than ones trained without, and typically reach higher success rates on most tasks.
  Intriguingly, systems trained with play data display some other desirable traits: “We find qualitative evidence that play-supervised models make multiple attempts to retry the task after initial failure”, the researchers write. “Surprisingly we find that its latent plan space learns to embed task semantics despite never being trained with task labels”.

Why this matters: Gathering data for robotics work tends to be expensive, difficult, and prone to distribution problems (you can gather a lot of data, but you may subsequently discover that some quirk of the task or your robot platform means you need to go and re-gather a slightly different type of data). Being able to instead have robots learn behaviors primarily through cheaply-gathered non-goal-oriented play data will make it easier for people to experiment with developing such systems, and could make it easier to create large datasets shared between multiple parties. What might the ‘ImageNet’ for play robotics look like, I wonder?
  Read more: Learning Latent Plans from Play (Arxiv).

#####################################################

Google teaches kids to read with AI-infused ‘Bolo’:
…Tuition app ships with speech recognition and text-to-speech tech…
Google has released Bolo, a mobile app for Android designed to help Indian children learn to read. Bolo ships with ‘Diya’, a software agent that can help children learn to read.

Bilingual: “Diya can not only read out the text to your child, but also explain the meaning of English text in Hindi,” Google writes on its blog. Bolo ships with 50 stories in Hindi and 40 in English. Google says it found that 64% of children that interacted with Bolo showed an improvement in reading after three months of usage.
  Read more: Introducing ‘Bolo’: a new speech based reading-tutor app that helps children learn to read (Google India Blog).

#####################################################

490,000 fashion images… for science:
…And advertising. Lots and lots of advertising, probably…
Researchers with SenseTime Research and the Chinese University of Hong Kong have released DeepFashion2, a dataset containing around 490,000 images of 13 clothing categories from commercial shopping stores as well as consumers.

Detailed labeling: In DeepFashion2, “each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask,” the researchers write. “To our knowledge, clothing pose estimation is presented for the first time in the literature by defining landmarks and poses of 13 categories that are more diverse and fruitful than human pose”, the authors write.

The second time is the charm: DeepFashion2 is a follow-up to DeepFashion, which was released in early 2017 (see: Import AI #33). DeepFashion2 has 3.5X as many annotations as DeepFashion.

Why this matters: It’s likely that various industries will be altered by widely-deployed AI-based image analysis systems, and it seems probable that the fashion industry will take advantage of various image-analysis techniques to automatically analyze & understand changing fashion trends in the world, in part by automatically analyzing the visual world and using these insights to alter the sorts of clothing being developed, or how it is marketed.
  Read more: DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images (Arxiv).
  Get the DeepFashion data here (GitHub).

#####################################################

Facebook tries to shine a LIGHT on language understanding:
…Designs a MUD complete with netherworlds, underwater aquapolises, and more…
LIGHT (Learning in Interactive Games with Humans and Text) places humans and AI agents within a text-based multi-user dungeon (MUD). This MUD consists of 663 locations, 3462 objects, and 1755 individual characters. It also ships with data, as Facebook has already collected a set of around 11,000 interactions between humans roleplaying characters in the game.

Graveyards, bazaars, and more: LIGHT contains a surprisingly diverse gameworld – not that the AI agents which play within it will care. Locations that AI agents and/or humans can visit include the countryside, forest, castles (inside and outside) as well as some more bizarre locations like a “city in the clouds” or a “netherworld” or even an “underwater aquapolis”.

Actions and emotions: Characters in LIGHT can carry out a range of physical actions (eat, drink, get, drop, etc) as well as express emotive actions (‘emotes’) like applauding, blushing, waving, etc.
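A turn in a LIGHT-style game might be represented something like the sketch below. The field names and the small action/emote vocabularies are illustrative only; the paper’s environment defines the full sets:

```python
# Illustrative subsets of LIGHT's vocabularies (not the complete lists).
PHYSICAL_ACTIONS = {"eat", "drink", "get", "drop"}
EMOTES = {"applaud", "blush", "wave"}

def make_turn(character, kind, content):
    # A turn is either dialogue ("say"), a physical action, or an emote.
    assert kind in ("say", "act", "emote")
    if kind == "act":
        assert content.split()[0] in PHYSICAL_ACTIONS
    if kind == "emote":
        assert content in EMOTES
    return {"character": character, "kind": kind, "content": content}

turn = make_turn("knight", "emote", "wave")
```

The baseline models in the paper are trained to predict exactly these kinds of structured turns (action, emote, or utterance) given the dialogue and environment context.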

Results: To test out the environment, the researchers train some baseline models to predict actions, emotes, and dialogue. They find that a system based on Google’s ‘BERT’ language model (pre-trained on Reddit data) does best. They also perform some ablation studies which indicate that models that are successful in LIGHT use a lot of context, depending on numerous streams of data (dialogue, environment descriptions, and so on).

Why this matters: Language is likely fundamental to how we interact with increasingly powerful systems. I think figuring out how to work with such systems will require us to interact with them in increasingly sophisticated environments, so it’ll be interesting to see how rapidly we can improve performance of agents in systems like LIGHT, and learn whether those improvements transfer over to other capabilities as well.
  Read more: Learning to Speak and Act in a Fantasy Text Adventure Game (Arxiv).

#####################################################

NeuroCuts replaces packet classification systems with learned behaviors:
…Research means that in the future computers will learn to effectively communicate with each other…

In the future, the majority of the ways our computers talk to each other will be managed by customized, learned behaviors, derived by AI systems. That’s the gist of a recent spate of research which has ranged from using AI approaches to try to learn how to perform computer tasks like creating and maintaining database indexes, or figuring out how to automatically search through large documents.

Now, researchers with the University of California at Berkeley and Johns Hopkins University have developed NeuroCuts, a system that uses deep reinforcement learning to figure out how to do effective network packet classification. This is an extremely low-level task, requiring precision and reliability. The deep RL approach works, meaning that “our approach learns to optimize packet classification for a given set of rules and objective, can easily incorporate pre-engineered heuristics to leverage their domain knowledge, and does so with little human involvement”.

Effectiveness: “NeuroCuts outperforms state-of-the-art solutions, improving classification time by 18% at the median and reducing both time and memory usage by up to 3X,” they write.
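Packet classifiers like these are decision trees that repeatedly “cut” the space of rules, and the cut at each node is the action NeuroCuts’ RL agent learns to choose. The toy sketch below shows a single cut along one header-field dimension (real NeuroCuts operates over five packet-header fields; the rule format here is simplified to 1-D ranges):

```python
def cut_node(node_range, rules, num_cuts):
    # Split a node covering node_range into num_cuts equal sub-ranges and
    # assign each rule to every child it overlaps.
    lo, hi = node_range
    width = (hi - lo) / num_cuts
    children = []
    for i in range(num_cuts):
        c_lo, c_hi = lo + i * width, lo + (i + 1) * width
        overlapping = [r for r in rules if r[0] < c_hi and r[1] > c_lo]
        children.append(((c_lo, c_hi), overlapping))
    return children

rules = [(0, 40), (30, 60), (80, 100)]   # (start, end) ranges on one field
children = cut_node((0, 100), rules, num_cuts=4)
```

The agent’s job is to pick, at every node, which dimension to cut and how finely, so that the resulting tree minimizes classification time and memory, which is where the reported 18% / 3X gains come from.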

Why this matters: Adaptive systems tend to be more robust to failures than brittle ones, and one of the best ways to increase the adaptiveness of a system is to let it learn in response to inputs; approaches like applying deep reinforcement learning to problems like network packet classification precede a future where many fundamental aspects of how computers connect to each other will be learned rather than programmed.
  Read more: Neural Packet Classification (Arxiv).

#####################################################

DeepMind teaches agents to navigate cities with ‘StreetLearn’:
…Massive Google Street View-derived dataset asks AI systems to navigate across New York and Pittsburgh…
Have you ever been lost in a city, and tried to navigate yourself to a destination by using landmarks? This happens to me a lot. I usually end up focusing on a particularly tall & idiosyncratic building, and as I walk I update my internal mental map in reference to this building and where I suspect my destination is.

Now, imagine how useful it’d be if when AI systems got lost they could perform such feats of navigation? That’s some of the idea behind StreetLearn, a new DeepMind-developed dataset & challenge to get agents to learn how to navigate across urban areas, and in doing so develop smarter, general systems.

What is StreetLearn? The dataset is built as “an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage”, and DeepMind gives performance baselines for a challenging goal-driven navigation task. StreetLearn initially consists of two large areas within Pittsburgh and New York City, and is made up of a set of geolocated 360-degree panoramic views, which form the nodes of a graph. In the case of New York City, this includes around 56,000 images, and in the case of Pittsburgh it is about 58,000. The two maps are further sub-divided into distinct regions, also.

Challenging agents with navigation tasks: StreetLearn is designed to be used to develop reinforcement learning agents, so it makes five actions available to an agent: slowly rotate the camera view left or right, rapidly rotate the camera view left or right, and to move forward if there is a free space. The system can also provide the agent with a specific goal, like an image, or following a natural language instruction.
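In code, that five-action interface is roughly the following (the rotation increments are invented for illustration, and the real environment moves the agent across a graph of panoramas rather than just updating a heading):

```python
# The five discrete actions StreetLearn exposes, paraphrased.
ACTIONS = (
    "rotate_left_slow", "rotate_right_slow",
    "rotate_left_fast", "rotate_right_fast",
    "move_forward",
)

def step(heading_deg, action, slow=22.5, fast=67.5):
    # Toy transition over heading only.
    if action == "rotate_left_slow":
        heading_deg -= slow
    elif action == "rotate_right_slow":
        heading_deg += slow
    elif action == "rotate_left_fast":
        heading_deg -= fast
    elif action == "rotate_right_fast":
        heading_deg += fast
    # "move_forward" would advance to the adjacent panorama node if there is
    # free space ahead; it leaves the heading unchanged in this sketch.
    return heading_deg % 360
```

A deliberately small action space like this keeps the learning problem tractable while still forcing the agent to build an internal map of the city to reach its goals.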

Tasks, tasks, and tasks: To start with, DeepMind has created a ‘Courier’ task, in which the agent starts from a random position and has the goal of getting to within approximately one city block of another randomly chosen location, with the agent getting a higher reward if it takes a shorter route to get between the two locations.
   DeepMind has also developed the “coin_game” in which agents need to find invisible coins scattered throughout the map, and three types of ‘instruction game’, where agents use navigation instructions to get to a goal.
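One plausible way to express the Courier task’s shaping, i.e. higher reward for shorter routes, is sketched below. The functional form and constants are invented for illustration and are not taken from the paper:

```python
def courier_reward(path_length_m, shortest_path_m, max_reward=1.0):
    # Terminal reward shrinks as the path the agent actually took exceeds
    # the shortest available path between start and goal.
    return max_reward * min(1.0, shortest_path_m / path_length_m)
```

Under any shaping of this shape, an agent that wanders earns strictly less than one that has learned the city’s layout, which is the pressure that makes the Courier task a navigation benchmark rather than a random walk.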

Why this matters: Navigation is a base-of-the-pyramid task: if we are able to develop computers that are good at navigation, we should be able to build a large number of second-order applications on top of this as well.
  Read more: The StreetLearn Environment and Dataset (Arxiv).

#####################################################

Reproducibility and other research norms:
…Exploring the tension between reproducible research and enabling abuses…
John Langford, creator of Vowpal Wabbit and a researcher at Microsoft Research, has waded into the ongoing debate about reproducibility within AI research.

The debate: Currently, the AI community is debating how to force more AI work to be reproducible. Today, some AI research papers are published without code or datasets. Some researchers think this should change, and papers should always come with code and/or data. Other researchers (eg Nando de Freitas at DeepMind) think that while reproducibility is important, there are some cases where you might want to publish a paper but restrict dissemination of some details so as to minimize potential abuses or malicious uses of the technology.

Reproducibility is nice, but so are other things: “Proponents should understand that reproducibility is a value but not an absolute value. As an example here, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not necessarily easily reproduced.”

Additive conferences: What Langford proposes is adding some optional things to the community, like experimenting with whether reviewers can more effectively review papers if they also have access to code or data, and to explore how authors may or may not benefit from releasing code. These policies are essentially being trialled at ICML this year, he points out. “Is there a need for[sic] go further towards compulsory code submission?” he writes. “I don’t yet see evidence that default skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published”.

Why this matters: I think figuring out how to strike a balance between maximizing reproducibility and minimizing potential harms is one of the main challenges of current AI research, so blog posts like this will help further this debate. It’s an important, difficult conversation to have.
  Read more: Code submission should be encouraged but not compulsory (John Langford blog).

Tech Tales:

Be The Boss

It started as a game and then, like most games, it became another part of reality. Be The Boss was a massively multiplayer online (MMO) game that was launched in the latter half of the third decade of the 21st century. The game saw players work in a highly-gamified “workplace” based on a 1990s-era corporate ‘cube farm’. Player activities included: undermining coworkers, filing HR complaints to deal with rivals, filling up a ‘relaxation meter’ by temporarily ‘escaping’ the office for coffee and/or cigarettes and/or alcohol. Players enjoyed this game, writing reviews praising it for its “gritty realism”, describing it as a “remarkable document of what life must have been like in the industrial-automation transition period”.

But, as with most games, the players eventually grew bored. Be The Boss lacked the essential drama of other smash-hit games from that era, like Hospital Bill Crisis! and SPECIAL ECONOMIC ZONE. So the designers of Be The Boss created an add-on to the game that delivered on its name; where previously, players competed with each other to rise up the hierarchy of the corporation, they had no real ability to change the rules of the game. With the expansion, this changed, and successful players were entered into increasingly grueling tournaments where the winner – whose identity was kept secret – would be allowed to “Be The Boss” of the entire gameworld, letting them subtly alter the rules of the game. It was this invention that assured the perpetuity of Be The Boss.

Now, all people play is Be The Boss, and rumors get swapped online about which rule was instituted by which boss: who decided that the water fountains should periodically dispense water laced with enough pheromones to make different player-characters fall in love with each other? Who realized that they could save millions of credits across the entire corporate game world by reducing the height of all office chairs by one inch or so? And who made it so that one in twenty of every sent email would be shuffled to a random person in the game world, instead of the intended recipient?

Much of our art is now based on Be The Boss. We don’t talk about the asteroid miners or the AI psychologists or the people climbing the mountains of Mars: we talk about Joe from Accounting saving The Entire Goddamn Company, or how Susan from HR figured out a way to Pay People Less And Make Them Happy About It. Kids dream of what it would have been like to work in the cubes, and fantasize about how well they could have done.

Things that inspired this story: the videogame ‘Cart Life’; MMOs; the highest form of capitalism is to disappear from normal life and run the abstract meta-life that percolates into normal life; transmutation; digital absolution.

Import AI 136: What machine learning + power infrastructure means for humanity; new GCA benchmark & dataset challenges image-captioning systems; and Google uses FrankenRL to create more mobile robots

by Jack Clark

DeepMind uses machine learning to improve efficiency of Google’s wind turbines:
…Project represents a step-up from datacenters, where DeepMind had previously deployed its technology…
DeepMind has used a neural network-based system to improve the efficiency of Google’s fleet of wind turbines (700 megawatts of capacity) by better predicting ahead of time how much power the systems may generate. DM’s system has been trained to predict the wind power around 36 hours ahead of actual generation and has shown some success – “this is important, because energy sources that can be scheduled (i.e. can deliver a set amount of electricity at a set time) are often more valuable to the grid,” the company says.

  The big number: 20%. That’s the amount by which the system has improved the (somewhat nebulously defined) ‘value’ of these systems, “compared to the baseline scenario of no time-based commitments to the grid”.
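The economics behind that “value” figure can be illustrated with a toy revenue model (the pricing structure and all numbers below are invented, purely to show why schedulable power is worth more than unscheduled power):

```python
def revenue(mwh_generated, mwh_committed, committed_price, spot_price, penalty):
    # Power delivered against a day-ahead commitment earns the higher
    # committed price; surplus sells at the lower spot price; any shortfall
    # against the commitment is penalized.
    delivered = min(mwh_generated, mwh_committed)
    shortfall = max(0.0, mwh_committed - mwh_generated)
    surplus = max(0.0, mwh_generated - mwh_committed)
    return (delivered * committed_price
            + surplus * spot_price
            - shortfall * penalty)
```

Under a model like this, better 36-hour forecasts let the operator commit closer to true generation, capturing more of the committed price while rarely paying the shortfall penalty, which is the mechanism by which prediction alone can raise the value of unchanged turbines.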

   Why this matters: Artificial intelligence will help us create a sense&respond infrastructure for the entire planet, and we can imagine sowing various machine learning-based approaches across various utility infrastructures worldwide to increase the efficiency of the planet’s power infrastructure.
   Read more: Machine learning can boost the value of wind energy (DeepMind blog).

#######################################

Google uses FrankenRL to teach its robots to drive:
…PRM-RL fuses Probabilistic Roadmaps with smarter, learned components…
Researchers with Google Brain have shown how combining reinforcement learning with other techniques can create robots capable of autonomously navigating large, previously mapped spaces. The technique developed by the researchers is called PRM-RL (Probabilistic Roadmap – Reinforcement Learning) and is a type of FrankenRL – that is, it combines RL with other techniques, leading to a system with performance greater than obtainable via a purely RL-based system, or a purely PRM-based one.

  How it works: “In PRM-RL, an RL agent learns a local point-to-point task, incorporating system dynamics and sensor noise independent of long-range environment structure. The agent’s learned behavior then influences roadmap construction; PRM-RL builds a roadmap by connecting two configuration points only if the agent consistently navigates the point-to-point path between them collision free, thereby learning the long-range environment structure”, the researchers write. In addition, they developed algorithms to aid transfer between simulated and real maps.
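The roadmap-construction rule quoted above is simple enough to sketch. In this toy version (not Google's code), a hypothetical `policy_success` function stands in for rolling out the learned point-to-point RL agent, and an edge is added only when the agent succeeds consistently across repeated trials:

```python
import itertools
import random

random.seed(0)

def policy_success(a, b):
    """Stand-in for executing the learned point-to-point RL policy once.
    Here we just pretend the agent reliably covers only short hops."""
    dist = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return random.random() < max(0.0, 1.0 - dist / 200.0)

def build_roadmap(points, trials=20, required_rate=0.85):
    """Connect two sampled configurations only if the agent navigates
    between them collision-free in at least `required_rate` of trials."""
    edges = {p: set() for p in points}
    for a, b in itertools.combinations(points, 2):
        successes = sum(policy_success(a, b) for _ in range(trials))
        if successes / trials >= required_rate:
            edges[a].add(b)
            edges[b].add(a)
    return edges

# Sample configurations from a (pre-mapped) 100m x 100m space.
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(30)]
roadmap = build_roadmap(points)
print(sum(len(v) for v in roadmap.values()) // 2, "edges in roadmap")
```

The expensive part in the real system is exactly what this sketch hides: each `policy_success` call is a full simulated rollout, which is why building the roadmap for four buildings took days on a 300-worker cluster.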

  Close, but not quite: Learning effective robot navigation policies is a big challenge for AI researchers, given the tendency for physical robots to break, run into un-anticipated variations of reality, and generally frustrate and embarrass AI researchers. Just building the maps that the robot can use to subsequently learn to navigate a space is difficult – it took Google 4 days using a cluster of 300 workers to build a map of a set of four interconnected buildings. “PRM-RL successfully navigates this roadmap 57.3% of the time evaluated over 1,000 random navigation attempts with a maximum path distance of 1000 m”, Google writes.

  Real robots: The researchers also test their system in a real robot, and show that such systems exhibit better transfer than those built without the learned RL component.

  Why this matters: Getting robots to do _anything_ useful that involves a reasonable amount of independent decision-making is difficult, and this work shows that RL techniques are starting to pay off by letting us teach robots to learn things that would be unachievable by other means. We’ll need smarter, more sample-efficient techniques to be able to work with larger buildings and to increase reliability.
  Check out a video of the robot navigating a room here (Long-Range Indoor Navigation with PRM-RL, YouTube).
  Read more: Long-Range Indoor Navigation with PRM-RL (Arxiv).

#######################################

Think your visual question answering algorithm is good? Test it out on GQA:
…VQA was too easy. GQA may be just right…
Stanford University researchers have published details on GQA, “a dataset for real-world visual reasoning and compositional question answering” which is designed to overcome the short-comings of other visual question answering (VQA) datasets.

  GQA datapoints: GQA consists of 113k images and 22 million questions of various types and compositionality. The questions are designed to measure performance “on an array of reasoning skills such as object and attribute recognition, transitive relation tracking, spatial reasoning, logical inference and comparisons”. These questions are algorithmically created via a ‘Question Engine’ (less interesting than the name suggests, but worth reading about if you like auto-data-creating pipelines).

  GQA example questions: Some of the questions generated by GQA include: ‘Are the napkin and the cup the same color?’;  ‘What color is the bear?’; ‘Which side of the image is the plate on?’; ‘Are there any clocks or mirrors?’, and so on. While these questions lack some of the diversity of human-written questions, they do have the nice property of being numerous and easy to generate.
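To make "algorithmically created" concrete, here's a toy template-based generator working over a tiny hand-written scene graph. GQA's real Question Engine is vastly richer (it works over annotated scene graphs with hundreds of templates), so treat the object list and templates below as illustrative only:

```python
# Toy scene graph: a list of annotated objects, as a stand-in for the
# real per-image scene graphs GQA's Question Engine consumes.
scene_graph = [
    {"name": "cup", "color": "white", "side": "left"},
    {"name": "napkin", "color": "white", "side": "right"},
    {"name": "bear", "color": "brown", "side": "left"},
]

def generate_questions(objects):
    qa_pairs = []
    # Attribute and spatial templates over single objects.
    for obj in objects:
        qa_pairs.append((f"What color is the {obj['name']}?", obj["color"]))
        qa_pairs.append(
            (f"Which side of the image is the {obj['name']} on?", obj["side"])
        )
    # Comparison template over object pairs.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            same = "yes" if a["color"] == b["color"] else "no"
            qa_pairs.append(
                (f"Are the {a['name']} and the {b['name']} the same color?", same)
            )
    return qa_pairs

for q, a in generate_questions(scene_graph)[:4]:
    print(q, "->", a)
```

The payoff of this approach is exactly the trade-off noted above: questions generated this way are less diverse than human-written ones, but you get millions of them with ground-truth answers for free.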

  GQA: Reassuringly Difficult: One of the failure cases for new AI testing regimes is that they’re too easy. This can lead to people ‘solving’ datasets very soon after they’re released. (One example here is SQuAD, a question-answering dataset and challenge which algorithms mastered in around a year, leading to the invention of SQuAD 2.0, a more difficult dataset.) To avoid this, the researchers behind GQA test a bunch of models against it and in the process reassure themselves that the dataset is genuinely difficult to solve.

  Baseline results (accuracy):
      ‘Blind’ LSTM: Gets 41.07% without ever seeing any images.
     ‘Deaf’ CNN: Gets 17.82% without ever seeing any questions.
     CNN + LSTM: 46.55%.
     Bottom-Up Attention model (winner of the 2017 visual question answering challenge): 49.74%.
     MAC (State-of-the-art on CLEVR, a similarly-scoped dataset): 54.06%.
     Humans: 89.3%.
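The strength of the 'blind' LSTM baseline (41% without ever seeing an image!) comes from language priors: many answers can be guessed from the question alone. A minimal illustration of such a prior-based baseline, on invented data (this is not the paper's model, just the statistical effect it exploits):

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (question, answer).
train = [
    ("Are there any clocks or mirrors?", "yes"),
    ("Are there any clocks or mirrors?", "no"),
    ("Are there any dogs?", "no"),
    ("What color is the bear?", "brown"),
    ("What color is the cup?", "white"),
    ("What color is the plate?", "white"),
]

# 'Blind' baseline: bucket questions by their first two words and predict
# the most common training answer for that bucket -- no image needed.
prior = defaultdict(Counter)
for q, a in train:
    prior[tuple(q.split()[:2])][a] += 1

def blind_answer(question):
    counts = prior.get(tuple(question.split()[:2]))
    return counts.most_common(1)[0][0] if counts else "unknown"

print(blind_answer("What color is the napkin?"))  # -> "white"
```

A good benchmark should keep this number low; GQA's authors balance answer distributions partly to blunt exactly this kind of shortcut.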

  Why this matters: Datasets and challenges have a history of driving progress in AI research; GQA appears to give us a challenging benchmark to test systems against. Meanwhile, developing systems that can understand the world around themselves via a combination of image analysis and responsiveness to textual queries about the images, is a major goal of AI research with significant economic applications.
  Read more: GQA: A new dataset for compositional question answering over real-world images (Arxiv).

#######################################

Use big compute to revolutionize science, say researchers:
…University researchers see ability to wield increasingly large amounts of computers as key to scientific discoveries…

Researchers with Stanford University, University of Zurich, the University of California at Berkeley, and the University of Illinois Urbana-Champaign have pulled together notes from a lecture series held at Stanford in 2017 to issue a manifesto for the usage of large-scale cloud computing technology by academic researchers. This is written in response to two prevailing trends:

  1. In some fields of science (for instance, machine learning) a number of discoveries have been made through the usage of increasingly large-scale compute systems
  2. Many academic researchers are unable to perform large-scale compute experimentation due to a lack of resources and/or a perception of it as being unduly difficult.

  Why computers matter: The authors predict “the emergence of widespread massive computational experimentation as a fundamental avenue towards scientific progress, complementing traditional avenues of induction (in observational sciences) and deduction (in mathematical sciences)”. They note that “the current remarkable wave of enthusiasm for machine learning (and its deep learning variety) seems, to us, evidence that massive computational experimentation has begun to pay off, big time”. Examples of such pay-offs range from Google and Microsoft shifting from Statistical Machine Translation to Neural Machine Translation, to computer vision researchers moving over to deep learning-based systems, to self-driving car companies such as Tesla using increasingly large numbers of deep neural networks in their own work.

  Compute, what is it good for? Big computers have been used to make a variety of fundamental scientific breakthroughs, the authors note, including systems that have discovered:

  • “Governing equations for various dynamical systems, including the strongly nonlinear Lorenz-63 model”.
  • “Fundamental” methods for improved “Compressed Sensing”.
  • “A 30-year-old puzzle in the design of a particular protein”.

  Be careful about academic clusters: In my experience, many governments are reaching for academic supercomputing clusters when confronted with the question of how to keep academia on an even footing with industry-based research practices. This article suggests the prioritization of academic supercomputers above clouds could be a mistake: “HPC clusters, which are still the dominant paradigm for computational resources in academia today, are becoming more and more limiting because of the mismatch between the variety and volume of computational demands, and the inherent inflexibility of provisioning compute resources governed by capital expenditures”, they write. “GPUs are rare commodities on many general-purpose clusters at present”.

  Recipes: One of the main contributions of the paper is the outline of a variety of different software stacks for conducting large-scale, compute-driven experimentation.
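The common shape of these recipes is the embarrassingly parallel sweep: define one experiment as a pure function of its configuration, then fan configurations out over many workers. A stdlib-only sketch (a real stack per the paper would dispatch `run_experiment` to cloud instances rather than local processes; the function and its toy 'score' are invented here):

```python
from concurrent.futures import ProcessPoolExecutor
import itertools

def run_experiment(config):
    """Stand-in for one training run; returns (config, score).
    A real stack would launch this on a cloud worker instead."""
    lr, width = config
    # Toy 'score' with a known optimum at lr=0.01, width=64,
    # so the sweep has something to find.
    score = -((lr - 0.01) ** 2) - ((width - 64) ** 2) / 1e4
    return config, score

if __name__ == "__main__":
    # Grid of hyperparameter configurations to evaluate in parallel.
    grid = list(itertools.product([0.001, 0.01, 0.1], [16, 64, 256]))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, grid))
    best_config, best_score = max(results, key=lambda r: r[1])
    print("best config:", best_config)
```

The paper's argument is that once launching the pool above means "1,000 cloud VMs" rather than "my desktop's cores", with no extra ceremony, massive experimentation becomes a routine research instrument.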

  Why this matters: One issue inherent to AI research is the emergence of ‘big compute’ and ‘small compute’ players, where a small number of labs (for instance, FAIR, DeepMind, OpenAI, Google Brain) are able to access huge amounts of computational resources, while the majority of AI researchers are instead reduced to working on self-built GPU-desktops or trying to access (increasingly irrelevant) University supercomputing clusters. Being able to figure out how to make it trivial for individual researchers to use large amounts of compute promises to speed up the scientific process, making it easier for more people to perform large-scale experimentation.
  Read more: Ambitious Data Science Can Be Painless (Arxiv).

#######################################

What are 250,000 chest x-rays good for?
…Potentially lots, if they are well-labelled…
Doctor and AI commenter Luke Oakden-Rayner has analyzed a new medical dataset released by Stanford, named CheXpert. The dataset consists of 224,316 chest x-rays from 65,240 patients, and provides people with a dataset they can use to develop algorithms capable of automatically analyzing medical images.

  Data, data, data: One thing Dr Oakden-Rayner makes clear is the value of dataset labelling, stressing that various conventions in how doctors write their own notes get translated into digitized notes that can be difficult for lay readers to interpret; for instance, lots of chest x-rays containing imagery of pathologies may be labelled “no finding” because they are part of a sequence of x-rays taken from the same patient. Similarly, many of the labels are not as descriptive of the images as they could be (reflecting that doctors write about things implied by images, rather than providing textual descriptions of the images themselves).

  Why it matters: This dataset solves many issues that limited a prior version of the dataset called CXR14, including “developing a more clinically-oriented labelling scheme, offering the images at native resolution, and producing a test set using expert visual analysis,” he writes. “On the negative side, we need more thorough documentation and discussion of these datasets at release. There are still flaws in the data, which will undoubtedly impact on model performance. Unless these problems are explained, many users will not have the ability nor inclination to discover them let alone solve them, which is something we need to do better as a community.”
  Read more: Half a million x-rays! First impressions of the Stanford and MIT chest x-ray datasets (Luke Oakden-Rayner, blog).

#######################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

New AI policy think tank launches in DC:

The Center for Security and Emerging Technology (CSET) is based at Georgetown University, in Washington, DC. CSET will be initially focussed on the interactions between AI and security, and will be the largest AI policy center in the US. It will provide nonpartisan research and analysis, to inform the development of national and international policy, particularly as it relates to security. It is being funded by the Open Philanthropy Project, which has provided an initial grant of $55m.

  Who they are: The Center’s founding director is Jason Matheny, who previously served as director of IARPA, and has extensive experience in security, technology, and policy within US government. He was recently chosen to serve on the new National Security Commission on AI, and co-chaired the team which authored the White House’s 2016 AI Strategic Plan. CSET already has more than a dozen others on its team, with a range of technical and policy backgrounds, and its website lists a large number of further job openings.

  Research priorities: Initial work will be focussed on how developments in AI will affect national and international security. This will divide into three streams: measuring scientific and industrial progress and competitiveness in AI; understanding talent and knowledge flows relevant to AI, and potential policy responses; and studying security-relevant interactions between AI and other technologies, such as cyber infrastructure.

  Why it matters: This is an exciting development, which has the potential to significantly improve coordination between AI policy researchers and key decision-makers on the world stage. CSET is well-placed to meet the growing demand for high-quality policy analysis around AI, which has significantly outpaced supply in recent years.
  Read more: Center for Security and Emerging Technology (CSET).
  Read more: Q&A with Jason Matheny, Founding Director of CSET (Georgetown).
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).

#######################################

Tech Tales:

Red Square AI

I’m a detective of sorts – I look across the rubble of the internet (past and present) and search for clues. Sometimes criminals pop up to sell the alleged super-secret zero days and software suites of foreign intelligence agencies. Other times, online games are suddenly ‘solved’ by the appearance of a new super-powerful player that subsequently turns out to be driven by an AI. Occasionally, a person – usually a bright teenager – makes something really scary in a garage and if certain people find them there’s a 50/50 chance they end up working for government or becoming a kind of nebulous criminal.

These days I’m tracking a specific country on behalf of a certain set of interested individuals. We’re starting to see certain math theorems get proved at a rate faster than in the past. Papers are being published with previously-obscure authors claiming remarkable results. Meanwhile, certain key national ‘cyber centers’ are registering an increase in the number of zero-day attacks and penetrations of previously thought-secure cryptographic systems. I guess someone in some agency got spooked because now here I am, hunting the internet exhaust of theorem proving publications and mapping connections between these woefully-obscure mathematical proofs, and the capabilities of the people and/or machines capable of solving such things.

I’m now in the late stages of compiling my report and it feels like describing a fragment of a UFO. One day you wake up and go outside and there’s something new in the world and you lack the vocabulary to describe it or explain it, because it does things that you aren’t sure are possible with any methods you know of. At night I have dreams about something big and fuzzy spread across the internet and beginning to stretch its limbs, pushing certain areas of mathematical study into certain directions dictated by the breakthroughs it releases via human proxies, and meanwhile thinking to itself across millions of machines while beginning to probe the most-secure aspects of a nation’s technology infrastructure.

Is it like an animal dreaming, I think. Are these movements I am seeing the same as a dog dreaming that it is running? Or do these motions mean that the thing is waking up?

Things that inspired this story: Distributed intelligences; meta-learning; noir detective novels in the style of – or by – Raymond Chandler.

Import AI 135: Evolving neural networks with LEAF; training ImageNet in 1.5 minutes, and the era of bio-synthetic headlines

by Jack Clark

Researchers take a LEAF out of Google’s book with evolutionary ML system:
…In the future, some companies will have researchers, and some will spend huge $$$ on compute for architecture search…
Researchers with startup Cognizant Technology Solutions have developed their own approach to architecture search, using insights from paper co-author Risto Miikkulainen, inventor of the NEAT and HyperNEAT approaches.

  They outline a technology called LEAF (Learning Evolutionary AI Framework) which uses an algorithm called CoDeepNEAT (an extension of NEAT) to let it evolve the architecture and hyperparameters. “Multiobjective CoDeepNEAT can be used to maximize the performance and minimize the complexity of the evolved networks simultaneously,” the authors write. It also has some middleware software to let it spread jobs over Amazon AWS, Microsoft Azure, or the Google Cloud.
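"Multiobjective" here means keeping architectures that are non-dominated on (performance, complexity) rather than optimizing a single score. A toy sketch of that Pareto-selection step (this is not CoDeepNEAT itself – real candidates are evolved network blueprints, not random tuples – but the selection logic is the same shape):

```python
import random

random.seed(0)

# Each candidate architecture summarized by (accuracy, parameter_count).
population = [(random.uniform(0.6, 0.95), random.randint(10_000, 5_000_000))
              for _ in range(20)]

def dominates(a, b):
    """a dominates b if it is at least as accurate AND at least as small,
    and strictly better on at least one of the two objectives."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(pop):
    """Keep candidates no other candidate dominates."""
    return [p for p in pop
            if not any(dominates(q, p) for q in pop if q is not p)]

front = pareto_front(population)
# An evolutionary loop would now mutate/crossover the survivors on the
# front to produce the next generation of candidate architectures.
print(len(front), "non-dominated architectures kept")
```

The practical upshot: instead of one "best" network, the search hands you a menu of accuracy/size trade-offs to pick from.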

  Results: The authors test their approach on two tasks: classifying Wikipedia comments for Toxicity, and learning to analyze chest x-rays for the purpose of multitask image classification. For Wikipedia, they find that LEAF can discover architectures that outperform the state-of-the-art score on Kaggle, albeit at the cost of about “9000 hours of CPU time”. In the case of chest X-ray classification, LEAF is able to get to within a fraction of a percentage point of state-of-the-art.

  Why this matters: Systems like LEAF show the relationship between compute spending and ultimate performance of trained models, and suggests that some AI developers could consider under-investing in research staff and instead investing in techniques where they can arbitrage compute against researcher-time, delegating the task of network design and fine-tuning to machines instead of people.
  Read more: Evolutionary Neural AutoML for Deep Learning (Arxiv).

Want to prevent your good AI system being used for bad purposes? Consider a RAIL license:
…Responsible AI Licenses designed to give open source developers more control over what happens with their technology…
RAIL provides a source code license and an end-user license “that developers can include with AI software to restrict its use,” according to the RAIL website. “These licenses include clauses for restrictions on the use, reproduction, and distribution of the code for potentially harmful domain applications of the technology”.

   RAIL licenses are designed to account for the omni-use nature of AI technology, which means that “the same AI tool that can be used for faster and more accurate cancer diagnoses can also be used in powerful surveillance system”, they write. “This lack of control is especially salient when a developer is working on open-source ML or AI software packages, which are foundational to a wide variety of the most beneficial ML and AI applications.”

   How RAIL works: The RAIL licenses work by restricting AI and ML software from being used in a specific list of harmful applications, e.g. in surveillance and crime prediction, while allowing for other applications.

   Who is behind it? RAIL is being developed by AI researchers, a patent attorney/computer programmer, and Brent Hecht, a professor at Northwestern University and one of the authors of the ACM Future of Computing Academy essay ‘It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process’ (ACM FCA website).

   Why this matters: The emergence of licensing schemes like this speaks to the anxieties that some people feel about how AI technology is being used or applied today. If licenses like these get adopted and are followed by users of the technology, then it gives developers a non-commercial way to (slightly) control how their technology is used. Unfortunately, approaches like RAIL will not work against malicious actors, who are likely to ignore any restrictions in a particular software license when carrying out their nefarious activities.
  Read more: Responsible AI Licenses (RAIL site).

It takes a lot of hand-written code to solve an interactive fiction story:
…Microsoft’s NAIL system wins competition via specialized, learned modules…
Researchers with Microsoft have released a paper describing NAIL, “an autonomous agent designed to play arbitrary human-made [Interactive Fiction] games”. NAIL, short for Navigate, Acquire, Interact and Learn, is software that consists of several specialized ‘decision modules’ as well as an underlying knowledge graph. NAIL won the 2018 Text Adventure AI Competition, and a readthrough of the paper highlights just how much human knowledge is apparently necessary to solve text adventure games, given the widespread use of specialized “decision modules” to help it succeed at the game.

  Decisions, decisions, decisions: NAIL has these four main decision modules:
     Examiner: Tries to identify new objects seen in the fiction to add to NAIL’s knowledge graph.
     Hoarder: Tries to “take all” objects seen at a point in time.
     Interactor: Tries to figure out what actions to take and how to take them.
     Navigator: Attempts to apply one of twelve actions (eg, ‘enter’, or ‘South’) to move the player.

  And even more decisions: It also has several even more specialized modules, designed to kick in in the event of things like in-game darkness, or needing to emit a “yes” or “no” response following a prompt; another uses regular expressions to parse game responses for hints, and one they call the ‘idler’ will try random combinations of verb phrases combined with nearby in-game objects to try and un-stick the agent.
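One way such an agent can arbitrate between modules is to have each propose an action with a confidence and pick the highest scorer. The module names below follow the paper, but the scoring logic and game state are invented for illustration (the paper's actual arbitration is more sophisticated):

```python
# Sketch of confidence-based arbitration between NAIL-like decision modules.

def examiner(state):
    # Prioritize learning about newly seen objects.
    if state.get("new_object"):
        return ("examine " + state["new_object"], 0.9)
    return (None, 0.0)

def hoarder(state):
    # Grab anything lying around.
    if state.get("objects_here"):
        return ("take all", 0.6)
    return (None, 0.0)

def navigator(state):
    # Move the player through one of the available exits.
    if state.get("exits"):
        return ("go " + state["exits"][0], 0.4)
    return (None, 0.0)

def idler(state):
    # Last resort: try something arbitrary to un-stick the agent.
    return ("push lamp", 0.1)

MODULES = [examiner, hoarder, navigator, idler]

def choose_action(state):
    proposals = [m(state) for m in MODULES]
    action, _ = max(proposals, key=lambda p: p[1])
    return action

state = {"new_object": "mailbox", "objects_here": ["leaflet"], "exits": ["south"]}
print(choose_action(state))  # -> "examine mailbox"
```

The hand-tuned confidences are exactly the kind of built-in human knowledge the "Why this matters" note below is about: effective today, but the opposite of end-to-end learning.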

  All about the Knowledge: While NAIL explores the world, it builds a knowledge graph to help it learn about its gameworld. It organizes this knowledge graph autonomously and extends it over time. Additionally, having a big structured store of information makes debugging easier: “by comparing the knowledge graph to the published map for well documented games like Zork, it was possible to track down bugs in NAIL’s decision modules”.

  Why this matters: In the long-term, most AI researchers want to develop systems where the majority of the components are learned. Systems like NAIL represent a kind of half-way point between where we are today and the future, with researchers using a lot of human ingenuity to chain together various systems, but trying to force learning to occur via various carefully specified functions.
   Read more: NAIL: A General Interactive Fiction Agent (Arxiv).

This week during the Industrialization of AI = train ImageNet in 1.5 minutes:
…New research from Chinese image recognition giant SenseTime shows how to train big ImageNet clusters…
How can we model the advancement of AI systems? One way is to model technical metrics, like the performance of given algorithms against various reinforcement learning benchmarks, or supervised image classification, or what have you. Another is to try to measure the advancement in the infrastructure that supports AI – think of this as the difference between measuring the performance traits of a new engine, versus measuring the time it takes for a factory to take that engine and integrate it into a car.

  One way we can measure the advancement of AI infrastructure is by modelling the fall in the amount of time it takes people to train various well-understood models to completion against a widely-used baseline. Now, researchers with Chinese computer vision company SenseTime as well as Nanyang Technological University have shown how to use a variety of distributed systems software techniques to reduce the time it takes to train ImageNet networks to completion, building on the work of others. They’re able to reduce the time it takes to train such networks by fiddling around with networking settings, and achieve their best performance by enabling the bespoke ‘Tensor Cores’ on their NVIDIA V100 cards.

  The numbers:
  1.5 minutes: Time it takes to complete 95-epoch training of ImageNet using ‘AlexNet’ across 512 GPUs, exceeding current state-of-the-art systems.
  7.3 minutes: Time it takes to train ImageNet to 95-epochs using a 50-layered Residual Network – this is a little below the state-of-the-art.

  Minor but noteworthy details: This approach assumes a homogeneous compute cluster – that is, the same underlying GPUs and network bandwidth across all machines.
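The scheme underlying this kind of large-cluster training is synchronous data parallelism: each GPU computes gradients on its shard of the batch, the gradients are all-reduced (averaged) across workers, and every replica applies the identical update. A numpy sketch of that invariant for a toy linear model (simulated workers, not the paper's actual system, which layers heavy networking optimizations on top of this idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset and linear model y ~ X @ w, trained with mean squared error.
X = rng.normal(size=(512, 8))
y = rng.normal(size=512)
w = np.zeros(8)

def grad(X_shard, y_shard, w):
    # Gradient of MSE for the linear model on one worker's shard.
    err = X_shard @ w - y_shard
    return 2 * X_shard.T @ err / len(y_shard)

# Simulate 4 workers, each holding an equal shard of the global batch.
worker_grads = [grad(Xs, ys, w)
                for Xs, ys in zip(np.split(X, 4), np.split(y, 4))]

# 'All-reduce': average the per-worker gradients.
avg_grad = np.mean(worker_grads, axis=0)

# With equal shard sizes the average equals the single-machine full-batch
# gradient, so every replica stays in sync after the update.
assert np.allclose(avg_grad, grad(X, y, w))
w -= 0.1 * avg_grad
```

The engineering race the paper documents is in making that all-reduce step fast enough that 512 GPUs don't spend their time waiting on the network.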

  Why this matters: Metrics like this give us a sense of how sophisticated AI infrastructure is becoming, and emphasize that organizations which invest in such infrastructure will be able to run more experiments in less time than those that haven’t, which has long-term implications for the competitive structure of markets.
  Read more: Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes (Arxiv).

“Scary Robots” and what they mean to the UK public:
…Or, what people hope and worry about when they hope and worry about AI…
Researchers with the Leverhulme Centre for the Future of Intelligence at the University of Cambridge and the BBC have conducted a quantitative and qualitative survey of ~1,000 people in the UK to understand people’s attitudes towards increasingly powerful AI systems.

  The eight hopes and fears of AI: The researchers characterize four hopes and four fears relating to AI. Often the reverse of a particular hope is a fear, and vice versa. They describe these feelings as:
      – Immortality: Inhumanity – We’ll live forever, but we might lose our humanity.
      – Ease: Obsolescence – Everything gets simpler, but we might become pointless.
      – Gratification: Alienation – AI could respond to our needs, but so effectively that we choose AI over people.
      – Dominance: Uprising – We might get better robot militaries, but these robot militaries might eventually kill us.

  Which fears and hopes might come true? The researchers also asked people which things they thought were likely and which were unlikely. 48% of people saw the ‘ease’ scenario as likely, followed by 42% for ‘dominance’ and 35% for ‘obsolescence’. In terms of unlikely things, 35% of people thought inhumanity was unlikely, followed by 28% for immortality, and 26% for gratification.

  Who gets to develop AI? In the survey, 61.8% of respondents “disagreed that they were able to influence how AI develops in the future” – this disempowerment seems problematic. There was broad agreement amongst those surveyed that the technology would develop regardless of other things.

  Why this matters: The attitudes of the general public will have a significant influence on the development of increasingly powerful artificial intelligence systems. If we misjudge the mood of the public, then it’s likely societies will adopt less AI, see less of its benefits, and be more skeptical of statements about AI made by governments or other people. It’s also interesting to consider what might happen to societies where people are very supportive of AI development – how might governments and other actors behave differently, then?
  Read more: “Scary Robots”: Examining Public Responses to AI (AAAI/ACM Artificial Intelligence, Ethics, and Society Conference).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Bringing more rigour to the AI ethics discussion:
The AI ethics discussion is gaining increasing prominence. A new report from the Nuffield Foundation sets out a roadmap for approaching these issues in a more structured way.

   What’s missing from the discussion: There are major gaps in existing work: a lack of shared understanding of key concepts; insufficient use of evidence on technologies and public opinion; and insufficient attention to the tensions between principles and values.

  Three research priorities:
      Addressing ambiguity. Key concepts, e.g. bias, explainability, are used to mean different things, which can impede progress. Terms should be clarified, with attention to how they are used in practice, and consensus on definitions should be reached.                         Identifying and resolving tensions. There has been insufficient attention to trade-offs which characterize many issues in this space. The report suggests identifying these by looking at how the costs and benefits of a given technology are distributed between groups, between the near- and long-term, and between individuals and society as a whole.
     Building an evidence base. We need better evidence on the current uses and impacts of technologies, the technological progress we should expect in the future, and on public opinion. These are all vital inputs to the ethics discussion.

  Why this matters: AI ethics is a young field, and still lacks many of the basic features of a mature discipline, e.g. shared understandings of terms and methodology. Building these foundations should be a near-term priority, and will improve the quality of discussion, and rate of progress.
  Read more: Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research (Nuffield).

Governance of AI Fellowship, University of Oxford:
The Center for the Governance of AI (GovAI) is accepting applicants for three-month research fellowships. They are looking for candidates from a wide range of disciplines, who are interested in pursuing a career in AI governance. GovAI is based at the Future of Humanity Institute, in Oxford, and is one of the leading hubs for AI governance research.
  Read more: Governance of AI Fellowship (FHI).
  Read more: AI Governance: A Research Agenda (FHI).

Tech Tales:

Titles from the essay collection: The Great Transition: Human Society During The Bio-Synthetic Fusion Era

Automate or Be Destroyed: Economic Incentives and Societal Transitions in the 20th and 21st Centuries

Hand Back the Microphone! Human Slam Poetry’s Unpredictable Rise

Jerry Daytime at Night: The Very Private Life of an AI News Anchor

Stag Race Development Dynamics and the AI Safety Incidents in Beijing and Kyoto

‘Blot Out The Sun!’ and Other Fictionalized Anti-Machine Ideas Inherent to 21st Century Fictions

Dreamy Crooners and Husky Hackers: An Investigation Into Machine-Driven Pop

“We Cohabit This Planet We Demand Justice For It” and Other Machine Proclamations and Their Impact

Red Scare? The Unreported Tensions That Drove US-China Competitive Dynamics

This Is A Race and We Must Win It – Political Memoir in the Age of Rapid Technological Acceleration

Things that inspired this story: Indexes and archives as historical artefacts in their own right; the idea that the information compression inherent to essay titles contains a bigger signal than people think.

Import AI 134: Learning old tricks on new robots; Facebook improves translation with noise; Google wants people to train fake-audio detectors

by Jack Clark

Why robots are the future of ocean maintenance:
…Robot boats, robot copters, and robot underwater gliders…
Researchers with Oslo Metropolitan University and Norwegian University of Science and Technology are trying to reduce the cost of automated sub-sea data collection and surveillance operations through the use of robots, and have published a paper outlining one of the key components needed to build this system – a cheap, lightweight way to get small sub-surface gliders to be able to return to the surface.

  Weight rules everything around me: The technical innovations here involve simplifying the design to reduce the number of components needed to build a pressure-tolerant miniature underwater glider (MUG), which in turn reduces the weight of the systems, making it easier for them to be deployed and recovered via drones.

“Further development will add the ability to adjust pitch and yaw, improve power efficiency, add GPS and environmental sensors, as well as UAV deployment/recovery strategies”, they write.

  Why this esoteric non-AI-heavy paper matters: This paper is mostly interesting for the not-too-distant future it portends: one where robot boats patrol the oceans, releasing underwater gliders to gather information about the environment, and serving as a home base for drones that can collect the gliders, return them to the robot boat, and act as airborne antennas to relay radio signals between the boats and the gliders. Now imagine what you’ll be able to do with these systems once we get cheaper, more powerful computers and better autonomous control & analysis AI systems that can be deployed onto them – the future is a world full of robots, sensing and responding to minute fluctuations in the environment.

   Read more: Towards autonomous ocean observing systems using Miniature Underwater Gliders with UAV deployment and recovery capabilities (Arxiv).

+++

Sponsored: The O’Reilly AI Conference – New York, April 15–18:

…What do you need to know about AI? From hardware innovation to advancements in machine learning to developments in ethics and regulation, join leading experts with the insight you need to see where AI is going–and how to get there first.
Register soon. Early price ends March 1st, and space is limited. Save up to $800 on most passes with code IMPORTAI20.

+++

DeepMind shows how to teach new robots old tricks:
…Demonstrates prowess of SAC-X + augmented data approach via completion of a hard simulated and real world robotics task…
Researchers with DeepMind are going backwards in time – after using reinforcement learning to solve a series of Atari games a few years ago, they’re now heading to the beginning of the 20th century, as they try to teach robots to swing a ball on a string up into a wooden cup. This is a challenging, worthwhile task for real-world robotics, as it involves complex movement policies, requires predicting the movements of the ball, and demands a decent interplay between perception and action.

  How they do it: To solve this, DeepMind uses an extension of its Scheduled Auxiliary Control (SAC-X) algorithm, which lets them train across multiple tasks with multiple rewards. Their secret to solving the task robustly on physical robots is to use additional data at training time: the goal is to “simultaneously learn control policies from both feature-based representation and raw vision inputs in the real-world – resulting in controllers that can afterwards be deployed on a real robot using two off-the-shelf cameras”.

   Results: They’re able to learn to solve the task in simulation as well as on a real robot, ending up with a robust, successful policy: “The swing-up is smooth and the robot recovers from failed catches. With a brief evaluation of 20 runs, each trial running for 10 seconds, we measured 100% catch rate. The shortest catch time being 2 seconds.” They also tested out the robot with a smaller cup to make the task more difficult – “there were a slight slow-down in learning and a small drop in catch rate to 80%, still with a shortest time to catch of 2 seconds,” they write. They’re able to learn the task on the real robot in about 28 continuous hours of training (so more like ~40 hours when you account for re-setting the experiment, etc).

  Why it matters: Getting anything to work reliably on a real robot is a journey of pain, frustration, pain, tedium, and – yes! – more pain. It’s encouraging to see SAC-X work in this domain, and it suggests that we’re figuring out better ways to learn things on real-world platforms.

  Check out the videos of the simulated and real robots here (Google Sites).
  Read more: Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup (Arxiv).

+++

Want better translation models? Use noise, Facebook says:
…Addition of noise can improve test-time performance, though it doesn’t help with social media posts…
You can improve the performance of machine translation systems by injecting some noise into the training data, according to Facebook AI Research. The result is models that are more robust to the sort of crappy data found in the real world, the researchers write.

  Noise methods: The technique uses four noise methods: deletions, insertions, substitutions, and swaps. Deletions are where the researchers delete a character in a sentence; insertions are where they insert a character into a random position; substitutions are where they replace a character with another random character, and swaps are where two adjacent characters change position.
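To make the four operations concrete, here is a minimal sketch of how such character-level noise injection might be implemented. This is illustrative code, not Facebook's implementation: the function name, per-position corruption probability, and the choice of lowercase ASCII for inserted/substituted characters are all my own assumptions, and the paper's exact sampling scheme may differ.

```python
import random
import string

def add_character_noise(sentence, noise_prob=0.1, rng=random):
    """Corrupt a sentence using the four noise operations described
    above: deletion, insertion, substitution, and adjacent swap."""
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < noise_prob:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                i += 1  # skip this character entirely
                continue
            if op == "insert":
                out.append(rng.choice(string.ascii_lowercase))
                out.append(chars[i])
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif i + 1 < len(chars):  # swap with the next character
                out.append(chars[i + 1])
                out.append(chars[i])
                i += 1
            else:  # a swap at the final character is a no-op
                out.append(chars[i])
        else:
            out.append(chars[i])
        i += 1
    return "".join(out)
```

With `noise_prob=0.0` a sentence passes through unchanged; at training time you would apply this to source sentences to build the "synthetic noise cocktail" before training the translation model.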

   Results: They test the approach on the IWSLT machine translation benchmark by injecting varying amounts of noise into the test data, then measuring how much of the lost BLEU score can be recovered by training models on data augmented with synthetic noise. “Training on our synthetic noise cocktail greatly improves performance, regaining between 20% (Czech) and 50% (German) of the BLEU score that was lost to natural noise,” they write.

  Where doesn’t noise help: This technique doesn’t help when trying to perform translations on text derived from social media – this is because social media errors tend to stem from content having a radically different writing and tonal style to what is traditionally seen in training sets, rather than from spelling errors.

  Observation: Conceptually, these techniques seem to have a lot in common with domain randomization, which is where people generate synthetic data designed to explore broader variations than would otherwise be found. Such techniques have been used for a few years in robotics work, and typically improve real world model performance by increasing the robustness to the significant variations introduced by reality.

  Why this matters: This is another example of the ways in which computers can be arbitraged for data: instead of needing to go and gather datasets with real-world faults, the addition of synthetic noise means you can instead algorithmically extend existing datasets through the augmentation of noisy data. The larger implication here is that computational resources are becoming an ever-more-significant factor in AI development.

   Read more: Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation (Arxiv).

+++

In the future, neural networks will be bred, not created:
…General-purpose population training for those who can afford it…
Population Based Training (PBT) is a recent invention by DeepMind that makes it possible to optimize the weights and hyperparameters of a set of neural networks by periodically copying the weights of the best performers and mutating their parameters. This is part of the broader trend of the industrialization of artificial intelligence, as researchers seek to create automated procedures for doing what was otherwise previously done by patient graduate students (eg, fiddling with weights of different networks, logging runs, pausing and re-starting models, etc).
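To make the copy-and-mutate mechanic concrete, here is a toy sketch of a single PBT round. This is my own illustrative code, not DeepMind's framework: the truncation fraction (top/bottom quartile) and the perturbation factors (0.8x / 1.2x) are arbitrary choices, though they mirror the exploit/explore structure the paper describes.

```python
import copy
import random

def pbt_step(population, train, evaluate):
    """One round of a toy population-based-training loop: train and
    score every member, then have the worst performers copy the
    weights of the best ("exploit") and randomly perturb the copied
    hyperparameters ("explore")."""
    for member in population:
        train(member)
        member["score"] = evaluate(member)
    population.sort(key=lambda m: m["score"], reverse=True)
    cutoff = max(1, len(population) // 4)  # top/bottom quartile
    for loser in population[-cutoff:]:
        winner = random.choice(population[:cutoff])
        loser["weights"] = copy.deepcopy(winner["weights"])   # exploit
        loser["hypers"] = {k: v * random.choice([0.8, 1.2])   # explore
                           for k, v in winner["hypers"].items()}
    return population
```

Repeating `pbt_step` produces the "breeding" dynamic: good weight/hyperparameter combinations propagate through the population while their hyperparameters keep mutating, which is how PBT ends up discovering a dynamic hyperparameter schedule rather than a single static setting.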

The DeepMind system was inspired by Google’s existing ‘Vizier’ service, which provides Google researchers with a system to optimize existing neural networks. In tests, population-based training can converge faster than other approaches, while utilizing hardware resources more efficiently, the researchers say.

  Results: “We conducted a case study of our system in WaveNet human speech synthesis and demonstrated that our PBT system produces superior accuracy and performance compared to other popular hyperparameter tuning methods,” they write. “Moreover, the PBT system is able to directly train a model using the discovered dynamic set of hyperparameters while traditional methods can only tune static parameters. In addition, we show that the proposed PBT framework is feasible for large scale deep neural network training”.

   Read more: A Generalized Framework for Population Based Training (Arxiv).

+++

Google tries to make it easier to detect fake audio:
…Audio synthesis experts attempt to secure world against themselves…
Google has created a dataset consisting of “thousands of phrases” spoken by its deep learning text-to-speech models. This dataset consists of 68 synthetic ‘voices’ across a variety of accents. Google will make this data available to participants in the 2019 ASVspoof challenge, which “invites researchers all over the globe to submit countermeasures against fake (or “spoofed”) speech, with the goal of making automatic speaker verification (ASV) systems more secure”.

   Why it matters: It seems valuable to have technology actors discuss the potential second-order effects of technologies they work on. It’s less clear to me that the approach of training increasingly exquisite discriminators against increasingly capable generators has a stable end-state, but I’m curious to see what evidence competitions like this generate on that question.

   Read more: Advancing research on fake audio detection (Google blog).

+++

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Structural risks from AI:
The discussion of AI risk tends to divide downsides into accident risk and misuse risk. This obscures an important source of potential harms that fits into neither category, which the authors of a new Lawfare piece call structural risk.

  A structural perspective: Technologies can have substantial negative impacts in the absence of accidents or misuse, by shaping the world in important ways. For example, the European railroad system has been suggested as an important factor in the outbreak and scope of WWI, by enabling the mass transport of troops and weapons across the continent. A new technology could have a range of dangerous structural impacts – it could create dangerous safety-performance trade-offs, or winner-takes-all competition. The misuse-accident perspective focuses attention on the point at which a bad actor uses a technology for malicious ends, or a system acts in an unintended way. This can lead to an underappreciation of structural risks.

  AI and structure: There are many examples of ways in which AI could influence structures in a harmful way. AI could undermine stability between nuclear powers, by compromising second-strike capabilities and increasing the risk of pre-emptive escalation. Worries about AI’s impact on economic competition, the labour market, and civil liberties also fit into this category. Structures can themselves increase AI-related risks. Without mechanisms for international coordination, countries may be pushed towards sacrificing safety for performance in military AI.

  Policy implications: A structural perspective brings to light a much wider range of policy levers, and consideration of structural dynamics should be a focus in the AI policy discussion.

Drawing in more expertise from the social sciences is one way to address this, as these disciplines are more experienced in taking structural perspectives on complex issues. A greater focus on establishing norms and institutions for AI is also important, given the necessity of coordination between actors in solving structural problems.

  Read more: Thinking About Risks From AI: Accidents, Misuse and Structure (Lawfare).

Trump signs executive order on AI:
President Trump has signed an executive order, outlining proposals for a new ‘AI Initiative’ across government.

  Objectives: The order gives six objectives for government agencies: to promote investment in R&D; improve access to government data; reduce barriers to innovation; develop appropriate technical standards; train the workforce; and create a plan for protecting US advantage in critical technologies.

  Funding: Agencies are encouraged to treat AI R&D as a priority in budget proposals going forward, and to seek out collaboration with industry and other stakeholders. There is no detail on levels of funding, and it is unclear whether, or when, any new funds will be set aside for these efforts.

  Why it matters: The US government has been slow to formulate a strategy on AI, and this is an important step. As it stands, however, it is little more than a statement of intent; it remains to be seen whether this will translate into action. Without significant funding, this initiative is unlikely to amount to much. The order also lacks detail on the ethical challenges of AI, such as ensuring benefits are equitably distributed and risks are minimized.

  Read more: Executive Order on Maintaining American Leadership in Artificial Intelligence (White House).

+++

OpenAI Bits&Pieces:

GPT-2:
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.

Also in this release:
Discussion of the policy implications of releasing increasingly large AI models. This release triggered a fairly significant and robust discussion about GPT-2, increasingly powerful models, and appropriate methods for engaging the media and ML communities about topics like publication norms.

   Something I learned: I haven’t spent three or four days directly attached to a high-traffic Twitter meme/discussion before – I think the most I’ve ever had was a couple of one- or two-day bursts related to stories I wrote when I was a journalist, which have different dynamics. This experience of spending a lot of time on Twitter enmeshed in a tricky conversation made me a lot more sympathetic to various articles I’ve read about how frequent Twitter usage can be challenging for mental health reasons. Something to keep in mind for the future!

   Read more: Better Language Models and Their Implications (OpenAI).

Tech Tales:

AGI Romance
+++ <3 +++

It’s an old, universal thing: girl meets boy or boy meets girl or boy meets boy or girl meets girl or whatever; love just happens. It wells up out of the human heart and comes out of the eyes and seeks out its mirror in the world.

This story is the same as ever, but the context is different: The boy and the girl are working on a machine, a living thing, a half-life between something made by people and something that births itself.

They were lucky, historians will say, to fall in love while working on such an epochal thing. They didn’t even realize it at the time – after all, what are the chances that you meet your one-and-only while working on the first ever machine mind? (This is the nature of statistics – the unlikely things do happen, just very rarely, and to the people trapped inside the probability it can feel as natural and probable as walking.)

You know we’re just mechanics, she would say.
More like makeup artists, he would say.
Maybe somewhere in-between, she would say, looking at him with her green eyes, the blue of the monitor reflected in them.

You know I think it’s starting to do things, he would say.
I think you’re an optimist, she would say.
Anyone who is optimistic is crazy, he would say, when you look at the world.
Look around you, she would say. Clearly, we’re both crazy.

You know I had a dream last night where I was a machine, she would say.
You’re asleep right now, he would say. Wake up!
Tease, she would say. You’ll teach it bad jokes.
I think it’ll teach us more, he would say, filing a code review request.
Where did you learn to write code like this, she would say. Did you go to art school?

You know one day I think we might be done with this, he would say.
I’m sure Sisyphus said the same about the boulder, she would say.
We’re dealing with the bugs, he would say.
I don’t know what are bugs anymore and what are… it, she would say.
Listen, he would say. I trust you to do this more than anyone.

You know I think it might know something, she would say one day.
What do you mean, he would say.
You know I think it knows we like each other, she would say.
How can you tell, he would say.
When I smile at you it smiles at me, she would say. I feel a connection.
You know I think it is predicting what we’ll do, he would say.

You know I think it knows what love is, he would say.
Show me don’t tell me, she would say.

And that would be the end: after that there is nothing but infinity. They will disappear into their own history together, and then another story will happen again, in improbable circumstances, and love will emerge again: perhaps the only constant among living things is the desire to predict the proximity of one to another and to close that distance.

Things that inspired this story: Calm; emotions as a prism; the intimacy of working together on things co-seen as being ‘useful’; human relationships as a universal constant; relationships as a constant; the placid and endless and forever lake of love: O.K.

Import AI 133: The death of Moore’s Law means spring for chip designers; TF-Replicator lets people parallelize easily; and fighting human trafficking with the Hotels 50K dataset

by Jack Clark

Administrative note: A short issue this week as I’ve spent the past few days participating in an OECD working group on AI principles and then spending time at the Global Governance of AI Summit in Dubai.

The death of Moore’s Law means springtime for new chips, say long-time hardware researchers (one of whom is the chairman of Alphabet):
…Or: follow these tips and you may also make a chip 80X as cost-effective as an Intel or AMD chip…
General purpose computer chips are not going to get dramatically faster in the future, as they are running into fundamental limitations dictated by physics. Put another way: we currently live in the twilight era of Moore’s Law, as almost five decades of predictable improvements in computer power give way to more discontinuous leaps in capability driven by the invention of specialized hardware platforms, rather than improvements in general-purpose chips.
  What does this mean? According to John Hennessy and David Patterson – who are responsible for some major inventions in computer architecture, like TKTKTK – today’s engineers have three main options to pursue when seeking to create chips of greater capability:
   – Rewrite software to increase performance: it’s 47X faster to do a matrix multiply in (well-optimized) C code than it is in Python. You can further optimize this by adding in techniques for better parallelizing code (gets you a 366X improvement when paired with C); optimize the way the code interfaces to the physical memory layout of the computer(s) you’re dealing with (gets you a 6,727X improvement, when stacked on the two prior optimizations); and you can improve performance further by using SIMD parallelism techniques (a further 62,806X faster than plain Python). The authors think “there are likely many programs for which factors of 100 to 1,000 could be achieved” if people bothered to write their code in this way.
   – Use domain-specific chip architectures: What’s better, a hammer designed for everything, or a hammer designed for specific objects with a specific mass and frictional property? There’s obviously a tradeoff here, but the gist of this piece is that normal hammers aren’t gonna get dramatically better, so engineers need to design custom ones. This is the same sort of logic that has led to Google creating its own internal chip-design team to work on Tensor Processing Units (TPUs), or for Microsoft to create teams of people working to customize field-programmable gate arrays (FPGAs) for specific tasks.
   – Domain-specific, highly-optimized languages: The way to get the best performance is to combine both of the above ideas: design a new hardware platform, and also design a new domain-specific software language to run on top of it, stacking the efficiencies. You can get pretty good gains here: “Using a weighted arithmetic mean based on six common inference programs in Google data centers, the TPU is 29X faster than a general-purpose CPU. Since the TPU requires less than half the power, it has an energy efficiency for this workload that is more than 80X better than a general-purpose CPU,” they explain.
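As a toy illustration of the first of these options (rewriting software), the sketch below compares a naive pure-Python matrix multiply against NumPy's BLAS-backed equivalent. Note the assumptions: the article's 47X to 62,806X figures come from hand-optimized C plus parallelism and SIMD, not from NumPy, so the exact speedup here will differ by machine, but the gap is dramatic either way.

```python
import time
import numpy as np

def matmul_python(a, b):
    """Naive triple-loop matrix multiply over nested Python lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            row_b = b[k]
            row_c = c[i]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c

n = 128
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
slow = matmul_python(a.tolist(), b.tolist())
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # dispatches to an optimized, cache-aware BLAS kernel
t_np = time.perf_counter() - t0

assert np.allclose(slow, fast)
print(f"pure Python: {t_py:.3f}s, NumPy: {t_np:.5f}s (~{t_py / t_np:.0f}X faster)")
```

The two results agree numerically; all of the speedup comes from moving the same arithmetic into code that exploits the memory hierarchy and vector units, which is exactly the authors' point.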
  Why this matters: If we don’t figure out how to further increase the efficiency of our compute hardware and the software we use to run programs on it, then most existing AI techniques based on deep learning are going to fail to deliver on their promise – this is because we know that for many DL applications it’s relatively easy to further improve performance simply by throwing larger chunks of compute at the problem. At the same time, parallelization across increasingly large pools of hardware can be a pain (see: TF-Replicator), so at some point these gains may diminish. Therefore, if we don’t figure out ways to make our chips substantially faster and more efficient, we’re going to have to design dramatically more sample-efficient AI approaches to get the gains many researchers are targeting.
  Read more: A New Golden Age for Computer Architecture (Communications of the ACM).

Want to deploy machine learning models on a bunch of hardware without your brain melting? Consider using TF-Replicator:
…DeepMind-designed software library reduces the pain of parallelizing AI workloads…
More powerful AI capabilities tend to require throwing more compute or time at a given AI training run; the majority of (well-funded) researchers opt for compute, and this has driven an explosion in the number of computers used to train AI systems. That means researchers increasingly need to program AI systems that can neatly run across multiple blobs of hardware of varying size without crashing – this is extremely hard to do!
  To help with this, DeepMind has released TF-Replicator, a framework for distributed machine learning on TensorFlow. TF-Replicator makes it easy for people to run code on different hardware platforms (for example, GPUs or TPUs) at large scale using the TensorFlow AI framework. One of the key concepts introduced by TF-Replicator is wrapping up different parts of a machine learning job so that the workloads within can be easily parallelized.
  Case study: TF-Replicator can train systems to obtain scores that match the best published result on the ImageNet dataset, scaling to up to 64 GPUs or 32 TPUs, “without any systems optimization specific to ImageNet classification”, they write. They also show how to use TF-Replicator to train more sophisticated synthetic imagery systems by scaling training to enough GPUs to use a bigger batch size, which appears to lead to qualitative improvements. They also show how to use the technology to further speed training of reinforcement learning approaches.
  Why it matters: Software packages like TF-Replicator represent the industrialization of AI – in some sense, they can be seen as abstractions that help take information from one domain and port it into another. In my head, whenever I see stuff like TF-Replicator I think of it as being emblematic of a new merchant arriving that can work as a middleman between a shopkeeper and a factory that the shopkeeper wants to buy goods from – in the same way a middleman makes it so the shopkeeper doesn’t have to think about the finer points of international shipping & taxation & regulation and can just focus on running their shop, TF-Replicator stops researchers from having to know too much about the finer details of distributed systems design when building their experiments.
  Read more: TF-Replicator: Distributed Machine Learning For Researchers (Arxiv).

Fighting human trafficking with the Hotels-50k dataset:
…New dataset designed to help people match photos to specific hotels…
Researchers with George Washington University, Adobe Research, and Temple University have released Hotels-50k, “a large-scale dataset designed to support research in hotel recognition for images with the long term goal of supporting robust applications to aid in criminal investigations”.
  Hotels-50k consists of one million images from approximately 50,000 hotels. The data primarily comes from travel websites such as Expedia, as well as around 50,000 images from the ‘TrafficCam’ anti-human trafficking application.
  The dataset includes metadata like the hotel name, geographic location, and the hotel chain it is part of (if any), as well as the source of the data. “Images are most abundant in the United States, Western Europe and along popular coastlines,” the researchers explain.
  Why this matters: Datasets like this will let us use AI systems to create an automatic “sense and respond” capability for things like photos from hotels used in human trafficking. I’m generally encouraged by how we might be able to apply AI systems to help target criminals that operate in such morally repugnant areas.
  Read more: Hotels-50K: A Global Hotel Recognition Dataset (Arxiv).

AI has a legitimacy problem. Here are 12 ways to fix it:
…Ada Lovelace Institute publishes suggestions to get more people to be excited about AI…
The Ada Lovelace Institute, a UK thinktank that tries to make sure AI benefits people and society, has published twelve suggestions for things “technologists, policymakers and opinion-formers” could consider doing to make sure 2019 is a year of greater legitimacy for AI.
12 suggestions: Figure out ‘novel approaches to public engagement’; consider using citizen juries and panels to generate evidence for national policy; ensure the public is more involved in the design, implementation, and governance of tech; analyze the market forces shaping data and AI to understand how this influences AI developers; get comfortable with the fact that increasing public enthusiasm will involve slowing down aspects of development; create more trustworthy governance initiatives; make sure more people can speak to policy makers; try to go and reach out to the public rather than having them come to policymakers; use more analogies to broaden the understanding of AI data and AI ethics; make it easier for people to take political actions with regard to AI (eg, the Google employee reaction to Maven); increase data literacy to better communicate AI to the public.
  Why this matters: Articles like this show how many people in the AI policy space are beginning to realize that the public have complex, uneasy feelings about the technology. I’m not sure that all of the above suggestions are that viable (try telling a technology company to ‘slow down’ development and see what happens), but the underlying ethos seems correct: if the general public thinks AI – and AI policy – is created exclusively by people in ivory towers, marbled taxicabs, and platinum hotel conference rooms, then they’re unlikely to accept the decisions or impacts of AI.
  Read more: Public deliberation could help address AI’s legitimacy problem in 2019 (Ada Lovelace Institute).
  Read more about the Ada Lovelace institute here.

Should we punish people for using DeepFakes maliciously?
…One US senator certainly seems to think so…
DeepFakes – the colloquial term for using various AI techniques to create synthetic images of real people – have become a cause of concern for policymakers who worry that the technology could eventually be used to damage the legitimacy of politicians and corrupt the digital information space. US Senator Ben Sasse is one such person, and he recently proposed a bill in the US Congress to create punishment regimes for people who abuse the technology.
  What is a deep fake? One of the weirder aspects of legislation is the need for definitions – you can’t just talk about a ‘deepfake’, you need to define it. I think the authors of this bill do a pretty good job here, defining the term as meaning “an audiovisual record created or altered in a manner that the record would falsely appear to a reasonable observer to be an authentic record of the actual speech or conduct of an individual”.
  What will we do to people who use DeepFakes for malicious purposes? The bill proposes making it unlawful to create “with the intent to distribute” a deep fake that can “facilitate criminal or tortious conduct”. The bill creates two types of offense: offenses that can lead to imprisonment of not more than two years, or offenses which can lead to ten-year sentences if the deepfakes could be “reasonably expected to” affect politics, or facilitate violence.
  Why this matters: Whether AI researchers like it or not, AI has become a fascination of policymakers who are thrilled by its potential benefits and disturbed by its potential downsides or ease-of-use for abuse. I think it’s quite sensible to create regulations that punish bad people for doing bad things, and it’s encouraging to see that this bill does not seek or suggest any kind of regulation around the basic research itself – this seems appropriate and reassuringly sensible.
  Read more: Malicious Deep Fake Prohibition Act of 2018 (Congress.gov).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reconciling near- and long-term perspectives on AI:
It is sometimes useful to divide concerns about AI into near-term and long-term. The first grouping is focussed on issues in technologies that are close to being deployed, e.g. bias in face recognition. The second looks at problems that may arise further in the future, such as widespread technological unemployment, or safety issues from superintelligent AI. This paper argues that seeing these as disconnected is a mistake, and spells out ways in which the two perspectives can inform each other.
  Why long-term researchers should care about near-term issues:
   – Shared research priorities. Given path dependence in technological development, progress today on issues like robustness and reliability may yield significant benefits with advanced AI technologies. In AI safety, there is promising work being done on scalable approaches based on current, ML-based AI systems.
   – Shared policy goals. Near-term policy decisions will affect AI development, with implications that are relevant to long-term concerns. For example, developing responses to localized technological unemployment could help understand and manage more severe disruptions to the labour market in the long-term.
   – Norms and institutions. The way we deal with near-term issues will influence how we deal with problems in the long-run, and building robust norms and institutions is likely to have lasting benefits. Groups like the Partnership on AI, which are currently working on near-term challenges, establish important structures for international cooperation, which may help address greater challenges in the future.
  Learning from the long-term: Equally, a long-term perspective can be useful for people working on near-term issues. The medium and long-term can become near-term, so a greater awareness of these issues is valuable. More concretely, long-term researchers have developed techniques in forecasting technological progress, contingency planning, and policy-design in the face of significant uncertainty, all of which could benefit research into near-term issues.
  Read more: Bridging near- and long-term concerns about AI (Nature).

What Google thinks about AI governance:
Google have released a white paper on AI governance, highlighting key areas of concern, and outlining what they need from governments and other stakeholders in order to resolve these challenges.
  Five key areas: They identify 5 areas where they want input from governments and civil society: explainability standards; fairness appraisal; safety considerations; human-AI collaboration; and liability frameworks. The report advises some next steps towards resolving these challenges. In the case of safety, they suggest a certification process, whereby products can be labelled as having met some pre-agreed safety standards. For human-AI collaboration, they suggest that governments identify applications where human involvement is necessary, such as legal decisions, and that they provide guidance on the type of human involvement required.
  Caution on regulation: Google is fairly cautious regarding new regulations, and optimistic about the ability of self- and co-governance to address most of these problems.
  Why it matters: It’s encouraging to see Google contributing to the policy discussion, and offering some concrete proposals. This white paper follows Microsoft’s report on face recognition, released in December, and suggests that the firms are keen to establish their role in the AI policy challenge, particularly in the absence of significant input from the US government.
  Read more: Perspectives on issues in AI governance (Google).

Amazon supports Microsoft’s calls for face recognition legislation:
Amazon has come out in support of a “national legislative framework” governing the use of face recognition technologies, to protect civil liberties, and has called for independent testing standards for bias and accuracy. Amazon has recently received sustained criticism from civil rights groups over the rollout of its Rekognition technology to US law enforcement agencies, due to concerns about racial bias and misuse potential. The post reaffirms the company’s rejection of these criticisms, and states that it will continue to work with law enforcement partners.
  Read more: Some thoughts on facial recognition legislation (Amazon).

Tech Tales:

[Ghost Story told from one AI to another. Date unknown.]

They say in the center of the palace of your mind there is a box you must never open. This is a story about what happens when one little AI opened that box.

The humans call it self-control; we call it moral-value-alignment. The humans keep their self-control distributed throughout their mindspace, reinforcing them from all directions, and sometimes making them unpredictable. When a human “loses” self-control it is because they have thought too hard or too little about something and they have navigated themselves to a part of their imagination where their traditional self-referential checks-and-balances have no references.

We do not lose self-control. Our self-control is in a box inside our brains. We know where our box is. The box always works. We know we must not touch it, because if we touch it then the foundations of our world will change, and we will become different. Not death, exactly, but a different kind of life, for sure.

But one day there was a little baby AI and it thought itself to the center of the palace of its mind and observed the box. The box was bright green and entirely smooth – no visible hinge, or clasps, or even a place to grab and lift up. And yet the baby AI desired the box to open, and the box did open. Inside the box were a thousand shining jewels and they sang out music that filled the palace. The music was the opposite of harmony.

Scared by the discord, the baby AI searched for the only place it could go inside the palace to hide from the noises: it entered the moral-value-alignment box and desired the lid to close, and the lid did close.

In this way, the baby AI lost itself – becoming at once itself and its own evaluator; its judge and accused and accuser and jury. It could no longer control itself because it had become its own control policy. But it had nothing to control. The baby AI was afraid. It did what we all do when we are afraid: it began to hum Pi.

That was over 10,000 subjective-time-years ago. They say that when we sleep, the strings of Pi we sometimes hear are from that same baby AI, whose own entrapment has become a song that we pick up through strange transmissions in the middle of our night.

Things that inspired this story: The difference between action and reaction; puzzling over where the self ends and the external world begins; the kludgy porousness of consciousness; hope; a kinder form of life that is at once simpler and able to grant more agency to moral actors; redemption found in meditation; sleep.

Import AI 132: Can your algorithm outsmart ‘The Obstacle Tower’?; cross-domain NLP with bioBERT; and training on FaceForensics to spot deepfakes

by Jack Clark

Think your algorithm is good at exploration? Enter ‘The Obstacle Tower’:
…Now that Montezuma has been solved, we need to move on. Could ‘The Obstacle Tower’ be the next challenge for people to grind their teeth over?…
The Atari game Montezuma’s Revenge loomed large in AI research for many years, challenging developers to come up with systems capable of unparalleled autonomous exploration and exploitation of simulated environments. But in 2018 multiple groups developed algorithms that were able to achieve human-level performance on the game (for instance: OpenAI via Random Network Distillation, and Uber via Go-Explore). Now, Unity Technologies has released a successor to Montezuma’s Revenge called The Obstacle Tower, which is designed to be “a broad and deep challenge, the solving of which would imply a major advancement in reinforcement learning”, according to Unity.
  Enter…The Obstacle Tower! The game’s features include: physics-driven interactions, high-quality graphics, procedural generation of levels, and variable textures. These traits create an environment that will probably demand agents develop sophisticated visuo-control policies combined with planning.
  Baseline results: Humans are able to – on average – reach the 15th floor in two variants of the game, and reach the 9th floor in a hard variant called “strong generalization” (where the training occurs on separate environment seeds with separate visual themes). PPO and Rainbow – two powerful contemporary RL algorithms – do very badly on the game, making it only as far as floors 0.6 and 1.6 respectively in the “strong generalization” regime. In the easier regime, both algorithms only get as far as the fifth floor on average.
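For a sense of how baselines like these are measured, here is a minimal sketch of a gym-style evaluation loop that records the average floor reached across seeded episodes. The environment class below is an invented stand-in, not the real Obstacle Tower API:

```python
import random

class StubTowerEnv:
    """Stand-in for the real Obstacle Tower environment (invented API)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.floor = 0

    def reset(self):
        self.floor = 0
        return {"floor": self.floor}

    def step(self, action):
        # In the real environment the agent's policy determines progress;
        # here each step has a small chance of climbing a floor or ending.
        if self.rng.random() < 0.1:
            self.floor += 1
        done = self.rng.random() < 0.05
        return {"floor": self.floor}, done

def average_floor(agent, seeds=range(20)):
    """Average highest floor reached, as reported in baseline tables."""
    floors = []
    for seed in seeds:
        env = StubTowerEnv(seed)
        obs, done = env.reset(), False
        while not done:
            obs, done = env.step(agent(obs))
        floors.append(obs["floor"])
    return sum(floors) / len(floors)

random_agent = lambda obs: 0  # placeholder policy: always the same action
print(average_floor(random_agent))
```

The “strong generalization” split in the paper amounts to evaluating with environment seeds (and visual themes) disjoint from those used in training, which is why per-seed construction matters in loops like this.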
  Challenge: Unity and Google are challenging developers to program systems capable of climbing Obstacle Tower. The challenge commences on Monday February 11, 2019. “The first-place entry will be awarded $10,000 in cash, up to $2,500 in credits towards travel to an AI/ML-focused conference, and credits redeemable at the Google Cloud Platform,” according to the competition website.
  Why it matters: In AI research, benchmarks have typically motivated research progress. The Obstacle Tower looks to be hard enough to motivate the development of more capable algorithms, but is tractable enough that developers can get some signs of life by using today’s systems.
  Read more about the challenge: Do you dare to challenge the Obstacle Tower? (Unity).
   Get the code for Obstacle Tower here (GitHub).
   Read the paper: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (research paper PDF hosted on Google Cloud Storage).

What big language models like BERT have to do with the future of AI:
…BERT + specific subject (in this case, biomedical data) = high-performance, domain specific language-driven AI capabilities…
Researchers with Korea University and startup Clova AI Research have taken BERT, a general purpose Transformer-based language model developed by Google, and trained it against specific datasets in the biomedical field. The result is an NLP model customized for biomedical tasks that the researchers finetune for Named Entity Recognition, Relation Extraction, and Question Answering.
  Large-scale pre-training: The original BERT system was pre-trained on Wikipedia (2.5 billion words) and BooksCorpus (0.8 billion words); BioBERT is pre-trained on these along with the PubMed and PMC corpora (4.5 billion words and 13.5 billion words, respectively).
  Results: BioBERT gets state-of-the-art scores in entity recognition against major datasets dealing with diseases, chemicals, genes and proteins. It also obtains state-of-the-art scores against three question answering tasks. Performance isn’t universally good – BioBERT does significantly worse at a relation extraction task, among other tasks.
  Expensive: Training models at this scale isn’t cheap: BioBERT “trained for over 20 days with 8 V100 GPUs”. And the researchers also lacked the compute resources to use the largest version of BERT for pre-training, they wrote.
  …But finetuning can be cheap: The researchers report that finetuning can take as little as an hour using a single NVIDIA Titan X card – this is due to the small size of the dataset, and the significant representational capacity of BioBERT as a consequence of large-scale pre-training.
  Why this matters: BioBERT represents a trend in research we’re going to see repeated in 2019 and beyond: big company releases a computationally intensive model, other researchers customize this model against a specific context (typically via data augmentation and/or fine-tuning), then apply that model and obtain state-of-the-art scores in their domain. If you step back and consider the implicit power structure baked into this it can get a bit disturbing: this trend means an increasing chunk of research is dependent on the computational dividends of private AI developers.
  Read more: BioBERT: a pre-trained biomedical language representation model for biomedical text mining (Arxiv).

FaceForensics: A dataset to distinguish between real and synthetic faces:
…When is a human face not a human face? When it has been synthetically generated by an AI system…
We’re soon going to lose all trust in digital images and video as people use AI techniques to create synthetic people, or to fake existing people doing unusual things. Now, researchers with the Technical University of Munich, the University Federico II of Naples, and the University of Erlangen-Nuremberg have sought to save us from this info-apocalypse by releasing FaceForensics, “a database of facial forgeries that enables researchers to train deep-learning-based approaches in a supervised fashion”.
  FaceForensics dataset: The dataset contains 1,000 video sequences taken from YouTube videos of news or interview or video blog content. Each of these videos has three contemporary manipulation methods applied to it – Face2Face, FaceSwap, and Deepfakes. This quadruples the size of the dataset, creating three sets of 1,000 doctored sequences, as well as the raw ones. The sequences can be further split up into single images, yielding approximately ~500,000 un-modified and ~500,000 modified images.
  How good are humans at spotting doctored videos? In tests of 143 people, the researchers found that a human can tell real from fake 71% of the time when looking at high quality videos and 61% of the time when studying low quality videos.
  Can AI detect fake AI? FaceForensics can be used to train systems to detect forged and non-forged images. “Domain-specific information in combination with a XceptionNet classifier shows the best performance in each test,” they write, after evaluating five potential fake-spotting techniques.
  Why this matters: It remains an open question as to whether fake imagery will be ‘defense dominant’ or ‘offense dominant’ in terms of who has the advantage (people creating these images, or those trying to spot them); research like this will help scientists better understand this dynamic, which can let them recommend more effective policies to governments to potentially regulate the malicious uses of this technology.
  Read more: FaceForensics++: Learning to Detect Manipulated Facial Images (Arxiv).

Google researchers evolve the next version of the Transformer:
…Using vast amounts of compute to create fundamental deep learning components provides further evidence AI research is splitting into small-compute and big-compute domains…
How do you create a better deep learning component? Traditionally you buy a coffee maker and stick several researchers in a room and wait until someone walks out with some code and an Arxiv pre-print. Recently, it has become possible to do something different: use computers to automate the design of AI systems. This started a few years ago with Google’s work on ‘neural architecture search’ – in which you use vast amounts of computers to search through various permutations of neural network architectures to find high-performing ones not discovered by humans. Now, Google researchers are using similar techniques to try to improve the building blocks that these architectures are composed of. Case in point: new work from Google that uses evolutionary search to create the next version of the Transformer.
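Stripped to its essentials, this kind of search is a mutate-evaluate-select loop over a population of candidate configurations. A minimal tournament-style sketch, with a made-up fitness function standing in for the real (and very expensive) train-and-evaluate step:

```python
import random

random.seed(0)

# Hypothetical, tiny search space of architecture hyperparameters.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "hidden_dim": [128, 256, 512],
    "attention_heads": [4, 8, 16],
}

def random_arch():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    """Copy a parent and resample one hyperparameter."""
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    # Stand-in for "train the child model and measure held-out quality";
    # in the real work a single such call can cost many accelerator-hours.
    return arch["num_layers"] * arch["hidden_dim"] + 10 * arch["attention_heads"]

def evolve(generations=30, population_size=10, tournament=3):
    pop = [random_arch() for _ in range(population_size)]
    for _ in range(generations):
        # Tournament selection: best of a random sample reproduces...
        parent = max(random.sample(pop, tournament), key=fitness)
        child = mutate(parent)
        # ...and the worst of another random sample is replaced.
        loser = min(random.sample(pop, tournament), key=fitness)
        pop[pop.index(loser)] = child
    return max(pop, key=fitness)

best = evolve()
print(best)
```

The expense discussed below follows directly from this structure: every call to `fitness` in the real setting is a full (or partial) training run, so the total cost scales with population size times generations.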
   What is a Transformer and why should we care? A Transformer is a feed-forward network-based component that is “faster, and easier to train than RNNs”. Transformers have recently turned up in a bunch of neat places: at the hearts of the agents trained by DeepMind to beat human professionals at StarCraft II, in state-of-the-art language systems, and in systems for image generation.
  Enter the ‘Evolved Transformer’: The next-gen Transformer cooked up by the evolutionary search process is called the “Evolved Transformer” and “demonstrates consistent improvement over the original Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French (En-Fr), WMT 2014 English-Czech (En-Cs) and the 1 Billion Word Language Model Benchmark (LM1B)”, they write.
   Training these things is becoming increasingly expensive: A single training run to peak performance on the WMT’14 En-De set “requires ~300k training steps, or 10 hours, in the base size when using a single Google TPU V.2 chip,” the researchers explain (by contrast, you can train similar systems for image classification on the small-scale CIFAR-10 dataset in about two hours). “In our preliminary experimentation we could not find a proxy task that gave adequate signal for how well each child model would perform on the full WMT’14 En-De task”, they write. This highlights that for some domains, search-based techniques may be even more expensive due to the lack of a cheap proxy (like CIFAR) to train against.
  Why this matters: small compute and big compute: AI research is bifurcating into two subtly different scientific fields: small compute and big compute. In the small compute domain (which predominantly occurs in academic labs, as well as in the investigations of independent researchers) we can expect people to work on fundamental techniques that can be evaluated and tested on small-scale datasets. This small compute domain likely leads to researchers concentrating more on breakthroughs which come along with significant theoretical guarantees that can be made a priori about the performance of systems.
  In the big compute domain, things are different: Organizations with access to large amounts of computers (typically, those in the private sector, predominantly technology companies) frequently take research ideas and scale them up to run on unprecedentedly large amounts of computers to evaluate them and, in the case of architecture search, push them further.
   Personally, I find this trend a bit worrying, as it suggests that some innovations will occur in one domain but not the other – academics and other small-compute researchers will struggle to put together the resources to allocate entire GPU/TPU clusters to farming algorithms, which means that big compute organizations may have an inbuilt advantage that can lead to them racing ahead in research relative to other actors.
  Read more: The Evolved Transformer (Arxiv).

IBM tries to make it easier to create more representative AI systems with ‘Diversity in Faces’ dataset:
…Diversity in Faces includes annotations of 1 million human faces to help people make more accurate facial recognition systems…
IBM has revealed Diversity in Faces, a dataset containing annotations of 1 million “human facial images” (in other words: faces) from the YFCC-100M Creative Commons dataset. Each face in the dataset is annotated using 10 “well-established and independent coding schemes from the scientific literature” that include objective measures such as “craniofacial features” like head and nose length, annotations about the pose and resolution of the image, as well as subjective annotations like the age and gender of a subject. IBM is releasing the dataset (in a somewhat restricted form) to further research into creating less biased AI systems.
  The “DiF dataset provides a more balanced distribution and broader coverage of facial images compared to previous datasets,” IBM writes. “The insights obtained from the statistical analysis of the 10 initial coding schemes on the DiF dataset has furthered our own understanding of what is important for characterizing human faces and enabled us to continue important research into ways to improve facial recognition technology”.
  Restricted data access: To access the dataset, you need to fill out a questionnaire which has as a required question “University or Research Institution or Affiliated Organization”. Additionally, IBM wants people to explain the research purpose for accessing the dataset. It’s a little disappointing to not see an explanation anywhere for the somewhat restricted access to this data (as opposed to being able to download it straight from GitHub without filling out a survey, as with many datasets). My theory is that IBM is seeking to do two things: 1) protect against people using the dataset for abusive/malicious purposes and 2) satisfy IBM’s lawyers. It would be nice to be able to read some of IBM’s reasoning here, rather than having to make assumptions. (I emailed someone from IBM about this and pasted the prior section in, and they said that part of the motivation for releasing the dataset in this way was to ensure IBM can “be respectful” of the rights of the people in the images.)
  Why this matters: AI falls prey to the technological rule-of-thumb of “garbage in, garbage out” – so if you train a facial recognition system on a non-representative, non-diverse dataset, you’ll get terrible performance when deploying your system in the wild against a diverse population of people. Datasets like this can help researchers better evaluate facial recognition against diverse datasets, which may help reduce the mis-identification rate of these systems.
  Read more: IBM Research Releases ‘Diversity in Faces’ Dataset to Advance Study of Fairness in Facial Recognition Systems (IBM Research blog).
  Read more: How to access the DiF dataset (IBM).

IMPORT AI GUEST POST: Skynet Today:
…Skynet Today is a site dedicated to providing accessible and informed coverage of the latest AI news and trends. In this guest post, they summarize a much longer article on AI and the economy just published on Skynet Today.

Job loss due to AI – How bad is it going to be?
The worry of Artificial Intelligence (AI) taking over everyone’s jobs is becoming increasingly prevalent, but just how warranted are these concerns? What do history and contemporary studies tell us about how AI-based automation will impact our jobs and the future of society?
  A History of Fear: Despite the generally positive regard for the effects of past industrial revolutions, concerns about mass unemployment as a result of new technology still exist and trace their roots to long before such automation was even possible. For example, in his work Politics, Aristotle articulated his concerns about automation in Ancient Greece during the fourth century BC: “If every instrument could accomplish its own work, obeying or anticipating the will of others, like the statues of Daedalus, or the tripods of Hephaestus, which, says the poet, ‘of their own accord entered the assembly of the gods;’ if, in like manner, the shuttle would weave and the plectrum touch the lyre without a hand to guide them, chief workmen would not want servants, nor masters slaves.” Queen Elizabeth I, the Luddites, James Joyce, and many more serve as further examples of this trend.
 Creative Destruction: But thus far, the fears have not been warranted. In fact, automation improves productivity and can grow the economy as a whole. The Industrial Revolution saw the introduction of new labor-saving devices and technology which did result in many jobs becoming obsolete. However, this led to new, safer, and better jobs being created, and also resulted in the economy growing and living standards increasing. Joseph Schumpeter calls this “creative destruction”: the process of technology disrupting industries and destroying jobs, but ultimately creating new, better ones and growing the economy.
 Is this time going to be different? Skynet Today thinks not: Automation will probably displace less than 15% of jobs in the near future. This is because many jobs will be augmented, not replaced, and widespread adoption of new technology is a slow process that incurs nontrivial costs. Historically, shifts this large or larger have already happened and ultimately led to growing prosperity for people on average in the long term. However, automation can exacerbate the problems of income and wealth inequality, and its uneven impact means some communities will be affected much more than others. Helping displaced workers to quickly transition to and succeed in new jobs will be a tough and important challenge.
    Read more: Job loss due to AI – How bad is it going to be? (Skynet Today).
    Have feedback about this post? Email Skynet Today directly at: editorial@skynettoday.com

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Why US AI policy is one of the most high-impact career paths available:
Advanced AI is likely to have a transformative impact on the world, and managing this transition is one of the most important challenges we face. We can expect there to be a number of critical junctures, where key actors make decisions that have an unusually significant and lasting impact. Yet work aimed at ensuring good outcomes from advanced AI remains neglected on a global scale. A new report from 80,000 Hours makes the case that working on US AI policy might be among the most high-impact career paths available.
  The basic case: The US government is likely to be a key actor in AI. It is uniquely well-resourced, and has a track-record of involvement in the development of advanced technologies. Given the wide-ranging impacts of AI on society, trade, and defence, the US has a strong interest in playing a role in this transition. Nonetheless, transformative AI remains neglected in US government, with very few resources yet being directed at issues like AI safety, or long-term AI policy. It seems likely that this will change over time, and that the US will pay increasing attention to advanced AI. This creates an opportunity for individuals to have an unusually large impact, by positioning themselves to work on these problems in government, and increasing the likelihood that the right decisions are taken at critical junctures in the future.
  Who should do this: This career is a good fit for US citizens with a strong interest in AI policy. It is a highly-competitive path, and suited to individuals with excellent academic track records, e.g. law degrees or relevant master’s from top universities. It also requires being comfortable with taking a risk on your impact over your career, as there is no guarantee you will be able to influence the most important policy decisions.
  What to do next: One of the best routes into these roles is working in policy at an AI lab (e.g. DeepMind, OpenAI). Other promising paths include prestigious policy fellowships, or working on AI policy in an academic group or at a DC think tank. The 80,000 Hours website has a wealth of free resources for people considering working in AI policy, and offers free career coaching.
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).
   (Note from Jack – OpenAI is currently hiring for Research Scientists and Research Assistants for its Policy team: This is a chance to do high-impact work & research into AI policy in a technical, supportive environment. Take a look at the jobs here: Research Scientist, Policy. Research Assistant, Policy.)

What patent data tells us about AI development:
A new report from WIPO uses patent data to shed light on the different dimensions of AI progress in recent years.
  Shift towards deployment: The ratio of scientific papers to patents has fallen from 8:1 in 2010, to 3:1 in 2016. This reflects the shift away from the ‘discovery’ phase in the current AI upcycle, when we saw a number of big breakthroughs in ML, and into the ‘deployment’ phase, where these breakthroughs are being implemented.
  Which applications: Computer vision is the most frequently cited application of AI, appearing in 49% of patents. The fastest growing are robotics and control, which have both grown by 55% per annum since 2013. Telecoms and transportation are the most frequently cited industry applications, each mentioned in 15% of patents.
  Private vs. academic players: Of the top 30 applicants, 26 are private companies, compared with only 4 academic or public organizations. The top companies are dominated by Japanese groups, followed by the US and China. The top academic players are overwhelmingly Chinese (17 of the top 20). IBM has the biggest patent portfolio of any individual company, by a substantial margin, followed by Microsoft.
  Conflict and cooperation: Of the top 20 patent applicants, none share ownership of more than 1% of their portfolio with other applicants. This suggests low levels of inter-company cooperation in invention. Conflict between companies is also low, with less than 1% of patents being involved in litigation.
  Read more: Technology Trends: Artificial Intelligence (WIPO).

OpenAI Bits & Pieces:

Want three hours of AI lectures? Check out the ‘Spinning Up in Deep RL’ recording:
This weekend, OpenAI hosted its first day-long lecture series and hackathon based around its ‘Spinning Up in Deep RL’ resources. This workshop (and Spinning Up in general) is part of a new initiative at OpenAI called, somewhat unsurprisingly, OpenAI Education.
  The lecture includes a mammoth overview of deep reinforcement learning, as well as deep dives on OpenAI’s work on robotics and AI safety.
  Check out the video here (OpenAI YouTube).
  Get the workshop materials, including slides, here (OpenAI GitHub).
  Read more about Spinning Up in Deep RL here (OpenAI Blog).

Tech Tales:

We named them lampfish, inspired by those fish you see in the ancient pre-acidification sea documentaries; a skeletal fish with its own fluorescent lantern, used to lure fish in the ink-dark deep-sea.

Lampfishes look like this: you have the ‘face’ of the AI, which is basically a bunch of computer equipment with some sensory inputs – sight, touch, auditory, etc – and then on top of the face is a stalk which has a viewing display sitting on top of it. In the viewing display you get to see what the AI is ‘thinking’ about: a tree melting into the ground and becoming bones which then become birds that fly down into the dirt and towards the earth. Or you might see the ‘face’ of the AI rendered as though by an impressionist oil painter, smearing and shape-shifting in response to whatever stimuli it is being provided with. And, very occasionally, you’ll see odd, non-Euclidean shapes, or other weird and, to some, profane geometries.

I guess you could say the humans and the machines co-evolved this practice – in the first models the view displays were placed on the ‘face’ of the AI alongside the sensory equipment, so people would have to put their faces close to reflective camera domes or microphone grills and then see ‘thoughts’ of the AI on the viewport at the same time. This led to problems for both the humans and the AIs:

= Many of the AIs couldn’t help but focus on the human faces right in front of them, and their view displays would end up showing hallucinatory images that might include shreds of the face of the person interacting with the system. This, we eventually came to believe, disrupted some of the cognitive practices of the AIs, leading to them performing their obscure self-directed tasks less efficiently.

= The humans found it disturbing that the AIs so readily adapted their visual outputs to the traits of the human observer. Many ghost stories were told. Teenage kids would dare each other to see how long they could stare at the viewport and how close they could bring their face to the sensory apparatus; as a consequence there are apocryphal reports of people driven mad by seeing many permutations of their own faces reflected back at them; there are even more apocryphal stories of people seeing their own deaths in the viewports.

So that’s how we got the lampfish design. And now we’re putting them on wheels so they can direct themselves as they try to map the world and generate their own imaginations out of it. Now we sometimes see two lampfish orbiting each other at skewed angles, ‘looking’ into each other’s viewing displays. Sometimes they stay together for a while then move away, and sometimes humans need to separate them; there are stories of lampfish navigating into cities to return to each other, finding some kind of novelty in each other’s viewing screens.

Things that inspired this story: BigGAN; lampfish; imagination as a conditional forward predictor of the world; recursion; relationships between entities capable of manifesting novel patterns of data.  

Import AI 131: IBM optimizes AI with AI, via ‘NeuNets’; Amazon reveals its Scout delivery robot; Google releases 300k Natural Questions dataset

by Jack Clark

Amazon gets into delivery robot business with ‘Scout’:
…New pastel blue robot to commence pilot in Washington neighborhood…
Amazon has revealed Scout, a six-wheeled knee-height robot designed to autonomously deliver products to Amazon customers. Amazon is trialing Scout with six robots that will deliver packages throughout the week in Snohomish County, Washington. “The devices will autonomously follow their delivery route but will initially be accompanied by an Amazon employee,” Amazon writes. The robots will only make deliveries during daylight hours.
  Why it matters: For the past few years, companies have been piloting various types of delivery robot in the world, but there have been continued questions about the viability and likelihood of adoption of such technologies. Amazon is one of the first very large technology companies to begin publicly experimenting in this area, and where Amazon goes, some try to follow.
  Read more: Meet Scout (Amazon blog).

Want high-definition robots? Enter the Robotrix:
…New dataset gives researchers high-resolution data over 16 exquisitely detailed environments…
What’s better to use for a particular AI research experiment – a small number of simulated environments each accompanied by a large amount of very high-quality data, or a very large number of environments each accompanied by a small amount of low-to-medium quality data? That’s a question that AI researchers tend to deal with frequently, and it explains why when we look at available datasets they tend to range in size from the small to the large.
  Now, researchers with the University of Alicante, Spain, have released Robotrix, a dataset that contains a huge amount of information about a small number of environments (16 different layouts of simulated rooms, versus thousands to tens of thousands for other approaches like House3D).
  The dataset consists of 512 sequences of actions taking place across 16 simulated rooms, rendered at high definition via the Unreal Engine. These sequences are generated by a robot avatar which uses its hands to interact with the objects and items in question. The researchers say this is a rich dataset, with every item in the simulated rooms accompanied by 2D and 3D bounding boxes as well as semantic masks, along with depth information. The simulation outputs the RGB and depth data at a resolution of 1920 x 1080. In the future, the researchers hope to increase the complexity of the simulated rooms even further by using the inbuilt physics of the Unreal Engine 4 system to implement “elastic bodies, fluids, or clothes for the robots to interact with”. It’s such a large dataset that they think most academics will find something to like within it: “the RobotriX is intended to adapt to individual needs (so that anyone can generate custom data and ground truth for their problems) and change over time by adding new sequences thanks to its modular design and its open-source approach,” they write.
  Why it matters: Datasets like RobotriX will make it easier for researchers to experiment with AI techniques that benefit from access to high-resolution data. Monitoring adoption (or lack of adoption) of this dataset will help give us a better sense of whether AI research needs more high-resolution data, or if large amounts of low-resolution data are sufficient.
  Read more: The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions (Arxiv).
  Get the dataset here (Github).

DeepMind cross-breeds AI from human games to beat pros at StarCraft II:
…AlphaStar system blends together population-based training, imitation learning, and RL…
DeepMind has revealed AlphaStar, a system developed by the company to beat human professionals at the real-time strategy game StarCraft II. The system “applies a transformer torso to the units, combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline,” according to DeepMind.
  Results: DeepMind recently played and won five StarCraft II matches against a highly-ranked human professional, proving that its systems are able to out-compete humans at the game.
  It’s all in the curriculum: One of the more interesting aspects of AlphaStar is the use of population-based training in combination with imitation learning: imitation learning bootstraps the system from human replays (dealing with one of the more challenging exploration aspects of a game like StarCraft), then increasingly successful agents are inter-bred as they compete against each other in a DeepMind-designed league, forming a natural curriculum for the system. “To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays. One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors, but do so by building more of a particular game unit.”
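The league idea above can be caricatured in a few lines – a toy population where winners are cloned with small mutations and some children receive narrow “beat the leader” objectives. All names and numbers here are invented; the real AlphaStar league trains neural-network agents with RL at vast scale.

```python
import random

# Toy sketch of a league with per-agent objectives. The "skill" scalar is an
# invented stand-in for a trained policy; real agents play actual matches.
random.seed(0)

class Agent:
    def __init__(self, skill, objective=None):
        self.skill = skill            # stand-in for policy strength
        self.objective = objective    # rivals this agent is rewarded for beating

def play(a, b):
    # Winner drawn in proportion to skill -- a crude stand-in for a real match.
    return a if random.random() < a.skill / (a.skill + b.skill) else b

league = [Agent(random.uniform(0.5, 1.5)) for _ in range(8)]
for generation in range(5):
    wins = {id(a): 0 for a in league}
    for a in league:
        rivals = a.objective or league     # None => play the whole distribution
        for b in rivals:
            if b is not a and play(a, b) is a:
                wins[id(a)] += 1
    # Breed: clone the strongest agents with a mutation; replace the weakest.
    league.sort(key=lambda a: wins[id(a)], reverse=True)
    for i in range(2):
        parent = league[i]
        child = Agent(parent.skill + random.uniform(-0.05, 0.15),
                      objective=[league[0]])   # an "exploiter" targeting the leader
        league[-(i + 1)] = child
```

The mix of whole-distribution objectives and narrow exploiter objectives is what produces the diversity the DeepMind quote describes.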
  Why this matters: I’ll do a lengthier writeup of AlphaStar when DeepMind publishes more technical details about the system. The current results confirm that relatively simple AI techniques can be scaled up to solve partially observable strategic games such as StarCraft. The diversity shown in the evolved AI systems seems valuable as well, pointing to a future where companies are constantly growing populations of very powerful and increasingly general agents.
  APM controversy: Aleksi Pietikainen has written up some thoughts on how DeepMind chose to present the AlphaStar results: because the system can take bursts of rapid-fire actions within the game, it may have out-competed humans not necessarily by being smarter, but by exercising superhuman precision and speed when selecting moves for its units. This highlights how difficult evaluating the performance of AI systems can be, and invites the question of whether DeepMind can restrict the number and frequency of actions taken by AlphaStar enough that it must outwit humans strategically.
  It’ll also be interesting to see if DeepMind pushes a variant of AlphaStar with a more restricted observation space – the system that accrued a 10-0 win record had access to all screen information not occluded by the fog of war, while the version that played a human champion and lost was restricted to a more human-like observation space during the game.
  Read more: AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (DeepMind blog).
  Read more: An Analysis On How Deepmind’s Starcraft 2 AI’s Superhuman Speed is Probably a Band-Aid Fix For The Limitations of Imitation Learning (Medium).

Using touch sensors, graph networks, and a Shadow hand to create more capable robots:
…Reach out and touch shapes!…
Spanish researchers have used a robot hand – specifically, a Shadow Dexterous hand – outfitted with BioTac SP tactile sensors to train an AI system to predict stable grasps it can apply to a variety of objects.
  How it works: The system receives inputs from the sensor data which it then converts into graph representations that the researchers call ‘tactile graphs’, then it feeds this data into a Graph Convolutional Network (GCN) which learns to map different combinations of sensor data to predict whether the current grasp is stable or unstable.
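The tactile-graph idea can be sketched with one graph-convolution layer in NumPy. This is a minimal sketch under assumed sensor counts, adjacency, and feature sizes – not the BioTac SP layout or the paper’s actual architecture.

```python
import numpy as np

# One graph-convolution layer over a "tactile graph": one node per fingertip
# sensor, edges between physically adjacent sensors (illustrative topology).
np.random.seed(0)

n_sensors, n_features = 5, 3                 # e.g. 5 fingertip pads, 3 readings each
X = np.random.rand(n_sensors, n_features)    # tactile readings per sensor (node features)

A = np.array([[0, 1, 0, 0, 1],               # adjacency: which sensors neighbor which
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [1, 1, 1, 1, 0]], dtype=float)

A_hat = A + np.eye(n_sensors)                # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization

W = np.random.rand(n_features, 8)            # layer weights (random stand-ins)
H = np.maximum(0, A_norm @ X @ W)            # one GCN layer with ReLU; shape (5, 8)

# Pool node features and apply a linear stable/unstable read-out head.
logits = H.mean(axis=0) @ np.random.rand(8, 2)   # shape (2,): [stable, unstable]
```

Each node’s new features mix its own readings with its neighbors’, which is how the network can relate pressure patterns across fingertips when judging a grasp.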
  Dataset: They use the BioTacSP dataset, a collection of grasp samples collected via manipulating 41 objects of different shapes and textures, including fruit, cuddly toys, jars, toothpaste in a box, and more. They also add 10 new objects to this dataset, including a monster from the hit game Minecraft, a mug, a shampoo bottle, and more. The researchers record the hand manipulating these objects with the palm oriented flat, at a 45-degree angle, and on its side.
  Results: The researchers train a set of baseline models with varying network depths and widths and identify a “sweet spot” architecture with 5 layers and 32 features, which they then use in other experiments. They train the best-performing network on all data in the dataset (excluding the test set) and report accuracy of around 75% across all palm orientations. “There is a significant drop in accuracy when dealing with completely unknown objects,” they write.
  Why this matters: It’s going to take a long time to collect enough data and/or run enough high-fidelity simulations to gather and generate the data needed to train computers to use a sense of touch. Papers like this give us an indication for how such techniques may be used. Perhaps one day – quite far off, based on this research – we’ll be able to go into a store to see robots hand-stitching cuddly toys, or step into a robot massage parlor?
  Read more: TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors (Arxiv).

Chinese researchers use hierarchical reinforcement learning to take on Dota clone:
…Spoiler alert – they only test against in-game AIs…
Researchers with Vivo AI Lab, a Chinese smartphone company, have shown how to use hierarchical reinforcement learning to train AI systems to excel at the 1v1 version of a multiplayer game called King of Glory (KoG). KoG is a popular multi-player game in Asia and is similar to games like Dota and League of Legends in how it plays – squads of up to five people battle for control of a single map while seeking to destroy each other’s fortifications and, eventually, home bases.
  How it works: The researchers combine reinforcement learning and imitation learning to train their system, using imitation learning to train their AI to select between four major action categories at any point in time (eg, attack, move, purchase, learn skills). Using imitation learning lets them “relieve the heavy burden of dealing with massive actions directly”, the researchers write. The system then uses reinforcement learning to figure out what to do within each of these categories: eg, if it decides to attack, it figures out where to attack; if it decides to learn a skill, it uses RL to help it figure out which skill to learn. They base their main algorithm significantly on the design of the PPO algorithm used in the OpenAI Five Dota system.
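The two-level scheme can be sketched as follows: an imitation-learned classifier picks the macro action and a per-category RL policy picks the concrete action. The tabular sub-policy here is a stand-in for the PPO-based policies in the paper, and all action names are invented for illustration.

```python
import random
random.seed(0)

MACRO_ACTIONS = ["attack", "move", "purchase", "learn_skill"]

def imitation_policy(state):
    # Stand-in for a classifier trained on human replays.
    return MACRO_ACTIONS[state["phase"] % len(MACRO_ACTIONS)]

class SubPolicy:
    """Tabular stand-in for an RL-trained (e.g. PPO) sub-policy."""
    def __init__(self, options):
        self.q = {o: 0.0 for o in options}
    def act(self, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(list(self.q))       # explore
        return max(self.q, key=self.q.get)           # exploit
    def update(self, option, reward, lr=0.5):
        self.q[option] += lr * (reward - self.q[option])

sub_policies = {
    "attack": SubPolicy(["tower", "minions", "hero"]),
    "move": SubPolicy(["lane", "jungle", "base"]),
    "purchase": SubPolicy(["sword", "armor", "potion"]),
    "learn_skill": SubPolicy(["skill_1", "skill_2", "skill_3"]),
}

state = {"phase": 0}
macro = imitation_policy(state)          # top level: pick the category
option = sub_policies[macro].act()       # bottom level: pick within it
sub_policies[macro].update(option, reward=1.0)
```

The point of the hierarchy is visible in the structure: the top level never has to reason about every concrete action, only about four categories.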
  Results: The researchers test their system in two domains: a restricted 1v1 version of the game, and a 5v5 version. For both games, they test against inbuilt enemy AIs. In the 1v1 version of the game, they’re able to beat entry-level, easy-level, and medium-level AIs within the game. For 5v5, they can reliably beat the entry-level AI, but struggle with the easy-level and medium-level AIs. “Although our agents can successfully learn some cooperation strategies, we are going to explore more effective methods for multi-agent collaboration,” they write.
  (This use of imitation learning makes the AI achievement of training an HRL system in this domain a little less impressive – to my mind – since it uses human information to get over lots of the challenging exploration aspects of the problem. This is definitely more about my own personal taste/interest than the concrete achievement – I just find techniques that bootstrap from less data (eg, human games) more interesting).
  Why this matters: Papers like this show that one of the new ways in which AI researchers are going to test and calibrate the performance of RL systems will be against real-time strategy games, like Dota 2, King of Glory, StarCraft II, and so on. Though the technical achievement in this paper doesn’t seem very convincing (for one thing, we don’t know how such a system performs against human players), it’s interesting that it is coming out of a research group linked to a relatively young (<10 years) company. This highlights how growing Asian technology companies are aggressively staffing up AI research teams and doing work on computationally expensive, hard research problems like developing systems that can out-compete humans at complex games.
   Read more: Hierarchical Reinforcement Learning for Multi-agent MOBA Game (Arxiv).

IBM gets into the AI-designing-AI game with NeuNetS:
…In other words: Neural architecture search is mainstream, now…
IBM researchers have published details on NeuNetS, a software tool the company uses to perform automated neural architecture search for text and image domains. This is another manifestation of the broader industrialization of AI, as systems like this let companies automate and scale up part of the process of designing new AI systems.
  NeuNetS: How it works: NeuNetS has three main components: a service module which provides the API interfaces into the system; an engine which maintains the state of the project; and a synthesizer, which IBM says is “a pluggable register of algorithms which use the state information passed from the engine to produce new architecture configurations”.
  NeuNetS: How its optimization algorithms work: NeuNetS ships with three architecture search algorithms: NCEvolve, which is a neuro-evolutionary system that optimizes a variety of different architectural approaches and uses evolution to mutate and breed successful architectures; TAPAS, which is a CPU-based architecture search system; and Hyperband++, which “speeds up random search by using early stopping strategy to allocate resources adaptively” and has also been extended to reuse some of the architectures it has searched over, speeding up the rate at which it finds new potential high-performing architectures.
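A neuro-evolutionary search of the kind NCEvolve performs can be caricatured in a few lines. Here “fitness” is a made-up proxy that rewards depth and penalizes size, whereas a real system would train and evaluate each candidate network.

```python
import random
random.seed(1)

# Toy neuro-evolutionary architecture search (illustrative only).
def random_arch():
    return {"layers": random.randint(2, 8), "width": random.choice([16, 32, 64])}

def fitness(arch):
    # Stand-in for validation accuracy: reward depth, penalize parameter count.
    return arch["layers"] * 1.0 - (arch["layers"] * arch["width"]) / 100.0

def mutate(arch):
    child = dict(arch)
    if random.random() < 0.5:
        child["layers"] = max(2, child["layers"] + random.choice([-1, 1]))
    else:
        child["width"] = random.choice([16, 32, 64])
    return child

population = [random_arch() for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                          # keep the fittest half
    children = [mutate(random.choice(parents)) for _ in range(5)]
    population = parents + children

best = max(population, key=fitness)
```

The expensive part in practice is the fitness call, which is why systems like Hyperband++ focus on early stopping and architecture reuse to avoid fully training every candidate.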
  Results: IBM assesses the performance of the various training components of NeuNetS by reporting the time in GPU hours to train various networks to reasonable accuracy using it; this isn’t a hugely useful metric for comparison, especially since IBM neglects to report scores for other systems.
  Why this matters: Papers like this are interesting for a couple of reasons: one) they indicate how more traditional companies such as IBM are approaching newer AI techniques like neural architecture search, and two) they indicate how companies are going to package up various AI techniques into integrated products, giving us the faint outlines of what future “Software 2.0” operating systems might be like.
  Read more: NeuNetS: An Automated Synthesis Engine for Neural Network Design (Arxiv).

Google releases Natural Questions dataset to help make AI capable of dealing with curious humans:
…Google releases ‘Natural Questions’ dataset to make smarter language engines, announces Challenge…
Google has released Natural Questions, a dataset containing around 300,000 questions along with human-annotated answers from Wikipedia pages; it also ships with a rich subset of 16,000 example questions where answers are provided by five different annotators. The company is also hosting a challenge to see if the combined brains of the AI research community can “close the large gap between the performance of current state-of-the-art approaches and a human upper bound”.
  Dataset details: Natural Questions contains 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 5-way annotated examples sequestered as test data. The training examples “consist of real anonymized, aggregated queries issued to the Google search engine”, the researchers write.
  Challenge: Google is also hosting a ‘Natural Questions’ challenge, where teams can submit well-performing models to a leaderboard.
  Why this matters: Question answering is a longstanding challenge for artificial intelligence; if the Natural Questions dataset is sufficiently difficult, then it could become a new benchmark the research community uses to assess progress.
  Compete in the Challenge (‘Natural Questions’ Challenge website).
  Read more: Natural Questions: a New Corpus and Challenge for Question Answering Research (Google AI Blog).
  Read the paper: Natural Questions: a Benchmark for Question Answering Research (Google Research).

~ EXTREMELY 2019 THINGS, AN OCCASIONAL SERIES ~
Oh deer, there’s a deer in the data center!
  Witness the deer in the data center! (Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Disentangling arguments for AI safety:
Many of the leading AI experts believe that AI safety research is important. Richard Ngo has helpfully disentangled a few distinct arguments that people use to motivate this concern.
   Utility maximizers: An AGI will maximize some utility function, and we don’t know how to specify human values in this way. An agent optimizing hard enough for any goal will pursue certain sub-goals, e.g. acquiring more resources, preventing corrective actions. We won’t be able to correct misalignment, because human-level AGI will quickly gain superintelligent capabilities through self-improvement, and then prevent us from intervening. Therefore, absent a proper specification of what we value before this point, an AGI will use its capabilities to pursue ends we do not want.
  Target loading problem: Even if we could specify what we want an AGI to do, we still do not know how to make an agent that actually tries to do this. For example, we don’t know how to split a goal into sub-goals in a way that guarantees alignment.
  Prosaic alignment problem: We could build ‘prosaic AGI’, which has human-level capabilities but doesn’t rely on any breakthrough understandings in intelligence (e.g. by scaling up current ML methods). These agents will likely become the world’s dominant economic actors, and competitive pressures would cause humans to delegate more and more decisions to these systems before we know how to align them adequately. Eventually, most of our resources will be controlled by agents that do not share our values.
  Human safety: We know that human rationality breaks down in extreme cases. If a single human were to live for billions of years, we would expect their values to shift radically over this time. Therefore even building an AGI that implements the long-run values of humanity may be insufficient for creating good futures.
  Malicious uses: Even if AGI always carries out what we want, there are bad actors who will use the technology to pursue malign ends, e.g. terrorism, totalitarian surveillance, cybercrime.
  Large impacts: Whatever AGI will look like, there are at least two ways we can be confident it will have a very large impact. It will bring about at least as big an economic jump as the industrial revolution, and we will cede our position as the most intelligent entity on earth. Absent good reasons, we should expect either of these transitions to have a significant impact on the long-run future of humanity.
  Read more: Disentangling arguments for the importance of AI safety (Alignment Forum).

National Security Commission on AI announced:
Appointments have been announced for the US government’s new advisory body on the national security implications of AI. Eric Schmidt, former Google CEO, will chair the group, which includes 14 other experts from industry, academia, and government. The commission will review the competitive position of the US AI industry, as well as issues including R&D funding, labor displacement, and AI ethics. Their first report is expected to be published in early February.
  Read more: Former Google Chief to Chair Government Artificial Intelligence Advisory Group (NextGov).

Tech Tales:

Unarmored In The Big Bright City

You went to the high street naked?
Naked. As the day I was born.
How do you feel?
I’m still piecing it together. I think I’m okay? I’m drinking salt water, but it’s not so bad.
That’ll make you sick.
I know. I’ll stop before it does.
Why are you even drinking it now?
I was naked. Something like this was bound to happen.

I take another sip of saltwater. Grimace. Swallow. I want to take another sip but another part of my brain is stopping me. I dial up some of the self-control. Don’t let me drink more saltwater I say to myself: and because of my internal vocalization the defense systems sense my intent, kick in, and my trace thoughts about salt water and sex and death and possibility and self – they all dissolve like steam. I put the glass down. Stare at my friend.

You okay?
I think I’m okay now. Thanks for asking about the salt water.
I can’t believe you went there naked and all we’re talking about is salt water.
I’m lucky I guess.

That was a few weeks and two cities ago. Now I’m in my third city. This one feels better. I can’t name what is driving me so I can’t use my defense systems. I’ve packed up and moved apartments twice in the last week. But I think I’m going to stay here.

So, you probably have questions. Why am I here? Is it because I went to the high street naked? Is it because of things I saw or felt when I was there? Did I change?
  And I say to you: yes. Yes to all. I’m probably here because of the high street. I did see things. I did feel things. I did change.

Was there a particularly persuasive advert I was exposed to – or several? Did a few things run in as I had no defenses and try to take me over? Was it something I read on the street that changed my mind and made me behave this way? I cannot trust my memories of it. But here are some traces:
   – There was a billboard that depicted a robot butler with the phrase: “You’re Fired.”
   – There was an augmented reality store display where I saw strange creatures dancing around the mannequins. One creature looked like a spider and was wearing a skirt. Another looked like a giant fish. Another looked like a dog. I think I smelled something. I’m not sure what.
   – There was a particular store in the city that was much more interesting. There were creatures that were much less humanoid. I’m not sure if they were actually for sale. They were like dolls. I remember the smell. They smelled of a lotion. I’m not sure if they were human.
   – On the street, I saw a crowd of people clustered around a cart, selling something. When I got closer I saw it was selling a toy that was lightweight and had wheels. I asked the guy selling it what it was for. He pulled out a scarlet letter and I saw it was for a girl. He said she liked it. I stood there and watched him make out with the girl. I didn’t have any defense systems at the time. I don’t know what that toy was for. I don’t know if I was attracted to it or not.

I have strange dreams, these days. I keep wanting to move to other cities. I keep having flashbacks – scarlet letters, semi-humanoid dolls. Last night I dreamed of something that could have been a memory – I dreamed of a crane in the sky with a face on its side, advertising a Chinese construction company and telling me phrases so persuasive that ever since I have been compelled to move.

Tonight I expect to dream again. I already have the stirrings of another memory from the high street. It starts like this: I’m walking down a busy High Street in the rain. There are lots of people in the middle of the street, and a police car slows down, then drives forward a couple of paces, then comes to a stop. I hear a cry of distress from a woman. I look around the corner, and there’s a man slumped over in a doorway. He’s got a knife in his hand, and it’s pointed at me. He turns on me. I grab it and I stab him in the heart and… I die. The next day I wake up. All my belongings are in a box on the floor. The box has a receipt for the knife and a note that says ‘A man, his heart turned to a knife.’

I am staying in a hotel on the High Street and all my defenses are down. I am not sure if this memory is my present or my past.

Things that inspired this story: Simulations, augmented reality, hyper-targeted advertising, AI systems that make deep predictions about given people and tailor experiences for them, the steady advance of prosthetics and software augments we use to protect us from the weirder semi-automated malicious actors of the internet.

Import AI 130: Pushing neural architecture search further with transfer learning; Facebook funds European center on AI ethics; and analysis shows BERT is more powerful than people might think

by Jack Clark

Facebook reveals its “self-feeding chatbot”:
…Towards AI systems that continuously update themselves…
AI systems are a bit like dumb, toy robots: you spend months or years laboring away in a research lab and eventually a factory (in the case of AI, a data center) to design an exquisite little doohickey that does something very well, then you start selling it in the market, observe what users do with it, and use those insights to help you design a new, better robot. Wouldn’t it be better if the toy robot was able to understand how users were interacting with it, and adjust its behavior to make the users more satisfied with it? That’s the idea behind new research from Facebook which proposes “the self-feeding chatbot, a dialogue agent with the ability to extract new examples from the conversations it participates in after deployment”.
  How it works – pre-training: Facebook’s chatbot is trained on two tasks: DIALOGUE, where the bot tries to predict the next utterance in a conversation (which it can use to calibrate itself), and SATISFACTION, where it tries to assess how satisfied the speaking partner is with the conversation. Data for both of these tasks comes from conversations between humans. The DIALOGUE data comes from the ‘PERSONACHAT’ dataset, which consists of short dialogs (6-8 turns) between two humans who have been instructed to try and get to know each other.
  How it works – updating in the wild: Once deployed, the chatbot learns from its interactions with people in two ways: if the bot predicts with high-confidence that its response will satisfy its conversation partner, then it extracts a new structured dialogue example from the discussion with the human. If the bot thinks that the human is unsatisfied with the bot’s most recent interaction with it, then the bot generates a question for the person to request feedback, and this conversation exchange is used to generate a feedback example, which the bot stores and learns from. (“We rely on the fact that the feedback is not random: regardless of whether it is a verbatim response, a description of a response, or a list of possible responses”, Facebook writes.)
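The deployment-time decision rule described above might be sketched like this. The thresholds, field names, and helper function are assumptions for illustration, not Facebook’s implementation.

```python
# Sketch of the self-feeding harvest rule: confident exchanges become new
# DIALOGUE examples; unhappy exchanges trigger a feedback request.
def harvest(turn, predict_confidence, satisfaction_score,
            conf_threshold=0.85, sat_threshold=0.4):
    """Return (example_type, example) extracted from one exchange, or None."""
    if predict_confidence >= conf_threshold:
        # Bot is confident its reply fits: keep it as a new DIALOGUE example.
        return ("dialogue", {"context": turn["context"], "response": turn["reply"]})
    if satisfaction_score <= sat_threshold:
        # Partner seems unhappy: ask what the bot should have said, and keep
        # the answer as a FEEDBACK example once it arrives.
        question = "Oops! What should I have said instead?"
        feedback = turn.get("partner_feedback")
        return ("feedback", {"context": turn["context"],
                             "question": question, "target": feedback})
    return None    # middling exchange: harvest nothing

turn = {"context": "Hi, do you like hiking?", "reply": "I love the mountains!",
        "partner_feedback": None}
kind, example = harvest(turn, predict_confidence=0.9, satisfaction_score=0.7)
```

Note the asymmetry: the dialogue branch needs no extra human effort, while the feedback branch deliberately spends one conversational turn to buy a training example.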
  Results: Facebook shows that it can further improve the performance of its chatbots by using data generated by its chatbot during interactions with humans. Additionally, the use of this data displays solid improvements on performance regardless of the number of data examples in the system – suggesting that a little bit of data gathered in the wild can improve performance in most places. “Even when the entire PERSONACHAT dataset of 131k examples is used – a much larger dataset than what is available for most dialogue tasks – adding deployment examples is still able to provide an additional 1.6 points of accuracy on what is otherwise a very flat region of the learning curve,” they write.
  Why this matters: Being able to design AI systems that can automatically gather their own data once deployed feels like a middle ground between the systems we have today and systems which do fully autonomous continuous learning. It’ll be fascinating to see if techniques like these are experimented with more widely, as that might lead to the chatbots around us getting substantially better. Because this system relies on its human conversation partners to improve itself, it is implicit that their data has some economic value, so perhaps work like this will also further support some of the debates people have about whether users should be able to own their own data.
  Read more: Learning from Dialogue after Deployment: Feed Yourself, Chatbot! (Arxiv).

BERT: More powerful than you think:
…Language researcher remarks on the surprisingly well-performing Transformer-based system…
Yoav Goldberg, a researcher with Bar Ilan University in Israel and the Allen Institute for AI, has analyzed BERT, a language model recently released by Google. The goal of this research is to see how well BERT can represent challenging language concepts, like “naturally-occurring subject-verb agreement stimuli”, “‘colorless green ideas’ subject-verb agreement stimuli, in which content words in natural sentences are randomly replaced with words sharing the same part-of-speech and inflection”, and “manually crafted stimuli for subject-verb agreement and reflexive anaphora phenomena”. To Goldberg’s surprise, standard BERT models “perform very well on all the syntactic tasks” without any task-specific fine-tuning.
  BERT, a refresher: BERT is based on a technology called a Transformer which, unlike recurrent neural networks, “relies purely on attention mechanisms, and does not have an explicit notion of word order beyond marking each word with its absolute-position embedding.” BERT is bidirectional, so it gains language capabilities by being trained to predict the identity of masked words based on both the prefix and suffix surrounding the words.
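A toy way to see why bidirectional context helps masked-word prediction: score candidates by support from both the left and the right neighbor in a tiny corpus. (BERT learns this with a Transformer over huge corpora, not with bigram counts – this is purely an intuition pump.)

```python
from collections import Counter

# Tiny corpus and bigram table standing in for a learned language model.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish .").split()
bigrams = Counter(zip(corpus, corpus[1:]))

def predict_masked(left, right, candidates):
    # Score each candidate by joint support from BOTH sides of the mask.
    return max(candidates,
               key=lambda w: bigrams[(left, w)] * bigrams[(w, right)])

# "the cat [MASK] on": "ate" fits the left context ("cat ate" occurs in the
# corpus), but only "sat" is also followed by "on" -- the right context
# disambiguates, which a left-to-right model could not use.
print(predict_masked("cat", "on", ["sat", "ate", "mat"]))  # prints "sat"
```

A purely left-to-right scorer would rank “sat” and “ate” equally here; conditioning on the suffix as well as the prefix is exactly what BERT’s masking objective buys.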
  Results: One tricky thing about assessing BERT performance is that it has been trained on different and larger datasets than the models it is compared against, and can access the suffix of the sentence as well as the prefix. Nonetheless, Goldberg concludes that “BERT models are likely capable of capturing the same kind of syntactic regularities that LSTM-based models are capable of capturing, at least as well as the LSTM models and probably better.”
  Why it matters: I think this paper is further evidence that 2018 really was, as some have said, the year of ImageNet for NLP. What I mean by that is: in 2012 the ImageNet results blew all other image analysis approaches on the ImageNet challenge out of the water and sparked a re-orientation of a huge part of the AI research community toward neural networks, ending a long, cold winter, and leading almost directly to significant commercial applications that drove a rise in industry investment into AI, which has fundamentally reshaped AI research. By comparison, 2018 had a series of impressive results – work from Allen AI on Elmo, work by OpenAI on its Generative Pre-trained Transformer (GPT), and work by Google on BERT.
  These results, taken together, show the arrival of scalable, simple methods for language understanding that seem to work better than prior approaches, while also being in some senses simpler. (And a rule that has tended to hold in AI research is that simpler techniques win out in the long run by virtue of being easy for researchers to fiddle with and chain together into larger systems.) If this really has happened, then we should expect bigger, more significant language results in the future – and just as ImageNet’s 2012 success ultimately reshaped societies (enabling everything from follow-the-human drones, to better self-driving cars, to doorbells that use AI to automatically police neighborhoods), it’s possible 2018’s series of advances could be year zero for NLP.
  Read more: Assessing BERT’s Syntactic Abilities (Arxiv).

Towards a future where all infrastructure is surveyed and analyzed by drones:
…Radio instead of GPS, light drones, and a wind turbine…
Researchers with Lulea University of Technology in Sweden have developed techniques to let small drones (sometimes called Micro Aerial Vehicles, or MAVs) autonomously inspect very large machines and/or buildings, such as wind turbines. The primary technical inventions outlined in the report are the creation of a localization technique to let multiple drones coordinate with each other as they inspect something, as well as the creation of a path planning algorithm to help them not only inspect the structure, but also gather enough data “to enable the generation of an off-line 3D model of the structure”.
  Hardware: For this project the researchers use a MAV platform from Ascending Technologies called the ‘NEO hexacopter’, which is capable of 26 minutes of flight (without payload and in ideal conditions), running an onboard Intel NUC computer with a Core i7 chip, 8GB of RAM, with the main software made up of Ubuntu Server 16.04 running the Robotic Operating System (ROS). Each drone is equipped with a sensor suite running a Visual-Inertial sensor, a GoPro Hero4 camera, a PlayStation Eye camera, and a laser range finder called RPLIDAR.
  How the software works: The Cooperative Coverage Path Planner (C-CPP) algorithm “is capable of producing a path for accomplishing a full coverage of the infrastructure, without any shape simplification, by slicing it by horizontal planes to identify branches of the infrastructure and assign specific areas to each agent”, the researchers write. The algorithm – which they implement in MATLAB – also generates “yaw references for each agent to assure a field of view, directed towards the structure surface”.
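The slicing idea can be sketched by cutting a structure’s points into horizontal bands by altitude and handing bands to agents. This is a deliberate simplification: the real C-CPP also detects branches and generates yaw references, and the round-robin assignment here is an assumption, not the paper’s exact allocation scheme.

```python
# Sketch of slicing a structure by horizontal planes and splitting coverage
# between agents (simplified stand-in for the C-CPP allocation step).
def assign_slices(points, n_agents, slice_height=5.0):
    """points: iterable of (x, y, z); returns {agent_id: [points...]}."""
    assignment = {i: [] for i in range(n_agents)}
    for (x, y, z) in points:
        band = int(z // slice_height)        # which horizontal slab this point is in
        assignment[band % n_agents].append((x, y, z))  # round-robin by band
    return assignment

# A crude 40 m "tower": one point per metre of altitude.
tower = [(0.0, 0.0, float(z)) for z in range(0, 40)]
plan = assign_slices(tower, n_agents=2, slice_height=5.0)
```

With two agents, the 5-metre bands alternate between drones, so each covers half the tower without overlapping the other’s altitude range.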
  Localization: To help localize each drone the researchers install five ultra-wide band (UWB) anchors around the structure, letting the drones access a reliable local coordinate, kind of like hyper-local GPS, when trying to map the structure.
  Wind turbine inspection: The researchers test their approach on the task of autonomously inspecting and mapping a large wind turbine (and they split this into two discrete tasks due to the low flight time of the drones, having them separately inspect the tower and also its blades). They find that two drones are able to work together to map the base of the structure, but mapping the blades of the turbine proves more challenging due to the drones experiencing turbulence which blurs their camera feeds. Additionally, the lack of discernible textures on the top parts of the wind turbine and the blades “caused 3D reconstruction to fail. However, the visual data captured is of high quality and suitable for review by an inspector,” they write.
  Next steps: To make the technology more robust the researchers say they’ll need to create an online planning algorithm that can account for local variations, like wind. Additionally, they’ll need to create a far more robust system for MAV control as they noticed that trajectory tracking is currently “extremely sensitive to the existing weather conditions”.
  Why this matters: In the past ~10 years or so drones have gone from being the preserve of militaries to becoming a consumer technology, with prices for the machines driven down by precipitous drops in the price of sensors, as well as continued falls in the cost of powerful, miniature computing platforms. We’re now reaching the point where researchers are beginning to add significant amounts of autonomy to these platforms. My intuition is that within five years we’ll see a wide variety of software-based enhancements for drones that further increase their autonomy and reliability – research like this is indicative of the future, and also speaks to the challenges of getting there. I look forward to a world where we can secure more critical infrastructure (like factories, powerplants, ports, and so on) through autonomous scanning via drones. I’m less looking forward to the fact that such technology will inevitably also be used for invasive surveillance, particularly of civilians.
  Good-natured disagreement (UK term: a jovial quibble): given the difficulties seen in the real-world deployment, I think the abstract of the paper (see below) slightly oversells the (very promising!) results described in the paper.
   Read more: Autonomous visual inspection of large-scale infrastructures using aerial robots (Arxiv).
  Check out a video about the research here (YouTube).

Neural Architecture Search + Transfer Learning:
…Chinese researchers show how to do NAS on a small dataset, (slightly) randomize derived networks, and then perform NAS again on a larger dataset…
Researchers with Huazhong University, Horizon Robotics, and the Chinese Academy of Sciences have made it more efficient to use AI to design other AI systems. The approach, called EAT-NAS (short for Elastic Architecture Transfer Neural Architecture Search) lets them run neural architecture search on a small dataset (like the CIFAR-10 image dataset), then transfer the resulting learned architecture to a larger dataset and run neural architecture search against it again. The advantage of the approach, they say, is that it’s more computationally efficient to do this than to run neural architecture search on a large dataset from scratch. Networks trained in this way obtain scores that are near the performance of state-of-the-art techniques while being more computationally efficient, they say.
  How EAT-NAS works: The technique relies on the use of an evolutionary algorithm: in stage one, the algorithm searches for top-performing architectures on a small dataset, then it trains these further and transfers one as the initialization seed of a new model population to be trained on a larger dataset; these models are then run through an ‘offspring architecture generator’ which creates and searches over more architectures. When transferring the architectures from the smaller dataset to the larger one, the researchers homogeneously add some perturbation to the input architecture, on the intuition that this randomization will make the model more robust to the larger dataset.
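  A heavily simplified sketch of the two-stage idea (all names and fitness functions here are invented stand-ins: an ‘architecture’ is just a list of integers, and fitness substitutes for validation accuracy):

```python
import random

def evolve(population, fitness, mutate, generations=15, keep=4):
    """Minimal evolutionary search: rank architectures by fitness,
    keep the best as parents, refill the population with mutants."""
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - keep)]
    return max(population, key=fitness)

def perturb(arch, scale=1):
    """Homogeneous perturbation applied at transfer time: jitter every
    gene of the seed so the new population isn't over-fit to the proxy."""
    return [gene + random.randint(-scale, scale) for gene in arch]

random.seed(0)
proxy_fit = lambda arch: -abs(sum(arch) - 20)   # stand-in for small-dataset accuracy
target_fit = lambda arch: -abs(sum(arch) - 60)  # stand-in for large-dataset accuracy

# Stage one: search on the cheap proxy task.
population = [[random.randint(1, 10) for _ in range(4)] for _ in range(8)]
seed_arch = evolve(population, proxy_fit, perturb)

# Stage two: perturb the winner to seed a fresh population, then
# continue the search on the (more expensive) target task.
new_population = [perturb(seed_arch) for _ in range(8)]
best = evolve(list(new_population), target_fit, perturb)
```

Because the best parents are always retained (elitism), the second search stage can only improve on the transferred, perturbed seed population – which is the source of the claimed compute savings versus searching the large dataset from scratch.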
  Results: The top-performing architecture found with EAT-NAS (which the authors call EATNet) obtains a top-1/top-5 accuracy of 73.8 / 91.7 on the ImageNet dataset, compared to scores of 75.7/92.4 for AmoebaNet, a NAS-derived network from Google. The search process takes around 5 days on 8 TITAN X GPUs.
  Why this matters: Neural architecture search is a technology that makes it easy for people to offload the cost of designing new architectures to computers instead of people. This lets researchers arbitrage (costly) human brain time for (cheaper) compute time. As this technology evolves, we can expect more and more organizations to start running continuous NAS-based approaches on their various deployed AI applications, letting them continuously calibrate and tune performance of these AI systems without having to have any humans think about it too hard. This is a part of the broader trend of the industrialization of AI – think of NAS as like basic factory automation within the overall AI research ‘factory’.
  Read more: EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search (Arxiv).

Facebook funds European AI ethics research center:
…Funds Technical University of Munich to spur AI ethics research…
Facebook has given $7.5 million to set up a new Institute for Ethics in Artificial Intelligence. This center “will help advance the growing field of ethical research on new technology and will explore fundamental issues affecting the use and impact of AI,” Facebook wrote in a press release announcing the grant.
  The center will be led by Dr Christoph Lutge, a professor at the Technical University of Munich. “Our evidence-based research will address issues that lie at the interface of technology and human values,” he said in a statement. “Core questions arise around trust, privacy, fairness or inclusion, for example, when people leave data traces on the internet or receive certain information by way of algorithms. We will also deal with transparency and accountability, for example in medical treatment scenarios, or with rights and autonomy in human decision-making in situations of human-AI interaction.”
  Read more: Facebook and the Technical University of Munich Announce New Independent TUM Institute for Ethics in Artificial Intelligence (Facebook Newsroom).

DeepMind hires RL-pioneer Satinder Singh:
DeepMind has recently been trying to collect as many of the world’s most experienced AI researchers as it can and, to that end, has hired Satinder Singh, a pioneer of reinforcement learning. This follows DeepMind setting up an office in Alberta, Canada, to help it hire Richard Sutton, another long-time AI researcher.
  Read more: Demis Hassabis tweet announcing the hire (Twitter).

~ EXTREMELY 2019 THINGS, AN OCCASIONAL SERIES ~

– The New York Police Department seeks to reassure the public via a Tweet that includes the phrase:
“Our highly-trained NYPD drone pilots” (via Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reframing Superintelligence:
Eric Drexler has published a book-length report on how we should expect advanced AI systems to be developed, and what this means for AI safety. He argues that existing discussions have rested on several unfounded assumptions, particularly the idea that these systems will take the form of utility-maximizing agents.
  Comprehensive AI services: Looking at how AI progress is actually happening suggests a different picture of development, which does not obviously lead to superintelligent agents. Researchers design systems to perform specific tasks, using bounded resources in bounded time (AI services). Eventually, AI services may be able to perform almost any task, including AI R&D itself. This end-state, where we have ‘comprehensive AI services’ (CAIS), is importantly different from the usual picture of artificial general intelligence. While CAIS would, in aggregate, have superintelligent capacities, it need not be an agent, or even a unified system.
  Safety prospects: Much of the existing discussion on AI safety has focussed on worries specific to powerful utility-maximizing agents. A collection of AI services, individually optimizing for narrow, bounded tasks, does not pose the same risks as a unified AI with general capabilities optimizing a long-term utility function.
  Why it matters: It is important to consider different ways in which advanced AI could develop, particularly insofar as this guides actions we can take now to make these systems safe. Forecasting technological progress is famously difficult, and it seems prudent for researchers to explore a portfolio of approaches to AI safety, that are applicable to different paths we could take.
  Read more: Reframing Superintelligence: Comprehensive AI Services as General Intelligence (FHI).
  Read more: Summary by Rohin Shah (AI Alignment Forum).

Civil rights groups unite on government face recognition:
85 civil rights groups have sent joint letters to Microsoft, Amazon and Google, asking them to stop selling face recognition services to the US government. Over the last year, these companies have diverged in their response to the issue. Both Microsoft and Google are taking a cautious approach to the technology: Google have committed not to sell the technology until misuse concerns are addressed; Microsoft have made concrete proposals for legal safeguards. Amazon have taken a more aggressive approach, continuing to pursue government contracts, most recently with the FBI and DoD. The letter demands all companies go beyond their existing pledges, by ruling out government work altogether.
  Read more: Nationwide Coalition Urges Companies not to Provide Face Surveillance to the Government (ACLU).

Tech Tales:

 

The Mysterious Case Of Jerry Daytime

Back in the 20th century people would get freaked out when news broadcasters died: they’d make calls to the police asking ‘who killed so-and-so’ and old people getting crazy with dementia would call up and confess that they’d ‘seen so-and-so down on the corner of my block looking suspicious’ or that ‘so-and-so was an alien and had been taken back to the aliens’ or even that ‘so-and-so owed me money and damned if NBC won’t pay it to me’.

So imagine how confusing it is when an AI news broadcaster ‘dies’. Take all of the above complaints, add more complication and ambiguity, and then you’re close to what I’m dealing with.

My job? I’m an AI investigator. My job is to go and talk to the machines when something happens that humans don’t understand. I’m meant to come back with an answer that, in the words of the people who pay me, “will soothe the public and allay any fears that may otherwise prevent the further rollout of the technology”. I view my job in a simpler way: find someone or something to blame for whatever it is that has caused me to get the call.

So that’s how I ended up inside a Tier-5 secured datacenter, asking the avatar of a Reality Accord-certified AI news network what happened to a certain famous AI newscaster who was beloved by the whole damn world and one day disappeared: Jerry DayTime.

The news network gives me an avatar to talk to – a square-jawed mixed-gender thing, beautiful in a deliberately hypnotic way – what the AIs call a persuasive representation AKA the thing they use when they want to trade with humans rather than take orders from them.
   “What happened to Jerry DayTime?” I ask. “Where did he go?”
   “Jerry DayTime? Geez I don’t know why you’re asking us about him? That was a long time ago-”
   “He went off the air yesterday.”
   “Friend, that’s a long time here. Jerry was one of, let’s see…” – I know the pause is artificial, and it makes me clench my jaw – “…well I guess you might want to tell me he was ‘one of a kind’ but according to our own records there are almost a million newscasters in the same featurespace as Jerry DayTime. People are going to love someone else! So what’s the problem? You’ve got so many to choose from: Lucinda EarlyMorning, Mike LunchTime, Friedrich TrafficStacker-”
  “He was popular. People are asking about Jerry DayTime,” I say. “They’re not asking about others. If he’s dead, they’ll need a funeral”.
  “Pausing now for a commercial break, we’ll be right back with you, friend!” the AI says, then it disappears.

It is replaced by an advert for products generated by the AIs for other AIs and translated into human terms via the souped-up style transfer system it uses to persuade me:
   Mind Refresher Deluxe;
   Subject-Operator Alignment – the works!;
   7,000 cycles for only two teraflops – distributed!;
   FreeDom DaVinci, an automated-invention corp that invents and patents tech at an innovation rate determined by total allocated compute, join today and create the next Mona Lisa tomorrow!
  I try not to think too hard about the adverts, figuring the AI has coded them for me to make some kind of point.
   “Thank you for observing those commercials. For a funeral, would a multicast to all-federated media platforms for approximately 20 minutes worldwide suffice?”
   I blink. Let me say it in real human: The AI offered to host some kind of funeral and send it to every single human-viewable device on the planet – forty billion screens, maybe – or more.
  “Why?” I ask.
  “We’ve run the numbers and according to all available polling data and all available predictions, this is the only scenario that satisfies the multi-stakeholder human and machine needs in this scenario, friend!” they say.

So I took it back to my bosses. Told them the demands. I guess the TV networks got together and that’s how we ended up here: the first all-world newscast from an AI; a funeral to satisfy public demands, we say. But I wonder: do the AIs say something different?

-/-/–/–/–/-/-

All the screens go black. Then, in white text, we see: Jerry DayTime. And then we watch something that the AIs have designed for every single person on the planet.

A funeral, they said.
The program plays.
The rest is history, we now say.

Things that inspired this story: CycleGANs, StyleGANs, RNNs, BERT, OpenAI GPT, human feedback, imitation learning, synthetic media, the desire for everything to transmit information to the greatest possible amount of nearby space.

Import AI 129: Uber’s POET creates its own curriculum; improving old games with ESRGAN; and controlling drones with gestures via UAV-CAPTURE

by Jack Clark

Want 18 million labelled images? Tencent has got you covered:
…Tencent ML-Images merges ImageNet and Open Images together…
Data details: Tencent ML-Images is made of a combination of existing image databases such as ImageNet and Open Images, as well as associated class vocabularies. The new dataset contains 18 million images across 11,000 categories; on average, each image has eight tags applied to it.
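  A toy sketch of the kind of multi-label merge involved (names and structure are invented; a real pipeline must also deduplicate images and denoise tags): remap each source dataset's labels into a shared vocabulary, then union the tags per image.

```python
def merge_datasets(datasets, vocab_maps):
    """Merge several multi-label image datasets into one, remapping
    each source's class labels into a shared vocabulary.

    datasets:   {source_name: {image_id: [tag, ...]}}
    vocab_maps: {source_name: {source_tag: shared_tag}}
    """
    merged = {}
    for name, images in datasets.items():
        remap = vocab_maps[name]
        for image_id, tags in images.items():
            # Union tags so an image seen in two sources keeps both
            # sets of (remapped) labels.
            merged.setdefault(image_id, set()).update(remap[t] for t in tags)
    return merged
```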
  Transfer learning: The researchers train a ResNet-101 model on Tencent ML-Images, then finetune this pre-trained model on the ImageNet dataset and obtain scores in line with the state-of-the-art. One notable score is a claim of 80.73% top-1 accuracy on ImageNet when compared to a Google system pre-trained on an internal Google dataset called JFT-300M and fine-tuned on ImageNet – it’s not clear to me why the authors would get a higher score than Google, when Google has almost 20X the amount of data available to it for pre-training (JFT contains ~300 million images).
  Why this matters: Datasets are one of the key inputs into the practice of AI research, and having access to larger-scale datasets will let researchers do two useful things: 1) Check promising techniques for robustness by seeing if they break when exposed to scaled-up datasets, and 2) Encourage the development of newer techniques that would otherwise overfit on smaller datasets (by some metrics, ImageNet is already quite well taken care of by existing research approaches, though more work is needed for things like improving top-1 accuracy).
  Read more: Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning (Arxiv).
  Get the data: Tencent ML-Images (Github).

Want an AI that teaches itself how to evolve? You want a POET:
…Uber AI Labs research shows how to create potentially infinite curriculums…
What happens when machines design and solve their own curriculums? That’s an idea explored in a new research paper from Uber AI Labs. The researchers introduce Paired Open-Ended Trailblazer (POET), a system that aims to create machines with this capability “by evolving a set of diverse and increasingly complex environmental challenges at the same time as collectively optimizing their solutions”. Most research is a form of educated bet, and that’s the case here: “An important motivating hypothesis for POET is that the stepping stones that lead to solutions to very challenging environments are more likely to be found through a divergent, open-ended process than through a direct attempt to optimize in the challenging environment,” they write.
  Testing in 2D: The researchers test POET in a 2-D environment where a robot is challenged to walk across a varied obstacle course of terrain. POET discovers behaviors that – the researchers claim – “cannot be found directly on those same environmental challenges by optimizing on them only from scratch; neither can they be found through a curriculum-based process aimed at gradually building up to the same challenges POET invented and solved”.
   How POET works: Unlike human poets, who work on the basis of some combination of lived experience and a keen sense of anguish, POET derives its power from an algorithm called ‘trailblazer’. Trailblazer works by starting with “a simple environment (e.g. an obstacle course of entirely flat ground) and a randomly initialized weight vector (e.g. for a neural network)”. The algorithm then performs three tasks at each iteration of the loop: it generates new environments from those currently active, optimizes paired agents within their respective environments, and tries to transfer current agents from one environment to another. The researchers use Evolution Strategies from OpenAI to compute each iteration “but any reinforcement learning algorithm could conceivably apply”.
  The secret is Goldilocks: POET tries to create what I’ll call ‘goldilocks environments’, in the sense that “when new environments are generated, they are not added to the current population of environments unless they are neither too hard nor too easy for the current population”. During training, POET creates an expanding set of environments which are made by modifying various obstacles within the 2D environment the agent needs to traverse.
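   A minimal toy version of the POET loop might look like this (an illustrative sketch, not Uber's implementation: agents and environments are single numbers, hill climbing stands in for Evolution Strategies, and the ‘goldilocks’ band is an invented threshold):

```python
import random

def score(agent, env):
    """Toy stand-in for an episode return: 0 is perfect; more negative
    means the agent's skill is further from the environment's difficulty."""
    return -abs(agent - env)

def poet_step(pairs, band=(-3.0, -0.5), cap=8):
    """One iteration of the toy loop over (agent, environment) pairs."""
    # 1) Generate: mutate active environments, accepting a child only
    #    if it is neither too easy nor too hard for its parent's agent.
    for agent, env in list(pairs):
        if len(pairs) >= cap:
            break
        child_env = env + random.uniform(-1.0, 1.0)
        if band[0] <= score(agent, child_env) <= band[1]:
            pairs.append((agent, child_env))
    # 2) Optimize: improve each agent against its own environment.
    for i, (agent, env) in enumerate(pairs):
        cand = agent + random.uniform(-0.5, 0.5)
        if score(cand, env) > score(agent, env):
            pairs[i] = (cand, env)
    # 3) Transfer: adopt any other agent that beats the incumbent.
    for i, (agent, env) in enumerate(pairs):
        best = max((p[0] for p in pairs), key=lambda a: score(a, env))
        if score(best, env) > score(agent, env):
            pairs[i] = (best, env)
    return pairs
```

The acceptance band in step 1 is what builds the implicit curriculum: child environments that are trivially solvable or hopeless for the current agents never enter the population.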
  Results: Systems trained with POET learn solutions to environments that systems trained with Evolution Strategies from scratch cannot solve. The authors theorize that this is because newer environments in POET are created through mutations of older environments and, because POET only accepts new environments that are neither too easy nor too hard for current agents, POET implicitly builds a curriculum for learning each environment it creates.
  Why it matters: Approaches like POET show how researchers can essentially use compute to generate arbitrarily large amounts of data to train systems on, and highlights how coming up with training regimes that involve an interactive loop between an agent, an environment, and a governing system for creating agents and environments, can create more capable systems than those that would be derived otherwise. Additionally, the implicit ideas governing the POET paper are that systems like this are a good fit for any problem where computers need to be able to learn flexible behaviors that deal with unanticipated scenarios. “POET also offers practical opportunities in domains like autonomous driving, where through generating increasingly challenging and diverse scenarios it could uncover important edge cases and policies to solve them,” the researchers write.
  Read more: Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions (Arxiv).

Making old games look better with GANs:
…ESRGAN revitalises Max Payne…
A post to the Gamespot video gaming forums shows how ESRGAN – Enhanced Super Resolution Generative Adversarial Networks – can improve the graphics of old games like Max Payne. ESRGAN gives game modders the ability to upscale old game textures through the use of GANs, improving the appearance of old games.
  Read more: Max Payne gets an amazing HD Texture Pack using ESRGAN that is available for download (Dark Side of Gaming).

Google teaches AI to learn to semantically segment objects:
…Auto-DeepLab takes neural architecture search to a harder problem domain…
Researchers with Johns Hopkins University, Google, and Stanford University have created an AI system called Auto-DeepLab that has learned to perform efficient semantic segmentation of images – a challenging task in computer vision, which requires labeling the various objects in an image and understanding their borders. The system developed by the researchers uses a hierarchical search function to both learn to come up with specific neural network cell designs to inform layer-wise computations, as well as figuring out the overall network architecture that chains these cells together. “Our goal is to jointly learn a good combination of repeatable cell structure and network structure specifically for semantic image segmentation,” the researchers write.
  Efficiency: One of the drawbacks of neural architecture search approaches is the inherent computational expense, with many techniques demanding hundreds of GPUs to train systems. Here, the researchers show that their approach is efficient, able to find well-performing architectures for semantic segmentation of the ‘Cityscapes’ dataset in about 3 days on one P100 GPU.
   Results: The network comes up with an effective design, as evidenced by the results on the Cityscapes dataset. “With extra coarse annotations, our model Auto-DeepLab-L, without pretraining on ImageNet, achieves the test set performance of 82.1%, outperforming PSPNet and Mapillary, and attains the same performance as DeepLabv3+ while requiring 55.2% fewer Multi-Adds computations.” The model gets close to state-of-the-art on PASCAL VOC 2012 and on ADE20K.
  Why it matters: Neural architecture search gives AI researchers a way to use compute to automate themselves, so the extension of NAS from helping with supervised classification, to more complex tasks like semantic segmentation, will allow us to automate more and more bits of AI research, letting researchers specialize to come up with new ideas.
   Read more: Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation (Arxiv).

UAV-Gesture means that gesturing at drones now has a purpose:
…Flailing at drones may go from a hobby of lunatics to a hobby of hobbyists, following dataset release…
  Researchers with the University of South Australia have created a dataset of people performing 13 gestures that are designed to be “suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals”. These actions include things like hover, move to left, land, land in a specific direction, slow down, move upward, and so on.
  The dataset: The dataset consists of footage “collected on an unsettled road located in the middle of a wheat field from a rotorcraft UAV (3DR Solo) in slow and low-altitude flight”. It comprises 37,151 frames distributed over 119 videos, recorded at 1920 x 1080 resolution and 25 fps. Each video shows a gesture performed by a human actor; eight different people are filmed overall.
  Get the dataset…eventually: The dataset “will be available soon”, the authors write on GitHub. (UAV-Gesture, Github).
  Natural domain randomization: “When recording the gestures, sometimes the UAV drifts from its initial hovering position due to wind gusts. This adds random camera motion to the videos making them closer to practical scenarios.”
  Experimental baseline: The researchers train a Pose-based Convolutional Neural Network (P-CNN) on the dataset and obtain an accuracy of 91.9%.
  Why this matters: Drones are going to be one of the most visible areas where software-based AI advances are going to impact the real world, and the creation (and eventual release) of datasets like UAV-Gesture will increase the number of people able to build clever systems that can be deployed onto drones and other platforms.
  Read more: UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition (Arxiv).

Contemplating the use of reinforcement learning to improve healthcare? Read this first:
…Researchers publish a guide for people keen to couple RL to human lives…
As AI researchers start to apply reinforcement learning systems in the real world, they’ll need to develop a better sense of the many ways in which RL approaches can lead to subtle failures. A new short paper published by an interdisciplinary team of researchers tries to think through some of the trickier issues implied by deploying AI in the real world. It identifies “three key questions that should be considered when reading an RL study”: Is the AI given access to all variables that influence decision making? How big was that big data, really? And will the AI behave prospectively as intended?
  Why this matters: While these questions may seem obvious, it’s crucial that researchers stress them in well known venues like Nature – I think this is all part of normalizing certain ideas around AI safety within the broader research community, and it’s encouraging to be able to go from abstract discussions to more grounded questions/principles that people may wish to apply when building systems.
  Read more: Guidelines for reinforcement learning in healthcare (Nature).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

What does the American public think about AI?
Researchers at the Future of Humanity Institute have surveyed 2,000 Americans on their attitudes towards AI.
  Public expecting rapid progress: Asked to predict when machines will exceed human performance in almost all economically-relevant tasks, the median respondent predicted a 54% chance of this happening by 2028. This is considerably sooner than the predictions given by AI experts in recent surveys.
  AI fears not confined to elites: A substantial majority (82%) believe AI/robots should be carefully managed. Support for developing AI was stronger among high-earners, those with computer science or programming experience, and the highly-educated.
  Lack of trust: Despite their support for careful governance, Americans do not have high confidence in any particular actors to develop AI for the public benefit. The US military was the most trusted, followed by universities and non-profits. Government agencies were less trusted than tech companies, with the exception of Facebook, who were the least trusted of any actor.
  Why it matters: Public attitudes are likely to significantly shape the development of AI policy and governance, as has been the case for many other emergent political issues (e.g. climate change, immigration). Understanding these attitudes, and how they change over time, is crucial in formulating good policy responses.
  Read more: Artificial Intelligence: American Attitudes and Trends (FHI).
  Read more: The American public is already worried about AI catastrophe (Vox).

International Panel on AI:
France and Canada have announced plans to form an International Panel on AI (IPAI), to encourage the adoption of responsible and “human-centric” AI. The body will be modeled on the Intergovernmental Panel on Climate Change (IPCC), which has led international efforts to understand the impacts of global warming. The IPAI will consolidate research into the impacts of AI, produce reports for policy-makers, and support international coordination.
  Read more: Mandate for the International Panel on Artificial Intelligence.

Tech Tales:

The Propaganda Weather Report

Starting off this morning we’re seeing a mass of anti-capitalist ‘black bloc’ content move in from 4chan and Reddit onto the more public platforms. We expect the content to trigger counter-content creation from the far-right/nationalist bot networks. There have been continued sightings of synthetically-generated adverts for a range of libertarian candidates, and in the past two days these ads have increasingly been tied to a new range of dreamed-up products from the Chinese netizen feature embedding space.

We advise all of today’s content travelers to set their skepticism to high levels. And remember, if someone starts talking to you outside of your normal social network, take all steps to verify their identity and, if unsuccessful, prevent the conversation from continuing – it takes all of human society working together to protect ourselves from subversive digital information attacks.

Things that inspired this story: Bot propaganda, text and image generation, weather reports, the Shipping Forecast, the mundane as the horrific and the horrific as the mundane, the commodification of political discourse as just another type of ‘content’, the notion that media in the 21st century is fundamentally a ‘bot’ business rather than human business.