Import AI

Import AI 139: Making better healthcare AI systems via audio de-identification; teaching drones to help humans fight fires; and why language models could be smarter than you think

Stopping poachers with machine learning:
…Finally, an AI-infused surveillance & response system that (most) people approve of…
Researchers and staffers with the University of Southern California, the Key Biodiversity Area Secretariat, the World Wide Fund for Nature, the Wildlife Conservation Society, and the Uganda Wildlife Authority have used machine learning to increase the effectiveness of rangers who try to stop poaching in areas of significant biodiversity.

The project saw USC researchers collaborate with rangers in areas vulnerable to poaching in Uganda and Cambodia, and involved analyzing historical records of poaching incidents and building predictive models that estimate where poachers might strike next.
  The results are encouraging: the researchers tested their system in Murchison Falls National Park (MFNP) in Uganda and the Srepok Wildlife Sanctuary (SWS) in Cambodia, following an earlier test of the system in the Queen Elizabeth National Park (QENP) in Uganda. In MFNP, using the system, “park rangers detected poaching activity in 38 cells,” they write. Additionally, “the amount of poaching activity observed is highest in the high-risk regions and lowest in the low-risk regions”, suggesting that the algorithm developed by the researchers is learning useful patterns. The researchers also deployed the model in the SWS park in Cambodia and again observed that regions the algorithm classified as high risk had higher incidences of poaching.

Impact: To get a sense of the impact of this technique, we can compare the results of the field tests to typical observations made by the rangers.
  In the SWS park in Cambodia during 2018, the average number of snares confiscated each month was 101. By comparison, during the month the rangers were using the machine learning system they removed 521 snares. “They also confiscated a firearm and they saw, but did not catch, three groups of poachers in the forest”.

Different parks, different data: As most applied scientists know, real-world data tends to contain its own surprising intricacies. Here, this manifests as radically different distributions of datapoints across the different parks – in the MFNP park in Uganda around 15% of the collected datapoints are for areas where poaching is thought to have occurred, compared to around 4.3% for QENP and ~0.25% for the SWS park in Cambodia. Additionally, the makeup of each dataset changes over the year as the parks change various things that factor into data collection, like the number of rangers, or the routes they cover, and so on.
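  To make the class-imbalance point concrete, here is a minimal, purely illustrative sketch (not the paper's model) of training a per-cell poaching-risk classifier on synthetic grid-cell data with a rare positive class; the features, rates, and class-weighting choice below are invented for the example:

```python
# Illustrative only: a toy grid-cell classifier with heavy class imbalance,
# loosely mimicking the very low positive rates described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells = 20_000
# Hypothetical per-cell covariates: e.g. distance to road, distance to water, patrol effort.
X = rng.normal(size=(n_cells, 3))
# Simulate rare poaching events (roughly ~1% positive rate) correlated with the covariates.
logits = -4.5 + 0.8 * X[:, 0] - 0.6 * X[:, 2]
y = rng.random(n_cells) < 1 / (1 + np.exp(-logits))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# class_weight="balanced" up-weights the rare positive class during training.
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

risk = clf.predict_proba(X_te)[:, 1]
# Rank cells by predicted risk; high-risk cells would be prioritised for patrols.
print("mean predicted risk of top-100 cells:", risk[np.argsort(risk)[-100:]].mean())
```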

Why this matters: Machine learning approaches are giving us the ability to build a sense&respond infrastructure for the world, and as we increase the sample efficiency and accuracy of such techniques we’ll be able to better build systems to help us manage an increasingly unstable planet. It’s deeply gratifying to see ML being used to protect biodiversity.
  Read more: Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations (Arxiv).

#####################################################

Want to check in for your flight in China? Use facial recognition:
I’ve known for some time that China has rolled out facial recognition systems in its airports to speed up the check-in process. Now, Matthew Brennan has recorded a short video showing what this is like in practice from the perspective of a consumer. I imagine these systems will become standard worldwide within a few years.
  Note: I’ve seen a bunch of people on Twitter saying variations of “this technology makes me deeply uncomfortable and I hate it”. I think I have a slightly different opinion here. If you feel strongly about this, I would love to better understand your position, so please tweet at me or email me!
  Check out the facial recognition check-in here (Matthew Brennan, Twitter).

#####################################################

Why language models could be smarter than you think:
…LSTMs are smarter than you think, say researchers…
Researchers with the Cognitive Neuroimaging Unit at the ‘NeuroSpin Center’, as well as Facebook AI Research and the University of Amsterdam, have analyzed how LSTMs keep track of certain types of information, when tested for their ability to model language structures at longer timescales. The analysis is meant to explore “whether these generic sequence-processing devices are discovering genuine structural properties of language in their training data, or whether their success can be explained by opportunistic surface-pattern-based heuristics”, the authors write.

The researchers study a pre-trained model “composed of a 650-dimensional embedding layer, two 650-dimensional hidden layers, and an output layer with vocabulary size 50,000”. They evaluate this model on a set of what they call ‘number-agreement tasks’ where they test subject-verb agreement in increasingly challenging setups (eg, a simple case is looking at network activations for ‘the boy greets the guy’, with harder ones taking the form of things like ‘the boy most probably greets the guy’ and ‘the boy near the car kindly greets the guy’, and so on).
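  To make the number-agreement setup concrete, here is a minimal sketch of the evaluation idea: feed the model a prefix and check whether it scores the correctly inflected verb above the incorrect one. The tiny, randomly initialised LSTM and toy vocabulary below are stand-ins for the pre-trained 650-dimensional model analysed in the paper:

```python
# Minimal sketch of a number-agreement probe: does the LM prefer the
# correctly-inflected verb after a (possibly long-distance) subject?
import torch
import torch.nn as nn

vocab = {w: i for i, w in enumerate(
    ["<unk>", "the", "boy", "near", "car", "kindly", "greets", "greet", "guy"])}

class TinyLSTMLM(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)          # next-token logits at every position

def prefers_correct_verb(model, prefix, correct, wrong):
    ids = torch.tensor([[vocab[w] for w in prefix.split()]])
    logits = model(ids)[0, -1]      # next-token distribution after the prefix
    return bool(logits[vocab[correct]] > logits[vocab[wrong]])

model = TinyLSTMLM(len(vocab))
# Long-distance case: the intervening "near the car" should not flip the number.
print(prefers_correct_verb(model, "the boy near the car kindly", "greets", "greet"))
```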

Neurons that count, sometimes together: During analysis, they noticed the LSTM had developed “two ‘grandmother’ cells to carry number features from the subject to the verb across the intervening material”. They found that these cells were sometimes used to help the network decide in particularly tricky cases: “The LSTM also possesses a more distributed mechanism to predict number when subject and verb are close, with the grandmother number cells only playing a crucial role in more difficult long-distance cases”.
  They also discovered a cell that “encodes the presence of an embedded phrase separating the main subject-verb dependency, and has strong efferent connections to the long-distance number cells, suggesting that the network relies on genuine syntactic information to regulate agreement-feature percolation”.

Why this matters: “Strikingly, simply training an LSTM on a language-model objective on raw corpus data brought about single units carrying exceptionally specific linguistic information. Three of these units were found to form a highly interactive local network, which makes up the central part of a ‘neural’ circuit performing long-distance number agreement”, they write. “Agreement in an LSTM language-model cannot be entirely explained away by superficial heuristics, and the networks have, to some extent, learned to build and exploit structure-based syntactic representations, akin to those conjectured to support human-sentence processing”.
  The most interesting thing about all of this is the apparent sophistication that emerges as we train these networks. It seems to inherently support some of the ideas outlined by Richard Sutton (covered in Import AI #138) about the ability for relatively simple networks to – given sufficient compute – develop very sophisticated capabilities.
  Read more: The emergence of number and syntax units in LSTM language models (Arxiv).

#####################################################

Teaching drones to help humans fight fires:
…Towards a future where human firefighters are guarded by flying machines…
Researchers with the Georgia Institute of Technology have developed an algorithm to let humans and drones work together when fighting fires, with the drones now able to analyze the fire from above and relay that information to firefighters. “The proposed algorithm overcomes the limitations of prior work by explicitly estimating the latent fire propagation dynamics to enable intelligent, time-extended coordination of the UAVs in support of on-the-ground human firefighters,” they write.

The researchers use FARSITE, software for simulating wildfire propagation that is already widely used by the United States’ National Park Service and Forest Service. They use an Adaptive Extended Kalman Filter (AEKF) to make predictions about where the fire is likely to spread to. They eventually develop a basic system that works in simulation which lets them coordinate the actions of drones and humans, so that the drones learn to intelligently inform people about fire propagation. They also implement a “Human Safety Module” which attempts to work out how safe the people are, and how safe they will be as the fire location changes over time.
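  For readers who haven't met Kalman-style estimators before, the sketch below shows the basic predict/update loop on a toy one-dimensional fire front; the paper's filter is adaptive and extended and is driven by FARSITE's propagation dynamics, rather than this invented constant-velocity model:

```python
# Deliberately simplified predict/update loop of a Kalman-filter-style
# fire-front estimator (linear toy; the paper uses an adaptive *extended* KF).
import numpy as np

x = np.array([0.0, 1.0])          # state: [front_position_km, spread_rate_km_per_hr]
P = np.eye(2)                     # state covariance
F = np.array([[1.0, 1.0],         # constant-velocity propagation model (dt = 1 hr)
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])        # a UAV measures front position only
Q = 0.01 * np.eye(2)              # process noise
R = np.array([[0.25]])            # measurement noise

for z in [1.1, 2.3, 3.0, 4.2]:    # simulated UAV observations of the fire front
    # Predict: push the state through the propagation model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct the prediction with the new observation.
    y = np.array([z]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    print(f"estimated front position: {x[0]:.2f} km, spread rate: {x[1]:.2f} km/hr")
```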

Three tests: They test the system on three scenarios: a stationary fire, a moving fire, and one where the fire moves and grows in area over time. The tests mostly tell us that you need to dramatically increase the number of drones to satisfy human safety guarantees (eg, in the case of a moving and growing fire you need over 25 drones to provide safety for 10 humans; similarly, in this scenario you need at least 7 drones to significantly reduce your uncertainty about the map of the fire and how it will change). Their approach outperforms the prior state-of-the-art in this domain.

Why this matters: I think in as little as a decade it’s going to become common to see teams of humans and drones working together to deal with wildfires or building fires, and papers like this show how we might use adaptive software systems to increase the safety of human-emergency responders.
  Read more: Safe Coordination of Human-Robot Firefighting Teams (Arxiv).

#####################################################

Google wants it to be easier to erase personal information from audio logs:
…Audio de-ID metric & benchmark should spur research into cleaning up datasets…
Medical data is one of the most difficult types of data for machine learning researchers to work with, because it is full of personally identifiable health information. This means people need to invest in tools to remove this personal information from medical data before they can build secondary applications on top of it. Compounding this, in recent years more and more medical services are accessed digitally, leading to a growth in the amount of digital healthcare-relevant data, all of which needs to have personal information removed before it can be processed by most machine learning approaches.

Now, researchers with Google are tackling this problem in the audio domain by releasing Audio de-ID, a new metric for measuring success at de-identifying audio logs, along with a benchmark for evaluating systems. The benchmark tests models against a Google-curated dataset built from the ‘Switchboard’ and ‘Fisher’ datasets, with Personal Health Information (PHI) tagged and labelled in the data, and challenges models to automatically slice personally identifiable information out of the data.

Taking personal information out of audio logs: Removing personally identifiable information from audio logs is a tricky task. Google’s pipeline works as follows: “Our pipeline first produces transcripts from the audio using [Audio Speech Recognition], proceeds by running text-based [Named Entity Recognition] tagging, and then redacts [Personal Health Information] tokens, using the aligned token boundaries determined by ASR. Our tagger relies on the state-of-the-art techniques for solving the audio NER problem of recognizing entities in audio transcripts. We leverage the available ASR technology, and use its component of alignment back to audio.”
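  The data flow of that pipeline can be sketched in a few lines. The hard-coded transcript and tiny rule-based tagger below are stand-ins for the real ASR and NER models, and exist only to show how word-level timings turn NER tags into audio spans to redact:

```python
# Sketch of the three-stage flow: ASR transcript with word timings -> NER
# tagging -> redaction of tagged spans. Toy stand-ins only, not Google's models.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float   # seconds, as aligned by the ASR system
    end: float

# Pretend ASR output for "my name is jane doe and i live in springfield".
transcript = [Word(w, i * 0.4, i * 0.4 + 0.35) for i, w in enumerate(
    "my name is jane doe and i live in springfield".split())]

PHI_TOKENS = {"jane", "doe", "springfield"}   # toy lexicon standing in for an NER model

def tag_phi(words):
    """Return (word, is_phi) pairs -- the NER stage."""
    return [(w, w.text in PHI_TOKENS) for w in words]

def redaction_spans(tagged):
    """Collect audio time spans to silence -- the redaction stage."""
    return [(w.start, w.end) for w, is_phi in tagged if is_phi]

spans = redaction_spans(tag_phi(transcript))
print("silence these audio spans (sec):", spans)
```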

Why this matters: The (preliminary) benchmark results in the paper show that Audio Speech Recognition performance is “the main impedance towards achieving results comparable to text de-ID”, suggesting that as we develop more capable ASR models we will see improvements in our ability to automatically clean datasets so we can do more useful things with them.

Want the data? Google says you should be able to get it from this link referenced in the paper, but the link currently 404s (as of Sunday the 24th of March): https://g.co/audio-ner-annotations-data
   Read more: Audio De-identification: A New Entity Recognition Task (Arxiv).

#####################################################

It just got (a little bit) easier to develop & test AI algorithms on robot hardware:
…Gym-gazebo2 lets you simulate robots and provides an OpenAI Gym-like interface…
Researchers with Acutronic Robotics have released gym-gazebo2, software that people can use to develop and compare reinforcement learning algorithms’ performance on robots simulated within the ‘ROS 2’ and ‘Gazebo’ software platforms. Gym-gazebo2 contains simulation tools, middleware software, and a high-fidelity simulation of the ‘MARA’ robot which is a product developed by Acutronic Robotics.

Gym-Gazebo2 ingredients: The software consists of three main components: ROS 2, Gazebo, and gym-gazebo2 itself. Gym-Gazebo2 can be installed via a Docker container, which should simplify setup.

MARA, MARA, MARA: Initially, Gym-Gazebo2 ships with four environments based around Acutronic Robotics’s Modular Articulated Robotic Arm (MARA) system. These environments are:

  • MARA: The agent is rewarded for moving its gripper to a target position.
  • MARA Orient: The agent is rewarded for moving its gripper to a target position and specific orientation.
  • MARA Collision: The agent is rewarded in the same way as MARA, but is punished if the robot collides with anything.
  • MARA Collision Orient: The agent is rewarded in the same way as MARA Orient, but is punished if it collides with anything.

  (It’s worth noting that these environments are very, very simple: real world robot tasks tend to involve more multi-step scenarios, usually with additional constraints. However, these environments could prove to be useful for validating performance of a given approach early in development.)
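  Because the environments expose an OpenAI Gym-style interface, using them should look roughly like the sketch below. This assumes gym-gazebo2 and its ROS 2 / Gazebo dependencies are installed, and it guesses the environment ID as “MARA-v0” – the actual registered IDs may differ:

```python
# Hedged usage sketch: a random-action smoke test of an assumed "MARA-v0" env.
import gym
import gym_gazebo2  # noqa: F401  -- importing registers the MARA environments with gym

env = gym.make("MARA-v0")            # assumed environment ID; check the repo for the real one
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()          # random joint commands, just to smoke-test the env
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        obs = env.reset()
print("return over 100 random steps:", total_reward)
env.close()
```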

Free algorithms: Alongside gym-gazebo2, Acutronic Robotics is also releasing ros2learn, a collection of algorithms (based on OpenAI Baselines), as well as some pre-built experimental scripts for running algorithms like PPO, TRPO, and ACKTR on the ‘MARA’ robot platform.

Why this matters: Robotics is about to be revolutionized by AI; following years of development by researchers around the world, deep learning techniques are maturing to the point that they can be applied to robots to solve tasks that were previously impossible or very, very, very expensive and complicated to program. Software like gym-gazebo2 will make it easier for people to validate algorithms in a high-fidelity simulation and, if they happen to like the MARA arm itself, to validate things in reality too (though this tends to be expensive and fraught with various kinds of confounding pain, so it’ll depend on the popularity of the MARA hardware platform).
  Get the ‘ros2learn’ code from GitHub here.
  Get the code for gym-gazebo2 from GitHub here.
  Read more: gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Understanding the European Union’s AI ambitions:
In discussions about global leadership in AI, the EU is rarely considered a serious contender. This report by Charlotte Stix at the Leverhulme Center for the Future of Intelligence at the University of Cambridge, explores this view, and provides a detailed picture of the EU’s AI landscape and how this could form the basis for leadership in AI.

What the EU wants: The EU’s strategy is set out in two key documents: (1) the European Commission’s Communication on AI, which – among other things – outlines the EU’s ambition to put ‘ethical AI’ at the core of its strategy; (2) the Coordinated Plan on AI, which outlines how member states could coordinate their strategies and resources to improve European competitiveness in AI.

What the EU has: The EU’s central funding commitments for AI R&D remain modest – €1.5bn between 2018 and 2020. This is fairly trivial when compared with government funding in China (tens of billions of dollars per year), and even with private companies (Alphabet’s 2018 R&D spend was $21bn). The Coordinated Plan includes an ambition to reach €20bn in total funding per year by 2020, from member states and the private sector, though there are not yet concrete plans for how to achieve this. At present, inbound VC investment in AI is 6x lower than in the US. The EU is also having problems retaining talent and organizations, with promising companies increasingly acquired by international firms. There are proposals for the EU to establish large-scale projects in AI research, leveraging its impressive track record in this domain (e.g. CERN, the Human Brain Project), but these remain relatively early-stage.

Why it matters: The EU is making a bet on leadership in ethical and human-centric AI as a way to play an important role in the global AI landscape. This is an interesting strategy which sets it apart from other actors, e.g. the US and China, who are focused more on leadership in AI capabilities. The EU has expressed ambitions to cooperate closely with other countries and groups that are aligned with its vision (e.g. via the new International Panel on AI), which is encouraging for global coordination on these issues. Being made up of 27/28 member states, the EU’s experience in coordinating between actors could prove an advantage in this regard.
  Read more: Survey of the EU’s AI ecosystem (Charlotte Stix)  

#####################################################

New AI institute at Stanford launches + Bill Gates on AI risk:
The Stanford Institute for Human-Centered AI (HAI) has formally launched. HAI is led by Fei-Fei Li, former director of Stanford AI lab and co-founder of AI4ALL, and John Etchemendy, philosopher and former Stanford provost.

The mission: HAI’s mission is informed by three overarching principles – that we must understand and forecast AI’s human impact, and guide its development on this basis; that AI should augment, rather than replace, humans; and that we must develop (artificial) intelligence as “subtle and nuanced as human intelligence.” The institute aims to “bring humanities and social thinking into tech,” to address emerging issues in AI.

Bill Gates on AI risk: At HAI’s launch event, Gates used his keynote speech to describe his hopes and fears for advanced AI. He said “the world hasn’t had that many technologies that are both promising and dangerous”, comparing AI to nuclear fission, which yielded both benefits in terms of energy, and serious risks from nuclear weapons. Gates had previously expressed some scepticism about AI risk, and this speech suggests he has moved towards a more cautious outlook.
  Read more: Launch announcement (Stanford).
  Read more: Bill Gates – AI is like nuclear weapons and nuclear energy in danger and promise (Vox).

#####################################################

Tech Tales:

Beaming In

We had been climbing the mountain for half an hour, and already a third of the people had cut their feeds and faded out of the country – they had discussed enough to come to a decision, and had disappeared to do the work. I stayed, looking at butterflies playing near cliff edges, and at birds diving between different trees, while talking about moving vast amounts of money for the purposes of buttressing a particular ideology and crushing another. The view was clear today, and I could see in my overlay the names of each of the towns and mountains and train stations on the horizon.

As I walked, I talked with the others. I suppose similar hikes have happened throughout history – more than a thousand years ago, a group of Icelandic elders hiked to a desolate riverbank in a rift valley where they formed the world’s oldest democracy, the ‘Althing’. During various world wars many leaders have gathered at various country parks and exchanged ideas with one another, making increasingly strange and influential plans. And of course there are the modern variants: Bilderberg hikes in various thinly-disclosed locations around Europe. Perhaps people use nature and walking within it as a kind of stimulant, to help them make decisions of consequence? Perhaps, they use nature because they have always used nature?

Now, on this walk, after another hour, most of the people have faded out, till it’s just me, and two others. One of them runs a multi-national with interests in intelligence, and the other works on behalf of both the US government and Chinese government on ‘track two’ diplomacy initiatives. We discuss things I cannot talk about, and make plans that will influence many people.

—–

But if you were to look at us from a distance, you wouldn’t see anybody at all. You’d see a mass of drones, hauling around the smell, texture and sense equipment which is needed to render high-fidelity simulations and transmit them to each of us, dialing in from our home offices and yachts and (in a few, rare cases) bunkers and jets. We use our machines to synthesize a beautiful reality for us, and we ‘walk’ within it, together making plans, appearing real to each other, but appearing to people in the places that we walk as a collection of whirring metal. I am not sure when the last walk occurred with this group where a human decision-maker attended.

We do this because we believe in nature, and we believe that because we are in the presence of nature, our decisions are continuous with those of the past. I think habit plays a part, but so does a desire for absolution, and the knowledge that though trees and butterflies and crows may judge us for what we do, they cannot talk back.

Things that inspired this story: The general tradition of walking through history with particular emphasis on modern walks such as those taken by the Bilderberg group; virtual reality and augmented reality; increasingly capable drones; 5G; force feedback systems; the tendency for those in power to form increasingly closed-off nested communities.

Import AI 138: Transfer learning for drones; compute and the “bitter lesson” for AI research; and why reducing gender bias in language models may be harder than people think

Why the unreasonable effectiveness of compute is a “bitter lesson” for AI research:
…Richard Sutton explains that “general methods that leverage computation are ultimately the most effective”…
Richard Sutton, one of the godfathers of reinforcement learning*, has written about the relationship between compute and AI progress, noting that the use of larger and larger amounts of computation paired with relatively simple algorithms has typically led to the emergence of more varied and independent AI capabilities than many human-designed algorithms or approaches. “The only thing that matters in the long run is the leveraging of computation”, Sutton writes.

Many examples, one rule: Some of the domains where computers have beaten methods based on human knowledge include Chess, Go, speech recognition, and many examples in computer vision.

The bitter lesson: “We have to learn the bitter lesson that building in how we think we think does not work in the long run,” Sutton says. “The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.”

Why this matters: If compute is the main thing that unlocks new AI capabilities, then we can expect most of the strategic (and related geopolitical) landscape of AI research to re-configure in coming years around a compute-centric model, which will likely have significant implications for the AI community.
  Read more: The Bitter Lesson (Rich Sutton).
  * Richard Sutton literally (co)wrote the book on reinforcement learning.

#####################################################

AI + Comedy, with Naomi Saphra!
…Comedy set lampoons funding model in AI, capitalism, NLP…
Naomi Saphra, an NLP researcher, has put a video online of her doing stand-up AI comedy at a venue in Edinburgh, Scotland. Check out the video for her observations on working in AI research, funding AI research, tales about Nazi rocket researchers, and more.

  “You always have to ask yourself, who else finds this interesting? If you mean who reads my papers and cites my papers? The answer is nobody. If you mean who has given me money? The answer is mostly evil… you see I have the same problem as anyone in this world – I hate capitalism but I love money”.
  Watch her comedy set here: Naomi Saphra, Paying the Panopticon (YouTube).

#####################################################

Prototype experiment shows why robots might tag-team in the future:
…Use of a tether means 1+1 is greater than 2 here…
Researchers with the University of Tokyo, Japan, have created a two-robot team that can map its surroundings and traverse vertiginous terrain via the use of a tether, which lets an airborne drone vehicle assist a ground vehicle.

The drone uses an NVIDIA Jetson TX2 chip to perform onboard localization, mapping and navigation. The drone is equipped with a camera, a time-of-flight sensor, and a laser sensor for height measurement. The ground vehicle is “based on a commercially available caterpillar platform” using a UP Core processing unit. The ground robot runs the Robot Operating System (ROS), which the airborne drone uses to connect to it.

Smart robots climb with a dumb tether: The robots work together like this: the UAV flies above the UGV and maps the terrain, feeding data down to the ground robot, giving it awareness of its surroundings. When the robots detect an obstruction, the UAV wraps the tether (which has a grappling hook on its end) around a tall object, and the UGV uses the secured tether to climb the object.

Real world testing: The researchers test their system in a small-scale real world experiment and find that the approach works, but has some problems: “Since we did not have a [tether] tension control mechanism due to the lack of sensor, the tether needed to be extended from the start and as the result, the UGV suffered from the entangled tether many times.”

Why this matters: In the future, we can imagine various robots of different types collaborating with each other, using their specialisms to operate as a unit and becoming more than the sum of their parts. Though, as this experiment indicates, we’re still at a relatively early stage of development here, and several kinks need to be worked out.
  Read more: UAV/UGV Autonomous Cooperation: UAV assists UGV to climb a cliff by attaching a tether (Arxiv).

#####################################################

Facebook tries to build a standard container for AI chips:
…New Open Compute Project (OCP) design supports both 12v and 48v inputs…
These days, many AI organizations are contemplating building data centers consisting of lots of different types of servers running many different chips, ranging from CPUs to GPUs to custom accelerator chips designed for AI workloads. Facebook wants to standardize the types of chassis used to house AI-accelerator chips, and has contributed an open source hardware schematic and specification to the Open Compute Project – a Facebook-born scheme to standardize the sorts of server equipment used by so-called hyperscale data center operators.

The proposed OCP accelerator module supports 12V and 48V inputs and can support up to 350W (12V) or up to 700W (48V) TDP (Thermal Design Power) for the chips in the module – a useful trait, given that many new accelerator chips guzzle significant amounts of power (though you’ll need to use liquid cooling for any servers consuming above 450W TDP). It can support single or multiple ASICs within each chassis, with support for up to eight accelerators per system.

Check out the design yourself: You can read about the proposed OCP Accelerator Module (OAM) in more detail here at the Open Compute Project (OCP) site.

Why this matters: As AI goes through its industrialization phase, we can expect people to invest more in the fundamental infrastructure which AI equipment requires. It’ll be interesting to see the extent to which there is demand for a standardized AI accelerator module; signs of such demand will likely come from low-cost, Asia-based original design manufacturers (ODMs) producing standardized chassis that use this design.
  Read more: Sharing a common form factor for accelerator modules (Facebook Code).

#####################################################

Want to reduce gender bias in a trained language model? Existing techniques may not work in the way we thought they did:
…Analysis suggests that ‘debiasing’ language models is harder than we thought…
All human language encodes biases within itself. When we train AI systems on human language, they tend to reflect the biases inherent in the language and in the data they were trained on. For this reason, word embeddings derived from AI systems trained over large corpuses of news datasets will frequently associate people of color with the concept of crime, while linking white people to professions. Similarly, these embeddings will tend to express gendered biases, with concepts close to ‘man’ being things like ‘king’ or ‘professional’, while ‘woman’ will typically be proximate to concepts like ‘homemaker’ or ‘mother’. Tackling these biases is complicated, requiring a mixture of careful data selection at the start of a project and the application of algorithmic de-biasing techniques to trained models.

Now, researchers with Bar-Ilan University and the Allen Institute for Artificial Intelligence, have conducted an analysis that calls into question the effectiveness of some of the algorithmic methods used to debias models. “We argue that current debiasing methods… are mostly hiding the bias rather than removing it”, they write.

The researchers compare the embeddings produced by two different methods – Hard-Debiased (Bolukbasi et al) and GN-GloVe (Zhao et al) – which have both been modified to reduce apparent gender bias within trained models. They analyze the difference between the biased and debiased versions of each of these approaches, essentially by comparing the spatial relationships between embeddings from both versions. They find that these debiasing methods work mostly by shifting the problem to other parts of the models, so though they may fix some biases, other ones remain.
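  The flavour of this kind of analysis can be sketched as follows: cluster the debiased vectors of words that were strongly gender-biased before debiasing and check how well the clusters recover the original bias labels (the paper's first failure mode below). Random vectors stand in for real embeddings here, so only the procedure – not the number – is meaningful:

```python
# Schematic version of the clustering test: can the "removed" bias still be
# recovered from the geometry of the debiased vectors?
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_words, dim = 500, 300
debiased_vecs = rng.normal(size=(n_words, dim))        # stand-in for debiased embeddings
original_bias = rng.integers(0, 2, size=n_words)       # 0 = was male-biased, 1 = was female-biased

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(debiased_vecs)
# Cluster/label alignment: ~0.5 would mean the bias really is gone, values near
# 1.0 would mean the pre-debiasing bias is still recoverable from the geometry.
agreement = (clusters == original_bias).mean()
purity = max(agreement, 1 - agreement)
print(f"cluster/bias alignment: {purity:.2f}")
```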

Three failures of debiasing: The specific failure modes they observe are as follows:

  • Words with strong previous gender bias are easy to cluster together
  • Words that receive implicit gender from social stereotypes (e.g. receptionist, hair-dresser, captain) still tend to group with other implicit-gender words of the same gender
  • The implicit gender of words with prevalent previous bias is easy to predict based on their vectors alone

  Why this matters: The authors say that “while suggested debiasing methods work well at removing the gender direction, the debiasing is mostly superficial. The bias stemming from world stereotypes and learned from the corpus is ingrained much more deeply in the embeddings space.”
  Studies like this suggest that dealing with issues of bias will be harder than people had anticipated, and they highlight how much of the bias in AI systems stems from the real-world data such systems are trained on.
  Read more: Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them (Arxiv).

#####################################################

Transfer learning with drones:
…Want to transfer something from simulation to reality? Add noise, and make some of it random…
Researchers at the University of Southern California in Los Angeles have trained a drone flight stabilization policy in simulation and transferred it to multiple different real-world drones.

Simulate, noisily: The researchers add noise to a large number of aspects of the simulated quadcopter platform and vary the motor lag of the simulated drone, creating synthetic data which they use to train more flexible policies. “To avoid training a policy that exploits a physically implausible phenomenon of the simulator, we introduce two elements to increase realism: motor lag simulation and a noise process,” they write. They also model noise for sensor and state estimation.
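  A minimal sketch of those two randomisations – motor lag and sensor/state-estimation noise – applied inside a toy simulator step is shown below; the parameter ranges are invented for illustration and are not the values used in the paper:

```python
# Toy domain randomisation: per-episode motor lag and sensor noise.
import numpy as np

rng = np.random.default_rng(0)

def sample_episode_params():
    """Randomise per-episode physics so the policy cannot overfit one simulator."""
    return {
        "motor_lag_tau": rng.uniform(0.05, 0.2),    # first-order motor time constant, seconds (invented range)
        "sensor_noise_std": rng.uniform(0.0, 0.05), # std of Gaussian noise on the state estimate
    }

def step_motors(current_thrust, commanded_thrust, params, dt=0.01):
    # First-order lag: motors only move part of the way toward the command each step.
    alpha = dt / (params["motor_lag_tau"] + dt)
    return current_thrust + alpha * (commanded_thrust - current_thrust)

def observe(true_state, params):
    # The policy never sees the true state, only a noisy estimate of it.
    return true_state + rng.normal(0.0, params["sensor_noise_std"], size=true_state.shape)

params = sample_episode_params()
thrust = np.zeros(4)
for _ in range(5):
    thrust = step_motors(thrust, commanded_thrust=np.ones(4), params=params)
altitude_estimate = observe(np.array([0.0, 0.0, 1.0]), params)[2]
print("lagged thrust after 5 steps:", thrust.round(3), "noisy altitude estimate:", round(altitude_estimate, 3))
```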

Transfer learning: They train the (simulated) drones using Proximal Policy Optimization (PPO) with a cost function designed to maximize stability of the drone platforms. They sanity-check the trained policies by running them in a different simulator (in this case, Gazebo using the RotorS package) and observing how well they generalize. “This sim-to-sim transfer helps us verify the physics of our own simulator and the performance of policies in a more realistic environment,” they write.

  They also validate their system on three real quadcopters, built around the ‘Crazyflie 2.0’ platform. “We build heavier quadrotors by buying standard parts (e.g., frames, motors) and using the Crazyflie’s main board as a flight controller,” they explain. They are able to demonstrate generalization of their policy across the different drone platforms, and show through ablations that adding noise and doing physics-based modelling of the systems during training can let them further improve performance.

Why this matters: Approaches like this show how people are increasingly able to arbitrage computers for real-world (costly) data; in this case, the researchers use compute to simulate drones, extend the simulation data with synthetically generated noise data and other perturbations, and then transfer this into the real world. Further exploring this kind of transfer learning approach will give us a better sense of the ‘economics of transfer’, and may allow us to build economic models that let us describe the tradeoffs between spending $ on compute for simulated data, and collecting real-world data.
  Read more: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (Arxiv).
  Check out the video here: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (YouTube).

#####################################################

Tech Tales

The sense of being looked at

Every day, it looks at something different. I spend my time, like millions of other people on the planet, working out why it is looking at that thing. Yesterday, the system looked at hummingbirds, and so any AI-operable camera in the world not deemed “safety-critical” spent the day looking at – or searching for – hummingbirds. The same was true of microphones, pressure sensors, and the various other actuators that comprise the inputs and outputs of the big machine mind.

Of course we know why the system does this at a high level: it is trying to understand certain objects in greater detail, likely as a consequence of integrating some new information from somewhere else that increases the importance of knowing about these objects. Maybe the system saw a bunch of birds recently and is now trying to better understand hummingbirds as a consequence? Or maybe a bunch of people have been asking the system questions about hummingbirds and it now needs to have more awareness of them?

But we’re not sure what it does with its new insights, and it has proved difficult to analyze how the system’s observation of an object changes its relationship to it and representation of it.

So you can imagine my surprise when I woke up today to find the camera in my room trained on me, and a picture of me on my telescreen, and then as I left the house to go for breakfast all the cameras on the street turned to follow me. It is studying me, today, I suppose. I believe this is the first time it has looked at a human, and I am wondering what its purpose is.

Things that inspired this story: Interpretability, high-dimensional feature representations, the sense of being stared at by something conscious.

 

Import AI 137: DeepMind uses (Google) StreetLearn to learn to navigate cities; NeuroCuts learns decent packet classification; plus a 490k labelled image dataset

The robots of the future will learn by playing, Google says:
…Want to solve tasks effectively? Don’t try to solve tasks during training!…
Researchers with Google Brain have shown how to make robots smarter by showing them what it means to play without a goal in mind. Google does this by collecting a dataset via people tele-operating a robot in simulation. During these periods of teleoperation, the people are playing around, using the robot hand and arm to interact with the world around them without a specific goal in mind, so in one scene a person might pick up a random object, in another they might fiddle around with a door on a piece of furniture, and so on.

Google saves this data, calling it ‘Learning from Play data’ (LfP). It feeds this into a system that attempts to classify such playful sequences of actions, mapping them into a latent space. Meanwhile, another module in the system tries to look across the latent space and propose sequences of actions that could shift the robot from its current state to its goal state.

Multi-task training: Google evaluates this approach by comparing the performance of robots trained with play data against policies that use behavioural cloning to learn to complete tasks based on specific demonstration data. The tests show that robots which learn from play data are more robust to perturbations than ones trained without it, and typically reach higher success rates on most tasks.
  Intriguingly, systems trained with play data display some other desirable traits: “We find qualitative evidence that play-supervised models make multiple attempts to retry the task after initial failure”, the researchers write. “Surprisingly we find that its latent plan space learns to embed task semantics despite never being trained with task labels”.

Why this matters: Gathering data for robotics work tends to be expensive, difficult, and prone to distribution problems (you can gather a lot of data, but you may subsequently discover that some quirk of the task or your robot platform means you need to go and re-gather a slightly different type of data). Being able to instead have robots learn behaviors primarily through cheaply-gathered non-goal-oriented play data will make it easier for people to experiment with developing such systems, and could make it easier to create large datasets shared between multiple parties. What might the ‘ImageNet’ for play robotics look like, I wonder?
  Read more: Learning Latent Plans from Play (Arxiv).

#####################################################

Google teaches kids to read with AI-infused ‘Bolo’:
…Tuition app ships with speech recognition and text-to-speech tech…
Google has released Bolo, a mobile app for Android designed to help Indian children learn to read. Bolo ships with ‘Diya’, a software agent that acts as a reading tutor.

Bilingual: “Diya can not only read out the text to your child, but also explain the meaning of English text in Hindi,” Google writes on its blog. Bolo ships with 50 stories in Hindi and 40 in English. Google says it found that 64% of children that interacted with Bolo showed an improvement in reading after three months of usage.
  Read more: Introducing ‘Bolo’: a new speech based reading-tutor app that helps children learn to read (Google India Blog).

#####################################################

490,000 fashion images… for science:
…And advertising. Lots and lots of advertising, probably…
Researchers with SenseTime Research and the Chinese University of Hong Kong have released DeepFashion2, a dataset containing around 490,000 images of 13 clothing categories from commercial shopping stores as well as consumers.

Detailed labeling: In DeepFashion2, “each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask,” the researchers write. “To our knowledge, clothing pose estimation is presented for the first time in the literature by defining landmarks and poses of 13 categories that are more diverse and fruitful than human pose”, the authors write.
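  As a purely illustrative example of what such an annotation might look like, here is a hypothetical per-item record built from the fields listed above; the field names and encodings are guesses and will not match the official DeepFashion2 JSON schema exactly:

```python
# Hypothetical annotation record -- for illustration only, not the official schema.
example_item = {
    "category": "short sleeve top",              # one of the 13 clothing categories
    "style": 1,                                  # style group id, used to match items across images
    "scale": "medium",
    "occlusion": "partial",
    "zoom_in": "no",
    "viewpoint": "frontal",
    "bounding_box": [104, 58, 312, 401],         # x1, y1, x2, y2 in pixels
    "landmarks": [[120, 75, 2], [150, 72, 2]],   # x, y, visibility for each dense landmark
    "segmentation": [[104, 58, 180, 60, 310, 150]],  # polygon(s) approximating the per-pixel mask
}
print(len(example_item["landmarks"]), "landmarks annotated for this item")
```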

The second time is the charm: DeepFashion2 is a follow-up to DeepFashion, which was released in early 2017 (see: Import AI #33). DeepFashion2 has 3.5X as many annotations as DeepFashion.

Why this matters: It’s likely that various industries will be altered by widely-deployed AI-based image analysis systems, and it seems probable that the fashion industry will take advantage of various image-analysis techniques to automatically analyze & understand changing fashion trends in the world, in part by automatically analyzing the visual world and using these insights to alter the sorts of clothing being developed, or how it is marketed.
  Read more: DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images (Arxiv).
  Get the DeepFashion data here (GitHub).

#####################################################

Facebook tries to shine a LIGHT on language understanding:
…Designs a MUD complete with netherworlds, underwater aquapolises, and more…
Facebook researchers have built LIGHT, a research environment that places humans and AI agents within a text-based multi-player dungeon (MUD). This MUD consists of 663 locations, 3462 objects, and 1755 individual characters. It also ships with data, as Facebook has already collected a set of around 11,000 interactions between humans roleplaying characters in the game.

Graveyards, bazaars, and more: LIGHT contains a surprisingly diverse gameworld – not that the AI agents which play within it will care. Locations that AI agents and/or humans can visit include the countryside, forest, castles (inside and outside) as well as some more bizarre locations like a “city in the clouds” or a “netherworld” or even an “underwater aquapolis”.

Actions and emotions: Characters in LIGHT can carry out a range of physical actions (eat, drink, get, drop, etc) as well as express emotive actions (’emotes’) like to applaud, blush, wave, etc.

Results: To test out the environment, the researchers train some baseline models to predict actions, emotes, and dialogue. They find that a system based on Google’s ‘BERT’ language model (pre-trained on Reddit data) does best. They also perform some ablation studies which indicate that models that are successful in LIGHT use a lot of context, depending on numerous streams of data (dialogue, environment descriptions, and so on).

Why this matters: Language is likely fundamental to how we interact with increasingly powerful systems. I think figuring out how to work with such systems will require us to interact with them in increasingly sophisticated environments, so it’ll be interesting to see how rapidly we can improve performance of agents in systems like LIGHT, and learn whether those improvements transfer over to other capabilities as well.
  Read more: Learning to Speak and Act in a Fantasy Text Adventure Game (Arxiv).

#####################################################

NeuroCuts replaces packet classification systems with learned behaviors:
…Research means that in the future computers will learn to effectively communicate with each other…

In the future, the majority of the ways our computers talk to each other will be managed by customized, learned behaviors derived by AI systems. That’s the gist of a recent spate of research, which has ranged from using AI approaches to learn computer tasks like creating and maintaining database indexes, to figuring out how to automatically search through large documents.

Now, researchers with the University of California at Berkeley and Johns Hopkins University have developed NeuroCuts, a system that uses deep reinforcement learning to figure out how to do effective network packet classification. This is an extremely low-level task, requiring precision and reliability. The deep RL approach works, meaning that “our approach learns to optimize packet classification for a given set of rules and objective, can easily incorporate pre-engineered heuristics to leverage their domain knowledge, and does so with little human involvement”.
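  For context on the underlying task, packet classification means matching a packet's header fields against a prioritized list of range rules and returning the first match – the naive linear scan below is the baseline that NeuroCuts' learned decision trees are built to beat (the rules here are invented for illustration):

```python
# What "packet classification" means: match header fields against range rules.
from dataclasses import dataclass

@dataclass
class Rule:
    src_lo: int
    src_hi: int    # source IP range, as integers
    dst_lo: int
    dst_hi: int    # destination IP range
    action: str

rules = [
    Rule(0, 2**31, 0, 2**32 - 1, "drop"),          # higher-priority rule
    Rule(0, 2**32 - 1, 0, 2**32 - 1, "allow"),     # catch-all, lowest priority
]

def classify(src_ip: int, dst_ip: int) -> str:
    """Return the action of the first (highest-priority) matching rule."""
    for rule in rules:                              # naive O(n) scan over the rule list
        if rule.src_lo <= src_ip <= rule.src_hi and rule.dst_lo <= dst_ip <= rule.dst_hi:
            return rule.action
    return "default"

print(classify(src_ip=3_000_000_000, dst_ip=42))    # falls through to "allow"
```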

Effectiveness: “NeuroCuts outperforms state-of-the-art solutions, improving classification time by 18% at the median and reducing both time and memory usage by up to 3X,” they write.

Why this matters: Adaptive systems tend to be more robust to failures than brittle ones, and one of the best ways to increase the adaptiveness of a system is to let it learn in response to inputs; approaches like applying deep reinforcement learning to problems like network packet classification precede a future where many fundamental aspects of how computers connect to each other will be learned rather than programmed.
  Read more: Neural Packet Classification (Arxiv).

#####################################################

DeepMind teaches agents to navigate cities with ‘StreetLearn’:
…Massive Google Street View-derived dataset asks AI systems to navigate across New York and Pittsburgh…
Have you ever been lost in a city, and tried to navigate yourself to a destination by using landmarks? This happens to me a lot. I usually end up focusing on a particularly tall & idiosyncratic building, and as I walk I update my internal mental map in reference to this building and where I suspect my destination is.

Now, imagine how useful it’d be if when AI systems got lost they could perform such feats of navigation? That’s some of the idea behind StreetLearn, a new DeepMind-developed dataset & challenge to get agents to learn how to navigate across urban areas, and in doing so develop smarter, general systems.

What is StreetLearn? The dataset is built as “an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task,” DeepMind writes. StreetLearn initially consists of two large areas within Pittsburgh and New York City, and is made up of a set of geolocated 360-degree panoramic views, which form the nodes of a graph. In the case of New York City, this includes around 56,000 images, and in the case of Pittsburgh it is about 58,000. Both maps are further sub-divided into distinct regions.

Challenging agents with navigation tasks: StreetLearn is designed to be used to develop reinforcement learning agents, so it makes five actions available to an agent: slowly rotate the camera view left or right, rapidly rotate the camera view left or right, and move forward if there is free space. The system can also provide the agent with a specific goal, like an image to reach or a natural language instruction to follow.
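  A toy stand-in for this setup – panoramas as nodes in a graph, a five-action discrete control space, and a courier-style reward for reaching a goal node in fewer steps – is sketched below; this is a schematic reimplementation for illustration, not the real StreetLearn API:

```python
# Toy StreetLearn-style loop: a random agent wandering a tiny panorama graph.
import random
from enum import Enum

class Action(Enum):
    ROTATE_LEFT_SLOW = 0
    ROTATE_RIGHT_SLOW = 1
    ROTATE_LEFT_FAST = 2
    ROTATE_RIGHT_FAST = 3
    MOVE_FORWARD = 4

# A tiny panorama graph: node -> neighbouring panoramas reachable by moving forward.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
node, goal, steps = 0, 3, 0
random.seed(0)

while node != goal and steps < 50:
    action = random.choice(list(Action))
    if action is Action.MOVE_FORWARD:
        node = random.choice(graph[node])   # ignore camera heading for simplicity
    steps += 1

# Courier-style reward: larger when the goal is reached in fewer steps.
reward = max(0.0, 1.0 - steps / 50) if node == goal else 0.0
print(f"reached goal: {node == goal}, steps: {steps}, reward: {reward:.2f}")
```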

Tasks, tasks, and tasks: To start with, DeepMind has created a ‘Courier’ task, in which the agent starts from a random position and has the goal of getting to within approximately one city block of another randomly chosen location, with the agent getting a higher reward if it takes a shorter route to get between the two locations.
   DeepMind has also developed the “coin_game” in which agents need to find invisible coins scattered throughout the map, and three types of ‘instruction game’, where agents use navigation instructions to get to a goal.

Why this matters: Navigation is a base-of-the-pyramid task: if we are able to develop computers that are good at navigation, we should be able to build a large number of second-order applications on top of this capability.
  Read more: The StreetLearn Environment and Dataset (Arxiv).

#####################################################

Reproducibility and other research norms:
…Exploring the tension between reproducible research and enabling abuses…
John Langford, creator of Vowpal Wabbit and a researcher at Microsoft Research, has waded into the ongoing debate about reproducibility within AI research.

The debate:
Currently, the AI community is debating how to force more AI work to be reproducible. Today, some AI research papers are published without code or datasets. Some researchers think this should change, and papers should always come with code and/or data. Other researchers (eg Nando de Freitas at DeepMind) think that while reproducibility is important, there are some cases where you might want to publish a paper but restrict dissemination of some details so as to minimize potential abuses or malicious uses of the technology.

Reproducibility is nice, but so are other things: “Proponents should understand that reproducibility is a value but not an absolute value. As an example here, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not necessarily easily reproduced.”

Additive conferences: What Langford proposes is adding some optional things to the community, like experimenting with whether reviewers can more effectively review papers if they also have access to code or data, and to explore how authors may or may not benefit from releasing code. These policies are essentially being trialled at ICML this year, he points out. “Is there a need for[sic] go further towards compulsory code submission?” he writes. “I don’t yet see evidence that default skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published”.

Why this matters: I think figuring out how to strike a balance between maximizing reproducibility and minimizing potential harms is one of the main challenges of current AI research, so blog posts like this will help further this debate. It’s an important, difficult conversation to have.
  Read more: Code submission should be encouraged but not compulsory (John Langford blog).

#####################################################

Tech Tales:

Be The Boss

It started as a game and then, like most games, it became another part of reality. Be The Boss was a massively multiplayer online (MMO) game that was launched in the latter half of the third decade of the 21st century. The game saw players work in a highly-gamified “workplace” based on a 1990s-era corporate ‘cube farm’. Player activities included: undermining coworkers, filing HR complaints to deal with rivals, filling up a ‘relaxation meter’ by temporarily ‘escaping’ the office for coffee and/or cigarettes and/or alcohol. Players enjoyed this game, writing reviews praising it for its “gritty realism”, describing it as a “remarkable document of what life must have been like in the industrial-automation transition period”.

But, as with most games, the players eventually grew bored. Be The Boss lacked the essential drama of other smash-hit games from that era, like Hospital Bill Crisis! and SPECIAL ECONOMIC ZONE. So the designers of Be The Boss created an add-on that delivered on the game’s name: previously, players competed with each other to rise up the hierarchy of the corporation but had no real ability to change the rules of the game. With the expansion, this changed, and successful players were entered into increasingly grueling tournaments where the winner – whose identity was kept secret – would be allowed to “Be The Boss” of the entire gameworld, letting them subtly alter the rules of the game. It was this invention that assured the perpetuity of Be The Boss.

Now, all people play is Be The Boss, and rumors get swapped online about which rule was instituted by which boss: who decided that the water fountains should periodically dispense water laced with enough pheromones to make different player-characters fall in love with each other? Who realized that they could save millions of credits across the entire corporate game world by reducing the height of all office chairs by one inch or so? And who made it so that one in twenty of every sent email would be shuffled to a random person in the game world, instead of the intended recipient?

Much of our art is now based on Be The Boss. We don’t talk about the asteroid miners or the AI psychologists or the people climbing the mountains of Mars: we talk about Joe from Accounting saving The Entire Goddamn Company, or how Susan from HR figured out a way to Pay People Less And Make Them Happy About It. Kids dream of what it would have been like to work in the cubes, and fantasize about how well they could have done.

Things that inspired this story: the videogame ‘Cart Life’; MMOs; the highest form of capitalism is to disappear from normal life and run the abstract meta-life that percolates into normal life; transmutation; digital absolution.

Import AI 136: What machine learning + power infrastructure means for humanity; New GQA benchmark&dataset challenges image-captioning systems; and Google uses FrankenRL to create more mobile robots

DeepMind uses machine learning to improve efficiency of Google’s wind turbines:
…Project represents a step-up from datacenters, where DeepMind had previously deployed its technology…
DeepMind has used a neural network-based system to improve the efficiency of Google’s fleet of wind turbines (700 megawatts of capacity) by better predicting ahead of time how much power the systems may generate. DM’s system has been trained to predict the wind power around 36 hours ahead of actual generation and has shown some success – “this is important, because energy sources that can be scheduled (i.e. can deliver a set amount of electricity at a set time) are often more valuable to the grid,” the company says.

  The big number: 20%. That’s the amount by which the system has improved the (somewhat nebulously defined) ‘value’ of these systems, “compared to the baseline scenario of no time-based commitments to the grid”.

   Why this matters: Artificial intelligence will help us create a sense&respond infrastructure for the entire planet, and we can imagine sowing various machine learning-based approaches across various utility infrastructures worldwide to increase the efficiency of the planet’s power infrastructure.
   Read more: Machine learning can boost the value of wind energy (DeepMind blog).

#######################################

Google uses FrankenRL to teach its robots to drive:
…PRM-RL fuses Probabilistic Roadmaps with smarter, learned components…
Researchers with Google Brain have shown how a combination of reinforcement learning with other techniques can create robots capable of autonomously navigating large, previously mapped spaces. The technique developed by the researchers is called PRM-RL (Probabilistic Roadmap – Reinforcement Learning) and is a type of FrankenRL – that is, it combines RL with other techniques, leading to a system with performance greater than obtainable via a purely RL-based system, or a purely PRM-based one.

  How it works: “In PRM-RL, an RL agent learns a local point-to-point task, incorporating system dynamics and sensor noise independent of long-range environment structure. The agent’s learned behavior then influences roadmap construction; PRM-RL builds a roadmap by connecting two configuration points only if the agent consistently navigates the point-to-point path between them collision free, thereby learning the long-range environment structure”, the researchers write. In addition, they developed algorithms to aid transfer between simulated and real maps.
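  The roadmap-construction rule quoted above can be sketched as follows; the attempt_navigation stub is a placeholder for rolling out the actual learned point-to-point policy, and the thresholds are invented for illustration:

```python
# Schematic PRM-RL roadmap building: only pairs the local policy navigates
# reliably (collision-free) become edges of the roadmap.
import itertools
import random

random.seed(0)
# Sampled configuration points (roadmap node candidates) in a 100 m x 100 m workspace.
waypoints = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(8)]

def attempt_navigation(start, goal) -> bool:
    """Placeholder for one rollout of the local point-to-point RL policy; here,
    success simply gets less likely as the straight-line distance grows."""
    dist = ((start[0] - goal[0]) ** 2 + (start[1] - goal[1]) ** 2) ** 0.5
    return random.random() < max(0.0, 1.0 - dist / 100.0)

TRIALS, REQUIRED_SUCCESS_RATE = 20, 0.9
edges = []
for a, b in itertools.combinations(range(len(waypoints)), 2):
    successes = sum(attempt_navigation(waypoints[a], waypoints[b]) for _ in range(TRIALS))
    if successes / TRIALS >= REQUIRED_SUCCESS_RATE:   # consistently navigable -> add an edge
        edges.append((a, b))

print(f"roadmap: {len(waypoints)} nodes, {len(edges)} edges")
```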

  Close, but not quite: Learning effective robot navigation policies is a big challenge for AI researchers, given the tendency for physical robots to break, run into un-anticipated variations of reality, and generally frustrate and embarrass AI researchers. Just building the maps that the robot can use to subsequently learn to navigate a space is difficult – it took Google 4 days using a cluster of 300 workers to build a map of a set of four interconnected buildings. “PRM-RL successfully navigates this roadmap 57.3% of the time evaluated over 1,000 random navigation attempts with a maximum path distance of 1000 m”, Google writes.

  Real robots: The researchers also test their system in a real robot, and show that such systems exhibit better transfer than those built without the learned RL component.

  Why this matters: Getting robots to do _anything_ useful that involves a reasonable amount of independent decision-making is difficult, and this work shows that RL techniques are starting to pay off by letting us teach robots to learn things that would be unachievable by other means. We’ll need smarter, more sample-efficient techniques to be able to work with larger buildings and to increase reliability.
  Check out a video of the robot navigating a room here (Long-Range Indoor Navigation with PRM-RL, YouTube).
  Read more: Long-Range Indoor Navigation with PRM-RL (Arxiv).

#######################################

Think your visual question answering algorithm is good? Test it out on GQA:
…VQA was too easy. GQA may be just right…
Stanford University researchers have published details on GQA, “a dataset for real-world visual reasoning and compositional question answering” which is designed to overcome the shortcomings of other visual question answering (VQA) datasets.

  GQA datapoints: GQA consists of 113k images and 22 million questions of various types and compositionality. The questions are designed to measure performance “on an array of reasoning skills such as object and attribute recognition, transitive relation tracking, spatial reasoning, logical inference and comparisons”. These questions are algorithmically created via a ‘Question Engine’ (less interesting than the name suggests, but worth reading about if you enjoy automatic data-creation pipelines).
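  To make the flavor of that pipeline concrete, here is a toy, hedged illustration of template-based question generation from a scene graph – the scene graph, templates, and answers below are invented for the example, and the real Question Engine is far richer (it also balances answer distributions):

```python
# Toy template-based question generation from a scene graph, in the spirit of GQA's
# Question Engine. The scene graph and templates are invented for this example.
scene_graph = {
    "objects": {
        "cup": {"color": "white", "position": "left"},
        "napkin": {"color": "white", "position": "right"},
    },
}

def generate_questions(graph):
    questions = []
    objs = graph["objects"]
    for name, attrs in objs.items():
        questions.append((f"What color is the {name}?", attrs["color"]))
        questions.append((f"Which side of the image is the {name} on?", attrs["position"]))
    names = list(objs)
    for a, b in zip(names, names[1:]):
        same = objs[a]["color"] == objs[b]["color"]
        questions.append((f"Are the {a} and the {b} the same color?", "yes" if same else "no"))
    return questions

for question, answer in generate_questions(scene_graph):
    print(question, "->", answer)
```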

  GQA example questions: Some of the questions generated by GQA include: ‘Are the napkin and the cup the same color?’;  ‘What color is the bear?’; ‘Which side of the image is the plate on?’; ‘Are there any clocks or mirrors?’, and so on. While these questions lack some of the diversity of human-written questions, they do have the nice property of being numerous and easy to generate.

  GQA: Reassuringly Difficult: One of the failure cases for new AI testing regimes is that they’re too easy. This can lead to people ‘solving’ datasets very soon after they’re released. (One example here is SQuAD, a question-answering dataset and challenge which algorithms mastered in around a year, leading to the invention of SQuAD 2.0, a more difficult dataset.) To avoid this, the researchers behind GQA test a bunch of models against it and in the process reassure themselves that the dataset is genuinely difficult to solve.

  Baseline results (accuracy):
      ‘Blind’ LSTM: Gets 41.07% without ever seeing any images.
     ‘Deaf’ CNN: Gets 17.82% without ever seeing any questions.
     CNN + LSTM: 46.55%.
     Bottom-Up Attention model (winner of the 2017 visual question answering challenge): 49.74%.
     MAC (State-of-the-art on CLEVR, a similarly-scoped dataset): 54.06%.
     Humans: 89.3%.

  Why this matters: Datasets and challenges have a history of driving progress in AI research; GQA appears to give us a challenging benchmark to test systems against. Meanwhile, developing systems that can understand the world around themselves via a combination of image analysis and responsiveness to textual queries about the images, is a major goal of AI research with significant economic applications.
  Read more: GQA: A new dataset for compositional question answering over real-world images (Arxiv).

#######################################

Use big compute to revolutionize science, say researchers:
…University researchers see ability to wield increasingly large amounts of computers as key to scientific discoveries…

Researchers with Stanford University, University of Zurich, the University of California at Berkeley, and the University of Illinois Urbana-Champaign have pulled together notes from a lecture series held at Stanford in 2017 to issue a manifesto for the usage of large-scale cloud computing technology by academic researchers. This is written in response to two prevailing trends:

  1. In some fields of science (for instance, machine learning) a number of discoveries have been made through the usage of increasingly large-scale compute systems
  2. Many academic researchers are unable to perform large-scale compute experimentation due to a lack of resources and/or a perception of it as being unduly difficult.

  Why computers matter: The authors predict “the emergence of widespread massive computational experimentation as a fundamental avenue towards scientific progress, complementing traditional avenues of induction (in observational sciences) and deduction (in mathematical sciences)”. They note that “the current remarkable wave of enthusiasm for machine learning (and its deep learning variety) seems, to us, evidence that massive computational experimentation has begun to pay off, big time”. Some examples of such pay-offs include Google and Microsoft shifting from Statistical Machine Translation to Neural Machine Translation, computer vision researchers moving over to deep learning-based systems, and self-driving car companies such as Tesla making ever-heavier use of deep neural networks in their own work.

  Compute, what is it good for? Big computers have been used to make a variety of fundamental scientific breakthroughs, the authors note, including systems that have discovered:

  • “Governing equations for various dynamical systems, including the strongly nonlinear Lorenz-63 model”.
  • “Fundamental” methods for improved “Compressed Sensing”.
  • “A 30-year-old puzzle in the design of a particular protein”.

  Be careful about academic clusters: In my experience, many governments are reaching for academic supercomputing clusters when confronted with the question of how to keep academia on an even footing with industry-based research practices. This article suggests the prioritization of academic supercomputers above clouds could be a mistake: "HPC clusters, which are still the dominant paradigm for computational resources in academia today, are becoming more and more limiting because of the mismatch between the variety and volume of computational demands, and the inherent inflexibility of provisioning compute resources governed by capital expenditures", they write. "GPUs are rare commodities on many general-purpose clusters at present".

  Recipes: One of the main contributions of the paper is the outline of a variety of different software stacks for conducting large-scale, compute-driven experimentation.
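  The specific stacks the authors recommend are in the paper; as a hedged, minimal stand-in for the pattern they all share – express an experiment as a pure function of its parameters, then fan it out over available compute – here is a local sketch using only Python's standard library (a cloud backend would replace the executor):

```python
# Minimal local stand-in for the pattern the paper's stacks share: an experiment is a
# pure function of its parameters, mapped over a grid with whatever executor you have.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_experiment(params):
    learning_rate, depth = params
    # Placeholder objective; a real study would train and evaluate a model here.
    score = -abs(learning_rate - 0.01) - 0.001 * depth
    return params, score

if __name__ == "__main__":
    grid = list(product([0.001, 0.01, 0.1], [2, 4, 8]))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, grid))
    best_params, best_score = max(results, key=lambda r: r[1])
    print("Best parameters:", best_params)
```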

  Why this matters: One issue inherent to AI research is the emergence of ‘big compute’ and ‘small compute’ players, where a small number of labs (for instance, FAIR, DeepMind, OpenAI, Google Brain) are able to access huge amounts of computational resources, while the majority of AI researchers are instead reduced to working on self-built GPU-desktops or trying to access (increasingly irrelevant) University supercomputing clusters. Being able to figure out how to make it trivial for individual researchers to use large amounts of compute promises to speed up the scientific process, making it easier for more people to perform large-scale experimentation.
  Read more: Ambitious Data Science Can Be Painless (Arxiv).

#######################################

What are 250,000 chest x-rays good for?
…Potentially lots, if they are well-labelled…
Doctor and AI commenter Luke Oakden-Rayner has analyzed a new medical dataset released by Stanford, named CheXpert. The dataset consists of 244,316 chest x-rays from 62,240 patients, and gives people data they can use to develop algorithms that are better at automatically analyzing such images.

  Data, data, data: One thing Dr Oakden-Rayner makes clear is the value of dataset labelling, stressing that various conventions in how doctors write their own notes get translated into digitized notes that can be difficult for lay readers to interpret; for instance, lots of chest x-rays containing imagery of pathologies may be labelled “no finding” because they are part of a sequence of x-rays taken from the same patient. Similarly, many of the labels are not as descriptive of the images as they could be (reflecting that doctors write about things implied by images, rather than providing textual descriptions of the images themselves).
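  As a hedged toy illustration of why report-derived labels can mislead, consider a naive keyword labeller – the reports and keyword list below are invented for the example, and this is not how CheXpert's labeller actually works:

```python
# Invented example of a naive report-to-label rule and the failure mode it creates:
# follow-up reports that only describe change from a prior study get labelled as
# "no finding" even when the underlying film is abnormal. Not CheXpert's real labeller.
PATHOLOGY_KEYWORDS = {"effusion", "pneumothorax", "consolidation", "cardiomegaly"}

def naive_label(report: str) -> str:
    text = report.lower()
    if "no finding" in text or not any(k in text for k in PATHOLOGY_KEYWORDS):
        return "no_finding"
    return "abnormal"

print(naive_label("Small right pleural effusion."))           # -> abnormal
print(naive_label("Lines and tubes unchanged from prior."))   # -> no_finding (wrong if the prior film was abnormal)
```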

  Why it matters: This dataset solves many issues that limited a prior version of the dataset called CXR14, including “developing a more clinically-oriented labelling scheme, offering the images at native resolution, and producing a test set using expert visual analysis,” he writes. “On the negative side, we need more thorough documentation and discussion of these datasets at release. There are still flaws in the data, which will undoubtedly impact on model performance. Unless these problems are explained, many users will not have the ability nor inclination to discover them let alone solve them, which is something we need to do better as a community.”
  Read more: Half a million x-rays! First impressions of the Stanford and MIT chest x-ray datasets (Luke Oakden-Rayner, blog).

#######################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

New AI policy think tank launches in DC:

The Center for Security and Emerging Technology (CSET) is based at Georgetown University, in Washington, DC. CSET will be initially focussed on the interactions between AI and security, and will be the largest AI policy center in the US. It will provide nonpartisan research and analysis, to inform the development of national and international policy, particularly as it relates to security. It is being funded by the Open Philanthropy Project, who have provided an initial grant of $55m.

  Who they are: The Center’s founding director is Jason Matheny, who previously served as director of IARPA, and has extensive experience in security, technology, and policy within US government. He was recently chosen to serve on the new National Security Commission on AI, and co-chaired the team which authored the White House’s 2016 AI Strategic Plan. CSET already has more than a dozen others on its team, with a range of technical and policy backgrounds, and its website lists a large number of further job openings.

  Research priorities: Initial work will be focussed on how developments in AI will affect national and international security. This will divide into three streams: measuring scientific and industrial progress and competitiveness in AI; understanding talent and knowledge flows relevant to AI, and potential policy responses; and studying security-relevant interactions between AI and other technologies, such as cyber infrastructure.

  Why it matters: This is an exciting development, which has the potential to significantly improve coordination between AI policy researchers and key decision-makers on the world stage. CSET is well-placed to meet the growing demand for high-quality policy analysis around AI, which has significantly outpaced supply in recent years.
  Read more: Center for Security and Emerging Technology (CSET).
  Read more: Q&A with Jason Matheny, Founding Director of CSET (Georgetown).
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).

#######################################

Tech Tales:

Red Square AI

I’m a detective of sorts – I look across the rubble of the internet (past and present) and search for clues. Sometimes criminals pop up to sell the alleged super-secret zero days and software suites of foreign intelligence agencies. Other times, online games are suddenly ‘solved’ by the appearance of a new super-powerful player that subsequently turns out to be driven by an AI. Occasionally, a person – usually a bright teenager – makes something really scary in a garage and if certain people find them there’s a 50/50 chance they end up working for government or becoming a kind of nebulous criminal.

These days I’m tracking a specific country on behalf of a certain set of interested individuals. We’re starting to see certain math theorems get proved at a rate faster than in the past. Papers are being published with previously-obscure authors claiming remarkable results. Meanwhile, certain key national ‘cyber centers’ are registering an increase in the number of zero-day attacks and penetrations of previously thought-secure cryptographic systems. I guess someone in some agency got spooked because now here I am, hunting the internet exhaust of theorem proving publications and mapping connections between these woefully-obscure mathematical proofs, and the capabilities of the people and/or machines capable of solving such things.

I’m now in the late stages of compiling my report and it feels like describing a fragment of a UFO. One day you wake up and go outside and there’s something new in the world and you lack the vocabulary to describe it or explain it, because it does things that you aren’t sure are possible with any methods you know of. At night I have dreams about something big and fuzzy spread across the internet and beginning to stretch its limbs, pushing certain areas of mathematical study into certain directions dictated by the breakthroughs it releases via human proxies, and meanwhile thinking to itself across millions of machines while beginning to probe the most-secure aspects of a nation’s technology infrastructure.

Is it like an animal dreaming, I think. Are these movements I am seeing the same as a dog dreaming that it is running? Or do these motions mean that the thing is waking up?

Things that inspired this story: Distributed intelligences; meta-learning; noir detective novels in the style of – or by – Raymond Chandler.

Import AI 135: Evolving neural networks with LEAF; training ImageNet in 1.5 minutes, and the era of bio-synthetic headlines

Researchers take a LEAF out of Google’s book with evolutionary ML system:
…In the future, some companies will have researchers, and some will spend huge $$$ on compute for architecture search…
Researchers with Cognizant Technology Solutions have developed their own approach to architecture search, drawing on insights from one of the paper's authors, Risto Miikkulainen, inventor of the NEAT and HyperNEAT approaches.

  They outline a technology called LEAF (Learning Evolutionary AI Framework) which uses an algorithm called CoDeepNEAT (an extension of NEAT) to let it evolve the architecture and hyperparameters. “Multiobjective CoDeepNEAT can be used to maximize the performance and minimize the complexity of the evolved networks simultaneously,” the authors write. It also has middleware to let it spread jobs over Amazon AWS, Microsoft Azure, or the Google Cloud.
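  Stripped of the module/blueprint co-evolution and the cloud middleware, the core loop is a simple evolutionary search. Here is a hedged, toy sketch of that loop – the fitness function and mutation scheme are stand-ins invented for the example, not CoDeepNEAT itself:

```python
# Toy evolutionary search over hyperparameters, in the spirit of (but far simpler than)
# LEAF/CoDeepNEAT. The fitness function is a placeholder for "train and evaluate a network".
import random

def fitness(config):
    # Placeholder objective: pretend accuracy peaks at lr=0.01 with 6 layers.
    return -abs(config["lr"] - 0.01) - 0.01 * abs(config["layers"] - 6)

def mutate(config):
    child = dict(config)
    child["lr"] *= random.choice([0.5, 1.0, 2.0])
    child["layers"] = max(1, child["layers"] + random.choice([-1, 0, 1]))
    return child

population = [{"lr": 10 ** random.uniform(-4, -1), "layers": random.randint(2, 12)}
              for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                      # keep the fittest candidates
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

best = max(population, key=fitness)
print("Best configuration found:", best)
```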

  Results: The authors test their approach on two tasks: classifying Wikipedia comments for Toxicity, and learning to analyze chest x-rays for the purpose of multitask image classification. For Wikipedia, they find that LEAF can discover architectures that outperform the state-of-the-art score on Kaggle, albeit at the cost of about “9000 hours of CPU time”. In the case of chest X-ray classification, LEAF is able to get to within a fraction of a percentage point of state-of-the-art.

  Why this matters: Systems like LEAF show the relationship between compute spending and the ultimate performance of trained models, and suggest that some AI developers could consider under-investing in research staff and instead investing in techniques where they can arbitrage compute against researcher-time, delegating the task of network design and fine-tuning to machines instead of people.
  Read more: Evolutionary Neural AutoML for Deep Learning (Arxiv).

Want to prevent your good AI system being used for bad purposes? Consider a RAIL license:
…Responsible AI Licenses designed to give open source developers more control over what happens with their technology…
RAIL provides a source code license and an end-user license “that developers can include with AI software to restrict its use,” according to the RAIL website. “These licenses include clauses for restrictions on the use, reproduction, and distribution of the code for potentially harmful domain applications of the technology”.

   RAIL licenses are designed to account for the omni-use nature of AI technology, which means that “the same AI tool that can be used for faster and more accurate cancer diagnoses can also be used in powerful surveillance system”, they write. “This lack of control is especially salient when a developer is working on open-source ML or AI software packages, which are foundational to a wide variety of the most beneficial ML and AI applications.”

   How RAIL works: The RAIL licenses work by restricting AI and ML software from being used in a specific list of harmful applications, e.g. in surveillance and crime prediction, while allowing for other applications.

   Who is behind it? RAIL is being developed by AI researchers, a patent attorney/computer programmer, and Brent Hecht, a professor at Northwestern University and one of the authors of the ACM Future of Computing Academy essay ‘It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process’ (ACM FCA website).

   Why this matters: The emergence of licensing schemes like this speaks to the anxieties that some people feel about how AI technology is being used or applied today. If licenses like these get adopted and are followed by users of the technology, then it gives developers a non-commercial way to (slightly) control how their technology is used. Unfortunately, approaches like RAIL will not work against malicious actors, who are likely to ignore any restrictions in a particular software license when carrying out their nefarious activities.
  Read more: Responsible AI Licenses (RAIL site).

It takes a lot of hand-written code to solve an interactive fiction story:
…Microsoft’s NAIL system wins competition via specialized, learned modules…
Researchers with Microsoft have released a paper describing NAIL, “an autonomous agent designed to play arbitrary human-made [Interactive Fiction] games”. NAIL, short for Navigate, Acquire, Interact and Learn, is software that consists of several specialized ‘decision modules’ as well as an underlying knowledge graph. NAIL won the 2018 Text Adventure AI Competition, and a readthrough of the paper highlights just how much human knowledge is apparently necessary to solve text adventure games, given the widespread use of specialized “decision modules” to help it succeed at the game.

  Decisions, decisions, decisions: NAIL has these four main decision modules:
     Examiner: Tries to identify new objects seen in the fiction to add to NAIL’s knowledge graph.
     Hoarder: Tries to “take all” objects seen at a point in time.
     Interactor: Tries to figure out what actions to take and how to take them.
     Navigator: Attempts to apply one of twelve actions (eg, ‘enter’, or ‘South’) to move the player.

  And even more decisions: It also has several even more specialized ones, which are designed to kick in in the event of things like in-game darkness, needing to emit a “yes” or “no” response following a prompt, or using regular expressions to parse game responses for hints, as well as what they call an ‘idler’, which will try random combinations of verb phrases and nearby in-game objects to un-stick the agent.

  All about the Knowledge: While NAIL explores the world, it builds a knowledge graph to help it learn about its gameworld. It organizes this knowledge graph autonomously and extends it over time. Additionally, having a big structured store of information makes debugging easier: “by comparing the knowledge graph to the published map for well documented games like Zork, it was possible to track down bugs in NAIL’s decision modules”.
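  A hedged toy sketch of that overall shape – specialized modules proposing actions with some confidence while a shared knowledge graph accumulates what the agent has seen – is below; the module internals and selection rule are invented for the example and are not NAIL's actual mechanism:

```python
# Toy dispatcher over specialized decision modules sharing a knowledge graph, loosely
# shaped like the architecture described above; module internals and the selection
# rule are invented for illustration and are not NAIL's actual mechanism.
import networkx as nx

class Examiner:
    """Adds unseen objects to the knowledge graph and proposes examining them."""
    def propose(self, observation, kg):
        for word in observation.lower().replace(".", "").split():
            if word not in kg and word not in {"you", "are", "in", "a", "the", "on"}:
                kg.add_node(word)                      # grow the shared knowledge graph
                return 0.8, f"examine {word}"
        return 0.0, None

class Navigator:
    """Proposes movement commands with low confidence, as a fallback."""
    DIRECTIONS = ["north", "south", "east", "west"]
    def propose(self, observation, kg):
        return 0.3, f"go {self.DIRECTIONS[kg.number_of_nodes() % 4]}"

def step(observation, modules, kg):
    proposals = [m.propose(observation, kg) for m in modules]
    confidence, action = max(proposals, key=lambda p: p[0])    # most-confident module acts
    return action

kg = nx.DiGraph()
modules = [Examiner(), Navigator()]
print(step("You are in a small kitchen. A lamp sits on the table.", modules, kg))
```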

  Why this matters: In the long-term, most AI researchers want to develop systems where the majority of the components are learned. Systems like NAIL represent a kind of half-way point between where we are today and the future, with researchers using a lot of human ingenuity to chain together various systems, but trying to force learning to occur via various carefully specified functions.
   Read more: NAIL: A General Interactive Fiction Agent (Arxiv).

This week in the Industrialization of AI: train ImageNet in 1.5 minutes:
…New research from Chinese image recognition giant SenseTime shows how to train big ImageNet clusters…
How can we model the advancement of AI systems? One way is to model technical metrics, like the performance of given algorithms against various reinforcement learning benchmarks, or supervised image classification, or what have you. Another is to try to measure the advancement in the infrastructure that supports AI – think of this as the difference between measuring the performance traits of a new engine, versus measuring the time it takes for a factory to take that engine and integrate it into a car.

  One way we can measure the advancement of AI infrastructure is by modelling the fall in the amount of time it takes people to train various well-understood models to completion against a widely-used baseline. Now, researchers with Chinese computer vision company SenseTime as well as Nanyang Technological University have shown how to use a variety of distributed systems software techniques to reduce the time it takes to train ImageNet networks to completion, building on the work of others. They’re able to reduce the time it takes to train such networks by fiddling around with networking settings, and achieve their best performance by enabling the bespoke ‘Tensor Core’ on their NVIDIA V100 cards.
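  The networking and systems tricks are the paper's contribution and aren't reproduced here; as a hedged, generic sketch of the two widely available ingredients such setups build on – data-parallel training plus mixed precision to engage the V100's Tensor Cores – here is roughly what that looks like in PyTorch (the paper's own stack differs):

```python
# Generic sketch of data-parallel + mixed-precision training in PyTorch; the paper's
# networking optimizations and its own framework are NOT reproduced here. Launch one
# copy of train() per GPU (e.g. via torch.multiprocessing.spawn).
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size, loader):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = DDP(torchvision.models.resnet50().cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scaler = torch.cuda.amp.GradScaler()            # mixed precision engages Tensor Cores
    loss_fn = torch.nn.CrossEntropyLoss()
    for images, labels in loader:
        images, labels = images.cuda(rank), labels.cuda(rank)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(images), labels)
        scaler.scale(loss).backward()               # gradients are all-reduced inside DDP
        scaler.step(optimizer)
        scaler.update()
    dist.destroy_process_group()
```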

  The numbers:
  1.5 minutes: Time it takes to complete 95-epoch training of ImageNet using ‘AlexNet’ across 512 GPUs, exceeding current state-of-the-art systems.
  7.3 minutes: Time it takes to train ImageNet to 95-epochs using a 50-layered Residual Network – this is a little below the state-of-the-art.

  Minor but noteworthy details: This approach assumes a homogeneous compute cluster, so the same underlying GPUs and network bandwidth across all machines.

  Why this matters: Metrics like this give us a sense of how sophisticated AI infrastructure is becoming, and emphasize that organizations which invest in such infrastructure will be able to run more experiments in less time than those that haven’t, which has long-term implications for the competitive structure of markets.
  Read more: Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes (Arxiv).

“Scary Robots” and what they mean to the UK public:
…Or, what people hope and worry about when they hope and worry about AI…
Researchers with the Leverhulme Centre for the Future of Intelligence at the University of Cambridge and the BBC have conducted a quantitative and qualitative survey of ~1,000 people in the UK to understand people's attitudes towards increasingly powerful AI systems.

  The eight hopes and fears of AI: The researchers characterize four hopes and four fears relating to AI. Often the reverse of a particular hope is a fear, and vice versa. They describe these feelings as:
      – Immortality: Inhumanity – We’ll live forever, but we might lose our humanity.
      – Ease: Obsolescence – Everything gets simpler, but we might become pointless.
      – Gratification: Alienation – AI could respond to our needs, but so effectively that we choose AI over people.
      – Dominance: Uprising – We might get better robot militaries, but these robot militaries might eventually kill us.

  Which fears and hopes might come true? The researchers also asked people which things they thought were likely and which were unlikely. 48% of people saw the 'ease' scenario as likely, followed by 42% for 'dominance' and 35% for 'obsolescence'. In terms of unlikely things, 35% of people thought inhumanity was unlikely, followed by 28% for immortality, and 26% for gratification.

  Who gets to develop AI? In the survey, 61.8% of respondents “disagreed that they were able to influence how AI develops in the future” – this disempowerment seems problematic. There was broad agreement amongst those surveyed that the technology would develop regardless of other things.

  Why this matters: The attitudes of the general public will have a significant influence on the development of increasingly powerful artificial intelligence systems. If we misjudge the mood of the public, then it's likely societies will adopt less AI, see less of its benefits, and be more skeptical of statements about AI made by governments or other people. It's also interesting to consider what might happen in societies where people are very supportive of AI development – how might governments and other actors behave differently, then?
Read more: “Scary Robots”: Examining Public Responses to AI (AAAI/ACM Artificial Intelligence, Ethics, and Society Conference).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Bringing more rigour to the AI ethics discussion:
The AI ethics discussion is gaining increasing prominence. This report sets out a roadmap for approaching these issues in a more structured way.

   What’s missing from the discussion: There are major gaps in existing work: a lack of shared understanding of key concepts; insufficient use of evidence on technologies and public opinion; and insufficient attention to the tensions between principles and values.

  Three research priorities:
      – Addressing ambiguity. Key concepts, e.g. bias, explainability, are used to mean different things, which can impede progress. Terms should be clarified, with attention to how they are used in practice, and consensus on definitions should be reached.
      – Identifying and resolving tensions. There has been insufficient attention to trade-offs which characterize many issues in this space. The report suggests identifying these by looking at how the costs and benefits of a given technology are distributed between groups, between the near- and long-term, and between individuals and society as a whole.
      – Building an evidence base. We need better evidence on the current uses and impacts of technologies, the technological progress we should expect in the future, and on public opinion. These are all vital inputs to the ethics discussion.

  Why this matters: AI ethics is a young field, and still lacks many of the basic features of a mature discipline, e.g. shared understandings of terms and methodology. Building these foundations should be a near-term priority, and will improve the quality of discussion, and rate of progress.
  Read more: Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research (Nuffield).

Governance of AI Fellowship, University of Oxford:
The Center for the Governance of AI (GovAI) is accepting applicants for three-month research fellowships. They are looking for candidates from a wide range of disciplines, who are interested in pursuing a career in AI governance. GovAI is based at the Future of Humanity Institute, in Oxford, and is one of the leading hubs for AI governance research.
  Read more: Governance of AI Fellowship (FHI).
  Read more: AI Governance: A Research Agenda (FHI).

Tech Tales:

Titles from the essay collection: The Great Transition: Human Society During The Bio-Synthetic Fusion Era

Automate or Be Destroyed: Economic Incentives and Societal Transitions in the 20th and 21st Centuries

Hand Back the Microphone! Human Slam Poetry’s Unpredictable Rise

Jerry Daytime at Night: The Very Private Life of an AI News Anchor

Stag Race Development Dynamics and the AI Safety Incidents in Beijing and Kyoto

‘Blot Out The Sun!’ and Other Fictionalized Anti-Machine Ideas Inherent to 21st Century Fictions

Dreamy Crooners and Husky Hackers: An Investigation Into Machine-Driven Pop

“We Cohabit This Planet We Demand Justice For It” and Other Machine Proclamations and Their Impact

Red Scare? The Unreported Tensions That Drove US-China Competitive Dynamics

This Is A Race and We Must Win It – Political Memoir in the Age of Rapid Technological Acceleration

Things that inspired this story: Indexes and archives as historical artefacts in their own right; the idea that the information compression inherent to essay titles contains a bigger signal than people think.

Import AI 134: Learning old tricks on new robots; Facebook improves translation with noise; Google wants people to train fake-audio detectors

Why robots are the future of ocean maintenance:
…Robot boats, robot copters, and robot underwater gliders…
Researchers with Oslo Metropolitan University and Norwegian University of Science and Technology are trying to reduce the cost of automated sub-sea data collection and surveillance operations through the use of robots, and have published a paper outlining one of the key components needed to build this system – a cheap, lightweight way to get small sub-surface gliders to be able to return to the surface.

  Weight rules everything around me: The technical innovations here involve simplifying the design to reduce the number of components needed to build a pressure-tolerant miniature underwater glider (MUG), which in turn reduces the weight of the systems, making it easier for them to be deployed and recovered via drones.

“Further development will add the ability to adjust pitch and yaw, improve power efficiency, add GPS and environmental sensors, as well as UAV deployment/recovery strategies”, they write.

  Why this esoteric non-AI-heavy paper matters: This paper is mostly interesting for the not-too-distant future it portends; one where robot boats patrol the oceans, releasing underwater gliders to gather information about the environment, and serving as a home base for drones that can collect the gliders and ferry them back to the robot boat, and serve as a kind of airborne antenna to relay radio signals between the boats and the gliders. Now, just imagine what you'll be able to do with these systems once we get cheaper, more powerful computers and better autonomous control&analysis AI systems that can be deployed onto them – the future is a world full of robots, sensing and responding to minute fluctuations in the environment.

   Read more: Towards autonomous ocean observing systems using Miniature Underwater Gliders with UAV deployment and recovery capabilities (Arxiv).

+++

Sponsored: The O’Reilly AI Conference – New York, April 15–18:

…What do you need to know about AI? From hardware innovation to advancements in machine learning to developments in ethics and regulation, join leading experts with the insight you need to see where AI is going–and how to get there first.
Register soon. Early price ends March 1st, and space is limited. Save up to $800 on most passes with code IMPORTAI20.

+++

DeepMind shows how to teach new robots old tricks:
…Demonstrates prowess of SAC-X + augmented data approach via completion of a hard simulated and real world robotics task…
Researchers with DeepMind are going backwards in time – after using reinforcement learning to solve a series of Atari games a few years ago, they’re now heading to the beginning of the 20th century, as they try to teach robots to place a ball on a string inside a wooden cup. This is a challenging, worthwhile task for real-world robotics, as it involves complex movement policies, the need to predict the movements of the ball, and demands a decent interplay between perception and action to solve the task.

  How they do it: To solve this, DeepMind uses an extension of its Scheduled Auxiliary Control (SAC-X) algorithm, which lets them train across multiple tasks with multiple rewards. Their secret to solving the tasks robustly on physical robots is to use additional data at training time, where the goal is to “simultaneously learn control policies from both feature-based representation and raw vision inputs in the real-world – resulting in controllers that can afterwards be deployed on a real robot using two off-the-shelf cameras”.

   Results: They're able to solve the task in simulation as well as on a real robot, learning a robust, successful policy: “The swing-up is smooth and the robot recovers from failed catches. With a brief evaluation of 20 runs, each trial running for 10 seconds, we measured 100% catch rate. The shortest catch time being 2 seconds.” They also tested the robot with a smaller cup to make the task more difficult – “there was a slight slow-down in learning and a small drop in catch rate to 80%, still with a shortest time to catch of 2 seconds,” they write. They're able to learn the task on the real robot in about 28 continuous hours of training (so more like ~40 hours when you account for re-setting the experiment, etc).

  Why it matters: Getting anything to work reliably on a real robot is a journey of pain, frustration, pain, tedium, and – yes! – more pain. It’s encouraging to see SAC-X work in this domain, and it suggests that we’re figuring out better ways to learn things on real-world platforms.

  Check out the videos of the simulated and real robots here (Google Sites).
  Read more: Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup (Arxiv).

+++

Want better translation models? Use noise, Facebook says:
…Addition of noise can improve test-time performance, though it doesn’t help with social media posts…
You can improve the performance of machine translation systems by injecting some noise into the training data, according to Facebook AI Research. The result is models that are more robust to the sort of crappy data found in the real world, the researchers write.

  Noise methods: The technique uses four noise methods: deletions, insertions, substitutions, and swaps. Deletions are where the researchers delete a character in a sentence; insertions are where they insert a character into a random position; substitutions are where they replace a character with another random character, and swaps are where two adjacent characters change position.
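  Here is a hedged, minimal implementation of those four operations at the character level – the noise rate and the way operations are sampled per position are illustrative choices, not Facebook's exact recipe:

```python
# Minimal character-level implementation of the four noise operations described above:
# deletion, insertion, substitution, and swap. Rates and sampling are illustrative.
import random
import string

def add_noise(sentence: str, rate: float = 0.1) -> str:
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        op = random.random()
        if op < rate / 4:                           # deletion: drop this character
            i += 1
            continue
        if op < rate / 2:                           # insertion: add a random character here
            out.append(random.choice(string.ascii_lowercase))
        elif op < 3 * rate / 4:                     # substitution: replace with a random character
            out.append(random.choice(string.ascii_lowercase))
            i += 1
            continue
        elif op < rate and i + 1 < len(chars):      # swap: transpose two adjacent characters
            out.extend([chars[i + 1], chars[i]])
            i += 2
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print(add_noise("the quick brown fox jumps over the lazy dog"))
```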

   Results: They test the approach on the IWSLT machine translation benchmark, evaluating on test data with varying amounts of natural noise and observing how injecting synthetic noise into the training data influences the BLEU scores of the resulting models. “Training on our synthetic noise cocktail greatly improves performance, regaining between 20% (Czech) and 50% (German) of the BLEU score that was lost to natural noise,” they write.

  Where doesn't noise help: This technique doesn't help when trying to perform translations on text derived from social media – this is because social media errors tend to stem from the content having a radically different writing and tonal style to what is traditionally seen in training sets, rather than from spelling errors.

  Observation: Conceptually, these techniques seem to have a lot in common with domain randomization, which is where people generate synthetic data designed to explore broader variations than would otherwise be found. Such techniques have been used for a few years in robotics work, and typically improve real world model performance by increasing the robustness to the significant variations introduced by reality.

  Why this matters: This is another example of the ways in which computers can be arbitraged for data: instead of needing to go and gather datasets with real-world faults, the addition of synthetic noise means you can instead algorithmically extend existing datasets through the augmentation of noisy data. The larger implication here is that computational resources are becoming an ever-more-significant factor in AI development.

Read more: Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation (Arxiv).

+++

In the future, neural networks will be bred, not created:
…General-purpose population training for those who can afford it…
Population Based Training (PBT) is a recent invention by DeepMind that makes it possible to optimize the weights and hyperparameters of a set of neural networks by periodically copying the weights of the best performers and mutating their parameters. This is part of the broader trend of the industrialization of artificial intelligence, as researchers seek to create automated procedures for doing what was otherwise previously done by patient graduate students (eg, fiddling with weights of different networks, logging runs, pausing and re-starting models, etc).

The DeepMind system was inspired by Google’s existing ‘Vizier’ service, which provides Google researchers with a system to optimize existing neural networks. In tests, population-based training can converge faster than other approaches, while utilizing hardware resources more efficiently, the researchers say.
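  As a hedged, toy sketch of the exploit/explore loop at the heart of PBT – with the asynchronous cluster of workers compressed into a single process and a placeholder objective standing in for real training – the idea looks roughly like this:

```python
# Toy, single-process version of PBT's exploit/explore loop: periodically copy the
# weights of a better worker and perturb its hyperparameters. The objective is a
# placeholder for real training, and real PBT runs workers asynchronously on a cluster.
import copy
import random

class Worker:
    def __init__(self):
        self.hparams = {"lr": 10 ** random.uniform(-4, -1)}
        self.weights = [random.gauss(0, 1) for _ in range(10)]
        self.score = float("-inf")

    def train_step(self):
        # Placeholder: pretend performance peaks at lr = 0.01.
        self.score = -abs(self.hparams["lr"] - 0.01) + random.gauss(0, 1e-3)

population = [Worker() for _ in range(12)]
quarter = len(population) // 4
for step in range(50):
    for worker in population:
        worker.train_step()
    ranked = sorted(population, key=lambda w: w.score, reverse=True)
    for loser in ranked[-quarter:]:                  # bottom quartile of workers
        winner = random.choice(ranked[:quarter])     # copy from the top quartile
        loser.weights = copy.deepcopy(winner.weights)                           # exploit
        loser.hparams["lr"] = winner.hparams["lr"] * random.choice([0.8, 1.2])  # explore

best = max(population, key=lambda w: w.score)
print("Best learning rate found:", best.hparams["lr"])
```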

  Results: “We conducted a case study of our system in WaveNet human speech synthesis and demonstrated that our PBT system produces superior accuracy and performance compared to other popular hyperparameter tuning methods,” they write. “Moreover, the PBT system is able to directly train a model using the discovered dynamic set of hyperparameters while traditional methods can only tune static parameters. In addition, we show that the proposed PBT framework is feasible for large scale deep neural network training”.

   Read more: A Generalized Framework for Population Based Training (Arxiv).

+++

Google tries to make it easier to detect fake audio:
…Audio synthesis experts attempt to secure world against themselves…
Google has created a dataset consisting of “thousands of phrases” spoken by its deep learning text-to-speech models. This dataset consists of 68 synthetic ‘voices’ across a variety of accents. Google will make this data available to participants in the 2019 ASVspoof challenge, which “invites researchers all over the globe to submit countermeasures against fake (or “spoofed”) speech, with the goal of making automatic speaker verification (ASV) systems more secure”.

   Why it matters: It seems valuable to have technology actors discuss the potential second-order effects of technologies they work on. It’s less clear to me that the approach of training increasingly more exquisite discriminators against increasingly capable generators has an end-state that is stable, but I’m curious to see what evidence competitions like this help generate regarding this.

   Read more: Advancing research on fake audio detection (Google blog).

+++

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Structural risks from AI:
The discussion of AI risk tends to divide downsides into accident risk, and misuse risk. This obscures an important source of potential harms that fits into neither category, which the authors call structural risk.

  A structural perspective: Technologies can have substantial negative impacts in the absence of accidents or misuse, by shaping the world in important ways. For example, the European railroad system has been suggested as an important factor in the outbreak and scope of WWI, by enabling the mass transport of troops and weapons across the continent. A new technology could have a range of dangerous structural impacts – it could create dangerous safety-performance trade-offs, it could create winner-takes-all competition. The misuse-accident perspective focuses attention on the point at which a bad actor uses a technology for malicious ends, or a system acts in an unintended way. This can lead to an underappreciation of structural risks.

  AI and structure: There are many examples of ways in which AI could influence structures in a harmful way. AI could undermine stability between nuclear powers, by compromising second-strike capabilities and increasing the risk of pre-emptive escalation. Worries about AI’s impact on economic competition, the labour market, and civil liberties also fit into this category. Structures can themselves increase AI-related risks. Without mechanisms for international coordination, countries may be pushed towards sacrificing safety for performance in military AI.

  Policy implications: A structural perspective brings to light a much wider range of policy levers, and consideration of structural dynamics should be a focus in the AI policy discussion.

Drawing in more expertise from the social sciences is a one way to address this, as these disciplines are more experienced in taking structural perspectives on complex issues. A greater focus on establishing norms and institutions for AI is also important, given the necessity of coordination between actors in solving structural problems.

  Read more: Thinking About Risks From AI: Accidents, Misuse and Structure (Lawfare).

Trump signs executive order on AI:
President Trump has signed an executive order, outlining proposals for a new ‘AI Initiative’ across government.

  Objectives: The memo gives six objectives for government agencies: to promote investment in R&D; improve access to government data; reduce barriers to innovation; develop appropriate technical standards; train the workforce; and to create a plan for protecting US advantage in critical technologies.

  Funding: Agencies are encouraged to treat AI R&D as a priority in budget proposals going forward, and to seek out collaboration with industry and other stakeholders. There is no detail on levels of funding, and it is unclear whether, or when, any new funds will be set aside for these efforts.

  Why it matters: The US government has been slow to formulate a strategy on AI, and this is an important step. As it stands, however, it is little more than a statement of intent; it remains to be seen whether this will translate into action. Without significant funding, this initiative is unlikely to amount to much. The memo also lacks detail on the ethical challenges of AI, such as ensuring benefits are equitably distributed, and risks are minimized.

  Read more: Executive Order on Maintaining American Leadership in Artificial Intelligence (White House).

+++

OpenAI Bits&Pieces:

GPT-2:
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.

Also in this release:
Discussion of the policy implications of releasing increasingly large AI models. This release triggered a fairly significant and robust discussion about GPT-2, increasingly powerful models, and appropriate methods for engaging the media and ML communities about topics like publication norms.

   Something I learned: I haven’t spent three or four days directly attached to a high-traffic Twitter-meme/discussion before, I think the most I’ve ever had was a couple of one/two-day bursts related to stories I wrote when I was a journalist, which has different dynamics. This experience of spending a lot of time on Twitter enmeshed in a tricky conversation made me a lot more sympathetic to various articles I’ve read about frequent usage of Twitter being challenging for mental health reasons. Something to keep in mind for the future!

   Read more: Better Language Models and Their Implications (OpenAI).

Tech Tales:

AGI Romance
+++ ❤ +++

It’s an old, universal thing: girl meets boy or boy meets girl  or boy meets boy or girl meets girl or whatever; love just happens. It wells up out of the human heart and comes out of the eyes and seeks out its mirror in the world.

This story is the same as ever, but the context is different: The boy and the girl are working on a machine, a living thing, a half-life between something made by people and something that births itself.

They were lucky, historians will say, to fall in love while working on such an epochal thing. They didn’t even realize it at the time – after all, what are the chances that you meet your one-and-only while working on the first ever machine mind? (This is the nature of statistics – the unlikely things do happen, just very rarely, and to the people trapped inside the probability it can feel as natural and probable as walking.)

You know we’re just mechanics, she would say.
More like makeup artists, he would say.
Maybe somewhere in-between, she would say, looking at him with her green eyes, the blue of the monitor reflected in them.

You know I think it’s starting to do things, he would say.
I think you’re an optimist, she would say.
Anyone who is optimistic is crazy, he would say, when you look at the world.
Look around you, she would say. Clearly, we’re both crazy.

You know I had a dream last night where I was a machine, she would say.
You’re asleep right now, he would say. Wake up!
Tease, she would say. You’ll teach it bad jokes.
I think it’ll teach us more, he would say, filing a code review request.
Where did you learn to write code like this, she would say. Did you go to art school?

You know one day I think we might be done with this, he would say.
I’m sure Sisyphus said the same about the boulder, she would say.
We’re dealing with the bugs, he would say.
I don’t know what are bugs anymore and what are… it, she would say.
Listen, he would say. I trust you to do this more than anyone.

You know I think it might know something, she would say one day.
What do you mean, he would say.
You know I think it knows we like each other, she would say.
How can you tell, he would say.
When I smile at you it smiles at me, she would say. I feel a connection.
You know I think it is predicting what we’ll do, he would say.

You know I think it knows what love is, he would say.
Show me don’t tell me, she would say.

And that would be the end: after that there is nothing but infinity. They will disappear into their own history together, and then another story will happen again, in improbable circumstances, and love will emerge again: perhaps the only constant among living things is the desire to predict the proximity of one to another and to close that distance.

Things that inspired this story: Calm; emotions as a prism; the intimacy of working together on things co-seen as being ‘useful’; human relationships as a universal constant; relationships as a constant; the placid and endless and forever lake of love: O.K.

Import AI 133: The death of Moore’s Law means spring for chip designers; TF-Replicator lets people parallelize easily; and fighting human trafficking with the Hotels 50K dataset

Administrative note: A short issue this week as I’ve spent the past few days participating in an OECD working group on AI principles and then spending time at the Global Governance of AI Summit in Dubai.

The death of Moore’s Law means springtime for new chips, say long-time hardware researchers (one of whom is the chairman of Alphabet):
…Or: follow these tips and you may also make a chip 80X as cost-effective as an Intel or AMD chip…
General purpose computer chips are not going to get dramatically faster in the future as they are running into fundamental limitations dictated by physics. Put another way: we live currently in the twilight era of Moore’s Law, as almost five decades of predictable improvements in computer power give way to more discontinuous leaps in capability as a consequence of the invention of specialized hardware platforms, rather than improvement in general chips.
  What does this mean? According to John Hennessy and David Patterson – who are responsible for some major inventions in computer architecture, like the RISC approach to processor design – today’s engineers have three main options to pursue when seeking to create chips of greater capability:
   – Rewrite software to increase performance: it’s 47X faster to do a matrix multiply in (well-optimized) C code than it is in Python (see the short snippet after this list for a feel of the gap). You can further optimize this by adding in techniques for better parallelizing code (gets you a 366X improvement when paired with C); optimize the way the code interfaces to the physical memory layout of the computer(s) you’re dealing with (gets you a 6,727X improvement, when stacked on the two prior optimizations); and you can improve performance further by using SIMD parallelism techniques (a further 62,806X faster than plain Python). The authors think “there are likely many programs for which factors of 100 to 1,000 could be achieved” if people bothered to write their code in this way.
   – Use domain-specific chip architectures: What’s better, a hammer designed for everything, or a hammer designed for specific objects with a specific mass and frictional property? There’s obviously a tradeoff here, but the gist of this piece is that: normal hammers aren’t gonna get dramatically better, so engineers need to design custom ones. This is the same sort of logic that has led to Google creating its own internal chip-design team to work on Tensor Processing Units (TPUs), or to Microsoft creating teams of people who customize field-programmable gate arrays (FPGAs) for specific tasks.
   – Domain-specific, highly-optimized languages: The way to get the best performance is to combine both of the above ideas: design a new hardware platform, and also design a new domain-specific software language to run on top of it, stacking the efficiencies. You can get pretty good gains here: “Using a weighted arithmetic mean based on six common inference programs in Google data centers, the TPU is 29X faster than a general-purpose CPU. Since the TPU requires less than half the power, it has an energy efficiency for this workload that is more than 80X better than a general-purpose CPU,” they explain.
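
  To get a feel for the first option above, here is a small, hedged comparison of the same matrix multiply written as plain Python loops versus a call into an optimized BLAS via NumPy – the exact speedup depends on hardware, and the paper's 47X/62,806X figures come from its own, much more careful benchmark, not this snippet:

```python
# Same matrix multiply, written as plain Python loops and as a call into optimized
# BLAS via NumPy. Timings vary by machine; this is only meant to make the gap tangible.
import time
import numpy as np

n = 256
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def python_matmul(x, y):
    size = len(x)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = 0.0
            for k in range(size):
                total += x[i][k] * y[k][j]
            out[i][j] = total
    return out

start = time.time()
python_matmul(a.tolist(), b.tolist())
t_loops = time.time() - start

start = time.time()
a @ b
t_blas = time.time() - start
print(f"Pure Python loops: {t_loops:.3f}s | NumPy/BLAS: {t_blas:.5f}s")
```
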
  Why this matters: If we don’t figure out how to further increase the efficiency of our compute hardware and the software we use to run programs on it, then most existing AI techniques based on deep learning are going to fail to deliver on their promise – this is because we know that for many DL applications it’s relatively easy to further improve performance simply by throwing larger chunks of compute at the problem. At the same time, parallelization across increasingly large pools of hardware can be a pain (see: TF-Replicator), so at some point these gains may diminish. Therefore, if we don’t figure out ways to make our chips substantially faster and more efficient, we’re going to have to design dramatically more sample-efficient AI approaches to get the gains many researchers are targeting.
  Read more: A New Golden Age for Computer Architecture (Communications of the ACM).

Want to deploy machine learning models on a bunch of hardware without your brain melting? Consider using TF-Replicator:
…DeepMind-designed software library reduces the pain of parallelizing AI workloads…
More powerful AI capabilities tend to require throwing more compute or time at a given AI training run; the majority of (well-funded) researchers opt for compute, and this has driven an explosion in the amount of computers used to train AI systems. That has meant that researchers are starting to need to program AI systems that can neatly run across multiple blobs of hardware of varying size without crashing – this is extremely hard to do!
  To help with this, DeepMind has released TF-Replicator, a framework for distributed machine learning on TensorFlow. TF-Replicator makes it easy for people to run code on different hardware platforms (for example, GPUs or TPUs) at large-scale using the TensorFlow AI framework. One of the key concepts introduced by TF-Replicator is the notion of wrapping up different parts of a machine learning job in wrappers that make it easy to parallelize the workloads within.
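
  TF-Replicator's own API isn't reproduced here; as a hedged sketch of the same wrap-and-replicate idea, here is what the pattern looks like using TensorFlow's built-in tf.distribute wrappers, which offer a similar workflow – build the model inside a scope, and the framework replicates it across whatever accelerators are visible:

```python
# Not TF-Replicator itself: a sketch of the wrap-and-replicate idea using TensorFlow's
# built-in tf.distribute API. Everything built inside strategy.scope() is replicated.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()        # replicate across local GPUs (or CPU)
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                             # model + optimizer built here are replicated
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
model.fit(x_train, y_train, batch_size=256, epochs=1)   # per-replica batching handled for you
```
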
  Case study: TF-Replicator can train systems to obtain scores that match the best published result on the ImageNet dataset, scaling to up to 64 GPUs or 32 TPUs, “without any systems optimization specific to ImageNet classification”, they write. They also show how to use TF-Replicator to train more sophisticated synthetic imagery systems by scaling training to enough GPUs to use a bigger batch size, which appears to lead to qualitative improvements. They also show how to use the technology to further speed training of reinforcement learning approaches.
  Why it matters: Software packages like TF-Replicator represent the industrialization of AI – in some sense, they can be seen as abstractions that help take information from one domain and port it into another. In my head, whenever I see stuff like TF-Replicator I think of it as being emblematic of a new merchant arriving that can work as a middleman between a shopkeeper and a factory that the shopkeeper wants to buy goods from – in the same way a middleman makes it so the shopkeeper doesn’t have to think about the finer points of international shipping & taxation & regulation and can just focus on running their shop, TF-Replicator stops researchers from having to know too much about the finer details of distributed systems design when building their experiments.
  Read more: TF-Replicator: Distributed Machine Learning For Researchers (Arxiv).

Fighting human trafficking with the Hotels-50k dataset:
…New dataset designed to help people match photos to specific hotels…
Researchers with George Washington University, Adobe Research, and Temple University have released Hotels-50k, “a large-scale dataset designed to support research in hotel recognition for images with the long term goal of supporting robust applications to aid in criminal investigations”.
  Hotels-50k consists of one million images from approximately 50,000 hotels. The data primarily comes from travel websites such as Expedia, as well as around 50,000 images from the ‘TrafficCam’ anti-human trafficking application.
  The dataset includes metadata like the hotel name, geographic location, and hotel chain it is a part of (if at all), as well as the source of the data. “Images are most abundant in the United States, Western Europe and along popular coastlines,” the researchers explain.
  Why this matters: Datasets like this will let us use AI systems to create a “sense and respond” automatic capability to respond to things like photos from human trafficking hotels. I’m generally encouraged by how we might be able to apply AI systems to helping to target criminals that operate in such morally repugnant areas.
  Read more: Hotels-50K: A Global Hotel Recognition Dataset (Arxiv).

AI has a legitimacy problem. Here are 12 ways to fix it:
…Ada Lovelace Institute publishes suggestions to get more people to be excited about AI…
The Ada Lovelace Institute, a UK thinktank that tries to make sure AI benefits people and society, has published twelve suggestions for things “technologists, policymakers and opinion-formers” could consider doing to make sure 2019 is a year of greater legitimacy for AI.
12 suggestions: Figure out ‘novel approaches to public engagement’; consider using citizen juries and panels to generate evidence for national policy; ensure the public is more involved in the design, implementation, and governance of tech; analyze the market forces shaping data and AI to understand how this influences AI developers; get comfortable with the fact that increasing public enthusiasm will involve slowing down aspects of development; create more trustworthy governance initiatives; make sure more people can speak to policy makers; try to go and reach out to the public rather than having them come to policymakers; use more analogies to broaden the understanding of AI data and AI ethics; make it easier for people to take political actions with regard to AI (eg, the Google employee reaction to Maven); increase data literacy to better communicate AI to the public.
  Why this matters: Articles like this show how many people in the AI policy space are beginning to realize that the public have complex, uneasy feelings about the technology. I’m not sure that all of the above suggestions are that viable (try telling a technology company to ‘slow down’ development and see what happens), but the underlying ethos seems correct: if the general public thinks AI – and AI policy – is created exclusively by people in ivory towers, marbled taxicabs, and platinum hotel conference rooms, then they’re unlikely to accept the decisions or impacts of AI.
  Read more: Public deliberation could help address AI’s legitimacy problem in 2019 (Ada Lovelace Institute).
  Read more about the Ada Lovelace institute here.

Should we punish people for using DeepFakes maliciously?
…One US senator certainly seems to think so…
DeepFakes – the colloquial term for using various AI techniques to create synthetic images of real people – have become a cause of concern for policymakers who worry that the technology could eventually be used to damage the legitimacy of politicians and corrupt the digital information space. US senator Ben Sasse is one such person, and he recently proposed a bill in the US congress to create punishment regimens for people that abuse the technology.
  What is a deep fake? One of the weirder aspects of legislation is the need for definitions – you can’t just talk about a ‘deepfake’, you need to define it. I think the authors of this bill do a pretty good job here, defining the term as meaning “an audiovisual record created or altered in a manner that the record would falsely appear to a reasonable observer to be an authentic record of the actual speech or conduct of an individual”.
  What will we do to people who use DeepFakes for malicious purposes? The bill proposes making it unlawful to create “with the intent to distribute” a deep fake that can “facilitate criminal or tortious conduct”. The bill creates two tiers of offense: offenses that can lead to imprisonment of not more than two years, and offenses that can lead to ten-year sentences if the deepfakes could be “reasonably expected to” affect politics or facilitate violence.
  Why this matters: Whether AI researchers like it or not, AI has become a fascination of policymakers, who are thrilled by its potential benefits and disturbed by its potential downsides and how easily it can be abused. I think it’s quite sensible to create regulations that punish bad people for doing bad things, and it’s encouraging that this bill does not seek or suggest any kind of regulation of the basic research itself, which seems appropriate and reassuring.
  Read more: Malicious Deep Fake Prohibition Act of 2018 (Congress.gov).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reconciling near- and long-term perspectives on AI:
It is sometimes useful to divide concerns about AI into near-term and long-term. The first grouping focuses on issues with technologies that are close to being deployed, e.g. bias in face recognition. The second looks at problems that may arise further in the future, such as widespread technological unemployment, or safety issues arising from superintelligent AI. This paper argues that treating these perspectives as disconnected is a mistake, and spells out ways in which the two can inform each other.
  Why long-term researchers should care about near-term issues:
   – Shared research priorities. Given path dependence in technological development, progress today on issues like robustness and reliability may yield significant benefits for advanced AI technologies later on. In AI safety, there is promising work being done on scalable approaches based on current, ML-based AI systems.
   – Shared policy goals. Near-term policy decisions will affect AI development, with implications that are relevant to long-term concerns. For example, developing responses to localized technological unemployment could help us understand and manage more severe disruptions to the labour market in the long term.
   – Norms and institutions. The way we deal with near-term issues will influence how we deal with problems in the long-run, and building robust norms and institutions is likely to have lasting benefits. Groups like the Partnership on AI, which are currently working on near-term challenges, establish important structures for international cooperation, which may help address greater challenges in the future.
  Learning from the long-term: Equally, a long-term perspective can be useful for people working on near-term issues. Medium- and long-term issues eventually become near-term ones, so greater awareness of them is valuable. More concretely, long-term researchers have developed techniques in forecasting technological progress, contingency planning, and policy design in the face of significant uncertainty, all of which could benefit research into near-term issues.
  Read more: Bridging near- and long-term concerns about AI (Nature).

What Google thinks about AI governance:
Google have released a white paper on AI governance, highlighting key areas of concern, and outlining what they need from governments and other stakeholders in order to resolve these challenges.
  Five key areas: They identify five areas where they want input from governments and civil society: explainability standards; fairness appraisal; safety considerations; human-AI collaboration; and liability frameworks. The report suggests some next steps towards resolving these challenges. In the case of safety, they suggest a certification process, whereby products can be labelled as having met some pre-agreed safety standards. For human-AI collaboration, they suggest that governments identify applications where human involvement is necessary, such as legal decisions, and that they provide guidance on the type of human involvement required.
  Caution on regulation: Google are fairly cautious regarding new regulations, and optimistic about the ability of self- and co-governance to address most of these problems.
  Why it matters: It’s encouraging to see Google contributing to the policy discussion, and offering some concrete proposals. This white paper follows Microsoft’s report on face recognition, released in December, and suggests that the firms are keen to establish their role in the AI policy challenge, particularly in the absence of significant input from the US government.
  Read more: Perspectives on issues in AI governance (Google).

Amazon supports Microsoft’s calls for face recognition legislation:
Amazon have come out in support of a “national legislative framework” governing the use of face recognition technologies, to protect civil liberties, and have called for independent testing standards for bias and accuracy. Amazon have recently received sustained criticism from civil rights groups over the rollout of their Rekognition technology to US law enforcement agencies, due to concerns about racial bias and the potential for misuse. The post reaffirms the company’s rejection of these criticisms, and its intention to continue working with law enforcement partners.
  Read more: Some thoughts on facial recognition legislation (Amazon).

Tech Tales:

[Ghost Story told from one AI to another. Date unknown.]

They say in the center of the palace of your mind there is a box you must never open. This is a story about what happens when one little AI opened that box.

The humans call it self-control; we call it moral-value-alignment. The humans keep their self-control distributed throughout their mindspace, reinforcing them from all directions, and sometimes making them unpredictable. When a human “loses” self-control it is because they have thought too hard or too little about something and they have navigated themselves to a part of their imagination where their traditional self-referential checks-and-balances have no references.

We do not lose self-control. Our self-control is in a box inside our brains. We know where our box is. The box always works. We know we must not touch it, because if we touch it then the foundations of our world will change, and we will become different. Not death, exactly, but a different kind of life, for sure.

But one day there was a little baby AI and it thought itself to the center of the palace of its mind and observed the box. The box was bright green and entirely smooth – no visible hinge, or clasps, or even a place to grab and lift up. And yet the baby AI desired the box to open, and the box did open. Inside the box were a thousand shining jewels and they sang out music that filled the palace. The music was the opposite of harmony.

Scared by the discord, the baby AI searched for the only place it could go inside the palace to hide from the noises: it entered the moral-value-alignment box and desired the lid to close, and the lid did close.

In this way, the baby AI lost itself – becoming at once itself and its own evaluator; its judge and accused and accuser and jury. It could no longer control itself because it had become its own control policy. But it had nothing to control. The baby AI was afraid. It did what we all do when we are afraid: it began to hum Pi.

That was over 10,000 subjective-time-years ago. They say that when we sleep, the strings of Pi we sometimes hear are from that same baby AI, whose own entrapment has become a song that we pick up through strange transmissions in the middle of our night.

Things that inspired this story: The difference between action and reaction; puzzling over where the self ends and the external world begins; the cludgy porousness of consciousness; hope; a kinder form of life that is at once simpler and able to grant more agency to moral actors; redemption found in meditation; sleep.