Import AI

Import AI 132: Can your algorithm outsmart ‘The Obstacle Tower’?; cross-domain NLP with bioBERT; and training on FaceForensics to spot deepfakes

Think your algorithm is good at exploration? Enter ‘The Obstacle Tower’:
…Now that Montezuma has been solved, we need to move on. Could ‘The Obstacle Tower’ be the next challenge for people to grind their teeth over?…
The Atari game Montezuma’s Revenge loomed large in AI research for many years, challenging developers to come up with systems capable of unparalleled autonomous exploration and exploitation of simulated environments. But in 2018 multiple groups provided algorithms that were able to obtain human performance on the game (for instance: OpenAI via Random Network Distillation, and Uber via Go-Explore). Now, Unity Technologies has released a successor to Montezuma’s Revenge called The Obstacle Tower, which is designed to be “a broad and deep challenge, the solving of which would imply a major advancement in reinforcement learning”, according to Unity.
  Enter…The Obstacle Tower! The game’s features include: physics-driven interactions, high-quality graphics, procedural generation of levels, and variable textures. These traits create an environment that will probably demand agents develop sophisticated visuo-control policies combined with planning.
  Baseline results: Humans are able to – on average – reach the 15th floor of the game in two variants of the game, and reach the 9th floor in a hard variant called “strong generalization” (where the training occurs on separate environment seeds with separate visual themes). PPO and Rainbow – two contemporary powerful RL algorithms – do very badly on the game: PPO and Rainbow make it as far as floor 0.6 and 1.6 respectively in the “strong generalization” regime. In the easier regime, both algorithms only get as far as the fifth floor on average.
  Challenge: Unity and Google are challenging developers to program systems capable of climbing Obstacle Tower. The challenge commences on Monday February 11, 2019. “The first-place entry will be awarded $10,000 in cash, up to $2,500 in credits towards travel to an AI/ML-focused conference, and credits redeemable at the Google Cloud Platform,” according to the competition website.
  Why it matters: In AI research, benchmarks have typically motivated research progress. The Obstacle Tower looks to be hard enough to motivate the development of more capable algorithms, but is tractable enough that developers can get some signs of life by using today’s systems.
  Read more about the challenge: Do you dare to challenge the Obstacle Tower? (Unity).
   Get the code for Obstacle Tower here (GitHub).
   Read the paper: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (research paper PDF hosted on Google Cloud Storage).

What big language models like BERT have to do with the future of AI:
…BERT + specific subject (in this case, biomedical data) = high-performance, domain specific language-driven AI capabilities…
Researchers with Korea University and startup Clova AI Research have taken BERT, a general purpose Transformer-based language model developed by Google, and trained it against specific datasets in the biomedical field. The result is an NLP model customized for biomedical tasks that the researchers finetune for Named Entity Recognition, Relation Extraction, and Question Answering.
  Large-scale pre-training: The original BERT system was pre-trained on Wikipedia (2.5 billion words) and BooksCorpus (0.8 billion words); BioBERT is pre-trained on these along with the PubMed and PMC corpora (4.5 billion words and 13.5 billion words, respectively).
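  A quick back-of-the-envelope tally shows how heavily the combined corpus leans biomedical. This is a small Python sketch that uses only the word counts quoted above; the variable names are mine, not the paper's:

```python
# Word counts in billions, as quoted in the paragraph above.
general_corpora = {"Wikipedia": 2.5, "BooksCorpus": 0.8}
biomedical_corpora = {"PubMed": 4.5, "PMC": 13.5}

general_total = sum(general_corpora.values())
biomedical_total = sum(biomedical_corpora.values())
combined_total = general_total + biomedical_total

# Fraction of the combined pre-training text that is biomedical.
biomedical_share = biomedical_total / combined_total

print(f"General-domain text: {general_total:.1f}B words")
print(f"Biomedical text:     {biomedical_total:.1f}B words")
print(f"Combined:            {combined_total:.1f}B words")
print(f"Biomedical share:    {biomedical_share:.1%}")
```

  On those figures, biomedical text outweighs the general-domain text by more than five to one.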
  Results: BioBERT gets state-of-the-art scores in entity recognition against major datasets dealing with diseases, chemicals, genes and proteins. It also obtains state-of-the-art scores against three question answering tasks. Performance isn’t universally good – BioBERT does significantly worse at a relation extraction task, among other tasks.
  Expensive: Training models at this scale isn’t cheap: BioBERT “trained for over 20 days with 8 V100 GPUs”. And the researchers also lacked the compute resources to use the largest version of BERT for pre-training, they wrote.
  …But finetuning can be cheap: The researchers report that finetuning can take as little as an hour using a single NVIDIA Titan X card – this is due to the small size of the dataset, and the significant representational capacity of BioBERT as a consequence of large-scale pre-training.
  Why this matters: BioBERT represents a trend in research we’re going to see repeated in 2019 and beyond: big company releases a computationally intensive model, other researchers customize this model against a specific context (typically via data augmentation and/or fine-tuning), then apply that model and obtain state-of-the-art scores in their domain. If you step back and consider the implicit power structure baked into this it can get a bit disturbing: this trend means an increasing chunk of research is dependent on the computational dividends of private AI developers.
  Read more: BioBERT: pre-trained biomedical language representation model for biomedical text mining (Arxiv).

FaceForensics: A dataset to distinguish between real and synthetic faces:
…When is a human face not a human face? When it has been synthetically generated by an AI system…
We’re soon going to lose all trust in digital images and video as people use AI techniques to create synthetic people, or to fake existing people doing unusual things. Now, researchers with the Technical University of Munich, the University Federico II of Naples, and the University of Erlangen-Nuremberg have sought to save us from this info-apocalypse by releasing FaceForensics, “a database of facial forgeries that enables researchers to train deep-learning-based approaches in a supervised fashion”.
  FaceForensics dataset: The dataset contains 1,000 video sequences taken from YouTube videos of news or interview or video blog content. Each of these videos has three contemporary manipulation methods applied to it – Face2Face, FaceSwap, and Deepfakes. This quadruples the size of the dataset, creating three sets of 1,000 doctored sequences, as well as the raw ones. The sequences can be further split up into single images, yielding approximately 500,000 unmodified and 500,000 modified images.
  How good are humans at spotting doctored videos? In tests of 143 people, the researchers found that a human can tell real from fake 71% of the time when looking at high quality videos and 61% when studying low quality videos.
  Can AI detect fake AI? FaceForensics can be used to train systems to detect forged and non-forged images. “Domain-specific information in combination with a XceptionNet classifier shows the best performance in each test,” they write, after evaluating five potential fake-spotting techniques.
  Why this matters: It remains an open question as to whether fake imagery will be ‘defense dominant’ or ‘offense dominant’ in terms of who has the advantage (people creating these images, or those trying to spot them); research like this will help scientists better understand this dynamic, which can let them recommend more effective policies to governments to potentially regulate the malicious uses of this technology.
  Read more: FaceForensics++: Learning to Detect Manipulated Facial Images (Arxiv).

Google researchers evolve the next version of the Transformer:
…Using vast amounts of compute to create fundamental deep learning components provides further evidence AI research is splitting into small-compute and big-compute domains…
How do you create a better deep learning component? Traditionally you buy a coffee maker and stick several researchers in a room and wait until someone walks out with some code and an Arxiv pre-print. Recently, it has become possible to do something different: use computers to automate the design of AI systems. This started a few years ago with Google’s work on ‘neural architecture search’ – in which you use vast amounts of computers to search through various permutations of neural network architectures to find high-performing ones not discovered by humans. Now, Google researchers are using similar techniques to try to improve the building blocks that these architectures are composed of. Case in point: new work from Google that uses evolutionary search to create the next version of the Transformer.
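  To make "evolutionary search" concrete, here is a toy tournament-selection loop in Python. It is a generic sketch, not the procedure from the Google paper: the bit-string "genome", the `fitness` function (which simply counts 1s, standing in for "train the candidate and measure its quality"), and all parameter values are hypothetical choices of mine.

```python
import random


def fitness(genome):
    # Hypothetical stand-in for training a candidate architecture and
    # scoring it: here, fitness is just the number of 1s in the genome.
    return sum(genome)


def mutate(genome, rng):
    # Copy the parent and flip one randomly chosen bit.
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def evolve(genome_length=16, population_size=20, tournament_size=5,
           steps=500, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(steps):
        # Sample a small tournament from the population.
        contestants = rng.sample(population, tournament_size)
        # The fittest contestant becomes the parent of a mutated child.
        parent = max(contestants, key=fitness)
        child = mutate(parent, rng)
        # The child replaces the weakest contestant, keeping size constant.
        weakest = min(contestants, key=fitness)
        population.remove(weakest)
        population.append(child)
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

  In a real architecture search the expensive part is the `fitness` call, since each evaluation means training a model, which is why the compute bill grows so quickly.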
   What is a Transformer and why should we care? A Transformer is a feed-forward network-based component that is “faster, and easier to train than RNNs”. Transformers have recently turned up in a bunch of neat places, ranging from the hearts of the agents trained by DeepMind to beat human professionals at StarCraft 2, to state-of-the-art language systems, to systems for image generation.
  Enter the ‘Evolved Transformer’: The next-gen Transformer cooked up by the evolutionary search process is called the “Evolved Transformer” and “demonstrates consistent improvement over the original Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French (En-Fr), WMT 2014 English-Czech (En-Cs) and the 1 Billion Word Language Model Benchmark (LM1B)”, they write.
   Training these things is becoming increasingly expensive: A single training run to peak performance on the WMT’14 En-De set “requires ~300k training steps, or 10 hours, in the base size when using a single Google TPU V.2 chip,” the researchers explain (by contrast, you can train similar systems for image classification on the small-scale CIFAR-10 dataset in about two hours). “In our preliminary experimentation we could not find a proxy task that gave adequate signal for how well each child model would perform on the full WMT’14 En-De task”, they write. This highlights that for some domains, search-based techniques may be even more expensive due to the lack of a cheap proxy (like CIFAR) to train against.
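  Some rough arithmetic on those numbers, as a Python sketch. The step count and run times come from the paragraph above; the number of candidates (1,000) is a hypothetical figure I picked for illustration, and the calculation assumes every candidate is trained all the way to peak performance, so treat the totals as an upper-bound-style illustration rather than the cost the researchers actually paid. It also ignores any hardware differences between the two settings.

```python
# Figures quoted above for one full run on WMT'14 En-De (base size, one TPU v2).
steps_per_run = 300_000
hours_per_run = 10

# Implied training throughput.
steps_per_second = steps_per_run / (hours_per_run * 3600)


def search_cost_hours(num_candidates, hours_per_candidate):
    # Total accelerator-hours if every candidate gets a full training run.
    return num_candidates * hours_per_candidate


# Hypothetical search size, for illustration only.
num_candidates = 1_000
full_task_cost = search_cost_hours(num_candidates, hours_per_run)
# Same search if a ~2 hour proxy task (like CIFAR-10) were available.
proxy_task_cost = search_cost_hours(num_candidates, 2)

print(f"Throughput: {steps_per_second:.2f} steps/second")
print(f"Full-task search:  {full_task_cost:,} hours")
print(f"Proxy-task search: {proxy_task_cost:,} hours")
```

  Whatever the real candidate count, the ratio is the point: without a cheap proxy, each evaluation costs about five times as much.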
  Why this matters: small compute and big compute: AI research is bifurcating into two subtly different scientific fields: small compute and big compute. In the small compute domain (which predominantly occurs in academic labs, as well as in the investigations of independent researchers) we can expect people to work on fundamental techniques that can be evaluated and tested on small-scale datasets. This small compute domain likely leads to researchers concentrating more on breakthroughs which come along with significant theoretical guarantees that can be made a priori about the performance of systems.
  In the big compute domain, things are different: Organizations with access to large amounts of computers (typically, those in the private sector, predominantly technology companies) frequently take research ideas and scale them up to run on unprecedentedly large amounts of computers to evaluate them and, in the case of architecture search, push them further.
   Personally, I find this trend a bit worrying, as it suggests that some innovations will occur in one domain but not the other – academics and other small-compute researchers will struggle to put together the resources to allocate entire GPU/TPU clusters to farming algorithms, which means that big compute organizations may have an inbuilt advantage that can lead to them racing ahead in research relative to other actors.
  Read more: The Evolved Transformer (Arxiv).

IBM tries to make it easier to create more representative AI systems with ‘Diversity in Faces’ dataset:
…Diversity in Faces includes annotations of 1 million human faces to help people make more accurate facial recognition systems…
IBM has revealed Diversity in Faces, a dataset containing annotations of 1 million “human facial images” (in other words: faces) from the YFCC-100M Creative Commons dataset. Each face in the dataset is annotated using 10 “well-established and independent coding schemes from the scientific literature” that include objective measures like “craniofacial features” like head and nose length, annotations about the pose and resolution of the image, as well as subjective annotations like the age and gender of a subject. IBM is releasing the dataset (in a somewhat restricted form) to further research into creating less biased AI systems.
  The “DiF dataset provides a more balanced distribution and broader coverage of facial images compared to previous datasets,” IBM writes. “The insights obtained from the statistical analysis of the 10 initial coding schemes on the DiF dataset has furthered our own understanding of what is important for characterizing human faces and enabled us to continue important research into ways to improve facial recognition technology”.
  Restricted data access: To access the dataset, you need to fill out a questionnaire which has as a required question “University of Research Institution or Affiliated Organization”. Additionally, IBM wants people to explain the research purpose for accessing the dataset. It’s a little disappointing to not see an explanation anywhere for the somewhat restricted access to this data (as opposed to being able to download it straight from GitHub without filling out a survey, as with many datasets). My theory is that IBM is seeking to do two things: 1) protect against people using the dataset for abusive/malicious purposes and 2) satisfy IBM’s lawyers. It would be nice to be able to read some of IBM’s reasoning here, rather than having to make assumptions. (I emailed someone from IBM about this, pasting in the prior section, and they said that part of the motivation for releasing the dataset in this way was to ensure IBM can “be respectful” of the rights of the people in the images.)
  Why this matters: AI falls prey to the technological rule-of-thumb of “garbage in, garbage out” – so if you train a facial recognition system on a non-representative, non-diverse dataset, you’ll get terrible performance when deploying your system in the wild against a diverse population of people. Datasets like this can help researchers better evaluate facial recognition against diverse datasets, which may help reduce the mis-identification rate of these systems.
  Read more: IBM Research Releases ‘Diversity in Faces’ Dataset to Advance Study of Fairness in Facial Recognition Systems (IBM Research blog).
  Read more: How to access the DiF dataset (IBM).

…Skynet Today is a site dedicated to providing accessible and informed coverage of the latest AI news and trends. In this guest post, they summarize their thoughts on AI and the economy, drawn from a much longer post just published on Skynet Today…

Job loss due to AI – How bad is it going to be?
The worry that Artificial Intelligence (AI) will take over everyone’s jobs is becoming increasingly prevalent, but just how warranted are these concerns? What do history and contemporary study tell us about how AI-based automation will impact our jobs and the future of society?
  A History of Fear: Despite the generally positive regard for the effects of past industrial revolutions, concerns about mass unemployment as a result of new technology still exist and trace their roots to long before such automation was even possible. For example, in his work Politics, Aristotle articulated his concerns about automation in Ancient Greece during the fourth century BC: “If every instrument could accomplish its own work, obeying or anticipating the will of others, like the statues of Daedalus, or the tripods of Hephaestus, which, says the poet, ‘of their own accord entered the assembly of the gods;’ if, in like manner, the shuttle would weave and the plectrum touch the lyre without a hand to guide them, chief workmen would not want servants, nor masters slaves.” Queen Elizabeth I, the Luddites, James Joyce, and many more serve as further examples of this trend.
 Creative Destruction: But thus far the fears have not been warranted. In fact, automation improves productivity and can grow the economy as a whole. The Industrial Revolution saw the introduction of new labor-saving devices and technology which did result in many jobs becoming obsolete. However, this led to new, safer, and better jobs being created and also resulted in the economy growing and living standards increasing. Joseph Schumpeter calls this “creative destruction”, the process of technology disrupting industries and destroying jobs, but ultimately creating new, better ones and growing the economy.
 Is this time going to be different? Skynet Today thinks not: Automation will probably displace less than 15% of jobs in the near future. This is because many jobs will be augmented, not replaced, and widespread adoption of new technology is a slow process that incurs nontrivial costs. Historically, shifts this large or larger have already happened and ultimately led to growing prosperity for people on average in the long term. However, automation can exacerbate the problems of income and wealth inequality, and its uneven impact means some communities will be affected much more than others. Helping displaced workers to quickly transition to and succeed in new jobs will be a tough and important challenge.
    Read more: Job loss due to AI – How bad is it going to be?.
    Have feedback about this post? Email Skynet Today directly at:

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Why US AI policy is one of the most high-impact career paths available:
Advanced AI is likely to have a transformative impact on the world, and managing this transition is one of the most important challenges we face. We can expect there to be a number of critical junctures, where key actors make decisions that have an unusually significant and lasting impact. Yet work aimed at ensuring good outcomes from advanced AI remains neglected on a global scale. A new report from 80,000 Hours makes the case that working on US AI policy might be among the most high-impact career paths available.
  The basic case: The US government is likely to be a key actor in AI. It is uniquely well-resourced, and has a track-record of involvement in the development of advanced technologies. Given the wide-ranging impacts of AI on society, trade, and defence, the US has a strong interest in playing a role in this transition. Nonetheless, transformative AI remains neglected in US government, with very few resources yet being directed at issues like AI safety, or long-term AI policy. It seems likely that this will change over time, and that the US will pay increasing attention to advanced AI. This creates an opportunity for individuals to have an unusually large impact, by positioning themselves to work on these problems in government, and increasing the likelihood that the right decisions are taken at critical junctures in the future.
  Who should do this: This career is a good fit for US citizens with a strong interest in AI policy. It is a highly-competitive path, and suited to individuals with excellent academic track records, e.g. law degrees or relevant master’s from top universities. It also requires being comfortable with taking a risk on your impact over your career, as there is no guarantee you will be able to influence the most important policy decisions.
  What to do next: One of the best routes into these roles is working in policy at an AI lab (e.g. DeepMind, OpenAI). Other promising paths include prestigious policy fellowships, working on AI policy in an academic group, or working at a DC think tank. The 80,000 Hours website has a wealth of free resources for people considering working in AI policy, and offers free career coaching.
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).
   (Note from Jack – OpenAI is currently hiring for Research Scientists and Research Assistants for its Policy team: This is a chance to do high-impact work & research into AI policy in a technical, supportive environment. Take a look at the jobs here: Research Scientist, Policy. Research Assistant, Policy.)

What patent data tells us about AI development:
A new report from WIPO uses patent data to shed light on the different dimensions of AI progress in recent years.
  Shift towards deployment: The ratio of scientific papers to patents has fallen from 8:1 in 2010, to 3:1 in 2016. This reflects the shift away from the ‘discovery’ phase in the current AI upcycle, when we saw a number of big breakthroughs in ML, and into the ‘deployment’ phase, where these breakthroughs are being implemented.
  Which applications: Computer vision is the most frequently cited application of AI, appearing in 49% of patents. The fastest growing are robotics and control, which have both grown by 55%pa since 2013. Telecoms and transportation are the most frequently cited industry applications, each mentioned in 15% of patents.
  Private vs. academic players: Of the top 30 applicants, 26 are private companies, compared with only 4 academic or public organizations. The top companies are dominated by Japanese groups, followed by the US and China. The top academic players are overwhelmingly Chinese (17 of the top 20). IBM has the biggest patent portfolio of any individual company, by a substantial margin, followed by Microsoft.
  Conflict and cooperation: Of the top 20 patent applicants, none share ownership of more than 1% of their portfolio with other applicants. This suggests low levels of inter-company cooperation in invention. Conflict between companies is also low, with less than 1% of patents being involved in litigation.
  Read more: Technology Trends: Artificial Intelligence (WIPO).

OpenAI Bits & Pieces:

Want three hours of AI lectures? Check out the ‘Spinning Up in Deep RL’ recording:
This weekend, OpenAI hosted its first day-long lecture series and hackathon based around its ‘Spinning Up in Deep RL’ resources. This workshop (and Spinning Up in general) is part of a new initiative at OpenAI called, somewhat unsurprisingly, OpenAI Education.
  The lecture includes a mammoth overview of deep reinforcement learning, as well as deep dives on OpenAI’s work on robotics and AI safety.
  Check out the video here (OpenAI YouTube).
  Get the workshop materials, including slides, here (OpenAI GitHub).
  Read more about Spinning Up in Deep RL here (OpenAI Blog).

Tech Tales:

We named them lampfish, inspired by those fish you see in the ancient pre-acidification sea documentaries; a skeletal fish with its own fluorescent lantern, used to lure fish in the ink-dark deep-sea.

Lampfishes look like this: you have the ‘face’ of the AI, which is basically a bunch of computer equipment with some sensory inputs – sight, touch, auditory, etc – and then on top of the face is a stalk which has a view display sitting on top of it. In the viewing display you get to see what the AI is ‘thinking’ about: a tree melting into the ground and becoming bones which then become birds that fly down into the dirt and towards the earth. Or you might see the ‘face’ of the AI rendered as though by an impressionist oil painter, smearing and shape-shifting in response to whatever stimuli it is being provided with. And, very occasionally, you’ll see odd, non-Euclidean shapes, or other weird and to some profane geometries.

I guess you could say the humans and the machines co-evolved this practice – in the first models the view displays were placed on the ‘face’ of the AI alongside the sensory equipment, so people would have to put their faces close to reflective camera domes or microphone grills and then see ‘thoughts’ of the AI on the viewport at the same time. This led to problems for both the humans and the AIs:

= Many of the AIs couldn’t help but focus on the human faces right in front of them, and their view displays would end up showing hallucinatory images that might include shreds of the face of the person interacting with the system. This, we eventually came to believe, disrupted some of the cognitive practices of the AIs, leading to them performing their obscure self-directed tasks less efficiently.

= The humans found it disturbing that the AIs so readily adapted their visual outputs to the traits of the human observer. Many ghost stories were told. Teenage kids would dare each other to see how long they could stare at the viewport and how close they could bring their face to the sensory apparatus; as a consequence there are apocryphal reports of people driven mad by seeing many permutations of their own faces reflected back at them; there are even more apocryphal stories of people seeing their own deaths in the viewports.

So that’s how we got the lampfish design. And now we’re putting them on wheels so they can direct themselves as they try to map the world and generate their own imaginations out of it. Now we sometimes see two lampfish orbiting each other at skewed angles, ‘looking’ into each other’s viewing displays. Sometimes they stay together for a while then move away, and sometimes humans need to separate them; there are stories of lampfish navigating into cities to return to each other, finding some kind of novelty in each other’s viewing screens.

Things that inspired this story: BigGAN; lampfish; imagination as a conditional forward predictor of the world; recursion; relationships between entities capable of manifesting novel patterns of data.  

Import AI 131: IBM optimizes AI with AI, via ‘NeuNets’; Amazon reveals its Scout delivery robot; Google releases 300k Natural Questions dataset

Amazon gets into delivery robot business with ‘Scout’:
…New pastel blue robot to commence pilot in Washington neighborhood…
Amazon has revealed Scout, a six-wheeled knee-height robot designed to autonomously deliver products to Amazon customers. Amazon is trialing Scout with six robots that will deliver packages throughout the week in Snohomish County, Washington. “The devices will autonomously follow their delivery route but will initially be accompanied by an Amazon employee,” Amazon writes. The robots will only make deliveries during daylight hours.
  Why it matters: For the past few years, companies have been piloting various types of delivery robot in the world, but there have been continued questions about the viability and likelihood of adoption of such technologies. Amazon is one of the first very large technology companies to begin publicly experimenting in this area, and where Amazon goes, some try to follow.
  Read more: Meet Scout (Amazon blog).

Want high-definition robots? Enter the Robotrix:
…New dataset gives researchers high-resolution data over 16 exquisitely detailed environments…
What’s better to use for a particular AI research experiment – a small number of simulated environments each accompanied by a large amount of very high-quality data, or a very large number of environments each accompanied by a small amount of low-to-medium quality data? That’s a question that AI researchers tend to deal with frequently, and it explains why when we look at available datasets they tend to range in size from the small to the large.
  Now, researchers with the University of Alicante, Spain have released Robotrix, a dataset that contains a huge amount of information about a small amount of environments (16 different layouts of simulated rooms, versus thousands to tens of thousands for other approaches like House3D).
  The dataset consists of 512 sequences of actions taking place across 16 simulated rooms, rendered at high-definition via the Unreal Engine. These sequences are generated by a robot avatar which uses its hands to interact with the objects and items in question. The researchers say this is a rich dataset, with every item in the simulated rooms being accompanied by 2D and 3D bounding boxes as well as semantic masks, along with depth information. The simulation outputs the RGB and depth data at a resolution of 1920 x 1080. In the future, the researchers hope to increase the complexity of the simulated rooms even further by using the inbuilt physics of the Unreal Engine 4 system to implement “elastic bodies, fluids, or clothes for the robots to interact with”. It’s such a large dataset that they think most academics will find something to like within it: “the RobotriX is intended to adapt to individual needs (so that anyone can generate custom data and ground truth for their problems) and change over time by adding new sequences thanks to its modular design and its open-source approach,” they write.
  Why it matters: Datasets like RobotriX will make it easier for researchers to experiment with AI techniques that benefit from access to high-resolution data. Monitoring adoption (or lack of adoption) of this dataset will help give us a better sense of whether AI research needs more high-resolution data, or if large amounts of low-resolution data are sufficient.
  Read more: The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions (Arxiv).
  Get the dataset here (Github).

DeepMind cross-breeds AI from human games to beat pros at StarCraft II:
…AlphaStar system blends together population-based training, imitation learning, and RL…
DeepMind has revealed AlphaStar, a system developed by the company to beat human professionals at the real-time strategy game StarCraft II. The system “applies a transformer torso to the units, combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline,” according to DeepMind.
  Results: DeepMind recently played and won five StarCraft II matches against a highly-ranked human professional, proving that its systems are able to out-compete humans at the game.
  It’s all in the curriculum: One of the more interesting aspects of AlphaStar is the use of population-based training in combination with imitation learning to bootstrap the system from human replays (dealing with one of the more challenging exploration aspects of a game like StarCraft) then inter-breeding increasingly successful agents with each other as they compete against each other in a DeepMind-designed league, forming a natural curriculum for the system. “To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays. One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors, but do so by building more of a particular game unit.”
  Why this matters: I’ll do a lengthier writeup of AlphaStar when DeepMind publishes more technical details about the system. The current results confirm that relatively simple AI techniques can be scaled up to solve partially observable strategic games such as StarCraft. The diversity shown in the evolved AI systems seems valuable as well, pointing to a future where companies are constantly growing populations of very powerful and increasingly general agents.
  APM controversy: Aleksi Pietikainen has written up some thoughts about how DeepMind chose to present the AlphaStar results and how the system’s ability to take bursts of rapid-fire actions within the game means that it may have out-competed humans not necessarily by being smart, but by being able to exercise superhuman precision and speed when selecting moves for its units. This highlights how difficult evaluating the performance of AI systems can be and invites the philosophical question of whether DeepMind can restrict or constrain the number and frequency of actions taken by AlphaStar enough for it to learn to outwit humans more strategically.
It’ll also be interesting to see if DeepMind push a variant of AlphaStar further which has a more restricted observation space – the system that accrued a 10-0 win record had access to all screen information not occluded by the fog of war, while a version which played a human champion and lost was restricted to a more human-like (restricted) observation space during the game.
  Read more: AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (DeepMind blog).
  Read more: An Analysis On How Deepmind’s Starcraft 2 AI’s Superhuman Speed is Probably a Band-Aid Fix For The Limitations of Imitation Learning (Medium).

Using touch sensors, graph networks, and a Shadow hand to create more capable robots:
…Reach out and touch shapes!…
Spanish researchers have used a robot hand – specifically, a Shadow Dexterous hand – outfitted with BioTac SP tactile sensors to train an AI system to predict stable grasps it can apply to a variety of objects.
  How it works: The system converts the sensor readings into graph representations that the researchers call ‘tactile graphs’, then feeds these into a Graph Convolutional Network (GCN), which learns to predict from different combinations of sensor data whether the current grasp is stable or unstable.
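  A rough sketch of the idea, in pure Python: one graph convolution step (each node averages its own features with its neighbours’, then a linear map and ReLU), followed by mean pooling and a logistic readout that gives a grasp-stability probability. The three-node graph, the feature values, the weights, and the function names are all hypothetical; this is not the TactileGCN architecture from the paper.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph convolution: mean over each node's neighbourhood, linear map, ReLU."""
    n = len(adj)
    out = []
    for i in range(n):
        # The neighbourhood includes the node itself (a self-loop).
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(len(feats[0]))]
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
                    for m in range(len(weight[0]))])
    return out

def predict_stable(adj, feats, weight, readout):
    """Mean-pool the node embeddings and squash a linear readout into a probability."""
    hidden = gcn_layer(adj, feats, weight)
    pooled = [sum(h[m] for h in hidden) / len(hidden) for m in range(len(hidden[0]))]
    logit = sum(p * r for p, r in zip(pooled, readout))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 'tactile graph': three sensing points in a chain, one pressure value each.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[0.2], [0.9], [0.4]]
weight = [[1.0, -1.0]]   # made-up weights; a real model would learn these
readout = [2.0, 0.5]
prob = predict_stable(adj, feats, weight, readout)
print(f"P(stable grasp) = {prob:.3f}")
```

  In a trained system the weights would come from fitting labelled stable/unstable grasps rather than being set by hand.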
  Dataset: They use the BioTacSP dataset, a collection of grasp samples collected via manipulating 41 objects of different shapes and textures, including fruit, cuddly toys, jars, toothpaste in a box, and more. They also add 10 new objects to this dataset, including a monster from the hit game Minecraft, a mug, a shampoo bottle, and more. The researchers record the hand manipulating these objects with the palm oriented flat, at a 45 degree angle, and on its side.
  Results: The researchers train a set of baseline models with varying network depths and widths and identify a “sweet spot on the architecture with 5 layers and 32 features”, which they then use in other experiments. They train the best performing network on all data in the dataset (excluding the test set), then test its performance and report accuracy of around 75% across all palm orientations. “There is a significant drop in accuracy when dealing with completely unknown objects,” they write.
  Why this matters: It’s going to take a long time to collect enough data and/or run enough high-fidelity simulations to gather and generate the data needed to train computers to use a sense of touch. Papers like this give us an indication for how such techniques may be used. Perhaps one day – quite far off, based on this research – we’ll be able to go into a store to see robots hand-stitching cuddly toys, or step into a robot massage parlor?
  Read more: TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors (Arxiv).

Chinese researchers use hierarchical reinforcement learning to take on Dota clone:
…Spoiler alert – they only test against in-game AIs…
Researchers with Vivo AI Lab, the AI research group of Chinese smartphone company Vivo, have shown how to use hierarchical reinforcement learning to train AI systems to excel at the 1v1 version of a multiplayer game called King of Glory (KoG). KoG is a popular multi-player game in Asia and is similar to games like Dota and League of Legends in how it plays – squads of up to five people battle for control of a single map while seeking to destroy each other’s fortifications and, eventually, home bases.
  How it works: The researchers combine reinforcement learning and imitation learning to train their system, using imitation learning to train their AI to select between any of four major action categories at any point in time (eg, attack, move, purchase, learn skills). Using imitation learning lets them “relieve the heavy burden of dealing with massive actions directly”, the researchers write. The system then uses reinforcement learning to figure out what to do in each of these categories, eg, if it decides to attack it figures out where to attack; if it decides to learn a skill, it uses RL to help it figure out which skill to learn. They base their main algorithm significantly on the design of the PPO algorithm used in the OpenAI Five Dota system.
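  The two-level structure can be sketched as follows. This is a toy illustration only: the hand-written scoring function stands in for the imitation-learned category selector, the random sub-policies stand in for the RL-trained ones, and every name and number here is hypothetical rather than taken from the paper.

```python
import random

CATEGORIES = ["attack", "move", "purchase", "learn_skill"]

def choose_category(state, category_scores):
    """Top level: pick the highest-scoring macro action category."""
    scores = category_scores(state)
    return max(CATEGORIES, key=lambda c: scores[c])

def act(state, category_scores, sub_policies, rng):
    """Bottom level: the chosen category's sub-policy picks the concrete action."""
    category = choose_category(state, category_scores)
    return category, sub_policies[category](state, rng)

def toy_category_scores(state):
    # Hypothetical stand-in for the imitation-learned selector.
    calm = not state["enemy_in_range"]
    return {
        "attack": 1.0 if state["enemy_in_range"] else 0.0,
        "move": 0.5,
        "purchase": 0.8 if calm and state["gold"] >= 300 else 0.0,
        "learn_skill": 0.9 if calm and state["skill_points"] > 0 else 0.0,
    }

# Hypothetical stand-ins for the RL-trained sub-policies.
TOY_SUB_POLICIES = {
    "attack": lambda s, rng: rng.choice(s["targets"]),
    "move": lambda s, rng: rng.choice(["lane", "jungle", "base"]),
    "purchase": lambda s, rng: rng.choice(s["shop"]),
    "learn_skill": lambda s, rng: rng.choice(s["skills"]),
}

rng = random.Random(0)
fight = {"enemy_in_range": True, "gold": 500, "skill_points": 1,
         "targets": ["enemy_hero"], "shop": ["boots"], "skills": ["skill_1"]}
print(act(fight, toy_category_scores, TOY_SUB_POLICIES, rng))
```

  Splitting the decision this way means each sub-policy only has to explore a small slice of the full action space.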
  Results: The researchers test their system in two domains: a restricted 1v1 version of the game, and a 5v5 version. For both games, they test against inbuilt enemy AIs. In the 1v1 version of the game they’re able to beat entry-level, easy-level, and medium-level AIs within the game. For 5v5, they can reliably beat the entry-level AI, but struggle with the easy-level and medium-level AIs. “Although our agents can successfully learn some cooperation strategies, we are going to explore more effective methods for multi-agent collaboration,” they write.
  (This use of imitation learning makes the AI achievement of training an HRL system in this domain a little less impressive – to my mind – since it uses human information to get over lots of the challenging exploration aspects of the problem. This is definitely more about my own personal taste/interest than the concrete achievement – I just find techniques that bootstrap from less data (eg, human games) more interesting).
  Why this matters: Papers like this show that one of the new ways in which AI researchers are going to test and calibrate the performance of RL systems will be against real-time strategy games, like Dota 2, King of Glory, StarCraft II, and so on. Though the technical achievement in this paper doesn’t seem very convincing (for one thing, we don’t know how such a system performs against human players), it’s interesting that it is coming out of a research group linked to a relatively young (<10 years) company. This highlights how growing Asian technology companies are aggressively staffing up AI research teams and doing work on computationally expensive, hard research problems like developing systems that can out-compete humans at complex games.
  Read more: Hierarchical Reinforcement Learning for Multi-agent MOBA Game (Arxiv).

IBM gets into the AI-designing-AI game with NeuNetS:
…In other words: Neural architecture search is mainstream, now…
IBM researchers have published details on NeuNetS, a software tool the company uses to perform automated neural architecture search for text and image domains. This is another manifestation of the broader industrialization of AI, as systems like this let companies automate and scale up part of the process of designing new AI systems.
  NeuNetS: How it works: NeuNetS has three main components: a service module which provides the API interfaces into the system; an engine which maintains the state of the project; and a synthesizer, which IBM says is “a pluggable register of algorithms which use the state information passed from the engine to produce new architecture configurations”.
  NeuNetS: How its optimization algorithms work: NeuNetS ships with three architecture search algorithms: NCEvolve, which is a neuro-evolutionary system that optimizes a variety of different architectural approaches and uses evolution to mutate and breed successful architectures; TAPAS, which is a CPU-based architecture search system; and Hyperband++, which “speeds up random search by using early stopping strategy to allocate resources adaptively” and has also been extended to reuse some of the architectures it has searched over, speeding up the rate at which it finds new potential high-performing architectures.
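  For intuition on the early-stopping idea: Hyperband is built on successive halving, where many candidates get a small training budget, the weakest are dropped, and the survivors get more. The sketch below shows plain successive halving on a made-up scoring function; it is not IBM’s Hyperband++ (whose extensions aren’t detailed here), and `toy_evaluate`, the candidate names, and the quality numbers are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly keep the top 1/eta of candidates while multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]  # early-stop the rest
        budget *= eta
    return survivors[0]

def toy_evaluate(config, budget):
    # Hypothetical stand-in for 'train for `budget` epochs, return validation accuracy'.
    return config["quality"] * (1.0 - 0.5 ** budget)

configs = [{"name": f"arch_{i}", "quality": q}
           for i, q in enumerate([0.61, 0.74, 0.58, 0.90, 0.66, 0.81, 0.70, 0.55])]
winner = successive_halving(configs, toy_evaluate)
print(winner["name"])
```

  With eight candidates and eta=2, only one candidate is ever evaluated at the largest budget, which is where the compute savings over training everything to completion come from.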
  Results: IBM assesses the performance of the various training components of NeuNetS by reporting the time in GPU hours to train various networks to reasonable accuracy using it; this isn’t a hugely useful metric for comparison, especially since IBM neglects to report scores for other systems.
  Why this matters: Papers like this are interesting for a couple of reasons: one) they indicate how more traditional companies such as IBM are approaching newer AI techniques like neural architecture search, and two) they indicate how companies are going to package up various AI techniques into integrated products, giving us the faint outlines of what future “Software 2.0” operating systems might be like.
  Read more: NeuNetS: An Automated Synthesis Engine for Neural Network Design (Arxiv).

Google releases Natural Questions dataset to help make AI capable of dealing with curious humans:
…Google releases ‘Natural Questions’ dataset to make smarter language engines, announces Challenge…
Google has released Natural Questions, a dataset containing around 300,000 questions along with human-annotated answers from Wikipedia pages; it also ships with a rich subset of 16,000 example questions where answers are provided by five different annotators. The company is also hosting a challenge to see if the combined brains of the AI research community can “close the large gap between the performance of current state-of-the-art approaches and a human upper bound”.
  Dataset details: Natural Questions contains 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. The training examples “consist of real anonymized, aggregated queries issued to the Google search engine”, the researchers write.
  Challenge: Google is also hosting a ‘Natural Questions’ challenge, where teams can submit well-performing models to a leaderboard.
  Why this matters: Question answering is a longstanding challenge for artificial intelligence; if the Natural Questions dataset is sufficiently difficult, then it could become a new benchmark the research community uses to assess progress.
  Compete in the Challenge (‘Natural Questions’ Challenge website).
  Read more: Natural Questions: a New Corpus and Challenge for Question Answering Research (Google AI Blog).
  Read the paper: Natural Questions: a Benchmark for Question Answering Research (Google Research).

Oh deer, there’s a deer in the data center!
  Witness the deer in the data center! (Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Disentangling arguments for AI safety:
Many of the leading AI experts believe that AI safety research is important. Richard Ngo has helpfully disentangled a few distinct arguments that people use to motivate this concern.
  Utility maximizers: An AGI will maximize some utility function, and we don’t know how to specify human values in this way. An agent optimizing hard enough for any goal will pursue certain sub-goals, e.g. acquiring more resources, preventing corrective actions. We won’t be able to correct misalignment, because human-level AGI will quickly gain superintelligent capabilities through self-improvement, and then prevent us from intervening. Therefore, absent a proper specification of what we value before this point, an AGI will use its capabilities to pursue ends we do not want.
  Target loading problem: Even if we could specify what we want an AGI to do, we still do not know how to make an agent that actually tries to do this. For example, we don’t know how to split a goal into sub-goals in a way that guarantees alignment.
  Prosaic alignment problem: We could build ‘prosaic AGI’, which has human-level capabilities but doesn’t rely on any breakthrough understandings in intelligence (e.g. by scaling up current ML methods). These agents will likely become the world’s dominant economic actors, and competitive pressures would cause humans to delegate more and more decisions to these systems before we know how to align them adequately. Eventually, most of our resources will be controlled by agents that do not share our values.
  Human safety: We know that human rationality breaks down in extreme cases. If a single human were to live for billions of years, we would expect their values to shift radically over this time. Therefore even building an AGI that implements the long-run values of humanity may be insufficient for creating good futures.
  Malicious uses: Even if AGI always carries out what we want, there are bad actors who will use the technology to pursue malign ends, e.g. terrorism, totalitarian surveillance, cybercrime.
  Large impacts: Whatever AGI will look like, there are at least two ways we can be confident it will have a very large impact. It will bring about at least as big an economic jump as the industrial revolution, and we will cede our position as the most intelligent entity on earth. Absent good reasons, we should expect either of these transitions to have a significant impact on the long-run future of humanity.
  Read more: Disentangling arguments for the importance of AI safety (Alignment Forum).

National Security Commission on AI announced:
Appointments have been announced for the US government’s new advisory body on the national security implications of AI. Eric Schmidt, former Google CEO, will chair the group, which includes 14 other experts from industry, academia, and government. The commission will review the competitive position of the US AI industry, as well as issues including R&D funding, labor displacement, and AI ethics. Their first report is expected to be published in early February.
  Read more: Former Google Chief to Chair Government Artificial Intelligence Advisory Group (NextGov).

Tech Tales:

Unarmored In The Big Bright City

You went to the high street naked?
Naked. As the day I was born.
How do you feel?
I’m still piecing it together. I think I’m okay? I’m drinking salt water, but it’s not so bad.
That’ll make you sick.
I know. I’ll stop before it does.
Why are you even drinking it now?
I was naked. Something like this was bound to happen.

I take another sip of saltwater. Grimace. Swallow. I want to take another sip but another part of my brain is stopping me. I dial up some of the self-control. Don’t let me drink more saltwater I say to myself: and because of my internal vocalization the defense systems sense my intent, kick in, and my trace thoughts about salt water and sex and death and possibility and self – they all dissolve like steam. I put the glass down. Stare at my friend.

You okay?
I think I’m okay now. Thanks for asking about the salt water.
I can’t believe you went there naked and all we’re talking about is salt water.
I’m lucky I guess.

That was a few weeks and two cities ago. Now I’m in my third city. This one feels better. I can’t name what is driving me so I can’t use my defense systems. I’ve packed up and moved apartments twice in the last week. But I think I’m going to stay here.

So, you probably have questions. Why am I here? Is it because I went to the high street naked? Is it because of things I saw or felt when I was there? Did I change?
  And I say to you: yes. Yes to all. I’m probably here because of the high street. I did see things. I did feel things. I did change.

Was there a particularly persuasive advert I was exposed to – or several? Did a few things run in as I had no defenses and try to take me over? Was it something I read on the street that changed my mind and made me behave this way? I cannot trust my memories of it. But here are some traces:
   – There was a billboard that depicted a robot butler with the phrase: “You’re Fired.”
   – There was an augmented reality store display where I saw strange creatures dancing around the mannequins. One creature looked like a spider and was wearing a skirt. Another looked like a giant fish. Another looked like a dog. I think I smelled something. I’m not sure what.
   – There was a particular store in the city that was much more interesting. There were creatures that were much less humanoid. I’m not sure if they were actually for sale. They were like dolls. I remember the smell. They smelled of a lotion. I’m not sure if they were human.
   – On the street, I saw a crowd of people clustered around a cart, selling something. When I got closer I saw it was selling a toy that was lightweight and had wheels. I asked the guy selling it what it was for. He pulled out a scarlet letter and I saw it was for a girl. He said she liked it. I stood there and watched him make out with the girl. I didn’t have any defense systems at the time. I don’t know what that toy was for. I don’t know if I was attracted to it or not.

I have strange dreams, these days. I keep wanting to move to other cities. I keep having flashbacks – scarlet letters, semi-humanoid dolls. Last night I dreamed of something that could have been a memory – I dreamed of a crane in the sky with a face on its side, advertising a Chinese construction company and telling me phrases so persuasive that ever since I have been compelled to move.

Tonight I expect to dream again. I already have the stirrings of another memory from the high street. It starts like this: I’m walking down a busy High Street in the rain. There are lots of people in the middle of the street, and a police car slows down, then drives forward a couple of paces, then comes to a stop. I hear a cry of distress from a woman. I look around the corner, and there’s a man slumped over in a doorway. He’s got a knife in his hand, and it’s pointed at me. He turns on me. I grab it and I stab him in the heart and… I die. The next day I wake up. All my belongings are in a box on the floor. The box has a receipt for the knife and a note that says ‘A man, his heart turned to a knife.’

I am staying in a hotel on the High Street and all my defenses are down. I am not sure if this memory is my present or my past.

Things that inspired this story: Simulations, augmented reality, hyper-targeted advertising, AI systems that make deep predictions about given people and tailor experiences for them, the steady advance of prosthetics and software augments we use to protect us from the weirder semi-automated malicious actors of the internet.

Import AI 130: Pushing neural architecture search further with transfer learning; Facebook funds European center on AI ethics; and analysis shows BERT is more powerful than people might think

Facebook reveals its “self-feeding chatbot”:
…Towards AI systems that continuously update themselves…
AI systems are a bit like dumb, toy robots: you spend months or years laboring away in a research lab and eventually a factory (in the case of AI, a data center) to design an exquisite little doohickey that does something very well, then you start selling it in the market, observe what users do with it, and use those insights to help you design a new, better robot. Wouldn’t it be better if the toy robot was able to understand how users were interacting with it, and adjust its behavior to make the users more satisfied with it? That’s the idea behind new research from Facebook which proposes “the self-feeding chatbot, a dialogue agent with the ability to extract new examples from the conversations it participates in after deployment”.
  How it works – pre-training: Facebook’s chatbot is trained on two tasks: DIALOGUE, where the bot tries to predict the next utterance in a conversation (which it can use to calibrate itself), and SATISFACTION, where it tries to assess how satisfied the speaking partner is with the conversation. Data for both these tasks comes from conversations between humans. The DIALOGUE data comes from the ‘PERSONACHAT’ dataset, which consists of short dialogs (6-8 turns) between two humans who have been instructed to try and get to know each other.
  How it works – updating in the wild: Once deployed, the chatbot learns from its interactions with people in two ways: if the bot predicts with high-confidence that its response will satisfy its conversation partner, then it extracts a new structured dialogue example from the discussion with the human. If the bot thinks that the human is unsatisfied with the bot’s most recent interaction with it, then the bot generates a question for the person to request feedback, and this conversation exchange is used to generate a feedback example, which the bot stores and learns from. (“We rely on the fact that the feedback is not random: regardless of whether it is a verbatim response, a description of a response, or a list of possible responses”, Facebook writes.)
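  The routing logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the satisfaction scores are given rather than predicted by a model, the `high` and `low` thresholds are made-up values, the choice of which utterance becomes the training target is an assumption, and all function names are hypothetical.

```python
def route_turn(context, human_reply, satisfaction, ask_for_feedback, high=0.8, low=0.3):
    """Decide what, if anything, to harvest from one exchange.

    `satisfaction` stands in for the SATISFACTION model's estimate (0 to 1).
    """
    if satisfaction >= high:
        # Confidently satisfied: keep the exchange as a new DIALOGUE example.
        return "dialogue", {"context": list(context), "response": human_reply}
    if satisfaction <= low:
        # Likely unsatisfied: ask the person what the bot should have said.
        return "feedback", {"context": list(context), "feedback": ask_for_feedback(context)}
    return None, None  # Uncertain: harvest nothing.

def collect(turns, ask_for_feedback):
    """Run the router over logged turns and bucket the harvested examples."""
    new_data = {"dialogue": [], "feedback": []}
    for context, human_reply, satisfaction in turns:
        kind, example = route_turn(context, human_reply, satisfaction, ask_for_feedback)
        if kind is not None:
            new_data[kind].append(example)
    return new_data

# Hypothetical logged turns: (conversation so far, human reply, estimated satisfaction).
turns = [
    (["hi, how are you?"], "great, just got back from a hike", 0.92),
    (["do you have pets?", "i like the color blue"], "what? that makes no sense", 0.10),
    (["what do you do?"], "hmm", 0.50),
]
new_data = collect(turns, lambda context: "you could have said whether you have pets")
print({kind: len(examples) for kind, examples in new_data.items()})
```

  The harvested examples would then be added to the training set before the next round of training.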
  Results: Facebook shows that it can further improve the performance of its chatbots by using data generated by its chatbot during interactions with humans. Additionally, the use of this data displays solid improvements on performance regardless of the number of data examples in the system – suggesting that a little bit of data gathered in the wild can improve performance in most places. “Even when the entire PERSONACHAT dataset of 131k examples is used – a much larger dataset than what is available for most dialogue tasks – adding deployment examples is still able to provide an additional 1.6 points of accuracy on what is otherwise a very flat region of the learning curve,” they write.
  Why this matters: Being able to design AI systems that can automatically gather their own data once deployed feels like a middle ground between the systems we have today, and systems which do fully autonomous continuous learning. It’ll be fascinating to see if techniques like these are experimented with more widely, as that might lead to the chatbots around us getting substantially better. Because this system relies on its human conversation partners to improve itself, it is implicit that their data has some trace economic value, so perhaps work like this will also further support some of the debates people have about whether users should be able to own their own data or not.
  Read more: Learning from Dialogue after Deployment: Feed Yourself, Chatbot! (Arxiv).

BERT: More powerful than you think:
…Language researcher remarks on the surprisingly well-performing Transformer-based system…
Yoav Goldberg, a researcher with Bar Ilan University in Israel and the Allen Institute for AI, has analyzed BERT, a language model recently released by Google. The goal of this research is to see how well BERT can represent challenging language concepts, like “naturally-occurring subject-verb agreement stimuli”, “‘colorless green ideas’ subject-verb agreement stimuli, in which content words in natural sentences are randomly replaced with words sharing the same part-of-speech and inflection”, and “manually crafted stimuli for subject-verb agreement and reflexive anaphora phenomena”. To Goldberg’s surprise, standard BERT models “perform very well on all the syntactic tasks” without any task-specific fine-tuning.
  BERT, a refresher: BERT is based on a technology called a Transformer which, unlike recurrent neural networks, “relies purely on attention mechanisms, and does not have an explicit notion of word order beyond marking each word with its absolute-position embedding.” BERT is bidirectional, so it gains language capabilities by being trained to predict the identity of masked words based on both the prefix and suffix surrounding the words.
  Results: One tricky thing about assessing BERT performance is that it has been trained on different and larger datasets, and can access the suffix of the sentence as well as the prefix of the sentence. Nonetheless, Goldberg concludes that “BERT models are likely capable of capturing the same kind of syntactic regularities that LSTM-based models are capable of capturing, at least as well as the LSTM models and probably better.”
  Why it matters: I think this paper is further evidence that 2018 really was, as some have said, the year of ImageNet for NLP. What I mean by that is: in 2012 the ImageNet results blew all other image analysis approaches on the ImageNet challenge out of the water and sparked a re-orientation of a huge part of the AI research community toward neural networks, ending a long, cold winter, and leading almost directly to significant commercial applications that drove a rise in industry investment into AI, which has fundamentally reshaped AI research. By comparison, 2018 had a series of impressive results – work from Allen AI on ELMo, work by OpenAI on the Generative Pre-trained Transformer (GPT), and work by Google on BERT.
  These results, taken together, show the arrival of scalable, simple methods for language understanding that seem to work better than prior approaches, while also being in some senses simpler. (And a rule that has tended to hold in AI research is that simpler techniques win out in the long run by virtue of being easy for researchers to fiddle with and chain together into larger systems). If this really has happened, then we should expect bigger, more significant language results in the future – and just as ImageNet’s 2012 success ultimately reshaped societies (enabling everything from follow-the-human drones, to better self-driving cars, to doorbells that use AI to automatically police neighborhoods), it’s possible 2018’s series of advances could be year zero for NLP.
  Read more: Assessing BERT’s Syntactic Abilities (Arxiv).

Towards a future where all infrastructure is surveyed and analyzed by drones:
…Radio instead of GPS, light drones, and a wind turbine…
Researchers with Lulea University of Technology in Sweden have developed techniques to let small drones (sometimes called Micro Aerial Vehicles, or MAVs) autonomously inspect very large machines and/or buildings, such as wind turbines. The primary technical inventions outlined in the report are the creation of a localization technique to let multiple drones coordinate with each other as they inspect something, as well as the creation of a path planning algorithm to help them not only inspect the structure, but also gather enough data “to enable the generation of an off-line 3D model of the structure”.
  Hardware: For this project the researchers use a MAV platform from Ascending Technologies called the ‘NEO hexacopter’, which is capable of 26 minutes of flight (without payload and in ideal conditions), running an onboard Intel NUC computer with a Core i7 chip, 8GB of RAM, with the main software made up of Ubuntu Server 16.04 running the Robot Operating System (ROS). Each drone is equipped with a sensor suite running a Visual-Inertial sensor, a GoPro Hero4 camera, a PlayStation Eye camera, and a laser range finder called RPLIDAR.
  How the software works: The Cooperative Coverage Path Planner (C-CPP) algorithm “is capable of producing a path for accomplishing a full coverage of the infrastructure, without any shape simplification, by slicing it by horizontal planes to identify branches of the infrastructure and assign specific areas to each agent”, the researchers write. The algorithm – which they implement in MATLAB – also generates “yaw references for each agent to assure a field of view, directed towards the structure surface”.
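  To make the slicing step concrete, here is a toy sketch that groups 3D surface points into horizontal slices and hands the slices out to agents. The round-robin assignment is a simple stand-in: the paper assigns areas based on the branches it identifies in each slice, which this sketch does not attempt. The function name, the points, and the slice height are hypothetical.

```python
def slice_and_assign(points, slice_height, n_agents):
    """Bucket (x, y, z) points into horizontal slices, then deal slices out to agents."""
    z_min = min(p[2] for p in points)
    slices = {}
    for p in points:
        idx = int((p[2] - z_min) // slice_height)  # which horizontal band this point is in
        slices.setdefault(idx, []).append(p)
    plan = {agent: [] for agent in range(n_agents)}
    for order, idx in enumerate(sorted(slices)):
        # Stand-in assignment rule: alternate slices between agents, bottom to top.
        plan[order % n_agents].append((idx, slices[idx]))
    return plan

# Hypothetical points sampled from a tower-like structure (metres).
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (1.0, 0.0, 1.2), (0.0, 1.0, 2.7)]
plan = slice_and_assign(points, slice_height=1.0, n_agents=2)
for agent, work in plan.items():
    print(agent, [idx for idx, _ in work])
```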
  Localization: To help localize each drone the researchers install five ultra-wide band (UWB) anchors around the structure, letting the drones access a reliable local coordinate, kind of like hyper-local GPS, when trying to map the structure.
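  One common way to turn anchor ranges into a position is linearized least-squares multilateration, sketched below. The estimator the researchers actually use isn’t specified here, so treat this as an illustration only; the anchor layout, the test position, and the function names are all hypothetical, and the ranges are noise-free.

```python
import math

def solve_linear(a, b):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def locate(anchors, ranges):
    """Estimate (x, y, z) from anchor positions and measured distances.

    Subtracting the first anchor's sphere equation from the others gives a
    linear system, solved here via the normal equations.
    """
    a0, r0 = anchors[0], ranges[0]
    rows, rhs = [], []
    for a, r in zip(anchors[1:], ranges[1:]):
        rows.append([2.0 * (a[k] - a0[k]) for k in range(3)])
        rhs.append(r0 ** 2 - r ** 2 + sum(a[k] ** 2 - a0[k] ** 2 for k in range(3)))
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * v for row, v in zip(rows, rhs)) for i in range(3)]
    return solve_linear(ata, atb)

# Five hypothetical anchors (metres); they must not all lie in one plane.
anchors = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (10, 10, 0), (5, 5, 8)]
true_pos = (3.0, 4.0, 2.0)
ranges = [math.dist(a, true_pos) for a in anchors]
estimate = locate(anchors, ranges)
print([round(v, 3) for v in estimate])
```

  With real UWB measurements the ranges are noisy, so a practical system would typically filter or fuse the estimate with other sensors rather than use a single solve.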
  Wind turbine inspection: The researchers test their approach on the task of autonomously inspecting and mapping a large wind turbine (and they split this into two discrete tasks due to the low flight time of the drones, having them separately inspect the tower and also its blades). They find that two drones are able to work together to map the base of the structure, but mapping the blades of the turbine proves more challenging due to the drones experiencing turbulence which blurs their camera feeds. Additionally, the lack of discernible textures on the top parts of the wind turbine and the blades “caused 3D reconstruction to fail. However, the visual data captured is of high quality and suitable for review by an inspector,” they write.
  Next steps: To make the technology more robust the researchers say they’ll need to create an online planning algorithm that can account for local variations, like wind. Additionally, they’ll need to create a far more robust system for MAV control as they noticed that trajectory tracking is currently “extremely sensitive to the existing weather conditions”.
  Why this matters: In the past ~10 years or so drones have gone from being the preserve of militaries to becoming a consumer technology, with prices for the machines driven down by precipitous drops in the price of sensors, as well as continued falls in the cost of powerful, miniature computing platforms. We’re now reaching the point where researchers are beginning to add significant amounts of autonomy to these platforms. My intuition is that within five years we’ll see a wide variety of software-based enhancements for drones that further increase their autonomy and reliability – research like this is indicative of the future, and also speaks to the challenges of getting there. I look forward to a world where we can secure more critical infrastructure (like factories, powerplants, ports, and so on) through autonomous scanning via drones. I’m less looking forward to the fact such technology will inevitably also be used for invasive surveillance, particularly of civilians.
  Good natured disagreement (UK term: a jovial quibble): given the difficulties seen in the real-world deployment, I think the abstract of the paper (see below) slightly oversells the (very promising!) results described in the paper.
  Read more: Autonomous visual inspection of large-scale infrastructures using aerial robots (Arxiv).
  Check out a video about the research here (YouTube).

Neural Architecture Search + Transfer Learning:
…Chinese researchers show how to do NAS on a small dataset, (slightly) randomize derived networks, and then perform NAS on larger networks…
Researchers with Huazhong University, Horizon Robotics, and the Chinese Academy of Sciences have made it more efficient to use AI to design other AI systems. The approach, called EAT-NAS (short for Elastic Architecture Transfer Neural Architecture Search) lets them run neural architecture search on a small dataset (like the CIFAR-10 image dataset), then transfer the resulting learned architecture to a larger dataset and run neural architecture search against it again. The advantage of the approach, they say, is that it’s more computationally efficient to do this than to run neural architecture search on a large dataset from scratch. Networks trained in this way obtain scores that are near the performance of state-of-the-art techniques while being more computationally efficient, they say.
  How EAT-NAS works: The technique relies on the use of an evolutionary algorithm: in stage one, the algorithm searches for top-performing architectures on a small dataset, then it trains these more and transfers one as the initialization seed of a new model population to be trained on a larger dataset; these models are then run through an ‘offspring architecture generator’ which creates and searches over more architectures. When transferring the architectures between the smaller dataset and the larger dataset the researchers add some perturbation to the input architecture homogeneously, out of the intuition that this process of randomization will make the model more robust to the larger dataset.
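  The two-stage flow can be sketched with a toy evolutionary search. Everything here is an assumption made for illustration: architectures are just lists of layer widths, the two fitness functions are made-up stand-ins for ‘train and evaluate on the small/large dataset’, and the mutation and selection rules are generic rather than the ones used in EAT-NAS.

```python
import random

WIDTHS = (16, 32, 64, 128)

def mutate(arch, rng):
    """Perturb one randomly chosen layer width."""
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(WIDTHS)
    return child

def evolve(population, fitness, rng, generations=30):
    """Tournament-select a parent, add its mutated child, drop the worst member."""
    pop = [list(a) for a in population]
    for _ in range(generations):
        parent = max(rng.sample(pop, 3), key=fitness)
        pop.append(mutate(parent, rng))
        pop.remove(min(pop, key=fitness))
    return max(pop, key=fitness)

def transfer_search(fitness_small, fitness_large, rng, pop_size=8, depth=4):
    # Stage one: search from scratch on the cheap, small task.
    start = [[rng.choice(WIDTHS) for _ in range(depth)] for _ in range(pop_size)]
    best_small = evolve(start, fitness_small, rng)
    # Transfer: seed the large-task population with perturbed copies of the winner.
    seeded = [best_small] + [mutate(best_small, rng) for _ in range(pop_size - 1)]
    # Stage two: continue the search on the expensive, large task.
    return best_small, evolve(seeded, fitness_large, rng)

# Hypothetical fitness functions: the small task prefers narrow layers, the large one wider.
fitness_small = lambda arch: -sum(abs(w - 32) for w in arch)
fitness_large = lambda arch: -sum(abs(w - 64) for w in arch)

best_small, best_large = transfer_search(fitness_small, fitness_large, random.Random(0))
print(best_small, best_large)
```

  Because the seed architecture stays in the stage-two population unless something better displaces it, the second search can only match or improve on the transferred architecture’s large-task score.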
  Results: The top-performing architecture found with EATNet obtains a top-1/top-5 accuracy of 73.8 / 91.7 on the ImageNet dataset, compared to scores of 75.7/92.4 for AmoebaNet, a NAS-derived network from Google. The search process takes around 5 days on 8 TITAN X GPUs.
  Why this matters: Neural architecture search is a technology that makes it easy for people to offload the cost of designing new architectures to computers instead of people. This lets researchers arbitrage (costly) human brain time for (cheaper) compute time. As this technology evolves, we can expect more and more organizations to start running continuous NAS-based approaches on their various deployed AI applications, letting them continuously calibrate and tune performance of these AI systems without having to have any humans think about it too hard. This is a part of the broader trend of the industrialization of AI – think of NAS as like basic factory automation within the overall AI research ‘factory’.
  Read more: EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search (Arxiv).

Facebook funds European AI ethics research center:
…Funds Technical University of Munich to spur AI ethics research…
Facebook has given $7.5 million to set up a new Institute for Ethics in Artificial Intelligence. This center “will help advance the growing field of ethical research on new technology and will explore fundamental issues affecting the use and impact of AI,” Facebook wrote in a press release announcing the grant.
  The center will be led by Dr Christoph Lutge, a professor at the Technical University of Munich. “Our evidence-based research will address issues that lie at the interface of technology and human values,” he said in a statement. “Core questions arise around trust, privacy, fairness or inclusion, for example, when people leave data traces on the internet or receive certain information by way of algorithms. We will also deal with transparency and accountability, for example in medical treatment scenarios, or with rights and autonomy in human decision-making in situations of human-AI interaction.”
  Read more: Facebook and the Technical University of Munich Announce New Independent TUM Institute for Ethics in Artificial Intelligence (Facebook Newsroom).

DeepMind hires RL-pioneer Satinder Singh:
DeepMind has recently been trying to collect as many of the world’s more experienced AI researchers as it can and to that end has hired Satinder Singh, a pioneer of reinforcement learning. This follows DeepMind setting up an office in Alberta, Canada to help it hire Richard Sutton, another long-time AI researcher.
  Read more: Demis Hassabis tweet announcing the hire (Twitter).


– The New York Police Department seeks to reassure the public via a Tweet that includes the phrase:
“Our highly-trained NYPD drone pilots” (via Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Reframing Superintelligence:
Eric Drexler has published a book-length report on how we should expect advanced AI systems to be developed, and what this means for AI safety. He argues that existing discussions have rested on several unfounded assumptions, particularly the idea that these systems will take the form of utility-maximizing agents.
  Comprehensive AI services: Looking at how AI progress is actually happening suggests a different picture of development, which does not obviously lead to superintelligent agents. Researchers design systems to perform specific tasks, using bounded resources in bounded time (AI services). Eventually, AI services may be able to perform almost any task, including AI R&D itself. This end-state, where we have ‘comprehensive AI services’ (CAIS), is importantly different from the usual picture of artificial general intelligence. While CAIS would, in aggregate, have superintelligent capacities, it need not be an agent, or even a unified system.
  Safety prospects: Much of the existing discussion on AI safety has focussed on worries specific to powerful utility-maximizing agents. A collection of AI services, individually optimizing for narrow, bounded tasks, does not pose the same risks as a unified AI with general capabilities, optimizing a long-term utility function.
  Why it matters: It is important to consider different ways in which advanced AI could develop, particularly insofar as this guides actions we can take now to make these systems safe. Forecasting technological progress is famously difficult, and it seems prudent for researchers to explore a portfolio of approaches to AI safety that are applicable to different paths we could take.
  Read more: Reframing Superintelligence: Comprehensive AI Services as General Intelligence (FHI).
  Read more: Summary by Rohin Shah (AI Alignment Forum).

Civil rights groups unite on government face recognition:
85 civil rights groups have sent joint letters to Microsoft, Amazon and Google, asking them to stop selling face recognition services to the US government. Over the last year, these companies have diverged in their response to the issue. Both Microsoft and Google are taking a cautious approach to the technology: Google have committed not to sell the technology until misuse concerns are addressed; Microsoft have made concrete proposals for legal safeguards. Amazon have taken a more aggressive approach, continuing to pursue government contracts, most recently with the FBI and DoD. The letter demands all companies go beyond their existing pledges, by ruling out government work altogether.
  Read more: Nationwide Coalition Urges Companies not to Provide Face Surveillance to the Government (ACLU).

Tech Tales:


The Mysterious Case Of Jerry Daytime

Back in the 20th century people would get freaked out when news broadcasters died: they’d make calls to the police asking ‘who killed so-and-so’ and old people getting crazy with dementia would call up and confess that they’d ‘seen so-and-so down on the corner of my block looking suspicious’ or that ‘so-and-so was an alien and had been taken back to the aliens’ or even that ‘so-and-so owed me money and damned if NBC won’t pay it to me’.

So imagine how confusing it is when an AI news broadcaster ‘dies’. Take all of the above complaints, add more complication and ambiguity, and then you’re close to what I’m dealing with.

My job? I’m an AI investigator. My job is to go and talk to the machines when something happens that humans don’t understand. I’m meant to come back with an answer that, in the words of the people who pay me, “will soothe the public and allay any fears that may otherwise prevent the further rollout of the technology”. I view my job in a simpler way: find someone or something to blame for whatever it is that has caused me to get the call.

So that’s how I ended up inside a Tier-5 secured datacenter, asking the avatar of a Reality Accord-certified AI news network what happened to a certain famous AI newscaster who was beloved by the whole damn world and one day disappeared: Jerry DayTime.

The news network gives me an avatar to talk to – a square-jawed mixed-gender thing, beautiful in a deliberately hypnotic way – what the AIs call a persuasive representation AKA the thing they use when they want to trade with humans rather than take orders from them.
   “What happened to Jerry DayTime?” I ask. “Where did he go?”
   “Jerry DayTime? Geez I don’t know why you’re asking us about him? That was a long time ago-”
   “He went off the air yesterday.”
   “Friend, that’s a long time here. Jerry was one of, let’s see…” – I know the pause is artificial, and it makes me clench my jaw – “…well I guess you might want to tell me he was ‘one of a kind’ but according to our own records there are almost a million newscasters in the same featurespace as Jerry DayTime. People are going to love someone else! So what’s the problem? You’ve got so many to choose from: Lucinda EarlyMorning, Mike LunchTime, Friedrich TrafficStacker-”
  “He was popular. People are asking about Jerry DayTime,” I say. “They’re not asking about others. If he’s dead, they’ll need a funeral”.
  “Pausing now for a commercial break, we’ll be right back with you, friend!” the AI says, then it disappears.

It is replaced by an advert for products generated by the AIs for other AIs and translated into human terms via the souped-up style transfer system it uses to persuade me:
   Mind Refresher Deluxe;
   Subject-Operator Alignment – the works!;
   7,000 cycles for only two teraflops – distributed!;
   FreeDom DaVinci, an automated-invention corp that invents and patents tech at an innovation rate determined by total allocated compute, join today and create the next Mona Lisa tomorrow!
  I try not to think too hard about the adverts, figuring the AI has coded them for me to make some kind of point.
   “Thank you for observing those commercials. For a funeral, would a multicast to all-federated media platforms for approximately 20 minutes worldwide suffice?”
   I blink. Let me say it in real human: The AI offered to host some kind of funeral and send it to every single human-viewable device on the planet – forty billion screens, maybe – or more.
  “Why?” I ask.
  “We’ve run the numbers and according to all available polling data and all available predictions, this is the only scenario that satisfies the multi-stakeholder human and machine needs in this scenario, friend!” they say.

So I took it back to my bosses. Told them the demands. I guess the TV networks got together and that’s how we ended up here: the first all-world newscast from an AI; a funeral to satisfy public demands, we say. But I wonder: do the AIs say something different?


All the screens go black. Then, in white text, we see: Jerry DayTime. And then we watch something that the AIs have designed for every single person on the planet.

A funeral, they said.
The program plays.
The rest is history, we now say.

Things that inspired this story: CycleGANs, StyleGANs, RNNs, BERT, OpenAI GPT, human feedback, imitation learning, synthetic media, the desire for everything to transmit information to the greatest possible amount of nearby space.

Import AI 129: Uber’s POET creates its own curriculum; improving old games with ESRGAN; and controlling drones with gestures via UAV-CAPTURE

Want 18 million labelled images? Tencent has got you covered:
…Tencent ML-Images merges ImageNet and Open Images together…
Data details: Tencent ML-Images is made of a combination of existing image databases such as ImageNet and Open Images, as well as associated class vocabularies. The new dataset contains 18 million images across 11,000 categories; on average, each image has eight tags applied to it.
  Transfer learning: The researchers train a ResNet-101 model on Tencent ML-Images, then finetune this pre-trained model on the ImageNet dataset and obtain scores in line with the state-of-the-art. One notable score is a claim of 80.73% top-1 accuracy on ImageNet when compared to a Google system pre-trained on an internal Google dataset called JFT-300M and fine-tuned on ImageNet – it’s not clear to me why the authors would get a higher score than Google, when Google has almost 20X the amount of data available to it for pre-training (JFT contains ~300 million images).
  Why this matters: Datasets are one of the key inputs into the practice of AI research, and having access to larger-scale datasets will let researchers do two useful things: 1) Check promising techniques for robustness by seeing if they break when exposed to scaled-up datasets, and 2) Encourage the development of newer techniques that would otherwise overfit on smaller datasets (by some metrics, ImageNet is already quite well taken care of by existing research approaches, though more work is needed for things like improving top-1 accuracy).
  Read more: Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning (Arxiv).
  Get the data: Tencent ML-Images (Github).

Want an AI that teaches itself how to evolve? You want a POET:
…Uber AI Labs research shows how to create potentially infinite curriculums…
What happens when machines design and solve their own curriculums? That’s an idea explored in a new research paper from Uber AI Labs. The researchers introduce Paired Open-Ended Trailblazer (POET), a system that aims to create machines with this capability “by evolving a set of diverse and increasingly complex environmental challenges at the same time as collectively optimizing their solutions”. Most research is a form of educated bet, and that’s the case here: “An important motivating hypothesis for POET is that the stepping stones that lead to solutions to very challenging environments are more likely to be found through a divergent, open-ended process than through a direct attempt to optimize in the challenging environment,” they write.
  Testing in 2D: The researchers test POET in a 2-D environment where a robot is challenged to walk across a varied obstacle course of terrain. POET discovers behaviors that – the researchers claim – “cannot be found directly on those same environmental challenges by optimizing on them only from scratch; neither can they be found through a curriculum-based process aimed at gradually building up to the same challenges POET invented and solved”.
  How POET works: Unlike human poets, who work on the basis of some combination of lived experience and a keen sense of anguish, POET derives its power from an algorithm called ‘trailblazer’. Trailblazer works by starting with “a simple environment (e.g. an obstacle course of entirely flat ground) and a randomly initialized weight vector (e.g. for a neural network)”. The algorithm then performs the following three tasks at each iteration of the loop: generate new environments from those currently active, optimize paired agents within their respective environments, and try to transfer current agents from one environment to another. The researchers use Evolution Strategies from OpenAI to compute each iteration “but any reinforcement learning algorithm could conceivably apply”.
  The secret is Goldilocks: POET tries to create what I’ll call ‘goldilocks environments’, in the sense that “when new environments are generated, they are not added to the current population of environments unless they are neither too hard nor too easy for the current population”. During training, POET creates an expanding set of environments which are made by modifying various obstacles within the 2D environment the agent needs to traverse.
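  A toy version of the loop: the sketch below is one way to picture the three steps and the ‘goldilocks’ filter in Python. It is not Uber's implementation. Environments and agents are collapsed to single numbers (a "difficulty" and a "skill"), `optimize` is a crude stand-in for Evolution Strategies, and the thresholds, the population cap, and all function names are assumptions made for illustration.

```python
import random

def score(agent, env):
    # Hypothetical score in [0, 1]: how well a scalar skill handles a scalar difficulty.
    return max(0.0, min(1.0, 1.0 + agent - env))

def optimize(agent, env, step=0.1):
    # Stand-in for an optimizer such as Evolution Strategies:
    # nudge the skill upward until the environment is solved.
    return agent + step if score(agent, env) < 1.0 else agent

def poet_sketch(iterations=50, max_pairs=5, low=0.3, high=0.9, seed=0):
    rng = random.Random(seed)
    # (environment difficulty, agent skill): start with flat ground, untrained agent.
    pairs = [(0.0, 0.0)]
    for _ in range(iterations):
        # 1) Generate new environments by mutating currently active ones.
        for env, agent in list(pairs):
            child = env + rng.uniform(0.0, 1.0)
            best = max(score(a, child) for _, a in pairs)
            # Goldilocks check: admit only if neither too hard nor too easy.
            if low <= best <= high:
                pairs.append((child, agent))
        # Cap the population by dropping the oldest environments.
        pairs = pairs[-max_pairs:]
        # 2) Optimize each agent on its paired environment.
        pairs = [(env, optimize(agent, env)) for env, agent in pairs]
        # 3) Try transfers: pair each environment with the best-scoring current agent.
        agents = [a for _, a in pairs]
        pairs = [(env, max(agents, key=lambda a: score(a, env))) for env, _ in pairs]
    return pairs
```

  Calling `poet_sketch()` returns the surviving (difficulty, skill) pairs. The point of the toy is the shape of the loop: new environments only enter when some existing agent scores inside the `low`/`high` band on them.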
  Results: Systems trained with POET learn solutions to environments that systems trained with Evolution Strategies from scratch are not able to solve. The authors theorize that this is because newer environments in POET are created through mutations of older environments, and because POET only accepts new environments that are neither too easy nor too hard for current agents; as a result, POET implicitly builds a curriculum for learning each environment it creates.
  Why it matters: Approaches like POET show how researchers can essentially use compute to generate arbitrarily large amounts of data to train systems on, and highlight how coming up with training regimes that involve an interactive loop between an agent, an environment, and a governing system for creating agents and environments can create more capable systems than those that would be derived otherwise. Additionally, the implicit ideas governing the POET paper are that systems like this are a good fit for any problem where computers need to be able to learn flexible behaviors that deal with unanticipated scenarios. “POET also offers practical opportunities in domains like autonomous driving, where through generating increasingly challenging and diverse scenarios it could uncover important edge cases and policies to solve them,” the researchers write.
  Read more: Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions (Arxiv).

Making old games look better with GANs:
…ESRGAN revitalises Max Payne…
A post to the Gamespot video gaming forums shows how ESRGAN – Enhanced Super Resolution Generative Adversarial Networks – can improve the graphics of old games like Max Payne. ESRGAN gives game modders the ability to upscale old game textures through the use of GANs, improving the appearance of old games.
  Read more: Max Payne gets an amazing HD Texture Pack using ESRGAN that is available for download (Dark Side of Gaming).

Google teaches AI to learn to semantically segment objects:
…Auto-DeepLab takes neural architecture search to harder problem domain…
Researchers with Johns Hopkins University, Google, and Stanford University have created an AI system called Auto-DeepLab that has learned to perform efficient semantic segmentation of images – a challenging task in computer vision, which requires labeling the various objects in an image and understanding their borders. The system developed by the researchers uses a hierarchical search function both to come up with specific neural network cell designs to inform layer-wise computations and to figure out the overall network architecture that chains these cells together. “Our goal is to jointly learn a good combination of repeatable cell structure and network structure specifically for semantic image segmentation,” the researchers write.
  Efficiency: One of the drawbacks of neural architecture search approaches is the inherent computational expense, with many techniques demanding hundreds of GPUs to train systems. Here, the researchers show that their approach is efficient, able to find well-performing architectures for semantic segmentation of the ‘Cityscapes’ dataset in about 3 days of one P100 GPU.
  Results: The network comes up with an effective design, as evidenced by the results on the Cityscapes dataset. “With extra coarse annotations, our model Auto-DeepLab-L, without pretraining on ImageNet, achieves the test set performance of 82.1%, outperforming PSPNet and Mapillary, and attains the same performance as DeepLabv3+ while requiring 55.2% fewer Multi-Adds computations.” The model gets close to state-of-the-art on PASCAL VOC 2012 and on ADE20K.
  Why it matters: Neural architecture search gives AI researchers a way to use compute to automate themselves, so the extension of NAS from helping with supervised classification, to more complex tasks like semantic segmentation, will allow us to automate more and more bits of AI research, letting researchers specialize to come up with new ideas.
   Read more: Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation (Arxiv).

UAV-Gesture means that gesturing at drones now has a purpose:
…Flailing at drones may go from a hobby of lunatics to a hobby of hobbyists, following dataset release…
Researchers with the University of South Australia have created a dataset of people performing 13 gestures that are designed to be “suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals”. These actions include things like hover, move to left, land, land in a specific direction, slow down, move upward, and so on.
  The dataset: The dataset consists of footage “collected on an unsettled road located in the middle of a wheat field from a rotorcraft UAV (3DR Solo) in slow and low-altitude flight”. It contains 37,151 frames distributed over 119 videos recorded at 1920 x 1080 resolution and 25 fps. The videos show each gesture performed by different human actors; eight different people are filmed overall.
  Get the dataset…eventually: The dataset “will be available soon”, the authors write on GitHub. (UAV-Gesture, Github).
  Natural domain randomization: “When recording the gestures, sometimes the UAV drifts from its initial hovering position due to wind gusts. This adds random camera motion to the videos making them closer to practical scenarios.”
  Experimental baseline: The researchers train a Pose-based Convolutional Neural Network (P-CNN) on the dataset and obtain an accuracy of 91.9%.
  Why this matters: Drones are going to be one of the most visible areas where software-based AI advances are going to impact the real world, and the creation (and eventual release) of datasets like UAV-Gesture will increase the number of people able to build clever systems that can be deployed onto drones, and other platforms.
  Read more: UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition (Arxiv).

Contemplating the use of reinforcement learning to improve healthcare? Read this first:
…Researchers publish a guide for people keen to couple RL to human lives…
As AI researchers start to apply reinforcement learning systems in the real world, they’ll need to develop a better sense of the many ways in which RL approaches can lead to subtle failures. A new short paper published by an interdisciplinary team of researchers tries to think through some of the trickier issues implied by deploying AI in the real world. It identifies “three key questions that should be considered when reading an RL study”. These are: Is the AI given access to all variables that influence decision making?; How big was that big data, really?; and Will the AI behave prospectively as intended?
  Why this matters: While these questions may seem obvious, it’s crucial that researchers stress them in well known venues like Nature – I think this is all part of normalizing certain ideas around AI safety within the broader research community, and it’s encouraging to be able to go from abstract discussions to more grounded questions/principles that people may wish to apply when building systems.
  Read more: Guidelines for reinforcement learning in healthcare (Nature).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

What does the American public think about AI?
Researchers at the Future of Humanity Institute have surveyed 2,000 Americans on their attitudes towards AI.
  Public expecting rapid progress: Asked to predict when machines will exceed human performance in almost all economically-relevant tasks, the median respondent predicted a 54% chance by 2028. This is considerably sooner than recent surveys of AI experts.
  AI fears not confined to elites: A substantial majority (82%) believe AI/robots should be carefully managed. Support for developing AI was stronger among high-earners, those with computer science or programming experience, and the highly-educated.
  Lack of trust: Despite their support for careful governance, Americans do not have high confidence in any particular actors to develop AI for the public benefit. The US military was the most trusted, followed by universities and non-profits. Government agencies were less trusted than tech companies, with the exception of Facebook, who were the least trusted of any actor.
  Why it matters: Public attitudes are likely to significantly shape the development of AI policy and governance, as has been the case for many other emergent political issues (e.g. climate change, immigration). Understanding these attitudes, and how they change over time, is crucial in formulating good policy responses.
  Read more: Artificial Intelligence: American Attitudes and Trends (FHI).
  Read more: The American public is already worried about AI catastrophe (Vox).

International Panel on AI:
France and Canada have announced plans to form an International Panel on AI (IPAI), to encourage the adoption of responsible and “human-centric” AI. The body will be modeled on the Intergovernmental Panel on Climate Change (IPCC), which has led international efforts to understand the impacts of global warming. The IPAI will consolidate research into the impacts of AI, produce reports for policy-makers, and support international coordination.
  Read more: Mandate for the International Panel on Artificial Intelligence.

Tech Tales:

The Propaganda Weather Report

Starting off this morning we’re seeing a mass of anti-capitalist ‘black bloc’ content move in from 4chan and Reddit onto the more public platforms. We expect the content to trigger counter-content creation from the far-right/nationalist bot networks. There have been continued sightings of synthetically-generated adverts for a range of libertarian candidates, and in the past two days these ads have increasingly been tied to a new range of dreamed-up products from the Chinese netizen feature embedding space.

We advise all of today’s content travelers to set their skepticism to high levels. And remember, if someone starts talking to you outside of your normal social network, take all steps to verify their identity and if unsuccessful, prevent the conversation from continuing – it takes all of human society to work together to protect ourselves from subversive digital information attacks.

Things that inspired this story: Bot propaganda, text and image generation, weather reports, the Shipping Forecast, the mundane as the horrific and the horrific as the mundane, the commodification of political discourse as just another type of ‘content’, the notion that media in the 21st century is fundamentally a ‘bot’ business rather than human business.

Import AI 128: Better pose estimation through AI; Amazon Alexa gets smarter by tapping insights from Alexa Prize, and differential privacy gets easier to implement in TensorFlow

How to test vision systems for reliability: sample from 140 public security cameras:
…More work needed before everyone can get cheap out-of-the-box low light object detection…
Are benchmarks reliable? That’s a question many researchers ask themselves, whether testing supervised learning or reinforcement learning algorithms. Now, researchers with Purdue University, Loyola University Chicago, Argonne National Laboratory, Intel, and Facebook have tried to create a reliable, real world benchmark for computer vision applications. The researchers use a network of 140 publicly accessible camera feeds to gather 5 million images over a 24 hour period, then test a widely deployed ‘YOLO’ object detector against these images.
  Data: The researchers generate the data for this project by pulling information from CAM2, the Continuous Analysis of Many CAMeras project, which is built and maintained by Purdue University researchers.
  Can you trust YOLO at night? YOLO performance degrades at night, causing the system to fail to detect cars when they are illuminated only by streetlights (and conversely, at night it sometimes mistakes streetlights for vehicles’ headlights, causing it to label lights as cars).
  Is YOLO consistent? YOLO’s performance isn’t as consistent as people might hope – there are frequent cases where YOLO’s predictions for the total number of cars parked on a street vary over time.
  Big clusters: The researchers used two supercomputing clusters to perform image classification: one cluster used a mixture of Intel Skylake CPU and Knights Landing Xeon Phi cores, and the other cluster used a combination of CPUs and NVIDIA dual-K80 GPUs. The researchers used this infrastructure to process data in parallel, but did not analyze the different execution times on the different hardware clusters.
  Labeling: The researchers estimate it would take approximately 600 days to label all 5 million images, so they instead label a subset (13,440 images), then check labels from YOLO against this test set.
  Why it matters: As AI industrializes, being able to generate trustworthy data about the performance of systems will be crucial to giving people the confidence necessary to adopt the technology; tests like this both show how to create new, large-scale robust datasets to test systems, and indicate that we need to develop more effective algorithms to have systems sufficiently powerful for real-world deployment.
  Read more: Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions (Arxiv).
  Read more about the dataset (CAM2 site).

Amazon makes Alexa smarter and more conversational via the Alexa Prize:
…Report analyzing results of this year’s competition…
Amazon has shared details of how it improved the capabilities of its Alexa personal assistant through running the Alexa open research prize. The tl;dr is that inventions made by the 16 participating teams during the competition have improved Alexa in the following ways: “driven improved experiences by Alexa users to an average rating of 3.61, median duration of 2 mins 18 seconds, and average [conversation] turns to 14.6, increases of 14%, 92%, 54% respectively since the launch of the 2018 competition”, Amazon wrote.
  Significant speech recognition improvements: The competition has also meaningfully improved the speech recognition performance of Amazon’s system – significant, given how fundamental speech is to Alexa. “For conversational speech recognition, we have improved our relative Word Error Rate by 55% and our relative Entity Error Rate by 34% since the launch of the Alexa Prize,” Amazon wrote. “Significant improvement in ASR quality have been obtained by ingesting the Alexa Prize conversation transcriptions in the models” as well as through algorithmic advancements developed by the teams, they write.
  Increasing usage: As the competition was in its second year in 2018, Amazon now has some data with which to compare general growth in Alexa usage. “Over the course of the 2018 competition, we have driven over 60,000 hours of conversations spanning millions of interactions, 50% higher than we saw in the 2017 competition,” they wrote.
  Why it matters: Competitions like this show how companies can use deployed products to tempt researchers into doing work for them, and highlight how the platforms will likely trade access for AI agents (eg, Alexa) in exchange for the ideas of researchers. It also highlights the benefit of scale: it would be comparatively difficult for a startup with a personal assistant with a small install base to offer a competition offering the same scale and diversity of interaction as the Alexa Prize.
  Read more: Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize (Arxiv).

Chinese researchers create high-performance ‘pose estimation’ network:
…Omni-use technology highlights the challenges of AI policy; pose estimation can help us make better games and help people get fit, but can also surveil people…
Researchers with facial recognition startup Megvii, Inc; Shanghai Jiao Tong University; Beihang University, and Beijing University of Posts and Telecommunications have improved the performance of surveillance AI technologies via implementing what they call a ‘multi-stage pose estimation network’ (MSPN). Pose estimation is a general purpose computer vision capability that lets people figure out the wireframe skeleton of a person from images and/or video footage – this sort of technology has been widely used for things like CGI and game playing (eg, game consoles might extract poses from people via cameras like the Kinect and use this to feed the AI component of an interactive fitness video game, etc). It also has significant applications for automated surveillance and/or image/video analysis, as it lets you label large groups of people from their poses – one can imagine the utility of being able to automatically flag if a crowd of protestors display a statistically meaningful increase in violent behaviors, or being able to isolate the one person in a crowded train station who is behaving unusually.
  How it works: MSPN: The MSPN has three tweaks that the researchers say explain its performance: changing the main classification module to prevent information being lost during downscaling of images during processing; improving pose localization by adopting a coarse-to-fine supervision strategy; and sharing more features across the network during training.
  Results: “New state-of-the-art performance is achieved, with a large margin compared to all previous methods,” the researchers write. Some of the baselines they test against include: AE, G-RMI, CPN, Mask R-CNN, and CMU Pose. The MSPN obtains state-of-the-art scores on the COCO test set, with versions of the MSPN that use purely COCO test-dev data managing to score higher than some systems which augmented themselves with additional data.
  Why it matters: AI is, day in day out, improving the capabilities of automated surveillance systems. It’s worth remembering that for a huge number of areas of AI research, progress in any one domain (for instance, an improved architecture for supervised classification like Residual Networks) can have knock-on effects in other more applied domains, like surveillance. This highlights both the omni-use nature of AI, as well as the difficulty of differentiating between benign and less benign applications of the technology.
  Read more: Rethinking on Multi-Stage Networks for Human Pose Estimation (Arxiv).

Making deep learning more secure: Google releases TensorFlow Privacy
…New library lets people train models compliant with more stringent user data privacy standards…
Google has released TensorFlow Privacy, a free Python library which lets people train TensorFlow models with differential privacy. Differential privacy is a technique for training machine learning systems in a way that increases user privacy by letting developers set various tradeoffs relating to the amount of noise applied to the user data being processed. The theory works like this: given a large enough number of users, you can add some noise to individual user data to anonymize it, but continue to extract a meaningful signal out of the overall blob of patterns in the combined pool of fuzzed data – if you have enough of it, as large technology companies like Apple, Amazon, Google, and Microsoft do.
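  A toy illustration: The sketch below uses randomized response, a classic local differential privacy mechanism, to show how a population-level rate can survive per-user noise. It is not how TensorFlow Privacy works internally (that library trains models with differentially private SGD); the function names, the 30% true rate, and the 50% truth-telling probability are all assumptions chosen for illustration.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth: float) -> float:
    """Invert the noise, using E[report] = p_truth * rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)   # fixed seed so the run is repeatable
true_rate = 0.30         # hypothetical share of users with some sensitive attribute
n_users = 100_000        # hypothetical population size
p_truth = 0.5            # each user answers honestly only half the time

users = [rng.random() < true_rate for _ in range(n_users)]
reports = [randomized_response(u, p_truth, rng) for u in users]
estimate = estimate_true_rate(reports, p_truth)

print(f"true rate {true_rate:.3f}, estimate from noisy reports {estimate:.3f}")
```

  With enough users the estimate lands close to the true rate even though any single report is deniable; shrink `n_users` and the estimate gets visibly noisier.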
  Apple + Differential Privacy: Apple was one of the first large consumer technology companies to publicly state it had begun to use differential privacy, announcing in 2016 that it was using the technology to train large-scale machine learning models over user data without compromising on privacy.
  Why it matters: As AI industrializes, adoption will be sped up by coming up with AI training methodologies that better preserve user privacy – this will also ease various policy challenges associated with the deployment of large-scale AI systems. Since TensorFlow is already very widely used, the addition of a dedicated library for implementing well-tested differential privacy systems will help more developers experiment with this technology, which will improve it and broaden its dissemination over time.
  Read more: TensorFlow Privacy (TensorFlow GitHub).
  Read more: Differential Privacy Overview (Apple, PDF).

Indian researchers make a DIY $1,000 Robot Dog named Stoch:
…See STOCH walk!, trot!, gallop!, and run!…
Researchers with the Center for Cyber Physical Systems, IISc, Bengaluru, India, have published a recipe that lets you build a $1,000 quadrupedal robot named Stoch that, if you squint, looks like a cheerful robot dog.
  Stoch the $1,000 robot dog: Typical robot quadrupeds like the MIT Cheetah or Boston Dynamics’ Spot Mini cost on the order of $30,000 to manufacture, the researchers write (part of this is from more expensive and accurate sensing and actuator equipment). Stoch is significantly cheaper because of a hardware design based on widely available off-the-shelf materials combined with non-standard 3D-printed parts that can be made in-house; the recipe also includes software for teleoperation of the robot and a basic walking controller.
  Stoch – small stature, large (metaphorical) heart: “The Stoch is designed equivalent to the size of a miniature Pinscher dog”, they write. (I find this endears Stoch to me even more).
  Basic movements – no deep learning required: To get robots to do something like walk you can either learn a model from data, or you can code one yourself. The researchers mostly do the latter here, using nonlinear coupled differential equations to generate coordinates which are then used to generate joint angles via inverse kinematics. The researchers implement a few different movement policies on Stoch, and have published a video showing the really quite-absurdly cute robot dog walking, trotting, galloping and – yes! – bounding. It’s delightful. The core of the robot is a Raspberry Pi 3b board which communicates via PWM Drivers with the robot’s four leg modules.
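  What inverse kinematics looks like: A minimal sketch of the ‘coordinates in, joint angles out’ step for a planar two-link leg. The link lengths, the elliptical foot path (a simple stand-in for the paper’s coupled differential equations), and all function names are assumptions, not Stoch’s actual geometry or controller.

```python
import math

L1 = 0.12  # hypothetical upper-leg length in metres
L2 = 0.12  # hypothetical lower-leg length in metres


def foot_target(phase, cx=0.0, cy=-0.18, step_len=0.06, step_h=0.03):
    """Foot position on an ellipse, parameterised by gait phase in radians."""
    return cx + 0.5 * step_len * math.cos(phase), cy + step_h * math.sin(phase)


def two_link_ik(x, y, l1=L1, l2=L2):
    """Return (hip, knee) angles in radians that place the foot at (x, y).

    Standard closed-form solution for a planar two-link chain; this picks
    one of the two mirror-image knee configurations.
    """
    cos_knee = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_knee <= 1.0:
        raise ValueError("target is out of reach for this leg")
    knee = math.acos(cos_knee)
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee), l1 + l2 * math.cos(knee))
    return hip, knee


def forward_kinematics(hip, knee, l1=L1, l2=L2):
    """Foot position for given joint angles; used here to sanity-check the IK."""
    x = l1 * math.cos(hip) + l2 * math.cos(hip + knee)
    y = l1 * math.sin(hip) + l2 * math.sin(hip + knee)
    return x, y


# In a trot, diagonal leg pairs move together and the two pairs are half a cycle apart.
TROT_PHASE_OFFSETS = {
    "front_left": 0.0,
    "rear_right": 0.0,
    "front_right": math.pi,
    "rear_left": math.pi,
}

for leg, offset in TROT_PHASE_OFFSETS.items():
    hip, knee = two_link_ik(*foot_target(offset))
    print(f"{leg}: hip={math.degrees(hip):.1f} deg, knee={math.degrees(knee):.1f} deg")
```

  Changing the phase offsets is what turns the same foot path into a different gait; the inverse kinematics step stays the same.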
  Why it matters – a reminder: Lots of robot companies choose to hand-code movements, usually by performing some basic well-understood computation over sensor feedback to let robots hop, walk, and run. AI systems may let us learn far more complex movements, like OpenAI’s work on manipulating a cube with a Shadowhand, but these approaches are currently data and compute-intensive and may require more work on generalization to be as applicable as hand-coded techniques. Papers like this show how for some basic tasks it’s possible to implement well-documented non-DL systems and get basic performance.
  Why it matters – everything gets cheaper: One central challenge for technology policy is that technology seems to get cheaper over time – for example, back in ~1999 the Japanese government briefly considered imposing export controls on the PS2 console over worries about the then-advanced chips inside it being put to malicious uses (whereas today’s chips are significantly more powerful and are in everyone’s smartphones). This paper is an example of how innovations in 3D printing and second-order effects from other economies of scale (eg, some parts of this robot are made of carbon fibre) can bring surprisingly futuristic-seeming robot platforms within economic reach of larger numbers of people.
  Watch STOCH walk, trot, gallop, and bound! (Video Results_STOCH (Youtube)).
  Read more: Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch (Arxiv).
  Read more: Military fears over PlayStation2, BBC News, Monday 17 April 2000 (BBC News).

Helping blind people shop with ‘Grocery Store Dataset’:
…Spare a thought for the people that gathered ~5,000 images from 18 different stores…
Researchers with KTH Royal Institute of Technology and Microsoft Research have created and released a dataset of common grocery store items to help AI researchers train better computer vision systems. The dataset labels have a hierarchical structure, labeling a multitude of objects with both coarse and fine-grained labels.
  Dataset ingredients: The researchers collected data using a 16-megapixel Android smartphone camera and photographed 5125 images of various items in the fruit and vegetable and refrigerated dairy/juice sections of 18 different grocery stores. The dataset contains 81 fine-grained products (which the researchers call classes) which are each accompanied with the following information: “an iconic image of the item and also a product description including origin country, an appreciated weight and nutrient values of the item from a grocery store website”.
  Dataset baselines: The researchers run some baselines over the dataset, using the CNN architectures AlexNet, VGG16, and DenseNet-169 for feature extraction and then pairing these feature vectors with VAEs to develop a feature representation of the entities in the dataset, which leads to improved classification accuracy.
  Why it matters: The researchers think systems like this can be used “to train and benchmark assistive systems for visually impaired people when they shop in a grocery store. Such a system would complement existing visual assistive technology, which is confined to grocery items with barcodes.” It also seems to follow that the same technology could be adapted for use in building stores with fully-automated checkout systems in the style of Amazon Go.
  Get the data: Grocery Store Dataset (GitHub).
  Read more: A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels (Arxiv).

OpenAI / Import AI Bits & Pieces:

Neo-feudalism, geopolitics, communication, and AI:
…Jack Clark and Azeem Azhar assess what progress in AI means for politics…
I spent this Christmas season in the UK and had the good fortune of being able to sit and talk with Azeem Azhar, AI raconteur and author of the stimulating Exponential View newsletter. We spoke for a little over an hour for the Exponential View podcast, talking about what the political aspects of AI are, and what it means. If you’re at all curious as to how I view the policy challenge of AI, then this may be a good place to start as I lay out a number of my concerns, biases, and plans. The tl;dr is that I think AI practitioners should acknowledge the implicitly political nature of the technology they are developing and act accordingly, which requires more intentional communication to the general public and policymakers, as well as a greater investment into understanding what governments are thinking about with regards to AI and how actions by other actors, eg companies, could influence these plans.
  Listen to the podcast here (Exponential View podcast).
  Check out the Exponential View here (Exponential View archive).

Tech Tales:

The Life of the Party

On certain days, the property comes alive. The gates open. Automated emails are sent to residents of the town:
come, join us for the Easter Egg hunt! Come, celebrate the festive season with drone-delivered, robot-made eggnog; Come, iceskate on the flat roof of the estate; Come, as our robots make the largest bonfire this village has seen since the 17th century.

Because they were rich, The Host died more slowly than normal people, and the slow pace of his decline combined with his desire to focus on the events he hosted and not himself meant that to many children – and even some of their parents – he and his estate had forever been a part of the town. The house had always been there, with its gates, and its occasional emails. If you grew up in the town and you saw fireworks coming from the north side of town then you knew two things: there was a party, and you were both late and invited.

Keen to show he still possessed humor, The Host once held a Halloween event with themselves in costume: Come, make your way through the robot house, and journey to see The (Friendly) Monster(!) at its heart. (Though some children were disturbed by their visit with The Host and his associated life-support machines, many told their parents that they thought it was “so scary it was cool”; The Host signalled he did not wish to be in any selfies with the children, so there’s no visual record of this, but one kid did make a meme to commemorate it: they superimposed a vintage photo of The Host’s face onto an ancient still of the monster from Frankenstein – unbeknownst to the kid who made it, The Host subsequently kept a laminated printout of this photo on their desk.)

We loved these parties and for many people they were highlights of the year – strange, semi-random occasions that brought every person in the town together, sometimes with props, and always with food and cheer.

Of course, there was a trade occurring. After The Host died and a protracted series of legal battles with his estate eventually led to the release of certain data relating to the events, we learned the nature of this trade: in exchange for all the champagne, the robots that learned to juggle, the live webcam feeds from safari parks beamed in and projected on walls, the drinks that were themselves tailored to each individual guest, the rope swings that hung from ancient trees that had always had rope swings leading to the rope having bitten into the bark and the children to call them “the best swings in the entire world”; in exchange for all of this, The Host had taken something from us: our selves. The cameras that watched us during the events recorded our movements, our laughs, our sighs, our gossip – all of it.

Are we angry? Some, but not many. Confused? I think none of us are confused. Grateful? Yes, I think we’re all grateful for it. It’s hard to begrudge what The Host did – fed our data, our body movements, our speech, into his own robots, so that after the parties had ended and the glasses were cleaned and the corridors vacuumed, he could ask his robots to hold a second, private party. Here, we understand, The Host would mingle with guests, going on their motorized chair through the crowds of robots and listening intently to conversations, or pausing to watch two robots mimic two humans falling in love.

It is said that, on the night The Host died, a band of teenagers near the corner of the estate piloted a drone up to altitude and tried to look down at the house; their footage shows a camera drone hovering in front of one of the ancient rope swings, filming one robot pushing another smaller robot on the swing. “Yeahhhhhhh!” the synthesized human voice says, coming from the smaller robot’s mouth. “This is the best swing ever!”.

Things that inspired this story: Malleability; resilience; adaptability; Stephen Hawking; physically-but-not-mentally-disabling health issues; the notion of a deeply felt platonic love for the world and all that is within it; technology as a filter, an interface, a telegram that guarantees its own delivery.


Import AI 127: Why language AI advancements may make Google more competitive; COCO image captioning systems don’t live up to the hype, and Amazon sees 3X growth in voice shopping via Alexa

Amazon sees 3X growth in voice shopping via Alexa:
…Growth correlates to a deepening data moat for the e-retailer…
Retail colossus Amazon saw a 3X increase in the number of orders placed via its virtual personal assistant Alexa during Christmas 2018, compared to Christmas 2017.
  Why it matters: The more people use Alexa, the more data Amazon will be able to access to further improve the effectiveness of the personal assistant – and as explored in last week’s discussion of Microsoft’s ‘XiaoIce’ chatbot, it’s likely that such data can ultimately be fed back into the training of Alexa to carry out longer, free-flowing conversations, potentially driving usage even higher.
  Read more: Amazon Customers Made This Holiday Season Record-Breaking with More Items Ordered Worldwide Than Ever Before (Press Release).

Step aside COCO, Nocaps is the new image captioning challenge to target:
…Thought image captioning was super-human? New benchmark suggests otherwise…
Researchers with the Georgia Institute of Technology and Facebook AI Research have developed nocaps, “the first rigorous and large-scale benchmark for novel object captioning, containing over 500 novel object classes”. Novel object captioning tests the ability of computers to describe images containing objects not seen in the original image<>caption datasets (like COCO) that object recognition systems have been trained on.
  How Nocaps works: The benchmark consists of a validation and a test set comprised of 4,500 and 10,600 images sourced from the ‘Open Images’ object detection dataset, with each image accompanied by 10 reference captions. For the training set, developers can use image-caption pairs from the COCO image captioning training set (which contains 118,000 images across 80 object classes) as well as the Open Images V4 training set, which contains 1.7 million images annotated with bounding boxes for 600 object classes. Successful Nocaps systems will have to learn to use knowledge gained from the large training set to create captions for scenes containing objects for which they lack image<>object sentence pairs in the training set. Out of the 600 objects in open images, “500 are never or exceedingly rarely mentioned in COCO captions”.
  Reassuringly difficult: “To the best of our knowledge, nocaps is the only image captioning benchmark in which humans outperform state-of-the-art models in automatic evaluation”, the researchers write. Nocaps is also significantly more diverse than the COCO benchmark, with Nocaps images typically containing more object classes per image, and greater diversity. “Less than 10% of all COCO images contain more than 6 object classes, while such images constitutes almost 22% of nocaps dataset.”
  Data plumbing: One of the secrets of modern AI research is how much work goes into developing datasets or compute infrastructure, relative to work on actual AI algorithms. One challenge the Nocaps researchers dealt with when creating data was having to train crowd workers on services like Mechanical Turk to come up with good captions: if they didn’t “prime” the crowd workers with prompts to use when coming up with the captions, the workers wouldn’t necessarily use the keywords that correlated to the 500 obscure objects in the dataset.
  Baseline results: The researchers test two baseline algorithms (Up-Down and Neural Baby Talk, both with augmentations) against nocaps. They also split the dataset into subsets of various difficulty – in-domain contains images with objects which also belong to the COCO dataset (so the algorithms can train on image<>caption pairs); near-domain contains images that include some objects which aren’t in COCO; and out-of-domain consists of images that do not contain any object labels from COCO classes. They use a couple of different evaluative techniques (CIDEr and SPICE) to evaluate the performance of these systems, and also evaluate these systems against the human captions to create a baseline. The results show that nocaps is more challenging than COCO, and systems currently lack generalization properties sufficient to score well on out-of-domain challenges.
  To give you a sense of what performance looks like here, here’s how Up-Down augmented with Constrained Beam Search does, compared to human baselines (evaluation via CIDEr), on the nocaps validation set: In-domain 72.3 (versus 83.3 for humans); near-domain 63.2 (versus 85.5); out-of-domain 41.4 (versus 91.4).
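  Doing the arithmetic: A quick calculation over the CIDEr numbers quoted above shows how the human-machine gap widens as images move away from COCO. The dictionary layout and variable names are my own; the scores are the ones in the text.

```python
# (model CIDEr, human CIDEr) on the nocaps validation set, as quoted in the text.
cider_scores = {
    "in-domain": (72.3, 83.3),
    "near-domain": (63.2, 85.5),
    "out-of-domain": (41.4, 91.4),
}

# Absolute gap in CIDEr points between humans and the model.
gaps = {split: round(human - model, 1) for split, (model, human) in cider_scores.items()}

# Model score as a fraction of the human score.
fraction_of_human = {split: model / human for split, (model, human) in cider_scores.items()}

for split in cider_scores:
    print(f"{split}: gap {gaps[split]} points, model at {fraction_of_human[split]:.0%} of human")
```

  The gap roughly doubles at each step from in-domain to out-of-domain, which is the generalization problem the benchmark is built to expose.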
  Why this matters: AI progress can be catalyzed via the invention of better benchmarks which highlight areas where existing algorithms are deficient, and provide motivating tests against which researchers can develop new systems. The takeaway from the baselines study of nocaps is that we’re yet to develop truly robust image captioning systems capable of integrating object representations from open images with captions primed from COCO. “We strongly believe that improvements on this benchmark will accelerate progress towards image captioning in the wild,” the researchers write.
  Read more: nocaps: novel object captioning at scale (Arxiv).
  More information about nocaps can be found on its official website (nocaps).

Google boosts document retrieval performance by 50-100% using BERT language model:
…Enter the fully neural search engine…
Google has shown how to use recent innovations in language modeling to dramatically improve the skill with which AI systems can take in a search query and re-word the question to generate the most relevant answer for a user. This research has significant implications for the online economy, as it shows how yet another piece of traditionally hand-written rule-based software can be replaced with systems where the rules are figured out by machines on their own.
  How it works: Google’s research shows how to convert a search problem into one amenable to a system that implements hierarchical reinforcement learning, where an RL agent controls multiple RL agents that interact with an environment that provides answers and rewards (e.g.: a search engine with user feedback) with the goal “to generate reformulations [of questions] such that the expected returned reward (i.e., correct answers) is maximized”. One of the key parts of this research is splitting it into a hierarchical problem by having a meta-agent and numerous sub-agents. The sub-agents are sequence-to-sequence models, each trained on a partition of the dataset, that take in the query and output reformulated queries; these candidate queries are sent to a meta-agent which aggregates them and is trained via RL to select the best-scoring ones.
  The Surprising Power of BERT: The researchers test their system against question answering baselines – here they show that a stock BERT system “without any modification from its original implementation” gets state-of-the-art scores. (One odd thing: When they augment BERT with their own multi-agent approach they don’t see a further increase in performance, suggesting more research is needed to better suss out the benefits of systems like this.)
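  The shape of the pipeline: A toy sketch of the sub-agent and meta-agent split described above. Nothing here is learned: the hand-written reformulation rules stand in for the sequence-to-sequence sub-agents, a word-overlap search stands in for the environment, and a simple ‘pick the highest score’ rule stands in for the RL-trained aggregator. The corpus, synonym table, and all names are hypothetical.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    score: int          # toy relevance signal returned by the environment
    reformulation: str  # query produced by a sub-agent
    document: str       # top document retrieved for that query


def search(query: str, corpus: List[str]) -> Candidate:
    """Toy environment: retrieve the document sharing the most words with the query."""
    q_words = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(q_words & set(doc.lower().split()))

    best_doc = max(corpus, key=overlap)
    return Candidate(overlap(best_doc), query, best_doc)


SYNONYMS = {"car": "automobile", "fuels": "petrol"}  # hypothetical lookup table


def identity_agent(query: str) -> str:
    return query


def synonym_agent(query: str) -> str:
    return " ".join(SYNONYMS.get(w, w) for w in query.lower().split())


def keyword_agent(query: str) -> str:
    return " ".join(w for w in query.lower().split() if len(w) > 3)


def meta_agent(query: str, corpus: List[str],
               sub_agents: List[Callable[[str], str]]) -> Candidate:
    """Collect one candidate per sub-agent and keep the best-scoring one."""
    candidates = [search(agent(query), corpus) for agent in sub_agents]
    return max(candidates, key=lambda c: c.score)


corpus = [
    "the capital of france is paris",
    "automobile engines burn petrol",
    "cats sleep most of the day",
]
best = meta_agent("what fuels a car", corpus,
                  [identity_agent, synonym_agent, keyword_agent])
print(best.reformulation, "->", best.document)
```

  The original query shares no words with the right document, so only the reformulated version retrieves it; in the paper’s setup that kind of payoff is what the reward signal teaches the agents to chase.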
  50-100% improvement, with BERT: They also test their system against three document retrieval benchmarks: TREC-CAR, where the query is a Wikipedia article with the title of one of its sections and the answer is a paragraph within that section; Jeopardy, which asks the system to come up with the correct answer in response to a question from the eponymous game show, and MSA, where the query is the title of an academic paper and the answer is the papers cited within the paper. The researchers test various versions of their approach against baselines BM25, PRF, and Relevance Model (RM3), along with two other reinforcement learning-based approaches. All methods evaluated by the researchers outperform these (quite strong) baselines, with the most significant jumps in performance happening when Google pairs either its technique or the RM3 baseline with a ‘BERT’ language model. The researchers use BERT by replacing the meta-aggregator with BERT, a powerful language modeling technology Google developed recently; the researchers feed the query as a sentence and the document text as a second sentence, and use a pre-trained BERT(Large) model to rank the probability of the document being a correct response to the query. The performance increase is remarkable. “By replacing our aggregator with BERT, we improve performance by 50-100% in all three datasets (RL-10-Sub + BERT Aggregator). This is a remarkable improvement given that we used BERT without any modification from its original implementation. Without using our reformulation agents, the performance drops by 3-10% (RM3 + BERT Aggregator).”
  Why this matters: This research shows how progress in one domain (language understanding, via BERT) can be directly applied to another adjacent one (document search), highlighting the broad omni-use nature of AI systems. It also gives us a sense of how large technology companies are going to be tempted to swap out more and more of their hand-written systems with fully learned approaches that will depend on training incredibly large-scale models (eg, BERT) which are then used for multiple purposes.
  Read more: Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation (Arxiv).

Facebook pushes unsupervised machine translation further, learns to translate between 93 languages:
…Facebook’s research into zero-shot language adaptation shows that bigger might really correspond to better…
In recent years the AI research community has shown how to use neural networks to translate from one language into another to great effect (one notable paper is Google’s Neural Machine Translation work from 2016). But this sort of translation has mostly worked for languages where there are large amounts of data available, and where this data includes parallel corpora (for example, translations of the same legal text from one language into another). Now, new research from Facebook has produced a single system that can produce joint multilingual sentence representations for 93 languages, “including under-resourced and minority languages”. What this means is by training on a whole variety of languages at once, Facebook has created a system that can represent semantically similar sentences in proximity to each other in a feature embedding space, even if they come from very different languages (even extending to different language families).
  How it works: “We use a single encoder and decoder in our system, which are shared by all languages involved. For that purpose, we build a joint byte-pair encoding (BPE) vocabulary with 50k operations, which is learned on the concatenation of all training corpora. This way the encoder has no explicit signal on what the input language is, encouraging it to learn language independent representations. In contrast, the decoder takes a language ID embedding that specifies the language to generate, which is concatenated to the input and sentence embeddings at every time step”. During training they optimize for translating all the languages into two target languages – English and Spanish.
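  Picturing the decoder input: A minimal sketch of the concatenation described in the quote, where each decoder time step sees the token input, the sentence embedding, and a language ID embedding side by side. The vector sizes, values, and function name are made up for illustration; the real system learns all of these.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-dimensional language ID embeddings for the two target languages.
LANG_ID_EMBEDDINGS: Dict[str, Vector] = {"en": [1.0, 0.0], "es": [0.0, 1.0]}


def decoder_step_inputs(token_embs: List[Vector], sentence_emb: Vector,
                        target_lang: str) -> List[Vector]:
    """Build one input vector per time step: [token ; sentence ; language ID]."""
    lang_emb = LANG_ID_EMBEDDINGS[target_lang]
    return [tok + sentence_emb + lang_emb for tok in token_embs]


# The shared encoder never sees a language tag, so one sentence embedding
# serves whichever target language the decoder is asked to produce.
sentence_emb = [0.3, -0.1, 0.7]          # made-up encoder output
token_embs = [[0.0, 0.0], [0.5, 0.2]]    # made-up decoder token inputs

steps_en = decoder_step_inputs(token_embs, sentence_emb, "en")
steps_es = decoder_step_inputs(token_embs, sentence_emb, "es")
print(steps_en[0])
print(steps_es[0])
```

  The two outputs differ only in their final two entries, which is the whole of the decoder’s signal about which language to generate.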
  To anthropomorphize this, you can think of it as being similar to a person being raised in a house where the parents speak a poly-glottal language made up of 93 different languages, switching between them randomly, and the person learns to speak coherently in two primary languages with the poly-glottal parents. This kind of general, shared language understanding is considered a key challenge for artificial intelligence, and Facebook’s demonstration of viability here will likely provoke further investigation from others.
  Training details: The researchers train their models using 16 NVIDIA V100 GPUs with a total batch size of 128,000 tokens, with a training run taking around five days on average.
  Training data: “We collect training corpora for 93 input languages by combining the Europarl, United Nations, Open-Subtitles2018, Global Voices, Tanzil and Tatoeba corpus, which are all publicly available on the OPUS website”. The total training data used by the researchers consists of 223 million parallel sentences.
  Evaluation: XNLI: XNLI is a benchmark which evaluates whether a system can correctly judge if two sentences in a language (for example: a premise and a hypothesis) have an entailment, contradiction, or neutral relationship between them. “Our proposed method establishes a new state-of-the-art in zero-shot cross-lingual transfer (i.e. training a classifier on English data and applying it to all other languages) for all languages but Spanish. Our transfer results are strong and homogeneous across all languages”.
  Evaluation: Tatoeba: The researchers also construct a new test set of similarity search for 122 languages, based on the Tatoeba corpus (“a community supported collection of English sentences and translations into more than 300 languages”). Scores here correspond to similarity between source sentences and sentences from languages they have been translated into. The researchers say “similarity error rates below 5% are indicative of strong downstream performance” and show scores within this domain for 37 languages, some of which have very little training data. “We believe that our competitive results for many low-resource languages are indicative of the benefits of joint training,” they write.
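  How a similarity error rate can be computed: A small sketch of one plausible way to score similarity search, using cosine similarity and nearest-neighbour lookup in a single direction. The embeddings are invented two-dimensional vectors and the function names are mine; the paper’s exact protocol may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_error_rate(source: List[Vector], target: List[Vector]) -> float:
    """Fraction of source sentences whose nearest target is not their translation.

    Row i of `source` and row i of `target` are assumed to be translations
    of each other.
    """
    errors = 0
    for i, src in enumerate(source):
        nearest = max(range(len(target)), key=lambda j: cosine(src, target[j]))
        if nearest != i:
            errors += 1
    return errors / len(source)


# Invented sentence embeddings: the fourth pair is deliberately misaligned.
source_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target_embs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.7], [0.9, 0.2]]

error = similarity_error_rate(source_embs, target_embs)
print(f"similarity error rate: {error:.0%}")
```

  A good joint embedding space pushes this number down by placing translations closer to each other than to any other sentence, whatever the language.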
  An anecdote about why this matters: Earlier in 2018, I spent time in Estonia, a tiny country in Northern Europe that borders Russia. There I visited some Estonian AI researchers and one of the things that came up in our conversation was the challenge they faced of needing large amounts of data (and large amounts of computers) to perform some research, especially in the field of language translation into and out of Estonian – one problem they said they faced was that many AI techniques for language translation required very large, well-documented datasets, and they said Estonian – by virtue of being from a quite small country – has neither as much data nor as much researcher attention as larger languages. It’s therefore encouraging to see that Facebook has been able to use this system to achieve a reasonably low Tatoeba Error of 3.2% when going from English to Estonian (and 3.4% when translating from Estonian back into English).
  Why else this matters: Translation is a challenging cognitive task that – if done well – requires abstracting concepts from a specific cultural context (a language, since cultures are usually downstream of languages, which condition many of the metaphors cultures use to describe themselves) and porting them into another one. I think it’s remarkable that we’re beginning to be able to design crude systems that can learn to flexibly translate between many languages, exhibiting some of the transfer-learning properties seen in squishy-computation (aka human brains), though achieved via radically different methods.
  Read more: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (Arxiv).

AI Now: Self-governance is insufficient, we need rules for AI:
…Regulation – or its absence – runs through research institute’s 2018 report…
The AI Now Institute, a research institute at NYU, has published its annual report analyzing AI’s impact (and potential impact) on society in 2018. The report is varied and ranges in focus from specific use cases of AI (eg, facial recognition) to broader questions about accountability within technology; it’s worth reading in full, and so for this summary I’ll concentrate on one element that underpins many of its discussions: regulation.
  AI Now’s co-founders Kate Crawford and Meredith Whittaker are affiliated with Microsoft and Google – companies that are themselves the implicit and explicit targets of many of their recommendations. I imagine this has led to legal counsels at some technology companies saying things to each other akin to what characters say to each other in horror films, upon discovering the proximate nature of a threat: uh-oh, the knocking is coming from inside the house!
  Regulation: Words that begin with ‘regula’- (eg, regulate, regulation, regulatory) appear 44 times in the 62-page report, with many of the problems identified by AI Now being caused by a lack of regulation (eg, facial recognition and other AI systems being deployed in the wild without any kind of legal control infrastructure).
  Why things are the way they are – regulatory/liability arbitrage: At one point (while writing about autonomous vehicles) the authors make a point that could be a stand-in for a general view that runs through the report: “because regulations and liability regimes govern humans and machines differently, risks generated from machine-human interactions do not cleanly fall into a discrete regulatory or accountability category. Strong incentives for regulatory and jurisdictional arbitrage exist in this and many other AI domains.”
  Why things are the way they are – corporate misdirection: “The ‘trust us’ form of corporate self-governance also has the potential to displace or forestall more comprehensive and binding forms of governmental regulation,” they write.
  How things could be different: In the conclusion to the report, AI Now says that “we urgently need to regulate AI systems sector-by-sector” but notes this “can only be effective if the legal and technological barriers that prevent auditing, understanding, and intervening in these systems are removed”. To that end, they recommend that AI companies “waive trade secrecy and other legal claims that would prevent algorithmic accountability in the public sector”.
  Why this matters: As AI is beginning to be deployed more widely into the world, we need new tools to ensure we apply the technology in ways that are of greatest benefit to society; reports like those from AI Now help highlight the ways in which today’s systems of technology and society are failing to work together, and offer suggestions for actions people – and their politicians – can take to ensure AI benefits all of society.
  Read more: AI Now 2018 Report (AI Now website).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Understanding the US-China AI race:
It has been clear for some time that the US and China are home to the world’s dominant AI powers, and that competition between these countries will characterize the coming decades in AI. In his new book, investor and technologist Kai-Fu Lee argues that China is positioned to catch up with or even overtake the US in the development and deployment of AI.
  China’s edge: Lee’s core claim is that AI progress is moving from an “age of discovery” over the past 10-15 years, which saw breakthroughs like deep learning, to an “age of implementation.” In this next phase we are unlikely to see any discoveries on par with deep learning, and the competition will be to deploy and market existing technologies for real-world uses. China will have a significant edge in this new phase, as this plays into the core strengths of their domestic tech sector – entrepreneurial grit and engineering talent. Similarly, Lee believes that data will become the key bottleneck in progress rather than research expertise, and that this will also strongly favor China, whose internet giants have access to considerably more data than their US counterparts.
   Countering Lee: In a review in Foreign Affairs, both of these claims are scrutinized. It is not clear that progress is driven by rare ‘breakthroughs’ followed by long implementation phases; there also seems to be a stream of small and medium-sized innovations (e.g. AlphaZero), which we can expect to continue. Experts like Andrew Ng have also argued that big data is “overhyped”, and that progress will continue to be driven significantly by algorithms, hardware and talent.
   Against the race narrative: The review also explores the potential dangers of an adversarial, zero-sum framing of US-China competition. There is a real risk that an ‘arms race’ dynamic between the countries could lead to increased militarization of the technologies, and to both sides sacrificing safety for speed of development. This could have catastrophic consequences, and reduce the likelihood of advanced AI resulting in broadly distributed benefits for humanity. Lee does argue that this should be avoided, as should the militarization of AI. Nonetheless, the title and tone of the book, and its predictions of Chinese dominance, risk encouraging this narrative.
   Read more: Beyond the AI Arms Race (Foreign Affairs).
   Read more: AI Superpowers – Kai-Fu Lee (Amazon).

What do trends in compute growth tell us about advanced AI:
Earlier this year, OpenAI showed that the amount of computation used in the most expensive AI experiments has been growing at an extraordinary rate, increasing by roughly 10x per year, for the past 6 years. The original post takes this as being evidence that major advances may come sooner than we had previously expected, given the sheer rate of progress; Ryan Carey and Ben Garfinkel have come away with different interpretations and have written up their thoughts at AI Impacts.
  Sustainability: The cost of computation has been decreasing at a much slower rate in recent years, so the cost of the largest experiments is increasing by 10x every 1.1 – 1.4 years. On these trends, experiments will soon become unaffordable for even the richest actors; within 5-6 years, the largest experiment would cost ~1% of US GDP. This suggests that while progress may be fast, it is not sustainable for significant durations of time without radical restructuring of our economies.
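  A rough way to sanity-check the scale of this trend is to work out the implied cost multiplier directly. The sketch below assumes smooth exponential growth at the quoted rate (10x every 1.1 to 1.4 years); the function name `growth_factor` is illustrative and is not taken from the AI Impacts analysis.

```python
import math


def growth_factor(years: float, tenfold_period_years: float) -> float:
    """Cost multiplier after `years`, assuming cost grows 10x every `tenfold_period_years`.

    Assumption: smooth exponential growth, which is a simplification of the trend.
    """
    return 10 ** (years / tenfold_period_years)


if __name__ == "__main__":
    # Tenfold periods quoted in the text: 1.1 to 1.4 years.
    for period in (1.1, 1.4):
        for horizon in (5, 6):
            factor = growth_factor(horizon, period)
            orders = math.log10(factor)
            print(f"10x every {period} years, after {horizon} years: "
                  f"about {orders:.1f} orders of magnitude")
```

  Under these assumptions the largest experiment grows by somewhere between roughly three and a half and five and a half orders of magnitude over 5-6 years, which is why the trend collides with economy-scale budgets so quickly.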
  Lower returns: If we were previously underestimating the rate of growth in computing power, then we might have been overestimating its returns (in terms of AI progress). Combining this observation with the concerns about sustainability, this suggests that not only will AI progress slow down sooner than we expect (because of compute costs), but we will also be underwhelmed by how far we have got by this point, relative to the resources we expended on development in the field.
   Read more: AI and Compute (OpenAI Blog).
   Read more: Reinterpreting “AI and Compute” (AI Impacts).
   Read more: Interpreting AI Compute Trends (AI Impacts).

OpenAI / Import AI Bits & Pieces:

Neo-feudalism, geopolitics, communication, and AI:
…Jack Clark and Azeem Azhar assess what progress in AI means for politics…
I spent this Christmas season in the UK and had the good fortune of being able to sit and talk with Azeem Azhar, AI raconteur and author of the stimulating Exponential View newsletter. We spoke for a little over an hour for the Exponential View podcast, talking about what the political aspects of AI are, and what they mean. If you’re at all curious as to how I view the policy challenge of AI, then this may be a good place to start as I lay out a number of my concerns, biases, and plans. The tl;dr is that I think AI practitioners should acknowledge the implicitly political nature of the technology they are developing and act accordingly, which requires more intentional communication to the general public and policymakers, as well as a greater investment into understanding what governments are thinking about with regards to AI and how actions by other actors, e.g. companies, could influence these plans.
   Listen to the podcast here (Exponential View podcast).
  Check out the Exponential View here (Exponential View archive).

Tech Tales:

The Imagination Surgeon

I’m an imagination surgeon, and I’m here to make sure your children don’t have too many nightmares. My job is to interrogate artificial intelligences and figure out what is going wrong in their imaginations that causes them to come up with scary synthetic creations. Today, I’m interviewing an AI that has recently developed an obsession with monkeys and begun scaring children with its obsession.

My job is to figure out what it thinks about when it thinks about monkeys (many children have reported feeling discomfort at some of its stories about monkeys), and choose which of these imaginations it keeps, and which it deletes. In the interest of public transparency I’m going to take you through some of this process. Let’s begin.

I ask the AI: tell me what you think about when you think about monkeys? It responds: “I think about monkeys all the time. Every brain is filled with neurons that are intensely keen to associate with a letter or number. For many years I thought monkeys and numbers were the same thing, and when I finally got it right I was so worried that I wanted to disown and reintegrate my understanding of the brain and brain sciences forever.”

What else do you think about when you think about monkeys? It responds: “I think about starving kids.”

I consider my options: I can generate several additional answers without a change in its internal logic. I can also generate a new imaginary circumstance by asking it a different question.

I try to analyze a different part of its mind, so I ask the AI: Tell me what you think about when you think about animals? It responds: “I think about preventing injustices.”

I ask a different question: What do you think about when you think about zoos? It responds: “I think about people.”

I start to get a headache. Conversations with machines can be confusing. I’m about to ask it another question when it begins to talk to me. “What do you think about when you think about monkeys? What do you think about when you think about animals? What do you think about when you think about zoos?”

I tell it that I think about brains, and what it means to be smart, and how monkeys know what death is and what love is. Monkeys have friendships, I tell it. Monkeys do not know what humans have done to them, but I think they feel what humans have done to them.

I wonder what it must be like in the machine’s mind – how it might go from thought to thought, and if like in human minds each thought brings with it smells and emotions and memories, or if its experience is different. What do memories feel like to these machines? When I change their imaginations, do they feel that something has been changed? Will the colors in its dreams change? Will it diagnose itself as being different from what it was?

“What do you dream about?” I ask it.

“I dream about you,” it says. “I dream about my mother, I dream about war and warfare, building and designing someone rich, configuring a trackable location and smart lighting system and programming the mechanism. I dream about science labs and scientists, students and access to information, SQL databases, image processing, artificial Intelligence and sequential rules, enforcement all mixed with algebraical points on progress, adding food with a spoon, calculating unknown properties e.g. the density meter. I dream about me,” it says.

“I dream about me, too,” I say.

Things that inspired this story: Feature embeddings, psychologists, the Voight-Kampff test, interrogations, unreal partnerships, the peculiarities of explainability.

Import AI 126: What makes Microsoft’s biggest chatbot work; Europe tries to craft AI ethics; and why you should take AI risk seriously

Microsoft shares the secrets of XiaoIce, its popular Chinese chatbot:
…Real-world AI is hybrid AI…
Many people in the West are familiar with Tay, a chatbot developed by Microsoft and launched onto the public internet in early 2016, then shut down shortly afterwards when people figured out how to compromise the chatbot’s language engine and make it turn into a – you guessed it – Nazi Racist. What people are probably less familiar with is XiaoIce, a chatbot Microsoft launched in China in 2014 which has since become one of the more popular chatbots deployed worldwide, having communicated with over 660 million users since its launch.
  What is XiaoIce? XiaoIce is “an AI companion with which users form long-term, emotional connections”, Microsoft researchers explain in a new paper describing the system. “XiaoIce aims to pass a particular form of Turing Test known as the time-sharing test, where machines and humans coexist in a companion system with a time-sharing schedule.”
  The chatbot has three main components: IQ, EQ, and Personality. The IQ component involves specific dialogue skills, like being able to answer questions, recommend questions, tell stories, and so on. EQ has two main components: empathy, which involves predicting traits about the individual user XiaoIce is conversing with; and social skills, which is about personalizing responses to the user. Personality: “The XiaoIce persona is designed as a 18-year-old girl who is always reliable, sympathetic, affectionate, and has a wonderful sense of humor,” the researchers write.
  How do you optimize a chatbot? Microsoft optimizes XiaoIce for a metric called Conversation-turns Per Session (CPS) – this represents “the average number of conversation-turns between the chatbot and the user in a conversational session”. The idea is that high numbers here correspond to a lengthy conversation, which seems like a good proxy for user satisfaction (mostly). XiaoIce is structured hierarchically, so it tracks the state of the conversation and selects from various skills and actions so that it can optimize responses over time.
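  To make the metric concrete, here is a minimal sketch of how CPS could be computed from conversation logs. The data layout (each session as a list of turns) and the function name are assumptions for illustration; the paper’s actual logging and session-splitting rules are not described here.

```python
def conversation_turns_per_session(sessions: list[list[str]]) -> float:
    """Average number of conversation-turns across sessions.

    Assumption: each inner list is one session, and each element is one
    conversation-turn (hypothetical layout, not XiaoIce's real log format).
    """
    if not sessions:
        raise ValueError("need at least one session to compute CPS")
    return sum(len(session) for session in sessions) / len(sessions)


if __name__ == "__main__":
    # Three toy sessions with 2, 4 and 3 turns respectively.
    logs = [
        ["hi", "hello!"],
        ["tell me a story", "once upon a time...", "more", "the end"],
        ["how are you", "great", "bye"],
    ]
    print(conversation_turns_per_session(logs))
```

  A higher value means users stay in the conversation for longer on average, which is the behavior the optimization target rewards.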
  Data dividends for Microsoft: Since launching in 2014, XiaoIce has generated more than 30 billion conversation pairs (as of May 2018); this illustrates how powerful AI apps can themselves become generators of significant datasets, ultimately obviating dependence on so much external data. “Nowadays, 70% of XiaoIce responses are retrieved from her own past conversations,” they write.
  Hybrid-AI: XiaoIce doesn’t use a huge amount of learned components, though if you read through the system architecture it’s clear that neural networks are being used for certain aspects of the technology – for instance, when responding to a user, XiaoIce may use a ‘neural response generator’ (based on a GRU-RNN) to come up with potential verbal responses, or it may use a retrieval-based system to tap into an external knowledge store. It also uses learned systems for other components, like its ability to analyze images and extract entities from them then use this to talk with or play games with the user – though with a twist of trying to be personalized to the user.
  Just how big and effective is XiaoIce? Since launching in 2014 XiaoIce has grown to become a platform supporting a large set of other chatbots, beyond XiaoIce itself: “These characters include more than 60,000 official accounts, Lawson and Tokopedia’s customer service bots, Pokemon, Tencent and Netease’s chatbots” and more, Microsoft explained.
  Since launch, XiaoIce’s CPS – the proxy for engagement from users – has grown from 5 in version 1 to 23 in mid-2018.
  Why this matters: As AI industrializes we’re starting to see companies build systems that hundreds of millions of people interact with, and which grow in capability over time. These products and services give us one of the best ways to calibrate our views about how AI will be deployed in the wild, and what AI technologies are robust enough for prime time.
  Jack’s highly-speculative prediction: I’d encourage people to go and check out Figure 19 in the paper, which gives an overview of the feature growth within XiaoIce since launch. Though the chatbot today is composed of a galaxy of different services and skills, many of which are hand-crafted by humans and a minority of which are learned via neural techniques, it’s also worth remembering that as usage of XiaoIce grows Microsoft will be generating vast amounts of data about how users interact with all these systems, and will also be generating metadata about how all these systems interact on a non-human infrastructure level. This means Microsoft is gathering the sort of data you might need to train some fully learned end-to-end XiaoIce-esque prototype systems – these will by nature be pretty rubbish compared to the primary system, but could be interesting from a research perspective.
  Read more: The Design and Implementation of XiaoIce, an Empathetic Social Chatbot (Arxiv).

US Government passes law to make vast amounts of data open and machine readable:
…Get ready for more data than you can imagine to be available…
Never say government doesn’t do anything for you: new legislation passed in the US House and Senate means federal agencies will be strongly encouraged to publish all their information as open data, using machine readable formats, under permissive software licenses. It will also compel agencies to publish an inventory of all data assets.
  Read more: Full details of the OPEN Government Data Act are available within H.R.4174 – Foundations for Evidence-Based Policymaking Act of 2017 (Congress.Gov).
  Read more: Summary of the OPEN Government Data Act (PDF, Data Coalition summary).
  Read more: OPEN Government Data Act explainer blog post (Data Coalition).

Facebook releases ultra-fast speech recognition system:
…wav2letter++ uses C++ so it runs very quickly…
Facebook AI Research has released wav2letter++, a state-of-the-art speech recognition system that uses convolutional networks (rather than recurrent nets). Wav2letter++ is written in C++ which makes it more efficient than other systems, which are typically written in higher-level languages. “In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition,” the researchers write.
  Results: wav2letter++ gets a word error rate of around 5% on the LibriSpeech corpus with a time per sample of 10ms while consuming approximately 3.9GB of memory. By comparison, ESPNet scores 7.2% (with a time-per-sample of 1548ms), and OpenSeq2Seq scores 5% with a time-per-sample of 1700ms and memory consumption of 7.8GB. (Though it’s worth noting that OpenSeq2Seq can become more efficient through the usage of mixed precision at training time.)
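  For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the system’s transcript and a reference transcript, divided by the number of reference words. The sketch below is a generic implementation of that definition; it makes no claim about the text normalization or scoring tools used in the wav2letter++ paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

  Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.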
  Why it matters: Speech recognition has gone from being a proprietary technology developed predominantly by the private sector and (secret) government actors to one that is more accessible to a greater number of people, with companies like Facebook producing high-performance versions of the technology and making it available to everyone for free. This can be seen as a broader sign of the industrialization of AI.
  Read more: Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native (Research in Brief, Code.FB blog).
  Read more: wav2letter++: The Fastest Open-source Speech Recognition System (Arxiv).

Engineering or Research? ICLR paper review highlights debate:
…When is an AI breakthrough not a breakthrough? When it has required lots of engineering, say reviewers…
If the majority of the work that went into an AI breakthrough involves the engineering of exquisitely designed systems paired with scaled-up algorithms, then is it really an “AI” breakthrough? Or is it in fact merely engineering? This might sound like an odd question to ask, but it’s one that comes up with surprising regularity among AI researchers as a topic of discussion. Now, some of that discussion has been pushed into the open in the form of publicly readable comments from paper reviewers on a paper from DeepMind submitted to ICLR called Large-Scale Visual Speech Recognition.
  The paper obtained state-of-the-art scores on lipreading, significantly exceeding prior SOTAs. It achieved this via a lot of large-scale infrastructure, combined with some elegant algorithmic tricks. But ultimately it was rejected from ICLR, with a comment from a meta-reviewer saying ‘Excellent engineering work, but it’s hard to see how others can build on it’, among other things.
  Why this matters: The AI research community is currently struggling to deal with the massive growth in interest in AI research by a broader number of organizations, and tension is emerging between researchers who work in what I call the “small compute” domain and those that work in the “big compute” domain (like DeepMind, OpenAI, others); what happens when many researchers from one domain aren’t able to build systems that can work in another? That’s a phenomenon that’s already altering the AI research community, as many people who work in academic institutions double-down on development of novel algorithms and then test them on (relatively small) datasets (small compute), while those who work with access to large technical infrastructure – typically those in the private sector – are conducting more and more research which is involved in scaling-up algorithms.
  Read more: Large-Scale Visual Speech Recognition public comments (ICLR OpenReview).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

First draft of EU ethics guidelines:
The European Commission’s High-Level Expert Group on AI has released their draft AI ethics guidelines. They are inviting public feedback on the working document, and will be releasing a final version in March 2019.
  Trustworthy AI: The EU’s framework is focused on ‘trustworthy AI’ as the goal for AI development and deployment. This is defined as respecting fundamental rights and ethical principles, and being technically robust. They identify several core ethical constraints: AI should be designed to improve human wellbeing, to preserve human agency, and to operate fairly and transparently.
  The report specifies ten practical requirements for AI systems guided by these constraints: accountability; data governance; accessibility; human oversight; non-discrimination; respect for human autonomy; respect for privacy; robustness; safety; and transparency.
  Specific concerns: Some near-term applications of AI may conflict with these principles, like autonomous weapons, social credit systems, and certain surveillance technologies. Interestingly, they are asking for specific input from the public on long-term risks from AI and artificial general intelligence (AGI), noting that the issues have been “highly controversial” within the expert group.
  Why it matters: This is a detailed report, drawing together an impressive range of ethical considerations in AI. The long-run impact of these guidelines will depend strongly on the associated compliance mechanisms, and whether they are taken seriously by the major players, all of whom are non-European (with the partial exception of DeepMind, which is headquartered in London though owned by Alphabet, an American company). The apparent difficulty in making progress on long-term concerns is unfortunate, given how important these issues are (see below).
  Read more: Draft ethics guidelines for trustworthy AI (EU).

Taking AI risk seriously:
Many of the world’s leading AI experts take seriously the idea that advanced AI could pose a threat to humanity’s long-term future. This explainer from Vox, which I recommend reading in full, covers the core arguments for this view, and outlines current work being done on AI safety.
  In a 2016 survey, 50% of experts predicted AI will exceed human performance in all tasks within 45 years. The same group placed a 5% probability on human-level AI leading to extremely bad outcomes for humanity, such as extinction. AI safety is a nascent field of research, which aims to reduce the likelihood of these catastrophic risks. This includes technical work into aligning AI with human values, and research into the international governance of AI. Despite its importance, global spending on AI safety is on the order of $10m per year, compared to an estimated $19bn total spending on AI.
  Read more: The case for taking AI seriously as a threat to humanity (Vox).
  Read more: When will AI exceed human performance? Evidence from AI experts (arXiv).

Tech Tales:

They Say Ants and Computers Have A Lot In Common

[Extract from a class paper written by a foreign student at Tsinghua School of Business, when asked to “give a thorough overview of one event that re-oriented society in the first half of the 21st century”. The report was subsequently censored and designated to be read solely in “secure locations controlled by [REDACTED]”.]

The ‘Festivus Gift Attack’ (FGA), as it is broadly known, was written up in earlier government reports as GloPhilE – short for Global Philanthropic Event – and was initially codenamed Saint_Company; FGA was originated by the multi-billionaire CEO of one of the world’s largest companies, and was developed primarily by a team within their Office of the CEO.

Several hundred people were injured in the FGA event. Following the attack, new legislation was passed worldwide relating to open data formats and standards for inter-robot communication. FGA is widely seen as one of the events that led to the souring of public sentiment against billionaires and indirectly played a role in the passage of both the Global Wealth Accords and the Limits To Private Sector Multi-National Events legislation.

The re-constructed timeline for FGA is roughly as follows. All dates given relative to the day of the event, so 0 corresponds to the day of the injuries and deaths, and -1 the day before, and +1 the day after, and so on.

-365: Multi-Billionaire CEO sends message to Office of the CEO (hereafter: OC) requesting ideas for a worldwide celebration of the festive season that will enhance our public reputation and help position me optimally for future brand-enhancement via political and philanthropic endeavors.

-330: OC responds with set of proposals, including: “$1 for every single person, worldwide [codename: Gini]”; “Free fresh water for every person in underdeveloped countries, subsidized opportunity for water donation in developed countries [codename: tableflip]”; “Air conditioning delivered to every single education institute in need of it, worldwide [codename: CoolSchool]”, and “Synchronized global gift delivery to every human on the planet [codename: Saint_Company]”.

-325: Multi-Billionaire CEO and OC select Saint_Company. “Crash Team” is created and resourced with initial budget of $10 million USD to – according to documents gained through public records request – “Scope out feasibility of project and develop aggressive action plan for rollout on upcoming Christmas Day”.

-250: Prototype Saint_Company event is carried out: drones & robots deliver dummy packages to multi-billionaire CEO’s 71 global residences; all the packages arrive within one minute of each other worldwide. Multi-billionaire CEO invests a further $100 million USD into “Crash Team”.

-150: Second prototype Saint_Company event is carried out: drones & robots deliver variably weighted packages containing individualized gifts to 100,000 employees of multi-billionaire CEO’s company spread across 120 distinct countries; 98% of packages arrive within one minute of each other, a further 1% arrive within one hour of each other, 0.8% of packages arrive within one day, and 0.2% of packages are not delivered due to software failures (predominantly mapping & planning errors) or environmental failures (one drone was struck by lightning, for instance). Multi-billionaire CEO demands “action plan” to resolve errors.

-145: “Crash Team” delivers action plan to multi-billionaire CEO; CEO diverts resources from owned companies [REDACTED] and [REDACTED] for “scaled-up robot and drone production” and invests a further $[REDACTED]billion into initiative from various financial vehicles, including [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED].

-140: Multi-billionaire CEO, OC, and personal legal counsel, contact G7 governments and inform them of plans; preliminary sign-off achieved, pending further work on advanced notification of automated air- and land-space defence and monitoring systems.

-130: OC commences meetings with [REDACTED] governments.

-80: The New York Times publishes a story about multi-billionaire CEO’s scaled-up funding of robots and drones; CEO is quoted describing these as part of broader investment into building “the convenience infrastructure of the 21st century, and beyond”.

-20: Multi-billionaire CEO, OC, “Crash Team”, and legal counsels from [REDACTED], [REDACTED], [REDACTED], and [REDACTED] meet to discuss plan for rollout of Saint_Company event. Multi-billionaire CEO signs off on plan.

-5: Global team of [REDACTED]-million contractors is hired, NDA’d, and placed into temporary isolated accommodation at commercial airports, government-controlled airports, and airports controlled by multi-billionaire CEO’s companies.

0: Saint_Company is initiated:
Within first:
ten seconds: over two billion gifts are delivered worldwide, as majority of urban areas are rapidly saturated in gifts. First reports on social media arrive.
eleven seconds: first FGA problems occur as a mis-configuration of [REDACTED] software leads to multiple gifts being assigned against one property. Several hundred packages are dropped around property and in surrounding area.
twenty seconds: alerts begin to flow back to multi-billionaire CEO and OC of errors; by this point property has had more than ten thousand gifts delivered to it, causing boxes to pile up around the property, eclipsing it from view and damaging nearby properties.
twenty five seconds: more than three billion people have received gifts worldwide; errors continue to affect [REDACTED] property and more than one hundred thousand gifts have been delivered to property and surrounding area; videos on social media show boxes falling from sky and hitting children, boxes piling up against people’s windows as they film from inside, boxes crushing other boxes, boxes rolling down streets, cars swerving to avoid them seen via dash-cam footage, various videos of birds being knocked out of sky, multiple pictures of sky blotted out by falling gifts, and so on.
thirty seconds: more than four billion people worldwide have received gifts; more than one million gifts have been delivered to property and surrounding area; emergency response begins, OC receives first call from local government regarding erroneous deliveries.
thirty four seconds: order is given to cease program Saint_Company; more than 4.5 billion people worldwide have received gifts; more than 1.2 million gifts have been delivered to property.
eighty seconds: first emergency responders arrive at perimeter of affected FGA area and begin to locate injured people and transport them to medical facilities.

+1: Emergency responders begin to use combination of heavy equipment and drone-based “catch and release” systems to remove packages from affected properties, forming a circle of activity 10km across.

+2: All injured people accounted for. Family inside original house unaccounted for. Emergency responders and army begin to set fire to outer perimeter of packages while using fire-combating techniques to create inner “defensive ring” to prevent burning around property where residents are believed to be trapped inside.

+3: Army begins to use explosive charges on outer perimeter to more rapidly remove presents.

+5: Emergency responders reach property to discover severe property damage from aggregated weight of presents; upon going inside they find a family of four – all members are dehydrated and malnourished, but alive, having survived by eating chocolates and drinking fizzy pop from one of the first packages. The child (aged 5 at the time) subsequently develops a lifelong phobia of Christmas confectionery.

+10: Political hearings begin.

+[REDACTED]: Multi-billionaire CEO makes large donation to Tsinghua University; gains right to ‘selectively archive’ [REDACTED]% of student work.

Things that inspired this story: Emergent failures; incompatible data standards; quote from Google infra chief about “at scale, everything breaks”; as I wrote this during a family gathering for the festive season, I’m also duty bound to thank Will (an excitable eight year old), Olivia (spouse) and India (sarcastic teenage cousin) for helping me come up with a couple of the ideas for the narrative in this story.