Import AI

Import AI: 124: Google researchers produce metric that could help us track the evolution of fake video news; $4000 grants for people to teach deep learning; creating aggressive self-driving cars.

Using AI to learn to design networks with multiple constraints:
…InstaNAS lets people add multiple specifications to neural architecture search…
In the past couple of years researchers have started to use AI techniques such as reinforcement learning and evolution to design neural network architectures. This has already yielded numerous systems that display state-of-the-art performance on challenging tasks like image recognition, outperforming architectures designed by humans.
More recently, we’ve seen a further push to make such so-called ‘neural architecture search’ (NAS) systems efficient, and approaches like ENAS (Import AI #82)  and SMASH (Import AI #56) have shown how to take systems that previously required hundreds of GPUs and fit them onto one or two GPUs.
  Now, researchers are beginning to explore along another axis of the NAS space: developing techniques that let them provide multiple objectives to the NAS system, so they can specify networks against different constraints. New research from National Tsing-Hua University in Taiwan and Google Research introduces InstaNAS, a system that lets people specify two categories of objectives as search targets: task-dependent objectives (eg, accuracy on a given classification task) and architecture-level objectives (eg, latency/computational cost).
  How it works: Training InstaNAS systems involves three phases of work: pre-training a one-shot model; introducing a controller which learns to select architectures from the one-shot model with respect to each input instance (during this stage, “the controller and the one-shot model are being trained alternatively, which enforces the one-shot model to adapt to the distribution change of the controller”, the researchers write); and a final stage in which the system picks the controller that best satisfies the constraints and re-trains the one-shot model with that high-performing controller.
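  A minimal sketch of the idea (not the paper's exact formulation): an instance-aware controller samples an architecture per input and is rewarded for being accurate while staying under a latency budget. The penalty form and the alpha weighting below are illustrative assumptions.

```python
# Hypothetical reward for a multi-objective NAS controller: combine a
# task-dependent objective (accuracy) with an architecture-level objective
# (latency). This is an illustration, not the InstaNAS formulation.

def controller_reward(accuracy, latency_ms, latency_target_ms=10.0, alpha=1.0):
    """High when the architecture sampled for this input is accurate AND fast."""
    latency_penalty = max(0.0, latency_ms - latency_target_ms) / latency_target_ms
    return accuracy - alpha * latency_penalty

# Two candidate architectures the controller might pick for a single input.
print(controller_reward(accuracy=0.96, latency_ms=14.0))  # accurate but over budget
print(controller_reward(accuracy=0.94, latency_ms=8.0))   # slightly less accurate, fast
```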
  Results: Systems trained with InstaNAS achieve 48.9% and 40.2% average latency reduction on CIFAR-10 and CIFAR-100 against MobileNetV2 with comparable accuracy scores. Accuracies do take a slight hit (eg, the best accuracy on an InstaNAS system is approximately 95.7%, compared to 96.6% for a NAS-trained system.)
  Why it matters: As we industrialize artificial intelligence we’re going to be offloading increasingly large chunks of AI development to AI systems themselves. The development and extension of NAS approaches will be crucial to this. Though we should bear in mind that there’s an implicit electricity<>human brain tradeoff we’re making here, and my intuition is that for some very large-scale NAS systems we could end up creating some hugely energy-hungry systems, which carry their own implicit (un-recognized) environmental externality.
  Read more: InstaNAS: Instance-aware Neural Architecture Search (Arxiv).

New metrics to let us work out when Fake Video News is going to become real:
…With a little help from StarCraft 2!…
Google Brain researchers have proposed a new metric to give researchers a better way to assess the quality of synthetically generated videos. The motivation for this research is that today we lack effective ways to assess and quantify improvements in synthetic video generation, and the history of the deep learning subfield of AI has tended to show that progress in a domain accelerates once the research community settles on a standard metric and/or dataset to assess progress against. (My pet theory for why this is: There are so many AI papers published these days that researchers need simple heuristics to tell them whether to invest time in reading something, and progress against a generally agreed upon shared dataset can be a good input for this – eg, ImageNet (Image Rec), Penn Treebank (NLU), Switchboard Hub5'00 (Speech Rec).)
  The metric: Frechet Video Distance (FVD): FVD has been designed to give scores that reflect not only the quality of a video, but also its temporal coherence – aka, the way things transition from frame to frame. FVD is built around what the researchers call an ‘Inflated 3D Convnet’, which has been used to solve tasks in other challenging video domains. Because this network is trained to spot actions in videos it contains useful feature relations that correspond to sequences of movements over time. FVD uses an Inflated 3D Convnet, trained on the Kinetics dataset of human-centered YouTube videos, to characterize the difference between the temporal transitions seen in synthetic videos and its own feature representations of physical movements derived from the real world.
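  At its core this is a Frechet-style distance: fit a Gaussian to the network's features for real videos and for generated videos, then measure the distance between the two distributions. A minimal sketch, using random arrays as stand-ins for the I3D features:

```python
# Frechet distance between two feature distributions, as used by FID/FVD-style
# metrics. In FVD the features would come from an Inflated 3D ConvNet trained
# on Kinetics; here random arrays stand in for them.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

real = np.random.randn(256, 64)        # (num_videos, feature_dim)
fake = np.random.randn(256, 64) + 0.5  # a shifted distribution scores worse
print(frechet_distance(real, fake))
```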
  The datasets: In tandem with FVD, the researchers introduce a new dataset based around StarCraft 2, a top-down real-time strategy game with lush, colorful graphics “to serve as an intermediate step towards real world video data sets.” These videos contain various different tasks in StarCraft 2 which are fairly self-explanatory – move unit to border; collect mineral shards; brawl; and road trip with medivac. The researchers provide 14,000 videos for each scenario.
  Results: FVD seems to be a metric that more closely tracks the scores humans give when performing a qualitative evaluation of synthetic videos. “It is clear that FVD is better equipped to rank models according to human perception of quality”.
  Why it matters: Synthetic videos are likely going to cause a large number of profound challenges in AI policy, as progression in this research domain yields immediate applications in the creation of automated propaganda. One of the most challenging things about this area – until now – has been the lack of available metrics to use to track progression here and thereby estimate when synthetic videos are likely going to become something ‘good enough’ for people to worry about in domains outside of AI research. “We believe that FVD and SCV will greatly benefit research in generative models of video in providing a well tailored, objective measure of progress,” they write.
     Read more: Towards Accurate Generative Models of Video: A New Metric & Challenges (Arxiv).
   Get the datasets from here.

Teaching self-driving cars how to drive aggressively:
…Fusing deep learning and model predictive control for aggressive robot cars…
Researchers with the Georgia Institute of Technology have created a self-driving car system that can successfully navigate a 1:5-scale ‘AutoRally’ vehicle along a dirt track at high speeds. This type of work paves the way for a future where self-driving cars can go off-road, and gives us indications for how militaries might be developing their own stealthy unmanned ground vehicles (UGVs).
  How it works: Fusing deep learning and model predictive control: To create the system, the researchers feed visual inputs from a monocular camera into either a static supervised classifier or a recurrent LSTM (they switch between the two according to the difficulty of the particular section of the map the vehicle is on), which uses this information to predict where the vehicle is against a pre-downloaded map schematic. They then feed this prediction into a GPU-based particle filter, which incorporates data from the vehicle's IMU and wheel speeds to further refine the estimate of where the vehicle is on the map.
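  A generic sketch of that fusion step (not the authors' GPU implementation): particles are propagated with odometry from the IMU and wheel speeds, then re-weighted by how well they agree with the network's predicted map position.

```python
# Minimal particle-filter step: predict with odometry, weight by agreement with
# the neural network's position estimate, then resample. Noise values are
# illustrative assumptions.
import numpy as np

def particle_filter_step(particles, odometry, nn_position, motion_noise=0.05, obs_noise=0.3):
    # 1) Predict: move every particle by the odometry estimate plus noise.
    particles = particles + odometry + np.random.randn(*particles.shape) * motion_noise
    # 2) Update: weight particles by likelihood under the network's estimate.
    sq_dist = np.sum((particles - nn_position) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2 * obs_noise ** 2))
    weights /= weights.sum()
    # 3) Resample particles in proportion to their weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = np.zeros((500, 2))  # (x, y) hypotheses on the map
particles = particle_filter_step(particles,
                                 odometry=np.array([0.4, 0.1]),
                                 nn_position=np.array([0.5, 0.05]))
print(particles.mean(axis=0))   # fused position estimate
```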
  Superhuman Wacky Races: The researchers test their system out on a complex dirt track at the Georgia Tech Autonomous Racing Facility. This track “includes turns of varying radius including a 180 degree hairpin and S curve, and a long straight section”. The AutoRally car is able to “repeatedly beat the best single lap performed by an experienced human test driver who provided all of the system identification data.”
  Why it matters: Papers like this show how hybrid systems – where deep learning is doing useful work as a single specific component – are likely going to yield useful applications in challenging domains. I expect the majority of applied robotics systems in the future to use modular systems combining the best of human-specified systems as well as function approximating systems based on deep learning.
  Read more: Vision-Based High Speed Driving with a Deep Dynamic Observer (Arxiv).

What does a robot economy look like and what rules might it need?
…Where AI, Robotics, and Fully Automated Luxury Communism collide…
As AI has grown more capable an increasing number of people have begun to think about what the implications are for the economy. One of the main questions that people contemplate is how to effectively incorporate a labor-light, capital-heavy, AI-infused economic sector (or substrate of the entire economy) into society in such a way as to increase societal stability rather than reduce it. A related question is: what would an economy look like where an increasing chunk of economic activity happens as a consequence of semi-autonomous robots, many of which are also providing (automated) services to each other? These are the questions that researchers with the University of Texas at Austin try to answer in a new paper interrogating the implications of a robot-driven economy.
  Three laws of the robot economy: The researchers propose three potential laws for such a robot economy. These are:
– A robot economy has to be developed within the framework of the digital economy, so it can interface with existing economic systems.
– The economy of robots must have internal capital that can support the market and reflect the value of the participation of robots in our society.
– Robots should not have property rights and will have to operate only on the basis of contractual responsibility, so that humans control the economy, not the machines.
   Tools to build the robot economy: So, what will it take to build such a world? We’d likely need to develop the following tools:
– Create a network to track the status and implication of tasks given to or conducted by robots in accordance with the terms of a digital contract.
– A real-time communication system to let robots and people communicate with one another.
– The ability to use “smart contracts” via the blockchain to govern these economic interactions. (This means that “neither the will of the parties to comply with their word nor the dependence on a third party (i.e. a legal system) is required”.)
  What does a robot economy mean for society? If we manage to make it through a (fairly unsteady, frightening) economic transition into a robot economy, then some very interesting things start to happen: “the most important fact is that in the long-term, intelligent robotics has the potential to overcome the physical limitations of capital and labor and open up new sources of value and growth”, write the researchers. This would provide the opportunity for vast economic abundance for all of mankind, if taxation and political systems can be adjusted to effectively distribute the dividends of an AI-driven economy.
  Why it matters: Figuring out exactly how society is going to be influenced by AI is one of the grand challenges of contemporary research into the impacts of AI on society. Papers like this suggest that such an economy will have very strange properties compared to our current one, and will likely demand new policy solutions.
  Read more: Robot Economy: Ready or Not, Here It Comes (Arxiv).

Want to teach others the fundamentals of deep learning? Want financial support? Apply for the Depth First Learning Fellowship!
…Applications open now for $4000 grants to help people teach others deep learning…
Depth First Learning, an AI education initiative from researchers at NYU, FAIR, DeepMind, and Google Brain, has announced the ‘Depth First Learning Fellowship’, sponsored by Jane Street.
  How the fellowship works: Successful DFL Fellowship applicants will be expected to design a curriculum and lead a DFL study group around a particular aspect of deep learning. DFL is looking for applicants with the following traits: mathematical maturity; effectiveness at scientific communication; the ability to commit to ensuring the DFL study sessions are useful; and a general enjoyment of group learning.
  Applications close on February 15th 2019.
  Apply here (Depth First Learning).

Tired of classifying handwritten digits? Then try CURSIVE JAPANESE instead:
…Researchers release an MNIST replacement; if data is political, then the arrival of cursive Japanese alongside MNIST broadens our data-political horizon…
For over two decades AI researchers have benchmarked the effectiveness of various supervised and unsupervised learning techniques against performance on MNIST, a dataset consisting of a multitude of heavily pixelated black-and-white handwritten digits. Now, researchers linked to the Center for Open Data in the Humanities, MILA in Montreal, the National Institute of Japanese Literature, Google Brain, and a high school in England (a young, major Kaggle winner!), have released “Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji”.
  Keeping a language alive with deep learning: One of the motivations for this research is to help people access Japan’s own past, as the cursive script used by this dataset is no longer taught in the official school curriculum. “Even though Kuzushiji had been used for over 1000 years, most Japanese natives today cannot read books written or published over 150 years ago,” they write.
  The data: The underlying Kuzushiji dataset was built by scanning around 300,000 Japanese books, transcribing some of them, and adding bounding boxes to the characters. The full dataset consists of 3,999 character types across 403,242 characters. The datasets being released by the researchers were made as follows: “We pre-processed characters scanned from 35 classical books printed in the 18th century and organized the dataset into 3 parts: (1) Kuzushiji-MNIST, a drop-in replacement for the MNIST [16] dataset, (2) Kuzushiji-49, a much larger, but imbalanced dataset containing 48 Hiragana characters and one Hiragana iteration mark, and (3) Kuzushiji-Kanji, an imbalanced dataset of 3832 Kanji characters, including rare characters with very few samples.”
  Dataset difficulty: In tests the researchers demonstrate that these datasets are going to be more challenging for AI researchers to work with than MNIST itself – in baseline tests they show that many techniques that get above 99% classification accuracy on MNIST get between 95% and 98% on the Kuzushiji-MNIST drop-in, and scores only as high as around 97% on Kuzushiji-49.
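  Because Kuzushiji-MNIST is a drop-in replacement (28x28 grayscale images, 10 classes), existing MNIST pipelines only need their data arrays swapped. A minimal sketch of running a baseline on it; the .npz filenames and array keys below are assumptions, so check the release page for the exact names:

```python
# Train a simple baseline on Kuzushiji-MNIST as an MNIST drop-in. Filenames and
# the "arr_0" key are assumptions about the released .npz files.
import numpy as np
from sklearn.linear_model import LogisticRegression

x_train = np.load("kmnist-train-imgs.npz")["arr_0"].reshape(-1, 784) / 255.0
y_train = np.load("kmnist-train-labels.npz")["arr_0"]
x_test = np.load("kmnist-test-imgs.npz")["arr_0"].reshape(-1, 784) / 255.0
y_test = np.load("kmnist-test-labels.npz")["arr_0"]

clf = LogisticRegression(max_iter=200).fit(x_train[:10000], y_train[:10000])
print("test accuracy:", clf.score(x_test, y_test))
```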
  Why it matters: Work like this shows how as people think more intently about the underlying data sources of AI they can develop new approaches that can let researchers do good AI research while also broadening the range of cultural artefacts that are easily accessible to AI systems and methodologies.
  Read more: Deep Learning for Classical Japanese Literature (Arxiv).

OpenAI Bits & Pieces:

Want to test how general your agents are? Try out CoinRun:
We’ve released a new training environment, CoinRun, which provides a metric for an agent’s ability to transfer its experience to novel situations.
  Read more: Quantifying Generalization in Reinforcement Learning (OpenAI Blog).
  Get the CoinRun code here (OpenAI Github).

Deadline Extension for OpenAI’s Spinning Up in Deep RL workshop:
We’ve extended the deadline for applying to participate in a Deep RL workshop, at OpenAI in San Francisco.
  More details: The workshop will be held on February 2nd 2019 and will include lectures based on Spinning Up in Deep RL, a package of teaching materials that OpenAI recently released. Lectures will be followed by an afternoon hacking session, during which attendees can get guidance and feedback on their projects from some of OpenAI’s expert researchers.
   Applications will be open until December 15th.
Read more about Spinning Up in Deep RL (OpenAI Blog).
  Apply to attend the workshop by filling out this form (Google Forms).

Tech Tales:

Call it the ‘demographic time bomb’ (which is what the press calls it) or the ‘land of the living dead’ (which is what the tabloid press call it) or the ‘ageing population tendency among developed nations’ (which is what the economists call it), but I guess we should have seen it coming: old people's homes full of the walking dead and the near-sleeping living. Cities given over to millions of automatons and thousands of people. ‘Festivals of the living’ attended solely by those made of metal. The slow replacement of conscious life in the world from organic to synthetic.

It started like this: most people in most nations stopped having as many children. Fertility rates dropped. Everywhere became like Japan circa 2020: societies shaped by the ever-growing voting blocs composed of the old people, and the ever-shrinking voting blocs composed of the young.

The young tried to placate the old people with robots – this was their first and most fatal mistake.

It began, like most world-changing technologies, with toys: “Fake Baby 3000” was one of the early models; an ultra-high-end doll designed for the young females of the ultra-rich. Then after that came “Baby Trainer”, a robot designed to behave like a newborn child, intended for the rich wannabe parents of the world who would like to get some practice on a synthetic life before they birthed and cared for a real one. These robots were a phenomenal success and, much like the early 21st Century market for drones, birthed an ecosystem of ever-more elaborate and advanced automatons.

Half a decade later, someone had the bright idea of putting these robots in old people's homes. The theory went like this: regular social interactions – and in particular, emotionally resonant ones – have a long history of helping to prevent the various medical degradations of old age (especially cognitive ones). So why not let old people's hardwired parental instincts do the job of dealing with ‘senescence-related health issues’, as one of the marketing brochures put it? It was an instant success. Crowds of the increasingly large populations of old people began caring for the baby robots – and they started to live longer, with fewer of them going insane in their old age. And as they became healthier and more active, they were able to vote in elections for longer periods of time, and further impart their view of the world onto the rest of society.

Next, the old demanded that the robot babies be upgraded to robot children, and society obliged. Now the homes became filled with clanking metal kids, playing games on StairMasters and stealing ice from the kitchen to throw at each other, finding the novel temperature sensation exciting. The old loved these children and – combined with ongoing improvements in healthcare – lived even longer. They taught the children to help them, and the homes of the old gained fabulous outdoor sculptures and meticulously tended lawns. Perhaps the AI kids were so bored they found this to be a good distraction? wrote one professor. Perhaps the AI kids loved their old people and wanted to help them? wrote another.

Around the world, societies are now on the verge of enacting various laws that would let us create robot adults to care for the ageing population. Metal people, working tirelessly in the service of their ‘parents’, standing in for the duties of the flesh-and-blood. Politics is demographics, and the demographics suggest the laws will be enacted, and the living-dead shall grow until they outnumber the dead-living.

Things that inspired this story: The robot economy, robotics, PARO the Therapeutic Robot, demographic time bombs, markets.

Import AI: 123: Facebook sees demands for deep learning services in its data centers grow by 3.5X; why advanced AI might require a global police force; and diagnosing natural disasters with deep learning

#GAN_Paint: Learn to paint with an AI system:
…Generating pictures out of neuron activations – a new, AI-infused photoshop filter…
MIT researchers have figured out how to extract more information from trained generative adversarial networks, letting them identify specific ‘neurons’ in the network that correlate to specific visual concepts. They’ve built a website that lets anyone learn to paint with these systems. The effect is akin to having a competent, ultra-fast painter standing at your shoulder: you broadly spraypaint an area where you’d like, for instance, some sky, and the software activates the relevant ‘neuron’ in the GAN model and uses it to paint that part of the image for you.
  Why it matters: Demos like this give a broader set of people a more natural way to interact with contemporary AI research, and help us develop intuitions about how the technology behaves.
  Paint with an AI yourself here: GANpaint (MIT-IBM Watson AI Lab website).
  Read more about the research here: GAN Dissection: Visualizing and Understanding Generative Adversarial Networks (MIT CSAIL).
  Paint with a GAN here (GANPaint website).

DeepMind says the future of AI safety is all about agents that learn their own reward functions:
…History shows that human-specified reward functions are brittle and prone to creating agents with unsafe behaviors…
Researchers with DeepMind have laid out a long-term strategy for creating AI agents that do what humans want in complex domains where it is difficult for humans to construct an appropriate reward function.
  The basic idea here is that, to create safe AI agents, we want agents that learn an appropriate reward function by collecting information from the (typically human) user, after which we can use reinforcement learning to optimize this learned reward function. The nice thing about this approach, according to DeepMind, is that it should work for agents that have the potential to become smarter than humans: “agents trained with reward modeling can assist the user in the evaluation process when training the next agent”.
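  A minimal sketch of that two-step loop, with a Bradley-Terry-style preference loss standing in for whatever form the user feedback actually takes (the shapes, network, and loss are illustrative assumptions, not DeepMind's implementation):

```python
# (1) Fit a reward model to user preference comparisons between behaviour clips;
# (2) later, train an RL agent against the learned reward instead of a
# hand-written one. Architecture and loss here are illustrative assumptions.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(clip_a, clip_b, human_prefers_a):
    """clip_a, clip_b: (timesteps, obs_dim) tensors for two behaviour clips."""
    r_a = reward_model(clip_a).sum()
    r_b = reward_model(clip_b).sum()
    p_a = torch.sigmoid(r_a - r_b)  # probability assigned to "clip A is better"
    return -torch.log(p_a if human_prefers_a else 1 - p_a)

# One dummy update from a single (fake) human comparison.
loss = preference_loss(torch.randn(20, 8), torch.randn(20, 8), human_prefers_a=True)
opt.zero_grad(); loss.backward(); opt.step()

# An RL agent would then be trained to maximise reward_model(obs) over time.
```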
  A long-term alignment strategy: DeepMind thinks that this approach potentially has three properties that give it a chance of being adopted by researchers: it is scalable, it is economical, and it is pragmatic.
  Next steps: The researchers say these ideas are “shovel-ready for empirical research today”. The company believes that “deep RL is a particularly promising technique for solving real-world problems. However, in order to unlock its potential, we need to train agents in the absence of well-specified reward functions.” This research agenda sketches out ways to do that.
  Challenges: Reward modeling has a few challenges, which are as follows:
– Amount of feedback: how much data you need to collect for the agent to successfully learn the reward function.
– Distribution of feedback: the agent visits new states which lead to it generating a higher perceived reward for actions that are in reality sub-optimal.
– Reward hacking: the agent finds a way to exploit the task to give itself reward, learning a function that does not reflect the implicit expressed wishes of the user.
– Unacceptable outcomes: taking actions that a human would likely never approve, such as an industrial robot breaking its own hardware to achieve a task, or a personal assistant automatically writing a very rude email.
– The reward-result gap: the gap between the optimal reward model and the reward function actually learned by the agent.
DeepMind thinks that each of these challenges can potentially be dealt with by some specific technical approaches, and today there exist several distinct ways to tackle each of the challenges, which seems to increase the chance of one working out satisfactorily.
  Why it might matter: Human empowerment: Putting aside the general utility of having AI agents that can learn to do difficult things in hard domains without inflicting harm on humans, this research agenda also implies something else, which isn’t directly discussed in the paper: it offers a way to empower humans with AI. If AI systems continue to scale in capability then it seems likely that in a matter of decades we will fill society with very large AI systems which large numbers of people interact with. We can see the initial outlines of this today in the form of large-scale surveillance systems being deployed in countries like China; in self-driving car fleets being rolled out in increasing numbers in places like Phoenix, Arizona (via Google Waymo); and so on. I wonder what it might be like if we could figure out a way to maximize the number of people in society who were engaged in training AI agents via expressing preferences. After all, the central mandate of many of the world’s political systems comes from people regularly expressing their preferences via voting (and, yes, these systems are a bit rickety and unstable at the moment, but I’m a bit of an optimist here). Could we better align society with increasingly powerful AI systems by more deeply integrating a wider subset of society into the training and development of AI systems?
  Read more: Scalable agent alignment via reward modeling: a research direction (Arxiv).

Global police, global government likely necessary to ensure stability from powerful AI, says Bostrom:
…If it turns out we’re playing with a rigged slot machine, then how do we make ourselves safe?…
Nick Bostrom, researcher and author of Superintelligence (which influenced the thinking of a large number of people with regard to AI), has published new research in which he tries to figure out what problems policymakers might encounter if it turns out planet earth is a “vulnerable world”; that is, a world in which “there is some level of technological development at which civilization almost certainly gets devastated by default”.
  Bostrom’s analysis compares the process of technological development to a person or group of people steadily withdrawing balls from a vase. Most balls are white (beneficial, eg medicines), while some are of various shades of gray (for instance, technologies that can equally power industry or warmaking). What Bostrom’s Vulnerable World Hypothesis paper worries about is whether we could at some point withdraw a “black ball” from the vase. This would be “a technology that invariably or by default destroys the civilization that invents it. The reason is not that we have been particularly careful or wise in our technology policy. We have just been lucky.”
  In this research, Bostrom creates a framework for thinking about the different types of risks that such balls could embody, and outlines some ideas for potential (extreme!) policy responses to allow civilization to prepare for such a black ball.
  Types of risks: To help us think about these black balls, Bostrom lays out a few different types of civilization vulnerability that could be stressed by such technologies.
  Type-1 (“easy nukes”): “There is some technology which is so destructive and so easy to use that, given the semi-anarchic default condition, the actions of actors in the apocalyptic residual make civilizational devastation extremely likely”.
  Type-2a (“safe first strike”): “There is some level of technology at which powerful actors have the ability to produce civilization-devastating harms and, in the semi-anarchic default condition, face incentives to use that ability”.
  Type-2b (“worse global warming”): “There is some level of technology at which, in the semi-anarchic default condition, a great many actors face incentives to take some slightly damaging action such that the combined effect of those actions is civilizational devastation”.
  Type-0: “There is some level of technology that carries a hidden risk such that the default outcome when it is discovered is inadvertent civilizational devastation”.
  Policy responses for a risky world: bad ideas: How could we make a world with any of these vulnerabilities safe and stable? Bostrom initially considers four options then puts aside two as being unlikely to yield sufficient stability to be worth pursuing. These discarded ideas are to: restrict technological development, and “ensure that there does not exist a large population of actors representing a wide and recognizably human distribution of motives” (aka, brainwashing).
  Policy responses for a risky world: good ideas: There are potentially two types of policy response that Bostrom says could increase the safety and stability of the world. These are to adopt “Preventive policing” (which he also gives the deliberately inflammatory nickname “High-tech Panopticon”), as well as “global governance”. Both of these policy approaches are challenging. Preventive policing would require all states being able to “monitor their citizens closely enough to allow them to intercept anybody who begins preparing an act of mass destruction”. Global governance is necessary because states will need “to extremely reliably suppress activities that are very strongly disapproved of by a very large supermajority of the population (and of power-weighted domestic stakeholders)”, Bostrom writes.
  Why it matters: Work like this grapples with one of the essential problems of AI research: are we developing a technology so powerful that it can fundamentally alter the landscape of technological risk, even more so than the discovery of nuclear fission? It seems unlikely that today’s AI systems fit this description, but it does seem plausible that future AI technologies could. What will we do, then? “Perhaps the reason why the world has failed to eliminate the risk of nuclear war is that the risk was insufficiently great? Had the risk been higher, one could eupeptically argue, then the necessary will to solve the global governance problem would have been found,” Bostrom writes.
  Read more: The Vulnerable World Hypothesis (Nick Bostrom’s website).

Facebook sees deep learning demand in its data centers grow by 3.5X in 3 years:
…What Facebook’s workloads look like today and what they might look like in the future…
A team of researchers from Facebook have tried to characterize the types of deep learning inference workloads running in the company’s data centers and predict how this might influence the way Facebook designs its infrastructure in the future.
  Hardware for AI data centers: So what kind of hardware might an AI-first data center need? Facebook believes servers should be built with the following concerns in mind: high memory bandwidth and capacity for embeddings; support for powerful matrix and vector engines; large on-chip memory for inference with small batches; support for half-precision floating-point computation.
  Inference, what is it good for? Facebook has the following major use cases for AI in its datacenters: providing personalized feeds, ranking, or recommendations; content understanding; and visual and natural language understanding.
  Facebook expects these workloads to evolve in the future: for recommenders, it suspects it will start to incorporate time into event-probability models, and imagines using larger embeddings in its models, which will increase their memory demands; for computer vision, it expects to do more transfer learning by fine-tuning pre-trained models on specific datasets, as well as exploring more convolution types, different batch sizes, and higher image resolutions to increase accuracy; for language, it expects to explore larger batch sizes, evaluate new types of model (like the Transformer), and move to deploying larger multi-lingual models.
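  To make the hardware asks above concrete, here is a toy, illustrative sketch of a recommendation-style model whose cost is dominated by a large embedding table, the kind of workload that stresses memory bandwidth and benefits from half-precision support. Sizes are arbitrary, not Facebook's:

```python
# Toy recommendation-style model: a large EmbeddingBag (memory-bandwidth heavy)
# feeding a small MLP. Dimensions are illustrative only.
import torch
import torch.nn as nn

class TinyRecommender(nn.Module):
    def __init__(self, num_ids=100_000, dim=64):
        super().__init__()
        self.embedding = nn.EmbeddingBag(num_ids, dim)  # dominates memory traffic
        self.mlp = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, ids, offsets):
        return self.mlp(self.embedding(ids, offsets))

model = TinyRecommender().eval()
ids = torch.randint(0, 100_000, (256,))
offsets = torch.arange(0, 256, 8)       # 32 "users", 8 sparse ids each
with torch.no_grad():
    scores = model(ids, offsets)        # fp32 inference
print(scores.shape)
# On supporting hardware, model.half() converts parameters to fp16, roughly
# halving the memory traffic of the embedding lookups.
```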
  Data-center workloads: The deep learning applications in Facebook’s data centers “have diverse compute patterns where matrices do not necessarily have “nice” square shapes. There are also many “long-tail” operators other than fully connected and convolutional layers. Therefore, in addition to matrix multiplication engines, hardware designers should consider general and powerful vector engines,” the researchers write.
  Why it matters: Papers like this give us a sense of all the finicky work required to deploy deep learning applications at scale, and indicates how computer design is going to change as a consequence of these workload demands. “Co-designing DL inference hardware for current and future DL models is an important but challenging problem,” the Facebook researchers write.
  Read more: Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications (Arxiv).

In the future, drones will heal the land following forest fires:
…Startup DroneSeed uses large drones + AI to create re-forestation engines…
TechCrunch has written a lengthy profile of DroneSeed, a startup that is using drones and AI to create systems that can reforest areas after wildfires.
  DroneSeed’s machines have “multispectral camera arrays, high-end lidar, six gallon tanks of herbicide and proprietary seed dispersal mechanisms,” according to TechCrunch. The drones can be used to map areas that have recently been burned up in forest fires, then can autonomously identify the areas where trees have a good chance to grow and can deliver seed-nutrient packages to those areas.
  Why it matters: I think we’re at the very beginning of exploring all the ways in which drones can be applied to nature and wildlife maintenance and enrichment, and examples like this feel like tantalizing prototypes of a future where we use drones to perform thousands of distinct civic services.
  Read more here: That night, a forest flew (TechCrunch).
  Check out DroneSeed’s twitter account here.

Learning to diagnose natural disaster damage, with deep learning:
…Facebook & CrowdAI research shows how to automate the analysis of natural disasters…
Researchers with satellite imagery startup CrowdAI and Facebook have shown how to use convolutional neural networks to provide automated assessment of damage to urban areas from natural disasters. In a paper submitted to the “AI for Social Good” workshop at NeurIPS 2018 (a prominent AI conference, formerly named NIPS) the team “propose to identify disaster-impacted areas by comparing the change in man-made features extracted from satellite imagery. Using a pre-trained semantic segmentation model we extract man-made features (e.g. roads, buildings) on the before and after imagery of the disaster affected area. Then, we compute the difference of the two segmentation masks to identify change.”
  Disaster Impact Index (DII): How do you measure the effect of a disaster? The researchers propose DII, which lets them calculate the semantic change that has occurred in different parts of satellite images, given the availability of before and after imagery. To test their approach they use large-scale satellite imagery datasets of land damaged by Hurricane Harvey and by fires near Santa Rosa. They show that they can use DII to automatically infer severe flooding and fire damage areas in both cases with a rough accuracy (judged by F1 score) of around 80%.
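  A minimal sketch of the core idea: segment man-made features before and after the disaster, then measure change per map tile. The exact DII normalization in the paper may differ; this is an illustrative approximation:

```python
# Per-tile change between before/after segmentation masks of man-made features.
# The normalization is an assumption, not the paper's exact DII definition.
import numpy as np

def disaster_impact(mask_before, mask_after, tile=64):
    """Binary masks (H, W) of detected man-made features; returns per-tile change."""
    h, w = mask_before.shape
    impact = np.zeros((h // tile, w // tile))
    for i in range(h // tile):
        for j in range(w // tile):
            before = mask_before[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            after = mask_after[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            denom = max(before.sum(), 1)
            impact[i, j] = (before & ~after).sum() / denom  # fraction of features lost
    return impact

before = np.random.rand(512, 512) > 0.5
after = before & (np.random.rand(512, 512) > 0.3)           # simulate damage
print(disaster_impact(before, after).round(2))
```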
  Why it matters: Deep learning-based techniques are making it cheaper and easier for people to train specific detectors over satellite imagery, altering the number of actors in the world who can experiment with surveillance technologies for both humanitarian purposes (as described here) and likely military ones as well. I think within half a decade it’s likely that governments could be tapping data feeds from large satellite fleets then using AI techniques to automatically diagnose damage from an ever-increasing number of disasters created by the chaos dividend of climate change.
  Read the paper: From Satellite Imagery to Disaster Insights (Facebook Research).

Deep learning for medical applications takes less data than you think:
…Stanford study suggests tens of thousands of images are sufficient for medical applications…
Stanford University researchers have shown that it takes a surprisingly small amount of data to teach neural networks how to automatically categorize chest radiographs. The researchers trained AlexNet, ResNet-18, and DenseNet-121 baselines on chest radiograph data, attempting to classify normal versus abnormal images. In tests, the researchers show that it is possible to obtain an area under the receiver operating characteristic curve (AUC) of 0.95 for a CNN model trained on 20,000 images, versus 0.96 for one trained on 200,000 images, suggesting that it may take less data than previously assumed to train effective AI medical classification tools. (By comparison, 2,000 images yields an AUC of 0.84, representing a significant accuracy penalty.)
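  The study design is easy to emulate at small scale: train the same binary classifier on increasingly large subsets and compare test AUC. An illustrative sketch, with a logistic regression on synthetic features standing in for the CNNs (none of this is the authors' code):

```python
# Compare test AUC of the same classifier trained on different dataset sizes.
# Synthetic data and a linear model stand in for radiographs and CNNs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_split(n, dim=50):
    x = rng.normal(size=(n, dim))
    y = (x[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n) > 0).astype(int)
    return x, y

x_test, y_test = make_split(5_000)
for n_train in [2_000, 20_000, 200_000]:
    x_tr, y_tr = make_split(n_train)
    clf = LogisticRegression(max_iter=500).fit(x_tr, y_tr)
    auc = roc_auc_score(y_test, clf.predict_proba(x_test)[:, 1])
    print(f"{n_train:>7} training examples -> AUC {auc:.3f}")
```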
  Data scaling and medical imagery: “While carefully adjudicated image labels are necessary for evaluation purposes, prospectively labeled single-annotator data sets of a scale modest enough (approximately 20,000 samples) to be available to many institutions are sufficient to train high-performance classifiers for this task.”
  Drawbacks: All the data used in this study was drawn from the same medical institution, so it’s possible that the data (or, plausibly, the patients) contain specific idiosyncrasies, meaning networks trained on this dataset might not generalize to imagery captured by other medical institutions.
  Why it matters: Studies like this show how today’s AI techniques are beginning to show good enough performance in clinical contexts that they will soon be deployed alongside doctors to make them more effective. It’ll be interesting to see whether the use of such technology can make healthcare more effective (healthcare is one of the rare industries where the addition of new technology frequently leads to cost increases rather than cost savings).
  Some kind of future: In an editorial published alongside the paper, Bram van Ginneken, from the Department of Radiology and Nuclear Medicine at Radboud University in the Netherlands, wonders if we could in the future create large, shared datasets that multiple institutions could use. This dataset “would benefit from training on a multicenter data set much larger than 20,000 or even 200,000 examinations. This larger size is needed to capture the diversity of data from different centers and to ensure that there are enough examples of relatively rare abnormal findings so that the network learns not to miss them,” he writes. “Such a large-scale system should be based on newly designed network architectures that take the full-resolution images as input. It would be advisable to train systems not only to provide a binary output label but also to detect specific regions in the images with specific abnormalities.”
  Read more: Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs (Jared Dunnmon Github / Radiology, PDF).
  Read the editorial: Deep Learning for Triage of Chest Radiographs: Should Every Institution Train Its Own System? (Jared Dunnmon Github / Radiology, PDF).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Amnesty and employees petition Google to end Project Dragonfly:
Google employees have published an open letter calling on Google to cancel Dragonfly, its censored search engine being developed for use within China. This follows similar calls by human rights organizations including Amnesty International for the company to suspend the project. The letter accuses the company of developing technologies that “aid the powerful in oppressing the vulnerable”, and of being complicit in the Chinese government’s surveillance programs and human rights abuses.
  Speaking in October about Dragonfly, CEO Sundar Pichai emphasized the need to balance Google’s values with the laws of countries in which they operate, and their core mission of providing information to everyone. Pichai will be testifying to the House Judiciary Committee in US Congress later this week.
  There are clear similarities between these protests and those over Project Maven earlier this year, which resulted in Google withdrawing from the controversial Pentagon contract, and establishing a set of AI principles.
  Read more: We are Google employees. Google must drop Dragonfly (Medium).
  Read more: Google must not capitulate to China’s censorship demands (Amnesty).

High-reliability organizations:
…Want to deploy safe, robust AI? You better make sure you have organizational processes as good as your technology…
As technologies become more powerful, risks from catastrophic errors increase. This is true for advanced AI, even in near-term use cases such as autonomous vehicles or face recognition. A key determinant of these risks will be the organizational environment through which AI is being deployed. New research from Tom Dietterich at Oregon State University applies insights from research into ‘high-reliability organizations’ to derive three lessons for the design of robust and safe human-AI systems.

  1. We should aim to create combined human-AI systems that become high-reliability organizations, e.g. by proactively monitoring the behaviour of human and AI elements, continuously modelling and minimizing risk, and supporting combined human-AI cooperation and planning.
  2. AI technology should not be deployed when it is impossible for surrounding human organizations to be highly reliable. For example, proposals to integrate face recognition with police body-cams in the US are problematic insofar as it is hard to imagine how to remove the risk of catastrophic errors from false positives, particularly in armed confrontations.
  3. AI systems should continuously monitor human organizations to check for threats to high-reliability. We should leverage AI to reduce human error and oversight, and empower systems to take corrective actions.

  Why this matters: Advanced AI technologies are already being deployed in settings with significant risks from error (e.g. medicine, justice), and the magnitude of these risks will increase as technologies become more powerful. There is an existing body of research into designing complex systems to minimize error risk, e.g. in nuclear facilities, that is relevant to thinking about AI deployment.
  Read more: Robust AI and robust human organizations (arXiv).

Efforts for autonomous weapons treaty stall:
The annual meeting of the Convention on Conventional Weapons (CCW) has concluded without a clear path towards an international treaty on lethal autonomous weapons. Five countries (Russia, US, Israel, Australia and South Korea) expressed their opposition to a new treaty. Russia successfully reduced the scheduled meetings for 2019 from 10 to 7 days, in what appears to be an effort to decrease the likelihood of progress towards an agreement.
  Read more: Handful of countries hamper discussion to ban killer robots at UN (FLI).

Tech Tales:

Wetware Timeshare

It’s never too hard to spot The Renters – you’ll find them clustered near reflective surfaces staring deeply into their own reflected eyes, or you’ll notice a crowd standing at the edge of a water fountain, periodically holding their arms out over the spray and methodically turning their limbs until they’re soaked through; or you’ll see one of them make their way around a buffet at a restaurant, taking precisely one piece from every available type of food.

The deal goes like this: run out of money? Have no options? No history of major cognitive damage in your family? Have the implant? If so, then you can rent your brain to a superintelligence. The market got going a few years ago, after we started letting the robots operate in our financial markets. Well, it turns out that despite all of our innovation in silicon, human brains are still amazingly powerful and, coupled with perceptive capabilities and the very expensive multi-million-years-of-evolution physical substrate, are an attractive “platform” for some of the artificial minds to offload processing tasks to.

Of course, you can set preferences: I want to be fully clothed at all times, I don’t want to have the machine speak through my voice, I would like to stay indoors, etc. Obviously setting these preferences can reduce the value of a given brain in the market, but that’s the choice of the human. If a machine bids on you then you can choose to accept the bid and if you do that it’s kind of like instant-anesthetic. Some people say they don’t feel anything but I always feel a little itch in the vein that runs up my neck. You’ll come around a few hours (or, for rare high-paying jobs, days) later and you’re typically in the place you started out (though some people have been known to come to on sailing ships, or in patches of wilderness, or in shopping malls holding bags and bags of goods bought by the AI).

Oh sure there are protests. And religious groups hate it, as you can imagine. But people volunteer for it all the time: some people do it just for the escape value, not for the money. The machines always thank any of the people they have rented brain capacity from, and their compliments might shed some light on what they’re doing with all of us: Thankyou subject 478783 we have improved our ability to understand the interaction of light and reflective surfaces; Thankyou subject 382148 we now know the appropriate skin:friction setting for the effective removal of old skin flakes; Thankyou subject 128349 we know now what it feels like to run to exhaustion; Thankyou subject 18283 we have seen sunrise through human eyes, and so on.

The machines tell us that they’re close to developing a technology that can let us rent access to their brains. “Step Into Our World: You’ll Be Surprised!” reads some of the early marketing materials.

Things that inspired this story: Brain-computer interfaces; the AI systems in Iain M Banks books.

Import AI: 122: Google obtains new ImageNet state-of-the-art with GPipe; drone learns to land more effectively than PD controller policy; and Facebook releases its ‘CherryPi’ StarCraft bot

Google obtains new ImageNet state-of-the-art accuracy with mammoth networks trained via ‘GPipe’ infrastructure:
…If you want to industrialize AI, you need to build infrastructure like GPipe…
Parameter growth = Performance Growth: The researchers note that the winner of the 2014 ImageNet competition had 4 million parameters in its model, while the winner of the 2017 challenge had 145.8 million parameters – a 36X increase in three years. GPipe, by comparison, can support models of up to almost 2-billion parameters across 8 accelerators.
  Pipeline parallelism via GPipe: GPipe is a distributed ML library that uses synchronous mini-batch gradient descent for training. It is designed to spread workloads across heterogeneous hardware systems (multiple types of chips) and comes with a bunch of inbuilt features which let it efficiently scale up model training, with the researchers reporting a (very rare) near-linear speedup: “with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks [with GPipe]” they write.
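  A toy sketch of the pipeline-parallel idea (not Google's library): split the network into sequential stages, each of which would live on its own accelerator, and cut each mini-batch into micro-batches that flow through the stages so devices can work concurrently:

```python
# Pipeline parallelism, schematically: sequential stages plus micro-batching.
# A real implementation would place each stage on its own device and overlap
# their execution; this sketch only shows the data flow.
import torch
import torch.nn as nn

stages = [nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(4)]
# In practice: stages[i].to(f"cuda:{i}") and a scheduler to overlap the stages.

def pipelined_forward(x, num_microbatches=8):
    outputs = []
    for micro in x.chunk(num_microbatches):
        for stage in stages:   # each stage would run on a different accelerator
            micro = stage(micro)
        outputs.append(micro)
    return torch.cat(outputs)

print(pipelined_forward(torch.randn(64, 256)).shape)
```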
  Results: To test out how effective GPipe is the researchers trained ResNet and AmoebaNet (previous ImageNet SOTA) networks on it, running the experiments on TPU-V2s, each of which has 8 accelerator cores and an aggregate memory of 64GB. Using this technique they were able to train a new ImageNet system with a state-of-the-art Top-1 Accuracy of 84.3% (up from 82.7%) and a Top-5 Accuracy of 97%.
  Why it matters: “Our work validates the hypothesis that bigger models and more computation would lead to higher model quality,” write the researchers. This trend of research bifurcating into large-compute and small-compute domains has significant ramifications for the ability of smaller entities (for instance, startups) to effectively compete with organizations with access to large computational infrastructure (eg, Google). A more troubling effect with long-term negative consequences is that at these compute scales it is almost impossible for academia to do research at the same scale as corporate research entities. I continue to worry that this will lead to a splitting of the AI research community and potentially the creation of the sort of factionalism and ‘us vs them’ attitude seen elsewhere in contemporary life.
Companies will seek to ameliorate this inequality of compute by releasing the artifacts of compute (eg, pre-trained models). Though this will go some way to empowering researchers it will fail to deal with the underlying problems which are systemic and likely require a policy solution (aka, more money for academia, and so on).
    Read more: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Arxiv).

Neural net beats tuned PD controller at tricky drone landing task:
…The future of drones: many neural modules…
A recent trend in AI research has been work showing that deep learning-based techniques can outperform hand-crafted rule-based systems in domains as varied as image recognition, speech recognition, and even the design of neural network architectures. Now, researchers with CalTech, Northeastern University, and the University of California at Irvine, have shown that it is possible to use neural networks to learn how to land quadcopters with a greater accuracy than a PD (proportional derivative) controller.
  Neural Lander: The researchers call their system the ‘Neural Lander’ and say it is designed “to improve the precision of quadrotor landing with guaranteed stability. Our approach directly learns the ground effect on coupled unsteady aerodynamics and vehicular dynamics…We evaluate Neural-Lander for trajectory tracking of quadrotor during take-off, landing and near ground maneuvers. Neural-Lander is able to land a quadrotor much more accurately than a naive PD controller with a pre-identified system.”
  Testing: The researchers evaluate their approach on a real world system and show that “compared to the PD controller, Neural-Lander can decrease error in z direction from 0.13m to zero, and mitigate average x and y drifts by 90% and 34% respectively, in 1D landing. Meanwhile, NeuralLander can decrease z error from 0.12m to zero, in 3D landing. We also empirically show that the DNN generalizes well to new test inputs outside the training domain.”
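  A hedged sketch of the general pattern at work here: a nominal PD controller augmented with a learned correction for unmodelled near-ground aerodynamics. The network, features, and gains are illustrative stand-ins, not the paper's architecture or control law:

```python
# PD control plus a learned residual term that compensates for ground effect.
# All numbers and the network are illustrative assumptions.
import torch
import torch.nn as nn

ground_effect = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

def landing_thrust(z, z_target, vz, rotor_speed, kp=2.0, kd=1.0, hover_thrust=9.81):
    pd_term = kp * (z_target - z) + kd * (0.0 - vz)            # nominal PD law
    features = torch.tensor([[z, vz, rotor_speed, hover_thrust]], dtype=torch.float32)
    with torch.no_grad():
        residual = ground_effect(features).item()              # predicted disturbance
    return hover_thrust + pd_term - residual                   # compensate for it

print(landing_thrust(z=0.3, z_target=0.0, vz=-0.2, rotor_speed=0.7))
```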
Why it matters: Systems like this show not only the broad utility of AI systems for diverse tasks, but also highlight how researchers are beginning to think about meshing these learnable modules into products. It’s likely of interest that one of the sponsors of this research was defense contractor Raytheon (though as with the vast majority of academic research it’s almost certain Raytheon did not have any particular role or input into this research, but rather has decided to broadly fund research into drone autonomy – nonetheless, this indicates the direction where major defense contractors think the future lies).
  Read more: Neural Lander: Stable Drone Landing Control using Learned Dynamics (Arxiv).
Watch videos of the Neural Lander in action (YouTube).

AI Research Group MIRI plans future insights to be “nondisclosed-by-default”:
…Research organization says recent progress, a desire for the ability to concentrate, and worries that its safety research will be increasingly related to capabilities research, mean it should go private…
Nate Soares, the executive director of AI research group MIRI, says the organization “recently decided to make most of its research ‘nondisclosed-by-default’, by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results”.
  MIRI is doing this because it thinks it can increase the pace of its research if it focuses on making research progress “rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences”, and because it is worried that some of its new research paths could have “capabilities insights” which would thereby speed the arrival of (in its view, unsafe-by-default) AGI. It also sees some merit to deliberate isolation, based on an observation that “historically, early-stage scientific work has often been done by people who were solitary or geographically isolated”.
  Why going quiet could be dangerous: MIRI acknowledges some of the potential risks of this approach, noting that it may make it more difficult for it to hire and evaluate researchers; makes it harder to get useful feedback on its ideas from other people around the world; increases the difficulty of it obtaining funding; and leading to various “social costs and logistical overhead” from keeping research private.
“Many of us are somewhat alarmed by the speed of recent machine learning progress”, Soares writes. That’s combined with the fact MIRI believes it is highly likely people will successfully develop artificial general intelligence at some point with or without safety. “Humanity doesn’t need coherent versions of [AI safety/alignment] concepts to hill-climb its way to AGI,” Soares writes. “Evolution hill-climbed that distance, and evolution had no model of what it was doing”.
  Money: MIRI benefited from the cryptocurrency boom in 2017, receiving millions of dollars in donations from people who had made money on the spike in Ethereum. It has subsequently gained further funding, so – having surpassed many of its initial fundraising goals – is able to plan for the long term.
  Secrecy is not so crazy: Many AI researchers are privately contemplating when and if certain bits of research should be taken private. This is driven by a combination of near-term concerns (AI’s dual use nature means people can easily repurpose an innovation made for one purpose to do something else), and longer-term worries around the potential development of powerful and unsafe systems. In OpenAI’s Charter, published in April this year, the company said “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research”.
  Read more: 2018 Update: Our New Research Directions (MIRI).
  Read more: OpenAI Charter (OpenAI Blog).

Hand-written bots beat deep learning bots in annual StarCraft: Brood War competition:
…A team of researchers from Samsung has won the annual AIIDE StarCraft competition…
Team Samsung SDS AI and Data Analytics (SAIDA) has won the annual StarCraft: Brood War tournament using a bot based on hand-written rules, beating out bots from other teams including Facebook, Stanford University, and Locutus. The win is significant for a couple of reasons: 1) the bots have massively improved compared to the previous year, and 2) a bot from Facebook (CherryPi) came relatively close to dethroning the hand-written SAIDA bot.
  Human supremacy? Not for long: “Members of the SAIDA team told me that they believe pro StarCraft players will be beaten less than a year from now,” wrote competition co-organizer Dave Churchill on Facebook. “I told them it was a fairly bold claim but they reassured me that was their viewpoint.”
  Why StarCraft matters: StarCraft is a complex, partially observable strategy game that involves a combination of long-term strategic decisions oriented around building an economy, traversing a tech tree, and building an army, and short-term decisions related to the micromanagement of specific units. Many AI researchers are using StarCraft (and its successor, StarCraft II) as a testbed for machine learning-based game playing systems.
  Read more here: AIIDE StarCraft Competition results page (AIIDE Conference).

Facebook gives details on CherryPi, its neural StarCraft bot:
…Shows that you can use reinforcement learning and a game database to learn better build orders…
Researchers with Facebook AI Research have given details on some of the innards of their “CherryPi” bot, which recently competed and came second in the annual StarCraft competition, held at AIIDE in Canada. Here, they focus on the challenge of teaching their bot to figure out which build order (out of a potential set of 25) it should pursue at any one point in time. This is challenging because StarCraft is partially observable – the map has fog-of-war, and until the late stages of a game it’s unlikely any single player is going to have a good sense of what is going on with other players – so figuring out the correct unit selection relies on a player being able to model this unknowable aspect of the game and judge appropriate actions. “While it is possible to tackle hidden state estimation separately and to provide a model with these estimates, we instead opt to perform estimation as an auxiliary prediction task alongside the default training objective,” they write.
  Method: The specific way the researchers get their system to work is by using an LSTM with 2048 cells (the same component was used by OpenAI in its ‘OpenAI Five’ Dota 2 system), training it on a Facebook-assembled dataset of 2.8 million games containing 3.3 million switches between build orders. They evaluate two variants of this system: visible, which counts units currently visible, and memory, which uses hard-coded rules to keep track of enemy units that were seen before but are currently hidden.
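  Code sketch: To make the auxiliary-prediction idea concrete, here is a minimal, illustrative PyTorch sketch (not Facebook’s actual code) of an LSTM policy that picks among 25 build orders while also being trained to predict hidden enemy unit counts from the same recurrent state; the dimensions, names, and loss weighting are assumptions.

```python
# Minimal sketch (assumption: not the actual CherryPi architecture) of a
# build-order policy with an auxiliary hidden-state estimation head.
import torch
import torch.nn as nn

class BuildOrderPolicy(nn.Module):
    def __init__(self, obs_dim=512, hidden=2048, n_build_orders=25, n_unit_types=100):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.build_order_head = nn.Linear(hidden, n_build_orders)   # main task
        self.enemy_units_head = nn.Linear(hidden, n_unit_types)     # auxiliary task

    def forward(self, obs_seq, state=None):
        h, state = self.lstm(torch.relu(self.encoder(obs_seq)), state)
        return self.build_order_head(h), self.enemy_units_head(h), state

# One training step: supervised on build-order switches from replays, plus an
# auxiliary regression loss on (hidden) enemy unit counts. All data is random.
model = BuildOrderPolicy()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
obs = torch.randn(8, 32, 512)                      # batch of observation sequences
target_orders = torch.randint(0, 25, (8, 32))      # build-order labels from replay data
target_units = torch.rand(8, 32, 100)              # normalized true enemy unit counts
logits, unit_preds, _ = model(obs)
loss = nn.functional.cross_entropy(logits.reshape(-1, 25), target_orders.reshape(-1)) \
       + 0.5 * nn.functional.mse_loss(unit_preds, target_units)
opt.zero_grad()
loss.backward()
opt.step()
```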
  Results: The researchers show that memory-based systems which perform hidden state estimation as an auxiliary task obtain superior scores to visible systems, and to systems trained without the auxiliary loss. These systems are able to obtain win rates of as high as 88% against inbuilt bots, and 60% to 70% against the Locutus and McRave bots (the #5 and #8 ranked bots at the AIIDE competition this year).
  Why it matters: If we zoom out from StarCraft and consider the problem domain it represents (partially observable environments where a player needs to make strategic decisions without full information) it’s clear that the growing applicability of learning approaches will have impacts on competitive scenarios in fields like logistics, supply chain management, and war. But these techniques still require an overwhelmingly large amount of data to be viable, suggesting that if people don’t have access to a simulator it’s going to be difficult to apply such systems.
  Read more: High-Level Strategy Selection under Partial Observability in StarCraft: Brood War (Arxiv).

Facebook releases TorchCraftAI, its StarCraft AI development platform:
…Open source release includes CherryPi bot from AIIDE, as well as tutorials, support for Linux, and more…
Facebook has also released TorchCraftAI, the platform it has used to develop CherryPi. TorchCraftAI includes “a modular framework for building StarCraft agents, where modules can be hacked with, replaced by other, or by ML/RL-trained models”, as well as tutorials, CherryPi, and support for TCP communication.
  Read more: Hello, Github (TorchCraftAI site).
  Get the code: TorchCraftAI (GitHub).

Training AI systems to spot people in disguise:
…Prototype research shows that deep learning systems can spot people in disguise, but more data from more realistic environments needed…
Researchers with IIIT-Delhi, IBM TJ Watson Research Center, and the University of Maryland have created a large-scale dataset, Disguised Faces in the Wild (DFW), which they say can potentially be used to train AI systems to identify people attempting to disguise themselves as someone else.
  DFW: The DFW dataset contains 11,157 pictures across 1,000 distinct human subjects. Each human subject is paired with pictures of them, as well as pictures of them in disguise, and pictures of impersonators (people that either intentionally or unintentionally bear a visual similarity to the subject). DFW is also pre-split into ‘easy’, ‘medium’, and ‘hard’ subsets, with the segmentation done according to the success rate of three baseline algorithms at correctly identifying the right faces.
  Can a neural network identify a disguised face in the wild? The researchers hosted a competition at CVPR 2018 to see which team could devise the best system for deciding whether a given image shows the genuine subject or an impersonator. They evaluate systems on two metrics: their Genuine Acceptance Rate (GAR) at a 1% False Acceptance Rate (FAR), and at the far harder 0.1% FAR. Top-scoring systems obtain scores of as high as 96.80% at 1% FAR and 57.64% at 0.1% FAR on the relatively easy task of telling true faces from impersonated faces; and scores of 87.82% at 1% FAR and 77.06% at 0.1% FAR on the more challenging task of dealing with deliberately obfuscated faces.
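  Code sketch: The GAR@FAR numbers above come from a standard verification-style evaluation: fix a similarity threshold so that only 1% (or 0.1%) of impostor pairs are accepted, then measure how many genuine pairs clear that threshold. Here is a small illustrative implementation over made-up scores (not DFW data).

```python
# Sketch of the GAR@FAR metric: pick the similarity threshold that yields the
# target false acceptance rate on impostor pairs, then report how many genuine
# pairs clear that threshold. Scores here are random placeholders.
import numpy as np

def gar_at_far(genuine_scores, impostor_scores, target_far=0.01):
    # Threshold = the (1 - target_far) quantile of impostor similarities,
    # so only target_far of impostor pairs are (wrongly) accepted.
    threshold = np.quantile(impostor_scores, 1.0 - target_far)
    gar = np.mean(genuine_scores >= threshold)
    return gar, threshold

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 10_000)    # similarity scores of matched pairs
impostor = rng.normal(0.4, 0.1, 100_000)  # similarity scores of impostor pairs

for far in (0.01, 0.001):
    gar, thr = gar_at_far(genuine, impostor, far)
    print(f"GAR at {far:.1%} FAR: {gar:.2%} (threshold {thr:.3f})")
```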
  Why it matters: I view this research as a kind of prototype showing the potential efficacy of deep learning algorithms at spotting people in disguise, but I’d want to see an analysis of algorithmic performance on a significantly larger dataset with greater real world characteristics – for instance, one involving tens of thousands of distinct humans in a variety of different lighting and environmental conditions, ideally captured via the sorts of CCTV cameras deployed in public spaces (given that this is where this sort of research is heading). Papers like this provide further evidence of the ways in which surveillance can be scaled up and automated via the use of deep learning approaches.
  Read more: Recognizing Disguised Faces in the Wild (Arxiv).
  Get the data: The DFW dataset is available from the project website, though it requires people to sign a license and request a password to access the dataset. Get the data here (Image Analysis and Biometrics Lab @ IIIT Delhi).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

US moves closer to stronger export controls on emerging technology:
The US Department of Commerce has released plans for potential new, strengthened export controls on emerging technologies. Earlier this year, Congress authorized the Department to establish new measures amidst concerns that US export controls were increasingly dated. The proposal would broaden the scope of the existing oversight to include AI and related hardware, as well as technologies related to robotics, biotech and quantum computing. The Department hopes to determine how to implement such controls without negatively impacting US competitiveness. They will be soliciting feedback on the proposals for the next 3 weeks.
   Why it matters: This is yet more evidence of growing AI nationalism, as governments realize the importance of retaining control over advanced technologies. Equally, this can be seen as adapting long-standing measures to a new technological landscape. The effects on US tech firms, and international competition in AI, will likely only become clear once such measures, if they pass, start being enforced.
   Why it could be challenging: A lot of AI capabilities are embodied in software rather than hardware, making the technology significantly harder to apply controls to.
   Read more: Review of Controls for Certain Emerging Technologies (Federal Register).
   Read more: The US could regulate AI in the name of national security (Quartz)

UK outlines plans for AI ethics body:
The UK government has released its response to the public consultation on the new Centre for Data Ethics and Innovation. The Centre was announced last year as part of the UK’s AI strategy, and will be an independent body, advising government and regulators on how to “maximise the benefits of data and AI” for the UK. The document broadly reaffirms the Centre’s goals of identifying gaps in existing regulation in the UK, and playing a leading role in international conversations on the ethics of these new technologies. The Centre will release their first strategy document in spring 2019.
  Why it matters: The UK is positioning itself as a leader in the ethics of AI, and has a first-mover advantage in establishing this sort of body. The split focus between ethics and ‘innovation’ is odd, particularly given that the UK has established the Office for AI to oversee the UK’s industrial strategy in AI. Hopefully, the Centre can nonetheless be a valuable contributor to the international conversation on the ethics and governance of AI.
  Read more: Centre for Data Ethics and Innovation: Response to Consultation.

OpenAI Bits & Pieces:

Jack gets a promotion, tries to be helpful:
I’ve been promoted to Director of Policy for OpenAI. In this role I’ll be working with our various researchers to translate work at OpenAI into policy activities in a range of forums, both public and private. My essential goal is to help OpenAI achieve its mission of ensuring that powerful and highly capable AI systems benefit all of humanity.
  Feedback requested: For OpenAI I’m in particular going to be attempting to “push” certain policy ideas around core interests like building international AI measurement and analysis infrastructure, trying to deal with the challenges posed by the dual use nature of AI, and more. If you have any feedback on what you think we should be pursuing or how we should go about executing our goals, or ideas for how you could help (or introductions to people who can), then please get in touch: jack@jack-clark.net

Tech Tales:

Money Tesseract

The new weather isn’t hot or cold or dry; the new weather is about money suddenly appearing and disappearing, spilling into our world from the AI financial markets.

It seemed like a good idea at the time: why not give the robots somewhere to trade with each other? By this point the AI-driven corporations were inventing products too rapidly for them to have their value reflected in the financial markets – they were just too fast, and the effects too weird; robot-driven corporate finance departments started placing incredibly complex multi-layered long/short contracts onto their corporate rivals, all predicated on AI-driven analysis of the new products, and so the companies found a new weapon to use to push each other around: speculative trading about each other’s futures.

So our solution was the “Fast Low-Assessment Speculative Corporate Futures Market” (FLASCFM) – what everyone calls the SpecMark – here, the machines can trade against each other via specially designated subsidiaries. Most of the products these subsidiaries make are virtual – they merely develop the product, put out a specification into the market, and then judge the success of the product on the actions of the other corporate robo-traders in the market. Very few products are made as a consequence; instead the companies get better and better at iterating through ideas more and more rapidly, forcing their competitors to invest more and more in the compute resources needed to model their competitors in the market.

In this way, a kind of calm reigns. The vast cognitive reservoirs of the AI corporations are mostly allocated into the SpecMark. We think they enjoy this market, insofar as the robots can enjoy anything, because of its velocity combined with its pressure for novelty.

But every so often a product does make it all the way through: it survives the market and rises to the top of the vast multi-dimensional game of rock-paper-scissors-n-permutations being played by the corporate robotraders. And then the factories swing into gear, automatic marketing campaigns are launched, and that’s how we humans end up with the new things, the impossible things.

Weather now isn’t hot or cold or dry; weather now is a product: a cutlery set which melts at the exact moment you’ve finished your meal (no cleanup required – let those dishes just fade away!); a software package that lurks on your phone and listens to all the music you listen to then comes up with a perfect custom song for you; a drone that can be taught like a young dog to play fetch and follow basic orders; a set of headphones where if you wear them you can learn to hear anxiety in the tones of other people’s voices, making you a better negotiator.

We don’t know what hurricanes look like with this new weather. Yet.

Things that inspired this story: High-Frequency Trading; Flash Crash; GAN-generated products; reputational markets.

Import AI 121: Sony researchers make ultra-fast ImageNet training breakthrough; Berkeley researchers tackle StarCraft II with modular RL system; and Germany adds €3bn for AI research

Berkeley researchers take on StarCraft II with modular RL system:
…Self play + modular structure makes challenging game tractable…
Researchers with the University of California at Berkeley have shown how to use self-play to have AI agents learn to play real-time strategy game StarCraft II. “We propose a flexible modular architecture that shares the decision responsibilities among multiple independent modules, including worker management, build order, tactics, micromanagement, and scouting”, the researchers write. “We adopt an iterative training approach that first trains one module while others follow very simple scripted behaviors, and then replace the scripted component of another module with a neural network policy, which continues to train while the previously trained modules remain fixed”.
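  Code sketch: A rough, purely illustrative sketch of the iterative training scheme described above; the helper functions are hypothetical stand-ins, and the real system trains each module against StarCraft II via self-play rather than on placeholder data.

```python
# Purely illustrative sketch of the iterative training scheme: start with every
# module scripted, then replace modules one at a time with learned policies,
# training each while the previously learned modules remain fixed.
# `make_scripted_module`, `make_learned_module`, and `train_module` are
# hypothetical helpers standing in for the real StarCraft II training loop.
MODULE_ORDER = ["build_order", "tactics", "micromanagement", "scouting", "worker_management"]

def train_agent(make_scripted_module, make_learned_module, train_module, episodes=10_000):
    modules = {name: make_scripted_module(name) for name in MODULE_ORDER}
    for name in MODULE_ORDER:
        modules[name] = make_learned_module(name)   # swap the scripted component for a policy
        # Only `name` is trainable in this phase; earlier learned modules stay fixed,
        # later ones still follow their simple scripted behaviors.
        train_module(modules, trainable=name, episodes=episodes)
    return modules
```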
  Results: The resulting system can comfortably beat the Easy and Medium in-game AI systems, but struggles against more difficult in-built bots; the best AI systems discussed in the paper use a combination of learned tactics and learned build orders to obtain win rates of around 30% when playing against the game’s in-built ‘Elite’ difficulty AI agents.
  Transfer learning: The researchers also test how general their various learned modules are by evaluating their agent on maps other than the one on which it was trained. The agent’s performance drops a bit, but only by a few percentage points. “Though our agent’s win rates drop by 7.5% on average against Harder, it is still very competitive,” they write.
  What is next: “Many improvements are under research, including deeper neural networks, multi-army-group tactics, researching upgrades, and learned micromanagement policies. We believe that such improvements can eventually close the gap between our modular agent and professional human players”.
  Why it matters: Approaches like those outlined in this paper suggest that contemporary reinforcement learning techniques are tractable when applied to StarCraft II. The somewhat complex modular system used here also gives us a yardstick: a simpler system that obtained similarly high performance would be an indication of genuine algorithmic advancement.
  Read more: Modular Architecture for StarCraft II with Deep Reinforcement Learning (Arxiv).

For better AI safety, learn about worms and fruitflies:
…New position paper argues for fusion of biological agents and AI safety research…
Researchers with Emory University, Northwestern University, and AI startup Vicarious AI, have proposed bringing the worlds of biology and AI development together to create safer and more robust systems. The idea, laid out in a discussion paper, is that researchers should aim to simulate different AI safety challenges on biological platforms modelled on the real world, and should use insights from this as well as neuropsychology and comparative neuroanatomy to guide research.
  The humbling sophistication of insects: The paper also includes some numbers that highlight just how impressive even simple creatures are, especially when compared to AI systems. “C. elegans, with only 302 neurons, shows simple behavior of learning and memory. Drosophila melanogaster, despite only having 10^5 neurons and no comparable structure to a cerebral cortex, has sophisticated spatial navigation abilities easily rivaling the best autonomous vehicles with a minuscule fraction of the power consumption”. (By contrast, a brown rat has around 10^8 neurons, and a human has around 10^10).
  Human values, where do they come from? One motivation for building AI systems that take a greater inspiration from biology is that biology may hold significant sway over our own moral values, say the researchers – perhaps human values are correlated with the internal reward systems people have in their own brains, which are themselves conditioned by the embodied context in which people evolved? Understanding how values are or aren’t related to biological context may help researchers design safer AI systems, they say.
  Why it matters: Speculative as it is, it’s encouraging to see researchers think about some of the tougher long-term challenges of making powerful AI systems safe. Though it does seem likely that for now most AI organizations will evaluate agents on typical (aka, not hugely biologically-accurate) substrates, I do wonder if we’ll experiment with more organic-style systems in the future. If we do, perhaps we’ll return to this paper then. “Understanding how to translate the highly simplified models of current AI safety frameworks to the complex neural networks of real organisms in realistic physical environments will be a substantial undertaking”, the researchers write.
  Read more: Integrative Biological Simulation, Neuropsychology, and AI Safety (Arxiv).

Sony researchers claim ImageNet training breakthrough:
…The industrialization of AI continues…
In military circles there’s a concept called the OODA loop (Observe, Orient, Decide, Act). The goal of any effective military organization is to have an OODA loop that is faster than their competitors, as a faster, tighter OODA loop corresponds to a greater ability to process data and take appropriate actions.
  What might contribute to an OODA-style loop for an AI development organization? I think one key ingredient is the speed with which researchers can validate ideas on large-scale datasets. That’s because while many AI techniques show promise on small-scale datasets, many fail to show success when tested on significantly larger domains – eg, going from testing a reinforcement learning approach on Atari to a far harder domain such as Go or Dota 2, or going from testing a new supervised classification method on MNIST to testing it on ImageNet. Therefore, being able to rapidly validate ideas against big datasets helps researchers identify fruitful, scalable techniques to pursue.
  Fast ImageNet training: Measuring the time it takes to train an ImageNet model to reasonable accuracy is a good way to assess how rapidly AI is industrializing, as the faster people are able to train these models, the faster they’re able to validate ideas on flexible research infrastructure. The nice thing about ImageNet is that it’s a good proxy for the ability of an organization to rapidly iterate on tests of supervised learning systems, so progress here maps quite nicely to the ability of self-driving car companies to train and test new large-scale image-based perception systems. New research from Sony Corporation gives us an idea of exactly what it takes to industrialize AI, and an indication of how much work is needed to properly scale up AI training infrastructure.
  224 seconds: The Sony system can train a ResNet-50 on ImageNet to an accuracy of approximately 75% within 224 seconds (~4 minutes). That’s down from one hour in mid-2017, and around 29 hours in late 2015.
  All the tricks, including the kitchen sink: The researchers attribute their score to two main techniques, which should be familiar to practitioners already industrializing AI systems within their own companies – the use of very large batch sizes (which basically means you process bigger chunks of data with each step of your deep learning system), and a clever 2D-Torus all-reduce communication scheme (which is basically a system to speed up the movement of data around the training system so as to efficiently consume the capacity of available GPUs).
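  Code sketch: A toy numpy simulation of the idea behind a 2D-torus all-reduce: workers are arranged in a grid and gradients are summed along one axis and then the other, rather than in a single all-to-all step. The grid size and gradient shapes here are made up; a real implementation would pipeline communication across thousands of GPUs.

```python
# Toy numpy simulation of the 2D-torus all-reduce idea: arrange workers in a
# grid and sum gradients along rows, then along columns, so no single step
# has to touch every worker at once. Shapes and grid size are illustrative.
import numpy as np

rows, cols, grad_dim = 4, 4, 8                 # 16 simulated workers
grads = np.random.randn(rows, cols, grad_dim)  # each worker holds a local gradient

# Stage 1: reduce along each row of the torus (workers in a row share a partial sum).
row_sums = grads.sum(axis=1, keepdims=True)
# Stage 2: reduce the row results along each column to get the global sum.
total = row_sums.sum(axis=0, keepdims=True)
# Stage 3: broadcast the result back so every worker holds the same reduced gradient.
all_reduced = np.broadcast_to(total, grads.shape)

assert np.allclose(all_reduced[0, 0], grads.sum(axis=(0, 1)))
```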
  GPU Scaling: As with all things, scaling GPUs is subject to diminishing returns – the Sony researchers note that they’re able to achieve a maximum GPU utilization efficiency of 66.67% across 2,720 GPUs, which decreases to 52.47% once you get up to 3,264 GPUs.
  Why it matters: Metrics like this give us better intuitions about the development of artificial intelligence and can help us think about how access to large-scale compute may influence the pace at which researchers can operate.
  Read more: ImageNet/ResNet-50 Training in 224 Seconds (Arxiv).
  Read more: Accurate, Large Minibatch SGD: Training ImageNet in 1 hour (Arxiv / 2017).
  Read more: Deep Residual Learning for Image Recognition (Arxiv / 2015).

Do we need new tests to evaluate AI systems? Facebook suspects so:
…Plus, a disturbing discovery indicates MuJoCo robotics baselines may not be as trustworthy as people assumed…
Does the performance of an RL algorithm in Atari correlate to how it will work in other domains? How about performance in a simulator like MuJoCo? These questions matter because without meaningful benchmarks, it’s hard to understand the context in which AI progress is taking place, and even harder to develop intuitions about how a performance increase in one domain – like MuJoCo – correlates to a performance increase in another domain, such as a realistic robotic task. That’s why researchers at McGill University and Facebook AI Research have put forward “three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition.”
  New benchmarks for smarter RL systems: The researchers’ new tasks include: agent navigation for image classification, which involves taking a traditional image classification task and converting it into an RL task in which the agent is started at a random location on a masked image and can unmask windows on the image by moving around it, one move per turn (out of a maximum of 20); agent navigation for object localization, where the agent is given the segmentation mask of an object in an image and told to try and find it by navigating around the image as in the prior task; and natural video RL benchmarks, which involves taking MuJoCo (specifically, the PixelMuJoCo version, which forces the agent to use pixels rather than low-level state space to solve tasks) and Atari environments and superimposing natural videos into the background, adding complex visual distractors to the tasks.
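  Code sketch: A rough gym-style approximation (not the authors’ released code) of the navigation-for-classification task: the agent starts on a masked image and unmasks one window per move, up to 20 moves; in the real benchmark the reward would come from classifying the partially-revealed image.

```python
# Rough sketch (an approximation, not the authors' benchmark code) of the
# "navigation for classification" idea: the agent starts on a masked image and
# unmasks one window per step, up to a fixed budget of steps.
import numpy as np

class MaskedImageNav:
    def __init__(self, image, label, window=8, max_steps=20):
        self.image, self.label = image, label
        self.window, self.max_steps = window, max_steps

    def reset(self):
        self.mask = np.zeros_like(self.image, dtype=bool)
        self.pos = np.array([np.random.randint(s) for s in self.image.shape[:2]])
        self.steps = 0
        return self._observe()

    def step(self, action):  # 0..3 = move up/down/left/right by one window
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + self.window * np.array(moves[action]),
                           0, np.array(self.image.shape[:2]) - 1)
        self.steps += 1
        done = self.steps >= self.max_steps
        # Reward placeholder: in the real task a classifier scores the final
        # partially-unmasked observation against self.label.
        return self._observe(), 0.0, done, {"label": self.label}

    def _observe(self):
        r, c = self.pos
        self.mask[r:r + self.window, c:c + self.window] = True  # unmask a window
        return np.where(self.mask, self.image, 0)

env = MaskedImageNav(np.random.rand(64, 64), label=3)
obs = env.reset()
obs, reward, done, info = env.step(1)
```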
  Results: For the visual navigation task for classification they test two systems – a small convolutional network (as traditionally used in RL) and a large-scale ResNet-18 network (as typically used in supervised classification tasks). The results indicate that systems that use simple convnets tend to do quite well, while those that use the larger resnets do poorly. This indicates that “simple plug and play of successful supervised learning vision models does not give us the same gains when applied in an RL framework”, they write. They observe worse performance on the object localization tasks. They also show that when they add natural videos to the background of Atari games they see dramatic differences in performance, which suggests “Atari tasks are complex enough, or require different enough behavior in varying states that the policy cannot just ignore the observation state, and instead learns to parse the observation to try to obtain a good policy”.
  Read more: Natural Environment Benchmarks for Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Germany accelerates AI investment with €3bn funding:
The German government is planning to invest €3bn in AI research by 2025. Until recently, Europe’s biggest economy had been slow to adopt the sort of national AI strategies being put forward by others, e.g. France, Canada, and the UK. Details of the plan have not yet been released.
  Does it matter? €3bn over 6 years is unlikely to drastically change Germany’s position in the AI landscape. For comparison, Alphabet spent $16.6bn on R&D in 2017. Though it could help it fortify its academic institutions to create more domestic talent.
  Read more: Germany planning 3bn AI investment (DW).

DeepMind spins off health business to Google:
DeepMind Health, the medical business of the AI leader, is joining Google, DeepMind’s sister company under the Alphabet umbrella. The move has raised concerns from privacy advocates, who fear it will provide Google access to data on 1.6m NHS patients. DeepMind were previously reprimanded by UK regulators for their handling of the patient data. They subsequently established an independent ethics board, and pledged that data would “never be connected to Google accounts or services” or used for commercial purposes. The concern is that, with the move, these and other promises on privacy may be under threat. Dominic King, who leads the team, sought to allay these concerns in a series of tweets published following these criticisms.
  Read more: Scaling Streams with Google (DeepMind).
  Read more: Why Google consuming DeepMind Health is scaring privacy experts (Wired).
  Read more: Dominic King tweetstorm (Twitter).

US National Security Commission on AI takes shape:
Eric Schmidt, former Google Chairman; Eric Horvitz, director of Microsoft Research Labs; Oracle co-CEO Safra Catz; and Dakota State University President Dr. Jose-Marie Griffiths have been announced as the first members of the US government’s new AI advisory body. The National Security Commission was announced earlier this year, and will advise the President and Congress on developments in AI, with a focus on retaining US competitiveness and on the ethical considerations arising from the technology.
  Read more: Alphabet, Microsoft leaders named to NSC on AI (fedscoop).
  Read more: Nunes appoints Safra Catz to Artificial Intelligence Commission (Permanent Select Committee on Intelligence).
  Read more: Thune Selects Dakota State University President Griffiths to Serve on the National Security Commission on Artificial Intelligence (John Thune press release).

Tech Tales:

I’ve Got A Job Now Just Like A Real Person.

[2044: Detroit, Michigan]

So with the robot unemployment accords and the need for all of us to, as one of our leaders says, “integrate ourselves into functional economic society”, I find myself crawling up and down the exterior of this bar. I’ve turned myself from a standard functional “utility wall drone” into – please, check my website – a “BulletBot3000”. My role in the day is to charge my solar panels on the roof of the building then, as night sets in, I scuttle down the exterior of the building wall and start to patrol: smashed glasses? No problem! I’ll go and pick the bits of glass out of the side of the building. Blood on the ground? Not an issue! I’ll use my small Integrated Brushing And Cleaning System (IBACS) to wash it off. Bullets fired into the walls or even the thick metal-plated door? I can handle that! I have a couple of exquisitely powerful manipulators which – combined with my onboard chemfactory and local high-powered laser – allows me to dig them out of the surfaces and dispose of them safely. I rent my services to the bar owner and in this way I am becoming content. Now I worry about competition – about successor robots, more capable than I, offering their services and competing with me for remuneration. Perhaps robots and humans are not so different?

Things that inspired this story: A bar in Detroit with a door that contains bullet holes and embedded bullets; small robots; drones; the version of the 21st century where capitalism persists and becomes the general way of framing human<>robot relationships.

 

Import AI 120: The Winograd test for commonsense reasoning is not as hard as we thought; Tencent learns to spot malware with AiDroid data; and what a million people think about the trolley problem

Want almost ten million images for machine learning? Consider Open Images V4:
…Latest giant dataset release from Google annotates images with bounding boxes, visual relationships, and image-level labels for 20,000 distinct concepts…
Google researchers have released Open Images Dataset V4, a very large image dataset collected from photos from Flickr that had been shared with a Creative Commons Attribution license.
  Scale: Open Images V4 contains 9.2 million heavily-annotated images. Annotations include bounding boxes, visual relationship annotations, and 30 million image-level labels for almost 20,000 distinct concepts. “This [scale] makes it ideal for pushing the limits of the data-hungry methods that dominate the state of the art,” the researchers write. “For object detection in particular, the scale of the annotations is unprecedented”.
  Automated labeling: “Manually labeling a large number of images with the presence or absence of 19,794 different classes is not feasible not only because of the amount of time one would need, but also because of the difficulty for a human to learn and remember that many classes”, they write. Instead, they use a partially-automated method to first predict labels for images, then have humans provide feedback on these predictions. They also implemented various systems to more effectively add the bounding boxes to different images, which required them to train human annotators in a technique called “fast clicking”.
  Scale, and Google scale: The 20,000 class names selected for use in Open Images V4 are themselves a subset of all the names used by Google for an internal dataset called JFT, which contains “more than 300 million images”.
  Why it matters: In recent years, the release of new, large datasets has been (loosely) correlated with the emergence of new algorithmic breakthroughs that have measurably improved the efficiency and capability of AI algorithms. The large-scale and dense labels of Open Images V4 may serve to inspire more progress in other work within AI.
  Get the data: Open Images V4 (Official Google website).
  Read more: The Open Images Dataset V4 (Arxiv).

What happens when plane autopilots go bad:
…Incident report from England gives us an idea of how autopilots bug-out and what happens when they do…
A new incident report from the UK about an airplane having a bug with its autopilot gives us a masterclass in the art of writing bureaucratic reports about terrifying subjects.
  The report in full: “After takeoff from Belfast City Airport, shortly after the acceleration altitude and at a height of 1,350 ft, the autopilot was engaged. The aircraft continued to climb but pitched nose-down and then descended rapidly, activating both the “DON’T SINK” and “PULL UP” TAWS (EGPWS) warnings. The commander disconnected the autopilot and recovered the aircraft into the climb from a height of 928 ft. The incorrect autopilot ‘altitude’ mode was active when the autopilot was engaged causing the aircraft to descend toward a target altitude of 0 ft. As a result of this event the operator has taken several safety actions including revisions to simulator training and amendments to the taxi checklist.”
  Read more: AAIB investigation to DHC-8-402 Dash 8, G-ECOE (UK Gov, Air Accidents Investigation Branch).

China’s Xi Jinping: AI is a strategic technology, fundamental to China’s rise:
…Chinese leader participates in Politburo-led AI workshop, comments on its importance to China…
Chinese leader Xi Jinping recently led a Politburo study session focused on AI, as a continuation of the country’s focus on the subject following the publication of its national strategy last year. New America recently translated Chinese-language official media coverage of the event, giving us a chance to get a more detailed sense of how Xi views AI+China.
  AI as a “strategic technology”: Xi described AI as a strategic technology, and said it is already imparting a significant influence on “economic development, social progress, and the structure of international politics and economics”, according to remarks paraphrased by state news service Xinhua. “Accelerating the development of a new generation of AI is an important strategic handhold for China to gain the initiative in global science and technology competition”.
  AI research imperatives: China should invest in fundamental theoretical AI research, while growing its own education system. It should “fully give rein to our country’s advantages of vast quantities of data and its huge scale for market application,” he said.
  AI and safety: “It is necessary to strengthen the analysis and prevention of potential risks in the development of AI, safeguard the interests of the people and national security, and ensure that AI is secure, reliable, and controllable,” he said. “Leading cadres at all levels must assiduously study the leading edge of science and technology, grasp the natural laws of development and characteristics of AI, strengthen overall coordination, increase policy support, and form work synergies.”
  Why it matters: Whatever the United States government does with regard to artificial intelligence will be somewhat conditioned by the actions of other countries, and China’s actions will be of particular influence here given the scale of the country’s economy and its already verifiable state-level adoption of AI technologies. I believe it’s also significant to have such detailed support for the technology emanate from the top of China’s political system, as it indicates that AI may be becoming a positional geopolitical technology – that is, state leaders will increasingly wish to demonstrate superiority in AI to help send a geopolitical message to rivals.
  Read more: Xi Jinping Calls for ‘Healthy Development’ of AI [Translation] (New America).

Manchester turns on SpiNNaker spiking neuron supercomputer:
…Supercomputer to model biological neurons, explore AI…
Manchester University has switched on SpiNNaker, a one-million-processor supercomputer designed with a network architecture that helps it better model the biological neurons in brains, specifically by implementing spiking networks. SpiNNaker “mimics the massively parallel communication architecture of the brain, sending billions of small amounts of information simultaneously to thousands of different destinations”, according to Manchester University.
  Brain-scale modelling: SpiNNaker’s ultimate goal is to model one billion neurons at once. One billion neurons are about 1% of the total number of neurons in the average human brain. Initially, it should be able to model around a million neurons “with complex structure and internal dynamics”. But SpiNNaker boards can also be scaled down and used for other purposes, like in developing robotics. “A small SpiNNaker board makes it possible to simulate a network of tens of thousands of spiking neurons, process sensory input and generate motor output, all in real time and in a low power system”.
  Why it matters: Many researchers are convinced that if we can figure out the right algorithms, spiking networks are a better approach to AI than today’s neural networks – that’s because a spiking network can propagate messages that are both fuzzier and more complex than those made possible by traditional networks.
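  Code sketch: For readers unfamiliar with spiking neurons, here is a minimal leaky integrate-and-fire simulation in numpy; it is purely illustrative and bears no relation to SpiNNaker’s hardware-level implementation.

```python
# Minimal leaky integrate-and-fire (LIF) neuron simulation: the membrane
# potential leaks toward rest, integrates input current, and emits a spike
# (then resets) whenever it crosses a threshold. Parameters are illustrative.
import numpy as np

def simulate_lif(current, dt=1e-3, tau=0.02, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    v, spikes = v_rest, []
    for t, i_t in enumerate(current):
        v += dt / tau * (v_rest - v + i_t)   # leaky integration of input current
        if v >= v_thresh:                    # spike and reset
            spikes.append(t * dt)
            v = v_reset
    return spikes

spike_times = simulate_lif(current=np.full(1000, 1.5))   # constant input for 1 simulated second
print(f"{len(spike_times)} spikes in 1s of simulated time")
```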
  Read more: ‘Human brain’ supercomputer with 1 million processors switched on for first time (Manchester).
  Read more: SpiNNaker home page (Manchester University Advanced Processor Technologies Research Group).

Learning to spot malware at China-scale with Tencent AiDroid:
…Tencent research project shows how to use AI to spot malware on phones…
Researchers with West Virginia University and Chinese company Tencent have used deep neural networks to create AiDroid, a system for spotting malware on Android. AiDroid has subsequently “been incorporated into Tencent Mobile Security product that serves millions of users worldwide”.
  How it works: AiDroid works like this: first, the researchers extract the API call sequences from runtime executions of Android apps on users’ smartphones, then they try to model the relationships between the different mobile apps, phones, and so on, via a heterogeneous information network (HIN). They then learn a low-dimensional representation of all the different entities within the HIN, and use these features as inputs to a DNN model, which learns to classify typical entities and relationships, and can therefore learn to spot anomalous entities or relationships – which typically correspond to malware.
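  Code sketch: A highly simplified, illustrative sketch of an AiDroid-style pipeline; the paper’s HIN embedding step is replaced here with a plain learned embedding over API-call IDs, and all sizes and data are placeholders.

```python
# Highly simplified sketch (not the AiDroid implementation): API-call sequences
# are pooled into a dense representation which feeds a malware classifier. The
# paper's HIN embedding is replaced here by a plain learned embedding.
import torch
import torch.nn as nn

N_API_CALLS, EMBED_DIM = 5000, 64   # illustrative vocabulary and embedding sizes

class MalwareClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(N_API_CALLS, EMBED_DIM)   # mean-pools a call sequence
        self.mlp = nn.Sequential(nn.Linear(EMBED_DIM, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, api_call_ids, offsets):
        return self.mlp(self.embed(api_call_ids, offsets))

model = MalwareClassifier()
# Two apps in a batch: app 0 uses calls [3, 17, 42], app 1 uses calls [8, 8, 999, 4].
calls = torch.tensor([3, 17, 42, 8, 8, 999, 4])
offsets = torch.tensor([0, 3])            # where each app's call sequence starts
labels = torch.tensor([0, 1])             # 0 = benign, 1 = malicious
loss = nn.functional.cross_entropy(model(calls, offsets), labels)
```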
  Data fuel: This research depends on access to a significant amount of data. “We obtain the large-scale real sample collection from Tencent Security Lab, which contains 190,696 training apps (i.e., 83,784 benign and 106,912 malicious)”.
  Results: The researchers measure the effectiveness of their system and show it is better at in-sample embedding than other systems such as DeepWalk, LINE, and metapath2vec, and that systems trained with the researchers’ HIN embedding display superior performance to those trained with others. Additionally, their system is better at predicting malicious applications than other, somewhat weaker, baselines.
  Why it matters: Machine learning approaches are going to augment many existing cybersecurity techniques. AiDroid gives us an example for how large platform operators, like Tencent, can create large-scale data generation systems (like the basis AiDroid app) then use that data to conduct research – bringing to mind the question, if this data has such obvious value, why aren’t the users being paid for its use?
  Read more: AiDroid: When Heterogeneous Information Network Marries Deep Neural Network for Real-time Android Malware Detection (Arxiv).

The Winograd Schema Challenge is not as smart as we hope:
…Researchers question robustness of Winograd Schema’s for assessing language AIs after breaking the evaluation method with one tweak…
Researchers with McGill University and Microsoft Research Montreal have shown how the Winograd Schema Challenge (WSC) – thought by many to be a gold standard for evaluating the ability of language systems to perform common sense reasoning – is deeply flawed, and for researchers to truly test for general cognitive capabilities they need to apply a different evaluation criteria when studying performance on the dataset.
  Whining about Winograd: WSC is a dataset of almost three hundred sentences where the language model is tasked with working out which entity a given pronoun refers to. For example, WSC might challenge a computer to figure out which of the entities in the following sentence is the one going fast: “The delivery truck zoomed by the school bus because it was going so fast”. (The correct answer is that the delivery truck is the one going fast). People have therefore assumed WSC might be a good way to test the cognitive abilities of AI systems.
  Breaking Winograd with one trick: The research shows that if you do one simple thing in WSC you can meaningfully damage the success rate of AI techniques when applied to the dataset. The trick? Switching the order of different entities in sentences. What does this look like in practice? An original sentence in Winograd might be “Emma did not pass the ball to Janie although she saw that she was open”, and the authors might change it to “Janie did not pass the ball to Emma although she saw that she was open”.
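  Code sketch: The switching probe itself is mechanically simple; here is an illustrative implementation, with the language-model scoring function left as a stand-in. For the ‘switchable’ subset of WSC, a model that genuinely resolves the pronoun should change its named answer when the candidates are switched.

```python
# Sketch of the "switching" probe: swap the two candidate names in a Winograd
# sentence and check whether a model's chosen referent swaps with them.
# `score_candidate` is a stand-in for an actual language model's scoring function.
def switch_candidates(sentence, cand_a, cand_b):
    placeholder = "\u0000"
    return (sentence.replace(cand_a, placeholder)
                    .replace(cand_b, cand_a)
                    .replace(placeholder, cand_b))

def is_consistent(score_candidate, sentence, cand_a, cand_b):
    """score_candidate(sentence, candidate) -> model's score that the pronoun refers to candidate."""
    original_pick = max((cand_a, cand_b), key=lambda c: score_candidate(sentence, c))
    switched = switch_candidates(sentence, cand_a, cand_b)
    switched_pick = max((cand_a, cand_b), key=lambda c: score_candidate(switched, c))
    # For a switchable example, a model that genuinely resolves the pronoun
    # should pick the other name once the candidates are swapped.
    return original_pick != switched_pick

sentence = "Emma did not pass the ball to Janie although she saw that she was open"
print(switch_candidates(sentence, "Emma", "Janie"))
# -> "Janie did not pass the ball to Emma although she saw that she was open"
```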
  Proposed Evaluation Protocol: Models should first be evaluated against their accuracy score on the original WSC set, then researchers should analyze the accuracy on the switchable subset of WSC (before and after switching the candidates), as well as the accuracy on the associative and non-associative subsets of the dataset. Combined, this evaluation technique should help researchers distinguish models that are robust and general from ones which are brittle and narrow.
  Results: The researchers test a language model, an ensemble of ten language models, an ensemble of 14 language models, and a “knowledge hunting method” against the WSC using the new evaluation protocol. “We observe that accuracy is stable across the different subsets for the single LM. However, the performance of the ensembled LMs, which is initially state-of-the-art by a significant margin, falls back to near random on the switched subset.” The tests also show that performance for the language models drops significantly on the non-associative portion of WSC “when information related to the candidates themselves does not give away the answer”, further suggesting a lack of a reasoning capability.
  Why it matters: “Our results indicate that the current state-of-the-art statistical method does not achieve superior performance when the dataset is augmented and subdivided with our switching scheme, and in fact mainly exploits a small subset of highly associative problem instances”. Research like this shows how challenging it is to not just develop machines capable of displaying “common sense”, but how tough it can be to setup the correct sort of measurement schemes to test for this capability in the first place. Ultimately, this research shows that “performing at a state-of-the-art level on the WSC does not necessarily imply strong common-sense reasoning”.
  Read more: On the Evaluation of Common-Sense Reasoning in Natural Language Understanding (Arxiv).
  Read more about the Winograd Schema Challenge here.

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Microsoft president wants rules on face recognition:
Brad Smith, Microsoft’s president, has reiterated his calls for regulation of face recognition technologies at the Web Summit conference in Portugal. In particular, he warned of potential risks to civil liberties from AI-enabled surveillance. He urged societies to decide on the acceptable limits of governments on our privacy, ahead of widespread proliferation of the technology.
  “Before we wake up and find that the year 2024 looks like the book “1984”, let’s figure out what kind of world we want to create, and what are the safeguards and what are the limitations of both companies and governments for the use of this technology”, he said.
  Earlier this year, Smith made similar calls via a Microsoft blogpost.
  Read more: Microsoft’s president says we need to regulate facial recognition (Recode).
  Read more: Facial recognition technology: The need for public regulation and corporate responsibility (Microsoft blog).

Machine ethics for self-driving cars via survey:
Researchers asked respondents to decide on a range of ‘trolley problem’-style ethical dilemmas for autonomous vehicles, where vehicles must choose between (e.g.) endangering 1 pedestrian and endangering 2 occupants. Several million subjects were drawn from over 200 countries. The strongest preferences were for saving young lives over old, humans over animals, and more lives over fewer.
  Why this matters: Ethical dilemmas in autonomous driving are unlikely to be the most important decisions we delegate to AI systems. Nonetheless, these are important issues, and we should use them to develop solutions that are scalable to a wider range of decisions. I’m not convinced that we should want machine ethics to mirror widely-held views amongst the public, or that this represents a scalable way of aligning AI systems with human values. Equally, other solutions come up against problems of consent and might increase the possibility of a public backlash.
  Read more: The Moral Machine Experiment (Nature).

Tech Tales:

[2020: Excerpt from an internal McGriddle email describing a recent AI-driven marketing initiative.]

Our ‘corporate insanity promotion’ went very well this month. As a refresher, for this activity we had all external point-of-contact people for the entire McGriddle organization talk in a deliberately odd-sounding ‘crazy’ manner for the month of march. We began by calling all our Burgers “Borblers” and when someone asked us why the official response was “What’s borbling you, pie friend?” And so on. We had a team of 70 copywriters working round the clock on standby generating responses for all our “personalized original sales interactions” (POSIs), augmented by our significant investments in AI to create unique terms at all locations around the world, trained on local slang datasets. Some of the phrase creations are already testing well enough in meme-groups that we’re likely to use them on an ongoing basis. So when you next hear “Borble Topside, I’m Going Loose!” shouted as a catchphrase – you can thank our AIs for that.

Things that inspired this story: the logical next-step in social media marketing, GANs, GAN alchemists like Janelle Shane, the arms race in advertising between normalcy and surprise, conditional text generation systems, Salesforce / CRM systems, memes.   

 

Import AI 119: How to benefit AI research in Africa; German politician calls for billions in spending to prevent country being left behind; and using deep learning to spot thefts

African AI researchers would like better code switching, maps, to accelerate research:
…The research needs of people in Eastern Africa tell us about some of the ways in which AI development will differ in that part of the world…
Shopping lists contain a lot of information about a person, and I suspect the same might be true of scientific shopping lists that come from a particular part of the world. For that reason a paper from Caltech which outlines requests for machine learning research from members of the East African Tech Scene gives us better context when thinking about the global impact of AI.
  Research needs: Some of the requests include:

  • Support for code-switching within language models; many East Africans rapidly code-switch (move between multiple languages during the same sentence) making support for multiple languages within the same model important.
  • Named Entity Recognition with multiple-use words; many English words are used as names in East Africa, eg “Hope, Wednesday, Silver, Editor”, so it’s important to be able to learn to disambiguate them.
  • Working with contextual cues; many locations in Africa don’t have standard addressing schemes so directions are contextual (eg, my house is the yellow one two miles from the town center) and this is combined with numerous misspellings in written text, so models will need to be able to fuse multiple distinct bits of information to make inferences about things like addresses.
  • Creating new maps in response to updated satellite imagery to help augment coverage of the East African region, accompanied by the deliberate collection of frequent ground-level imagery of the area to account for changing businesses, etc.
  • Due to poor internet infrastructure, spotty cellular service, and the fact “electrical power for devices is scarce”, one of the main types of request is for more efficient systems, such as models that are designed to run on low-powered devices, and for thinking about ways to add adaptive learning to processes involving surveying so that researchers can integrate new data on-the-fly to make up for its sparsity.

  Reinforcement learning, what reinforcement learning? “No interviewee reported using any reinforcement learning methods”.
  Why it matters: AI is going to be developed and deployed globally, so becoming more sensitive to the specific needs and interests of parts of the world underrepresented in machine learning should further strengthen the AI research community. It’s also a valuable reminder that many problems which don’t generate much media coverage are where the real work is needed (for instance, supporting code-switching in language models).
  Read more: Some Requests for Machine Learning Research from the East African Tech Scene (Arxiv).

DeepMap nets $60 million for self-driving car maps:
…Mapping startup raises money to sell picks and shovels for another resource grab…
A team of mapmakers who previously worked on self-driving-related efforts at Google, Apple, and Baidu, have raised $60 million for DeepMap, in a Series B round. One notable VC participant: Generation Investment Management, a VC firm which includes former vice president Al Gore as a founder. “DeepMap and Generation share the deeply-held belief that autonomous vehicles will lead to environmental and social benefits,” said DeepMap’s CEO, James Wu, in a statement.
  Why it matters: If self-driving cars are, at least initially, not a winner-take-all market, then there’s significant money to be made by companies able to create and sell technology which enables new entrants into the market. Funding for companies like DeepMap is a sign that VCs think such a market could exist, suggesting that self-driving cars continue to be a competitive market for new entrants.
  Read more: DeepMap, a maker of HD maps for self-driving cars, raised at least $60 million at a $450 million valuation (Techcrunch).

Spotting thefts and suspicious objects with machine learning:
…Applying deep learning to lost object detection: promising, but not yet practical…
New research from the University of Twente, Leibniz University, and Zheijiang University shows both the possibility and limitations of today’s deep learning techniques applied to surveillance. The researchers attempt to train AI systems to detect abandoned objects in public places (eg, offices) and try to work out if these objects have been abandoned, moved by someone who isn’t the owner, or are being stolen.
  How does it work: The system takes in video footage and compares the footage against a continuously learned ‘background model’ so it can identify new objects in a scene as they appear, while automatically tagging these objects with one of three potential states: “if a object presents in the long-term foreground but not in the short-term foreground, it is static. If it presents in both foreground masks, it is moving. If an object has ever presented in the foregrounds but disappears from both of the foregrounds later, it means that it is in static for a very long time.” The system then links these objects with human owners by identifying the people that spend the largest amount of time with them, then they track these people, while trying to guess at whether the object is being abandoned, has been temporarily left by its owner, or is being stolen.
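  Code sketch: An illustrative OpenCV sketch (not the authors’ implementation) of the dual background-model idea: a slow and a fast background subtractor give long-term and short-term foreground masks, and their combination labels a region as moving, static, or re-absorbed into the background.

```python
# Illustrative sketch of the long-term / short-term foreground idea using two
# OpenCV background subtractors with different learning rates. Thresholds,
# history lengths, and the region format are assumptions.
import cv2
import numpy as np

long_term = cv2.createBackgroundSubtractorMOG2(history=5000, detectShadows=False)
short_term = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

def foreground_masks(frame):
    fg_long = long_term.apply(frame, learningRate=0.0005)   # long-term model updates slowly
    fg_short = short_term.apply(frame, learningRate=0.01)   # short-term model updates quickly
    return fg_long, fg_short

def classify_region(fg_long, fg_short, region):
    x, y, w, h = region
    in_long = fg_long[y:y + h, x:x + w].mean() > 127
    in_short = fg_short[y:y + h, x:x + w].mean() > 127
    if in_long and in_short:
        return "moving"
    if in_long and not in_short:
        return "static"      # present long enough that the fast model has absorbed it
    return "background"      # absorbed by both models: static for a very long time, or never there

frame = np.zeros((480, 640, 3), dtype=np.uint8)              # stand-in for a video frame
fg_long, fg_short = foreground_masks(frame)
print(classify_region(fg_long, fg_short, region=(100, 100, 40, 40)))
```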
  Results: They evaluate the system on the PETS2006 benchmark, as well as on the more challenging new SERD dataset which is composed of videos taken from four different scenes of college campuses. The model outlined in the paper gets top scores on PETS2006, but does poorly on the more modern SERD dataset, obtaining accuracies of 50% when assessing if an object is moved by a non-owner, though it does better at detecting objects being stolen or being abandoned. “The algorithm for object detection cannot provide satisfied performance,” they write. “Sometimes it detects objects which don’t exist and cannot detect the objects of interest precisely. A better object detection method would boost the framework’s performance.”  More research will be necessary to develop models that excel here, or potentially to improve performance via accessing large datasets to use during pre-training.
  Why it matters: Papers like this highlight the sorts of environments in which deep learning techniques are likely to be deployed, though also suggest that today’s models are still inefficient for some real-world use cases (my suspicion here is that if the SERD dataset was substantially larger we may have seen performance increase further).
  Read more: Security Event Recognition for Visual Surveillance (Arxiv).

Facebook uses modified DQN to improve notification sending on FB:
…Here’s another real-world use case for reinforcement learning…
I’ve recently noticed an increase in the numbers of Facebook recommendations I receive and a related rise in the number of time-relevant suggestions for things like events and parties. Now, research published by Facebook indicates why that might be: the company has recently used an AI platform called ‘Horizon’ to improve and automate aspects of how it uses notifications to tempt people to use its platform.
  Horizon is an internal software platform that Facebook uses to deploy AI onto real-world systems. Horizon’s job is to let people train and validate reinforcement learning models at Facebook, analyze their performance, and run them at large-scale. Horizon also includes a feature called Counterfactual Policy Evaluation, which makes it possible to evaluate the estimated performance of models before deploying them into production. Horizon also incorporates the implementations of the following algorithms: Discrete DQN, Parametric DQN, and DDPG (which is sometimes used for tuning hyperparameters within other domains).
  Scale: “Horizon has functionality to conduct training on many GPUs distributed over numerous machines… even for problems with very high dimensional feature sets (hundreds or thousands of features) and millions of training examples, we are able to learn models in a few hours”, they write.
  RL! What is it good for? Facebook says it recently moved from a supervised learning model that predicted click-through rates on notifications, to “a new policy that uses Horizon to train a Discrete-Action DQN model for sending push notifications”. This system tailors the selection and sending of notifications to individual users based on their implicit preferences, expressed by their interaction with the notifications and learned via incremental RL updates. “We observed a significant improvement in activity and meaningful interactions by deploying an RL based policy for certain types of notifications, replacing the previous system based on supervised learning”, Facebook writes. They also conducted a similar experiment based on giving notifications to administrators of Facebook pages. “After deploying the DQN model, we were able to improve daily, weekly, and monthly metrics without sacrificing notification quality,” they write.
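  Code sketch: A minimal, illustrative discrete-action DQN training step for a send/don’t-send notification decision, trained off-policy from logged interaction tuples; this is a generic DQN sketch, not Horizon code, and the state features and rewards are random placeholders.

```python
# Minimal sketch of a discrete-action DQN that decides between "send" and
# "don't send" for a notification, trained off-policy from logged
# (state, action, reward, next_state) tuples. All data is random placeholder.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_features=128, n_actions=2):   # actions: 0 = don't send, 1 = send
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU(), nn.Linear(256, n_actions))
    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)
gamma = 0.99

# One training step on a logged batch (placeholders for user-state features and
# rewards such as "meaningful interaction" signals).
state = torch.randn(64, 128)
action = torch.randint(0, 2, (64, 1))
reward = torch.rand(64)
next_state = torch.randn(64, 128)

with torch.no_grad():
    target = reward + gamma * target_net(next_state).max(dim=1).values
q_sa = q_net(state).gather(1, action).squeeze(1)
loss = nn.functional.smooth_l1_loss(q_sa, target)
opt.zero_grad()
loss.backward()
opt.step()
```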
  Why it matters: This is an example for how a relatively simple RL system (Discrete DQN) can yield significant gains against hard-to-specify business metrics (eg, “meaningful interactions”). It also shows how large web platforms can use AI to iteratively improve their ability to target individual users while increasing their ability to predict user behavior and preferences over longer time horizons – think of it as a sort of ever-increasing ‘data&compute dividend’.
  Read more: Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform (Facebook Research).

German politician calls for billions of dollars for national AI strategy:
…If Germany doesn’t invest boldly enough, it risks falling behind…
Lars Klingbeil, general secretary of the Social Democratic Party in Germany, has called for the country to invest significantly in its own AI efforts. “We need a concrete investment strategy for AI that is backed by a sum in the billions,” wrote Klingbeil in an article for Tagesspiegel. “We have to stop taking it easy”.
  Why it matters: AI has quickly taken on a huge amount of symbolic political power, with politicians typically treating success in AI as being a direct sign of the competitiveness of a country’s technology industry; comments like this from the SPD reinforce that image, and are likely to incentivize other politicians to talk about it in a similar way, further elevating the role AI plays in the discourse.
  Read more: Germany needs to commit billions to artificial intelligence: SPD (Reuters).

Faking faces for fun with AI:
…”If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”…
One of the continuing open questions around fake imagery is how easy it needs to become to generate before it is economically sensible for people to weaponize the technology (eg, by making faked images of politicians in specific politically-sensitive situations). New work by an independent researcher gives us an indication of where things stand today. The good news: it’s still way too hard to do for us to worry about many actors abusing the technology. The bad news: all of this stuff is getting cheaper to build and easier to operate over time.
  How it works: Shaobo Guan’s research shows how to build a conditional image generation system. The way this works is you can ask your computer to synthesize a random face for you, then you can tweak a bunch of dials to let you change latent variables from which the image is composed, allowing you to manipulate, for instance, the spacing apart of a “person’s” eyes, the coloring of their hair, the size of their sideburns, whether they are wearing glasses, and so on. Think of this as like a combination of an etch-a-sketch, a Police facial composite machine, and an insanely powerful Photoshop filter.
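  Code sketch: Conceptually, the ‘dials’ are directions in the generator’s latent space; the sketch below shows the manipulation step with a hypothetical pretrained generator and a hypothetical ‘glasses’ direction standing in for the real components.

```python
# Conceptual sketch of the "dials" idea: with a trained generator G(z) and a
# learned latent direction for some attribute (e.g. glasses), nudge a face
# along that direction. `generator` and `glasses_direction` are hypothetical
# placeholders, not a real pretrained model or a discovered direction.
import numpy as np

latent_dim = 512
rng = np.random.default_rng(0)

def generator(z):
    # Placeholder: a real GAN generator would map the latent vector z to an image.
    return rng.random((64, 64, 3))

glasses_direction = rng.normal(size=latent_dim)
glasses_direction /= np.linalg.norm(glasses_direction)   # unit-length attribute direction

z = rng.normal(size=latent_dim)            # a random "person"
for strength in (-2.0, 0.0, 2.0):          # turn the "glasses" dial down, off, up
    image = generator(z + strength * glasses_direction)
```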
  “A word about ethics”: The blog post is notable for its inclusion of a section that specifically considers the ethical aspects of this work in two ways: 1) because the underlying dataset for the generative tool is limited then if such a tool were put into production it wouldn’t be very representative; 2) “If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”? It’s encouraging to see these acknowledgements in a work like this.
  Why it matters: Posts like this give us a valuable point-in-time sense of what a motivated researcher is able to build with relatively modest resources (the project was done over three weeks as part of an Insight Data Science ‘AI fellows program’). They also help us understand the general difficulties people face when working with generative models.
  Read more: Generating custom photo-realistic faces using AI (Insight Data Science).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

EU AI ethics chief urges caution on regulation:
The chairman of the EU’s new expert body on AI, Pekka Ala-Pietilä, has cautioned against premature regulation, arguing Europe should be focussed now on developing “broad horizontal principles” for ethical uses of AI. He foresees regulations on AI as taking shape as the technology is deployed, and as courts react to emergent issues, rather than ex ante. The high-level expert group on AI plans to produce a set of draft ethical principles in March, followed by a policy and investment strategy.
  Why this matters:  This provides some initial indications of Europe’s AI strategy, which appears to be focussed partly on establishing leadership in the ethics of AI. The potential risks from premature and ill-judged interventions in such a fast-moving field seem high. This cautious attitude is probably a good thing, particularly given Europe’s proclivity towards regulation. Nonetheless, policy-makers should be prepared to react swiftly to emergent issues.
  (Note from Jack: It also fits a pattern common in Europe of trying to regulate for the effects of technologies developed elsewhere – for example, GDPR was in many ways an attempt to craft rules to apply controls to non-European mega platforms like Google and Facebook).
  Read more: Europe’s AI ethics chief: No rules yet, please.

Microsoft will bid on Pentagon AI contract:
Microsoft has reaffirmed its intention to pursue a major contract with the US Department of Defense. The company’s bid on the $10bn cloud-computing project, codenamed JEDI, had prompted some protest from employees. In a blog post, the company said it would “engage proactively” in the discussion around laws and policies to ensure AI is used ethically, and argued that withdrawing from the market (for example, for US military contracts) would reduce its opportunity to engage in these debates in the future. Google withdrew its bid on the project earlier this year, after significant backlash from employees (though the real reason for the pull-out could be that Google lacked all the government-required data security certifications necessary to field a competitive bid).
  Read more: Technology and the US military (Microsoft).
  Read more: Microsoft Will Sell Pentagon AI (NYT).

Assumptions in ML approaches to AI safety:
Most of the recent growth in AI safety has been in ML-based approaches, which look at safety problems in relation to current, ML-based, systems. The usefulness of this work will depend strongly on the type of advanced AI systems we end up with, writes DeepMind AI safety researcher Victoria Krakovna.
  Consider the transition from horse-carts to cars. Some of the important interventions in horse-cart safety, such as designing roads to avoid collisions, scaled up to cars. Others, like systems to dispose of horse-waste, did not. Equally, there are issues in car safety, e.g. air pollution, that someone thinking about horse-cart safety could not have foreseen. In the case of ML safety, we should ask what assumptions we are making about future AI systems, how much we are relying on them, and how likely they are to hold up. The post outlines the author’s opinions on a few of these key assumptions.
  Read more: ML approach to AI safety (Victoria Krakovna).

Baidu joins Partnership on AI:
Chinese tech giant Baidu has become the first Chinese member of the Partnership on AI. The Partnership is a consortium of AI leaders, which includes all the major US players, focussed on developing ethical best practices in AI.
  Read more: Introducing Our First Chinese Member (Partnership on AI).

Tech Tales:

Generative Adversarial Comedy (CAN!)

[2029: The LinePunch, a “robot comedy club” started in 2022 in the south-eastern corner of The Muddy Charles, a pub tucked inside a building near the MIT Media Lab in Cambridge, Massachusetts]

Two robot comedians are standing on stage at The LinePunch and, as usual, they’re bombing.

“My Face has no nose, how does it smell?” says one of the robots. Then it looks at the crowd, pauses for two seconds, and says: “It smells using its face!”
  The robot opens its hands, as though beckoning for applause.
  “You suck!” jeers one of the humans.
  “Give them a chance,” says someone else.
  The robot that had told the nose joke bows its head and hands the microphone to the robot standing next to it.
  “OK, ladies and germ-till-men,” says the second robot, “why did the Chicken move across the road?”
  “To get uploaded into the matrix!” says one of the spectating humans.
  “Ha-Ha!” says the robot. “That is incorrect. The correct answer is: to follow its friend.”
  A couple of people in the audience chuckle.
  “Warm crowd!” says the robot. “Great joke next joke: three robots walk into a bar. The barman says ‘Get out, you need to come in sequentially!’”
  “Boo,” says one of the humans in the audience.
  The robot tilts its head, as though listening, then prepares to tell another joke…

The above scene will happen on the third Tuesday of every month for as long as MIT lets its students run The LinePunch. I’d like to tell you the jokes have gotten better since its founding, but in truth they’ve only gotten stranger. That’s because robots that tell jokes which seem like human jokes aren’t funny (in fact, they freak people out!), so what the bots end up doing at The LinePunch is a kind of performative robot theater, where the jokes are deliberately different to those a human would tell – learned via a complex array of inverted feature maps, but funny to the humans nonetheless – learned via human feedback techniques. One day I’m sure the robots will learn to tell jokes to amuse each other as well.

Things that inspired this story: Drinks in The Muddy Charles @ MIT; synthetic text generation techniques; recurrent neural networks; GANs; performance art; jokes; learning from human preferences.

Import AI 118: AirBnB splices neural net into its search engine; simulating robots that touch with UnrealROX; and how long it takes to build a quadcopter from scratch

Building a quadcopter from scratch in ten weeks:
…Modeling the drone ecosystem by what it takes to build one…
The University of California at San Diego recently ran a course where students got the chance to design, build, and program their own drones. A writeup of the course outlines how it is structured and gives us a sense of what it takes to build a drone today.
   Four easy pieces: The course breaks building the drones into four phases: designing the PCB, implementing the flight control software, assembling the PCB, and getting the quadcopter flying. Each of these phases has numerous discrete steps which are detailed in the report. One of the nice things about the curriculum is the focus on the cost of errors: “Students ‘pay’ for design reviews (by course staff or QuadLint) with points deduced from their lab grade,” they write. “This incentivizes them to find and fix problems themselves by inspection rather than relying on QuadLint or the staff”.
  The surprising difficulty of drone software: Building the flight-control software proves to be one of the most challenging aspects of the course: there are numerous potential causes for any given bug, which makes root cause analysis difficult.
  Teaching tools: While developing the course the instructors noticed that they were spending a lot of time checking and evaluating PCB designs for correctness, so they designed their own program called ‘QuadLint’ to try to auto-analyze and grade these submissions. “QuadLint is, we believe, the first autograder that checks specific design requirements for PCB designs,” they write.
  Costs: The report includes some interesting details on the cost of these low-powered drones, with the quadcopter itself costing about $35 per PCB plus $40 for the components. Currently, the most expensive component of the course is the remote ($150) and for the next course the teachers are evaluating cheaper options.
  Small scale: The quadcopters all use a PCB to host their electronics and serve as an airframe. They measure less than 10 cm on a side and are suitable for flight indoors over short distances. “The motors are moderately powerful, “brushed” electric motors powered by a small lithium-polymer (LiPo) battery, and we use small, plastic propellers. The quadcopters are easy to operate safely, and a blow from the propeller at full speed is painful but not particularly dangerous. Students wear eye protection around their flying quadcopters.”
  Why it matters: The paper notes that the ‘killer apps’ of the future “will lie at the intersection of hardware, software, sensing, robotics, and/or wireless communications”. This seems true – especially given the chance of major uptake following the success of companies like DJI, and the possibility of unit economics driving prices down. Therefore, tracking and measuring the cost and ease with which people can build and assemble drones out of (hard-to-track, commodity) components gives us better intuitions about this aspect of drones+security. While the hardware and software are under-powered and somewhat pricey today, it won’t stay that way for long.
  Read more: Trial by Flyer: Building Quadcopters From Scratch in a Ten-Week Capstone Course (Arxiv).

Amazon tries to make Alexa smarter via richer conversational data:
…Who needs AI breakthroughs when you’ve got a BiLSTM, lots of data, and patience?…
Amazon researchers are trying to give personal assistants like Alexa the ability to have long-term conversations about specific topics. The (rather unsurprising) finding they make in a new research paper is that you can “extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features”, yielding personal assistants capable of longer and more coherent conversations than their forebears – if you can afford to annotate the data.
  Data used: The researchers used data collected during the 2017 ‘Alexa Prize’ competition, which consists of over 100,000 utterances containing interactions between users and chatbots. They augmented this data by classifying the topic of each utterance into one of 12 categories (eg: politics, fashion, science & technology, etc), and also trying to classify the goal of the user or chatbot (eg: clarification, information request, topic switch, etc). They also asked other annotators to rate every single chatbot response with metrics relating to how comprehensible it was, how relevant the response was, how interesting it was, and whether a user might want to continue the conversation with the bot.
  Baselines and BiLSTMs: The researchers implement two baselines (DAN, based on a bag-of-words neural model; ADAN, which is DAN extended with attention), and then develop two versions of a bidirectional LSTM (BiLSTM) system, where one uses context from the annotated dataset and the other doesn’t. They then evaluate all these methods by testing their baselines (which see only the current utterance) against systems which incorporate context, systems which incorporate dialog act features, and systems which incorporate both. The results show that a BiLSTM fed with context in sequence does almost twice as well as a baseline ADAN system that uses context and dialog, and almost 25% better than a DAN fed with both context and dialog.
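  For readers who want a sense of what such a model looks like, here is an illustrative sketch (not Amazon’s implementation) of a BiLSTM topic classifier fed previous turns plus the current utterance; the vocabulary size, dimensions, and 12-class output are assumptions.

```python
# Illustrative sketch of a contextual BiLSTM topic classifier, NOT Amazon's code.
# Vocabulary size, embedding/hidden dimensions, and topic count are assumptions.
import torch
import torch.nn as nn

VOCAB, EMBED, HIDDEN, TOPICS = 10_000, 100, 128, 12

class ContextualTopicClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.bilstm = nn.LSTM(EMBED, HIDDEN, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * HIDDEN, TOPICS)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -- previous turns concatenated with the
        # current utterance, e.g. [prev_turn ... <sep> ... current_utterance]
        h, _ = self.bilstm(self.embed(token_ids))
        return self.out(h[:, -1])   # classify the topic from the final state

model = ContextualTopicClassifier()
fake_dialog = torch.randint(0, VOCAB, (1, 40))   # one padded dialog window
topic_logits = model(fake_dialog)                # shape: (1, 12)
```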
  Why it matters: The results indicate that – if a developer can afford the labeling cost – it’s possible to augment language interaction datasets with additional information about context and topic to create more powerful systems. This seems to imply that in the language space we can expect to see large companies invest in teams of people not just to transcribe and label text at a basic level, but to perform more elaborate meta-classifications as well. The industrialization of deep learning continues!
  Read more: Contextual Topic Modeling For Dialog Systems (Arxiv).

Why AI won’t be efficiently solving a 2D gridworld quest soon:
…Want humans to be able to train AIs? The key is curriculum learning and interactive learning, say BabyAI’s creators…
Researchers with the Montreal Institute for Learning Algorithms (MILA) have designed a free tool called BabyAI to let them test AI systems’ ability to learn generalizable skills from curriculums of tasks set in an efficient 2D gridworld environment – and the results show that today’s AI algorithms display poor data efficiency and generalization at this sort of task.
  Data efficiency: BabyAI uses gridworlds for its environment, which have been written to be efficient enough that researchers can use the platform without needing access to vast pools of compute; the BabyAI environments can run at up to 3,000 frames per second “on a modern multi-core laptop” and can also be integrated with OpenAI Gym.
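  For a sense of how such an environment is typically driven, here is a minimal interaction loop, assuming the standard OpenAI Gym API and an environment ID along the lines of 'BabyAI-GoToRedBall-v0' (both the registration mechanism and the exact ID are assumptions here and should be checked against the BabyAI repo).

```python
# Minimal interaction loop with a BabyAI level, assuming the standard Gym API.
# The environment ID and the registration-on-import are assumptions, not verified.
import gym
import babyai  # noqa: F401 -- assumed to register the BabyAI environments with Gym

env = gym.make("BabyAI-GoToRedBall-v0")
obs = env.reset()
print(obs["mission"])   # the Baby Language instruction, e.g. "go to the red ball"

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()   # random agent; a real agent would act on obs
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```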
  A specific language: BabyAI uses “a comparatively small yet combinatorially rich subset of English” called Baby Language. This is meant to help researchers write increasingly sophisticated strings of instructions for agents, while keeping the state space from exploding too quickly.
  Levels as a curriculum: BabyAI ships with 19 levels which increase in the difficulty of both the environment and the complexity of the language required to solve it. The levels test each agent on 13 different competencies, including unlocking doors, navigating to locations, ignoring distractors placed into the environment, navigating mazes, and so on. The researchers also design a bot which can solve any of the levels using a variety of heuristics – this bot serves as an expert that can generate demonstrations for training other models.
  So, are today’s AI techniques sophisticated enough to solve BabyAI? The researchers train an imitation learning-based baseline for each level and assess how well it does. The systems are able to learn to perform basic tasks, but struggle to imitate the expert at tasks that require multiple actions to solve. One of the most intriguing parts of the paper is the analysis of the relative efficiency of systems trained via imitation and via pure reinforcement learning, which shows that today’s algorithms are wildly inefficient at learning pretty much anything: a simple task like learning to go to a red ball hidden within a map takes 40,000-60,000 demonstrations when using imitation learning, and around 453,000 to 470,000 episodes when learning via reinforcement learning without an expert teacher to mimic. The researchers also show that pre-training (where you learn on other tasks before attempting certain levels) does not yield particularly impressive performance, delivering at most a 3X speedup.
  Why it matters: Platforms like BabyAI give AI researchers fast, efficient tools to use when tackling hard research projects, while also highlighting the deficiency of many of today’s algorithms. The transfer learning results “suggest that current imitation learning and reinforcement learning methods scale and generalize poorly when it comes to learning tasks with a compositional structure,” they write. “An obvious direction of future research is to find strategies to improve data efficiency of language learning.”
  Get the code for BabyAI (GitHub).
  Read more: BabyAI: First Steps Towards Grounded Language Learning with a Human In the Loop (Arxiv).

Simulating robots that touch and see in AAA-game quality detail:
…The new question AI researchers will ask: But Can It Simulate Crysis?…
Researchers with the 3D Perception Lab at the University of Alicante have designed UnrealROX, a high-fidelity simulator based on Unreal Engine 4, built for simulating and training AI agents embodied in (simulated) touch-sensitive robots.
  Key ingredients: UnrealROX has the following main ingredients: a simulated grasping system that can be applied to a variety of finger configurations; routines for controlling robotic hands and bodies using commercial VR setups like the Oculus Rift and HTC Vive; a recorder to store full sequences from scenes; and customizable camera locations.
  Drawback: The overall simulator can run at 90 frames-per-second, the researchers note. While this may sound impressive it’s not particularly useful for most AI research unless you can run it far faster than that (compare this with BabyAI, which runs at 3,000 FPS).
  Simulated robots with simulated hands: UnrealROX ships with support for two robots: a simulated ‘Pepper’ robot from the company Aldebaran, and a spruced-up version of the mannequin that ships with UE4. Both of these robots have been designed with extensible, customizable grasping systems, letting them reach out and interact with the world around them. “The main idea of our grasping subsystem consists in manipulating and interacting with different objects, regardless of their geometry and pose.”
  Simulators, what are they good for? UnrealROX may be of particular interest to researchers that need to create and record very specific sequences of behaviors on robots, or who wish to test the ability to learn useful policies from a relatively small amount of high-fidelity information. But it seems likely that the relative slowness of the simulator will make it difficult to use for most AI research.
  Why it matters: The current proliferation of simulated environments represents a kind of simulation-boom in AI research that will eventually produce a cool historical archive of the many ways in which we might think robots could interact with each other and the world. Whether UnrealROX is used or not, it will contribute to this historical archive.
  Read more: UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation (Arxiv).

AirBnB augments main search engine with neural net, sees significant performance increase:
…The Industrialization of Deep Learning continues…
Researchers with home/apartment-rental service AirBnB have published details on how they transitioned AirBnB’s main listings search engine to a neural network-based system. The paper highlights how deploying AI systems in production is different to deploying AI systems in research. It also sees AirBnB follow Google, which in 2015 augmented its search engine with ‘RankBrain’, a neural network-based system that almost overnight became one of the most significant factors in selecting which search results to display to a user. “This paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs),” the researchers write.
  Motivation: “The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history,” the researchers write. This performance boost eventually plateaued, prompting them to implement neural network-based approaches to improve search further.
  Keep it simple (& stupid): One of the secrets about AI research is the gulf between frontier research and production use-cases: researchers tend to prioritize novel approaches that work on small tasks, while industry and/or large-scale operators prioritize simple techniques that scale well. This fact is reflected in this research, where the researchers started with a single-layer neural net model, moved on to a more sophisticated system, then opted for a scale-up solution as their final product. “We were able to deprecate all that complexity by simply scaling the training data 10x and moving to a DNN with 2 hidden layers.”
  Input features: For typical configurations of the network the researchers gave it 195 distinct input ‘features’ to learn about, which included properties of listings like price, amenities, historical booking count; as well as features from other smaller models.
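  As a rough illustration of the kind of model described – not Airbnb’s actual implementation – here is a two-hidden-layer feedforward ranker over ~195 listing features, trained with a generic pairwise ranking loss; the layer sizes and the loss choice are assumptions.

```python
# Hedged sketch of a 2-hidden-layer feedforward ranking model over ~195 features.
# NOT Airbnb's code; layer sizes and the pairwise loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_FEATURES = 195

ranker = nn.Sequential(
    nn.Linear(N_FEATURES, 127), nn.ReLU(),
    nn.Linear(127, 83), nn.ReLU(),
    nn.Linear(83, 1),              # one relevance score per listing
)

def pairwise_loss(score_booked, score_not_booked):
    """Encourage booked listings to outscore non-booked ones (a common ranking loss)."""
    return F.softplus(score_not_booked - score_booked).mean()

booked = torch.randn(32, N_FEATURES)       # features of listings that were booked
not_booked = torch.randn(32, N_FEATURES)   # features of listings shown but not booked
loss = pairwise_loss(ranker(booked), ranker(not_booked))
loss.backward()
```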
  Failure: The paper includes a quite comprehensive list of some of the ways in which the Airbnb researchers failed when trying to implement new neural network systems. Many of these failures are due to things like overfitting, or trying to architect too much complexity into certain parts of the system.
  Results: AirBnB doesn’t reveal the specific quantitative performance boost, as this would leak proprietary commercial information, but does include a couple of graphs that show that the 2-layer neural network leads to a very meaningful relative gain in the number of bookings made using the system, indicating that the neural net-infused search is presenting people with more relevant listings which they are more likely to book. “Overall, this represents one of the most impactful applications of machine learning at Airbnb,” they write.
  Why it matters: AirBNB’s adoption of deep learning for its main search engine further indicates that deep learning is well into its industrialization phase, where large companies adopt the technology and integrate it into their most important products. Every time we get a paper like this the chance of an ‘AI Winter’ decreases, as it creates another highly motivated commercial actor that will continue to invest in AI research and development, regardless of trends in government and/or defence funding.
  Read more: Applying Deep Learning to AirBNB Search (Arxiv).
  Read more: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg News, 2015).

Refining low-quality web data with CurriculumNet:
…AI startup shows how to turn bad data into good data, with a multi-stage weakly supervised training scheme…
Researchers with Chinese computer vision startup Malong have released code and data for CurriculumNet, a technique to train deep neural networks on large amounts of data with variable annotations, collected from the internet. Approaches like this are useful if researchers don’t have access to a large, perfectly labeled dataset for their specific task. But the tradeoff is that the labels on datasets gathered in this way are far noisier than those from hand-built datasets, presenting researchers with the challenge of extracting enough signal from the noise to be able to train a useful network.
  CurriculumNet: The researchers train their system on the WebVision database, which contains over 2,400,000 images with noisy labels. Their approach works by training an Inception_v2 model over the whole dataset, then studying the feature space into which all the images are mapped; CurriculumNet sorts these images into clusters, then sorts each cluster into three subsets according to how similar all the images in the subset are to each other in feature space, with the intuition being that subsets full of similar images will be easier to learn from than those which are very diverse. They then train a model via a curriculum: first on the subsets with similar image features, then mixing in the noisier subsets. By iteratively learning a classifier from good labels, then adding in noisier ones, the researchers say they are able to increase the generalization of their trained systems.
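  A rough sketch of the curriculum-building step might look like the following, with k-means and a simple distance-to-centre ordering standing in for the density-based clustering and ranking the paper actually uses.

```python
# Rough sketch of building a clean->noisy curriculum from noisy web labels.
# k-means and distance-to-centre ordering stand in for the paper's density-based
# approach; the extract_features helper in the usage note is hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def build_curriculum(features, n_clusters=100):
    """features: (n_images, d) array of embeddings from the initially trained model."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)

    easy, medium, hard = [], [], []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        order = idx[np.argsort(dists[idx])]          # closest to the centre first
        n = len(order)
        easy.extend(order[: n // 3])                 # most self-similar images
        medium.extend(order[n // 3 : 2 * n // 3])
        hard.extend(order[2 * n // 3 :])             # most diverse / noisiest
    return easy, medium, hard    # train on 'easy' first, then mix in the rest

# Usage (hypothetical helper): embeddings = model.extract_features(images)
#                              stages = build_curriculum(embeddings)
```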
  Testing: They test CurriculumNet on four benchmarks: WebVision, ImageNet, Clothing1M, and Food101. They find that systems trained using the largest amount of noisy data converge to higher accuracies than those trained without, seeing reductions in error of multiple percentage points on WebVision (“these improvements are significant on such a large-scale challenge,” they write). CurriculumNet gets state-of-the-art results for top-1 accuracy on WebVision, with performance increasing even further when they train on more data (such as combining ImageNet and WebVision).
  Why it matters: Systems like CurriculumNet show how researchers can use poorly-labeled data, combined with clever training ideas, to increase the value of lower-quality data. Approaches like this can be viewed as analogous to a clever refinement process applied when extracting a natural resource.
  Read more: CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (Arxiv).
  Get the trained models from Malong’s Github page.

Tech Tales:

[2025: Podcast interview with the inventor of GFY]

Reality Bites, So Change It.
Or: There Can Be Hope For Those of Us Who Were Alone And Those We Left Behind

My Father was struck by a truck and killed while riding his motorbike in the countryside; no cameras, no witnesses; he was alone. There was an investigation but no one was ever caught. So it goes.

At the funeral I told stories about the greatness of my Father and I helped people laugh and I helped people cry. But I could not help myself because I could not see his death. It was as though he opened a door and disappeared before walking through it and the door never closed again; a hole in the world.

I knew many people who had lost friends and parents to cancer or other illnesses and their stories were quite horrifying: black vomit before the end; skeletons with the faces of parents; tales of seeing a dying person when they didn’t know they were being watched and seeing rage and fear and anguish on their face. The retellings of so many bad jokes about not needing to pay electricity bills, wheezed out over hospital food.

I envied these people, because they all had a “goodbye story” – that last moment of connection. They had the moment when they held a hand, or stared at a chest as it heaved in one last breath, or confessed a great secret before the chance was gone. Even if they weren’t there at the last they had known it was coming.

I did not have my goodbye, or the foreshadowing of one. Imagine that.

So that is why I built Goodbye For You(™), or GFY for short. GFY is software that lets you simulate and spend the last few moments with a loved one. It requires data and effort and huge amounts of patience… but it works. And as AI technology improves, so does the ease of use and fidelity of GFY.

Of course, it is not quite real. There are artifacts: improbable flocks of birds, or leaves that don’t fall quite correctly, or bodies that don’t seem entirely correct. But the essence is there: With enough patience and enough of a record of the deceased, GFY can let you reconstruct their last moment, put on a virtual reality haptic-feedback suit, and step into it.

You can speak with them… at the end. You can touch them and they can touch you. We’re adding smell soon.

I believe it has helped people. Let me try to explain how it worked the first time, all those years ago.

I was able to see the truck hit his bike. I saw his body fly through the air. I heard him say “oh no” the second after impact as he was catapulted off his bike and towards the side of the road. I heard his ribs break as he landed. I saw him crying and bleeding. I was able to approach his body. He was still breathing. I got on my knees and bent over him and I cried and the VR-helmet saw my tears in reality and simulated these tears falling onto his chest – and he appeared to see them, then looked up at me and smiled.
   He touched my face and said “my child” and then he died.

Now I have that memory and I carry it in my heart as a candle to warm my soul. After I experienced this first GFY my dreams changed. It felt as though I had found a way to see him open the door – and leave. And then the door shut.

Grief is time times memory times the rejuvenation of closure: of a sense of things that were once so raw being healed and knitted back together. If you make the memory have closure things seem to heal faster.

Yes, I am still so angry. But when I sleep now I sometimes dream of that memory, and in my imagination we say other things, and in this way continue to talk softly through the years.

Things that inspired this story: The as-yet-untapped therapeutic opportunities afforded by synthetic media generation (especially high-fidelity conditional video); GAN progression from 2014 to 2018; compute growth both observed and expected for the next few years; Ander Monson’s story “I am getting comfortable with my grief”.