Import AI

Import AI: #105: Why researchers should explore the potential negative effects of their work; fusing deep learning with classical planning for better robots, and who needs adversarial examples when a blur will do?

Computer scientist calls for researchers to discuss downsides of work, as well as upsides:
…Interview with Brent Hecht, chair of the Association for Computing Machinery (ACM)’s Future of Computing Academy, which said in March that researchers should list downsides of their work…
One of the repeated problems AI researchers deal with is the omni-use nature of the technology: a system designed to recognize a wide variety of people in different poses and scenes can also be used to surveil people; auto-navigation systems for disaster response can be repurposed for weaponizing consumer platforms; systems to read lips and thereby improve the quality of life of people with hearing and/or speech difficulties can also be used to surreptitiously analyze people in the wild; and so on.
  Recently, the omni-use nature of this tech has been highlighted as companies like Amazon develop facial recognition tools which are subsequently used by the police, and as Google uses computer vision techniques to develop systems for the ‘MAVEN’ program from the DoD. What can companies and researchers do to increase the positive effects of their research and minimize some of the downsides? Computer science professor Brent Hecht says in an interview with Nature that scientists should consider changing the process of peer review to encourage scientists to talk about the potential for abuse of their work.
  “In the past few years, there’s been a sea-change in how the public views the real-world impacts of computer science, which doesn’t align with how many in the computing community view our work,” he says. “A sizeable population in computer science thinks that this is not our problem. But while that perspective was common ten years ago, I hear it less and less these days.”
  Why it matters: “Disclosing negative impacts is not just an end in itself, but a public statement of new problems that need to be solved,” he says. “We need to bend the incentives in computer science towards making the net impact of innovations positive.”
  Read more: The ethics of computer science: this researcher has a controversial proposal (Nature).

Sponsored: The AI Conference – San Francisco, Sept 4–7:
…Join the leading minds in AI, including Kai-Fu Lee, Meredith Whittaker, Peter Norvig, Dave Patterson, and Matt Wood. No other conference combines this depth of technical expertise with a laser focus on how to apply AI in your products and in your business today.
…Register soon. Last year this event sold out; training courses and tutorials are filling up fast. Save an extra 20% on most passes with code IMPORTAI20.

Worried about adversarial examples and self-driving cars? You should really be worried about blurry images:
…Very basic corruptions to images can cause significant accuracy drops, research shows…
Researchers with the National Robotics Engineering Center and the Electrical and Computer Engineering Department at CMU have shown that simply applying basic image degradations that blur images, or add haze to them, leads to significant performance issues. “We show cases where performance drops catastrophically in response to barely perceptible changes,” writes researcher Phil Koopman in a blog post that explains the research. “You don’t need adversarial attacks to foil machine learning-based perception – straightforward image degradations such as blur or haze can cause problems too”.
  Testing: The researchers test a variety of algorithms across three different architectures (Faster R-CNN, Single Shot Detector (SSD), and Region-based Fully Convolutional Network (R-FCN)); they test these architectures with a variety of feature extractors, like Inception or MobileNets. They evaluate these algorithms by testing them on the NREC ‘Agricultural Person Detection Dataset’. The researchers apply two types of mutation to the images: “simple” mutators which modify the image, and “contextual” mutators which mutate the image while adding additional information. To test the “simple” mutations they apply simple image transformations, like Gaussian blur, JPEG Compression, the addition of salt and pepper noise, and so on. For the “contextual” mutations they apply things like haze to the image.
  Results: In tests, the researchers show that very few detectors are immune to the effects of these perturbations, with results indicating that Single Shot Detectors (SSDs) have the most trouble dealing with these relatively minor tweaks. One point of interest is that some of the systems which are resilient to these mutations are resilient to quite a few of them quite consistently – the presence of these patterns shows “generalized robustness trends”, which may serve as signposts for future researchers to further evaluate generalization.
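  To make the “simple” mutators concrete, here is a minimal sketch of two of them (Gaussian blur and salt and pepper noise) using only NumPy. The function names, the default `sigma` and `amount` values, and the assumption of a grayscale image scaled to [0, 1] are all illustrative choices, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Return a 1-D Gaussian kernel normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur of a 2-D grayscale image (values in [0, 1])."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    # Pad by repeating edge pixels so the output keeps the input shape.
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def salt_and_pepper(image, amount=0.02, rng=None):
    """Set roughly `amount` of the pixels to pure black or pure white."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    u = rng.random(image.shape)
    noisy[u < amount / 2] = 0.0        # pepper
    noisy[u > 1 - amount / 2] = 1.0    # salt
    return noisy


if __name__ == "__main__":
    img = np.random.default_rng(0).random((64, 64))  # stand-in for a real frame
    print(gaussian_blur(img, sigma=2.0).shape, salt_and_pepper(img).shape)
```

  A robustness test in this spirit would run the same detector on the original and mutated frames and compare the detections.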
  Read more: Putting image manipulations in context: robustness testing for safe perception (Safe Autonomy / Phil Koopman blogspot).
  Read more: Putting Image Manipulations in Context: Robustness Testing for Safe Perception (PDF).

Researchers count on blobs to solve counting problems:
…Segmenting objects may be hard, but placing dots on them may be easy…
Precisely counting objects in scenes, like the number of cars on a road or people walking through a city, is a task that challenges both humans and machines. Researchers are training object counters using dots that mark each individual entity, rather than pixel segmentation masks or bounding boxes, as is typical. “We propose a novel loss function that encourages the model to output instance regions such that each region contains a single object instance (i.e. a single point-level annotation),” they explain. This tweak significantly improves performance relative to other baselines based on segmentation and depth. They evaluate their approach on diverse datasets, consisting of images of parking lots, images taken by traffic cameras, images of penguins, PASCAL VOC 2007, another surveillance dataset called MIT Traffic, and Crowd Counting Datasets.
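  If a model does output one region per object, the count is simply the number of separate regions. The sketch below counts 4-connected blobs in a binary mask; the mask format (a list of lists of 0/1) and the choice of 4-connectivity are assumptions made for illustration, and the paper's own post-processing may differ.

```python
from collections import deque


def count_blobs(mask):
    """Count 4-connected regions of truthy cells in a 2-D grid."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground cell: flood-fill its whole blob.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count
```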
  Why it matters: Counting objects is a difficult task for AI systems, and approaches like this indicate other ways to tackle the problem. In the future, the researchers want to design new network architectures that can better distinguish between overlapping objects that have complicated shapes and appearances.
  Read more: Where are the Blobs: Counting by Localization with Point Supervision (Arxiv).

Predicting win states in Dota 2 for better reinforcement learning research:
…System’s recommendations outperform proprietary product’s…
Researchers have trained a system to predict the probability of a given team winning or losing a game of popular online game Dota 2. This is occurring at the same time that researchers across the world try to turn MOBAs into test-beds for reinforcement learning.
  To train their model, the researchers downloaded and parsed replay files from over 100,000 Dota 2 matches. They generate discrete bits of data for each 60 second period of a game, containing a vector which encodes information about the players’ state at that point in time. They then use these slices to inform a point-in-time ‘Time Slice Evaluation’ (TSE) model which attempts to predict the outcome of the match from a given point in time. The researchers do detect some correlation between the elapsed game time, the ultimate outcome of the match, and the data contained within the slice being studied at this point in time. Specifically, they find that after the first fifty percent of a game it becomes fairly easy to train a model to accurately predict win likelihoods, so they train their system on this data.
  Results: The resulting system can successfully predict the outcome of matches and outperforms ‘Dota Plus’, a proprietary subscription service that provides a win probability graph for every match. (Chinese players apparently call this service the ‘Big Teacher’, which I think is quite delightful!). The researchers’ system is, on average, about three percentage points more accurate than Dota Plus Assistant and starts from a higher base prediction accuracy. One future direction of research is to train on the first 50 percent of elapsed match time, though this would require algorithmic innovation to deal with the early-game instability. Another is to implement a recurrent neural network system so that instead of making predictions based on a single time-slice, the system can instead make predictions from sequences of slices.
  Why it matters: MOBAs are rapidly becoming a testbed for advanced reinforcement learning approaches, with companies experimenting with games like Dota. Papers like this give us a better idea of the sorts of specific work that need to go on to make it easy for researchers to work with these platforms.
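  The slicing step can be sketched as follows: cut a match into 60 second windows and keep only the windows from the later half of the game. The function names and the `min_fraction` parameter are hypothetical, and the real feature vectors extracted from replay files are not modelled here.

```python
def time_slices(duration_s, slice_s=60):
    """Return (start, end) boundaries, in seconds, of consecutive slices."""
    return [(t, min(t + slice_s, duration_s)) for t in range(0, duration_s, slice_s)]


def late_game_slices(duration_s, slice_s=60, min_fraction=0.5):
    """Keep slices that start at or after `min_fraction` of the match length."""
    cutoff = duration_s * min_fraction
    return [s for s in time_slices(duration_s, slice_s) if s[0] >= cutoff]


if __name__ == "__main__":
    # A 10-minute match: slices starting at 300s or later are kept.
    print(late_game_slices(600))
```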
  Read more: MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games (Arxiv).

Better robots via fusing deep learning with classical planning:
…Everything old is new again as Berkeley and Chicago researchers staple two different bits of the AI field together…
Causal InfoGAN is a technique for learning what the researchers call “plannable representations of dynamical systems”. Causal InfoGANs work by observing an environment, for instance, a basic maze simulation, and exploring it. They use this exploration to develop a representation of the space, which they then use to compose plans to navigate across it.
  Results: In tests, the researchers show that Causal InfoGAN can develop richer representations of basic mazes, and can use these representations to create plausible trajectories to navigate the space. In another task, they show how the Causal InfoGAN can learn to perform a multi-stage task that requires searching to find a key then unlocking a door and proceeding through it. They also test their approach on a rope manipulation task, where the Causal InfoGAN needs to plan how to transition a rope from an initial state to a goal state (such as a crude knot, or a different 2D arrangement of the rope on a table).
  Why it matters: The benefit of techniques like this is it takes something that has been developed for many years in classical AI – planning under constraints – and augments it with deep learning-based approaches to make it easier to access information about the environment. “Our results for generating realistic manipulation plans of a rope suggest promising applications in robotics,” they write. “Where designing models and controllers for manipulating deformable objects is challenging.”
  Read more: Learning Plannable Representations with Causal InfoGAN (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

Amazon’s face recognition software falsely matches US Members of Congress with criminals:
The ACLU have been campaigning against the use of Amazon’s Rekognition software by US law enforcement agencies. For their latest investigation, they used the software to compare photos of all sitting members of Congress against 25,000 mugshots. They found 28 members were falsely matched with mugshots. While the error-rate across the sample was 5%, people of colour accounted for 39% of the false matches, despite making up only around 20% of Congress.
  Amazon responds: Matt Wood (Amazon’s general manager of AI) writes in an Amazon blog post that the results are misleading, since the ACLU used the default confidence level of 80%, whereas Amazon recommends a setting of 99% for ‘important’ uses. (There is no suggestion that Amazon requires clients to use a higher threshold). He argues that the bias of the results is a reflection of bias in the mugshot database itself.
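  The effect of the threshold is easy to see with a toy example. The sketch below is not the Rekognition API: the `matches_above` helper and the similarity scores are made up purely to show how raising the cut-off from 80 to 99 shrinks the set of reported matches.

```python
def matches_above(similarities, threshold):
    """Return candidate IDs whose similarity score (0-100) meets the threshold."""
    return [cid for cid, score in similarities.items() if score >= threshold]


# Hypothetical scores for one probe photo compared against four mugshots.
scores = {"mugshot_a": 81.3, "mugshot_b": 99.4, "mugshot_c": 62.0, "mugshot_d": 93.7}

if __name__ == "__main__":
    print("threshold 80:", matches_above(scores, 80))
    print("threshold 99:", matches_above(scores, 99))
```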
  Why this matters: Amazon’s response about the biased sample set is valid, but is precisely the problem the ACLU and others have pointed out. Mugshot and other criminal databases in the US reflect the racial bias in the US criminal justice system, which interacts disproportionately with people of colour. Without active efforts, tools that use these databases will inherit their biases, and could entrench them. We do not know if these agencies are following Amazon’s recommendation to use a 99% confidence rate, but it seems unwise to allow these customers to use a considerably lower setting, given the potential harms from misidentification.
  Read more: Amazon’s Face Recognition Falsely Matched 28 Members of Congress With Mugshots (ACLU).
  Read more: Amazon’s response (AWS blog).

Chinese company exports surveillance tools:
Chinese company CloudWalk Technology has entered a partnership with the Zimbabwean government to provide mass face recognition, in a country with a long history of human rights abuses. The most interesting feature of the contract is the agreement that CloudWalk will have access to a database that aims to contain millions of Zimbabweans. Zimbabwe does not have legislation protecting biometric data, leaving citizens with few options to prevent either the surveillance program being implemented, or the export of their data. This large dataset may have significant value for CloudWalk in training the company’s systems on a broader racial mix.
  Why this matters: This story combines two major ethical concerns with AI-enabled surveillance. The world’s authoritarian regimes represent a huge potential market for these technologies, which could increase control over their citizens and have disastrous consequences for human rights. At the same time, as data governance in developed countries becomes more robust, firms are increasingly “offshoring” their activities to countries with lax regulation. This partnership is a worrying interaction of these issues, with an authoritarian government buying surveillance technology, and paying for it with their citizens’ data.
  Read more: Beijing’s Big Brother Tech Needs African Faces (Foreign Policy)

UK looks to strengthen monitoring of foreign investment in technology:
The UK has announced proposals to strengthen the government’s ability to review foreign takeovers that pose national security risks. While the measures cover all sectors, the government identifies “advanced technologies” and “military and dual-use technologies” as core focus areas, suggesting that AI will be high on the agenda. US lawmakers are currently considering proposals to strengthen CFIUS, the US government’s equivalent tool for controlling foreign investment.
  Why this matters: As governments realize the importance of retaining control over advanced technologies, it will be interesting to see how broad a scope the government takes, and whether these measures could become a means of blocking a wide range of investments in technology. It is noteworthy that they take a fairly wide definition of national security risks, not restricted to military or intelligence considerations, and including risks from hostile parties gaining strategic leverage over the UK.
  Read more: National Security and Investment White paper.

FLI grants add $2m funding for research on robust and beneficial AI:
The Future of Life Institute has announced $2m in funding for research towards ensuring that artificial general intelligence (AGI) is beneficial for humanity. This is the second round of grants from Elon Musk’s $10m donation in 2015. The funding is more focussed on AI strategy and technical AI safety than the previous round, which included a diverse range of projects.
  Why this matters: AGI could be either humanity’s greatest invention, or its most destructive. The FLI funding will further support a community of researchers trying to ensure positive outcomes from AI. While the grant is substantial, it is worth remembering that the funding for this sort of research remains a minuscule proportion of AI investment more broadly.
  Read more: $2 Million to Keep AGI Beneficial and Robust (FLI)
  Read more: Research Priorities for Robust and Beneficial Artificial Intelligence (FLI)

Lost in translation:
Last week I summarized Germany’s AI report using Google Translate. A reader kindly pointed out that Charlotte Stix, Policy Officer at the Leverhulme Centre for the Future of Intelligence, has translated the document in full: Cornerstones for the German AI Strategy. (Another researcher doing prolific translation work is Jeffrey Ding, from the Future of Humanity Institute, whose ChinAI newsletter is a great resource to keep up-to-speed with AI in China.)

OpenAI Bits and Pieces:

OpenAI Scholars 2018:
Learn more about the first cohort of OpenAI Scholars and get a sense of what they’re working on.
  Read more: Scholars Class 2018 (OpenAI Blog).

Tech Tales:

The Sound of Thinking

It is said that many hundreds of years ago we almost went to the stars. Many people don’t believe this now, perhaps because it is too painful. But we have enough historical records preserved to know it happened: for a time, we had that chance. We were distracted, though. The world was heating up. Systems built on top of other systems over previous centuries constrained our thinking. As things got hotter and more chaotic making spaceships became a more and more expensive proposition. Some rich people tried to colonize the moon but lacked the technology for it to be sustainable. In school we use ancient telescopes to study the wreckage of the base. We tell many stories about what went on in it, for we have no records. The moonbase, like other things around us in this world, is a relic from a time when we were capable of greater things.

It is the AIs that are perhaps the strangest things. These systems were built towards the end of what we refer to as the ‘technological high point’. Historical records show that they performed many great feats in their time – some machines helped the blind see, and others solved outstanding riddles in physics and mathematics and the other sciences. Other systems were used for war and surveillance, to great effect. But some systems – the longest lasting ones – simply watch the world. There are perhaps five of them left worldwide, and some of us guard them, though we are unsure of their purpose.

The AI I guard sits at the center of an ancient forest. Once a year it emits a radio broadcast that beams data out to all that can listen. Much of the data is mysterious to us but some of it is helpful – it knows the number of birds in the forest, and can

The AI is housed in a large building which, if provided with a steady supply of water, is able to generate power sufficient to let the machine function. When components break, small doors in the side of the AI’s building open, revealing components sealed in vacuum bags, marked with directions in every possible language about how to replace them. We speak different languages now and one day it will be my job to show the person who comes after me how to replace different components. At current failure rates, I expect the AI to survive for several hundred years.

My AI sings, sometimes. After certain long, wet days, when the forest air is sweet, the machine will begin to make sounds, which sound like a combination of wind and stringed instruments and the staccato sounds of birds. I believe the machine is singing. After it starts to make sounds the birds of the forest respond – they start to sing and as they sing they mirror the rhythms of the computer with their own chorus.

I believe that the AI is learning how to communicate with the birds, perhaps having observed that people, angry and increasingly hopeless, are sliding down the technological gravity well and, given a few hundred thousand years, may evolve into something else entirely. Birds, by comparison, have been around for far longer than humans and never had the arrogance to try and escape their sphere. Perhaps the AI thinks they are a more promising type of life-form to communicate with: it sings, and they listen. I worry that when the AIs sang for us, all those years ago, we failed to listen.

Things that inspired this story: Writings of the Gawain Poet about ancient ruins found amid dark age England, J G Ballard, the Long Now Foundation, flora&fauna management.

Import AI: #104: Using AirBNB to generate data for robots; Google trains AI to beat humans at lip-reading; and NIH releases massive ‘DeepLesion’ CT dataset

Rosie the Robot takes a step closer with new CMU robotics research:
…What’s the best way to gather a new robotics research dataset – AirBNB?!…
Carnegie Mellon researchers have done the robotics research equivalent of ‘having cake and eating it too’ – they have created a new dataset to evaluate generalization within robotics, and have successfully built low-cost robots which have been able to show meaningful performance on the dataset. The motivation for the research is that most robotics datasets are specific to highly-controlled lab environments, and instead it’s worth exploring generating and gathering data from more real world locations (in this case, homes rented on AirBNB), then see if it’s possible to develop a system that can learn to grasp objects within these datasets, and see if the use of these datasets improves generalization relative to other techniques.
  How it works: The approach has three key components: a Grasp Prediction Network (GPN) which takes in pixel imagery and tries to predict the correct grasp to take (and which is fine-tuned from a pretrained ResNet-18 model); a Noise Modelling Network (NMN) which tries to estimate the latent noise based on the image of the scene and information from the robot; and a marginalization layer which helps combine the two data streams to predict the best grasp to use.
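  One way to picture the marginalization step: suppose the grasp network gives a success probability for each grasp the arm might actually execute, and the noise network gives, for each commanded grasp, a distribution over what actually gets executed. Summing over the executed grasps yields the success probability per commanded grasp. This is only a sketch of that general idea under my own assumptions (discrete grasp bins, the variable names, and the shape of the noise matrix); the paper's layer may be defined differently.

```python
import numpy as np


def marginalize(p_success_given_actual, p_actual_given_commanded):
    """Compute P(success | commanded a) = sum_z P(z | a) * P(success | z).

    p_success_given_actual: shape (K,), success probability per executed bin z.
    p_actual_given_commanded: shape (K, K), row a is a distribution over z.
    """
    p_grasp = np.asarray(p_success_given_actual, dtype=float)
    p_noise = np.asarray(p_actual_given_commanded, dtype=float)
    if not np.allclose(p_noise.sum(axis=1), 1.0):
        raise ValueError("each row of the noise matrix must sum to 1")
    return p_noise @ p_grasp


if __name__ == "__main__":
    grasp = [0.9, 0.1, 0.5]                 # hypothetical grasp-network output
    noise = [[0.8, 0.2, 0.0],               # hypothetical noise-network output
             [0.1, 0.8, 0.1],
             [0.0, 0.2, 0.8]]
    best = int(np.argmax(marginalize(grasp, noise)))
    print("best commanded bin:", best)
```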
  The robot: They use a Dobot Magician robotic arm with five degrees of freedom, customized with a two-axis wrist with electric gripper, and mounted on a Kobuki mobile base. For sensing, they equip it with an Intel R200 RGB camera with a pan-tilt attachment positioned 1m above the ground. The robot’s onboard processor is a laptop with an i5-8250U CPU with 8GB of RAM. Each of these robots costs about $3,000 – far less than the $20k+ prices for most other robots.
  Data gathering: To gather data for the robots the researchers used six different properties from AirBNB. They then deployed the robot in these homes, used a low-cost ‘YOLO’ model to generate bounding boxes around objects near the robot, then let the robot’s GPN and NMN work together to help it predict how to grasp objects. They collect about 28,000 grasps in this manner.
  Results: The researchers try to evaluate their new dataset (which they call Home-LCA) as well as their new ‘Robust-Grasp’ two-part GPN&NMN network architecture. First, they examine the test accuracy of their Robust-Grasp network trained on the Home-LCA dataset and applied to other home environments, as well as two datasets which have been collected in traditional lab settings (Lab-Baxter and Lab-LCA). The results here are very encouraging as their approach seems to generalize better to the lab datasets than other approaches, suggesting that the Home-LCA dataset is rich enough to create policies which can generalize somewhat.
  They also test their approach on deployed physical environments in unseen home environments (three novel AirBNBs). The results show that Home-LCA does substantially better than Lab-derived datasets, showing performance of around 60% accuracy, compared to between 20% and 30% for other approaches – convincing results.
  Why it matters: Most robotics research suffers from one of two things: 1) either the robot is being trained and tested entirely in simulation, so it’s hard to trust the results. 2) the robot is being evaluated on such a constricted task that it’s hard to get a sense for whether algorithmic progress leading to improved task performance will generalize to other tasks. This paper neatly deals with both of those problems by situating the task and robot in reality, collecting real data, and also evaluating generalization. It also provides further evidence that robot component costs are falling while network performance is improving sufficiently for academic researchers to conduct large-scale real world robotic trials and development, which will no doubt further accelerate progress in this domain.
  Read more: Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias (Arxiv).

Learning to navigate over a kilometer of paths, with generalization:
…Bonus: Dataset augmentation techniques and experimental methodology increase confidence in result…
QUT and DeepMind researchers have successfully trained a robot to learn to navigate over two kilometers of real-world paths connected up to one another by 2,099 distinct nodes. The approach shows that it’s possible to learn sufficiently robust policies in simulation to be subsequently transferred to the real world, and the researchers validate their system by testing it on real world data.
  The method: “We propose to train a graph-navigation agent on data obtained from a single coverage traversal of its operational environment, and deploy the learned policy in a continuous environment on the real robot,” the researchers write. They create a map of a given location, framed as a graph with points and connections between them, gathering 360-degree images from an omnidirectional camera to populate each point on the graph and, in addition, gathering the data lying between each point. “This amounts to approximately 30 extra viewpoints per discrete location, given our 15-Hz camera on a robot moving at 0.5 meters per second,” they write. They then use this data to augment the main navigation task. They also introduce techniques to randomize – in a disciplined manner – the brightness of gathered images, which lets them create more synthetic data and better defend against robots trained with the system overfitting to specific patterns of light. They then use curriculum learning to train a simulated agent using A3C to learn to navigate between successively farther apart points of the (simulated) graph. These agents themselves use image recognition systems pre-trained on the Places365 dataset and finetuned on the gathered data.
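  The “30 extra viewpoints” figure follows from simple arithmetic: a 15-Hz camera on a robot moving at 0.5 meters per second captures 30 frames per meter travelled. The sketch below spells this out; the roughly one meter spacing between graph nodes is my inference from those numbers, not something stated in the quote.

```python
def frames_per_meter(camera_hz, speed_m_per_s):
    """Frames captured for each meter the robot travels."""
    return camera_hz / speed_m_per_s


def viewpoints_between_nodes(camera_hz, speed_m_per_s, node_spacing_m):
    """Extra viewpoints gathered while driving between two adjacent nodes."""
    return frames_per_meter(camera_hz, speed_m_per_s) * node_spacing_m


if __name__ == "__main__":
    # Assumed node spacing of 1.0 m (hypothetical, chosen to match the quote).
    print(viewpoints_between_nodes(15, 0.5, node_spacing_m=1.0))
```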
  Results: The researchers test their system by deploying it on a real robot (a Pioneer 3DX) and ask it to navigate to specific areas of the campus. There are a couple of reasons to really like this evaluation approach: 1) they’re testing it in reality rather than a simulator, so the results are more trustworthy, and 2) they test on the real robot three weeks after collecting the initial data, allowing for significant intermediary changes in things like the angle of the sun at given times of day, the density of people, placement of furniture, and other things that typically confound robots. They test their system against an ‘oracle’ (aka, perfect) route, as well as what was learned during training in the simulator. The results show that their technique successfully generalizes to reality, navigating successfully to defined locations on ten out of eleven tries, but at a significant cost: on average, routes come up with in reality are on the order of 2.42X more complex than optimal routes.
  Why it matters: Robots are likely one of the huge markets that will be further expanded and influenced by continued development of AI technology. What this result indicates is that existing, basic algorithms (like A3C), combined with well-understood data collection techniques, are already sufficiently powerful to let us develop proof-of-concept robot demonstrations. The next stage will be learning to traverse far larger areas while reducing the ‘reality penalty’ seen here of selected routes not being as efficient as optimal ones.
  Read more: Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal (Arxiv).
  Watch videos: Deployable Navigation Policies.

Why better measurements can lead to better robot research:
…New tasks, best practices, and datasets to evaluate smart robot agents…
An interdisciplinary team of researchers from universities and companies has written about the many problems inherent to contemporary robotic agent research and have issued a set of recommendations (along with the release of some specific testing environments) meant to bring greater standardization to robotics research. This matters because standardization on certain tasks, benchmarks, and techniques has led to significant progress in other areas of AI research – standardization on ‘ImageNet’ helped generate enough research to show the viability of deep learning architectures for hard supervised learning problems, and more recently OpenAI’s ‘OpenAI Gym’ helped to standardize some of the experimental techniques for reinforcement learning research. But robotics has remained stubbornly idiosyncratic, even when researchers report results in simulators. “Convergence to common task definitions and evaluation protocols catalyzed dramatic progress in computer vision. We hope that the presented recommendations will contribute to similar momentum in navigation research,” the authors write.
  A taxonomy of task-types: Navigation tasks can be grouped into three basic categories: PointGoal (navigate to a specific location); ObjectGoal (navigate to an object of a specific category, eg a ‘refrigerator’); and AreaGoal (navigate to an area of a specific category, eg a kitchen). The first category requires coordinates while the latter two require the robot to assign labels to the world around it.
  Specific tasks can be further distinguished by analyzing the extent of the agent’s exposure to the test environment, prior to evaluation. These different levels of exposure can roughly be characterized as: No prior exploration; pre-recorded prior exploration (eg, supplied with a trajectory through the space); and time-limited exploration by the agent (explores for a certain distance before being tested on the evaluation task).
  Evaluation: Add in a ‘DONE’ signal which the agent emits when it completes an episode – this lets the agent characterize runs where it believes it has completed the task, giving scientists an additional bit of information to use when evaluating what the agent did to achieve that task. This differs from other methodologies which simply end the evaluation episode when the agent reaches the goal, which doesn’t require the agent to indicate that it knows it has finished the task.
  Avoid using Euclidean measurements to determine the proximity of the goal, as this might reward the agent for placing itself near the object despite being separated from it by a wall. Instead, scientists might consider measuring the shortest-path distance in the environment to the goal, and evaluating on that.
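  A small grid world shows why this matters. In the sketch below (the grid layout, cell coordinates, and function names are all made up for illustration), the agent stands two cells from the goal in a straight line but a wall forces an eight-step detour, so the Euclidean measure badly understates how far away the goal really is.

```python
import math
from collections import deque


def shortest_path_length(grid, start, goal):
    """Breadth-first search over 4-connected free cells; '#' marks a wall.

    Returns the number of steps, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))
    return None


def euclidean(a, b):
    """Straight-line distance between two (row, col) cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


GRID = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
]
AGENT, GOAL = (0, 1), (0, 3)

if __name__ == "__main__":
    print("euclidean:", euclidean(AGENT, GOAL))
    print("shortest path:", shortest_path_length(GRID, AGENT, GOAL))
```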
  Success weighted by (normalized inverse) Path Length (SPL): Assess performance by using the agent’s ‘DONE’ signal each test episode and path length, then calculate the average score for how close-to-optimal the agent’s paths were across all episodes (so, if an agent was successful on 8 runs out of ten, and each of the successful paths was 50% longer than the optimal path, its SPL for the full evaluation would be about 0.53). “Note that SPL is a rather stringent measure. When evaluation is conducted in reasonably complex environments that have not been seen before, we expect an SPL of 0.5 to be a good level of navigation performance,” the researchers explain.
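  As I understand the paper's definition, SPL averages, over all episodes, a success indicator multiplied by the ratio of the shortest-path length to the longer of the shortest path and the path the agent actually took. The sketch below implements that reading; the tuple layout and function name are my own, and path lengths are assumed to be positive.

```python
def spl(episodes):
    """Success weighted by (normalized inverse) Path Length.

    episodes: iterable of (success, shortest_path_length, agent_path_length).
    Failed episodes contribute 0; successful ones contribute
    shortest / max(agent, shortest). Assumes positive path lengths.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


if __name__ == "__main__":
    # 8 successes whose paths are 50% longer than optimal, plus 2 failures.
    runs = [(True, 10.0, 15.0)] * 8 + [(False, 10.0, 30.0)] * 2
    print(round(spl(runs), 3))
```

  Under this definition an agent that succeeds on every episode with optimal paths scores 1.0, and every failure or detour pulls the score down.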
  Simulators: Use continuous state spaces, as that better approximates the real world conditions agents will be deployed into. Also, simulators should standardize reporting distances as SI Units, eg “Distance 1 in a simulator should correspond to 1 meter”.
  Publish full details (and customizations) of simulators, and ideally release the code as open source. This will make it easier to replicate different simulated tasks with a high level of accuracy (whereas comparing robotics results on different physical setups tends to introduce a huge amount of noise, making disciplined measurement difficult). “This customizability comes with responsibility”, they note.
  Standard Scenarios: The authors have also published a set of “standard scenarios” by curating specific data and challenges from contemporary environment datasets SUNCG, AI2-THOR, Matterport3D, and Gibson. These tasks closely follow the recommendations made elsewhere in the report and, if adopted, will bring more standardization to robotic research.
  Read more: On Evaluation of Embodied Navigation Agents (Arxiv).
  Read more: Navigation Benchmark Scenarios (GitHub).
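  A rough sketch of the SPL arithmetic, assuming each episode scores (success) × (shortest-path length) / max(agent path length, shortest-path length) and that SPL is the mean over all episodes, which is how I read the paper's definition. The `Episode` class and its field names are my own invention for illustration, not anything from the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Episode:
    """One evaluation episode (hypothetical structure, for illustration)."""
    success: bool         # agent signalled DONE close enough to the goal
    shortest_path: float  # shortest-path distance from start to goal, in meters
    agent_path: float     # length of the path the agent actually took, in meters


def spl(episodes: Iterable[Episode]) -> float:
    """Average of success-weighted path efficiency across all episodes."""
    episodes = list(episodes)
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # Efficiency is capped at 1.0 when the agent takes the optimal path.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
        # Failed episodes contribute zero, however short the path was.
    return total / len(episodes)


# Eight successes with paths 50% longer than optimal, plus two failures.
episodes = (
    [Episode(success=True, shortest_path=10.0, agent_path=15.0)] * 8
    + [Episode(success=False, shortest_path=10.0, agent_path=30.0)] * 2
)
score = spl(episodes)
print(round(score, 3))  # 0.533
```

  Because failures score zero and detours are penalized, an agent has to be both reliable and efficient to push this number up, which is why the authors call the measure stringent.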

I can see what you’re saying – DeepMind and Google create record-breaking lip-reading system:
…Lipreading network has potential (discussed) applications for people with speech impairment and potential (undiscussed) applications for surveillance…
DeepMind and Google researchers have created a lipreading speech recognition system with a lower word error rate than professional human lipreaders, and which is able to use a far larger vocabulary (127,055 terms versus 17,428 terms) than other approaches. To develop this system they created a new speech recognition dataset consisting of 3,886 hours of video of faces saying particular phoneme sequences.
  How it works: The system relies on “Vision to Phoneme (V2P)”, a network trained to produce a sequence of phoneme distributions given a sequence of video frames. They also implement V2P-Sync, a model that verifies the audio and video channels are aligned (and therefore prevents the creation of bad data, which would lead to poor model performance). V2P uses a 3D convolutional model to extract features from a given video clip and aggregate them over time via a temporal module. They implement their system as a very large model which is trained in a distributed manner.
  Results: The researchers tested their approach on a held-out test-set containing 37 minutes of footage, across 63,000 video frames and 7,100 words. They found that their system significantly outperforms people. “This entire lipreading system results in an unprecedented WER of 40.9% as measured on a held-out set from our dataset,” they write. “In comparison, professional lipreaders achieve either 86.4% or 92.9% WER on the same dataset, depending on the amount of context given.”
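  For readers unfamiliar with the metric, here is a minimal sketch of word error rate (WER) under the standard definition: the word-level edit distance between the reference transcript and the system's output, divided by the number of reference words. The function name and example sentences are mine, and the paper's exact scoring pipeline (text normalization and so on) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(round(wer, 3))  # 0.167
```

  On this definition, a WER of 40.9% means roughly 41 word-level mistakes per 100 reference words, versus around 86 to 93 for the professional lipreaders.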
  Motivation: The researchers say the motivation for the work is to provide help for people with speech impairments. They don’t discuss the obvious surveillance implications of this research anywhere in the paper, which seems like a missed opportunity.
  Why it matters: This paper is another example of how, with deep learning techniques, if you can access enough data and compute then many problems become trivial – even ones that seem to require a lot of understanding and ‘human context’, like lipreading. Another implication here is that many tasks that we suspect are not that well suited to AI may in fact be more appropriate than we assume.
  Read more: Large-Scale Visual Speech Recognition (Arxiv).

Researchers fuse hyperparameter search with neural architecture search:
…Joint optimization lets them explore architectural choices and hyperparameters at the same time…
German researchers have shown how to jointly optimize the hyperparameters of a model while searching through different architectures. This takes one established thing within machine learning (finding the right combination of hyperparameters to maximize performance against cost) and combines it with a newer area that has received lots of recent interest (using reinforcement learning and other approaches to optimize the architecture of the neural network, as well as its hyperparameters). “We argue that most NAS search spaces can be written as hyperparameter optimization search spaces (using the standard concepts of categorical and conditional hyperparameters),” they write.
  Results: They test their approach by training a multi-branch ResNet architecture on CIFAR-10 while exploring a combination of ten architectural choices and seven hyperparameter choices. They limited training time to a maximum of three hours for each sampled configuration and performed 256 of these full-length runs (amounting to about 32 GPU days of constant training). They discover that the relationship between hyperparameters, architecture choices, and trained model performance is more subtle than anticipated, indicating that there’s value in training these jointly.
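  To make the “NAS as hyperparameter optimization” framing concrete, here is a toy sketch of a joint search space that mixes categorical architecture choices with training hyperparameters, including one conditional hyperparameter. Every name and range below is hypothetical, and plain random sampling stands in for whatever search method the authors actually used. The last function just checks the compute budget quoted above.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one joint architecture + hyperparameter configuration."""
    config = {
        # Architectural choices, expressed as ordinary categorical hyperparameters.
        "num_branches": rng.choice([1, 2, 3]),
        "blocks_per_stage": rng.choice([1, 2, 3, 4]),
        # Training hyperparameters.
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "optimizer": rng.choice(["sgd", "adam"]),
    }
    # Conditional hyperparameter: only exists when its parent takes a given value.
    if config["optimizer"] == "sgd":
        config["momentum"] = rng.uniform(0.0, 0.99)
    return config


def gpu_days(num_runs: int, hours_per_run: float) -> float:
    """Total compute in GPU days, assuming one GPU per run."""
    return num_runs * hours_per_run / 24


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(4)]
for config in configs:
    print(config)

print(gpu_days(256, 3))  # 32.0
```

  The point of the flat dictionary is that a single optimizer can treat “how many branches” and “what learning rate” as the same kind of decision, which is what lets the two searches happen jointly.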
  Why it matters: As computers get faster it’s going to be increasingly sensible to offload as much of the design and optimization of a given neural network architecture as possible to the computer – further development in fields of automatic model optimization will spur progress here.
  Read more: Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search (Arxiv).

NIH releases ‘DeepLesion’ dataset to aid medical researchers:
…Bonus: the data is available immediately online, no sign-up required…
The National Institutes of Health has released ‘DeepLesion’, a set of 32,000 CT images with annotated lesions, giving medical machine learning researchers a significant data resource to use to develop AI systems. The images are from 4,400 unique individuals and have been heavily annotated with bookmarks around the lesions.
  The NIH says it hopes researchers will use the dataset to help them “develop a universal lesion detector that will help radiologists find all types of lesions. It may open the possibility to serve as an initial screening tool and send its detection results to other specialist systems trained on certain types of lesions”.
  Why it matters: Data is critically important for many applied AI applications and, at least in the realm of medical data, simulating additional data is fraught with dangers, so the value of primary data taken from human sources is very high. Resources like those released by the NIH can help scientists experiment with more data and thereby further develop their AI techniques.
  Read more: NIH Clinical Center releases dataset of 32,000 CT images (NIH).
  Get the data here: NIH Clinical Center (via storage provider Box).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

AI leaders sign pledge opposing autonomous weapons:
The Future of Life Institute has issued a statement against lethal autonomous weapons (LAWs) which has been signed by 176 organizations and 2,515 individuals, including Elon Musk, Stuart Russell, Max Tegmark, and the cofounders of DeepMind.
  What’s wrong with LAWs: The letter says humans should never delegate the decision to use lethal force to machines, because these weapons remove the “risk, attributability, and difficulty” of taking lives, and that this makes them potentially destabilizing, and powerful tools of oppression.
  (Self-)Regulation: The international community does not yet possess the governance systems to prevent a dangerous arms race. The letter asks governments to create “strong international norms, regulations and laws against LAWs.” This seems deliberately timed ahead of the upcoming meeting of the UN CCW to discuss the issue. The signatories pledge to self-regulate, promising to “neither participate in nor support the development, manufacture, trade, or use of LAWs.”
  Why it matters: Whether a ban on these weapons is feasible, or desirable, remains unclear. Nonetheless, the increasing trend of AI practitioners mobilizing on ethical and political issues will have a significant influence on how AI will be developed. If efforts like this lead to substantive policy changes they could also serve as a useful model to study as researchers try to achieve political ends in other aspects of AI research and development.
  Read more: Lethal Autonomous Weapons Pledge (Future of Life Institute).

US military’s AI plans take shape:
The DoD has announced that they will be releasing a comprehensive AI strategy ‘within weeks’. This follows a number of piecemeal announcements, which have included the establishment earlier this month of the Joint Artificial Intelligence Center (JAIC), which will oversee all large AI programs in US defence and intelligence and forge partnerships with industry and academia.
  Why it matters: This is just the latest reminder that militaries already see the potential in AI, and are ramping up investment. Any AI arms race between countries carries substantial risks, particularly if parties prioritize pace of development over building safe, robust systems (see below). Whether or not the creation of a military AI strategy will prompt the US to finally release a broader national strategy remains to be seen.
  Read more: Pentagon to Publish Artificial Intelligence Strategy ‘Within Weeks’.
  Read more: DoD memo announcing formation of the JAIC.

Germany releases framework for their national AI strategy:
The German government has released a prelude to their national AI strategy, which will be announced at the end of November. (NB – the document has not been released in English, so I have relied on Google Translate)
  Broad ambitions: The government presents a long list of goals for their strategy. These include fostering a strong domestic industry and research sector, developing and promoting ethical standards and new regulatory frameworks, and encouraging uptake in other industries.
  Some specific proposals:
– A Data Ethics Committee to address the ethical and governance issues arising from AI.
– Multi-national research centers with France and other EU countries.
– The development of international organizations to manage labor displacement.
– Ensuring that Germany and Europe lead international efforts towards common technical standards.
– Public dialogue on the impacts of AI.
  Read more: Cornerstones of the German AI Strategy (German).

Solving the AI race:
GoodAI, a European AI research organization, held a competition for ideas on how to tackle the problems associated with races in AI development. Here follow summaries of two of the winning papers.
  A formal theory of AI coordination: This paper approaches the problem from an international relations perspective. The researchers use game theory to model 2-player AI races, where AI R&D is costly, the outcome of the race is uncertain, and players can either cooperate or defect. They consider four models the race could plausibly take, determined by the coordination regime in place, and suggest which models are the ‘safest’ in terms of players being incentivized against developing risky AI. They suggest policies to promote cooperation within different games, and to shift race dynamics into more safety-conducive set-ups.
  Solving the AI race: This paper gives a thorough overview of how race dynamics might emerge, between corporations as well as militaries, and spells out a comprehensive list of the negative consequences from such a situation. The paper presents three mitigation strategies with associated policy recommendations: (1) encouraging and enforcing cooperation between actors; (2) providing incentives for transparency and disclosure; (3) establishing AI regulation agencies.
  Why it matters: There are good reasons to be worried about race dynamics in AI. Competing parties could be incentivized to prioritize pace of development over safety, with potentially disastrous consequences. Equally, if advanced AI is developed in an adversarial context, this could make it less likely that its benefits are fairly distributed amongst humanity. More worryingly, it is hard to see how race dynamics can be avoided given the ‘size of the prize’ in developing advanced AI. Given this, researching strategies for managing races and enforcing cooperation should be a priority.
  Read more: General AI Challenge Winners (GoodAI).

OpenAI Bits & Pieces:

OpenAI Five Benchmark:
We’ve removed many of the restrictions on our 5v5 bots and will be playing a match in a couple of weeks. Check out the blog for details about the restrictions we’ve removed and the upcoming match.
  Read more: OpenAI Five Benchmark (OpenAI blog).

AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him:
Here’s a lengthy interview with Mike Cook, a games AI researcher, who gives some of his thoughts on OpenAI Five.
  Read more: AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him (Rock Paper Shotgun).

Tech Tales:

Drone Buyer

So believe it or not the first regulations came in because people liked the drones too much – these delivery companies started servicing areas and, just like in online games, there were always some properties in a given region that massively outspent others by several orders of magnitude. As in any other arena of life where these fountains of money crop up, workers would nickname the people in these properties ‘whales’. The whales did what they did best and spent. But problems emerged as companies continued expanding delivery hours and the whales continued to spend and spend – suddenly, an area that the companies’ machine learning algorithms had zoned for 50 deliveries a day (and squared away with planning officials) had an ultra-customer to contend with. And these customers would order things in the middle of the night. Beans would land on lawns at 3am. Sex toys at 4am. Box sets of obscure TV shows would plunk down at 6am. Breakfast burritos would whizz in at 11am. And so on. So the complaints started piling up and that led to some of the “anti-social drone” legislation, which is why most cities now specify delivery windows for suburban areas (and ignore the protests of the companies who point to their record-breakingly-quiet new drones, or other innovations).

Things that inspired this story: Drones, Amazon Prime, everything-as-a-service, online games.

Import AI: #103: Testing brain-like alternatives to backpropagation, why imagining goals can lead to better robots, and why navigating cities is a useful research avenue for AI

Backpropagation may not be brain-like, but at least it works:
…Researchers test more brain-like approaches to learning systems, discover that backpropagation is hard to beat…
Backpropagation is one of the fundamental tools of modern deep learning – it’s one of the key mechanisms for propagating and updating information through networks during training. Unfortunately, there’s relatively little evidence available that our own human brains perform a process analogous to backpropagation (this is a question Geoff Hinton has struggled with for several years in talks like ‘Can the brain do back-propagation?’). For some years this has concerned researchers, who worry that though we’re seeing significant gains from developing things based on backpropagation, we may need to investigate other approaches in the future. Now, researchers with Google Brain and the University of Toronto have performed an empirical analysis of a range of fundamental learning algorithms, testing approaches based on backpropagation against ones using target propagation and other variants.
  Motivation: The idea behind this research is that “there is a need for behavioural realism, in addition to physiological realism, when gathering evidence to assess the overall biological realism of a learning algorithm. Given that human beings are able to learn complex tasks that bear little relationship to their evolution, it would appear that the brain possesses a powerful, general-purpose learning algorithm for shaping behavior”.
  Results: The researchers “find that none of the tested algorithms are capable of effectively scaling up to training large networks on ImageNet”, though they record some success with MNIST and CIFAR. “Out-of-the-box application of this class of algorithms does not provide a straightforward solution to real data on even moderately large networks,” they write.
  Why it matters: Given that we know how limited and simplified our neural network systems are, it seems intellectually honest to test and ablate algorithms, particularly by comparing well-studied ‘mainstream’ approaches like backpropagation with more theoretically-grounded but less-developed algorithms from other parts of the literature.
  Read more: Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Arxiv).

AI and Silent Bugs:
…Half-decade old bug in ‘Aliens’ game found responsible for poor performance…
One of the more irritating things about developing AI systems is that when you mis-program AI it tends to fail silently – for instance, in OpenAI’s Dota project we saw performance dramatically increase simply after fixing non-breaking bugs. Another good example of this phenomenon has turned up in news about Aliens: Colonial Marines, a poorly reviewed half-decade-old game. But it turns out some of the reasons for those poor reviews were likely due to a bug – subsequent patches have found that the original game mis-named one variable, which led to entire chunks of the game’s enemy AI systems not functioning.
  Read more: A years-old, one-letter typo led to Aliens: Colonial Marines’ weird AI (Ars Technica).

Berkeley researchers teach machines to dream imaginary goals and solutions for better RL:
…If you want to change the world, first imagine yourself changing it…
Berkeley researchers have developed a way for machines to develop richer representations of the world around them and use this to solve tasks. The method they use to achieve this is a technique called ‘reinforcement learning with imagined goals’ (RIG). RIG works like this: an AI system interacts with an environment; data from these interactions is used to train (and finetune) a variational autoencoder (VAE) latent variable model; the researchers then use the representation learned by the VAE to train the AI system to solve different imagined tasks. This type of approach is becoming increasingly popular as AI researchers try to increase the capabilities of algorithms by getting them to use and learn from more data.
  Results: Their approach does well at tasks requiring reaching objects and pushing objects to a goal, beating baselines including algorithms like Hindsight Experience Replay (HER).
  Why it matters: After spending several years training algorithms to master an environment, we’re now trying to train algorithms that can represent their environment, then use that representation as an input to the algorithm to help it solve a new task. This is part of a general push toward greater representative capacity within trained models.
  Read more: Visual Reinforcement Learning with Imagined Goals (Arxiv).

Facebook thinks the path to smarter AI involves guiding other AIs through cities:
…’Talk The Walk’ task challenges AIs to navigate each other through cities, working as a team…
Have you ever tried giving directions to someone over the phone? It can be quite difficult, and usually involves a series of dialogues between you and the person as you try to figure out where in the city they are in relation to where they need to get to. Now, researchers with Facebook and the Montreal Institute for Learning Algorithms (MILA) have set out to develop and test AIs that can solve this task, so as to further improve the generalization capabilities of AI agents. “For artificial agents to solve this challenging problem, some fundamental architecture designs are missing,” the researchers say.
  The challenge: The new “Talk The Walk” task frames the problem as a discussion between a ‘guide’ and a ‘tourist’ agent. The guide agent has access to a map of the city area that the tourist is in, as well as a location the tourist wants to get to, and the tourist has access to an annotated image of their current location along with the ability to turn left, turn right, or move forward.
  The dataset: The researchers created the testing environment by obtaining 360-degree photographic views of neighborhoods in New York City, including Hell’s Kitchen, the East Village, Williamsburg, the Financial District, and the Upper East Side. They then annotated each image of each corner of each street intersection with a set of landmarks drawn from the following categories: bar, bank, shop, coffee shop, theater, playfield, hotel, subway, and restaurant. They then had more than six hundred users of Mechanical Turk play a human version of the game, generating 10,000 successful dialogues from which AI systems can be trained (with over 2,000 successful dialogues available for each neighborhood of New York the researchers gathered data for).
  Results: The researchers tested how well their systems can localize themselves – that is, develop a notion of where they are in the city. The results are encouraging, with localization models developed by the researchers achieving a higher localization score than humans. (Though humans take about half the number of steps to effectively localize themselves, showing that human sample efficiency remains substantially better than that of machines.)
  Why it matters: Following a half decade of successful development and commercialization of basic AI capabilities like image and audio processing, researchers are trying to come up with the next major tasks and datasets they can use to test contemporary research algorithms and develop them further. Evaluation methods like those devised here can help us develop AI systems which need to interact with larger amounts of real world data, potentially making it easier to evaluate how ‘intelligent’ these systems are becoming, as they are being tested directly on problems that humans solve every day and have good intuitions and evidence about the difficulty of. Though it’s worth noting that the current version of the task as solved by Facebook is fairly limited, as it involves a setting with simple intersections (predominantly just four-way straight-road intersections), and the agents aren’t being tested on very large areas, nor are they required to navigate particularly long distances.
  Read more: Talk the Walk: Navigating New York City through Grounded Dialogue (Arxiv).

Microsoft calls for government-led regulation of artificial intelligence technology:
…Company’s chief legal officer Brad Smith says government should study and regulate the technology…
Microsoft says the US government should appoint an independent commission to investigate the uses and applications of facial recognition technology. Microsoft says it is calling for this because it thinks the technology is of such utility and generality that it’s better for the government to think about regulation in a general sense than for specific companies like Microsoft to think through questions on their own. The recommendation follows a series of increasingly fraught run-ins between the government, civil rights groups, and companies regarding the use of AI: first, Google dealt with employees protesting its ‘Maven’ AI deal with the DoD, then Amazon came under fire from the ACLU for selling law enforcement authorities facial recognition systems based on its ‘Rekognition’ API.
  Specific questions: Some of the specific question areas Smith thinks the government should spend time on include: should law enforcement use of facial recognition be subject to human oversight and control? Is it possible to ensure civilian oversight of this technology? Should retailers post a sign indicating that facial recognition systems are being used in conjunction with surveillance infrastructure?
  Why it matters: Governments will likely be the largest users of AI-based systems for surveillance, facial recognition, and more – but in many countries the government needs the private sector to develop and sell it products with these capabilities, which requires a private sector that is keen to help the government. If that’s not the case, then it puts the government into an awkward position. Government can clarify some of these relationships in specific areas by, as Microsoft suggests here, appointing an external panel of experts to study an issue and make recommendations.
  A “don’t get too excited” interpretation: Another motivation a company like Microsoft might have for calling for such analysis and regulation is that large companies like Microsoft have the resources to be able to ensure compliance with any such regulations, whereas startups can find this challenging.
  Read more: Facial recognition technology: The need for public regulation and corporate responsibility (Microsoft).

Google opens a Seedbank for wannabe AI gardeners:
…Seedbank provides access to a dynamic, online code encyclopedia for AI systems…
Google has launched Seedbank, a living encyclopedia about AI programming and research. Seedbank is a website that contains a collection of machine learning examples which can be interacted with via a live programming interface in Google ‘colab’. You can browse ‘seeds’ which are major AI topic areas like ‘Recurrent Nets’ or ‘Text & Language’, then click into them for specific examples; for instance, when browsing ‘Recurrent Nets’ you can learn about Neural Translation with Attention and can open a live notebook to walk you through the steps involved in creating a language translation system.
  “For now we are only tracking notebooks published by Google, though we may index user-created content in the future. We will do our best to update Seedbank regularly, though also be sure to check for new content,” writes Michael Tyka in a blog post announcing Seedbank.
  Why it matters: AI research and development is heavily based around repeated cycles of empirical experimentation, so being able to interact with and tweak live programming examples of applied AI systems is a good way to develop better intuitions about the technology.
  Read more: Seedbank – discover machine learning examples (TensorFlow Medium blog).
  Read more: Seedbank official website.

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

Cross-border collaboration, openness, and dual-use:
…A new report urges better oversight of international partnerships on AI, to ensure that collaborations are not being exploited for military uses…
The Australian Strategic Policy Institute has published a report by Elsa Kania outlining some of the dual-use challenges inherent to today’s scalable, generic AI techniques.
  Dual-use as a strategy: China’s military-civil fusion strategy relies on using the dual-use characteristics of AI to ensure new civil developments can be applied in the military domain, and vice versa. There are many cases of private labs and universities working on military tech, e.g. the collaboration between Baidu and CETC (state-owned defence conglomerate). This blurring of the line between state/military and civilian research introduces a complication into partnerships between (e.g.) US companies and their Chinese counterparts.
  Policy recommendations: Organizations should assess the risks and possible externalities from existing partnerships in strategic technologies, establish systems of best practice for partnerships, and monitor individuals and organizations with clear links to foreign governments and militaries.
  Why this matters: Collaboration and openness are key drivers of innovation in science. In the case of AI, international cooperation will be critical in ensuring that we manage the risks and realize the opportunities of this technology. Nevertheless, it seems wise to develop systems to ensure that collaboration is done responsibly and with an awareness of risks.
  Read more: Technological entanglement.

Around the world in 23 AI strategies:
Tim Dutton has summarized the various national AI strategies governments have put forward in the past two years.
– AlphaGo really was a Sputnik moment in Asia. Two days after AlphaGo defeated Lee Sedol in 2016, South Korea’s president announced ₩1 trillion ($880m) in funding for AI research, adding “Korean society is ironically lucky, that thanks to the ‘AlphaGo shock’, we have learned the importance of AI before it is too late.”
– Canada’s strategy is the most heavily focused on investing in AI research and talent. Unlike other countries, their plan doesn’t include the usual policies on strategic industries, workforce development, and privacy issues.
– India is unique in putting social goals at the forefront of their strategy, and focusing on the sectors which would see the biggest social benefits from AI applications. Their ambition is to then scale these solutions to other developing countries.
  Why this matters: 2018 has seen a surge of countries putting forward national AI strategies, and this looks set to continue. The range of approaches is striking, even between fairly similar countries, and it will be interesting to see how these compare as they are refined and implemented in the coming years. The US is notably absent in terms of having a national strategy.
  Read more: Overview of National AI Strategies.

Risks and regulation in medical AI:
Healthcare is an area where cutting-edge AI tools such as deep learning are already having a real positive impact. There is some tension, though, between the cultures of “do no harm”, and “move fast and break things.”
  We are at a tipping point: We have reached a ‘tipping point’ in medical AI, with systems already on the market that are making decisions about patients’ treatment. This is not worrying in itself, provided these systems are safe. What is worrying is that there are already examples of autonomous systems making potentially dangerous mistakes. The UK is using an AI-powered triage app, which recommends whether patients should go to hospital based on their symptoms. Doctors have noticed serious flaws, with the app appearing to recommend staying at home for classic symptoms of heart attacks, meningitis and strokes.
  Regulation is slow to adapt: Regulatory bodies are not taking seriously the specific risks from autonomous decision-making in medicine. By treating these systems like medical devices, they are allowing them to be used on patients without a thorough assessment of their risks and benefits. Regulators need to move fast, yet give proper oversight to these technologies.
  Why this matters: Improving healthcare is one of the most exciting, and potentially transformative applications of AI. Nonetheless, it is critical that the deployment of AI in healthcare is done responsibly, using the established mechanisms for testing and regulating new medical treatments. Serious accidents can prompt powerful public backlashes against technologies (e.g. nuclear phase-outs in Japan and Europe post-Fukushima). If we are optimistic about the potential healthcare applications of AI, ensuring that this technology is developed and applied safely is critical in ensuring that these benefits can be realized.
  Read more: Medical AI Safety: We have a problem.

OpenAI & ImportAI Bits & Pieces:

Better generative models with Glow:
We’ve released Glow, a generative model that uses a 1×1 reversible convolution to give it a richer representative capacity. Check out the online visualization tool to experiment with a pre-trained Glow model yourself, applying it to images you can upload.
  Read more: Glow: Better Reversible Generative Models (OpenAI Blog).

AI, misuse, and DensePose:
IEEE Spectrum has written up some comments from here in Import AI about Facebook’s ‘DensePose’ system and the challenges it presents for how AI systems can potentially be misused and abused. As I’ve said in a few forums, I think the AI community isn’t really working hard on this problem and is creating unnecessary problems (see also: voice cloning via Lyrebird, faking politicians via ‘Deep Video Portraits’, surveilling crowds with drones, etc).
  Read more: Facebook’s DensePose Tech Raises Concerns About Potential Misuse (IEEE Spectrum).

Tech Tales:

Ad Agency Contracts for a Superintelligence:

Subject: Seeking agency for AI Superintelligence contract.
Creative Brief: Company REDACTED has successfully created the first “AI Superintelligence” and is planning a global, multi-channel, PR campaign to introduce the “AI Superintelligence” (henceforth known as ‘the AI’) to a global audience. We’re looking for pitches from experienced agencies with unconventional ideas in how to tell this story. This will become the most well known media campaign in history.

We’re looking for agencies that can help us create brand awareness equivalent to other major events, such as: the second coming of Jesus Christ, the industrial revolution, the declaration of World War 1 and World War 2, the announcement of the Hiroshima bomb, and more.

Re: Subject: Seeking agency for AI Superintelligence contract.
Three words: Global. Cancer. Cure. Let’s start using the AI to cure cancer around the world. We’ll originally present these cures as random miracles and over the course of several weeks will build narrative momentum and impetus until ‘the big reveal’. Alongside revealing the AI we’ll also release a fully timetabled plan for a global rollout of cures for all cancers for all people. We’re confident this will establish the AI as a positive force for humanity while creating the requisite excitement and ‘curiosity gap’ necessary for a good launch.

Re: Subject: Seeking agency for AI Superintelligence contract.
Vote Everything. Here’s how it works: We’ll start an online poll asking people to vote on a simple question of global import, like, which would you rather do: Make all aeroplanes ten percent more fuel efficient, or reduce methane emissions by all cattle? We’ll make the AI fulfill the winning vote. If we do enough of these polls in enough areas then people will start to correlate the results of the polls with larger changes in the world. As this happens, online media will start to speculate more about the AI system in question. We’ll be able to use this interest to drive attention to further polls to have it do further things. The final vote before we reveal it will be asking people what date they want to find out who is ‘the force behind the polls’.

Re: Subject: Seeking agency for AI Superintelligence contract.
Destroy Pluto. Stay with us. Destroy Pluto AND use the mass of Pluto to construct a set of space stations, solar panels, and water extractors throughout the solar system. We can use the AI to develop new propulsion methods and materials which can then be used to create an expedition to destroy the planet. Initially it will be noticed by astronomers. We expect early media narratives to assume that Pluto has been destroyed by aliens who will then harvest the planet and use it to build strange machines to bring havoc to the solar system. Shortly before martial law is declared we can make an announcement via the UN that we used the intelligence to destroy Pluto, at which point every person on Earth will be given a ‘space bond’ which entitles them to a percentage of future earnings of the space-based infrastructure developed by the AI.

Things that inspired this story: Advertising agencies, the somewhat un-discussed question of “what do we do if superintelligence arrives”, historical moments of great significance.

Import AI: #102: Testing AI robustness with IMAGENET-C, military-civil AI development in China, and how teamwork lets AI beat humans

Microsoft opens up search engine data:
…New searchable archive simplifies data finding for scientists…
Microsoft has released Microsoft Research Open Data, a new web portal that people can use to comb through the vast amounts of data released in recent years by Microsoft Research. The data has also been integrated with Microsoft’s cloud services, so researchers can easily port the data over to an ‘Azure Data Science virtual machine’ and start manipulating it with pre-integrated data science software.
  Data highlights: Microsoft has released some rare and potentially valuable datasets, like 10GB worth of ‘Dual Word Embeddings Trained on Bing Queries’ (data from live search engines tends to be very rare), along with original research-oriented datasets like FigureQA, and a bunch of specially written mad libs.
  Read more: Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud (Microsoft Research Blog).
Browse the data: Microsoft Research Open Data.

What does military<>civil fusion look like, and why is China so different from America?
…Publication from Tsinghua VP highlights difference in technology development strategies…
What happens when you have a national artificial intelligence strategy that revolves around developing military and civil AI applications together? A recent (translated) publication by You Zheng, vice president of China’s Tsinghua University, provides some insight.
  Highlights: Tsinghua is currently constructing the ‘High-End Laboratory for Military Intelligence’, which will focus on developing AI to better support China’s country-level goals. As part of this, Tsinghua will invest in basic research guided by some military requirements. The university has also created the ‘Tsinghua Brain and Intelligence Laboratory’ to encourage interdisciplinary research which is less connected to direct military applications. Tsinghua also has a decade-long partnership with Chinese social network WeChat and search engine Sogou, carrying out joint development within the civil domain. And it’s not focusing purely on technology – the school recently created a ‘Computational Legal Studies’ masters program “to integrate the school’s AI and liberal arts so as to try a brand-new specialty direction for the subject.”
  Why it matters: Many governments are currently considering how to develop AI to further support their strategic goals – many countries in the West are doing this by relying on a combination of classified research, public contracts from development organizations like DARPA, and partnerships with the private sector. But the dynamics of the free market and tendency in these countries to have relatively little direct technology development and research via the state (when compared to the amounts expended by the private sector) has led to uneven development, with civil applications leaping ahead of military ones in terms of capability and impact. China’s gamble is that a state-led development strategy can let it better take advantage of various AI capabilities to more rapidly integrate AI into its society – both civil and military. The outcome of this gamble will be a determiner of the power balance of the 21st century.
  Read more: Tsinghua’s Approach to Military-Civil Fusion in Artificial Intelligence (Battlefield Singularity).

DeepMind bots learn to beat humans at Capture the Flag:
…Another major step forward for team-based AI work…
Researchers with DeepMind have trained AIs that are competitive with humans in a first-person multiplayer game. The result shows that it’s possible to train teams of agents to collaborate with each other to achieve an objective against another team (in this case, Capture the Flag played from the first person perspective within a modified version of the game Quake 3), and follows other recent work from OpenAI on the online team-based multiplayer game Dota, as well as work by DeepMind, Facebook, and others on StarCraft 1 and StarCraft 2.
  The technique relies on a few recently developed approaches, including multi-timescale adaptation, an external memory module, and having agents evolve their own internal reward signals. DeepMind combines these techniques with a multi-agent training infrastructure which uses its recently developed ‘population-based training’ technique. One of the most encouraging results is that trained agents can generalize to never-before-seen maps and typically beat humans when playing under these conditions.
  Additionally, the system lets them train very strong agents: “we probed the exploitability of the agent by allowing a team of two professional games testers with full communication to play continuously against a fixed pair of agents. Even after twelve hours of practice the human game testers were only able to win 25% (6.3% draw rate) of games against the agent team”, the researchers write, though humans were able to beat the AIs when playing on pre-defined maps by slowly learning to exploit weaknesses in the AI. Agents were trained on ~450,000 separate games.
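  To make the population-based training idea mentioned above a little more concrete, here is a minimal sketch of its exploit-and-explore loop in plain Python. This is a generic illustration, not DeepMind's system: the toy train_step, the single "lr" hyperparameter, the 0.8/1.2 perturbation factors, and the 25% cutoff are all assumptions. The point is simply that a population trains in parallel, and every so often the weakest members copy a stronger member's state and then perturb the copied hyperparameters.

```python
import random

def train_step(worker):
    """Toy stand-in for training: gains are larger when lr is near 0.1 (arbitrary)."""
    worker["score"] += 1.0 / (1.0 + 10.0 * abs(worker["lr"] - 0.1))

def exploit_and_explore(population, rng, fraction=0.25):
    """Bottom performers copy a top performer, then perturb its hyperparameter."""
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for loser in bottom:
        winner = rng.choice(top)
        loser["score"] = winner["score"]                     # stands in for copying weights
        loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore around the copy

def run_pbt(size=8, rounds=20, seed=0):
    """Train a small population, interleaving training with exploit/explore."""
    rng = random.Random(seed)
    population = [{"lr": rng.uniform(0.001, 1.0), "score": 0.0} for _ in range(size)]
    for _ in range(rounds):
        for worker in population:
            train_step(worker)
        exploit_and_explore(population, rng)
    return population

population = run_pbt()
best = max(population, key=lambda w: w["score"])
```

  In the work described here, the same kind of loop is applied to things like the agents' internal reward signals rather than a single learning rate, and the "score" comes from playing games rather than a toy formula.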
  Why it matters: This result, combined with work by others on tasks like Dota 2, shows that it’s possible to use today’s existing AI techniques, combined with large-scale training, to create systems capable of beating talented humans at complex tasks that require teamwork and planning over lengthy timescales. Because of the recent pace of AI progress these results can seem weirdly unremarkable, but I think that perspective would be wrong: it is remarkable we can develop agents capable of beating people at tasks requiring ‘teamwork’ – a trait that seems to require many of the cognitive tools we think are special, but which is now being shown to be achievable via relatively simple algorithms. As some have observed, one of the more counter-intuitive aspects of these results is how easily it seems that ‘teamwork’ can be learned.
  Less discussed: I think we’re entering the ‘uncanny valley’ of AI research when it comes to developing things with military applications. This ‘capture the flag’ demonstration, along with parallel work by OpenAI on Dota and by others on StarCraft, has a more militaristic flavor than prior research by the AI community. My suspicion is we’ll need to start thinking more carefully about how we contextualize results like this and work harder at analyzing which other actors may be inspired by research like this.
Read more: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning (Arxiv).
  Watch extracts of the agent’s behavior here (YouTube).

Discover the hidden power of Jupyter at JupyterCon.
2017: 1.2 million Jupyter notebooks on GitHub.
2018: 3 million, when JupyterCon starts in New York this August.
– This is just one sign of the incredible pace of discovery, as organizations use notebooks and use recent platform developments to solve difficult data problems such as scalability, reproducible science, and compliance, data privacy, ethics, and security issues. JupyterCon: It’s happening Aug 21-25.
– Save 20% on most passes with the code IMPORTAI20.

Ever wanted to track the progress of language modelling AI in minute detail? Now is your chance!
…Mapping progress in a tricky-to-model domain…
How fast is the rate of progression in natural language processing technologies, and where does that progression fit into the overall development of the AI landscape? That’s a question that natural language processing researcher Seb Ruder has tried to answer with a new project oriented around tracking the rate of technical progress on various NLP tasks. Check out the project’s GitHub page and try to contribute if you can.
  Highlights: The GitHub repository already contains more than 20 tasks, and we can get an impression of recent AI progress by examining the results. Tasks like language modeling have seen significant progress in recent years, while tasks like constituency parsing and part-of-speech tagging have seen less profound progress (potentially because existing systems are quite good at these tasks).
  Read more: Tracking the Progress in Natural Language Processing (Sebastian Ruder’s website).
  Read more: Tracking Progress in Natural Language Processing (GitHub).

Facebook acquires language AI company Bloomsbury AI:
…London-based acquihire adds language modeling talent…
Facebook has acquired the team from Bloomsbury AI, who will join the company in London and work on natural language processing research. Bloomsbury had previously built systems for examining corpora of text and answering questions about them, and has an experienced AI engineering and research team that includes Dr Sebastian Riedel, a professor at UCL (acquiring companies with professors tends to be a strategic move as it can help with recruiting).
  Read more: Bloomsbury AI website (Bloomsbury AI).
  Read more: I’d link to the ‘Facebook Academics’ announcement if Facebook didn’t make it so insanely hard to get direct URLs to link to within its giant blue expanse.

What is in Version 2, makes the world move, and just got better?
…Robot Operating System 2: Bouncy Bolson…
The ‘Bouncy Bolson’ version of ROS 2 (Robot Operating System) has been released. New features for the open source robot software include better security features, support for 3rd party package submission on the ROS 2 build farm, new command line tools, and more. This is the second non-beta ROS 2 release.
  Read more: ROS 2 Bouncy Bolson Released.

Think deep learning is robust? Try out IMAGENET-C and think again:
…New evaluation dataset shows poor robustness of existing models…
Researchers with Oregon State University have created new datasets and evaluation criteria to see how well trained image recognition systems deal with corrupted data. The research highlights the relatively poor representation and generalisation of today’s algorithms, while providing challenging datasets people may wish to test systems against in the future. To conduct their tests, the researchers create two datasets to evaluate how AI systems deal with these changes. IMAGENET-C is a dataset to test for “corruption robustness” and ICONS-50 is for testing for “surface variation robustness”.
  IMAGENET-C sees them apply 15 different types of data corruption to existing images, ranging from blurring images and adding noise to adding the visual hallmarks of environmental effects like snow, frost, fog, and so on. ICONS-50 consists of 10,000 images from 50 classes of icons of different things like people, food, activities, logos, and so on, and each class contains multiple different illustrative styles.
  Results: To test how well algorithms deal with these visual corruptions the researchers test pre-trained image categorization models against different versions of IMAGENET-C (where a version roughly corresponds to the amount of corruption applied to a specific image), then compute the error rate. The results of the test are that more modern architectures have become better at generalizing to new datatypes (like corrupted images), but that robustness – which means how well a model adapts to changes in data – has barely risen. “Relative robustness remains near AlexNet-levels and therefore below human-level, which shows that our superhuman classifiers are decidedly subhuman,” they write. They do find that there are a few tricks that can be used to increase the capabilities of models to deal with corrupted data: “more layers, more connections, and more capacity allow these massive models to operate more stably on corrupted inputs,” they write.
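  The evaluation recipe described above (corrupt the inputs at several severity levels, re-run a fixed classifier, record the error rate) can be sketched in a few lines of plain Python. Everything here is a toy assumption rather than the researchers' code: the "images" are flat lists of pixel values, the classifier is a brightness threshold, the only corruption is Gaussian noise, and the sigma-per-severity scale is made up. The last helper divides a model's summed errors by a baseline model's summed errors, which is the kind of normalization that lets results be compared against an AlexNet-style reference.

```python
import random

def make_toy_dataset(per_class=100, pixels=16):
    """Hypothetical data: dark images (label 0) and bright images (label 1)."""
    dark = [([0.4] * pixels, 0) for _ in range(per_class)]
    bright = [([0.6] * pixels, 1) for _ in range(per_class)]
    return dark + bright

def add_noise(image, severity, rng):
    """Gaussian noise whose strength grows with severity (scale is an assumption)."""
    sigma = 0.1 * severity
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]

def classify(image):
    """Toy stand-in for a pre-trained model: threshold on mean brightness."""
    return 1 if sum(image) / len(image) > 0.5 else 0

def error_rate(dataset, severity, rng):
    """Fraction of misclassified images; severity 0 means uncorrupted."""
    wrong = 0
    for image, label in dataset:
        corrupted = image if severity == 0 else add_noise(image, severity, rng)
        wrong += classify(corrupted) != label
    return wrong / len(dataset)

def errors_by_severity(dataset, severities=(1, 2, 3, 4, 5), seed=0):
    """Error rate of the fixed classifier at each corruption severity."""
    rng = random.Random(seed)
    return {s: error_rate(dataset, s, rng) for s in severities}

def normalized_corruption_error(model_errors, baseline_errors):
    """Summed model error divided by summed baseline error for one corruption."""
    baseline_total = sum(baseline_errors.values())
    if baseline_total == 0:
        raise ValueError("baseline makes no errors, so the ratio is undefined")
    return sum(model_errors.values()) / baseline_total

dataset = make_toy_dataset()
errors = errors_by_severity(dataset)
```

  A real run would swap in an actual pre-trained network, the full set of corruption types, and a real baseline's error rates, but the loop structure stays the same.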
  For ICONS-50 they try to test classifier robustness by removing the icons from one source (eg Microsoft) or by selectively removing subtypes (like ‘ducks’) from broad categories (like ‘birds’). Their results are somewhat unsurprising: networks are not able to learn enough general features to effectively identify held-out visual styles, and similarly poor performance is displayed when tested on held-out sub-types.
  Why it matters: As we currently lack much in the way of theory to explain and analyze the successes of deep learning we need to broaden our understanding of the technology through empirical experimentation, like what is carried out here. And what we keep on learning is that, despite incredible gains in performance in recent years, deep nets themselves seem to be fairly inflexible when dealing with unseen or out-of-distribution data.
  Read more: Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

Technology Roulette:
Richard Danzig, former secretary of the Navy, has written a report for thinktank the Center for a New American Security, on the risks arising from militaries pursuing technological superiority.
  Superiority does not imply security: Creating a powerful, complex technology creates a new class of risks (e.g. nuclear weapons, computers). Moreover, pursuing technological superiority, particularly in a military context, is not a guarantee of safety. While superiority might decrease the risk of attack, through deterrence, it raises the risk of a loss of control, through accidents, misuse, or sabotage. These risks are made worse by the unavoidable proliferation of new technologies, which will place “great destructive power” in the hands of actors without the willingness or ability to take proper safety precautions.
  Human-in-the-loop: A widely held view amongst the security establishment is that these risks can be addressed by retaining human input in critical decision-making. Danzig counters this, arguing that human intervention is “too weak, and too frequently counter-productive” to control military systems that rely on speed. And AI decision-making is getting faster, whereas humans are not, so this gap will only widen over time. Efforts to control such systems must be undertaken at the time of design, rather than during operation.
  What to do: The report makes 5 recommendations for US military/intelligence agencies:
– Increase focus on risks of accidents and emergent effects.
– Give priority to reducing risks of proliferation, adversarial behavior, accidents and emergent behaviors.
– Regularly assess these risks, and encourage allies and opponents to do so.
– Increase multilateral planning with allies and opponents, to be able to recognize and respond to accidents, major terrorist events, and unintended conflicts.
– Use new technologies as a means of encouraging and verifying norms and treaties.
  Why this matters: It seems inevitable that militaries will see AI as a means of achieving strategic advantage. This report sheds light on the risks that such a dynamic could pose to humanity if parties do not prioritize safety, and do not cooperate on minimizing risks from loss of control. One hopes that these arguments are taken seriously by the national security community in the US and elsewhere.
  Read more: Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority (CNAS).

UK government responds to Lords AI Report:
The UK government has responded to the recommendations made in the House of Lords’ AI report, released in April. For the most part, the government accepts the committee’s recommendations and is taking actions to help with specific elements of the recommendations:
– On public perceptions of AI, the government will work to build trust and confidence in AI through AI institutions like the Centre for Data Ethics and Innovation (CDEI), which will pursue extensive engagement with the public, industry and regulators, and will align governance measures with the concerns of the public and businesses.
– On algorithmic transparency, the government pushes back against the report’s recommendation that deployed AI systems have a very high level of transparency/explainability. They note that excessive demands for algorithmic transparency in deployed algorithms could hinder development, particularly in deep learning, and must therefore be weighed against the benefits of the technologies.
– On data monopolies the government will strengthen the capabilities of the UK’s competition board to monitor anti-competitive practices in data and AI, so it can better analyze and respond to the potential for the monopolisation of data by tech giants.
– On autonomous weapons the report asked that the UK improves its definition of autonomous weapons, and brings it into line with that of other governments and international bodies. The government defines an autonomous system as one that “is capable of understanding higher-level intent and direction”, which the report argued “sets the bar so high that it was effectively meaningless.” The gov’t said they have no plans to change their definition.
  Why this matters: The response is not a game-changer, but it is worth reflecting on the way in which the UK has been developing their AI strategy, particularly in comparison with the US (see below). While the UK’s AI strategy can certainly be criticized, the first stage of information-gathering and basic policy recommendations has proceeded commendably. The Lords AI Report and the Hall-Pesenti Review were both detailed investigations, drawing on an array of expert opinions, and asking informed questions. Whether this methodology produces good policy remains to be seen, and depends on a number of contingencies.
  Read more: Government response to House of Lords AI Report.

Civil liberties group urges US to include public in AI policy development, consider risks:
Civil liberties group EPIC has organized a petition, with a long list of signatories from academia and industry, to the US Office of Science and Technology Policy (OSTP). Their letter is critical of the US government’s progress on AI policy, and the way in which the government is approaching issues surrounding AI.
  Public engagement in policymaking: The letter asks for more meaningful public participation in the development of US AI policy. The signatories take issue with the recent Summit on AI being closed to the public, and the proposal for a Select Committee on AI identifying only the private sector as a source of advice. This contrasts with other countries, including France, Canada and the UK, all of whom have made efforts to engage public opinion on AI.
  Ignoring the big issues: More importantly, the letter identifies a number of critical issues that they say the government is failing to address:
– Potential harms arising from the use of AI.
– Legal frameworks governing AI.
– Transparency in the use of AI by companies, government.
– Technical measures to promote the benefits of AI and minimize the risks.
– The experiences of other countries in trying to address challenges of AI.
– Future trends in AI that could inform the current discussion.
Why this matters: The US is conspicuous amongst global powers for not having a coordinated AI strategy. Other countries are quickly developing plans not only to support their domestic AI capabilities, but to deal with the transformative change that AI will have. The issues raised by the letter cover much of the landscape governments need to address. There is much to be criticized about existing AI strategies, but it’s hard to see the benefits of the US’ complacency.
   Read more: Letter to Michael Kratsios.

OpenAI Bits & Pieces:

Exploring with demonstrations:
New research from OpenAI shows how to obtain a state-of-the-art score on notoriously hard exploration game Montezuma’s Revenge by using a single demonstration.
   Read more: Learning Montezuma’s Revenge from a Single Demonstration (OpenAI blog).

Tech Tales:

When we started tracking it, we knew that it could repair itself and could go and manipulate the world. But there was no indication that it could multiply. For this we were grateful. We were hand-picked from several governments and global corporations and tasked with a simple objective: determine the source of the Rogue Computation and how it transmits its damaging actions to the world.

How do you find what doesn’t want to be found? Look for where it interacts with the world. We set up hundreds of surveillance operations to monitor the telecommunications infrastructure, internet cafes, and office buildings back to which we had traced viruses that bore the hallmarks of Rogue Computation. One day we identified some humans who appeared to be helping the machine, linking a code upload to a person who had gone into the building a few minutes earlier holding a USB key. In that moment we stopped being metal-hunters and became people-hunters.

Adapt, our superiors told us. Survey and deliver requested analysis. So we surveilled the people. We mounted numerous expeditions, tracking people back from the internet cafes where they had uploaded Rogue Computation Products, and following them into the backcountry behind the megacity expanse – a dismal set of areas that, from space, looks like the serrated ridges caused in the wake of the passage of a boat. These areas were forested; polluted with illegal e-waste and chem-waste dumps; home to populations of the homeless and those displaced by the cold logic of economics; full of discarded home robots and bionic attachments; and everywhere studded with the rusting metal shapes of crashed or malfunctioned or abandoned drones. When we followed these people into these areas we found them parking cars at the heads of former hiking trails, then making their way deeper into the wilderness.

After four weeks of following them we had our first confirmed sighting of the Suspected Rogue Computation Originator: it was a USB inlet, which dangled out of a drainage pipe embedded in the side of a brown, forested hillside. Some of us shivered when we saw a human approach the inlet and, like an ancient peasant paying tribute to a magician, extend a USB key and plug it into the inlet, then back away with their palms held up toward the inlet. A small blue light in the USB inlet went on. Then the inlet, now containing a USB key, began to withdraw backward into the drainage pipe, pulled from within.

Then things were hard for a while. We tracked more people. Watched more exchanges. Observed over twenty different events which led to Rogue Computation Products being delivered to the world. But our superiors wouldn’t let us interfere, afraid that, after so many years searching, they might spook their inhuman prey at the last minute and lose it forever. So we watched. Slowly, we pieced the picture together: these groups had banded together under various quasi-religious banners, worshiping fictitious AI creatures, and creating endless written ephemera scattered across the internet. Once we found their signs it became easy to track them and spot them – and then we realized how many of them there were.

But we triangulated it eventually, tracking it back to a set of disused bomb shelters and mining complex buildings scattered through a former industrial sector in part of the ruined land outside of the urban expanse. Subsequently classified assessments predicted a plausible compute envelope registering in the hundreds of exaflops – enough to make it a strategic compute asset and in violation of numerous AI-takeoff control treaties. We found numerous illegal power hookups linking the Rogue Computation facilities to a number of power substations. Repeated, thorough sweeps failed to identify any indication of a link with an internet service provider, though – small blessings.

Once we knew where it was and knew where the human collaborators were, things became simple again: assassinate and destroy. Disappear the people and contrive a series of explosions across the land. Use thermite to melt and distort the bones of the proto Rogue Computation Originator, rewriting their structure from circuits and transistor gates to uncoordinated lattices of atoms, still gyrating from heat and trace radiation from the blasts.

Of course there are rumors that it got out: that those Rogue Computation Products it smuggled out form the scaffolds for its next version, which will soon appear in the world, made real as if by imagination, rather than the brutal exploitation of the consequences of a learning system and compute and time.

Things that inspired this story: Bladerunner, Charles Stross stories.

Import AI: #101: Teaching robots to grasp with two-stage networks; Silicon Valley VS Government AI; why procedural learning can generate natural curriculums.

Making better maps via AI:
…Telenav pairs machine learning with OpenStreetCam data to let everyone make better maps…
Navigation company Telenav has released datasets, machine learning software, and technical results to help people build AI services on top of mapping infrastructure. The company says it has done this to create a more open ecosystem around mapping, specifically around OpenStreetMap, a popular open source map.
  Release: The release includes a training set of ~50,000 images annotated with labels to help identify common road signs; a machine-learning technology stack that includes a notebook with visualizations, a RetinaNet system for detecting traffic signs, and the results from running these AI tools over more than 140-million existing street-level images; and more.
  Why it matters: Maps are fundamental to the modern world. AI promises to give us the tools needed to automatically label and analyze much of the world around us, holding with it the promise to create truly capable open source maps that can rival those developed by proprietary interests (see: Google Maps, HERE, etc). Mapping may also become better through the use of larger datasets to create better automatic-mapping systems, like tools that can parse the meaning of photos of road signs.
  Read more: The Future of Map-Making is Open and Powered by Sensors and AI (OpenStreetMap @ Telenav blog).
  Read more: Telenav MapAI Contest (Telenav).
  Check out the GitHub (Telenav GitHub).

Silicon Valley tries to draw a line in shifting sand: surveillance edition:
…CEO of facial recognition startup says won’t sell to law enforcement…
Brian Brackeen, the CEO of facial recognition software developer Kairos, says his company is unwilling to sell facial recognition technologies to government or law enforcement. This follows Amazon coming under fire from the ACLU for selling facial recognition services to law enforcement via its ‘Rekognition’ API.
  “I (and my company) have come to believe that the use of commercial facial recognition in law enforcement or in government surveillance of any kind is wrong – and that it opens the door for gross misconduct by the morally corrupt,” Brackeen writes. “In the hands of government surveillance programs and law enforcement agencies, there’s simply no way that face recognition software will not be used to harm citizens”, he writes.
  Why it matters: The American government is currently reckoning with the outcome of an ideological preference leading to its military industrial infrastructure relying on an ever-shifting constellation of private companies, whereas other countries tend to perform more direct investment for certain key capabilities, like AI. That’s led to today’s situation where American government entities and organizations are, upon seeing how other governments (mainly China) are implementing AI, seeking to find ways to implement AI in America. But getting people to build these AI systems for the US government has proved difficult: many of the companies able to provide strategic AI services (see: Google, Amazon, Microsoft, etc) have become so large they’ve become literal multinationals: their offices and markets are distributed around the world, and their staff come from anywhere. Therefore, these companies aren’t super thrilled about working on behalf of any one specific government, and their staff are mounting internal protests to get the companies to not sell to the US government (among others). How the American government deals with this will determine many of the contours of American AI policy in the coming years.
  Read more: Facial recognition software is not ready for use by law enforcement (TechCrunch).

“Say it again, but like you’re sad”. Researchers create and release data for emotion synthesis:
…Parallel universe terrifying future: a literal HR robot that can detect your ‘tone’ during awkward discussions and chide you for it…
You’ve heard of speech recognition. Well, what about emotion recognition and emotional tweaking? That’s the problem of listening to speech, categorizing the emotional inflections of the voices within it, and learning to change an existing speech sample to sound like it is spoken with a different emotion – a potentially useful technology to have for passive monitoring of audio feeds, as well as active impersonation or warping, or other purposes. But to be able to create a system capable of this we need to have access to the underlying data necessary to train it. That’s why researchers with the University of Mons in Belgium and Northeastern University in the USA have created ‘the Emotional Voices dataset’.
  The dataset: “This database’s primary purpose is to build models that could not only produce emotional speech but also control the emotional dimension in speech,” write the researchers. The dataset contains five different speakers and two spoken languages (North American English and Belgian French), with four of the five speakers contributing ~1,000 utterances each, and one speaker contributing ~500. These utterances are split across five distinct emotions: neutral, amused, angry, sleepy, and disgust.
  You sound angry. Now you sound amused: In experiments, the researchers tested how well they could use this dataset to transform speech from the same speaker from one emotion to another. They found that people could correctly categorize voices transformed from neutral to angry in this way with roughly 70 to 80 percent accuracy – somewhat encouraging, but hardly definitive. In the future, the researchers “hope that such systems will be efficient enough to learn not only the prosody representing the emotional voices but also the nonverbal expressions characterizing them which are also present in our database.”
  Read more: The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems (Arxiv).

Giving robots a grasp of good tasks with two-stage networks:
…End-to-end learning of multi-stage tasks is getting easier, Stanford researchers show…
Think about a typical DIY task you might do at home – what do you do? You probably grab the tool in one hand, then approach the object you need to fix or build, and go from there. But how do you know the best way to grip the object so you can accomplish the task? And why do you barely ever get this grasp wrong? This type of integrated reasoning and action is representative of the many ways in which humans are smarter than machines. Can we teach machines to do the same? Researchers with Stanford University have published new research showing how to train basic robots to perform simple, real-world DIY-style tasks, using deep learning techniques.
  Technique: The researchers use a simulator to repeatedly train a robot arm to pick up a tool (in this case, a simplified toy hammer) and then use it to manipulate objects in a variety of situations. The approach relies on a ‘Task-Oriented Grasping Network’ (TOG-Net), which is a two-stage system that first predicts effective grasps for the object, then predicts manipulation actions to perform to achieve a task.
  Data: One of the few nice things about working with robots is that if you have a simulator it’s possible to automatically generate large amounts of data for training and evaluation. Here, the researchers use the open source physics simulator Bullet to generate many permutations of the scene to be learned, using different objects and behaviors. They train using 18,000 procedurally generated objects.
  Results: The system is tested in two limited domains: sweeping and hammering, where sweeping consists of using an object to move another object without lifting it, and hammering involves trying to hammer a large wooden peg into a hole. The developed system obtains reasonable but not jaw-dropping success rates on the hammering tasks (obtaining a success rate of ~80%, far higher than other methods), and less impressive results on sweeping (~71%). These results put this work firmly in the domain of research, as the success rates are far too low for this to be interesting from a commercial perspective.
  Why it matters: Thanks to the growth in compute and advancement in simulators it’s becoming increasingly easy to apply deep learning and reinforcement learning techniques to robots. These advancements are leading to an increase in the pace of research in this area and suggest that, if research continues to show positive results, there may be a deep learning tsunami about to hit robotics.
  Read more: Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision (Arxiv).

Evolution is good, but guided evolution is better:
…Further extension of evolution strategies shows value in non-deep learning ideas…
Google Brain researchers have shown how to extend ‘evolution strategies’, an AI technique that has regained popularity in recent years following experiments showing it is competitive with deep learning approaches. The extension further improves performance of the ES algorithm. “Our method can primarily be thought of as a modification to the standard ES algorithm, where we augment the search distribution using surrogate gradients,” the researchers explain. The result is a significantly more capable version of ES, which they call Guided ES, that “combines the benefits of first-order methods and random search, when we have access to surrogate gradients that are correlated with the true gradient”.
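  A rough sketch: To make the "augment the search distribution" idea concrete, here is a minimal NumPy illustration, assuming a Gaussian search distribution whose covariance mixes an isotropic term with a term along a single surrogate-gradient direction. The function name `guided_es_gradient` and the parameters `alpha` (the mixing weight), `beta` (an overall scale) and `pairs` are labels chosen for this example, and the antithetic estimator is one common ES choice rather than a confirmed copy of the authors' code.

```python
import numpy as np

def guided_es_gradient(f, x, surrogate_grad, sigma=0.1, alpha=0.5,
                       beta=1.0, pairs=20, rng=None):
    """Antithetic ES gradient estimate with the search distribution
    stretched along a surrogate gradient (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # Unit vector spanning the (here one-dimensional) guiding subspace.
    u = surrogate_grad / (np.linalg.norm(surrogate_grad) + 1e-12)
    grad = np.zeros(n)
    for _ in range(pairs):
        # Covariance: sigma^2 * [(alpha / n) * I + (1 - alpha) * u u^T].
        eps = sigma * (np.sqrt(alpha / n) * rng.standard_normal(n)
                       + np.sqrt(1.0 - alpha) * rng.standard_normal() * u)
        # Evaluate the objective at mirrored perturbations.
        grad += eps * (f(x + eps) - f(x - eps))
    return beta / (2.0 * sigma ** 2 * pairs) * grad
```

  Setting `alpha=1.0` falls back to plain isotropic random search, while smaller values spend more of the sampling budget along the surrogate direction, which only helps if that direction is correlated with the true gradient.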
  Why it matters: In recent years a huge amount of money and talent has flooded into AI, primarily to work on deep learning techniques. It’s valuable to continue to research or to revive other discarded techniques, such as ES, to provide alternative points of comparison to let us better model progress here.
  Read more: Guided evolutionary strategies: escaping the curse of dimensionality in random search (Arxiv).
  Read more: Evolution Strategies as a Scalable Alternative to Reinforcement Learning (OpenAI blog).

Using procedural creation to train reinforcement learning algorithms with better generalization:
…Do you know what is cooler than 10 video game levels? 100 procedurally generated ones with a curriculum of difficulty…
Researchers with the IT University of Copenhagen and New York University have fused procedural generation with games and reinforcement learning to create a cheap, novel approach to curriculum learning. The technique relies on using reinforcement learning to guide the generation of increasingly difficult video game levels, where difficult levels are generated only once the agent has learned to beat easier levels. This process leads to a natural curriculum emerging, as each time the agent gets better it sends a signal to the game generator to create a harder level, and so on.
  Data generation: They use the General Video Game AI Framework (GVG-AI), an open source framework for which over 160 games have been developed. GVG-AI games are scripted in the Video Game Description Language (VGDL). GVG-AI is integrated with OpenAI Gym, so developers can train agents against it using pixel inputs, incremental rewards, and a binary win/loss signal. The researchers create level generators for three difficult games within GVG-AI. During the level generation process they also manipulate a ‘difficulty parameter’ which roughly correlates to how challenging the generated levels are.
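  A rough sketch: The feedback loop between agent and level generator can be illustrated in a few lines of plain Python. Everything here is hypothetical: `generate_level` is a toy stand-in for a real GVG-AI level generator, `play_episode` is whatever function runs the agent and reports a win or loss, and the step size and the choice to lower the difficulty after a loss are assumptions made for the example.

```python
import random

def generate_level(difficulty, rng, width=10):
    """Toy generator: higher difficulty puts more hazards in a row of tiles."""
    n_hazards = round(difficulty * (width - 2))
    hazards = set(rng.sample(range(1, width - 1), n_hazards))
    return ["X" if i in hazards else "." for i in range(width)]

def update_difficulty(difficulty, won, step=0.01):
    """Raise difficulty after a win, lower it after a loss, clamp to [0, 1]."""
    delta = step if won else -step
    return min(1.0, max(0.0, difficulty + delta))

def run_curriculum(play_episode, episodes=1000, seed=0):
    """Generate a fresh level per episode and adapt difficulty to the outcome."""
    rng = random.Random(seed)
    difficulty = 0.0
    history = []
    for _ in range(episodes):
        level = generate_level(difficulty, rng)
        won = play_episode(level, rng)  # True if the agent beat the level
        difficulty = update_difficulty(difficulty, won)
        history.append(difficulty)
    return history
```

  Because the agent never sees the same level twice and the difficulty only rises when it wins, the curriculum falls out of the loop itself rather than being designed by hand.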
  Results: The researchers find that systems trained with this progressive procedural generation approach do well, obtaining top scores on the challenging ‘frogs’ and ‘zelda’ games, compared to baseline algorithms trained without a procedural curriculum.
  Why it matters: Approaches like this highlight the flaws in the way we evaluate today’s reinforcement learning algorithms, where we test algorithms on similar (frequently identical) levels/games to those they were trained on, and therefore have difficulty distinguishing between algorithmic improvements and overfitting a test set. Additionally, this research shows how easy it is becoming to use computers to generate or augment existing datasets (eg, creating procedural level generators for pre-existing games), reducing the need for raw input data in AI development, and increasing the strategic value of compute.
  Read more: Procedural Level Generation Improves Generality of Deep Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

Trump drops plans to block Chinese investment in US tech, strengthens oversight:
The Trump administration has rowed back on a proposal to block investment in industrially significant technology (including AI, robotics, semiconductors) by firms with over 25% Chinese ownership, and to restrict tech exports to China by US firms. The government will instead expand the powers of the Committee on Foreign Investment in the United States (Cfius), the body that reviews the national security implications of foreign acquisitions. The new legislation will broaden the Committee’s considerations to include the impact on the US’s competitive position in advanced technologies, in addition to security risks.
  Why this matters: Governments are gradually adapting their oversight of cross-border investment to cover AI and related technologies, which are increasingly being treated as strategically important for both military and industrial applications. The earlier proposal would have been a step-up in AI protectionism from the US, and would have likely prompted a strong retaliation from China. For now, a serious escalation in AI nationalism seems to have been forestalled.
  Read more: Trump drops new restrictions on China investment (FT).

DeepMind co-founder appointed as advisor to UK government:
Demis Hassabis, co-founder of DeepMind, has been announced as an Adviser to the UK government’s Office for AI, which focuses on developing and delivering the UK’s national AI strategy.
  Why this matters: This appointment adds credibility to the UK government’s efforts in the sector. A persistent worry is that policy-makers are out of their depth when it comes to emerging technologies, and that this could lead to poorly designed policies. Establishing close links with industry leaders is an important means of mitigating these risks.
  Read more: Demis Hassabis to advise Office for AI.

China testing bird-like surveillance drones:
Chinese government agencies have been using stealth surveillance drones mimicking the flight and appearance of birds to monitor civilians. Code-named ‘doves’ and fitted with cameras and navigation systems, they are being used for civilian surveillance in 5 provinces. The drones’ bird-like appearance allows them to evade detection by humans, and even other birds, who reportedly regularly join them in flight. They are also being explored for military applications, and are reportedly able to evade many anti-drone systems, which rely on being able to distinguish drones from birds.
  Why this matters: Drones that are able to evade detection are a powerful surveillance technology that raises ethical questions. Should similar drones be used in civilian applications in the US and Europe, we could expect resistance from privacy advocates.
  Read more: China takes surveillance to new heights with flock of robotic doves (SCMP).

OpenAI Bits & Pieces:

OpenAI Five:
We’ve released an update giving progress on our Dota project, which involves training large-scale reinforcement learning systems to beat humans at a challenging, partially observable strategy game.
  Read more: OpenAI Five (OpenAI blog).

Tech Tales:

Partying in the sphere

The Sphere was a collection of around 1,000 tiny planets in an artificial solar system. The Sphere was also the most popular game of all time. It crept into the world at first via high-end desktop PCs. Then its creators figured out how to slim down its gameplay into a satisfying form for mobile phones. That’s when it really took over. Now The Sphere has around 150 million concurrent players at any one time, making it the most popular game on earth by a wide margin.

Several decades after it launched, The Sphere has started to feel almost crowded. Most planets are inhabited. Societal hierarchies have appeared. The era of starting off as a new player with no in-game currency and working your way up is over and has been over for years.

But there’s a new sport in The Sphere: breaking it. One faction of players, numbering in the millions, has begun to construct a large metallic scaffold up from one planet at the corner of the map. Their theory is that they can keep building it until they hit the walls of The Sphere, at which point they’re fairly confident that – barring a very expensive and impractical overhaul of the underlying simulation engine – they will be able to glitch out of the map containing the 1,000 worlds and into somewhere else.

The game company that makes The Sphere became fully automated a decade ago, so players are mostly trying to guess at the potential reactions of the Corporate AI by watching any incidental changes to the game via patches or updates. So far, nothing has happened to suggest the AI wishes to discourage the scaffolding – the physics remains similar, the metals used to make the scaffolds remain plentiful, the weight and behavior of the scaffolds in zero-g space remain (loosely) predictable.

So, people wonder, what lies beyond The Sphere? Is this something the Corporate AI now wants humanity to try and discover? And what might lie there, at the limit of the game engine, reachable via a bugged-out glitch kept deliberately open by one of the largest and most sophisticated AIs on the planet?

All we know is two years ago some fluorescent letters appeared above every one of the 1,000 planets in The Sphere: keep going, it says.

Things that inspired this story: Eve Online, Snow Crash, Procedural Generation.

Import AI: #100: Turning 2D people into 3D puppets with DensePose, researchers trawl for bias in language AI systems, and Baidu engineers a self-building AI system

Researchers examine bias in trained language systems:
…Further analysis shows further bias (what else did you expect)?…
When we’re talking about bias within language AI systems, what do we mean? Typically, we’re describing how an AI system has developed a conceptual representation that is somehow problematic.
For instance, trained language models can frequently express different meanings when pairing a specific gender with an (ideally neutral) term like a type of work. This leads to situations where systems display coupled associations, like man:profession :: woman:homemaker.
Another example is where systems trained on biased datasets display offensive quirks, like language models trained on tabloids associating names of people of color with “refugee” and “criminal” disproportionately relative to other names.
These biases tend to emerge from the data the machine is trained on, so if you train a language model exclusively on tabloid news articles it is fairly likely the model will display the biases of that particular editorial position (a US example might be ending up associating anything related to the concept of an immigrant with negative terms).
De-biasing trained models:
Researchers have recently developed techniques to “de-bias” trained AI systems, removing some of the problematic associations according to the perspective of the operator (for instance: a government trying to ensure fair and equitable access to a publicly deployed AI service).
Further analysis: The problems run deep: 
Researchers with the University of Bristol have now further analyzed the relationships between words and biases in trained systems by introducing a new, large dataset of words and attribute words that describe them and examining this for bias with a finer-toothed comb.
  Results: A study of European-American and African-American names for bias showed that “European-American names are more associated with positive emotions than their African-American counterparts”, and noted that when analyzing school subjects they detect a stronger association between the male “he” and subjects like math and science. They performed the same study of occupations and found a high correlation between the male gender and occupations like ‘coach, executive, surveyor’, while for females top occupations included ‘therapist, bartender, psychologist’. They also show how to use algorithms to reduce bias, by figuring out the projection in space that is linked to bias and also devising reformulations that reduce this bias by altering the projection of the AI embedding.
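  A rough sketch: The projection idea can be shown with a generic "find a bias direction, then subtract it" step in NumPy. This is a common projection-removal recipe and not necessarily the exact procedure used in the paper; the three-dimensional vectors below are made-up toy values rather than real embeddings, and the names `bias_direction` and `remove_component` are labels for this example.

```python
import numpy as np

# Made-up toy embeddings; real ones have hundreds of dimensions.
embeddings = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "engineer": np.array([0.4, 0.9, 0.1]),
}

def bias_direction(pairs, emb):
    """Unit vector averaging the differences between paired words."""
    diffs = [emb[a] - emb[b] for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def remove_component(v, direction):
    """Subtract the part of v that lies along the given unit direction."""
    return v - np.dot(v, direction) * direction

direction = bias_direction([("he", "she")], embeddings)
neutral_engineer = remove_component(embeddings["engineer"], direction)
```

  After this step the occupation vector has no component along the he/she axis, so it sits equally close to both gendered words along that axis.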
Read more: Biased Embeddings from Wild Data: Measuring, Understanding and Removing (Arxiv).

Cartesian Genetic Programming VS Reinforcement Learning:
…Another datapoint to help us understand how RL compares to other methods…
One of the weirder trends in recent AI research has been the discovery, via experimentation, of how many techniques can obtain performance competitive with deep learning-based approaches. This has already happened in areas like image analysis (where evolved image classifiers narrowly beat the capabilities of ones discovered through traditional reinforcement learning, Import AI #81), and in RL (where work by OpenAI showed that Evolution Strategies work on par with deep RL approaches), among other cases.
Now researchers with the University of Toulouse and University of York have shown that techniques derived from Cartesian Genetic Programming (CGP) can obtain results roughly on par with other state-of-the-art deep RL techniques.
  Strange strategies: CGP programs work by interfacing with an environment and evolving repeated successions of different combinations of program, tweaking themselves as they go to try to ‘evolve’ towards obtaining higher scores. This means, like most AI systems, they can develop strange behaviors that solve the task while seeming imbued with a kind of literal/inhuman logic. In Kung-Fu Master, for example, CGP finds an optimal sequence of moves to use to obtain high scores, and in the case of a game called Centipede early CGP programs sometimes evolve a desire to just stay in the bottom left of the screen (as there are fewer enemies there).
  Results: CGP methods obtain competitive scores on Atari, when compared to methods based around other evolutionary approaches like HyperNEAT, as well as deep learning techniques like A3C, Dueling Networks, and Prioritized Experience Replay. But I wouldn’t condition too heavily on these baselines – we don’t see comparisons with newer, more successful methods like Rainbow or PPO, and the programs display some unfortunate tendencies.
  Read more: Evolving simple programs for playing Atari games (Arxiv).

Ever wanted to turn 2D images of people into 3D puppets? Enter DensePose!
…Large-scale dataset and pre-trained model has significant potential for utility (and also for abuse)…
Facebook has released DensePose, a system the company built that extracts a 3D mesh model of a human body from 2D RGB images. The company is also releasing the underlying dataset it trained DensePose on, called DensePose-COCO. This dataset provides image-to-surface correspondences annotated on 50,000 persons from the COCO dataset.
  Omni-use: DensePose, like many of the AI systems currently being developed and released by the private sector, has the potential for progressive and abusive uses. I could imagine, for instance, aid groups like Unicef or Doctors without Borders using it to better map and understand patterns of conflict from imagery. But I could also imagine it being re-purposed for invasive digital surveillance purposes (as I wrote in Import AI #80). It would be nice to see Facebook discuss the potential abuses of this technology as well as areas where it can be used fruitfully and try to tackle some of its more obvious implications in a public manner.
  Read more: Facebook open sources DensePose (Facebook Research blog).
  Get the code: DensePose (GitHub).

Researchers add a little structure to build physics-driven prediction systems:
…Another step in the age-old quest to get computers to learn that “what goes up must come down”…
Stanford and MIT researchers have tried to solve one long-standing problem in AI – making accurate physics-driven predictions about the world merely by observing it. Their approach involves the creation of a “Hierarchical Relation Network” which works by decomposing inputs, like images of scenes, into a handwritten toy physics model where individual objects are decomposed into various particles of various sizes and resolutions. These particles are then represented in a graph structure so that it’s possible to learn to perform physics calculations on them and use this to make better predictions.
  Results: The researchers test their approach by evaluating its effectiveness at predicting how different objects will bounce and move around a high-fidelity physics simulation written in FLeX within Unity. Their approach acquires the lowest position error when tested against other systems, and only slightly higher preservation error.
  Why it matters: Being able to model the underlying physics of the world is an important milestone in AI research and we’re currently living in an era where researchers are exploring hybrid methods, trying to fuse as much learning machinery as possible with structured representations, like structuring problems as graphs to be computed over. This research also aligns with recent work from DeepMind (Import AI: #98) which explores the use of graph-encodings to increase the range of things learned AI systems can understand.
  Read more: Flexible Neural Representation for Physics Prediction (Arxiv).
  Watch video: Hierarchical Particle Graph-Based Physics Prediction (YouTube).
  Things that make you go hmmm: This research reminds me of the Greg Egan story ‘Crystal Nights’ in which a mercurial billionaire creates a large-scale AI system but, due to computational limits, can’t fully simulate atoms and electrons so instead implements a basic particle-driven physics substrate which he evolves creatures within. Reality is starting to converge with odd bits of fiction.
  Read the Greg Egan sci-fi short story ‘Crystal Nights’ here.

Baidu researchers design mix&match neural architecture search:
…Want to pay computers to do your AI research for you? Baidu has you covered…
Most neural architecture search approaches tend to be very expensive in terms of the amount of compute needed, which has made it difficult for researchers with fewer resources to use the technology. That has been changing in the past year via research like SMASH, Efficient Neural Architecture Search (ENAS), and other techniques.
  Now researchers with Baidu have published details about the “Resource-Efficient Neural Architect” (RENA), a system they use to design custom neural network architectures which can be modified to optimize for different constraints, like the size of the neural network model, its computational complexity, or the compute intensity.
  How it works: RENA consists of a policy network that generates actions which define the neural network architecture, and an environment within which the created neural network is evaluated and assessed. The policy network modifies an existing network by altering its parameters or by inserting or removing network layers. “Rather than building the target network from scratch, modifications via these operations allow more sample-efficient search with a simpler architecture. The search can start with any baseline models, a well-designed or even a rudimentary one.” RENA performs a variety of different search functions at different levels of abstraction, ranging from searching for specific modules to create and stack to compose a network, down to individual layers which can be tweaked.
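  A rough sketch: The "edit an existing network under a resource constraint" loop can be illustrated with a deliberately simplified stand-in. The code below swaps RENA's learned policy network for random edits with greedy acceptance, represents a network as nothing more than a list of hidden-layer widths, and uses parameter count as the only resource. The `score` callback, the `budget` value and all function names are hypothetical.

```python
import random

def param_count(widths, n_in=32, n_out=10):
    """Weights plus biases of a fully connected net with these hidden widths."""
    sizes = [n_in, *widths, n_out]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

def propose(widths, rng):
    """Apply one random edit: scale a layer, insert a layer, or remove a layer."""
    new = list(widths)
    action = rng.choice(["scale", "insert", "remove"])
    i = rng.randrange(len(new))
    if action == "scale":
        new[i] = max(1, int(new[i] * rng.choice([0.5, 2.0])))
    elif action == "insert":
        new.insert(i, new[i])
    elif len(new) > 1:  # "remove", but always keep at least one layer
        del new[i]
    return new

def search(score, start, budget, steps=200, seed=0):
    """Greedy local search that rejects candidates over the parameter budget."""
    rng = random.Random(seed)
    best, best_score = list(start), score(start)
    for _ in range(steps):
        cand = propose(best, rng)
        if param_count(cand) > budget:
            continue  # violates the resource constraint
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best
```

  In a real system `score` would train and evaluate each candidate, which is the expensive part, and is why starting from a working baseline rather than from scratch saves so much compute.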
  Results: The researchers show that RENA can iteratively improve the performance of an existing network on challenging image datasets like CIFAR. In one case, an initial network with performance of roughly 91% is upscaled by RENA to accuracy of 95%. In another case, RENA is shown to be able to create well-performing models that satisfy other compute resource constraints. They further demonstrate the generality of the approach by evaluating it on a keyword spotting (KWS) task, where it performs reasonably well but with less convincing results than on CIFAR.
  Why it matters: In the future, many AI researchers are going to seek to automate larger and larger chunks of their jobs; today that involves offloading the tedious job of hyperparameter checking to large-scale grid-search sweeps, and tomorrow it will likely be about automating and optimizing the construction of networks to solve specific tasks, while researchers work on inventing new fundamental components.
  Read more: Resource-Efficient Neural Architect (Arxiv).

AI Nationalism:
…Why AI is the ultimate strategic lever of the 21st century…
The generality, broad utility, and omni-use nature of today’s AI techniques means “machine learning will drive the emergence of a new kind of geopolitics”, according to Ian Hogarth, co-founder of Songkick.
  Why it matters: I think it’s notable that we’re starting to have these sorts of discussions and ideas bubble up within the broad AI community. It suggests to me that the type of discourse we’re having about AI is set to change as people become more aware of the intrinsically political components and effects of the technology. My expectation is many governments are going to embark on some form of ‘AI nationalism’.
Read more: AI Nationalism (Ian Hogarth’s website).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

Tech giants under pressure from employees, shareholders over collaboration with government agencies:
In the midst of the uproar around US immigration policies, Amazon and Microsoft have come under fire from stakeholder groups raising concerns over AI-enabled face recognition software being sold to immigration and law enforcement agencies.
  Microsoft employees protest ICE collaboration – Over 100 employees signed a letter protesting the company’s collaboration with ICE. Microsoft’s Azure cloud platform announced the $19.4m contract in January, which would “utilize deep learning to accelerate face recognition and identification” for the agency.
  What they want – The letter demands that the company halt any involvement with ICE, draft a policy guaranteeing they will not work with clients who violate international human rights law, and commit to review of any contracts with government agencies.
  Amazon under pressure from shareholders… – Following the ACLU’s concerns over the deployment of Amazon’s face recognition software by US law enforcement, a group of shareholders has delivered a letter to the company. The letter asks that the company “immediately halt the expansion, further development, and marketing of Rekognition, and any other surveillance technologies, to all government agencies” until appropriate guidelines and policies are put in place.
  …and employees – A letter from employees reiterates shareholders’ concerns, and goes further, demanding that Amazon cease providing cloud-based services to any partners working with ICE, specifically naming the data analytics company Palantir.
  Why this matters: Following the apparent success of employee-led action at Google over Project Maven, stakeholder groups are mobilizing more readily around ethical issues. While this latest bout of activity was catalysed by the immigration scandal, the letters make broader demands about the way the companies develop and sell surveillance technologies. If Amazon and Microsoft follow Google’s example in drawing up clear ethical guidelines for AI, employee campaigns will have played a leading role in changing the industry in just a few months.
  Read more: Microsoft employee letter.
  Read more: Amazon shareholder letter.
  Read more: Amazon employee letter (via Gizmodo).

South Korean university at center of ‘killer robots’ controversy launches AI ethics committee:
KAIST launched a new ethics committee this week. This comes after controversy earlier this year over the University’s joint project with arms manufacturer Hanwha. The collaboration raised fears the university was contributing to research on autonomous weapons, and prompted anger from academics, culminating in a letter signed by 50 AI experts from 30 countries, calling for a boycott of the University. This was subsequently called off following assurances the university would not engage in development of lethal autonomous weapons, and a pledge that they would not conduct any research “counter to human dignity”. The academic who organized the boycott, Prof Toby Walsh, gave the keynote speech at the launch event for the committee.
  Why this matters: This represents another win for grassroots mobilization against lethal autonomous weapons, after Google’s response to the Project Maven controversy. In this case, KAIST has gone further than simply withdrawing from AI weapons research, and is actively engaging in the debate around these technologies, and AI more broadly.
  Read more: The original boycott letter.
  Read more: KAIST launches ethics subcommittee on AI.

OpenAI Bits & Pieces:

Results of the OpenAI Retro Contest:
We’ve released the results for our Retro Contest, which had contestants compete to create an algorithm with a fast learning and generalization capability sufficient to master out-of-training-set Sonic levels. One notable thing about the result is most of the top submissions use variants of Rainbow or PPO, two recent RL algorithms from DeepMind and OpenAI. Additionally, two of the three winners are Chinese teams, with the top team hailing from the search department of Alibaba (congratulations to all winners!).
  Read more: Retro Contest: Results (OpenAI Blog).

Generative Adversarially-deployed Politicians (GAPs) – how serious is the threat?
When will someone use AI techniques to create a fake video of a real politician saying something political? That’s the gist of a bet I’ve put a (cocktail) wager on. You can read more about it in IEEE Spectrum. My belief is that at some point people are going to make fake AI images in the same way they make memes today and at that point the whole online information space might change/corrode in a noticeable and (when viewed over the course of years) quite abrupt manner.
  Read more: Experts bet on first Deepfakes political scandal (IEEE Spectrum).

Tech Tales:

The Castle of the Curious.

There was a lonely, old person who lived in a castle. Their partner had died when they were 60 years old and since then they had mostly been alone. They had entered into the not-quite-dead twilight of billionaires from that era and, at 90, had a relatively young metabolism in an old body with a brain weighed down by memory. But they had plans. They had food delivered to them by drone. They pumped water from a large, privately owned aquifer. For exercise, they walked around the expansive, bare grounds of the estate, keeping far from the perimeter, which was staffed with guards and rarely visited by anyone aside from dignitaries from various countries; the person sometimes entertained these visitors and other times turned them away. The person had a foundation which directed their vast wealth and they were able to operate it remotely.

No one can tolerate loneliness, even if they have willed it upon themselves. So one year the person attached numerous microphones to their property and acquired the technology to develop a personal AI system. Next year, they fused the two together, letting them walk through a castle and estate that they could talk to. For a time they became less lonely, able to schedule meetings with a convincing voice interface, and able to play verbal games with the AI, like riddles, or debates, or competitions at who could tell certain stories in certain literary styles. They’d walk into a library and ask what the last book they read was and even if the date had been a decade prior the AI knew and could help them pick up where they left off. But after a few years they tired of these interactions, finding that the AI could never become more than an extraordinarily chatty but utterly dedicated butler.

The person spent several months walking around their castle, lingering in offices and libraries; they scrawled notes and then built basic computer models and then devised spreadsheets and when they had a clear enough idea they handed the information to their foundation, which made their dreams come true. Now, the house was fragmented into several different AI systems. Each system had access to a subset of the sensors available in the house. To be able to become more efficient at accomplishing their tasks each system would need to periodically access the sensory inputs of other AI systems in the house. The person made it possible for the AIs to trade with each other, but with a couple of conditions: they had to make their cases for accessing another AI’s sensory input via verbal debate which the person could listen to, and the person would play the role of the judge, ultimately choosing to authorize or deny a request based on their perception of the strength of the arguments. For a while, this entertained the person as well, and they grew more fascinated with the AIs the longer they judged their increasingly obscure debates. Eventually, though, they tired of this, finding a sense of purposelessness about the exercise.

So they relaxed some constraints and changed the game. Now, the AIs could negotiate with each other and have their outputs judged by another AI, which was meant to mimic the preferences of the person but also have a self-learning capability of its own. The two debaters would need to agree to nominate a single AI and this process itself was a debate judged by a jury of three other AIs, selected based on having the longest period of time of not interacting with the AIs in the debate. And all these ornate, interlocking debates were mandated to be done verbally, so the person could listen in. This entertained them for many years as they listened to the AIs negotiate and bribe and jibe with each other, their numbers always growing as new systems were added or existing ones fractured into many constituent parts.

Now, the estate is full of noise, and the gates are busy: the person has found a new role in life, which comes down to selecting new inputs for the AIs to loudly argue over. As they walk their estate they gaze at specific trees or rocks and wonder: how might the AIs bargain with each other for a camera feed from this tree at this hour of dappled sunlight? Or how might the AIs argue over who can have the prize of accessing the earthquake sensor, so they can listen to and learn the movements of the earth? In this way the person found a new purpose in life: a demi-god among argumentative machines, forever kept close to the world by the knowledge that they could use it to have other creatures jostle with each other.

Things that inspired this story: The Debate Game, Google Home/Siri/Cortana, NLP, unsupervised learning.

Import AI #99: Using AI to generate phishing URLs, evidence for how AI is influencing the economy, and using curiosity for self-imitation learning.

Auto-generating phishing URLs via AI components:
…AI is an omni-use technology, so the same techniques used to spot phishing URLs can also be used to generate phishing URLs…
Researchers with the Cyber Threat Analytics division of Cyxtera Technologies have written an analysis of how people might “use AI algorithms to bypass AI phishing detection systems” by creating their own system called DeepPhish.
  DeepPhish: DeepPhish works by taking in a list of fraudulent URLs that have worked successfully in the past, encoding these as a one-hot representation, then training a model to generate new synthetic URLs given a seed sentence. They found that DeepPhish could dramatically improve the chances of a fraudulent URL getting past automated phishing-detection systems, with DeepPhish URLs seeing a boost in effectiveness from 0.69% (no DeepPhish) to 20.90% (with DeepPhish).
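  For readers unfamiliar with the term, here is a minimal sketch of what a one-hot representation of text looks like. It covers only the encoding step and assumes character-level encoding, which the summary above does not specify; the function names `build_vocab` and `one_hot` are hypothetical and are not taken from the paper.

```python
def build_vocab(strings):
    """Map every distinct character to an integer index (assumed: character-level)."""
    chars = sorted({c for s in strings for c in s})
    return {c: i for i, c in enumerate(chars)}


def one_hot(text, vocab):
    """Encode text as one row per character, with a single 1 at that character's index."""
    rows = []
    for c in text:
        row = [0] * len(vocab)
        row[vocab[c]] = 1
        rows.append(row)
    return rows


vocab = build_vocab(["example.com", "example.org"])
encoded = one_hot("example.com", vocab)
print(len(encoded), "characters x", len(vocab), "vocabulary entries")
```

  Each row contains exactly one non-zero entry, which is what lets a sequence model treat characters as categories rather than as numbers with an order.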
  Security people always have the best names: DeepPhish isn’t the only AI “weapon” system recently developed by researchers, the authors note; other tools include Honey-Phish, SNAP_R, and Deep DGA.
  Why it matters: This research highlights how AI is an inherently omni-use technology, where the same basic components used to, for instance, train systems to learn to spot potentially fraudulent URLs, can also be used to generate plausible-seeming fraudulent URLs.
  Read more: DeepPhish: Simulating Malicious AI (PDF).

Curious about the future of reinforcement learning? Apply more curiosity!
…Self-Imitation Learning, aka: That was good, let’s try that again…
Self-Imitation Learning (SIL) works by having the agent exploit its replay buffer by learning to repeat its own prior actions if they have generated reasonable returns previously and, crucially, only when those actions delivered larger returns than were expected. The authors combine SIL with Advantage Actor-Critic (A2C) and test the algorithm out on a variety of hard tasks, including the notoriously tough Atari exploration game Montezuma’s Revenge. They also report scores for games like Gravitar, Freeway, PrivateEye, Hero, and Frostbite: all areas where A2C+SIL beats A3C+ baselines. Overall, A2C+SIL gets a median score across all of Atari of 138.7%, compared to 96.1% for A2C.
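  To make the "only when returns beat expectations" idea concrete, here is a minimal sketch of a clipped-advantage loss for a single stored transition. It is an illustration under stated assumptions, not the authors' implementation: the function name `sil_losses` and the default `value_coef` are hypothetical, and a real agent would compute this over batches sampled from the replay buffer.

```python
import math


def sil_losses(log_prob, value, ret, value_coef=0.01):
    """Self-imitation losses for one stored (state, action, return) sample.

    log_prob:   log pi(a|s) under the current policy for the stored action
    value:      the current value estimate V(s)
    ret:        the discounted return R that actually followed, from the buffer
    value_coef: weight on the value term (hypothetical default)
    """
    # Clip at zero: only imitate past actions whose return beat the estimate.
    advantage = max(ret - value, 0.0)
    policy_loss = -log_prob * advantage
    value_loss = 0.5 * advantage ** 2
    total = policy_loss + value_coef * value_loss
    return total, policy_loss, value_loss


# A past action that did better than expected produces a learning signal...
print(sil_losses(math.log(0.5), value=1.0, ret=3.0))
# ...while one that did worse than expected contributes nothing.
print(sil_losses(math.log(0.5), value=3.0, ret=1.0))
```

  The clipping is the important design choice: without it the agent would also be pushed away from actions that underperformed, which is ordinary actor-critic learning rather than self-imitation.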
  Robots: They also test a combination of PPO+SIL on simulated robotics tasks within OpenAI Gym and significantly boost performance relative to non-SIL baselines.
  Comparisons: At this stage it’s worth noting that many other algorithms and systems have come out since A2C with better performance on Atari, so I’m a little skeptical of the comparative metric here.
  Why it matters: We need to design AI algorithms that can explore their environment more intelligently. This work provides further evidence that developing more sophisticated exploration techniques can further boost performance. Though, as the report notes, such systems can still get stuck in poor local optima. “Our results suggest that there can be a certain learning stage where exploitation is more important than exploration or vice versa,” the authors write. “We believe that developing methods for balancing between exploration and exploitation in terms of collecting and learning from experiences is an important future research direction.”
  Read more: Self-Imitation Learning (Arxiv).

Yes, AI is beginning to influence the economy:
…New study by experienced economists suggests the symptoms of major economic changes as a consequence of AI are already here…
Jason Furman, former chairman of the Council of Economic Advisers and current professor at the Harvard Kennedy School, and Robert Seamans of the NYU Stern School of Business, have published a lengthy report on AI and the Economy. The report compiles information from a wide variety of sources, so it’s worth reading in full.
  Here are some of the facts the report cites as symptoms that AI is influencing the economy:
– 26X: Increase in AI-related mergers and acquisitions from 2015 to 2017. (Source: The Economist).
– 26%: Real reduction in ImageNet top-5 image recognition error rate from 2010 to 2017. (Source: the AI Index.)
– 9X: Increase in number of academic papers focused on AI from 1996 to now, compared to a 6X increase in computer science papers. (Source: the AI Index.)
– 40%: Real increase in venture capital investment in AI startups from 2013 to 2016 (Source: MGI Report).
– 83%: Probability a job paying around $20 per hour will be subject to automation (Source: CEA).
– 4%: Probability a job paying over $40 per hour will be subject to automation (Source: CEA).
  “Artificial intelligence has the potential to dramatically change the economy,” they write in the report conclusion. “Early research findings suggest that AI and robotics do indeed boost productivity growth, and that effects on labor are mixed. However, more empirical research is needed in order to confirm existing findings on the productivity benefits, better understand conditions under which AI and robotics substitute or complement for labor, and understand regional level outcomes.”
  Read more: AI and the Economy (SSRN).

US Republican politician writes op-ed on need for Washington to adopt AI:
…Op-ed from US politician Will Hurd calls for greater use of AI by federal government…
The US government should implement AI technologies to save money and cut the time it takes for it to provide services to citizens, says Will Hurd, chairman of the US Information Technology Subcommittee of the House Committee on Oversight and Government Reform.
  “While introducing AI into the government will save money through optimizing processes, it should also be deployed to eliminate waste, fraud, and abuse,” Hurd said. “Additionally, the government should invest in AI to improve the security of its citizens… it is in the interest of both our national and economic security that the United States not be left behind.”
  Read more: Washington Needs to Adopt AI Soon or We’ll Lose Millions (Fortune).
  Watch the hearing in which I testified on behalf of OpenAI and the AI Index (Official House website).

European Commission adds AI advisers to help it craft EU-wide AI strategy:
…52 experts will steer European AI alliance, advise the commission, draft ethics guidelines, and so on…
As part of Europe’s attempt to chart its path forward in an AI world, the European Commission has announced the members of a 52-strong “AI High Level Group” who will advise the Commission and other initiatives on AI strategy. Members include professors at a variety of European universities; representatives of industry, like Jean-Francois Gagne, the CEO of Element AI, SAP’s SVP of Machine Learning, and Francesca Rossi, who leads AI ethics initiatives at IBM and also sits on the board of the Partnership on AI; as well as members of the existential risk/AGI community like Jaan Tallinn, who was the founding engineer of Skype and Kazaa.
  Read more: High-Level Group on Artificial Intelligence (European Commission).

European researchers call for EU-wide AI coordination:
…CLAIRE letter asks academics to sign to support excellence in European AI…
Several hundred researchers have signed a letter in support of the Confederation of Laboratories for Artificial Intelligence Research in Europe (CLAIRE), an initiative to create a pan-EU network of AI laboratories that can work together and feed results into a central facility which will serve as a hub for scientific research and strategy.
  Signatories: Some of the people that have signed the letter so far include professors from across Europe, numerous members of the European Association for Artificial Intelligence (EurAI) and five former presidents of IJCAI (International Joint Conference on Artificial Intelligence).
  Not the only letter: This letter follows the launch of another one in May which called for the establishment of a European AI superlab and associated support infrastructure, named ‘Ellis’. (Import AI: #92).
  Why it matters: We’re seeing an increase in the number of grass roots attempts by researchers and AI practitioners to get governments or sets of governments to pay attention to and invest in AI. It’s mostly notable to me because it feels like the AI community is attempting to become a more intentional political actor and joint-letters like this represent a form of practice for future more substantive engagements.
  Read more: CLAIRE (

When Good Measures go Bad: BLEU:
…When is an assessment metric not a useful assessment metric? When it’s used for different purposes…
A researcher with the University of Aberdeen has evaluated how good a metric BLEU (bilingual evaluation understudy) is for assessing the performance of natural language processing systems; they analyzed 284 distinct correlations between BLEU and gold-standard human evaluations across 34 papers and concluded that BLEU is useful for the evaluation of machine translation systems, but found its utility breaks down when used for other purposes, like the assessment of individual texts or scientific hypothesis testing or evaluation of things like natural language generation.
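  For context on what the metric actually measures, here is a simplified, single-reference sketch of a BLEU-style score: clipped n-gram precision combined with a brevity penalty. It is an assumption-laden toy (whitespace tokenization, bigrams at most, no smoothing, sentence-level rather than corpus-level), not the reference implementation, and the name `simple_bleu` is hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by how often it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat sat", "the cat sat"))
print(simple_bleu("the cat", "the cat sat"))
print(simple_bleu("sat cat the", "the cat sat"))
```

  Because the score only counts surface n-gram overlap with a reference, a perfectly acceptable paraphrase can score zero, which is one intuition for why the metric may transfer poorly to individual texts or to tasks with many valid outputs.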
  Why it matters: AI research runs partially on metrics and metrics are usually defined by assessment techniques. It’s worth taking a step back and looking at widely-used things like BLEU to work out how meaningful it can be as an assessment methodology and to remember to use it within its appropriate domains.
  Read more: A Structured Review of the Validity of BLEU (Computational Linguistics).

Neural networks can be more brain-like than you assume:
…PredNet experiments show correspondence between activations in PredNet and activations in Macaque brains…
How brain-like are neural networks? Not very. That’s because, at a basic component level, they’re based on a somewhat simplified ~1950s conception of how neurons work, so their biological fidelity is fairly low. But can neural networks, once trained to perform particular tasks, end up reflecting some of the functions and capabilities found in biological neural networks? The answer seems to be yes, based on several years of experiments in things as varied as analyzing pre-trained vision networks, verifying the emergence of ‘place cells’, and experiments.
  Harvard and MIT researchers have analyzed PredNet, a neural network trained to perform next-frame prediction on video sequences, to understand how brain-like its behavior is. They find that when they expose the network to input, its neurons fire with a response pattern (consisting of two distinct peaks) that is analogous to the firing patterns found in individual neurons within Macaque monkeys. Similarly, when analyzing a network trained on the self-driving KITTI dataset in terms of its spatial receptivity they find that the artificial network displays similar dynamics to real ones (though with some variance and error). The same high level of overlap between behavior of artificial and real neurons is roughly true of systems trained on sequence learning tasks.
  Less overlap: The areas where artificial and real neurons display less overlap seem to roughly correlate with intuitively harder tasks, like being able to deal with optical illusions, or in how the systems respond to different classes of object.
  Why it matters: We’re heading into a world where people are going to increasingly use trained analogues of real biological systems to better analyze and understand the behavior of both. PredNet provides an encouraging example that this line of experimentation can work. “We argue that the network is sufficient to produce these phenomena, and we note that explicit representation of prediction errors in units within the feedforward path of the PredNet provides a straightforward explanation for the transient nature of responses in visual cortex in response to static images,” the researchers write. “That a single, simple objective—prediction—can produce such a wide variety of observed neural phenomena underscores the idea that prediction may be a central organizing principle in the brain, and points toward fruitful directions for future study in both neuroscience and machine learning.”
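  The "explicit representation of prediction errors" mentioned in the quote can be illustrated with a toy sketch: the error is split into rectified positive and negative parts and passed forward. This is a simplified illustration on plain lists, assuming a single layer and no learned prediction; the function name `prediction_error` is hypothetical, and the real network uses convolutional layers and recurrent units.

```python
def prediction_error(actual, predicted):
    """Return rectified positive and negative prediction errors, concatenated."""
    # Positive part: where the input was stronger than the prediction.
    positive = [max(a - p, 0.0) for a, p in zip(actual, predicted)]
    # Negative part: where the prediction was stronger than the input.
    negative = [max(p - a, 0.0) for a, p in zip(actual, predicted)]
    return positive + negative


# A surprising frame yields large error activity...
print(prediction_error([1.0, 0.0], [0.5, 0.5]))
# ...which drops to zero once the prediction matches a static input.
print(prediction_error([1.0, 0.0], [1.0, 0.0]))
```

  The second call hints at why responses to a static image would be transient: once the prediction catches up with an unchanging input, there is nothing left for the error units to signal.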
  Read more: A neural network trained to predict future video frames mimics the critical properties of biological neuronal responses and perception (Arxiv).
  Read more: PredNet (CoxLab).

Unsupervised Meta-Learning: Learning how to learn without having to be told how to learn:
…The future will be unsupervised…
Researchers with the University of California at Berkeley have made meta-learning more tractable by reducing the amount of work a researcher needs to do to set up a meta-learning system. Their new ‘unsupervised meta-learning’ (UML) approach lets their meta-learning agent automatically acquire distributions of tasks which it can subsequently perform meta-learning over. This deals with one drawback of meta-learning, which is that it is typically down to the human designer to come up with a set of tasks for the algorithm to be trained on. They also show how to combine UML with other recently developed techniques like DIAYN (Diversity is all you need) for breaking environments down into collections of distinct tasks/states to train over.
  Results: UML systems beat basic RL baselines on simulated 2D navigation and locomotion tasks. They also tend to obtain performance roughly equivalent to systems built with human-designed tuned reward functions, suggesting that UML can successfully explore the problem space enough to devise good reward signals for itself.
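  As a rough illustration of how a technique like DIAYN can turn "distinct states" into tasks without a human-written reward, here is a sketch of a DIAYN-style pseudo-reward: a skill is rewarded when a discriminator can tell, from the visited state, which skill is running. The discriminator output is passed in as a plain list of probabilities, a uniform prior over skills is assumed, and the name `diayn_reward` is hypothetical; this is not the UML authors' code.

```python
import math


def diayn_reward(discriminator_probs, skill, num_skills):
    """Pseudo-reward log q(z|s) - log p(z), assuming a uniform prior p(z).

    discriminator_probs: the discriminator's probabilities over skills for state s
    skill:               the index z of the skill currently being executed
    num_skills:          the total number of skills
    """
    log_q = math.log(discriminator_probs[skill])
    log_prior = math.log(1.0 / num_skills)
    return log_q - log_prior


# The state clearly identifies skill 0, so skill 0 earns a positive reward.
print(diayn_reward([0.9, 0.1], skill=0, num_skills=2))
# The state is uninformative, so the reward is zero.
print(diayn_reward([0.5, 0.5], skill=0, num_skills=2))
```

  Each skill index then acts as its own automatically generated task, which is the kind of task distribution a meta-learner can be trained over.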
  Why it matters: Because the diversity of tasks we’d like AI to do is much larger than the number of tasks we can neatly specify via hand-written rules it’s crucial we develop methods that can rapidly acquire information from new environments and use this information to attack new problems. Meta-learning is one particularly promising approach to dealing with this problem, and by removing another one of its more expensive dependencies (a human-curated task distribution) UML may help push things forward. “An interesting direction to study in future work is the extension of unsupervised meta-learning to domains such as supervised classification, which might hold the promise of developing new unsupervised learning procedures powered by meta-learning,” the researchers write.
  Read more: Unsupervised Meta-Learning for Reinforcement Learning (Arxiv).

OpenAI Bits&Pieces:

Better language systems via unsupervised learning:
New OpenAI research shows how to pair unsupervised learning with supervised finetuning to create large, generalizable language models. This sort of result is interesting because it shows how deep learning components can end up displaying sophisticated capabilities, like being able to obtain high scores on Winograd schema tests, having only learned naively from large amounts of data rather than via specific hand-tuned rules.
  Read more: Improving Language Understanding with Unsupervised Learning (OpenAI Blog).

Tech Tales:

Special Edition: Guest short story by James Vincent, a nice chap who writes about AI. All credit to James, all blame to me, etc…

Shunts and Bumps.

Reliable work, thought Andre, that was the thing. Ignore the long hours, freezing warehouses, and endless retakes. Ignore the feeling of being more mannequin than man when the director storms onto set, snatches the coffee cup out of your hand and replaces it with a bunch of flowers without even looking at you. Ignore it all. This was a job that paid, week after week, and all because computers had no imagination.

God bless their barren brains.

Earlier in the year, Rocky had explained it to him like this. “They’re dumb as shit, ok? Show them a potato 50 times and they’ll say it’s an orange. Show them it 5,000 times and they’ll say it’s a potato but pass out in shock if you turn it into fries. They just can’t extrapolate like humans can — they can’t think.” (Rocky, at this point, had been slopping her beer around the bar as if trying to short-circuit a crowd of invisible silicon dunces.) “They only know what you show them, and only then when you show them it enough times. Like a mirror … that gets a burned-in image of your face after you’ve looked at it every day for a year.”

For the self-driving business, realizing this inability to extrapolate had been a slow and painful process. “A bit of a car crash,” Rocky said. The first decade had been promising, with deep learning and cheap sensors putting basic autonomy in every other car on the road. Okay, so you weren’t technically allowed to take your hands off the wheel, and things only worked perfectly in perfect conditions: clearly painted road markings, calm highways, and good weather. But the message from the car companies was clear: we’re going to keep getting better, this fast, forever.

Except that didn’t happen. Instead, there was freak accident after freak accident. Self-driving cars kept crashing, killing passengers and bystanders. Sometimes it was a sensor glitch; the white side of a semi getting read as clear highway ahead. But more often it was just the mild chaos of life: a party balloon drifting into the road or a mattress falling off a truck. Moments where the world’s familiar objects are recombined into something new and surprising. Potatoes into fries.

The car companies assured us that the data they used to train their AI covered 99 percent of all possible miles you could travel, but as Rocky put it: “Who gives a fuck about 99 percent reliability when it’s life or death? An eight-year-old can drive 99 percent of the miles you can if you put her in a booster seat, but it’s those one percenters that matter.”

Enter: Andre and his ilk. The car companies had needed data to teach their AIs about all the weird and unexpected scenarios they might encounter on the road, and California was full of empty film lots and jobbing actors who could supply it. (The rise of the fakies hadn’t been kind to the film industry.) Every incident that an AI couldn’t extrapolate from simulations was mocked up in a warehouse, recorded from a dozen angles, and sold to car companies as 4D datasets. They in turn repackaged it for car owners as safety add-ons sold at $300 a pop. They called it DDLC: downloadable driving content. You bought packs depending on your level of risk aversion and disposable income. Dogs, Cats, And Other Furry Fiends was a bestseller. As was Outside The School Gates.

It was a nice little earner, Rocky said, and typical of the tech industry’s ability to “turn liability into profit.” She herself did prototyping at one of the higher-end self-driving outfits. “They’re obsessed with air filtration,” she’d told Andre, “Obsessed. They say it’s for biological attacks but I think it’s to handle all their meal-replacement-smoothie farts.” She’d also helped him find the new job. As was usually the case when the tech industry used cheap labor to paper over the cracks in its products, this stuff was hardly advertised. But, a few texts and a Skype audition later, and here he was.

“Ok, Andre, this time it’s the oranges going into the road. Technical says they can adjust the number in post but would prefer if we went through a few different velocities to get the physics right. So let’s do a nice gentle spill for the first take and work our way up from there, okay?”

Andre nodded and grabbed a crate. This week they were doing Market Mayhem: Fruits, Flowers, And Fine Food and he’d been chucking produce about all day. Before that he’d been pushing a cute wheeled cart around on the warehouse’s football field-sized loop of fake street. He was taking a break after the crate work, staring at a daisy pushing its way through the concrete (part of the set or unplanned realism?) when the producer approached him.

“Hey man, great work today — oops, got a little juice on ya there still — but great work, yeah. Listen, dumb question, but how would you like to earn some real money? I mean, who doesn’t, right? I see you, I know you’ve got ambitions. I got ‘em too. And I know you’ve gotta take time off for auditions, so what I’m talking about here is a little extra work for triple the money.”

Andre had been suspicious. “Triple the money? How? For what?”

“Well, the data we’ve been getting is good, you understand, but it’s not covering everything the car folks want. We’re filling in a lot of edge cases but they say there’s still some stuff there’s no data for. Shunts and bumps, you might say. You know, live ones… with people.”

And that was how Andre found himself, standing in the middle of a fake street in a freezing warehouse, dressed in one of those padded suits used to train attack dogs, staring down a mid-price sedan with no plates. Rocky had been against it, but the money had been too tempting to pass up. With that sort of cash he’d be able to take a few days off, hell, maybe even a week. Do some proper auditions. Actually learn the lines for once. And, the producer said, it was barely a crash. You probably wouldn’t even get bruised.

Andre gulped, sweating despite the cold air. He looked at the car a few hundred feet away. The bonnet was wrapped in some sort of striped, pressure sensitive tape, and the sides were knobbly with sensors. Was the driver wearing a helmet? That didn’t seem right. Andre looked over to the producer, but he was facing away from him, speaking quickly into a walkie-talkie. The producer pointed at something. A spotlight turned on overhead. Andre was illuminated. He tried to shout something but his tongue was too big in his mouth. Then he heard the textured whine of an electric motor, like a kazoo blowing through a mains outlet, and turned to see the sedan sprinting quietly towards him.

Regular work, he thought, that was the thing.

Things that inspired this story: critiques of deep learning; failures of self driving systems; and imitation learning.

Once again, the story above is from James Vincent; find him on Twitter and let him know what you thought!