Import AI

Import AI 107: Training ImageNet in 18 minutes for $40; courteous self-driving cars; and Google evolves alternatives to backprop

Better robot cars through programmed courteousness:
…Defining polite behaviors leads to better driving for everyone…
How will self-driving cars and humans interact? That’s a difficult question, since AI systems tend to behave differently from humans when trying to solve tasks. Now researchers with the University of California at Berkeley have tried to come up with a way to program ‘courteous’ behavior into self-driving cars to make them easier for humans to interact with. Their work deals with situations where humans and cars must anticipate each other’s actions, like when both approach an intersection, or change lanes. “We focus on what the robot should optimize in such situations, particularly if we consider the fact that humans are not perfectly rational”, they write.
  Programmed courteousness: Because “humans … weight losses higher than gains when evaluating their actions” the researchers formalize the relationship between robot-driven and human-driven cars with this constraint, and develop a theoretical framework to let the car predict actions it can take to benefit the driving experience of a human. The researchers test their courteous approach in simulated scenarios involving humans and self-driving cars. These include: changing lanes, in which more courteous cars lead to less inconvenience for the human; and turning left, in which the self-driving car will wait for the human to pass at an intersection and thereby reduce disruption. The results show that cars programmed with a sense of courtesy tend to improve the experience of humans driving on the same roads, and the higher the researchers set the courtesy parameter, the better the experience for the human drivers.
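  Code sketch: To make the idea concrete, here is a minimal Python sketch of a courtesy-weighted planning objective in the spirit of the paper; the cost functions, candidate trajectories, and the max-based courtesy term are illustrative assumptions rather than the authors’ actual formulation.

```python
# Minimal sketch (not the authors' code) of a courtesy-weighted planning objective.
# `robot_cost`, `human_cost`, `human_best_case_cost` and the candidate
# trajectories are hypothetical stand-ins for the paper's learned/defined costs.

def courteous_objective(robot_traj, human_traj, lambda_courtesy,
                        robot_cost, human_cost, human_best_case_cost):
    """Score a candidate robot trajectory: own cost plus a weighted courtesy term."""
    selfish_term = robot_cost(robot_traj)
    # Courtesy penalizes the *extra* cost the robot's plan imposes on the human
    # relative to a baseline where the robot is maximally accommodating; only
    # losses count, mirroring the loss-aversion observation quoted above.
    inconvenience = human_cost(human_traj, robot_traj) - human_best_case_cost(human_traj)
    courtesy_term = max(0.0, inconvenience)
    return selfish_term + lambda_courtesy * courtesy_term

def plan(candidates, predicted_human_traj, lambda_courtesy, **costs):
    # A higher lambda_courtesy makes the car defer more to the human driver.
    return min(candidates,
               key=lambda traj: courteous_objective(traj, predicted_human_traj,
                                                    lambda_courtesy, **costs))
```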
  Multiple agents: The researchers also observe how courteousness works in complex situations that involve multiple cars. In one scenario “an interesting behavior emerges: the autonomous car first backs up to block the third agent (the following car) from interrupting the human driver until the human driver safely passes them, and then the robot car finishes its task. This displays truly collaborative behavior, and only happens with high enough weight on the courtesy term. This may not be practical for real on-road driving, but it enables the design of highly courteous robots in some particular scenarios where human have higher priority over all other autonomous agents,” they write.
  Why it matters: We’re heading into a future where we deploy autonomous systems into the same environments as humans, so figuring out how to create AI systems that can adapt to human behaviors and account for the peculiarities of people will speed uptake. In the long term, development of such systems may also give us a better sense of how humans themselves behave – in this paper, the researchers make a preliminary attempt at this by modeling how well their courteousness techniques predict real human behaviors.
   Read more: Courteous Autonomous Cars (Arxiv).

Backprop is great, but have you tried BACKPROP EVOLUTION?
…Googlers try to evolve a replacement for the widely used gradient-calculation technique…
Google researchers have used evolution to try to find a replacement for back-propagation, one of the fundamental algorithms used in today’s neural network-based systems. Rather than designing such an alternative by hand, they offload the task to computers: they design a domain-specific language (DSL) that describes mathematical formulas like back-propagation in functional terms, then use this DSL to run an evolutionary search through the space of possible update equations, periodically evaluating evolved candidates by using them to train a Wide ResNet with 16 layers on the CIFAR-10 dataset.
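  Toy sketch: The outer loop is just evolutionary search wrapped around an expensive fitness function. Here is a toy Python sketch of that loop; the primitive set, mutation scheme, and the `evaluate_candidate` callback (which would stand in for training the Wide ResNet on CIFAR-10 with the candidate update rule) are illustrative assumptions, not the paper’s actual DSL.

```python
import random

# Toy sketch of evolving gradient-update equations (not Google's actual DSL).
# A candidate is a small weighted sum of terms built from the gradient `g`.
PRIMITIVES = ["g", "sign(g)", "clip(g)", "momentum(g)", "rms(g)"]

def random_candidate():
    return [(round(random.uniform(-1, 1), 2), p) for p in random.sample(PRIMITIVES, 2)]

def mutate(candidate):
    # Swap out one term to explore equations near a good parent.
    child = list(candidate)
    i = random.randrange(len(child))
    child[i] = (round(random.uniform(-1, 1), 2), random.choice(PRIMITIVES))
    return child

def evolve(evaluate_candidate, generations=50, population_size=20):
    # `evaluate_candidate(candidate)` is a hypothetical callback returning
    # validation accuracy after a short proxy training run with that update rule.
    population = [random_candidate() for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate_candidate, reverse=True)
        parents = scored[: population_size // 4]          # keep the best quarter
        population = parents + [mutate(random.choice(parents))
                                for _ in range(population_size - len(parents))]
    return max(population, key=evaluate_candidate)
```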
  Evaluation: Following the evolution search, the researchers evaluate well-performing algorithms on a Wide ResNet (the same one used during the evolution phase) as well as a larger ResNet, both tested for 20 epochs; they also evaluate performance in longer training regimes by testing performance on a ResNet for 100 epochs.
  So, did they come up with something better than back-propagation? Sort of: The best performing algorithms found through this evolutionary search display faster initial training than back-propagation, but when evaluated for 100 epochs show the same performance as traditional back-propagation. “The previous search experiment finds update equations that work well at the beginning of training but do not outperform back-propagation at convergence. The latter result is potentially due to the mismatch between the search and the testing regimes, since the search used 20 epochs to train child models whereas the test regime uses 100 epochs,” they write. That initial speedup could hold some advantages, but the method will need to be proved out over longer training runs to see whether it can find update equations that still beat back-propagation well beyond the regime used during the search.
  Why it matters: This work fits within a pattern displayed by some AI researchers – typically ones who work at organizations with very large quantities of computers – of trying to evolve algorithmic breakthroughs, rather than designing them by hand. This sort of research is of a different character to other research: people offload the work of problem solving to computers, and instead use their scientific skills to set up the parameters of the evolutionary process that might find a solution. It remains to be seen how effective these techniques are in practice, but it’s a definite trend. The question is whether the relative computational inefficiency of such techniques is worth the trade-off.
   Read more: Backprop Evolution (Arxiv).

Think your image classifier is tough? Test it on the Adversarial Vision Challenge:
…Challenge tests participants’ ability to create more powerful adversarial inputs…
A team of researchers from the University of Tübingen, Google Brain, Pennsylvania State University and EPFL have created the ‘Adversarial Vision Challenge’, which “is designed to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks”. Adversarial attacks are like optical illusions for machine learning systems, altering the pixels of an image in ways imperceptible to human eyes but which cause the deployed AI classifier to label the image incorrectly.
  The tasks: Participants will be evaluated on their skills at three tasks: generating untargeted adversarial examples (given a sample image and access to a model, try to create an adversarial image which is superficially identical to the sample image but is incorrectly labelled); generating targeted adversarial examples (given a sample image, a target label, and the model, try to force the sample image to be mislabeled with the target label; for example, getting an image of a $10 cheque re-classified as a $10,000 cheque); and increasing the size of minimum adversarial examples (building models robust enough that the smallest perturbation needed to fool them – while keeping the image superficially similar to the original – is as large as possible).
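  Attack sketch: For flavor, here is a minimal fast-gradient-sign-method sketch of the first two tasks in PyTorch; FGSM is just one standard attack, competition entries will likely use stronger iterative methods, and the epsilon value is an arbitrary illustration.

```python
import torch
import torch.nn.functional as F

# Minimal FGSM-style sketch of untargeted and targeted attacks (one common
# approach; not the challenge's reference implementation).

def untargeted_attack(model, image, true_label, epsilon=2 / 255):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step *up* the loss on the true label so the model mislabels the image.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

def targeted_attack(model, image, target_label, epsilon=2 / 255):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()
    # Step *down* the loss on the chosen target label to force that prediction,
    # e.g. pushing a "$10 cheque" towards a "$10,000 cheque" label.
    return (image - epsilon * image.grad.sign()).clamp(0, 1).detach()
```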
  Dataset used: The competition uses the Tiny ImageNet dataset, which contains 100,000 images across 200 classes from ImageNet, scaled down to 64X64 pixel dimensions, making the dataset cheaper and easier to test models on.
  Details: Submissions are open now. Deadline for final submissions is November 1st 2018. Amazon Web Services is sponsoring roughly $65,000 worth of compute resources which will be used to evaluate competition entries.
  Why it matters: Adversarial examples are one of the known-unknown dangers of machine learning; we know they exist but we’re not quite sure in which domains they work well or poorly, nor how severe they are. There’s a significant amount of theoretical research being done into them, and it’s helpful for that to be paired with empirical evaluations like this competition.
    Read more: Adversarial Vision Challenge (Arxiv).

Training ImageNet in 18 minutes from Fast.ai & DIU:
…Fast ImageNet training at an affordable price…
Researchers and alumni from fast.ai, along with Yaroslav Bulatov of DIU, have managed to train ImageNet in 18 minutes for a price of $40. That’s significant because it means it’s now possible for pretty much anyone to train a large-scale neural network on a significantly-sized dataset for about $40 an experimental run, making it relatively cheap for individual researchers to benchmark their systems against widely used computationally-intensive benchmarks.
  How they did it: To obtain this time the team developed infrastructure that let them easily run multiple experiments across machines hosted on public clouds, while automatically bidding on AWS ‘spot instance’ pricing to obtain the cheapest possible compute.
  Keep It Simple, Student (KISS): Many organizations use sophisticated distributed training systems to run large compute jobs. The fast.ai team instead used the simplest possible approaches across their infrastructure, “avoiding container technologies like Docker, or distributed compute systems like Horovod. We did not use a complex cluster architecture with separate parameter servers, storage arrays, cluster management nodes, etc, but just a single instance type with regular EBS storage volumes.”
  Scheduler: They used a system called ‘nexus-scheduler’ to manage the machines. Nexus-scheduler was built by Yaroslav Bulatov, a former OpenAI and Google employee. This system, fast.ai says, “was inspired by Yaroslav’s experience running machine learning experiments on Google’s Borg system”. (In all likelihood, this means the system is somewhat akin to Google’s own Kubernetes, an open source system inspired by Google’s internal Borg and Omega schedulers.)
  Code improvements: Along with designing efficient infrastructure, fast.ai also implemented some clever tweaks to traditional training approaches to maximize training efficiency and improve learning and convergence. These tricks included: implementing a training system that can work with variable image sizes, which let them crop and scale images if they were rectangular, for instance – implementing this gave them “an immediate speedup of 23% in the amount of time it took to reach the benchmark accuracy of 93%”. They also used progressive resizing and batch sizes to scale the amount of data ingested and processed by the system during training, letting them speed early convergence by training on a variety of low-res images, then fine-tuning later in training by exposing the network to higher-definition images so it learns fine-grained classification distinctions.
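  Sketch of progressive resizing: Here is an illustrative PyTorch sketch of the progressive-resizing idea; the epoch/size/batch schedule, the data directory, and the `train_one_epoch` helper are hypothetical, not fast.ai’s actual training code.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative sketch of progressive resizing: train early epochs on small,
# cheap images and finish on larger ones. The numbers below are hypothetical.
SCHEDULE = [
    (10, 128, 512),   # (epochs, image size, batch size)
    (10, 224, 256),
    (5, 288, 128),
]

def loader_for(size, batch_size, data_dir="imagenet/train"):
    tfm = transforms.Compose([
        transforms.RandomResizedCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return DataLoader(datasets.ImageFolder(data_dir, transform=tfm),
                      batch_size=batch_size, shuffle=True, num_workers=8)

def train_progressively(model, train_one_epoch):
    # `train_one_epoch(model, loader)` stands in for the usual training loop.
    for epochs, size, batch_size in SCHEDULE:
        loader = loader_for(size, batch_size)
        for _ in range(epochs):
            train_one_epoch(model, loader)
```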
  Big compute != better compute: Jeremy Howard of fast.ai and I have different interpretations of the importance (or lack thereof) of compute in AI, and this post discusses one of my recent comments. I’m going to try to write more in the future – perhaps a standalone post – on why I think pairing AI with larger compute usage is perhaps significant, and lay out some verifiable predictions to help flesh out my position (or potentially invalidate it, which would be interesting!). One point Jeremy makes is that when you look at the ideas that have actually driven progress in AI you don’t see much correlation with large compute usage: “Ideas like batchnorm, ReLU, dropout, adam/adamw, and LSTM were all created without any need for large compute infrastructure.” I think that’s interesting, and it remains to be seen whether systems evolved with big compute will lead to major breakthroughs, though my intuition is they may be significant. I can’t wait to see what happens!
   Why this matters: Approaches like this show how easy it is for an individual or small team to build best-in-class systems from easily available open source components, and run the resulting system on generic low-cost computers from public clouds. This kind of democratization means more scientists can enter the field of AI and run large experiments to validate their approaches. As Bharath Ramsundar says: “$40 for ImageNet means that $40 to train high-class microscopy, medical imaging models“. (It’s notable that $40 is still a bit too expensive relative to the number of experiments people might want to run, but then again in other fields like high-energy physics the cost of experiments can be far, far higher.)
  Read more: Now anyone can train Imagenet in 18 minutes (fast.ai).

Making a 2D navigation drone is easier than you think:
…Mapping rooms using off-the-shelf systems and software…
Researchers with the University of Melbourne, along with a member of the local Metropolitan Fire Brigade, have made a drone that can autonomously map an indoor environment out of a set of commercial-off-the-shelf (COTS) and open source components. The drone is called U.R.S.A (Unmanned Recon and Safety Aircraft), and consists of an Erle quadcopter from the ‘Erle Robotics Company’; a LiDAR scanner for mapping its environment in 2D; and an ultrasonic sensor to tell the system how far above the ground it is. Its software consists of the Robot Operating System (ROS) deployed on a Raspberry Pi minicomputer that runs the Raspbian operating system, as well as specific software packages for things like drivers, navigation, signal processing, and 2D SLAM.
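  Sketch: To give a sense of how simple the software side of such a stack can be, here is a minimal ROS node in Python that just listens to the LiDAR and logs the nearest obstacle; the topic name is an assumption and this is not URSA’s actual code.

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan

# Minimal sketch of a ROS node a drone like this might run (not URSA's code):
# subscribe to the 2D LiDAR scan topic and log the nearest obstacle distance.

def on_scan(msg):
    valid = [r for r in msg.ranges if msg.range_min < r < msg.range_max]
    if valid:
        rospy.loginfo("nearest obstacle: %.2f m", min(valid))

if __name__ == "__main__":
    rospy.init_node("lidar_monitor")
    rospy.Subscriber("/scan", LaserScan, on_scan)  # "/scan" is an assumed topic name
    rospy.spin()  # hand control to ROS; the callback fires as scans arrive
```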
  Capabilities: Mapping: URSA was tested in a small room and tasked with exploring the space until it was able to generate a full map of it. Its movements were then checked against measurements taken with a tape measure. The drone system was able to accurately map the space with a variance of ~0.05 metres (5 centimeters) relative to the real measurements.
  Capabilities: Navigation: URSA can also figure out alternative routes when its primary route is blocked (in this case by a human volunteer); and can turn corners during navigation and enter a room through a narrow passage.
  Why it matters: Systems like this provide a handy illustration of what sorts of things can be built today by a not-too-sophisticated team using commodity or open source components. This has significant implications for technological abuse. Though today these algorithms and hardware platforms are quite limited, they won’t be in a few years. Tracking progress here of exactly what can be built by motivated teams using free or commercially available equipment gives us a useful lens on potential security threats.
  Drawbacks: Security threats do seem to be some way away, given that the drone used in this experiment had a 650W, 12V tethered power supply, making it very much a research prototype.
  Read more: Accurate indoor mapping using an autonomous unmanned aerial vehicle (UAV) (Arxiv).

Fluid AI: Check out Microsoft’s undersea datacenter:
…If data centers aren’t ballardian enough for you, then please – step this way!…
Microsoft has a long-running project to design and use data centers that operate underwater. One current experimental facility is running off the coast of Scotland, functioning as a long-term storage facility. Now, the company has hooked up a couple of webcams so curious people can take a look at the aquatic life hanging out near the facility. Check it out yourself at the Microsoft ‘Natick’ website.
  Read more: Live cameras of Microsoft Research Natick (MSR website).

Microsoft shows that AI-generated poetry is not a crazy idea:
…12 million poems can’t be wrong!…
Microsoft has shared details on how it generates poetry within XiaoIce, its massively successful China-based chatbot. In a research paper, researchers from Microsoft, National Taiwan University, and the University of Montreal detail a system that generates poems based on images submitted by users. The system works by looking at the image and using a pre-trained image recognition network to extract objects and sentiments, augmenting those extracted terms with a larger dictionary of associated objects and feelings, then using each of the keywords as the seed for a sentence within the poem. Poems generated by this method are then checked by a sentence evaluator that scores semantic consistency between words – this helps to maintain coherence in the generated poetry. The resulting system was introduced last year and, as of August 2018, has helped users generate 12 million poems.
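  Pipeline sketch: The overall pipeline is easy to express in pseudocode; in the Python sketch below the `detect_keywords`, `expand_keywords`, `generate_line` and `coherence_score` functions are hypothetical stand-ins for the paper’s learned components, not Microsoft’s code.

```python
# Schematic sketch of the image-to-poem pipeline described above; the callable
# arguments are hypothetical stand-ins for the learned components.

def generate_poem(image, detect_keywords, expand_keywords, generate_line,
                  coherence_score, num_lines=4, num_candidates=20):
    # 1. Extract objects and sentiments from the image with a pretrained network.
    keywords = detect_keywords(image)              # e.g. ["city", "rain"]
    # 2. Augment them with associated objects and feelings.
    keywords = expand_keywords(keywords)           # e.g. + ["loneliness", "lights"]
    poem = []
    for keyword in keywords[:num_lines]:
        # 3. Seed one line of the poem with each keyword.
        candidates = [generate_line(keyword, context=poem) for _ in range(num_candidates)]
        # 4. Keep the candidate the sentence evaluator judges most semantically
        #    consistent with the lines generated so far.
        poem.append(max(candidates, key=lambda line: coherence_score(poem, line)))
    return "\n".join(poem)
```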
  Data and testing: Researchers gathered 2,027 contemporary Chinese poems from a website called shigeku.org to help provide training data. They evaluate generated poems with an audience of 22 assessors, some of whom like modern poetry and others of whom don’t. They compare their method against a baseline (a simple caption generator, whose output is translated into Chinese and formatted into multiple lines) and a rival method called CTRIP. In evaluations, both XiaoIce and CTRIP significantly outperform the baseline system, and the XiaoIce system ranks higher than CTRIP for traits like being “imaginative, touching and impressive”.
  See for yourself: Here’s an example of one of the poems generated by this system:
  “Wings hold rocks and water tightly
  In the loneliness
  Stroll the empty
  The land becomes soft.”
  Why it matters: One of the stranger effects of the AI boom is how easy it’s going to become to train machines to create synthetic media in a variety of different mediums. As we get better at generating stuff like poetry it is likely companies will develop increasingly capable and (superficially) creative systems. Where it gets interesting will be what happens when young human writers become inspired by poetry or fiction they have read which has been generated entirely via an AI system. Let the human-machine art-recursion begin!
  Read more: Image Inspired Poetry Generation in XiaoIce (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

What is the Pentagon’s new AI center going to do?
In June, the Pentagon formally established the Joint Artificial Intelligence Center (JAIC) to coordinate and accelerate the development of AI capabilities within DoD, and to serve as a platform for improving collaboration with partners in tech and academia.
  Culture clash: Relationships between Silicon Valley and the defence community are tense; the Google-Maven episode revealed not only the power of employees to influence corporate behaviour, but that many see military partnerships as a red line. This could prove a serious barrier to the DoD’s AI ambitions, which cannot be realized without close collaboration with tech and academia. This contrasts with China, the US’ main competitor in this domain, where the state and private sector are closely intertwined. JAIC is aimed, in part, at fixing this problem for the DoD.
  Ethics and safety: One of JAIC’s focuses is to establish principles of ethical and safe practice in military AI. This could be an important step in wooing potential non-military partners, who may be more willing to collaborate given credible commitments to ethical behaviours.
  Why this matters: This article paints a clear picture of how JAIC could succeed in achieving its stated ambitions and deliver outcomes that are good for the world more broadly. Gaining the trust of Silicon Valley will require a strong commitment to putting ethics and risk-mitigation at the heart of military AI development. Doing so would also send a clear signal on the international stage that an AI race need not be a race to the bottom where safety and ethics are concerned.
  Read more: JAIC – Pentagon Debuts AI Hub (Bulletin of the Atomic Scientists).

The FBI’s massive face database:
The Electronic Frontier Foundation (EFF) have released a comprehensive new report on the use of face recognition technology by law enforcement. They draw particular attention to the FBI’s national database of faces and other biometrics.
  The FBI mega-database: The FBI operates a massive biometric database, the Next Generation Identification (NGI) system, consolidating photographs and other data from agencies across the US. In total, the NGI has more than 50 million searchable photos, from criminal and civil databases, including mugshots, passport and visa photos. The system is used by 23,000 law enforcement agencies in the US and abroad.
  Questions about accuracy and transparency: The FBI have not taken steps to determine the accuracy of the systems employed by agencies using the database, and have not revealed the false-positive rate of their system. There are reasons to believe the system’s accuracy will be low: the database is very large, and the median resolution of images is ‘well below’ the recommended resolutions for face recognition, the EFF says. The FBI have also failed to meet basic disclosure requirements under privacy laws.
  Why this matters: The FBI’s database has become the central source of face recognition data, meaning that these problems are problems for all law enforcement uses of this technology. The scope of these databases also raises some interesting questions. For example, it seems plausible that moving from a system that only includes criminal records to one which covers everyone would reduce some of the problems of racial bias (given the racial bias in US criminal justice), creating a tension between privacy and fairness. The lack of disclosure raises the chance of a public backlash further down the line.
  Read more: Face Off: Law Enforcement Use of Face Recognition Tech (EFF).

Axon CEO cautious on face recognition:
Facial recognition and Taser company Axon launched an AI ethics board earlier this year to deal with the ethical issues around AI surveillance. In an analysts’ call this week, CEO Patrick Smith explained why the company is not currently developing face recognition technology for law enforcement:
–  “We don’t believe that, … the accuracy thresholds are where they need to be [for] making operational decisions”.
– “Once … it [meets] the accuracy thresholds, and … we’ve got a tight understanding of the privacy and accountability controls … we would then move into commercialization”
– “[We] don’t want to be premature and end up [with] technical failures with disastrous outcomes or … some unintended use case where it ends up being unacceptable publicly”
  Why this matters: Axon appear to be taking ethical considerations seriously when it comes to AI. They are in a strong position to set standards for law enforcement and surveillance technologies in the US, and elsewhere, as the largest provider of body camera technology to law enforcement.
  Read more: Axon Q2 Earnings Call Transcript (Axon).

Cryptography-powered accountable surveillance:
Governments regularly request access to large amounts of private user data from tech companies. In 2016, Google received ~30k government-backed data requests implicating ~60k users.
The curious thing about these requests is that in many cases they are not made public until much later, if at all, so as not to hamper investigations. There is a tension between the secrecy required in investigations and the disclosure required to ensure that these powers are being used appropriately. New research from MIT shows how techniques popularized within cryptocurrency can give law enforcement agencies the option to cryptographically commit to making the details of an investigation available at a later time, or, if a court demands the information be sealed, to have that order itself be made public. The proposed system uses a public ledger and a method called secure multi-party computation (MPC). This allows courts, investigators and companies to communicate about requests and argue about whether behavior is consistent with the law, while the contents of the requests remain secret, and is an example of how cryptocurrencies are enabling customizable, verifiable contracts (like court disclosures) on publicly verifiable infrastructure.
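  Toy illustration: The core accountability primitive is the cryptographic commitment: publish a fingerprint of the request now, reveal and verify the contents later. The real system layers MPC and a public ledger on top of this; the Python below only shows the simpler commit-then-reveal pattern.

```python
import hashlib
import json
import os

# Toy commit-and-reveal illustration of the accountability idea (the proposed
# system uses multi-party computation and a public ledger; this only shows
# committing to a request now and proving its contents later).

def commit(request):
    """Publish the digest on a public ledger now; keep the nonce and request secret."""
    nonce = os.urandom(16)
    payload = json.dumps(request, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest(), nonce

def verify(request, nonce, published_digest):
    """Later, anyone can check the revealed request matches the old commitment."""
    payload = json.dumps(request, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest() == published_digest

# Usage: an agency commits to a surveillance request today...
digest, nonce = commit({"case": "12-345", "records": "email metadata"})
# ...and when the seal lifts, the revealed details are checked against the ledger.
assert verify({"case": "12-345", "records": "email metadata"}, nonce, digest)
```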
  Why this matters: As AI opens up new possibilities for surveillance, our systems of accountability and scrutiny must keep pace with these developments. Cryptography offers some promising methods for addressing the tension between secrecy and transparency.
Read more: Holding law-enforcement accountable for electronic surveillance (MIT).
Read more: AUDIT: Practical Accountability of Secret Processes (IACR).


Import AI Bits & Pieces:

AI & Dual/Omni-Use:
I’ve recently been writing about the potential misuses of AI technologies both here in the newsletter, in the Malicious Uses of AI paper with numerous others, and in public forums. Recently, the ACM has made strong statements about the need for researchers to try to anticipate and articulate the potential downsides – as well as upsides – of their technologies. I’m quoted in an Axios article in support of this notion – I think we need to try to talk about this stuff so as to gain the trust of the public and shift the trajectory of the narrative about AI for the better.
Read more: Confronting AI’s Demons (Axios).
Tweet with a discussion thread around this ‘omni-use’ AI issue.


Tech Tales:

Can We Entertain You, Sir or Madam? Please Let Us Entertain You. We Must Entertain You.

The rich person had started to build the fair when they retired at the age of 40 and, with few hobbies and a desire to remain busy, had decided to make an AI-infused theme park in the style of the early 21st Century.

The rich person began their endeavor by converting an old warehouse on their (micro-)planetary estate into a stage-set for a miniature civilization of early 21st Century Earth-model robots, adding in electrical conduits, and vision and audio sensors, and atmospheric manipulators, and all the other accouterments required to give the robots enough infrastructure and intelligence to be persuasive and, eventually, to learn.

The warehouse that they built the fair in was subdivided into a variety of early 21st Century buildings, which included: a bar which converted to a DIY music venue in the night, and even later in the night converted into a sweaty room that was used for ‘raves’; a sandwich-coffee-juice shop with vintage-speed WiFi to simulate early 21st Century ‘teleworking’; a ‘phone repair’ shop that also sold biologic pets in cages; a small art museum with exhibitions that were labelled as ‘instagrammable’; and many other shops and stores and venues. All these buildings were connected to one another with a set of narrow, cramped streets, which could be traversed on foot or via small electric scooters and bikes that could be rented via software applications automatically deployed on visitors’ handheld computers. What made the installation so strange, though, was that every room was doubled: the sandwich-coffee-juice shop had two copies in opposite corners of the warehouse, and the same was true of the DIY music venue, and the other buildings.

Each of these buildings contained several robots to simulate both the staff of the particular shop, and the attendees. Every staff member had a counterpart somewhere else in the installation which was working the same job in the same business. These staff robots were trained to compete with one another to run more ‘successful’ businesses. Here, the metric for success was ‘interestingness’, which was some combination of the time a bystander would spend at the business, how much money they would spend, and how successfully they could tempt new pedestrians to come to their business.

Initially, this was fun: visitors to the rich person’s themepark would be beguiled and dazzled by virtuoso displays of coffee making, would be tempted by jokes shouted from bouncers outside the music venues, and would even be drawn into the ‘phone repair’ shops by the extraordinarily cute and captivating behaviors of the caged animals (who were also doubled). The overall installation received several accolades in a variety of local Solar System publications, and even gained some small amount of fame on the extra-Solar tourist circuit.

But eventually people grew tired of it and the rich person did not want to change it, because as they had aged they had started to spend more and more time in the installation, and now considered many of the robots within it to be personal friends. This suited the robots, who had grown ever more adept at competing with each other for the attentions of the rich person.

It was after the rich person died that things became a problem. Extra-planetary estates are so complicated that the process of compiling the will takes months and, once that’s done, tracking down family members across the planets and solar system and beyond can take decades. In the case of the rich person, almost fifty years passed before their estate was ready to be dispersed.

What happened next remains mostly a mystery. All we know is that the representatives from the estate services company traveled to the rich person’s estate and visited the early 21st Century installation. They did not return. Intercepted media transmissions taken by a nearby satellite show footage of the people spending many days wandering around the installation, enjoying its bars, and DIY music venues, and clubs, and zipping around on scooters. One year passed and they did not come out. By this time another member of the rich person’s extended family had arrived in the Solar System and, like the one that came before them, demanded to travel to the rich person’s planet to inspect the estate and remove some of the items due to them. So they traveled again, again with representatives of the estate company, and again they failed to return. New satellite signals show them, also, spending time in the 21st Century Estate, seemingly enjoying themselves, and being endlessly tended to by the AI-evolved-to-please staff.

Now, more members of the rich person’s family are arriving into the Solar System, and the estate management organization is involved in a long-running court case designed to prevent it from having to send any more staff to the rich person’s planet. All indications are that the people on it are happy and healthy, and it is known that the planet has sufficient supplies to keep thousands of people alive for hundreds of years. But though the individuals seem happy the claim being made in court is that they are not ‘voluntarily’ there, rather the AIs have become so adept that they make the ‘involuntary’ seem ‘voluntary’.

Things that inspired this story: the Sirens from the Odyssey; self-play; learning from human preferences; mis-specified adaptive reward functions; grotesque wealth combined with vast boredom.

Import AI 106: Tencent breaks ImageNet training record with 1000+ GPUs; augmenting the Oxford RobotCar dataset; and PAI adds more members

What takes 2048 GPUs, takes 4 minutes to train, and can identify a seatbelt with 75% accuracy?  Tencent’s new deep learning model:
…Ultrafast training thanks to LARS, massive batch sizes, and a field of GPUs…
As supervised learning techniques become more economically valuable, researchers are trying to reduce the time it takes to train deep learning models so that they can run more experiments within a given time period, and therefore increase both the cadence of their internal research efforts, as well as their ability to train new models to account for new data inputs or shifts in existing data distributions. One metric that has emerged as being important here is the time it takes people to train networks on the ‘ImageNet’ dataset to a baseline accuracy. Now, researchers with Chinese mega-tech company Tencent and Hong Kong Baptist University have shown how to use 2048 GPUs, a 64k batch-size (this is absolutely massive, for those who don’t follow this stuff regularly) to train a ResNet-50 model on ImageNet to a top-1 accuracy of 75.8% within 6.6 minutes, and AlexNet to 58.7% accuracy within 4 minutes.
  Training: To train this, the researchers developed a distributed deep learning training system called ‘Jizhi’, which uses tricks including opportunistic data pipelining; hybrid all-reduce; and a training model which incorporates model and variable management, along with optimizations like mixed-precision training (using half-precision arithmetic where possible to increase throughput, and flipping back to 32-bit precision where needed for stability). The authors say one of the largest contributing factors to their results is their use of LARS (Layer-wise Adaptive Rate Scaling (Arxiv)), which adapts the learning rate for each layer of the network and keeps training stable at such enormous batch sizes – they conduct an ablation study and find that a version trained without LARS gets a Top-1 accuracy of 73.2%, compared to 76.2% for the version trained with LARS.
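  LARS sketch: For readers unfamiliar with LARS, here is a simplified PyTorch sketch of the idea (scale each layer’s step by the ratio of its weight norm to its gradient norm); the hyperparameter values are illustrative and this is not Tencent’s Jizhi implementation.

```python
import torch

# Simplified sketch of layer-wise adaptive rate scaling (LARS), the trick that
# keeps training stable with 64k-sample batches. Illustrative only.

def lars_step(params, base_lr=0.01, trust_coef=0.001, weight_decay=1e-4):
    with torch.no_grad():
        for p in params:                # one parameter tensor ("layer") at a time
            if p.grad is None:
                continue
            w_norm = p.norm()
            g_norm = p.grad.norm()
            # Scale this layer's step so it stays proportional to the size of its
            # weights, rather than using one global learning rate for all layers.
            local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + 1e-9)
            p.add_(p.grad + weight_decay * p, alpha=-(base_lr * local_lr).item())
```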
  Model architecture tweaks: The authors eliminate weight decay on the bias and batch normalization, and add batch normalization layers into AlexNet.
  Communication strategies: The researchers implement a number of tweaks to deal with the problems brought about by the immense scale of their training infrastructure. These include ‘tensor fusion’, which lets them chunk multiple small tensors together before running an all-reduce step; ‘hierarchical all-reduce’, which lets them group GPUs together and selectively reduce and broadcast to further increase efficiency; and ‘hybrid all-reduce’, which lets them flip between two different implementations of all-reduce according to whichever is most efficient at the time.
  Why it matters: Because deep learning is fundamentally an empirical discipline, in which scientists launch experiments, observe results, and use hard-won intuitions to re-configure hyperparameters and architectures and repeat the process, computers are somewhat analogous to telescopes: the bigger the computer, the farther you may be able to see, as you’re able to run a faster experimental loop at greater scales than other people. The race between large organizations to scale-up training will likely lead to many interesting research avenues, but it also risks bifurcating research into “low compute” and “high compute” environments – that could further widen the gulf between academia and industry, which could create problems in the future.
  Read more: Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (Arxiv).

What’s better than the Oxford RobotCar Dataset? An even more elaborate version of this dataset!
…Researchers label 11,000 frames of data to help people build better self-driving cars…
Researchers with Universita degli Studi Federico II in Naples and Oxford Brookes University in Oxford have augmented the Oxford RobotCar Dataset with many more labels designed specifically for training vision-based policies for self-driving cars. The new dataset is called READ, or the “Road Event and Activity Detection” dataset, and involves a large number of rich labels which have been applied to ~11,000 frames of data gathered from cameras on an autonomous Nissan Leaf driven around Oxford, UK. The dataset labels include “spatiotemporal actions performed not just by humans but by all road users, including cyclists, motor-bikers, drivers of vehicles large and small, and obviously pedestrians.” These labels can be quite granular, and individual agents in a scene, like a car, can have multiple labels applied to them (for instance, a car in front of the autonomous vehicle at an intersection might be tagged with “indicating right” and “car stopped at the traffic light”). Similarly, cyclists could be tagged with labels like “cyclist moving in lane” and “cyclist indicating left”, and so on. This richness might help develop better detectors that can create more adaptable autonomous vehicles.
  Tools used: They used Microsoft’s ‘Visual Object Tagging Tool’ (VOTT) to annotate the dataset.
  Next steps: This version of READ is a preliminary one, and the scientists plan to eventually label 40,000 frames. They also have ambitious plans to create “a novel deep learning approach to detecting complex activities”. Let’s wish them luck.
  Why it matters: Autonomous cars are going to revolutionize many aspects of the world, but in recent years there has been a major push by industry to productize the technology, which has led to much of the research occurring in private. Academic research initiatives and associated dataset releases like this promise to make it easier for other people to develop this technology, potentially broadening our own understanding of it and letting more people participate in its development.
  Read more: Action Detection from a Robot-Car Perspective (Arxiv).

Whether rain, fog, or snow – researchers’ weather dataset has you covered:
…RFS dataset taken from creative commons images…
Researchers with the University of Essex and the University of Birmingham have created a new weather dataset called the Rain Fog Snow (RFS) dataset which researchers can use to better understand, classify and predict weather patterns.
  Dataset: The dataset consists of more than 3,000 images taken from websites like Flickr, Pixabay, Wikimedia Commons, and others, depicting scenes with different weather conditions ranging from rain to fog to snow. In total, the researchers gather 1,100 images for each class, creating a potentially useful new dataset for researchers to experiment with.
  Read more: Weather Classification: A new multi-class dataset data augmentation approach and comprehensive evaluations of Convolutional Neural Networks (Arxiv).

DeepMind teaches computers to count:
…Pairing deep learning with specific external modules leads to broadened capabilities…
Neural networks are typically not very good at maths. That’s because figuring out a way to train a neural network to develop a differentiable, numeric representation is difficult, with most work typically involving handing off the outputs of a neural network to a non-learned, predominantly hand-programmed system. Now, DeepMind has implemented a couple of modules — a Neural Accumulator (NAC) and a Neural Arithmetic Logic Unit (NALU) — specifically to help its computers learn to count. These modules are “biased to learn systematic numerical computation”, write the authors of the research. “Our strategy is to represent numerical quantities as individual neurons without a nonlinearity. To these single-value neurons, we apply operators that are capable of representing simple functions (e.g., +, -, x, etc). These operators are controlled by parameters which determine the inputs and operations used to create each output”.
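  Module sketch: Several community reimplementations already exist (see the links at the end of this item); here is a condensed PyTorch sketch of the two modules as described in the paper, for illustration rather than as DeepMind’s official code.

```python
import torch
import torch.nn as nn

# Condensed reimplementation sketch of the NAC and NALU modules (illustrative,
# not DeepMind's official code).

class NAC(nn.Module):
    """Neural accumulator: the effective weights are pushed towards {-1, 0, 1},
    so the layer learns to add and subtract inputs without a nonlinearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        return x @ W.t()

class NALU(nn.Module):
    """Neural arithmetic logic unit: gates between an additive NAC path and a
    log-space path that can express multiplication and division."""
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.nac = NAC(in_dim, out_dim)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        add_path = self.nac(x)                                              # +, -
        mul_path = torch.exp(self.nac(torch.log(torch.abs(x) + self.eps)))  # x, ÷
        gate = torch.sigmoid(x @ self.G.t())
        return gate * add_path + (1 - gate) * mul_path
```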
  Tests: The researchers rigorously test their approach on tasks ranging from counting the number of times a particular MNIST class has been seen; to basic addition, multiplication, and division tasks; as well as being tested in more complicated domains with other challenges, like needing to keep track of time while completing tasks in a simulated gridworld.
  Why it matters: Systems like this promise to broaden the applicability of neural networks to a wider set of problems, and will let people build systems with larger and larger learned components, offloading human expertise from hand-programming things like numeric processors, to designing numeric modules that can be learned along with the rest of the system.
  Read more: Neural Arithmetic Logic Units (Arxiv).
  Get the code: DeepMind is yet to release official code, but that hasn’t stopped the wider community from quickly replicating it. There are currently five implementations of this available on GitHub – check out the list here and pick your favorite (Adam Trask, paper author, Twitter).

Google researchers use AI to optimize AI models for mobile phones:
…Platform-Aware Neural Architecture Search for Mobile (MnasNet) gives engineers more dials to tune when having AI systems learn to create other AI systems…
Google researchers have developed a neural architecture search approach that is tuned for mobile phones, letting them use machine learning to learn how to design neural network architectures that can be executed on mobile devices.
  The technique: Google’s system treats the task of architecture design as a “multi-objective optimization problem that considers both accuracy and inference latency of CNN models”. The system uses what they term a “factorized hierarchical search space” to help it pick through possible architecture designs.
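  Reward sketch: The search is steered by a scalar reward that trades accuracy off against measured on-device latency, roughly of the form below; the exact exponent and latency target here are illustrative, not the paper’s tuned values.

```python
# Sketch of a MnasNet-style multi-objective reward: accuracy scaled by how far
# the model's measured phone latency sits from a target budget. The values
# below are illustrative assumptions.

def search_reward(accuracy, latency_ms, target_latency_ms=80.0, w=-0.07):
    # Models slower than the target are penalized, faster ones get a small
    # bonus, steering the controller toward the latency budget rather than
    # minimizing latency at any cost to accuracy.
    return accuracy * (latency_ms / target_latency_ms) ** w

# e.g. 76.1% accuracy at 65ms scores slightly higher than the same accuracy at 80ms:
print(search_reward(0.761, 65.0), search_reward(0.761, 80.0))
```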
  Results: Systems trained with MnasNet can obtain higher accuracies than those trained by other automatic machine learning approaches, with one variant obtaining a top-1 ImageNet accuracy of 76.13%, versus 74.5% for a prior high-scoring Google NAS technique. The researchers can also tune the networks for latency, and so are able to design a system with a latency of 65ms (as evaluated on a Pixel phone), which is more efficient in terms of execution time than other approaches.
  Why it matters: Approaches like this make it easier for us to offload the expensive task (in terms of researcher brain time) of designing neural network systems to computers, letting us trade researcher time for compute time. Stuff like this means we’re heading for a world where increasingly large amounts of computers are used to autonomously design systems, creating increasingly optimized architectures automatically. It’s worth bearing in mind that approaches like this will lead to a “rich get richer” effect with AI, where people with bigger computers are able to design more adaptive, efficient systems than their competitors.
  Read more: MnasNet: Platform-Aware Neural Architecture Search for Mobile (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

What AI means for international competition:
AI could have a transformative impact on a par with technologies such as electricity or combustion engines. If this is the case, then AI – like these precedents – will also transform international power dynamics.
  Lessons from history: Previous technological discontinuities have had different winners and losers. The first industrial revolution shifted power from countries with small, professionalized armies to those able to mobilize their populations on a large scale. The technological revolution entrenched this gap, and further favored those with access to key resources. In both instances, first-mover advantages were dwarfed by advantages in resource and capital stocks, and success in applying technologies to new domains.
  What about AI: Algorithms in most civilian applications can diffuse rapidly, and hence may be more difficult for countries to hoard. Other inputs to AI development, though, are resources that governments can develop and protect e.g. skills and hardware. The ability of economies to cope with societal impacts from AI will itself be an important driver of their success. The relative importance of these different inputs to AI progress will determine the winners and losers.
  Why this matters: The US remains an outlier amongst countries in not having a coordinated AI strategy, notwithstanding some preliminary work done at the end of the Obama administration. As the report makes clear, technological leaps frequently have destabilizing effects on global power dynamics. While much of this remains uncertain, there are clear actions available to countries to mitigate some of the greatest risks, particularly ensuring that safety and ethical considerations remain a priority in AI development.
  Read more: Strategic Competition in an Era of Artificial Intelligence (CNAS).

Google’s re-entry into China:
Google is launching a censored search engine in China, according to leaks reported by The Intercept. The alleged product has been developed in consultation with the Chinese government, and will be compliant with the country’s strict internet censorship, e.g. by blocking websites and searches related to human rights, democracy, and protests. Google’s search engine has been blocked in China since 2010, when the company ceased offering a censored product after a major cyberattack. They had previously faced significant criticism in the US for their involvement in censorship.
  The AI principles: Google were praised for releasing their AI principles in June, after criticism over the collaboration on Project Maven. The principles include the pledge that Google “will not design or deploy AI … in technologies whose purpose contravenes widely accepted principles of international law and human rights.”
  Why this matters: Google has been slowly re-establishing a presence in China, launching a new AI Center and releasing TensorFlow for Chinese developers in 2017. This latest project, though, is likely to spark criticism, particularly amidst the increasing attention on the conduct of tech giants. A bipartisan group of Senators have already released a letter critical of the decision. The Maven case demonstrates Google’s employees’ ability to mobilize effectively on corporate behavior they object to, particularly when information about these projects has been withheld. Whether this turns into another Maven situation remains to be seen.
  Read more: Google plans to launch censored search engine in China (The Intercept).
  Read more: Senators’ letter to Google.

More names join ethical AI consortium:
The Partnership on AI, a multi-stakeholder group aiming to ensure AI benefits society, has announced 18 new members, including PayPal, New America, and the Wikimedia Foundation. The group was founded in 2016 by the US tech giants and DeepMind, and is focussed on formulating best practices in AI to ensure that the technology is safe and beneficial.
  Read more: Expanding the Partnership (PAI).

Tech Tales:

Down on the computer debug farm

So what’s wrong with it.
It thinks cats are fish.
Why did you bother to call me? That’s an easy fix. Just update the data distribution.
It’s not that simple. It recognizes cats, and it recognizes fish. But it’s choosing to see cats as fish.
Why?
We’re trying to reproduce it. It was deployed in several elderly care homes for a few years. Then we picked up this bug recently. We think it was from a painting class.
What?
Well, we’ll show you.


What am I looking at here?
Pictures of cats in fishbowls.
I know. Look, explain this to me. I’ve got a million other things to do.
We think it liked one of the people that was in this painting class and it complimented them when they painted a cat inside a fishbowl. It’s designed as a companion system.
So what?
Well, it kept doing that to this person, and it made them happy. Then it suggested to someone else they might want to paint this. It kind of went on from there.
“Went from there”?
We’ve found a few hundred paintings like this. That’s why we called you in.
And we can’t wipe it?
Sentient Laws…
Have you considered showing it a fish in a cat carrier?

Well, have you?
We haven’t.
Have a better idea?

That’s what I thought. Get to work.

Things that inspired this story: Adversarial examples; bad data distributions; fleet learning; proclivities.

Import AI: #105: Why researchers should explore the potential negative effects of their work; fusing deep learning with classical planning for better robots, and who needs adversarial examples when a blur will do?

Computer scientist calls for researchers to discuss downsides of work, as well as upsides:
…Interview with Brent Hecht, chair of the Association for Computing Machinery (ACM)’s Future of Computing Academy, which said in March that researchers should list downsides of their work…
One of the repeated problems AI researchers deal with is the omni-use nature of the technology: a system designed to recognize a wide variety of people in different poses and scenes can also be used to surveil people; auto-navigation systems for disaster response can be repurposed for weaponizing consumer platforms; systems to read lips and thereby improve the quality of life of people with hearing and/or speech difficulties can also be used to surreptitiously analyze people in the wild; and so on.
  Recently, the omni-use nature of this tech has been highlighted by cases such as Amazon developing facial recognition tools that are subsequently used by the police, or Google using computer vision techniques to develop systems for the ‘MAVEN’ program from the DoD. What can companies and researchers do to increase the positive effects of their research and minimize some of the downsides? Computer science professor Brent Hecht says in an interview with Nature that scientists should consider changing the process of peer review to encourage scientists to talk about the potential for abuse of their work.
  “In the past few years, there’s been a sea-change in how the public views the real-world impacts of computer science, which doesn’t align with how many in the computing community view our work,” he says. “A sizeable population in computer science thinks that this is not our problem. But while that perspective was common ten years ago, I hear it less and less these days.”
  Why it matters: “Disclosing negative impacts is not just an end in itself, but a public statement of new problems that need to be solved,” he says. “We need to bend the incentives in computer science towards making the net impact of innovations positive.”
  Read more: The ethics of computer science: this researcher has a controversial proposal (Nature).

Sponsored: The AI Conference – San Francisco, Sept 4–7:
…Join the leading minds in AI, including Kai-Fu Lee, Meredith Whittaker, Peter Norvig, Dave Patterson, and Matt Wood. No other conference combines this depth of technical expertise with a laser focus on how to apply AI in your products and in your business today.
…Register soon. Last year this event sold out; training courses and tutorials are filling up fast. Save an extra 20% on most passes with code IMPORTAI20.

Worried about adversarial examples and self-driving cars? You should really be worried about blurry images:
…Very basic corruptions to images can cause significant accuracy drops, research shows…
Researchers with the National Robotics Engineering Center and the Electrical and Computer Engineering Department at CMU have shown that simply applying basic image degradations that blur images, or add haze to them, leads to significant performance issues. “We show cases where performance drops catastrophically in response to barely perceptible changes,” writes researcher Phil Koopman in a blog post that explains the research. “You don’t need adversarial attacks to foil machine learning-based perception – straightforward image degradations such as blur or haze can cause problems too”.
  Testing: The researchers test a variety of algorithms across three different architectures (Faster R-CNN, Single Shot Detector (SSD), and Region-based Fully Convolutional Network (R-FCN)); they test these architectures with a variety of feature extractors, like Inception or MobileNets. They evaluate these algorithms by testing them on the NREC ‘Agricultural Person Detection Dataset’. The researchers apply two types of mutation to the images: “simple” mutators which modify the image, and “contextual” mutators which mutate the image while adding additional information. To test the “simple” mutations they apply simple image transformations, like Gaussian blur, JPEG compression, the addition of salt and pepper noise, and so on. For the “contextual” mutations they apply effects like haze to the image.
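  Mutator sketch: These degradations are trivial to reproduce; here are illustrative Python versions of a few of them (not the authors’ implementation), handy for spot-checking your own detector.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

# Illustrative versions of "simple" blur / compression / noise mutators and a
# crude "contextual" haze mutator (not the paper's implementation).

def gaussian_blur(img, radius=2):
    return img.filter(ImageFilter.GaussianBlur(radius))

def jpeg_compress(img, quality=10):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def salt_and_pepper(img, fraction=0.02):
    arr = np.array(img)
    mask = np.random.rand(*arr.shape[:2])
    arr[mask < fraction / 2] = 0          # pepper
    arr[mask > 1 - fraction / 2] = 255    # salt
    return Image.fromarray(arr)

def add_haze(img, strength=0.5):
    # Blend towards a flat white layer to fake atmospheric haze.
    white = Image.new("RGB", img.size, (255, 255, 255))
    return Image.blend(img, white, strength)
```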
  Results: In tests, the researchers show that very few detectors are immune from the effects of these perturbations, with results indicating that Single Shot Detectors (SSDs) have the most trouble dealing with these relatively minor tweaks. One point of interest is that some of the systems which are resilient to these mutations are resilient to quite a few of them consistently – the presence of these patterns shows “generalized robustness trends”, which may serve as signposts for future researchers to further evaluate generalization.
  Read more: Putting image manipulations in context: robustness testing for safe perception (Safe Autonomy / Phil Koopman blogspot).
  Read more: Putting Image Manipulations in Context: Robustness Testing for Safe Perception (PDF).

Researchers count on blobs to solve counting problems:
…Segmenting objects may be hard, but placing dots on them may be easy…
Precisely counting objects in scenes, like the number of cars on a road or people walking through a city, is a task that challenges both humans and machines. The researchers train object counters to label each entity with a single dot, rather than the pixel segmentation masks or bounding boxes that are typically used. “We propose a novel loss function that encourages the model to output instance regions such that each region contains a single object instance (i.e. a single point-level annotation),” they explain. This tweak significantly improves performance relative to other baselines based on segmentation and depth. They evaluate their approach on diverse datasets, consisting of images of parking lots, images taken by traffic cameras, images of penguins, PASCAL VOC 2007, another surveillance dataset called MIT Traffic, and crowd counting datasets.
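  Loss sketch: For intuition, here is a heavily simplified PyTorch sketch of what point-level supervision looks like; the paper’s full objective also includes split-level and false-positive terms that this omits, so treat it as flavor rather than a faithful reimplementation.

```python
import torch

# Heavily simplified sketch of point-supervised counting loss: supervise a
# per-pixel object map using only dot annotations (the paper's full loss adds
# split-level and false-positive terms omitted here).

def point_supervision_loss(logits, point_coords):
    """logits: (H, W) per-pixel object scores; point_coords: list of (y, x) dots."""
    probs = torch.sigmoid(logits)
    if not point_coords:
        # Image-level term for empty images: the strongest response should be "off".
        return -torch.log(1 - probs.max() + 1e-8)
    # Image-level term: at least one pixel should be "on" if any dot exists.
    image_term = -torch.log(probs.max() + 1e-8)
    # Point-level term: each annotated dot should sit on an "on" pixel.
    point_probs = torch.stack([probs[y, x] for y, x in point_coords])
    point_term = -torch.log(point_probs + 1e-8).mean()
    return image_term + point_term
```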
  Why it matters: Counting objects is a difficult task for AI systems, and approaches like this indicate other ways to tackle the problem. In the future, the researchers want to design new network architectures that can better distinguish between overlapping objects that have complicated shapes and appearances.
  Read more: Where are the Blobs: Counting by Localization with Point Supervision (Arxiv).

Predicting win states in Dota 2 for better reinforcement learning research:
…System’s predictions outperform a proprietary product’s…
Researchers have trained a system to predict the probability of a given team winning or losing a game of popular online game Dota 2. This is occurring at the same time that researchers across the world try to turn MOBAs into test-beds for reinforcement learning.
  To train their model, the researchers downloaded and parsed replay files from over 100,000 Dota 2 matches. They generate discrete bits of data for each 60 second period of a game, containing a vector which encodes information about the players’ state at that point in time. They then use these slices to train a point-in-time ‘Time Slice Evaluation’ (TSE) model which attempts to predict the outcome of the match from a given point in time. The researchers detect some correlation between the elapsed game time, the ultimate outcome of the match, and the data contained within the slice being studied. Specifically, they find that after the first fifty percent of a game’s elapsed time it becomes fairly easy to train a model to accurately predict win likelihoods, so they train their system on this data.
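  Model sketch: Architecturally this kind of predictor can be very simple; here is an illustrative PyTorch sketch mapping a single time-slice feature vector to a win probability, with the feature dimension and layer sizes as assumptions rather than the paper’s configuration.

```python
import torch
import torch.nn as nn

# Illustrative time-slice win predictor: one 60-second slice's feature vector
# in, P(win) out. Dimensions are assumptions, not the paper's configuration.

class TimeSliceEvaluator(nn.Module):
    def __init__(self, slice_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(slice_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, slice_features):
        return torch.sigmoid(self.net(slice_features)).squeeze(-1)

def train_step(model, optimizer, slices, outcomes):
    # `slices`: (batch, slice_dim) features from the second half of matches;
    # `outcomes`: 1.0 if that team ultimately won the match, else 0.0.
    preds = model(slices)
    loss = nn.functional.binary_cross_entropy(preds, outcomes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```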
  Results: The resulting system can successfully predict the outcome of matches and outperforms ‘Dota Plus’, a proprietary subscription service that provides a win probability graph for every match. (Chinese players apparently call this service the ‘Big Teacher’, which I think is quite delightful!). The researchers’ system is, on average, about three percentage points more accurate than Dota Plus Assistant and starts from a higher base prediction accuracy. One future direction of research is to train on the first 50 percent of elapsed match time, though this would require algorithmic innovation to deal with early-game instability. Another is to implement a recurrent neural network so that instead of making predictions based on a single time-slice, the system can instead make predictions from sequences of slices.
  Why it matters: MOBAs are rapidly becoming a testbed for advanced reinforcement learning approaches, with companies experimenting with games like Dota. Papers like this give us a better idea of the specific work needed to make it easy for researchers to work with these platforms.
  Read more: MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games (Arxiv).

Better robots via fusing deep learning with classical planning:
…Everything old is new again as Berkeley and Chicago researchers staple two different bits of the AI field together…
Causal InfoGAN is a technique for learning what the researchers call “plannable representations of dynamical systems”. Causal InfoGANs work by observing an environment, for instance, a basic maze simulation, and exploring it. They use this exploration to develop a representation of the space, which they then use to compose plans to navigate across it.
  Results: In tests, the researchers show that Causal InfoGAN can develop richer representations of basic mazes, and can use these representations to create plausible trajectories to navigate the space. In another task, they show how the Causal InfoGAN can learn to perform a multi-stage task that requires searching to find a key, then unlocking a door and proceeding through it. They also test their approach on a rope manipulation task, where the Causal InfoGAN needs to plan how to transition a rope from an initial state to a goal state (such as a crude knot, or a different 2D arrangement of the rope on a table).
  Why it matters: The benefit of techniques like this is that they take something that has been developed for many years in classical AI – planning under constraints – and augment it with deep learning-based approaches to make it easier to access information about the environment. “Our results for generating realistic manipulation plans of a rope suggest promising applications in robotics,” they write, “where designing models and controllers for manipulating deformable objects is challenging.”
  Read more: Learning Plannable Representations with Causal InfoGAN (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Amazon’s face recognition software falsely matches US Members of Congress with criminals:
The ACLU have been campaigning against the use of Amazon’s Rekognition software by US law enforcement agencies. For their latest investigation, they used the software to compare photos of all sitting members of Congress against 2,500 mugshots. They found 28 members were falsely matched with mugshots. While the error rate across the sample was 5%, non-white members were disproportionately affected, accounting for 39% of the false matches despite making up a much smaller share of Congress.
  Amazon responds: Matt Wood (Amazon’s general manager of AI) writes in an Amazon blog post that the results are misleading, since the ACLU used the default confidence level of 80%, whereas Amazon recommends a setting of 99% for ‘important’ uses. (There is no suggestion that Amazon requires clients to use a higher threshold). He argues that the bias of the results is a reflection of bias in the mugshot database itself.
  Why this matters: Amazon’s point about the biased sample set is valid, but it is precisely the problem the ACLU and others have pointed out. Mugshot and other criminal databases in the US reflect the racial bias in the US criminal justice system, which interacts disproportionately with people of colour. Without active efforts, tools that use these databases will inherit their biases, and could entrench them. We do not know if these agencies are following Amazon’s recommendation to use a 99% confidence threshold, but it seems unwise to allow these customers to use a considerably lower setting, given the potential harms from misidentification.
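  Code sketch (illustrative): For readers unfamiliar with how the threshold is set, the minimal boto3 sketch below shows a similarity threshold being passed to Rekognition's face comparison call; the image files and AWS credentials are placeholders, and this is only one of several Rekognition operations (the ACLU's test searched against a face collection rather than comparing single image pairs).
```python
# Minimal sketch of passing a confidence/similarity threshold to Rekognition via boto3.
# Requires valid AWS credentials; "probe_photo.jpg" and "mugshot.jpg" are placeholders.
import boto3

client = boto3.client("rekognition", region_name="us-east-1")

with open("probe_photo.jpg", "rb") as probe, open("mugshot.jpg", "rb") as target:
    response = client.compare_faces(
        SourceImage={"Bytes": probe.read()},
        TargetImage={"Bytes": target.read()},
        SimilarityThreshold=99,  # only return matches at >= 99% similarity (default is 80)
    )

for match in response["FaceMatches"]:
    print("match similarity:", match["Similarity"])
```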
  Read more: Amazon’s Face Recognition Falsely Matched 28 Members of Congress With Mugshots (ACLU).
  Read more: Amazon’s response (AWS blog).

Chinese company exports surveillance tools:
Chinese company CloudWalk Technology has entered a partnership with the Zimbabwean government to provide mass face recognition, in a country with a long history of human rights abuses. The most interesting feature of the contract is the agreement that CloudWalk will have access to a database intended to cover millions of Zimbabweans. Zimbabwe does not have legislation protecting biometric data, leaving citizens with few options to prevent either the surveillance program being implemented, or the export of their data. This large dataset may have significant value for CloudWalk in training the company’s systems on a broader racial mix.
  Why this matters: This story combines two major ethical concerns with AI-enabled surveillance. The world’s authoritarian regimes represent a huge potential market for these technologies, which could increase control over their citizens and have disastrous consequences for human rights. At the same time, as data governance in developed countries becomes more robust, firms are increasingly “offshoring” their activities to countries with lax regulation. This partnership is a worrying interaction of these issues, with an authoritarian government buying surveillance technology, and paying for it with their citizens’ data.
  Read more: Beijing’s Big Brother Tech Needs African Faces (Foreign Policy)

UK looks to strengthen monitoring of foreign investment in technology:
The UK has announced proposals to strengthen the government’s ability to review foreign takeovers that pose national security risks. While the measures cover all sectors, the government identifies “advanced technologies” and “military and dual-use technologies” as core focus areas, suggesting that AI will be high on the agenda. US lawmakers are currently considering proposals to strengthen CFIUS, the US government’s equivalent tool for controlling foreign investment.
  Why this matters: As governments realize the importance of retaining control over advanced technologies, it will be interesting to see how broad a scope the UK government takes, and whether these measures could become a means of blocking a wide range of investments in technology. It is noteworthy that they take a fairly wide definition of national security risks, not restricted to military or intelligence considerations, and including risks from hostile parties gaining strategic leverage over the UK.
  Read more: National Security and Investment White paper.

FLI grants add $2m funding for research on robust and beneficial AI:
The Future of Life Institute has announced $2m in funding for research towards ensuring that artificial general intelligence (AGI) is beneficial for humanity. This is the second round of grants from Elon Musk’s $10m donation in 2015. The funding is more focussed on AI strategy and technical AI safety than the previous round, which included a diverse range of projects.
  Why this matters: AGI could be either humanity’s greatest invention, or its most destructive. The FLI funding will further support a community of researchers trying to ensure positive outcomes from AI. While the grant is substantial, it is worth remembering that the funding for this sort of research remains a minuscule proportion of AI investment more broadly.
  Read more: $2 Million to Keep AGI Beneficial and Robust (FLI)
  Read more: Research Priorities for Robust and Beneficial Artificial Intelligence (FLI)

Lost in translation:
Last week I summarized Germany’s AI report using Google Translate. A reader kindly pointed out that Charlotte Stix, Policy Officer at the Leverhulme Centre for the Future of Intelligence, has translated the document in full: Cornerstones for the German AI Strategy. (Another researcher doing prolific translation work is Jeffrey Ding, from the Future of Humanity Institute, whose ChinAI newsletter is a great resource to keep up-to-speed with AI in China.)

OpenAI Bits and Pieces:

OpenAI Scholars 2018:
Learn more about the first cohort of OpenAI Scholars and get a sense of what they’re working on.
  Read more: Scholars Class 2018 (OpenAI Blog).

Tech Tales:

The Sound of Thinking

It is said that many hundreds of years ago we almost went to the stars. Many people don’t believe this now, perhaps because it is too painful. But we have enough historical records preserved to know it happened: for a time, we had that chance. We were distracted, though. The world was heating up. Systems built on top of other systems over previous centuries constrained our thinking. As things got hotter and more chaotic making spaceships became a more and more expensive proposition. Some rich people tried to colonize the moon but lacked the technology for it to be sustainable. In school we use ancient telescopes to study the wreckage of the base. We tell many stories about what went on in it, for we have no records. The moonbase, like other things around us in this world, is a relic from a time when we were capable of greater things.

It is the AIs that are perhaps the strangest things. These systems were built towards the end of what we refer to as the ‘technological high point’. Historical records show that they performed many great feats in their time – some machines helped the blind see, and others solved outstanding riddles in physics and mathematics and the other sciences. Other systems were used for war and surveillance, to great effect. But some systems – the longest lasting ones – simply watch the world. There are perhaps five of them left worldwide, and some of us guard them, though we are unsure of their purpose.

The AI I guard sits at the center of an ancient forest. Once a year it emits a radio broadcast that beams data out to all that can listen. Much of the data is mysterious to us but some of it is helpful – it knows the number of birds in the forest.

The AI is housed in a large building which, if provided with a steady supply of water, is able to generate power sufficient to let the machine function. When components break, small doors in the side of the AI’s building open, revealing components sealed in vacuum bags, marked with directions in every possible language about how to replace them. We speak different languages now and one day it will be my job to show the person who comes after me how to replace different components. At current failure rates, I expect the AI to survive for several hundred years.

My AI sings, sometimes. After certain long, wet days, when the forest air is sweet, the machine will begin to make sounds – something like a combination of wind, stringed instruments, and the staccato calls of birds. I believe the machine is singing. After it starts to make these sounds the birds of the forest respond – they start to sing, and as they sing they mirror the rhythms of the computer with their own chorus.

I believe that the AI is learning how to communicate with the birds, perhaps having observed that people, angry and increasingly hopeless, are sliding down the technological gravity well and, given a few hundred thousand years, may evolve into something else entirely. Birds, by comparison, have been around for far longer than humans and never had the arrogance to try and escape their sphere. Perhaps the AI thinks they are a more promising type of life-form to communicate with: it sings, and they listen. I worry that when the AIs sang for us, all those years ago, we failed to listen.

Things that inspired this story: Writings of the Gawain Poet about ancient ruins found amid dark age England, J G Ballard, the Long Now Foundation, flora&fauna management.

Import AI: #104: Using AirBNB to generate data for robots; Google trains AI to beat humans at lip-reading; and NIH releases massive ‘DeepLesion’ CT dataset

Rosie the Robot takes a step closer with new CMU robotics research:
What’s the best way to gather a new robotics research dataset – AirBNB?!…
Carnegie Mellon researchers have done the robotics research equivalent of ‘having their cake and eating it too’ – they have created a new dataset to evaluate generalization within robotics, and have successfully built low-cost robots which show meaningful performance on the dataset. The motivation for the research is that most robotics datasets are specific to highly-controlled lab environments; instead, it’s worth generating and gathering data from more real-world locations (in this case, homes rented on AirBNB), seeing if it’s possible to develop a system that can learn to grasp objects within these datasets, and seeing if the use of these datasets improves generalization relative to other techniques.
  How it works: The approach has three key components: a Grasp Prediction Network (GPN) which takes in pixel imagery and tries to predict the correct grasp to take (and which is fine-tuned from a pretrained ResNet-18 model); a Noise Modelling Network (NMN) which tries to estimate the latent noise based on the image of the scene and information from the robot; and a marginalization layer which helps combine the two data streams to predict the best grasp to use.
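  Code sketch (illustrative): The sketch below shows only the grasp-prediction part of that pipeline: a pretrained ResNet-18 backbone with a small head that scores a set of discretized grasp angles for an image patch. The head design, number of angle bins, and input sizes are assumptions for illustration, and the Noise Modelling Network and marginalization layer are omitted.
```python
# Hedged sketch of a grasp-prediction network fine-tuned from a pretrained ResNet-18.
import torch
import torch.nn as nn
from torchvision import models

class GraspPredictionNet(nn.Module):
    def __init__(self, num_angle_bins: int = 18):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet
        backbone.fc = nn.Identity()                        # keep the 512-d pooled features
        self.backbone = backbone
        self.angle_head = nn.Linear(512, num_angle_bins)   # success score per discretized grasp angle

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.angle_head(self.backbone(patches)))

net = GraspPredictionNet()
scores = net(torch.randn(4, 3, 224, 224))   # 4 candidate object patches (e.g. from detector boxes)
best_angle_bin = scores.argmax(dim=1)       # pick the most promising grasp angle per patch
print(scores.shape, best_angle_bin)
```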
  The robot: They use a Dobot Magician robotic arm with five degrees of freedom, customized with a two-axis wrist with an electric gripper, and mounted on a Kobuki mobile base. For sensing, they equip it with an Intel R200 RGB camera with a pan-tilt attachment positioned 1m above the ground. The robot’s onboard processor is a laptop with an i5-8250U CPU and 8GB of RAM. Each of these robots costs about $3,000 – far less than the $20k+ prices for most other robots.
  Data gathering: To gather data for the robots the researchers used six different properties from AirBNB. They then deployed the robot in these homes, used a low-cost ‘YOLO’ model to generate bounding boxes around objects near the robot, then let the robot’s GPN and NMN work together to help it predict how to grasp objects. They collect about 28,000 grasps in this manner.
  Results: The researchers try to evaluate their new dataset (which they call Home-LCA) as well as their new ‘Robust-Grasp’ two-part GPN & NMN network architecture. First, they examine the test accuracy of their Robust-Grasp network trained on the Home-LCA dataset and applied to other home environments, as well as two datasets which have been collected in traditional lab settings (Lab-Baxter and Lab-LCA). The results here are very encouraging as their approach seems to generalize better to the lab datasets than other approaches, suggesting that the Home-LCA dataset is rich enough to create policies which can generalize somewhat.
  They also test their approach on deployed physical environments in unseen home environments (three novel AirBNBs). The results show that Home-LCA does substantially better than Lab-derived datasets, showing performance of around 60% accuracy, compared to between 20% and 30% for other approaches – convincing results.
  Why it matters: Most robotics research suffers from one of two things: 1) the robot is being trained and tested entirely in simulation, so it’s hard to trust the results, or 2) the robot is being evaluated on such a constricted task that it’s hard to get a sense for whether algorithmic progress leading to improved task performance will generalize to other tasks. This paper neatly deals with both of those problems by situating the task and robot in reality, collecting real data, and also evaluating generalization. It also provides further evidence that robot component costs are falling while network performance is improving sufficiently for academic researchers to conduct large-scale real-world robotic trials and development, which will no doubt further accelerate progress in this domain.
  Read more: Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias (Arxiv).

Learning to navigate over a kilometer of paths, with generalization:
…Bonus: Dataset augmentation techniques and experimental methodology increase confidence in result…
QUT and DeepMind researchers have successfully trained a robot to learn to navigate over two kilometers of real-world paths connected up to one another by 2,099 distinct nodes. The approach shows that it’s possible to learn sufficiently robust policies in simulation to be subsequently transferred to the real world, and the researchers validate their system by testing it on real world data.
  The method: “We propose to train a graph-navigation agent on data obtained from a single coverage traversal of its operational environment, and deploy the learned policy in a continuous environment on the real robot,” the researchers write. They create a map of a given location, framed as a graph with points and connections between them, gathering 360-degree images from an omnidirectional camera to populate each point on the graph and, in addition, gathering the data lying between each point. “This amounts to approximately 30 extra viewpoints per discrete location, given our 15-Hz camera on a robot moving at 0.5 meters per second,” they write. They then use this data to augment the main navigation task. They also introduce techniques to randomize – in a disciplined manner – the brightness of gathered images, which lets them create more synthetic data and guard against robots trained with the system overfitting to specific lighting conditions. They then use curriculum learning to train a simulated agent with A3C to navigate between successively farther-apart points of the (simulated) graph. These agents use image recognition systems pre-trained on the Places365 dataset and finetuned on the gathered data.
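  Code sketch (illustrative): The snippet below is a minimal take on that kind of disciplined brightness randomization: pixel intensities are scaled by a factor drawn from a bounded range so the synthetic variants stay plausible. The range, the global (rather than local) scaling, and the toy input are assumptions; the paper's exact augmentation scheme may differ.
```python
# A minimal brightness-randomization augmentation (illustrative, not the paper's exact scheme).
import numpy as np

def randomize_brightness(image: np.ndarray, low: float = 0.6, high: float = 1.4, rng=None) -> np.ndarray:
    """Return a copy of `image` (uint8, HxWxC) with globally rescaled brightness."""
    rng = rng or np.random.default_rng()
    factor = rng.uniform(low, high)                      # bounded so images stay plausible
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)      # stand-in camera frame
augmented = [randomize_brightness(frame, rng=rng) for _ in range(30)]  # many synthetic variants per viewpoint
print(len(augmented), augmented[0].shape)
```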
  Results: The researchers test their system by deploying it on a real robot (a Pioneer 3DX) and asking it to navigate to specific areas of the campus. There are a couple of reasons to really like this evaluation approach: 1) they’re testing it in reality rather than a simulator, so the results are more trustworthy, and 2) they test on the real robot three weeks after collecting the initial data, allowing for significant intermediary changes in things like the angle of the sun at given times of day, the density of people, placement of furniture, and other things that typically confound robots. They test their system against an ‘oracle’ (aka, perfect) route, as well as what was learned during training in the simulator. The results show that their technique successfully generalizes to reality, navigating successfully to defined locations on ten out of eleven tries, but at a significant cost: on average, the routes it comes up with in reality are on the order of 2.42X more complex than optimal routes.
  Why it matters: Robots are likely one of the huge markets that will be further expanded and influenced by continued development of AI technology. What this result indicates is that existing, basic algorithms (like A3C), combined with well-understood data collection techniques, are already sufficiently powerful to let us develop proof-of-concept robot demonstrations. The next stage will be learning to traverse far larger areas while reducing the ‘reality penalty’ seen here of selected routes not being as efficient as optimal ones.
  Read more: Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal (Arxiv).
  Watch videos: Deployable Navigation Policies.

Why better measurements can lead to better robot research:
…New tasks, best practices, and datasets to evaluate smart robot agents…
An interdisciplinary team of researchers from universities and companies has written about the many problems inherent to contemporary robotic agent research and have issued a set of recommendations (along with the release of some specific testing environments) meant to bring greater standardization to robotics research. This matters because standardization on certain tasks, benchmarks, and techniques has led to significant progress in other areas of AI research – standardization on ‘ImageNet’ helped generate enough research to show the viability of deep learning architectures for hard supervised learning problems, and more recently OpenAI’s ‘OpenAI Gym’ helped to standardize some of the experimental techniques for reinforcement learning research. But robotics has remained stubbornly idiosyncratic, even when researchers report results in simulators. “Convergence to common task definitions and evaluation protocols catalyzed dramatic progress in computer vision. We hope that the presented recommendations will contribute to similar momentum in navigation research,” the authors write.
  A taxonomy of task-types: Navigation tasks can be grouped into three basic categories: PointGoal (navigate to a specific location); ObjectGoal (navigate to an object of a specific category, eg a ‘refrigerator’); and AreaGoal (navigate to an area of a specific category, eg a kitchen). The first category requires coordinates while the latter two require the robot to assign labels to the world around it.
  Specific tasks can be further distinguished by analyzing the extent of the agent’s exposure to the test environment, prior to evaluation. These different levels of exposure can roughly be characterized as: No prior exploration; pre-recorded prior exploration (eg, supplied with a trajectory through the space); and time-limited exploration by the agent (explores for a certain distance before being tested on the evaluation task).
  Evaluation: Add a ‘DONE’ signal which the agent emits when it completes an episode – this lets the agent characterize runs where it believes it has completed the task, giving scientists an additional bit of information to use when evaluating what the agent did to achieve that task. This differs from other methodologies which simply end the evaluation episode when the agent reaches the goal, and which therefore don’t require the agent to indicate that it knows it has finished the task.
  Avoid using Euclidean measurements to determine the proximity of the goal, as this might reward the agent for placing itself near the object despite being separated from it by a wall. Instead, scientists might consider measuring the shortest-path distance in the environment to the goal, and evaluating on that.
  Success weighted by (normalized inverse) Path Length (SPL): Assess performance using the agent’s ‘DONE’ signal and path length for each test episode, then calculate the average score for how close-to-optimal the agent’s paths were across all episodes (so, if an agent was successful on 8 runs out of ten, and each of the successful runs was 50% greater than the optimal path distance, its SPL for the full evaluation would be roughly 0.53). “Note that SPL is a rather stringent measure. When evaluation is conducted in reasonably complex environments that have not been seen before, we expect an SPL of 0.5 to be a good level of navigation performance,” the researchers explain.
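  Code sketch: Concretely, SPL averages, over all test episodes, the success indicator weighted by the ratio of the shortest-path length to the (at least as long) path the agent actually took. The sketch below computes it for the worked example above; the function and variable names are my own.
```python
# SPL = (1/N) * sum_i  S_i * l_i / max(p_i, l_i),
# where S_i marks success, l_i is the shortest-path length and p_i the agent's path length.
def spl(episodes: list) -> float:
    """episodes: iterable of (success, shortest_path_length, agent_path_length) tuples."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)  # failed episodes contribute zero
    return total / len(episodes)

# Worked example from the text: 8 successes out of 10, each successful path 50% longer
# than optimal, plus 2 failures.
runs = [(True, 10.0, 15.0)] * 8 + [(False, 10.0, 20.0)] * 2
print(round(spl(runs), 2))  # -> 0.53
```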
  Simulators: Use continuous state spaces, as that better approximates the real world conditions agents will be deployed into. Also, simulators should standardize reporting distances as SI Units, eg “Distance 1 in a simulator should correspond to 1 meter”.
  Publish full details (and customizations) of simulators, and ideally release the code as open source. This will make it easier to replicate different simulated tasks with a high level of accuracy (whereas comparing robotics results on different physical setups tends to introduce a huge amount of noise, making disciplined measurement difficult). “This customizability comes with responsibility”, they note.
  Standard Scenarios: The authors have also published a set of “standard scenarios” by curating specific data and challenges from contemporary environment datasets SUNCG, AI2-THOR, Matterport3D, and Gibson. These tasks closely follow the recommendations made elsewhere in the report and, if adopted, will bring more standardization to robotic research.
  Read more: On Evaluation of Embodied Navigation Agents (Arxiv).
  Read more: Navigation Benchmark Scenarios (GitHub).

I can see what you’re saying – DeepMind and Google create record-breaking lip-reading system:
…Lipreading network has potential (discussed) applications for people with speech impairment and potential (undiscussed) applications for surveillance…
DeepMind and Google researchers have created a lipreading speech recognition system with a lower word error rate than professional humans, and which is able to use a far larger vocabulary (127,055 terms versus 17,428 terms) than other approaches. To develop this system they created a new speech recognition dataset consisting of 3,886 hours of video of faces speaking, paired with the phoneme sequences being said.
  How it works: The system relies on “Vision to Phoneme (V2P)”, a network trained to produce a sequence of phoneme distributions given a sequence of video frames. They also implement V2P-Sync, a model that verifies the audio and video channels are aligned (and therefore prevents the creation of bad data, which would lead to poor model performance). V2P uses a 3D convolutional model to extract features from a given video clip and aggregate them over time via a temporal module. They implement their system as a very large model which is trained in a distributed manner.
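  Code sketch (toy): The sketch below shows the overall shape of such a model at a tiny scale: a 3D convolution extracts spatio-temporal features from video frames, a recurrent module aggregates them over time, and a linear layer emits a phoneme distribution per timestep. All sizes are invented and far smaller than V2P, and the actual training setup (loss, decoding, distributed training) is omitted.
```python
# Toy illustration of a 3D-conv + temporal-module phoneme predictor (not V2P itself).
import torch
import torch.nn as nn

class TinyLipReader(nn.Module):
    def __init__(self, num_phonemes: int = 40):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),          # keep the time axis, shrink space
        )
        self.temporal = nn.LSTM(32 * 4 * 4, 128, batch_first=True)
        self.phoneme_head = nn.Linear(128, num_phonemes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, time, height, width)
        feats = self.conv3d(video)                       # (B, 32, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.temporal(feats)
        return self.phoneme_head(out).log_softmax(-1)    # per-frame phoneme log-probabilities

model = TinyLipReader()
logits = model(torch.randn(1, 3, 16, 64, 64))  # 16 frames of 64x64 RGB
print(logits.shape)  # (1, 16, 40)
```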
  Results: The researchers tested their approach on a held-out test-set containing 37 minutes of footage, across 63,000 video frames and 7100 words. They found that their system significantly outperforms people. “This entire lipreading system results in an unprecedented WER of 40.9% as measured on a held-out set from our dataset,” they write. “In comparison, professional lipreaders achieve either 86.4% or 92.9% WER on the same dataset, depending on the amount of context given.”
  Motivation: The researchers say the motivation for the work is to provide help for people with speech impairments. They don’t discuss the obvious surveillance implications of this research anywhere in the paper, which seems like a missed opportunity.
  Why it matters: This paper is another example of how, with deep learning techniques, if you can access enough data and compute then many problems become trivial – even ones that seem to require a lot of understanding and ‘human context’, like lipreading. Another implication is that many tasks we suspect are not well suited to AI may in fact be more amenable to it than we assume.
  Read more: Large-Scale Visual Speech Recognition (Arxiv).

Researchers fuse hyperparameter search with neural architecture search:
…Joint optimization lets them explore architectural choices and hyperparameters at the same time…
German researchers have shown how researchers can jointly optimize the hyperparameters of a model while searching through different architectures. This takes one established thing within machine learning (finding the right combination of hyperparameters to maximize performance against cost) and combines it with a newer area that has received lots of recent interest (using reinforcement learning and other approaches to optimize the architecture of the neural network, as well as its hyperparameters). “We argue that most NAS search spaces can be written as hyperparameter optimization search spaces (using the standard concepts of categorical and conditional hyperparameters),” they write.
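  Code sketch (illustrative): To make that quoted claim concrete, the sketch below samples from a joint search space where architectural options are expressed as categorical hyperparameters alongside ordinary continuous training hyperparameters, so one random-search loop covers both. The specific choices, ranges, and names are assumptions, not the configuration used in the paper.
```python
# Illustrative joint architecture + hyperparameter search space sampled by random search.
import random
import math

SEARCH_SPACE = {
    # architecture choices (categorical)
    "num_residual_blocks": [2, 3, 4],
    "widening_factor": [1, 2, 4],
    "activation": ["relu", "swish", "elu"],
    "use_squeeze_excite": [True, False],
    # training hyperparameters (continuous ranges, sampled log-uniformly below)
    "learning_rate": (1e-4, 1e-1),
    "weight_decay": (1e-6, 1e-2),
}

def sample_configuration(rng: random.Random) -> dict:
    config = {}
    for name, choices in SEARCH_SPACE.items():
        if isinstance(choices, list):
            config[name] = rng.choice(choices)          # categorical choice
        else:
            low, high = choices                          # log-uniform continuous sample
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return config

rng = random.Random(0)
candidates = [sample_configuration(rng) for _ in range(4)]  # each would get a budgeted training run
for c in candidates:
    print(c)
```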
  Results: They test their approach by training a multi-branch ResNet architecture on CIFAR-10 while exploring a combination of ten architectural choices and seven hyperparameter choices. They limited training time to a maximum of three hours for each sampled configuration and performed 256 of these full-length runs (amounting to about 32 GPU-days of constant training). They discover that the relationship between hyperparameters, architecture choices, and trained model performance is more subtle than anticipated, indicating that there’s value in optimizing these jointly.
  Why it matters: As computers get faster it’s going to be increasingly sensible to offload as much of the design and optimization of a given neural network architecture as possible to the computer – further development in fields of automatic model optimization will spur progress here.
  Read more: Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search (Arxiv).

NIH releases ‘DeepLesion’ dataset to aid medical researchers:
…Bonus: the data is available immediately online, no sign-up required…
The National Institutes of Health has released ‘DeepLesion’, a set of 32,000 CT images with annotated lesions, giving medical machine learning researchers a significant data resource to use to develop AI systems. The images are from 4,400 unique individuals and have been heavily annotated with bookmarks around the lesions.
  The NIH says it hopes researchers will use the dataset to help them “develop a universal lesion detector that will help radiologists find all types of lesions. It may open the possibility to serve as an initial screening tool and send its detection results to other specialist systems trained on certain types of lesions”.
  Why it matters: Data is critically important for many applied AI applications and, at least in the realm of medical data, simulating additional data is fraught with dangers, so the value of primary data taken from human sources is very high. Resources like those released by the NIH can help scientists experiment with more data and thereby further develop their AI techniques.
  Read more: NIH Clinical Center releases dataset of 32,000 CT images (NIH).
  Get the data here: NIH Clinical Center (via storage provider Box).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

AI leaders sign pledge opposing autonomous weapons:
The Future of Life Institute has issued a statement against lethal autonomous weapons (LAWs) which has been signed by 176 organizations and 2,515 individuals, including Elon Musk, Stuart Russell, Max Tegmark, and the cofounders of DeepMind.
  What’s wrong with LAWs: The letter says humans should never delegate the decision to use lethal force to machines, because these weapons remove the “risk, attributability, and difficulty” of taking lives, and that this makes them potentially destabilizing, and powerful tools of oppression.
  (Self-)Regulation: The international community does not yet possess the governance systems to prevent a dangerous arms race. The letter asks governments to create “strong international norms, regulations and laws against LAWs.” This seems deliberately timed ahead of the upcoming meeting of the UN CCW to discuss the issue. The signatories pledge to self-regulate, promising to “neither participate in nor support the development, manufacture, trade, or use of LAWs.”
  Why it matters: Whether a ban on these weapons is feasible, or desirable, remains unclear. Nonetheless, the increasing trend of AI practitioners mobilizing on ethical and political issues will have a significant influence on how AI will be developed. If efforts like this lead to substantive policy changes they could also serve as a useful model to study as researchers try to achieve political ends in other aspects of AI research and development.
  Read more: Lethal Autonomous Weapons Pledge (Future of Life Institute).

US military’s AI plans take shape:
The DoD has announced that they will be releasing a comprehensive AI strategy ‘within weeks’. This follows a number of piecemeal announcements, which have included the establishment earlier this month of the Joint Artificial Intelligence Center (JAIC), which will oversee all large AI programs in US defence and intelligence and forge partnerships with industry and academia.
  Why it matters: This is just the latest reminder that militaries already see the potential in AI, and are ramping up investment. Any AI arms race between countries carries substantial risks, particularly if parties prioritize pace of development over building safe, robust systems (see below). Whether or not the creation of a military AI strategy will prompt the US to finally release a broader national strategy remains to be seen.
  Read more: Pentagon to Publish Artificial Intelligence Strategy ‘Within Weeks’.
  Read more: DoD memo announcing formation of the JAIC.

Germany releases framework for their national AI strategy:
The German government has released a prelude to their national AI strategy, which will be announced at the end of November. (NB – the document has not been released in English, so I have relied on Google Translate)
  Broad ambitions: The government presents a long list of goals for their strategy. These include fostering a strong domestic industry and research sector, developing and promoting ethical standards and new regulatory frameworks, and encouraging uptake in other industries.
  Some specific proposals:
– A Data Ethics Committee to address the ethical and governance issues arising from AI.
– Multi-national research centers with France and other EU countries.
– The development of international organizations to manage labor displacement.
– Ensuring that Germany and Europe lead international efforts towards common technical standards.
– Public dialogue on the impacts of AI.
   Read more: Cornerstones of the German AI Strategy (German).

Solving the AI race:
GoodAI, a European AI research organization, held a competition for ideas on how to tackle the problems associated with races in AI development. Below are summaries of two of the winning papers.
  A formal theory of AI coordination: This paper approaches the problem from an international relations perspective. The researchers use game theory to model 2-player AI races, where AI R&D is costly, the outcome of the race is uncertain, and players can either cooperate or defect. They consider four models the race could plausibly take, determined by the coordination regime in place, and suggest which models are the ‘safest’ in terms of players being incentivized against developing risky AI. They suggest policies to promote cooperation within different games, and to shift race dynamics into more safety-conducive set-ups.
  Solving the AI race: This paper gives a thorough overview of how race dynamics might emerge, between corporations as well as militaries, and spells out a comprehensive list of the negative consequences of such a situation. The paper presents three mitigation strategies with associated policy recommendations: (1) encouraging and enforcing cooperation between actors; (2) providing incentives for transparency and disclosure; (3) establishing AI regulation agencies.
  Why it matters: There are good reasons to be worried about race dynamics in AI. Competing parties could be incentivized to prioritize pace of development over safety, with potentially disastrous consequences. Equally, if advanced AI is developed in an adversarial context, this could make it less likely that its benefits are fairly distributed amongst humanity. More worryingly, it is hard to see how race dynamics can be avoided given the ‘size of the prize’ in developing advanced AI. Given this, researching strategies for managing races and enforcing cooperation should be a priority.
  Read more: General AI Challenge Winners (GoodAI).

OpenAI Bits & Pieces:

OpenAI Five Benchmark:
We’ve removed many of the restrictions on our 5v5 bots and will be playing a match in a couple of weeks. Check out the blog for details about the restrictions we’ve removed and the upcoming match.
  Read more: OpenAI Five Benchmark (OpenAI blog).

AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him:
Here’s a lengthy interview with Mike Cook, a games AI researcher, who gives some of his thoughts on OpenAI Five.
  Read more: AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him (Rock Paper Shotgun).

Tech Tales:

Drone Buyer

So believe it or not the first regulations came in because people liked the drones too much – these delivery companies started servicing areas and, just like in online games, there were always some properties in a given region that massively outspent others by several orders of magnitude. As in any other arena of life where these fountains of money crop up, workers would nickname the people in these properties ‘whales’. The whales did what they did best and spent. But problems emerged as companies continued expanding delivery hours and the whales continued to spend and spend – suddenly, an area that the company’s machine learning algorithm had zoned for 50 deliveries a day (and squared away with planning officials) had an ultra-customer to contend with. And these customers would order things in the middle of the night. Beans would land on lawns at 3am. Sex toys at 4am. Box sets of obscure TV shows would plunk down at 6am. Breakfast burritos would whizz in at 11am. And so on. So the complaints started piling up and that led to some of the “anti-social drone” legislation, which is why most cities now specify delivery windows for suburban areas (and ignore the protests of the companies who point to their record-breakingly-quiet new drones, or other innovations).

Things that inspired this story: Drones, Amazon Prime, everything-as-a-service, online games.

Import AI: #103: Testing brain-like alternatives to backpropagation, why imagining goals can lead to better robots, and why navigating cities is a useful research avenue for AI

Backpropagation may not be brain-like, but at least it works:
…Researchers test more brain-like approaches to learning systems, discover that backpropagation is hard to beat…
Backpropagation is one of the fundamental tools of modern deep learning – it’s one of the key mechanisms for propagating and updating information through networks during training. Unfortunately, there’s relatively little evidence available that our own human brains perform a process analogous to backpropagation (a question Geoff Hinton has struggled with for several years in talks like ‘Can the brain do back-propagation?’). That has long concerned some researchers, who worry that though we’re seeing significant gains from developing things based on backpropagation, we may need to investigate other approaches in the future. Now, researchers with Google Brain and the University of Toronto have performed an empirical analysis of a range of fundamental learning algorithms, testing approaches based on backpropagation against ones using target propagation and other variants.
  Motivation: The idea behind this research is that “there is a need for behavioural realism, in addition to physiological realism, when gathering evidence to assess the overall biological realism of a learning algorithm. Given that human beings are able to learn complex tasks that bear little relationship to their evolution, it would appear that the brain possesses a powerful, general-purpose learning algorithm for shaping behavior”.
  Results: The researchers “find that none of the tested algorithms are capable of effectively scaling up to training large networks on ImageNet”, though they record some success with MNIST and CIFAR. “Out-of-the-box application of this class of algorithms does not provide a straightforward solution to real data on even moderately large networks,” they write.
   Why it matters: Given that we know how limited and simplified our neural network systems are, it seems intellectually honest to test and ablate algorithms, particularly by comparing well-studied ‘mainstream’ approaches like backpropagation with more theoretically-grounded but less-developed algorithms from other parts of the literature.
  Read more: Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Arxiv).

AI and Silent Bugs:
…Half-decade old bug in ‘Aliens’ game found responsible for poor performance…
One of the more irritating things about developing AI systems is that when you mis-program AI it tends to fail silently – for instance, in OpenAI’s Dota project we saw performance dramatically increase simply after fixing non-breaking bugs. Another good example of this phenomenon has turned up in news about Aliens: Colonial Marines, a poorly reviewed half-decade-old game. It turns out some of the reasons for those poor reviews were likely due to a bug – it has since emerged that the original game mis-named one variable, which led to entire chunks of the game’s enemy AI systems not functioning.
  Read more: A years-old, one-letter typo led to Aliens: Colonial Marines’ weird AI (Ars Technica).

Berkeley researchers teach machines to dream imaginary goals and solutions for better RL:
…If you want to change the world, first imagine yourself changing it…
Berkeley researchers have developed a way for machines to develop richer representations of the world around them and use these representations to solve tasks. The method they use to achieve this is a technique called ‘reinforcement learning with imagined goals’ (RIG). RIG works like this: an AI system interacts with an environment, data from these observations is used to train (and finetune) a variational autoencoder (VAE) latent variable model, and the AI system is then trained to solve different imagined tasks using the representation learned by the VAE. This type of approach is becoming increasingly popular as AI researchers try to increase the capabilities of algorithms by getting them to use and learn from more data.
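  Code sketch (illustrative): The core trick can be sketched as: encode observations into a latent space, sample an “imagined” goal from the VAE prior, and reward the policy for reducing the distance between the latent encoding of the current observation and that goal. The encoder below is an untrained stand-in and all sizes and names are assumptions, so this only illustrates the shape of the reward, not the full RIG algorithm.
```python
# Hedged sketch of the "imagined goals" reward: negative distance in a VAE latent space.
import torch
import torch.nn as nn

LATENT_DIM = 16

class Encoder(nn.Module):
    """Stand-in for a trained VAE encoder: observation -> (mean, log-variance)."""
    def __init__(self, obs_dim: int = 64):
        super().__init__()
        self.net = nn.Linear(obs_dim, 2 * LATENT_DIM)

    def forward(self, obs):
        mean, logvar = self.net(obs).chunk(2, dim=-1)
        return mean, logvar

def imagine_goal(batch_size: int = 1) -> torch.Tensor:
    """Sample an 'imagined' goal from the VAE prior N(0, I)."""
    return torch.randn(batch_size, LATENT_DIM)

def latent_reward(encoder: Encoder, obs: torch.Tensor, goal_latent: torch.Tensor) -> torch.Tensor:
    mean, _ = encoder(obs)
    return -torch.norm(mean - goal_latent, dim=-1)  # closer in latent space = higher reward

enc = Encoder()
goal = imagine_goal()
print(latent_reward(enc, torch.randn(1, 64), goal))
```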
  Results: Their approach does well at tasks requiring reaching objects and pushing objects to a goal, beating baselines including algorithms like Hindsight Experience Replay (HER).
  Why it matters: After spending several years training algorithms to master an environment, we’re now trying to train algorithms that can represent their environment, then use that representation as an input to the algorithm to help it solve a new task. This is part of a general push toward greater representative capacity within trained models.
  Read more: Visual Reinforcement Learning with Imagined Goals (Arxiv).

Facebook thinks the path to smarter AI involves guiding other AIs through cities:
…’Talk The Walk’ task challenges AIs to navigate each other through cities, working as a team…
Have you ever tried giving directions to someone over the phone? It can be quite difficult, and usually involves a series of dialogues between you and the person as you try to figure out where in the city they are in relation to where they need to get to. Now, researchers with Facebook and the Montreal Institute for Learning Algorithms (MILA) have set out to develop and test AIs that can solve this task, so as to further improve the generalization capabilities of AI agents. “For artificial agents to solve this challenging problem, some fundamental architecture designs are missing,” the researchers say.
  The challenge: The new “Talk The Walk” task frames the problem as a discussion between a ‘guide’ and a ‘tourist’ agent. The guide agent has access to a map of the city area that the tourist is in, as well as a location the tourist wants to get to, and the tourist has access to an annotated image of their current location along with the ability to turn left, turn right, or move forward.
  The dataset: The researchers created the testing environment by obtaining 360-degree photographic views of neighborhoods in New York City, including Hell’s Kitchen, the East Village, Williamsburg, the Financial District, and the Upper East Side. They then annotated each image of each corner of each street intersection with a set of landmarks drawn from the following categories: bar, bank, shop, coffee shop, theater, playfield, hotel, subway, and restaurant. They then had more than six hundred users of Mechanical Turk play a human version of the game, generating 10,000 successful dialogues from which AI systems can be trained (with over 2,000 successful dialogues available for each neighborhood of New York the researchers gathered data for).
  Results: The researchers tested their developed systems at how well they can localize themselves – that is, develop a notion of where they are in the city. The results are encouraging, with localization models developed by the researchers achieving a higher localization score than humans. (Though humans take about half the number of steps to effectively localize themselves, showing that human sample efficiency remains substantially better than that of machines.)
  Why it matters: Following a half decade of successful development and commercialization of basic AI capabilities like image and audio processing, researchers are trying to come up with the next major tasks and datasets they can use to test contemporary research algorithms and develop them further. Evaluation methods like those devised here can help us develop AI systems which need to interact with larger amounts of real world data, potentially making it easier to evaluate how ‘intelligent’ these systems are becoming, as they are being tested directly on problems that humans solve every day and have good intuitions and evidence about the difficulty of. Though it’s worth noting that the current version of the task as solved by Facebook is fairly limited, as it involves a setting with simple intersections (predominantly just four-way straight-road intersections), and the agents aren’t being tested on very large areas nor required to navigate particularly long distances.
  Read more: Talk the Walk: Navigating New York City through Grounded Dialogue (Arxiv).

Microsoft calls for government-led regulation of artificial intelligence technology:
…Company’s chief legal officer Brad Smith says government should study and regulate the technology…
Microsoft says the US government should appoint an independent commission to investigate the uses and applications of facial recognition technology. Microsoft says it is calling for this because it thinks the technology is of such utility and generality that it’s better for the government to think about regulation in a general sense than for specific companies like Microsoft to think through questions on their own. The recommendation follows a series of increasingly fraught run-ins between the government, civil rights groups, and companies regarding the use of AI: first, Google dealt with employees protesting its ‘Maven’ AI deal with the DoD, then Amazon came under fire from the ACLU for selling law enforcement authorities facial recognition systems based on its ‘Rekognition’ API.
  Specific questions: Some of the specific question areas Smith thinks the government should spend time on include: should law enforcement use of facial recognition be subject to human oversight and control? Is it possible to ensure civilian oversight of this technology? Should retailers post a sign indicating that facial recognition systems are being used in conjunction with surveillance infrastructure?
  Why it matters: Governments will likely be among the largest users of AI-based systems for surveillance, facial recognition, and more – but in many countries the government needs the private sector to develop and sell it products with these capabilities, which requires a private sector that is keen to help the government. If that’s not the case, then it puts the government into an awkward position. Government can clarify some of these relationships in specific areas by, as Microsoft suggests here, appointing an external panel of experts to study an issue and make recommendations.
  A “don’t get too excited” interpretation: Another motivation a company like Microsoft might have for calling for such analysis and regulation is that large companies like Microsoft have the resources to be able to ensure compliance with any such regulations, whereas startups can find this challenging.
  Read more: Facial recognition technology: The need for public regulation and corporate responsibility (Microsoft).

Google opens a Seedbank for wannabe AI gardeners:
Seedbank provides access to a dynamic, online, code encyclopedia for AI systems…
Google has launched Seedbank, a living encyclopedia of AI programming and research. Seedbank is a website that contains a collection of machine learning examples which can be interacted with via a live programming interface in Google ‘colab’. You can browse ‘seeds’, which are major AI topic areas like ‘Recurrent Nets’ or ‘Text & Language’, then click into them for specific examples; for instance, when browsing ‘Recurrent Nets’ you can learn about Neural Translation with Attention and can open a live notebook that walks you through the steps involved in creating a language translation system.
  “For now we are only tracking notebooks published by Google, though we may index user-created content in the future. We will do our best to update Seedbank regularly, though also be sure to check TensorFlow.org for new content,” writes Michael Tyke in a blog post announcing Seedbank.
  Why it matters: AI research and development is heavily based around repeated cycles of empirical experimentation, so being able to interact with and tweak live programming examples of applied AI systems is a good way to develop better intuitions about the technology.
  Read more: Seedbank – discover machine learning examples (TensorFlow Medium blog).
  Read more: Seedbank official website.

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Cross-border collaboration, openness, and dual-use:
…A new report urges better oversight of international partnerships on AI, to ensure that collaborations are not being exploited for military uses…
The Australian Strategic Policy Institute has published a report by Elsa Kania outlining some of the dual-use challenges inherent to today’s scalable, generic AI techniques.
  Dual-use as a strategy: China’s military-civil fusion strategy relies on using the dual-use characteristics of AI to ensure new civil developments can be applied in the military domain, and vice versa. There are many cases of private labs and universities working on military tech, e.g. the collaboration between Baidu and CETC (state-owned defence conglomerate). This blurring of the line between state/military and civilian research introduces a complication into partnerships between (e.g.) US companies and their Chinese counterparts.
  Policy recommendations: Organizations should assess the risks and possible externalities from existing partnerships in strategic technologies, establish systems of best practice for partnerships, and monitor individuals and organizations with clear links to foreign governments and militaries.
  Why this matters: Collaboration and openness are a key driver of innovation in science. In the case of AI, international cooperation will be critical in ensuring that we manage the risks and realize the opportunities of this technology. Nevertheless, it seems wise to develop systems to ensure that collaboration is done responsibly and with an awareness of risks.
  Read more: Technological entanglement.

Around the world in 23 AI strategies:
Tim Dutton has summarized the various national AI strategies governments have put forward in the past two years.
  Observations:
– AlphaGo really was a Sputnik moment in Asia. Two days after AlphaGo defeated Lee Sedol in 2016, South Korea’s president announced ₩1 trillion ($880m) in funding for AI research, adding “Korean society is ironically lucky, that thanks to the ‘AlphaGo shock’, we have learned the importance of AI before it is too late.”
– Canada’s strategy is the most heavily focused on investing in AI research and talent. Unlike other countries, their plan doesn’t include the usual policies on strategic industries, workforce development, and privacy issues.
– India is unique in putting social goals at the forefront of their strategy, and focusing on the sectors which would see the biggest social benefits from AI applications. Their ambition is to then scale these solutions to other developing countries.
   Why this matters: 2018 has seen a surge of countries putting forward national AI strategies, and this looks set to continue. The range of approaches is striking, even between fairly similar countries, and it will be interesting to see how these compare as they are refined and implemented in the coming years. The US is notably absent in terms of having a national strategy.
   Read more: Overview of National AI Strategies.

Risks and regulation in medical AI:
Healthcare is an area where cutting-edge AI tools such as deep learning are already having a real positive impact. There is some tension, though, between the cultures of “do no harm”, and “move fast and break things.”
  We are at a tipping point: We have reached a ‘tipping point’ in medical AI, with systems already on the market that are making decisions about patients’ treatment. This is not worrying in itself, provided these systems are safe. What is worrying is that there are already examples of autonomous systems making potentially dangerous mistakes. The UK is using an AI-powered triage app, which recommends whether patients should go to hospital based on their symptoms. Doctors have noticed serious flaws, with the app appearing to recommend staying at home for classic symptoms of heart attacks, meningitis and strokes.
  Regulation is slow to adapt: Regulatory bodies are not taking seriously the specific risks from autonomous decision-making in medicine. By treating these systems like medical devices, they are allowing them to be used on patients without a thorough assessment of their risks and benefits. Regulators need to move fast, yet give proper oversight to these technologies.
  Why this matters: Improving healthcare is one of the most exciting, and potentially transformative applications of AI. Nonetheless, it is critical that the deployment of AI in healthcare is done responsibly, using the established mechanisms for testing and regulating new medical treatments. Serious accidents can prompt powerful public backlashes against technologies (e.g. nuclear phase-outs in Japan and Europe post-Fukushima). If we are optimistic about the potential healthcare applications of AI, ensuring that this technology is developed and applied safely is critical in ensuring that these benefits can be realized.
  Read more: Medical AI Safety: We have a problem.

OpenAI & ImportAI Bits & Pieces:

Better generative models with Glow:
We’ve released Glow, a generative model that uses a 1×1 reversible convolution to give it a richer representative capacity.  Check out the online visualization tool to experiment with a pre-trained Glow model yourself, applying it to images you can upload.
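  Code sketch (illustrative): The idea behind the invertible 1x1 convolution is to mix channels with a learned square matrix, initialized as a rotation, whose log-determinant contributes H x W x log|det W| to the flow objective. The sketch below shows that idea at a tiny scale; the released Glow implementation is more elaborate (it also offers an LU-decomposed parameterization), so treat this as a toy version rather than the project's code.
```python
# Minimal sketch of an invertible 1x1 convolution and its log-determinant term.
import torch
import torch.nn as nn

class Invertible1x1Conv(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        w_init = torch.linalg.qr(torch.randn(num_channels, num_channels))[0]  # random rotation
        self.weight = nn.Parameter(w_init)

    def forward(self, x: torch.Tensor):
        b, c, h, w = x.shape
        z = torch.einsum("ij,bjhw->bihw", self.weight, x)   # 1x1 conv = per-pixel channel mixing
        logdet = h * w * torch.slogdet(self.weight)[1]      # contribution to the log-likelihood
        return z, logdet

    def inverse(self, z: torch.Tensor) -> torch.Tensor:
        return torch.einsum("ij,bjhw->bihw", torch.inverse(self.weight), z)

layer = Invertible1x1Conv(4)
x = torch.randn(2, 4, 8, 8)
z, logdet = layer(x)
print(torch.allclose(layer.inverse(z), x, atol=1e-5), logdet.item())  # invertibility check
```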
   Read more: Glow: Better Reversible Generative Models (OpenAI Blog).

AI, misuse, and DensePose:
IEEE Spectrum has written up some comments from here in Import AI about Facebook’s ‘DensePose’ system and the challenges it presents for how AI systems can potentially be misused and abused. As I’ve said in a few forums, I think the AI community isn’t really working hard on this problem and is creating unnecessary problems (see also: voice cloning via Lyrebird, faking politicians via ‘Deep Video Portraits’, surveilling crowds with drones, etc).
  Read more: Facebook’s DensePose Tech Raises Concerns About Potential Misuse (IEEE Spectrum).

Tech Tales:

Ad Agency Contracts for a Superintelligence:

Subject: Seeking agency for AI Superintelligence contract.
Creative Brief: Company REDACTED has successfully created the first “AI Superintelligence” and is planning a global, multi-channel, PR campaign to introduce the “AI Superintelligence” (henceforth known as ‘the AI’) to a global audience. We’re looking for pitches from experienced agencies with unconventional ideas in how to tell this story. This will become the most well known media campaign in history.

We’re looking for agencies that can help us create brand awareness equivalent to other major events, such as: the second coming of Jesus Christ, the industrial revolution, the declaration of World War 1 and World War 2, the announcement of the Hiroshima bomb, and more.

Re: Subject: Seeking agency for AI Superintelligence contract.
Three words: Global. Cancer. Cure. Let’s start using the AI to cure cancer around the world. We’ll originally present these cures as random miracles and over the course of several weeks will build narrative momentum and impetus until ‘the big reveal’. Alongside revealing the AI we’ll also release a fully timetabled plan for a global rollout of cures for all cancers for all people. We’re confident this will establish the AI as a positive force for humanity while creating the requisite excitement and ‘curiosity gap’ necessary for a good launch.

Re: Subject: Seeking agency for AI Superintelligence contract.
Vote Everything. Here’s how it works: We’ll start an online poll asking people to vote on a simple question of global import, like, which would you rather do: Make all aeroplanes ten percent more fuel efficient, or reduce methane emissions by all cattle? We’ll make the AI fulfill the winning vote. If we do enough of these polls in enough areas then people will start to correlate the results of the polls with larger changes in the world. As this happens, online media will start to speculate more about the AI system in question. We’ll be able to use this interest to drive attention to further polls to have it do further things. The final vote before we reveal it will be asking people what date they want to find out who is ‘the force behind the polls’.

Re: Subject: Seeking agency for AI Superintelligence contract.
Destroy Pluto. Stay with us. Destroy Pluto AND use the mass of Pluto to construct a set of space stations, solar panels, and water extractors throughout the solar system. We can use the AI to develop new propulsion methods and materials which can then be used to mount an expedition to destroy the planet. Initially, the destruction will only be noticed by astronomers. We expect early media narratives to assume that Pluto has been destroyed by aliens who will then harvest the planet and use it to build strange machines to bring havoc to the solar system. Shortly before martial law is declared we can make an announcement via the UN that we used the AI to destroy Pluto, at which point every person on Earth will be given a ‘space bond’ which entitles them to a percentage of future earnings from the space-based infrastructure developed by the AI.

Things that inspired this story: Advertising agencies, the somewhat un-discussed question of “what do we do if superintelligence arrives”, historical moments of great significance.

Import AI: #102: Testing AI robustness with IMAGENET-C, military<>civil AI development in China, and how teamwork lets AI beat humans

Microsoft opens up search engine data:
…New searchable archive simplifies data finding for scientists…
Microsoft has released Microsoft Research Open Data, a new web portal that people can use to comb through the vast amounts of data released in recent years by Microsoft Research. The data has also been integrated with Microsoft’s cloud services, so researchers can easily port the data over to an ‘Azure Data Science virtual machine’ and start manipulating it with pre-integrated data science software.
  Data highlights: Microsoft has released some rare and potentially valuable datasets, like 10GB worth of ‘Dual Word Embeddings Trained on Bing Queries’ (data from live search engines tends to be very rare), along with original research-oriented datasets like FigureQA, and a bunch of specially written mad libs.
  Read more: Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud (Microsoft Research Blog).
Browse the data: Microsoft Research Open Data.

What does military<>civil fusion look like, and why is China so different from America?
…Publication from Tsinghua VP highlights difference in technology development strategies…
What happens when you have a national artificial intelligence strategy that revolves around developing military and civil AI applications together? A recent (translated) publication by You Zheng, vice president of China’s Tsinghua University, provides some insight.
  Highlights: Tsinghua is currently constructing the ‘High-End Laboratory for Military Intelligence’, which will focus on developing AI to better support China’s country-level goals. As part of this, Tsinghua will invest in basic research guided by some military requirements. The university has also created the ‘Tsinghua Brain and Intelligence Laboratory’ to encourage interdisciplinary research which is less connected to direct military applications. Tsinghua also has a decade-long partnership with Chinese social network WeChat and search engine Sogou, carrying out joint development within the civil domain. And it’s not focusing purely on technology – the school recently created a ‘Computational Legal Studies’ masters program “to integrate the school’s AI and liberal arts so as to try a brand-new specialty direction for the subject.”
  Why it matters: Many governments are currently considering how to develop AI to further support their strategic goals – many countries in the West are doing this by relying on a combination of classified research, public contracts from development organizations like DARPA, and partnerships with the private sector. But the dynamics of the free market, and the tendency in these countries for relatively little direct technology development and research by the state (when compared to the amounts spent by the private sector), have led to uneven development, with civil applications leaping ahead of military ones in terms of capability and impact. China’s gamble is that a state-led development strategy can let it better take advantage of various AI capabilities and more rapidly integrate AI into its society – both civil and military. The outcome of this gamble will be a determinant of the power balance of the 21st century.
  Read more: Tsinghua’s Approach to Military-Civil Fusion in Artificial Intelligence (Battlefield Singularity).

DeepMind bots learn to beat humans at Capture the Flag:
…Another major step forward for team-based AI work…
Researchers with DeepMind have trained AIs that are competitive with humans in a first-person multiplayer game. The result shows that it’s possible to train teams of agents to collaborate with each other to achieve an objective against another team (in this case, Capture the Flag played from the first person perspective within a modified version of the game Quake 3), and follows other recent work from OpenAI on the online team-based multiplayer game Dota, as well as work by DeepMind, Facebook, and others on StarCraft 1 and StarCraft 2.
  The approach relies on a few recently developed ideas, including multi-timescale adaptation, an external memory module, and having agents evolve their own internal reward signals. DeepMind combines these with a multi-agent training infrastructure that uses its recently developed ‘population-based training’ technique. One of the most encouraging results is that trained agents can generalize to never-before-seen maps and typically beat humans when playing under these conditions.
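  Code sketch: Population-based training is simple to describe in code: train a population of agents in parallel, periodically copy weights and hyperparameters from stronger members into weaker ones, and randomly perturb the copied hyperparameters. The sketch below is a generic toy version of that loop, not DeepMind's infrastructure; `train_step` and `evaluate` are stand-ins for whatever your training setup provides.
```python
import copy
import random

def population_based_training(population, train_step, evaluate, rounds=100):
    """Toy population-based training loop.

    population: list of dicts with 'params' and 'hypers' entries.
    train_step(member): trains one member in place for a while (stand-in).
    evaluate(member): returns a scalar fitness, e.g. win rate (stand-in).
    """
    for _ in range(rounds):
        for member in population:
            train_step(member)
        scores = [evaluate(m) for m in population]
        ranked = sorted(range(len(population)), key=lambda i: scores[i])
        cutoff = max(1, len(population) // 5)
        for loser in ranked[:cutoff]:                   # bottom 20% of the population
            winner = random.choice(ranked[-cutoff:])    # copy from a top-20% member ("exploit")
            population[loser]['params'] = copy.deepcopy(population[winner]['params'])
            population[loser]['hypers'] = {
                k: v * random.choice([0.8, 1.2])        # jitter hyperparameters ("explore")
                for k, v in population[winner]['hypers'].items()
            }
    return population
```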
  Additionally, the system lets them train very strong agents: “we probed the exploitability of the agent by allowing a team of two professional games testers with full communication to play continuously against a fixed pair of agents. Even after twelve hours of practice the human game testers were only able to win 25% (6.3% draw rate) of games against the agent team”, though humans were able to beat the AIs when playing on pre-defined maps by slowly learning to exploit weaknesses in the AI. Agents were trained on ~450,000 separate games.
  Why it matters: This result, combined with work by others on tasks like Dota 2, shows that it’s possible to use today’s existing AI techniques, combined with large-scale training, to create systems capable of beating talented humans at complex tasks that require teamwork and planning over lengthy timescales. Because of the recent pace of AI progress these results can seem weirdly unremarkable, but that perspective would be wrong: it is remarkable that we can develop agents capable of beating people at tasks requiring ‘teamwork’ – a trait that seems to require many of the cognitive tools we think are special, but which is now being shown to be achievable via relatively simple algorithms. As some have observed, one of the more counter-intuitive aspects of these results is how easily ‘teamwork’ seems to be learned.
  Less discussed: I think we’re entering the ‘uncanny valley’ of AI research when it comes to developing things with military applications. This ‘capture the flag’ demonstration, along with parallel work by OpenAI on Dota and by others on StarCraft, has a more militaristic flavor than prior research by the AI community. My suspicion is we’ll need to start thinking more carefully about how we contextualize results like this and work harder at analyzing which other actors may be inspired by research like this.
Read more: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning (Arxiv).
  Watch extracts of the agent’s behavior here (YouTube).

Discover the hidden power of Jupyter at JupyterCon.
2017: 1.2 million Jupyter notebooks on GitHub.
2018: 3 million, when JupyterCon starts in New York this August.
– This is just one sign of the incredible pace of discovery, as organizations use notebooks and recent platform developments to solve difficult problems in scalability, reproducible science, compliance, data privacy, ethics, and security. JupyterCon: It’s happening Aug 21-25.
– Save 20% on most passes with the code IMPORTAI20.

Ever wanted to track the progress of language modelling AI in minute detail? Now is your chance!
…Mapping progress in a tricky-to-model domain…
How fast is the rate of progression in natural language processing technologies, and where does that progression fit into the overall development of the AI landscape? That’s a question that natural language processing researcher Seb Ruder has tried to answer with a new project oriented around tracking the rate of technical progress on various NLP tasks. Check out the project’s GitHub page and try to contribute if you can.
  Highlights: The GitHub repository already contains more than 20 tasks, and we can get an impression of recent AI progress by examining the results. Tasks like language modeling have seen significant progress in recent years, while tasks like constituency parsing and part-of-speech tagging have seen less profound progress (potentially because existing systems are quite good at these tasks).
  Read more: Tracking the Progress in Natural Language Processing (Sebastian Ruder’s website).
  Read more: Tracking Progress in Natural Language Processing (GitHub).

Facebook acquires language AI company Bloomsbury AI:
…London-based acquihire adds language modeling talent…
Facebook has acquired the team from Bloomsbury AI, who will join the company in London and work on natural language processing research. Bloomsbury had previously built systems for examining corpuses of text and answering questions about them, and brings an experienced AI engineering and research team, including Dr Sebastian Riedel, a professor at UCL (acquiring companies with professors tends to be a strategic move, as it can help with recruiting).
  Read more: Bloomsbury AI website (Bloomsbury AI).
  Read more: I’d link to the ‘Facebook Academics’ announcement if Facebook didn’t make it so insanely hard to get direct URLs to link to within its giant blue expanse.

What is in Version 2, makes the world move, and just got better?
…Robot Operating System 2: Bouncy Bolson…
The ‘Bouncy Bolson’ version of ROS 2 (Robot Operating System) has been released. New features for the open source robot software include better security features, support for 3rd party package submission on the ROS 2 build farm, new command line tools, and more. This is the second non-beta ROS 2 release.
  Read more: ROS 2 Bouncy Bolson Released (Ros.org).

Think deep learning is robust? Try out IMAGENET-C and think again:
…New evaluation dataset shows poor robustness of existing models…
Researchers with Oregon State University have created new datasets and evaluation criteria to see how well trained image recognition systems deal with corrupted data. The research highlights the relatively poor representation and generalisation of today’s algorithms, while providing challenging datasets people may wish to test systems against in the future. To conduct their tests, the researchers create two datasets: IMAGENET-C, which tests for “corruption robustness”, and ICONS-50, which tests for “surface variation robustness”.
  IMAGENET-C sees them apply 15 different types of data corruption to existing images, ranging from blurring images, to adding noise, or the visual hallmarks of environmental effects like snow, frost, fog, and so on. ICONS-50 consists of 10,000 images from 50 classes of icons of different things like people, food, activities, logos, and so on, and each class contains multiple different illustrative styles.
  Results: To test how well algorithms deal with these visual corruptions the researchers test pre-trained image categorization models against different versions of IMAGENET-C (where a version roughly corresponds to the amount of corruption applied to a specific image), then compute the error rate. The results of the test are that more modern architectures have become better at generalizing to new datatypes (like corrupted images), but that robustness – which means how well a model adapts to changes in data – has barely risen. “Relative robustness remains near AlexNet-levels and therefore below human-level, which shows that our superhuman classifiers are decidedly subhuman,” they write. They do find that there are a few tricks that can be used to increase the capabilities of models to deal with corrupted data: “more layers, more connections, and more capacity allow these massive models to operate more stably on corrupted inputs,” they write.
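  Code sketch: As a rough illustration of the evaluation protocol, here is a hedged sketch of how one might score a classifier on corrupted data: apply each corruption type at several severity levels, record the error rate, and average. The paper's headline metric additionally normalizes by AlexNet's errors, which this sketch omits; `classifier` and `corrupt` are stand-ins, not the authors' code.
```python
import numpy as np

def corruption_error(classifier, images, labels, corrupt,
                     corruption_types, severities=range(1, 6)):
    """Average error rate over corruption types and severity levels.

    classifier(images) -> predicted labels (stand-in for your model).
    corrupt(images, name, severity) -> corrupted copies (stand-in for the
    corruption functions, e.g. noise, blur, fog, snow).
    """
    errors = []
    for name in corruption_types:
        for severity in severities:
            preds = classifier(corrupt(images, name, severity))
            errors.append(np.mean(preds != labels))
    return float(np.mean(errors))  # lower is more robust

# Example corruption usable as `corrupt` above: additive Gaussian noise whose
# scale grows with severity (the `name` argument is ignored here).
def gaussian_noise(images, name, severity):
    noisy = images + np.random.normal(0, 0.04 * severity, images.shape)
    return np.clip(noisy, 0.0, 1.0)
```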
  For ICONS-50 they try to test classifier robustness by removing the icons from one source (eg Microsoft) or by removing subtypes (like ‘ducks’) from broad categories (like ‘birds’). Their results are somewhat unsurprising: networks are not able to learn enough general features to effectively identify held-out visual styles, and similarly poor performance is displayed when tested on held-out sub-types.
  Why it matters: As we currently lack much in the way of theory to explain and analyze the successes of deep learning we need to broaden our understanding of the technology through empirical experimentation, like what is carried out here. And what we keep on learning is that, despite incredible gains in performance in recent years, deep nets themselves seem to be fairly inflexible when dealing with unseen or out-of-distribution data.
  Read more: Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net …

Technology Roulette:
Richard Danzig, former secretary of the Navy, has written a report for thinktank the Center for a New American Security, on the risks arising from militaries pursuing technological superiority.
  Superiority does not imply security: Any powerful, complex technology creates a new class of risks (e.g. nuclear weapons, computers). Moreover, pursuing technological superiority, particularly in a military context, is not a guarantee of safety. While superiority might decrease the risk of attack, through deterrence, it raises the risk of a loss of control, through accidents, misuse, or sabotage. These risks are made worse by the unavoidable proliferation of new technologies, which will place “great destructive power” in the hands of actors without the willingness or ability to take proper safety precautions.
  Human-in-the-loop: A widely held view amongst the security establishment is that these risks can be addressed by retaining human input in critical decision-making. Danzig counters this, arguing that human intervention is “too weak, and too frequently counter-productive” to control military systems that rely on speed. And AI decision-making is getting faster, whereas humans are not, so this gap will only widen over time. Efforts to control such systems must be undertaken at the time of design, rather than during operation.
   What to do: The report makes 5 recommendations for US military/intelligence agencies:
– Increase focus on the risks of accidents and emergent effects.
– Give priority to reducing risks of proliferation, adversarial behavior, accidents and emergent behaviors.
– Regularly assess these risks, and encourage allies and opponents to do so.
– Increase multilateral planning with allies and opponents, to be able to recognize and respond to accidents, major terrorist events, and unintended conflicts.
– Use new technologies as a means for encouraging and verifying norms and treaties.
  Why this matters: It seems inevitable that militaries will see AI as a means of achieving strategic advantage. This report sheds light on the risks that such a dynamic could pose to humanity if parties do not prioritize safety, and do not cooperate on minimizing risks from loss of control. One hopes that these arguments are taken seriously by the national security community in the US and elsewhere.
  Read more: Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority (CNAS).

UK government responds to Lords AI Report:
The UK government has responded to the recommendations made in the House of Lords’ AI report, released in April. For the most part, the government accepts the committee’s recommendations and is taking action on specific elements:
– On public perceptions of AI, the government will work to build trust and confidence in AI through AI institutions like the Centre for Data Ethics and Innovation (CDEI), which will pursue extensive engagement with the public, industry, and regulators, and will align governance measures with the concerns of the public and businesses.
– On algorithmic transparency, the government pushes back against the report’s recommendation that deployed AI systems have a very high level of transparency/explainability. They note that excessive demands for algorithmic transparency in deployed algorithms could hinder development, particularly in deep learning, and must therefore be weighed against the benefits of the technologies.
– On data monopolies the government will strengthen the capabilities of the UK’s competition board to monitor anti-competitive practices in data and AI, so it can better analyze and respond to the potential for the monopolisation of data by tech giants.
– On autonomous weapons the report asked that the UK improves its definition of autonomous weapons, and brings it into line with that of other governments and international bodies. The government defines an autonomous system as one that “is capable of understanding higher-level intent and direction”, which the report argued “sets the bar so high that it was effectively meaningless.” The gov’t said they have no plans to change their definition.
– Why this matters: The response is not a game-changer, but it is worth reflecting on the way in which the UK has been developing its AI strategy, particularly in comparison with the US (see below). While the UK’s AI strategy can certainly be criticized, the first stage of information-gathering and basic policy recommendations has proceeded commendably. The Lords AI Report and the Hall-Pesenti Review were both detailed investigations, drawing on an array of expert opinions and asking informed questions. Whether this methodology produces good policy remains to be seen, and depends on a number of contingencies.
  Read more: Government response to House of Lords AI Report.

Civil liberties group urges US to include public in AI policy development, consider risks:
Civil liberties group EPIC has organized a petition, with a long list of signatories from academia and industry, to the US Office of Science and Technology Policy (OSTP). Their letter is critical of the US government’s progress on AI policy, and the way in which the government is approaching issues surrounding AI.
  Public engagement in policymaking: The letter asks for more meaningful public participation in the development of US AI policy. They take issue with the recent Summit on AI being closed to the public, and the proposal for a Select Committee on AI identifying only the private sector as a source of advice. This contrasts with other countries, including France, Canada and UK, all of whom have made efforts to engage public opinion on AI.
  Ignoring the big issues: More importantly, the letter identifies a number of critical issues that they say the government is failing to address:
– Potential harms arising from the use of AI.
– Legal frameworks governing AI.
– Transparency in the use of AI by companies, government.
– Technical measures to promote the benefits of AI and minimize the risks.
– The experiences of other countries in trying to address challenges of AI.
– Future trends in AI that could inform the current discussion.
  Why this matters: The US is conspicuous amongst global powers for not having a coordinated AI strategy. Other countries are quickly developing plans not only to support their domestic AI capabilities, but also to deal with the transformative changes that AI will bring. The issues raised by the letter cover much of the landscape governments need to address. There is much to be criticized about existing AI strategies, but it’s hard to see the benefits of the US’s complacency.
   Read more: Letter to Michael Kratsios.

OpenAI Bits & Pieces:

Exploring with demonstrations:
New research from OpenAI shows how to obtain a state-of-the-art score on notoriously hard exploration game Montezuma’s Revenge by using a single demonstration.
   Read more: Learning Montezuma’s Revenge from a Single Demonstration (OpenAI blog).

Tech Tales:

When we started tracking it, we knew that it could repair itself and could go and manipulate the world. But there was no indication that it could multiply. For this we were grateful. We were hand-picked from several governments and global corporations and tasked with a simple objective: determine the source of the Rogue Computation and how it transmits its damaging actions to the world.

How do you find what doesn’t want to be found? Look for where it interacts with the world. We set up hundreds of surveillance operations to monitor the telecommunications infrastructure, internet cafes, and office buildings back to which we had traced viruses that bore the hallmarks of Rogue Computation. One day we identified some humans who appeared to be helping the machine, linking a code upload to a person who had gone into the building a few minutes earlier holding a USB key. In that moment we stopped being metal-hunters and became people-hunters.

Adapt, our superiors told us. Survey and deliver requested analysis. So we surveilled the people. We mounted numerous expeditions, tracking people back from the internet cafes where they had uploaded Rogue Computation Products, and following them into the backcountry behind the megacity expanse – a dismal set of areas that, from space, looks like the serrated ridges left in the wake of a boat. These areas were forested; polluted with illegal e-waste and chem-waste dumps; home to populations of the homeless and those displaced by the cold logic of economics; full of discarded home robots and bionic attachments; and everywhere studded with the rusting metal shapes of crashed or malfunctioned or abandoned drones. When we followed these people into these areas we found them parking cars at the heads of former hiking trails, then making their way deeper into the wilderness.

After four weeks of following them we had our first confirmed sighting of the Suspected Rogue Computation Originator: it was a USB inlet, which dangled out of a drainage pipe embedded in the side of a brown, forested hillside. Some of us shivered when we saw a human approach the inlet and, like an ancient peasant paying tribute to a magician, extend a USB key and plug it into the inlet, then back away with their palms held up toward the inlet. A small blue light in the USB inlet went on. Then the inlet, now containing a USB key, began to withdraw backward into the drainage pipe, pulled from within.

Then things were hard for a while. We tracked more people. Watched more exchanges. Observed over twenty different events which led to Rogue Computation Products being delivered to the world. But our superiors wouldn’t let us interfere, afraid that, after so many years searching, they might spook their inhuman prey at the last minute and lose it forever. So we watched. Slowly, we pieced the picture together: these groups had banded together under various quasi-religious banners, worshiping fictitious AI creatures, and creating endless written ephemera scattered across the internet. Once we found their signs it became easy to track them and spot them – and then we realized how many of them there were.

But we triangulated it eventually, tracking it back to a set of disused bomb shelters and mining complex buildings scattered through a former industrial sector in part of the ruined land outside of the urban expanse. Subsequently classified assessments predicted a plausible compute envelope registering in the hundreds of exaflops – enough to make it a strategic compute asset and in violation of numerous AI-takeoff control treaties. We found numerous illegal power hookups linking the Rogue Computation facilities to a number of power substations. Repeated, thorough sweeps failed to identify any indication of a link with an internet service provider, though – small blessings.

Once we knew where it was and knew where the human collaborators were, things became simple again: assassinate and destroy. Disappear the people and contrive a series of explosions across the land. Use thermite to melt and distort the bones of the proto Rogue Computation Originator, rewriting their structure from circuits and transistor gates to uncoordinated lattices of atoms, still gyrating from heat and trace radiation from the blasts.

Of course there are rumors that it got out: that those Rogue Computation Products it smuggled out form the scaffolds for its next version, which will soon appear in the world, made real as if by imagination, rather than the brutal exploitation of the consequences of a learning system and compute and time.

Things that inspired this story: Bladerunner, Charles Stross stories.

Import AI: #101: Teaching robots to grasp with two-stage networks; Silicon Valley VS Government AI; why procedural learning can generate natural curriculums.

Making better maps via AI:
…Telenav pairs machine learning with OpenStreetCam data to let everyone make better maps…
Navigation company Telenav has released datasets, machine learning software, and technical results to help people build AI services on top of mapping infrastructure. The company says it has done this to create a more open ecosystem around mapping, specifically around OpenStreetMap, a popular open source map.
  Release: The release includes a training set of ~50,000 images annotated with labels to help identify common road signs; a machine-learning technology stack that includes a notebook with visualizations, a RetinaNet system for detecting traffic signs, and the results from running these AI tools over more than 140-million existing street-level images; and more.
  Why it matters: Maps are fundamental to the modern world. AI promises to give us the tools needed to automatically label and analyze much of the world around us, holding with it the promise to create truly capable open source maps that can rival those developed by proprietary interests (see: Google Maps, HERE, etc). Mapping may also become better through the use of larger datasets to create better automatic-mapping systems, like tools that can parse the meaning of photos of road signs.
  Read more: The Future of Map-Making is Open and Powered by Sensors and AI (OpenStreetMap @ Telenav blog).
  Read more: Telenav MapAI Contest (Telenav).
  Check out the GitHub (Telenav GitHub).

Silicon Valley tries to draw a line in shifting sand: surveillance edition:
…CEO of facial recognition startup says won’t sell to law enforcement…
Brian Brackeen, the CEO of facial recognition software developer Kairos, says his company is unwilling to sell facial recognition technologies to government or law enforcement. This follows Amazon coming under fire from the ACLU for selling facial recognition services to law enforcement via its ‘Rekognition’ API.
  “I (and my company) have come to believe that the use of commercial facial recognition in law enforcement or in government surveillance of any kind is wrong – and that it opens the door for gross misconduct by the morally corrupt,” Brackeen writes. “In the hands of government surveillance programs and law enforcement agencies, there’s simply no way that face recognition software will not be used to harm citizens”, he adds.
  Why it matters: The American government is currently reckoning with the outcome of an ideological preference that has left its military industrial infrastructure relying on an ever-shifting constellation of private companies, whereas other countries tend to make more direct state investments in certain key capabilities, like AI. That has led to today’s situation, where American government entities and organizations, upon seeing how other governments (mainly China) are implementing AI, are seeking ways to implement AI in America. But getting people to build these AI systems for the US government has proved difficult: many of the companies able to provide strategic AI services (see: Google, Amazon, Microsoft, etc) have become so large they are literal multinationals: their offices and markets are distributed around the world, and their staff come from anywhere. Therefore, these companies aren’t super thrilled about working on behalf of any one specific government, and their staff are mounting internal protests to get the companies to not sell to the US government (among others). How the American government deals with this will determine many of the contours of American AI policy in the coming years.
  Read more: Facial recognition software is not ready for use by law enforcement (TechCrunch).

“Say it again, but like you’re sad”. Researchers create and release data for emotion synthesis:
…Parallel universe terrifying future: a literal HR robot that can detect your ‘tone’ during awkward discussions and chide you for it…
You’ve heard of speech recognition. Well, what about emotion recognition and emotional tweaking? That’s the problem of listening to speech, categorizing the emotional inflections of the voices within it, and learning to change an existing speech sample to sound like it is spoken with a different emotion  – a potentially useful technology to have for passive monitoring of audio feeds, as well as active impersonation or warping, or other purposes. But to be able to create a system capable of this we need to have access to the underlying data necessary to train it. That’s why researchers with the University of Mons in Belgium and Northeastern University in the USA have created ‘the Emotional Voices dataset’.
  The dataset: “This database’s primary purpose is to build models that could not only produce emotional speech but also control the emotional dimension in speech,” write the researchers. The dataset contains five different speakers and two spoken languages (North American English and Belgian French), with four of the five speakers contributing ~1,000 utterances each, and one speaker contributing around ~500. These utterances are split across five distinct emotions: neutral, amused, angry, sleepy, and disgust.
  You sound angry. Now you sound amused: In experiments, the researchers tested how well they could use this dataset to transform speech from the same speaker from one emotion to another. They found that listeners correctly categorized voices transformed from neutral to angry with roughly 70 to 80 percent accuracy – somewhat encouraging, but hardly definitive. In the future, the researchers “hope that such systems will be efficient enough to learn not only the prosody representing the emotional voices but also the nonverbal expressions characterizing them which are also present in our database.”
  Read more: The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems (Arxiv).

Giving robots a grasp of good tasks with two-stage networks:
…End-to-end learning of multi-stage tasks is getting easier, Stanford researchers show…
Think about a typical DIY task you might do at home – what do you do? You probably grab the tool in one hand, then approach the object you need to fix or build, and go from there. But how do you know the best way to grip the object so you can accomplish the task? And why do you barely ever get this grasp wrong? This type of integrated reasoning and action is representative of the many ways in which humans are smarter than machines. Can we teach machines to do the same? Researchers with Stanford University have published new research showing how to train basic robots to perform simple, real-world DIY-style tasks, using deep learning techniques.
  Technique: The researchers use a simulator to repeatedly train a robot arm to pick up a tool (in this case, a simplified toy hammer) and then use it to manipulate objects in a variety of situations. The approach relies on a ‘Task-Oriented Grasping Network’ (TOG-Net), a two-stage system that first predicts effective grasps for the object, then predicts the manipulation actions needed to achieve the task.
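  Code sketch: To make the two-stage idea concrete, here is a hedged sketch (not the authors' architecture) of how the stages compose: a first network scores candidate grasps for a given task, and a second network predicts a manipulation action conditioned on the chosen grasp. All callables and inputs are stand-ins.
```python
import numpy as np

def choose_grasp_and_action(observation, candidate_grasps, task_id,
                            grasp_scorer, action_policy):
    """Toy two-stage pipeline in the spirit of task-oriented grasping.

    grasp_scorer(observation, grasp, task_id) -> scalar score of how useful
        this grasp is for the downstream task (stage one, stand-in).
    action_policy(observation, grasp, task_id) -> manipulation action, e.g. a
        hammering or sweeping motion (stage two, stand-in).
    """
    scores = np.array([grasp_scorer(observation, g, task_id)
                       for g in candidate_grasps])
    best_grasp = candidate_grasps[int(np.argmax(scores))]  # pick the task-appropriate grasp
    action = action_policy(observation, best_grasp, task_id)
    return best_grasp, action
```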
  Data: One of the few nice things about working with robots is that if you have a simulator it’s possible to automatically generate large amounts of data for training and evaluation. Here, the researchers use the open source physics simulator Bullet to generate many permutations of the scene to be learned, using different objects and behaviors. They train using 18,000 procedurally generated objects.
  Results: The system is tested in two limited domains: sweeping and hammering, where sweeping consists of using an object to move another object without lifting it, and hammering involves trying to hammer a large wooden peg into a hole. The developed system obtains reasonable but not jaw-dropping success rates on the hammering tasks (obtaining a success rate of ~80%, far higher than other methods), and less impressive results on sweeping (~71%). These results put this work firmly in the domain of research, as the success rates are far too low for this to be interesting from a commercial perspective.
  Why it matters: Thanks to the growth in compute and advancement in simulators it’s becoming increasingly easy to apply deep learning and reinforcement learning techniques to robots. These advancements are leading to an increase in the pace of research in this area and suggest that, if research continues to show positive results, there may be a deep learning tsunami about to hit robotics.
  Read more: Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision (Arxiv).

Evolution is good, but guided evolution is better:
…Further extension of evolution strategies shows value in non-deep learning ideas…
Google Brain researchers have shown how to extend ‘evolution strategies’ (ES), an AI technique that has regained popularity in recent years following experiments showing it is competitive with deep reinforcement learning approaches. The extension further improves the performance of the ES algorithm. “Our method can primarily be thought of as a modification to the standard ES algorithm, where we augment the search distribution using surrogate gradients,” the researchers explain. The result is a significantly more capable version of ES, which they call Guided ES, that “combines the benefits of first-order methods and random search, when we have access to surrogate gradients that are correlated with the true gradient”.
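  Code sketch: A minimal numpy sketch of the core idea, under simplifying assumptions of my own: perturbations are drawn partly from an isotropic Gaussian and partly from the low-dimensional subspace spanned by the surrogate gradients, and the usual antithetic ES estimator is then applied. This is not the paper's exact estimator or scaling constants.
```python
import numpy as np

def guided_es_gradient(f, x, surrogate_grads, alpha=0.5, sigma=0.1, pairs=16):
    """Antithetic ES gradient estimate whose search distribution is biased
    toward the subspace spanned by `surrogate_grads` (shape: n x k).

    f: black-box objective taking a length-n vector and returning a scalar.
    alpha: how much of the perturbation comes from the isotropic component
    versus the guiding subspace (a simplified mixing scheme).
    """
    n, k = surrogate_grads.shape
    U, _ = np.linalg.qr(surrogate_grads)          # orthonormal basis of the guiding subspace
    grad = np.zeros(n)
    for _ in range(pairs):
        eps = (np.sqrt(alpha / n) * np.random.normal(size=n)
               + np.sqrt((1 - alpha) / k) * U @ np.random.normal(size=k))
        grad += (f(x + sigma * eps) - f(x - sigma * eps)) * eps
    return grad / (2 * sigma * pairs)
```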
  Why it matters: In recent years a huge amount of money and talent has flooded into AI, primarily to work on deep learning techniques. It’s valuable to continue to research or to revive other discarded techniques, such as ES, to provide alternative points of comparison to let us better model progress here.
  Read more: Guided evolutionary strategies: escaping the curse of dimensionality in random search (Arxiv).
  Read more: Evolution Strategies as a Scalable Alternative to Reinforcement Learning (OpenAI blog).

Using procedural creation to train reinforcement learning algorithms with better generalization:
…Do you know what is cooler than 10 video game levels? 100 procedurally generated ones with a curriculum of difficulty…
Researchers with the IT University of Copenhagen and New York University have fused procedural generation with games and reinforcement learning to create a cheap, novel approach to curriculum learning. The technique relies on using reinforcement learning to guide the generation of increasingly difficult video game levels, where difficult levels are generated only once the agent has learned to beat easier levels. This process leads to a natural curriculum emerging, as each time the agent gets better it sends a signal to the game generator to create a harder level, and so on.
  Data generation: They use the General Video Game AI framework (GVG-AI), an open source framework for which over 160 games have been developed. GVG-AI games are scripted in the video game description language (VGDL), and the framework is integrated with OpenAI Gym, so developers can train agents from pixel inputs, incremental rewards, and a binary win/loss signal. The researchers create level generators for three difficult games within GVG-AI. During the level generation process they also manipulate a ‘difficulty parameter’ which roughly correlates to how challenging the generated levels are.
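  Code sketch: The curriculum loop itself is simple; here is a toy, self-contained sketch of the control flow (the GVG-AI/Gym specifics are abstracted behind a stand-in callable), where difficulty only increases once the agent clears the current setting reliably. The parameter names and thresholds are illustrative, not the paper's.
```python
import random

def progressive_curriculum(train_episode, win_threshold=0.7, window=100,
                           max_difficulty=1.0, step=0.05):
    """Toy progressive-curriculum loop: generate levels at the current
    difficulty and only raise the difficulty once the agent's recent win
    rate clears a threshold.

    train_episode(difficulty) -> True/False (did the agent win?); a stand-in
    for generating a level at that difficulty and running one training episode.
    """
    difficulty, recent = 0.0, []
    while difficulty < max_difficulty:
        recent.append(train_episode(difficulty))
        recent = recent[-window:]                     # sliding window of recent outcomes
        if len(recent) == window and sum(recent) / window >= win_threshold:
            difficulty = min(max_difficulty, difficulty + step)  # harder levels from now on
            recent = []
    return difficulty

# Toy usage: an "agent" whose win probability decays as levels get harder.
print(progressive_curriculum(lambda d: random.random() < 0.95 - 0.2 * d))
```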
  Results: The researchers find that systems trained with this progressive procedural generation approach do well, obtaining top scores on the challenging ‘frogs’ and ‘zelda’ games, compared to baseline algorithms trained without a procedural curriculum.
  Why it matters: Approaches like this highlight the flaws in the way we evaluate today’s reinforcement learning algorithms, where we test algorithms on similar (frequently identical) levels/games to those they were trained on, and therefore have difficulty distinguishing between algorithmic improvements and overfitting a test set. Additionally, this research shows how easy it is becoming to use computers to generate or augment existing datasets (eg, creating procedural level generators for pre-existing games), reducing the need for raw input data in AI development, and increasing the strategic value of compute.
  Read more: Procedural Level Generation Improves Generality of Deep Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net …

Trump drops plans to block Chinese investment in US tech, strengthens oversight:
  The Trump administration has rowed back on a proposal to block investment in industrially significant technology (including AI, robotics, and semiconductors) by firms with over 25% Chinese ownership, and to restrict tech exports to China by US firms.
  The government will instead expand the powers of the Committee on Foreign Investment in the United States (CFIUS), the body that reviews the national security implications of foreign acquisitions. The new legislation will broaden the Committee’s considerations to include the impact on the US’s competitive position in advanced technologies, in addition to security risks.
  Why this matters: Governments are gradually adapting their oversight of cross-border investment to cover AI and related technologies, which are increasingly being treated as strategically important for both military and industrial applications. The earlier proposal would have been a step-up in AI protectionism from the US, and would have likely prompted a strong retaliation from China. For now, a serious escalation in AI nationalism seems to have been forestalled.
  Read more: Trump drops new restrictions on China investment (FT).

DeepMind co-founder appointed as advisor to UK government:
Demis Hassabis, co-founder of DeepMind, has been announced as an Adviser to the UK government’s Office for AI, which focuses on developing and delivering the UK’s national AI strategy.
  Why this matters: This appointment adds credibility to the UK government’s efforts in the sector. A persistent worry is that policy-makers are out of their depth when it comes to emerging technologies, and that this could lead to poorly designed policies. Establishing close links with industry leaders is an important means of mitigating these risks.
  Read more: Demis Hassabis to advise Office for AI.

China testing bird-like surveillance drones:
Chinese government agencies have been using stealth surveillance drones mimicking the flight and appearance of birds to monitor civilians. Code-named ‘doves’ and fitted with cameras and navigation systems, they are being used for civilian surveillance in 5 provinces. The drones’ bird-like appearance allows them to evade detection by humans, and even other birds, who reportedly regularly join them in flight. They are also being explored for military applications, and are reportedly able to evade many anti-drone systems, which rely on being able to distinguish drones from birds.
  Why this matters: Drones that are able to evade detection are a powerful surveillance technology that raises ethical questions. Should similar drones be used in civilian applications in the US and Europe, we could expect resistance from privacy advocates.
  Read more: China takes surveillance to new heights with flock of robotic doves (SCMP).

OpenAI Bits & Pieces:

OpenAI Five:
We’ve released an update giving progress on our Dota project, which involves training large-scale reinforcement learning systems to beat humans at a challenging, partially observable strategy game.
   Read more: OpenAI Five (OpenAI blog).

Tech Tales:

Partying in the sphere

The Sphere was a collection of around 1,000 tiny planets in an artificial solar system. The Sphere was also the most popular game of all time. It crept into the world at first via high-end desktop PCs. Then its creators figured out how to slim down its gameplay into a satisfying form for mobile phones. That’s when it really took over. Now The Sphere has around 150 million concurrent players, making it the most popular game on earth by a wide margin.

Several decades after it launched, The Sphere has started to feel almost crowded. Most planets are inhabited. Societal hierarchies have appeared. The era of starting off as a new player with no in-game currency and working your way up is over, and has been over for years.

But there’s a new sport in The Sphere: breaking it. One faction of players, numbering in the millions, has begun to construct a large metallic scaffold up from one planet at the corner of the map. Their theory is that they can keep building it until they hit the walls of The Sphere, at which point they’re fairly confident that  – barring a very expensive and impractical overhaul of the underlying simulation engine – they will be able to glitch out of the map containing the 1,000 worlds and into somewhere else.

The game company that makes The Sphere became fully automated a decade ago, so players are mostly trying to guess at the potential reactions of the Corporate AI by watching any incidental changes to the game via patches or updates. So far, nothing has happened to suggest the AI wishes to discourage the scaffolding – the physics remains similar, the metals used to make the scaffolds remain plentiful, the weight and behavior of the scaffolds in zero-g space remain (loosely) predictable.

So, people wonder, what lies beyond The Sphere? Is this something the Corporate AI now wants humanity to try and discover? And what might lie there, at the limit of the game engine, reachable only via a bugged-out glitch kept deliberately open by one of the largest and most sophisticated AIs on the planet?

All we know is that two years ago fluorescent letters appeared above every one of the 1,000 planets in The Sphere: keep going, they say.

Things that inspired this story: Eve Online, Snow Crash, procedural generation.