Import AI 172: Google codes AI to fix errors in Google’s code; Amazon makes mini self-driving cars with DeepRacer; and Microsoft uses GPT-2 to make auto-suggest for coders

by Jack Clark

Microsoft wants to use internet-scale language models to make programmers fitter, happier, and more productive:
…Code + GPT-2 = Auto-complete for programmers…
Microsoft has used recent advances in language understanding to build a smart auto-complete function for programmers. The software company announced its Visual Studio “IntelliCode” feature at its Microsoft Ignite conference in November. The technology, which is inspired by language models like GPT-2, “extracts statistical coding patterns and learns the intricacies of programming languages from GitHub repos to assist developers in their coding,” the company says. “Based on code context, as you type, IntelliCode uses that semantic information and sourced patterns to predict the most likely completion in-line with your code.” Other people have experimented with applying large language models to problems of code prediction, including a startup called TabNine which released a GPT-2-based code completer earlier this summer. 
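
To give a rough sense of the underlying idea (and only the idea – this is an illustrative sketch built on the open-source transformers library and the public GPT-2 checkpoint, not Microsoft's IntelliCode implementation), a language-model-based completer simply asks the model to extend the code a developer has typed so far:

```python
# Illustrative sketch only: extending a code prefix with the public GPT-2
# checkpoint via the open-source `transformers` library. This is not
# Microsoft's IntelliCode model, just the general autocomplete idea.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The code the developer has typed so far.
prefix = "def read_json(path):\n    with open(path) as f:\n        return "
inputs = tokenizer(prefix, return_tensors="pt")

# Greedily extend the prefix by a handful of tokens and show the suggestion.
outputs = model.generate(
    **inputs,
    max_new_tokens=12,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A production system like IntelliCode would use models trained specifically on code from GitHub repos and would rank completions against the surrounding semantic context, but the basic loop – condition on context, predict the most likely continuation – is the same.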

Why this matters: Recent advances in language models are making it easy for us to build big, predictive models for any sort of information that can be framed as a text processing problem. That means in the coming years we’re going to develop systems that can predict pleasing sequences of words, code scripts, and (though this is only just beginning to happen) sequences of chemical compounds and other things. As this technology matures, I expect people will start using such prediction tools to augment their own intelligence, pairing human intuition with big internet-scale predictive models for given domains. The cyborgs will soon be among us – and they’ll be helping to do code review!
   Read more: Re-imagining developer productivity with AI-assisted tools (Microsoft).
   Try out the feature in Microsoft’s latest Visual Studio Preview (official Microsoft webpage).
   Read more about TabNine’s tech – Autocompletion with deep learning (TabNine blog).

####################################################

So, how do we feel about all this surveillance AI stuff we’re developing?
…Reddit thread gives us a view into how the machine learning community thinks about ethical tradeoffs…
AI, or – more narrowly – deep learning, is a utility-class technology; it has a vast range of applications and many of these are being explored by developers around the world. So, how do we feel about the fact that some of these applications are focused on surveillance? And how do we feel about the fact that a small number of nation states are enthusiastically adopting AI-based surveillance technologies in the service of surveilling their citizens? That’s a question that some parts of the AI research community are beginning to ponder, and a recent thread on Reddit dramatizes this by noting just how many surveillance-oriented applications seem to come from Chinese labs (which makes sense, given that China is probably the world’s most significant developer and deployer of surveillance AI systems). 

Many reactions, few solutions: In the thread, users of the r/machinelearning subreddit share their thoughts on the issue. Responses range from (paraphrased) “it’s all science, it’s not our job to think about second-order effects”, to “this question is indicative of absurd paranoia about China”, to “yes, China does a lot of this, but what about the US?”. The volume and diversity of responses give us a sense of how thorny an issue this is for many ML researchers. 

Dual-use technologies: A big issue with surveillance AI is that it has a range of uses, some of which are quite positive. “For what it’s worth, I work in animal re-identification and the technologies that are applied and perfected in humans are slowly making their way to help monitor endangered animal populations,” writes one commenter in the thread. “It is our responsibility to call out unethical practices but also to not lose sight of all the social good that can come from ML research.”
   Read more: ICCV – 19 – The state of (some) ethically questionable papers (Reddit/r/machinelearning).

####################################################

Stanford researchers give simulated robots the sensation of touch:
…Sim2real + simulated robots + high-fidelity environments + interaction, oh my!…
Researchers with the Stanford AI Lab have extended their ‘Gibson’ robot simulation software to support interactive objects, making it possible for researchers to use Gibson to train simulated AI agents to interact with the world around them. Because the Gibson simulator (first covered: Import AI 111) supports high-fidelity graphics, it may be possible to transfer agents trained in Gibson into reality (though that’s more likely to be successful for pure visual perception tasks, rather than manipulation). 

Faster, Gibson! The researchers have also made Gibson faster – the first version of Gibson rendered scenes at between 25 and 40 frames per second (FPS) on modern GPUs. That’s barely good enough for a standard computer game being played by a human, and wildly inefficient for AI research, where agents are typically so sample inefficient that it’s much better to have simulators that can run at thousands of FPS. In Interactive Gibson, the researchers implement a high-performance mesh rendering system written in Python and C++, improving performance to ~1,000 FPS at a 256x256 scene resolution – this is pretty good and should make the platform more attractive to researchers. 

Interactive Gibson Benchmark: If you want to test out how well your agents can perform in the new, improved Gibson, you can investigate a benchmark challenge created by the researchers. This challenge augments 106 existing Gibson scenes with 1984 interactable instances of five objects: chairs, desks, doors, sofas, and tables. Because Gibson consists of over 211,000 square meters of simulated indoor space, it’s not feasible to have human annotators go through it and work out where to put new objects; instead, the Gibson researchers create an AI-infused object-generation system that scans over the entire dataset and proposes objects it can add to scenes, then checks with humans as to whether its suggestions are appropriate. I think it’s interesting how common it is becoming to use ML techniques to semi-automatically enhance ML-oriented datasets.    

What does success mean in Gibson? As many AI researchers know, goal specification is always a hard problem when developing AI tasks and challenges. So, how can we assess whether we’re making progress in the Gibson environment? The developers propose a metric called Interactive Navigation Score (INS) that measures a couple of dimensions of the efficiency of an embodied AI agent: specifically, the path efficiency (how far the agent travels relative to the shortest possible route) of the paths it discovers to reach its goals, as well as the effort efficiency, which measures how much energy the agent needed to expend to achieve its goal (e.g., how much energy it spends moving its own body or manipulating objects in the environment to help it achieve its goal).
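
To make that concrete, here is a toy calculation blending those two efficiencies – my own simplification for illustration (the equal weighting and the function name are assumptions, not the paper's exact definition of INS):

```python
# Toy illustration of an interactive-navigation-style score: blend path
# efficiency (how close the route was to the shortest one) with effort
# efficiency (how close the energy spent was to the minimum needed).
# The weighting and names here are assumptions, not the paper's definition.
def interactive_navigation_score(shortest_path_m, actual_path_m,
                                 min_energy_j, actual_energy_j, alpha=0.5):
    path_efficiency = shortest_path_m / max(actual_path_m, shortest_path_m)
    effort_efficiency = min_energy_j / max(actual_energy_j, min_energy_j)
    return alpha * path_efficiency + (1 - alpha) * effort_efficiency

# An agent that took a 12m route when 10m was possible, and spent 150J of
# energy when 100J would have sufficed, scores ~0.75.
print(interactive_navigation_score(10.0, 12.0, 100.0, 150.0))
```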

The robot agents of Gibson: Having a world you can interact with is pretty pointless if you don’t have a body to use to interact with the world, so the Gibson team has also implemented several simulated robots that researchers can use within Gibson.
  These robots include: 

  • Two widely-used simulated agents (the Mujoco humanoid and ant bots)
  • Four wheeled navigation agents (Freight, JackRabbot v1, Husky, Turtlebot v2)
  • Two mobile manipulators with arms (Fetch, JackRabbot v2)
  • A quadrocopter/drone (specifically, a Quadrotor)

Why this matters: As I’ve written in this newsletter before, the worlds of robotics and of AI are becoming increasingly intermingled. The $1 trillion question is at what point both technologies combine, mature, and yield capabilities greater than the sum of their respective parts. What might the world be like if it became trivial to train agents to interact with the physical world in general, intelligent ways? Pretty different, I’d say! Systems like the Interactive Gibson Environment will help researchers generate insights from successes and failures to get us closer to that mysterious, different world.
   Read more: Interactive Gibson: A Benchmark for Interactive Navigation in Cluttered Environments (Arxiv).

####################################################

Want to build self-driving cars without needing real cars? Try Amazon’s “DeepRacer” robot:
…1/18th scale robot car gives developers an easy way to prototype self-driving car technology…
How will deep learning change how robots experience, navigate, and interact with the world? Most AI researchers assume the technology will dramatically improve robot performance in a bunch of domains. How can we assess if this is going to happen? One of the best approaches is testing out DL techniques on real-world robots. That’s why it’s exciting to see Amazon publish details about its “DeepRacer” robot car, a pint-size 1/18th scale vehicle that developers can use to develop robust, self-driving AI algorithms. 

What is DeepRacer: DeepRacer is a 1/18th scale robot car, designed to demonstrate how developers can use Amazon Web Services to build robots that do intelligent things in the world. Amazon is also hosting a DeepRacer racing league, bringing developers together at Amazon events to compete with each other to see who can develop the smartest systems for self-racing cars.

How to DeepRace: It’s possible to use contemporary AI algorithms to train DeepRacer vehicles to complete track circuits, Amazon writes in the research paper. Specifically, the company shows how to train a system via PPO to complete racing tracks, and provides a study showing how developers can augment data and tweak hyperparameters to get good performance out of their vehicle. They also highlight the value of training in simulation across a variety of different track types, then transferring the trained policy into reality. 
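
Amazon's own pipeline runs through AWS services, but the basic recipe – train a PPO policy on camera observations in a simulated track, then transfer it to the physical car – looks roughly like the sketch below. DeepRacerTrackEnv-v0 is a hypothetical stand-in for the real simulator, and stable-baselines3's PPO is used purely for illustration:

```python
# Rough sketch of the sim-to-real training recipe, not Amazon's actual code.
# "DeepRacerTrackEnv-v0" is a hypothetical environment id standing in for the
# racing simulator; stable-baselines3's PPO is used purely for illustration.
import gymnasium as gym
from stable_baselines3 import PPO

# Assumed: the simulator exposes camera images as observations and
# steering/throttle commands as actions.
env = gym.make("DeepRacerTrackEnv-v0")

model = PPO("CnnPolicy", env, verbose=1)   # CNN policy over camera frames
model.learn(total_timesteps=1_000_000)     # train in simulation
model.save("deepracer_ppo_policy")         # this policy would then be
                                           # transferred onto the physical car
```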

What goes into a DeepRacer car? A 1/18th scale four-wheel drive car, an Intel Atom processor with a built-in GPU, 4GB of RAM and 32GB of (expandable) storage, a 13,600 mAh compute battery (which lasts ~6 hours), a 1,100 mAh drive battery, WiFi, and a 4MP camera. “We have designed the car for experimentation while keeping the cost nominal,” they write. 

Why this matters: There’s a big difference between “works in simulation” and “works in reality”, as most AI researchers might tell you. Therefore, having low-cost ways of testing out ideas in reality will help researchers figure out which approaches are sufficiently robust to withstand the ever-changing dynamics of the real world. I look forward to watching progress in the DeepRacer league and I think, if this platform ends up being widely used, we’ll also learn something about the evolution of robotics hardware by looking at various successive iterations of the design of the DeepRacer vehicle itself. Drive on, Amazon!
   Read more: DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning (Arxiv).

####################################################

Google trains an AI to automatically patch errors in Google’s code:
…The era of the self-learning, self-modifying company approaches…
Google researchers have developed Graph2Diff networks, a neural network system that aims to make it easy for researchers to train AI systems to analyze and edit code. With Graph2Diff, the researchers hope to “do for code-editing tasks what the celebrated Seq2Seq abstraction has done for natural language processing”. Seq2Seq, for those who don’t follow AI research with the same fascination as train spotters follow trains, is the technology that went on to underpin Google’s “Smart Reply” system that automatically composes email responses. 

How Google uses Graph2Diff: For this research, Google gathered code snippets linked to approximately 500,000 build errors collected across Google. These build errors are basically the software logs of what happens when Google’s code build systems fail, and they’re tricky bits of data to work with as they involve multiple layers of abstraction, and frequently the way to fix the code is by editing code in a different location to where the error was observed. Using Graph2Diff, the researchers turn this into a gigantic machine learning problem: “We represent source code, build configuration files, and compiler diagnostic messages as a graph, and then use a Graph Neural Network model to predict a diff,” they write. 
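
The paper's model is considerably more sophisticated, but a toy sketch shows the shape of the idea: embed the nodes of a code-plus-diagnostics graph, pass messages along its edges, and score where an edit should land. Everything below – the names, dimensions, and single round of message passing – is my own assumption for illustration, not Google's implementation:

```python
# Toy sketch of the graph-in, edit-location-out idea. Not Google's Graph2Diff
# model: one naive round of message passing, and a decoder that just scores
# each node as a candidate edit site.
import torch
import torch.nn as nn

class ToyGraphEncoder(nn.Module):
    """Embeds node types and does one round of message passing."""
    def __init__(self, num_node_types, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_node_types, dim)
        self.message = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, node_types, adjacency):
        # node_types: (num_nodes,) ints; adjacency: (num_nodes, num_nodes) 0/1
        h = self.embed(node_types)
        messages = adjacency @ self.message(h)   # aggregate neighbor messages
        return self.update(messages, h)          # updated node states

class ToyEditLocator(nn.Module):
    """Scores each node as the location of the predicted fix."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, node_states):
        return self.score(node_states).squeeze(-1)  # higher = likelier edit site

# Tiny usage example: a 4-node "graph" of made-up node types.
encoder, locator = ToyGraphEncoder(num_node_types=10), ToyEditLocator()
node_types = torch.tensor([0, 3, 3, 7])
adjacency = torch.tensor([[0, 1, 0, 0],
                          [1, 0, 1, 0],
                          [0, 1, 0, 1],
                          [0, 0, 1, 0]], dtype=torch.float32)
scores = locator(encoder(node_types, adjacency))
print(scores.argmax().item())  # index of the node the toy model would edit
```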

What’s hard about this? Google analyzed some of the code errors seen in its data and laid out a few reasons why code-editing is a challenging problem. These include: variable misuse; the fact that source code is context-dependent, so a fix that works in one place probably won’t work well in another place; edit scripts that can be of variable length; fixes that don’t occur at the same place as the diagnostic (36% of cases require changing a line not pointed to by a diagnostic); multiple diagnostics for a single error; and single fixes that can span multiple locations.

Can AI learn to code? When they test out their approach, the researchers find optimized versions of it can obtain accuracies of 28% at predicting the correct length of a code sequence. In some circumstances, they can have even better performance, achieving a precision of 61% at producing the developer fix when suggesting fixes for 46% of the errors in the data set. Additionally, Graph2Diff has much better performance than prior systems, including one called DeepDelta.

Machine creativity: Sometimes, Graph2Diff comes up with fixes that work more effectively than those proposed by humans – “we show that in some cases where the proposed fix does not match the developer’s fix, the proposed fix is actually preferable”.

Why this matters: In a few years, the software underbellies of large organizations could seem more like living creatures than static (pun intended!) entities. Work like this shows how we can apply deep learning techniques to (preliminary) problems of code identification and augmentation. Eventually, such techniques might automatically repair – and improve – increasingly large codebases, giving the corporations of the future an adaptive, emergent, semi-sentient code immune system. “We hope that fixing build errors is a stepping stone to related code editing problems: there is a natural progression from fixing build errors to other software maintenance tasks that require generating larger code changes”.
   Read more: Learning to Fix Build Errors with Graph2Diff Neural Networks (Arxiv).

####################################################

Systems for seeing the world – making camera traps more efficient with deep learning:
…Or, how nature may be watched over by teams of humans&machines…
Once you can measure something, you can more easily gather data about it. When you’re dealing with something exhibiting sickness, data is key. The world’s biosphere is currently exhibiting sickness in a number of domains – one of them being the decline of various animal populations. But recent advances in AI are giving us tools to let us measure this decline, equipping us with the information we need to take action. 

   Now, a team of researchers from Microsoft, the University of Wyoming, the California Institute of Technology, and Uber AI has designed a human-machine hybrid system for efficiently labeling images seen by camera traps in wildlife areas, allowing them to create systems that can semi-autonomously monitor and catalog the animals strewn across vast, thinly populated environments. The goal of the work is to “enable camera trap projects with few labeled images to take advantage of deep neural networks for fast, transferable, automatic information extraction”, allowing scientists to cheaply classify and count the animals seen in images from the wild. Specifically, their system uses “transfer learning and active learning to concurrently help with the transferability issue, multi-species images, inaccurate counting, and limited-data problems”.

Ultra-efficient systems for wildlife categorization: The researchers’ system gets around 91% accuracy at categorizing images on a test dataset, while using 99.5% less data than a prior system developed by the same researchers, they write. (Note that when you dig into the scores there’s a meaningful differential, with this system getting 91% accuracy, versus around 93-95% for the best performance of their prior systems.) 

How it works: The animal surveillance system has a few components. First, it uses a pre-trained image model to work out if an image is empty or contains animals; if the system assigns a 90%+ probability to the image containing an animal, it tries to count the number of distinct entities in the location that it thinks are animals. It then automatically crops these images to focus on the animals, then converts these image crops into feature representations, which lets it smush all the images together into a shared, multi-dimensional embedding space. It then compares these embeddings with those already in its pre-trained model and works out where to put them, allowing it to assign labels to the images.
   Periodically, the model selects 1,000 random images and requests labels from a human, who labels the images, which are then converted into feature representations, and the model is subsequently re-trained against these new feature vectors. This basically allows the researchers to use pre-canned image networks with a relatively small amount of new data, relying on humans to accurately label small amounts of real-world images, which lets them recalibrate the model. 
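
A minimal sketch of that loop, assuming a frozen off-the-shelf backbone for the embeddings, a light classifier on top, and a hypothetical ask_human callback standing in for the annotators (this is my own simplification, not the authors' code):

```python
# Minimal sketch of the embed-then-periodically-retrain loop described above.
# My own simplification, not the authors' code: a frozen pre-trained CNN
# produces feature vectors, and a light classifier is re-fit whenever a fresh
# batch of human labels arrives. `ask_human` is a hypothetical callback.
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # strip the classifier head -> 512-d embeddings
backbone.eval()

def embed(images):
    """images: (N, 3, 224, 224) float tensor -> (N, 512) numpy feature vectors."""
    with torch.no_grad():
        return backbone(images).numpy()

species_classifier = LogisticRegression(max_iter=1000)

def active_learning_round(unlabeled_images, ask_human, batch_size=1000):
    """Pick a random batch, get human labels, re-fit the species classifier."""
    idx = np.random.choice(len(unlabeled_images),
                           size=min(batch_size, len(unlabeled_images)),
                           replace=False)
    batch = unlabeled_images[idx]
    labels = ask_human(batch)                        # human annotators label the batch
    species_classifier.fit(embed(batch), labels)     # re-train on new feature vectors
    return species_classifier
```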

What comes next? The researchers say there are three promising mechanisms for improving this system further. These include: tweaking hyperparameters or using different neural net architectures for the system; extending the human-labeling system so humans also generate bounding boxes, which could iteratively improve detector performance; and gathering enough data to combine the classification and detection stages in one model. 

Why this matters: Deep learning has a lot in common with plumbing: a good plumber knows how to chain together various distinct components to let something flow from a beginning to an end. In the case of the plumber, the goal is to push a liquid efficiently to a major liquid thoroughfare, like a sewer. For an AI researcher, the goal is to efficiently push information through a series of distinct modules, optimizing for a desired output at the other end. With papers like this, we’re able to see what an end-to-end AI-infused pipeline for analyzing the world looks like. 

Along with this, the use of pre-trained models implies something about the world we’re operating in: It paints a picture of a world where researchers train large networks that they can, proverbially, write once / run anywhere, which are then paired with new datasets and/or networks created by domain experts for solving specific problems, like camera trap identification. As we train ever-larger and more-powerful networks, we’ll see them plug-in to domain-specific systems like the one outlined above.
   Read more: A deep active learning system for species identification and counting in camera trap images (Arxiv).
   Read the prior research that this paper builds on: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning (PNAS, June, 2018).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Interim Report from National Security Commission on AI:
The NSCAI has delivered its first report to Congress. The Commission was launched in March, with a mandate to look at how the US could use AI development to serve national security needs, and is made up of leaders from tech and government. The report points to a number of areas where US policy may be inadequate to preserve the US’ dominance in AI, and proposes measures to address this.

Five lines of effort:

   (1) Invest in AI R&D – current levels of federal support and funding for AI research are too low for the US to remain competitive with China and others.

   (2) Apply AI to National Security Missions – the military must work more effectively with industry partners to capitalize on new AI capabilities.

   (3) Train and Recruit AI Talent – the military and government must do better in attracting and training AI talent.

   (4) Protect and Build Upon US Technology Advantages – the US should maintain its lead in AI hardware manufacturing and take measures to better protect intellectual property relevant to AI.

   (5) Marshal Global AI Cooperation – the US must advance global cooperation on AI, particularly on ethics and safety. 

Ethics: They emphasize that it is a priority that AI is developed and deployed ethically. They point out that this is important both to ensure that AI is beneficial, and to help the US maintain its competitive lead, since strong ethical commitments will help the military attract talent and forge collaborations with industry. 

Why it matters: This report and last week’s DoD ethics principles (see Import #171) shed important light on the direction of US policy on AI. While the report is focused primarily on how the US can sustain its competitive lead in AI, and military dominance, it does foreground the importance of ethical and safe AI development, and the need for international cooperation to secure the full benefits of AI.
   Read more: Interim Report from the National Security Commission on AI.

####################################################

Research Fellowship for safe and ethical AI:
The Alan Turing Institute (ATI), based in London, is looking for a mid-career or senior academic to work on safe and ethical AI, starting in October 2020 or earlier.
   Read more: Safe and Ethical AI Research Fellow (ATI).

####################################################

Tech Tales:

I Don’t Say It / I Do Say It 

Oh, she is cute! I think you should go for the salad opener.
Salad, really?
My intuition says she’d find it amusing. Give it a try.
Ok. 

I use the salad opener. It works. She sends me some flowers carried by a little software cat who walks all over my phone and leaves a trail of petals behind it. We talk a bit more. She tells me when she was a kid she used to get so excited running down the street she’d bump into things and fall over. I get some advice and tell her that when I was young I’d sometimes insist on having “bedtime soup” and I’d sometimes get so sleepy while eating it I’d fall asleep and wake up with soup all over the bed.

I think you should either ask about her family or ask her if she can guess anything about your family background.
Seems a little try-hard to me.
Trust me, my prediction is that she will like it.
Ok. 

I asked her to tell me about her family and she told me stories about them. It went well and I was encouraged to ask her to ask about my family background. I asked. She asked if my parents had been psychologists, because I was “such a good conversationalist”. I didn’t lie but I didn’t tell the truth. I changed subjects. 

We kept on like this, trading conversations; me at the behest of my on-device AI advisor, her either of her own volition or at the behest of her own AI tool. It’s not polite to ask and people don’t tell. 

When we met up in the world it was beautiful and exciting and we clicked because we felt so comfortable with each other, thanks to our conversation. On the way to the date I saw a billboard advert for a new bone-conduction microphone/speaker that, the advert said, could be injected into your jaw, letting you sub-vocalize instructions to your own AI system, and hear your own AI system as a secret voice in your head. 

We stared at each other before we kissed and I think both of us were looking to see if the other person was distracted, perhaps by their own invisible advisor. Neither of us seemed to be able to tell. We kissed and it was beautiful and felt right. 

Things that inspired this story: Chatbots; Learning from human preferences; smartphone apps; cognitive augmentation via AI; intimacy and prediction.