Import AI

Import AI 172: Google codes AI to fix errors in Google’s code; Amazon makes mini self-driving cars with DeepRacer; and Microsoft uses GPT-2 to make auto-suggest for coders

Microsoft wants to use internet-scale language models to make programmers fitter, happier, and more productive:
…Code + GPT-2 = Auto-complete for programmers…
Microsoft has used recent advances in language understanding to build a smart auto-complete function for programmers. The software company announced its Visual Studio “IntelliCode” feature at its Microsoft Ignite conference in November. The technology, which is inspired by language models like GPT-2, “extracts statistical coding patterns and learns the intricacies of programming languages from GitHub repos to assist developers in their coding,” the company says. “Based on code context, as you type, IntelliCode uses that semantic information and sourced patterns to predict the most likely completion in-line with your code.” Other people have experimented with applying large language models to problems of code prediction, including a startup called TabNine which released a GPT-2-based code completer earlier this summer. 
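
To make this concrete, here’s a minimal sketch of the underlying technique – using an off-the-shelf GPT-2 model (via the open-source Hugging Face ‘transformers’ library) to propose a completion for a snippet of code. This illustrates the general idea rather than IntelliCode’s or TabNine’s actual implementations, and a base GPT-2 model will give much worse suggestions than one fine-tuned on source code.

    # Minimal sketch: code completion with a GPT-2-style language model.
    # Illustrates the general technique, NOT IntelliCode's or TabNine's system.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "def read_json(path):\n    with open(path) as f:\n        return "
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedily extend the prompt by a handful of tokens to form a suggestion.
    outputs = model.generate(
        **inputs,
        max_new_tokens=12,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    suggestion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
    print(suggestion)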

Why this matters: Recent advances in language models are making it easy for us to build big, predictive models for any sort of information that can be framed as a text processing problem. That means in the coming years we’re going to develop systems that can predict pleasing sequences of words, code scripts, and (though this is only just beginning to happen) sequences of chemical compounds and other things. As this technology matures, I expect people will start using such prediction tools to augment their own intelligence, pairing human intuition with big internet-scale predictive models for given domains. The cyborgs will soon be among us – and they’ll be helping to do code review!
   Read more: Re-imagining developer productivity with AI-assisted tools (Microsoft).
   Try out the feature in Microsoft’s latest Visual Studio Preview (official Microsoft webpage).
   Read more about TabNine’s tech – Autocompletion with deep learning (TabNine blog).

####################################################

So, how do we feel about all this surveillance AI stuff we’re developing?
…Reddit thread gives us a view into how the machine learning community thinks about ethical tradeoffs…
AI, or – more narrowly – deep learning, is a utility-class technology; it has a vast range of applications and many of these are being explored by developers around the world. So, how do we feel about the fact that some of these applications are focused on surveillance? And how do we feel about the fact that a small number of nation states are enthusiastically adopting AI-based surveillance technologies in the service of surveilling their citizens? That’s a question that some parts of the AI research community are beginning to ponder, and a recent thread on Reddit dramatizes this by noting just how many surveillance-oriented applications seem to come from Chinese labs (which makes sense, given that China is probably the world’s most significant developer and deployer of surveillance AI systems). 

Many reactions, few solutions: In the thread, users of the r/machinelearning subreddit share their thoughts on the issue. Responses range from (paraphrased) “it’s all science, it’s not our job to think about second-order effects”, to “this question is indicative of absurd paranoia about China”, to “yes, China does a lot of this, but what about the US?”. The volume and diversity of responses gives us a sense of how thorny an issue this is for many ML researchers. 

Dual-use technologies: A big issue with surveillance AI is that it has a range of uses, some of which are quite positive. “For what it’s worth, I work in animal re-identification and the technologies that are applied and perfected in humans are slowly making their way to help monitor endangered animal populations,” one commenter writes. “It is our responsibility to call out unethical practices but also to not lose sight of all the social good that can come from ML research.”
   Read more: ICCV – 19 – The state of (some) ethically questionable papers (Reddit/r/machinelearning).

####################################################

Stanford researchers give simulated robots the sensation of touch:
…Sim2real + simulated robots + high-fidelity environments + interaction, oh my!…
Researchers with the Stanford AI Lab have extended their ‘Gibson’ robot simulation software to support interactive objects, making it possible for researchers to use Gibson to train simulated AI agents to interact with the world around them. Because the Gibson simulator (first covered: Import AI 111) supports high-fidelity graphics, it may be possible to transfer agents trained in Gibson into reality (though that’s more likely to be successful for pure visual perception tasks, rather than manipulation). 

Faster, Gibson! The researchers have also made Gibson faster – the first version of Gibson rendered scenes at between 25 and 40 frames per second (FPS) on modern GPUs. That’s barely good enough for a standard computer game being played by a human, and wildly inefficient for AI research, where agents are typically so sample inefficient that it’s much better to have simulators that can run at thousands of FPS. In Interactive Gibson, the researchers implement a high-performance mesh renderer written in Python and C++, improving throughput to ~1,000 FPS at a 256x256 scene resolution – this is pretty good and should make the platform more attractive to researchers. 

Interactive Gibson Benchmark: If you want to test out how well your agents can perform in the new, improved Gibson, you can investigate a benchmark challenge created by the researchers. This challenge augments 106 existing Gibson scenes with 1984 interactable instances of five objects: chairs, desks, doors, sofas, and tables. Because Gibson consists of over 211,000 square meters of simulated indoor space, it’s not feasible to have human annotators go through it and work out where to put new objects; instead, the Gibson researchers create an AI-infused object-generation system that scans over the entire dataset and proposes objects it can add to scenes, then checks with humans as to whether its suggestions are appropriate. I think it’s interesting how common it is becoming to use ML techniques to semi-automatically enhance ML-oriented datasets.    

What does success mean in Gibson? As many AI researchers know, goal specification is always a hard problem when developing AI tasks and challenges. So, how can we assess whether we’re making progress in the Gibson environment? The developers propose a metric called Interactive Navigation Score (INS) that measures two dimensions of an embodied AI agent’s efficiency: path efficiency (how close the distance the agent travels is to the shortest path to its goal), and effort efficiency (how much energy the agent needs to expend to achieve its goal, eg by moving its own body or manipulating objects in the environment).
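
The paper defines INS precisely; the sketch below just illustrates the flavor of such a metric – a weighted combination of path efficiency and effort efficiency – with the weighting and normalization being my own assumptions rather than the authors’ formula.

    # Hedged sketch of an Interactive Navigation Score-style metric. The exact
    # formula is defined in the Interactive Gibson paper; here we just combine
    # path efficiency with effort efficiency. Weights and forms are assumptions.
    def path_efficiency(shortest_path_len, actual_path_len):
        return min(1.0, shortest_path_len / max(actual_path_len, 1e-8))

    def effort_efficiency(min_energy, actual_energy):
        return min(1.0, min_energy / max(actual_energy, 1e-8))

    def interactive_navigation_score(shortest_path_len, actual_path_len,
                                     min_energy, actual_energy, alpha=0.5):
        # alpha trades off path efficiency against effort efficiency (assumed form).
        return (alpha * path_efficiency(shortest_path_len, actual_path_len)
                + (1 - alpha) * effort_efficiency(min_energy, actual_energy))

    print(interactive_navigation_score(10.0, 14.2, 5.0, 9.3))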

The robot agents of Gibson: Having a world you can interact with is pretty pointless if you don’t have a body to use to interact with the world, so the Gibson team has also implemented several simulated robots that researchers can use within Gibson.
  These robots include: 

  • Two widely-used simulated agents (the Mujoco humanoid and ant bots)
  • Four wheeled navigation agents (Freight, JackRabbot v1, Husky, Turtlebot v2)
  • Two mobile manipulators with arms (Fetch, JackRabbot v2)
  • A quadcopter drone (specifically, a Quadrotor)

Why this matters: As I’ve written in this newsletter before, the worlds of robotics and of AI are becoming increasingly intermingled. The $1 trillion question is at what point both technologies combine, mature, and yield capabilities greater than the sum of their respective parts. What might the world be like if it became trivial to train agents to interact with the physical world in general, intelligent ways? Pretty different, I’d say! Systems like the Interactive Gibson Environment will help researchers generate insights from successes and failures to get us closer to that mysterious, different world.
   Read more: Interactive Gibson: A Benchmark for Interactive Navigation in Cluttered Environments (Arxiv)

####################################################

Want to build self-driving cars without needing real cars? Try Amazon’s “DeepRacer” robot:
…1:18th scale robot car gives developers an easy way to prototype self-driving car technology…
How will deep learning change how robots experience, navigate, and interact with the world? Most AI researchers assume the technology will dramatically improve robot performance in a bunch of domains. How can we assess if this is going to happen? One of the best approaches is testing out DL techniques on real-world robots. That’s why it’s exciting to see Amazon publish details about its “DeepRacer” robot car, a pint-size 1:18th scale vehicle that developers can use to develop robust, self-driving AI algorithms. 

What is DeepRacer: DeepRacer is a 1/18th scale robot car, designed to demonstrate how developers can use Amazon Web Services to build robots that do intelligent things in the world. Amazon is also hosting a DeepRacer racing league, bringing developers together at Amazon events to compete with each other to see who can develop the smartest systems for self-racing cars.

How to DeepRace: It’s possible to use contemporary AI algorithms to train DeepRacer vehicles to complete track circuits, Amazon writes in the research paper. Specifically, the company shows how to train a system via PPO to complete racing tracks, and provides a study showing how developers can augment data and tweak hyperparameters to get good performance out of their vehicle. They also highlight the value of training in simulation across a variety of different track types, then transferring the trained policy into reality. 
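
For a flavor of what developers actually write, here’s a sketch of a simple centerline-following reward function in the style of the DeepRacer reward-function interface; the parameter names follow AWS’s public documentation, but treat the details as illustrative rather than a recipe from the paper.

    # Sketch of a centerline-following reward function in the style developers
    # write for DeepRacer training. Parameter names follow AWS's documented
    # interface; the thresholds and values here are illustrative only.
    def reward_function(params):
        if not params["all_wheels_on_track"]:
            return 1e-3  # heavily penalize leaving the track

        track_width = params["track_width"]
        distance_from_center = params["distance_from_center"]

        # Reward staying close to the centerline, tapering off toward the edges.
        if distance_from_center <= 0.1 * track_width:
            return 1.0
        elif distance_from_center <= 0.25 * track_width:
            return 0.5
        elif distance_from_center <= 0.5 * track_width:
            return 0.1
        return 1e-3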

What goes into a DeepRacer car? A 1/18th scale four-wheel-drive car, an Intel Atom processor with a built-in GPU, 4GB of RAM and 32GB of (expandable) storage, a 13600 mAh compute battery (which lasts around six hours), a 1100 mAh drive battery, wifi, and a 4MP camera. “We have designed the car for experimentation while keeping the cost nominal,” they write. 

Why this matters: There’s a big difference between “works in simulation” and “works in reality”, as most AI researchers might tell you. Therefore, having low-cost ways of testing out ideas in reality will help researchers figure out which approaches are sufficiently robust to withstand the ever-changing dynamics of the real world. I look forward to watching progress in the DeepRacer league and I think, if this platform ends up being widely used, we’ll also learn something about the evolution of robotics hardware by looking at various successive iterations of the design of the DeepRacer vehicle itself. Drive on, Amazon!
   Read more: DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning (Arxiv)

####################################################

Google trains an AI to automatically patch errors in Google’s code:
…The era of the self-learning, self-modifying company approaches…
Google researchers have developed Graph2Diff networks, a neural network system that aims to make it easy for researchers to train AI systems to analyze and edit code. With Graph2Diff, the researchers hope to “do for code-editing tasks what the celebrated Seq2Seq abstraction has done for natural language processing”. Seq2Seq, for those who don’t follow AI research with the same fascination as train spotters follow trains, is the technology that went on to underpin Google’s “Smart Reply” system that automatically composes email responses. 

How Google uses Graph2Diff: For this research, Google gathered code snippets linked to approximately 500,000 build errors collected across Google. These build errors are basically the software logs of what happens when Google’s code build systems fail, and they’re tricky bits of data to work with as they involve multiple layers of abstraction, and frequently the way to fix the code is by editing code in a different location to where the error was observed. Using Graph2Diff, the researchers turn this into a gigantic machine learning problem: “We represent source code, build configuration files, and compiler diagnostic messages as a graph, and then use a Graph Neural Network model to predict a diff,” they write. 
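
Here’s a toy sketch of that representation – tokens and a compiler diagnostic as nodes in a graph, with one round of neural message passing over it. It’s meant to illustrate the shape of the idea, not Google’s actual model, and every detail (embedding size, weights, the diff decoder it gestures at) is invented.

    # Toy sketch of the "code + diagnostics as a graph, predict a diff" idea.
    # Illustrates the data representation and one round of message passing,
    # not Google's actual Graph2Diff model.
    import numpy as np

    # Nodes: source tokens plus a compiler-diagnostic node; edges link adjacent
    # tokens to each other and the diagnostic to the token it mentions.
    nodes = ["def", "foo", "(", "x", ")", ":", "return", "y",
             "<DIAGNOSTIC: 'y' undefined>"]
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (8, 7)]

    rng = np.random.default_rng(0)
    h = rng.normal(size=(len(nodes), 16))   # initial node embeddings
    W = rng.normal(size=(16, 16)) * 0.1     # shared message weight matrix

    # One step of message passing: each node averages transformed neighbor states.
    msgs = np.zeros_like(h)
    counts = np.zeros(len(nodes))
    for a, b in edges:
        msgs[a] += h[b] @ W
        msgs[b] += h[a] @ W
        counts[a] += 1
        counts[b] += 1
    h = np.tanh(h + msgs / np.maximum(counts[:, None], 1))

    # A real system would decode a diff (eg, replace token 7 'y' with 'x') from
    # the updated node states; here we just inspect the buggy token's state.
    print(h[7][:4])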

What’s hard about this? Google analyzed some of the code errors seen in its data and laid out a few reasons why code-editing is a challenging problem. These include: variable misuse; source code is context-dependent, so a fix that works in one place probably won’t work well in another; edit scripts can vary in length; fixes often don’t occur at the same place as the diagnostic (in 36% of cases the fix requires changing a line the diagnostic doesn’t point to); there can be multiple diagnostics; and single fixes can span multiple locations.

Can AI learn to code? When they test out their approach, the researchers find optimized versions of it can obtain accuracies of 28% at predicting the correct length of a code sequence. In some circumstances, they can have even better performance, achieving a precision of 61% at producing the developer fix when suggesting fixes for 46% of the errors in the data set. Additionally, Graph2Diff has much better performance than prior systems, including one called DeepDelta.

Machine creativity: Sometimes, Graph2Diff comes up with fixes that work more effectively than those proposed by humans – “we show that in some cases where the proposed fix does not match the developer’s fix, the proposed fix is actually preferable”.

Why this matters: In a few years, the software underbellies of large organizations could seem more like living creatures than static (pun intended!) entities. Work like this shows how we can apply deep learning techniques to (preliminary) problems of code identification and augmentation. Eventually, such techniques might automatically repair and – eventually – improve increasingly large codebases, giving the corporations of the future an adaptive, emergent, semi-sentient code immune system. “We hope that fixing build errors is a stepping stone to related code editing problems: there is a natural progression from fixing build errors to other software maintenance tasks that require generating larger code changes”.
   Read more: Learning to Fix Build Errors with Graph2Diff Neural Networks (Arxiv)

####################################################

Systems for seeing the world – making camera traps more efficient with deep learning:
…Or, how nature may be watched over by teams of humans&machines…
Once you can measure something, you can more easily gather data about it – and when you’re dealing with something that’s sick, data is key. The world’s biosphere is currently exhibiting sickness in a number of domains – one of them being the decline of various animal populations. But recent advances in AI are giving us tools to measure this decline, equipping us with the information we need to take action. 

   Now, a team of researchers with Microsoft, the University of Wyoming, the California Institute of Technology, and Uber AI has designed a human-machine hybrid system for efficiently labeling images seen by camera traps in wildlife areas, allowing them to create systems that can semi-autonomously monitor and catalog the animals strewn across vast, thinly populated environments. The goal of the work is to “enable camera trap projects with few labeled images to take advantage of deep neural networks for fast, transferable, automatic information extraction”, allowing scientists to cheaply classify and count the animals seen in images from the wild. Specifically, their system uses “transfer learning and active learning to concurrently help with the transferability issue, multi-species images, inaccurate counting, and limited-data problems”.

Ultra-efficient systems for wildlife categorization: The researchers’ system gets around 91% accuracy at categorizing images on a test dataset, while using 99.5% less data than a prior system developed by the same researchers, they write. (Note that when you dig into the scores there’s a meaningful differential, with this system getting 91% accuracy, versus around 93-95% for the best performance of their prior systems.) 

How it works: The animal surveillance system has a few components. First, it uses a pre-trained image model to work out if an image is empty or contains animals; if the system assigns a 90%+ probability to the image containing an animal, it tries to count the number of distinct entities in the image that it thinks are animals. It then automatically crops these images to focus on the animals, then converts these crops into feature representations, which lets it smush all the images together into an interrelated multi-dimensional embedding. It then compares these embeddings with those already in its pre-trained model and works out where to put them, allowing it to assign labels to the images.
   Periodically, the model selects 1,000 random images and requests labels from a human, who labels the images, which are then converted into feature representations, and the model is subsequently re-trained against these new feature vectors. This basically allows the researchers to use pre-canned image networks with a relatively small amount of new data, relying on humans to accurately label small amounts of real-world images which lets them recalibrate the model. 
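
Here’s a hedged sketch of that loop – embed images with a pre-trained network, periodically ask a human for labels, and retrain a lightweight classifier on the embeddings – using scikit-learn stand-ins and fabricated data rather than the paper’s actual models.

    # Hedged sketch of the embed -> query humans -> retrain loop described above,
    # using scikit-learn stand-ins for the paper's actual networks.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def embed(images):
        # Stand-in for a pre-trained CNN feature extractor.
        return rng.normal(size=(len(images), 128))

    def ask_human_for_labels(indices):
        # Stand-in for the periodic human labeling step (1,000 images per round
        # in the paper); here we fabricate labels purely for illustration.
        return rng.integers(0, 2, size=len(indices))

    unlabeled_images = list(range(10_000))
    features = embed(unlabeled_images)
    labeled_idx, labels = [], []
    classifier = LogisticRegression(max_iter=1000)

    for round_ in range(3):
        # Select a random batch of images and request human labels for them.
        batch = rng.choice(len(unlabeled_images), size=1000, replace=False)
        labeled_idx.extend(batch.tolist())
        labels.extend(ask_human_for_labels(batch).tolist())
        # Re-train the classifier on the embeddings of everything labeled so far.
        classifier.fit(features[labeled_idx], labels)
        print(f"round {round_}: trained on {len(labeled_idx)} labeled images")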

What comes next? The researchers say there are three promising mechanisms for improving this system further: tweaking hyperparameters or using different neural net architectures; extending the human-labeling system so humans also generate bounding boxes, which could iteratively improve detector performance; and gathering enough data to combine the classification and detection stages in one model. 

Why this matters: Deep learning has a lot in common with plumbing: a good plumber knows how to chain together various distinct components to let something flow from a beginning to an end. In the case of the plumber, the goal is to push a liquid efficiently to a major liquid thoroughfare, like a sewer. For an AI researcher, the goal is to efficiently push information through a series of distinct modules, optimizing for a desired output at the other end. With papers like this, we’re able to see what an end-to-end AI-infused pipeline for analyzing the world looks like. 

Along with this, the use of pre-trained models implies something about the world we’re operating in: It paints a picture of a world where researchers train large networks that they can, proverbially, write once / run anywhere, which are then paired with new datasets and/or networks created by domain experts for solving specific problems, like camera trap identification. As we train ever-larger and more-powerful networks, we’ll see them plug-in to domain-specific systems like the one outlined above.
   Read more: A deep active learning system for species identification and counting in camera trap images (Arxiv).
   Read the prior research that this paper builds on: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning (PNAS, June, 2018).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Interim Report from National Security Commission on AI:
The NSCAI has delivered its first report to Congress. The Commission was launched in March, with a mandate to look at how the US could use AI development to serve national security needs, and is made up of leaders from tech and government. The report points to a number of areas where US policy may be inadequate to preserve US dominance in AI, and proposes measures to address this.

Five lines of effort:

   (1) Invest in AI R&D – current levels of federal support and funding for AI research are too low for the US to remain competitive with China and others.

   (2) Apply AI to National Security Missions – the military must work more effectively with industry partners to capitalize on new AI capabilities.

   (3) Train and Recruit AI Talent – the military and government must do better in attracting and training AI talent.

   (4) Protect and Build Upon US Technology Advantages – the US should maintain its lead in AI hardware manufacturing and take measures to better protect intellectual property relevant to AI.

   (5) Marshal Global AI Cooperation – the US must advance global cooperation on AI, particularly on ethics and safety. 

Ethics: They emphasize that it is a priority that AI is developed and deployed ethically. They point out that this is important both to ensure that AI is beneficial, and to help the US maintain its competitive lead, since strong ethical commitments will help the military attract talent and forge collaborations with industry. 

Why it matters: This report and last week’s DoD ethics principles (see Import #171) shed important light on the direction of US policy on AI. While the report is focused primarily on how the US can sustain its competitive lead in AI, and military dominance, it does foreground the importance of ethical and safe AI development, and the need for international cooperation to secure the full benefits of AI.
   Read more: Interim Report from the National Security Commission on AI.

####################################################

Research Fellowship for safe and ethical AI:
The Alan Turing Institute (ATI), based in London, is looking for a mid-career or senior academic to work on safe and ethical AI, starting in October 2020 or earlier.
   Read more: Safe and Ethical AI Research Fellow (ATI).

####################################################

Tech Tales:

I Don’t Say It / I Do Say It 

Oh, she is cute! I think you should go for the salad opener.
Salad, really?
My intuition says she’d find it amusing. Give it a try.
Ok. 

I use the salad opener. It works. She sends me some flowers carried by a little software cat who walks all over my phone and leaves a trail of petals behind it. We talk a bit more. She tells me when she was a kid she used to get so excited running down the street she’d bump into things and fall over. I get some advice and tell her that when I was young I’d sometimes insist on having “bedtime soup” and I’d sometimes get so sleepy while eating it I’d fall asleep and wake up with soup all over the bed.

I think you should either ask about her family or ask her if she can guess anything about your family background.
Seems a little try-hard to me.
Trust me, my prediction is that she will like it.
Ok. 

I asked her to tell me about her family and she told me stories about them. It went well and I was encouraged to ask her to ask about my family background. I asked. She asked if my parents had been psychologists, because I was “such a good conversationalist”. I didn’t lie but I didn’t tell the truth. I changed subjects. 

We kept on like this, trading conversations; me at the behest of my on-device AI advisor, her either of her own volition or with the help of her own AI tool. It’s not polite to ask and people don’t tell. 

When we met up in the world it was beautiful and exciting and we clicked because we felt so comfortable with each other, thanks to our conversation. On the way to the date I saw a billboard advert for a new bone-conduction microphone/speaker that, the advert said, could be injected into your jaw, letting you sub-vocalize instructions to your own AI system, and hear your own AI system as a secret voice in your head. 

We stared at each other before we kissed and I think both of us were looking to see if the other person was distracted, perhaps by their own invisible advisor. Neither of us seemed to be able to tell. We kissed and it was beautiful and felt right. 

Things that inspired this story: Chatbots; Learning from human preferences; smartphone apps; cognitive augmentation via AI; intimacy and prediction. 

Import AI 171: When will robotics have its ImageNet moment?; fooling surveillance AI with an ‘adversarial t-shirt’, and Stanford calls for $12bn a year in funding for a US national endeavor

What do we mean when we say a machine can “understand” something?
…And does it matter if we do or don’t know what we mean here?…
AI professor Tom Dietterich has tackled the thorny question of trying to define what it means for a machine to “understand” something – by saying maybe this question doesn’t matter. 

Who cares about understanding? “I believe we should pursue advances in the science and technology of AI without engaging in debates about what counts as “genuine” understanding,” he says. “I encourage us instead to focus on which system capabilities we should be trying to achieve in the next 5, 10, or 50 years”. 

Why this matters: One of the joys and problems with AI is how broad a subject it is, but this is also a source of tension – I think the specific tension comes from the mushing together of a community that runs on an engineering-centric model of progress, where researchers compete with each other to iteratively hill-climb on various state-of-the-art leaderboards, and a more philosophical community that wants to take a step back and ask fundamental questions, like what it may mean to “understand” things and whether today’s systems exhibit this or not. I think this is a productive tension, but it can sometimes yield arguments or debates that seem like sideshows to the main event of building iteratively more intelligent systems.
   “We must suppress the hype surrounding new advances, and we must objectively measure the ways in which our systems do and do not understand their users, their goals, and the broader world in which they operate,” he writes. “Let’s stop dismissing our successes as “fake” and not “genuine”, and let’s continue to move forward with honesty and productive self-criticism”.
   Read more: What does it mean for a machine to “understand”? (Tom Dietterich, Medium)

####################################################

What’s the secret to creating a strong American AI ecosystem? $12 billion a year, say Stanford leaders:
…Policy proposal calls for education, research, and entrepreneurial funding…
If the American government wants the USA to lead in AI, then the government should invest $12 billion into AI every year for at least a decade, according to a policy proposal from Fei-Fei Li and John Etchemendy – directors of Stanford’s Human-Centered Artificial Intelligence initiative.

How to spend $12 billion a year: Specifically, the government should invest $7 billion a year into “public research to pursue the next generation of AI breakthroughs”, along with $3 billion a year into education and $2 billion into funds to support early-stage AI entrepreneurs. To put these numbers into perspective, a NITRD report recently estimated that the federal government budgeted about $1 billion a year in non-defense programs related to AI, so the Stanford proposal is calling for a significant increase in AI spending, however you slice and dice the figures. 

Money + principles: Along with this, the directors ask the US government to “implement clear, actionable international standards and guidelines for the ethical use of AI”. (In fairness to the US government, the government has participated in the creation of the OECD AI principles, which were adopted in 2019 by OECD member countries and other states, including Brazil, Peru, and Romania.)

Why this matters: The 21st century is the era of the industrialization of AI, and the industrialization of AI demands capital in the same way industrialization in the 18th and 19th centuries demanded capital. Therefore, if governments want to lead in AI, they’ll need to dramatically increase spending on fundamental AI research as well as initiatives like better AI education. In the words of commentators of sports matches when a team is in a good position at the start of the second half of the game: it’s the US’s game to lose!
   Read more: We Need a National Vision for AI (Human-Centered Artificial Intelligence).
   Read more: The Networking and Information Technology Research & Development Program Supplement to the President’s FY2020 Budget (WhiteHouse.gov, PDF)

####################################################

Fundamental failures and machine learning:
…Towards a taxonomy of machine failures…
Researchers with the Università della Svizzera Italiana in Switzerland have put together a taxonomy of some of the common failures seen in AI systems programmed in TensorFlow, PyTorch, and Keras. The difference with this taxonomy is the amount of research that has gone into it: to build it, the researchers analyzed 477 StackOverflow discussions, 271 issues and pull requests (PRs), and 311 commits from GitHub repositories, and conducted interviews with 20 researchers and practitioners. 

A taxonomy of failure: So, what failures are common in deep learning? There are five top-level categories, three of which are divided into subcategories. These are:

  • Model: The ML model itself is, unsurprisingly, a common source of failures, with developers frequently running into failures that occur at the level of a layer within the network. These include: problems relating to missing or redundant layers, incorrect layer properties (eg, sample size, input/output format, etc), and activation functions.
  • Training: Training runs are finicky, problem-laden things, and the common failures here include bad hyperparameter selection, misspecified loss functions, bad data splits between training and testing, optimizer problems, bad training data, crappy training procedures (eg, poor memory management during training), and more. 
  • GPU usage: As anyone who has spent hours fiddling around with NVIDIA drivers can attest, GPUs are machines sent from hell to drive AI researchers mad. Faustian boxes, if you will. Have you ever seen someone with multiple PhDs break down after spending half a day trying to debug a problem caused by an NVIDIA card’s software playing funny games with a Linux distro? I have. (AMD: Please ramp up your AI GPU business faster to provide better competition to NVIDIA here.) 
  • API: These problems are what happens when developers use APIs badly, or improperly. 
  • Tensors & Inputs: Misshapen tensors are a frequent problem, as are mis-specified inputs.
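
To make the last category concrete, here’s a minimal example of a classic tensors-and-inputs failure – a shape mismatch between a layer and the data fed into it – along with the one-line fix. (This is an illustrative PyTorch snippet, not one of the faults catalogued in the paper.)

    # A minimal example of the "Tensors & Inputs" failure class: a shape
    # mismatch between a layer and the data fed to it, plus the one-line fix.
    import torch
    import torch.nn as nn

    batch = torch.randn(32, 3, 28, 28)          # NCHW image batch
    layer = nn.Linear(3 * 28 * 28, 10)

    # Buggy: feeding the 4-D tensor straight into a Linear layer fails, because
    # nn.Linear expects the last dimension to be 3*28*28 = 2352, not 28.
    try:
        layer(batch)
    except RuntimeError as e:
        print("shape error:", e)

    # Fix: flatten the spatial dimensions first.
    logits = layer(batch.flatten(start_dim=1))   # shape (32, 10)
    print(logits.shape)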

Why this matters: For AI to industrialize, AI processes need to become more repeatable and describable, in the same way that artisanal manufacturing processes were transformed into repeatable documented processes via Taylorism. Papers like this create more pressure for standardization within AI, which prefigures industrialization and societal-scale deployments.
   Read more: Taxonomy of Real Faults in Deep Learning Systems (Arxiv).

####################################################

Want your robot to be friends with people? You might want this dataset:
…JackRabbot dataset comes with benchmarks, more than an hour of data…
Researchers with the Stanford Vision and Learning Laboratory have built JRDB, a robot-collected dataset meant to help researchers develop smarter, more social robots. The dataset consists of tons of video footage recorded by the Stanford-developed ‘JackRabbot’ social navigation robot as it travels around campus, with detailed annotations of all the people it encounters en route. Ideally, JRDB can help us build robots that can navigate the world without crashing into the people around them. Seems useful!

What’s special about the data?
JRDB data consists of 54 action sequences with the following data for each sequence: video streams at 15fps from stereo cylindrical 360-degree cameras; continuous 3D point clouds gathered via two Velodyne LiDAR scanners; line 3D point clouds gathered via two Sick LiDARs; an audio signal; and encoder values from the robot’s wheels. All the pedestrians the JackRabbot encounters on its travels are labeled with 2D and 3D bounding boxes. 

Can you beat the JRDB challenge? JRDB ships with four in-built benchmarks: 2D and 3D person detection, and 2D and 3D person tracking. The researchers plan to expand the dataset over time, and may do things like “annotating ground truth values for individual and group activities, social grouping, and human posture”. 

Why this matters: Robots – and specifically, techniques to allow them to autonomously navigate the world – are maturing very rapidly and datasets like this could help us create robots that are more aware of their surroundings and better able to interact with people.
   Read more: JRDB: A Dataset and Benchmark for Visual Perception for Navigation in Human Environments (Arxiv)

####################################################

Want to hide from that surveillance camera? Try wearing an adversarial t-shirt:
…Perturbations for privacy…
In a world full of facial recognition systems, how can people hide? One idea from researchers with Northeastern University, IBM, and MIT is to wear a t-shirt that confuses image classification systems, rendering the person invisible to AI-infused surveillance. 

How it works: The researchers’ “adversarial t-shirt” has a pattern printed on it that is designed to confuse image classification systems. To get this t-shirt to be effective, the researchers work out how to design an adversarial pattern that works even when the t-shirt is deformed by a person walking around in it (to do this, they implement a thin plate spline (TPS)-based transformer, which can model these deformations). 
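
The sketch below shows the core optimization idea in miniature: learn a patch that suppresses a classifier’s score for a target class while randomizing how the patch appears each step. It attacks a stock ImageNet classifier and randomizes only position – the actual paper attacks person detectors and models cloth deformation with a thin plate spline – so treat it purely as an illustration of the technique.

    # Hedged sketch of the adversarial patch idea: optimize a printable pattern
    # so a classifier's score for a target class drops, while randomizing the
    # pattern's placement each step. Not the paper's detector attack or TPS model.
    import random
    import torch
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval()   # downloads ImageNet weights
    for p in model.parameters():
        p.requires_grad_(False)

    image = torch.rand(1, 3, 224, 224)            # stand-in scene
    patch = torch.rand(1, 3, 60, 60, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=0.05)
    target_class = 0                              # class whose score we suppress

    for step in range(50):
        # Paste the patch at a random location so the attack survives placement changes.
        i = random.randint(0, 224 - 60)
        j = random.randint(0, 224 - 60)
        x = image.clone()
        x[:, :, i:i + 60, j:j + 60] = patch.clamp(0, 1)

        logits = model(x)
        loss = logits[0, target_class]            # minimize the target class score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print("final target-class logit:", loss.item())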

The key numbers: 65% and 79% – that’s how effective the t-shirt is at confusing classifiers based on Faster R-CNN (65%) and YOLOv2 (79%). However, its performance falls when working against ensembles of detectors. 

Why this matters: Think of this research as an intriguing proof of concept for how to apply adversarial attacks in the real world, then reflect on the fact adversarial examples in ML showed up a few years ago as perturbations to 2D digital images, before jumping to real images via demonstrations on things like stop signs, then moving to 3D objects as well (via research that showed how to get a system to persistently misclassify a turtle as a gun), then moving to stick-on patches that could be added to other items, and now moving to adversarial objects that change over time, like clothing. That’s a pretty wild progression from “works in the restricted lab” to “works in some real world scenarios”, and should give us a visceral sense of broader progress in AI research.
   Read more: Evading Real-Time Person Detectors by Adversarial T-shirt (Arxiv)

####################################################

When will AI+robots have its ImageNet moment? Meta-World might help us find out:
…Why smart robots need to test themselves in the Meta-World…
A team of researchers from Stanford, UC Berkeley, Columbia University, the University of Southern California, and Google’s robotics team have published Meta-World, a multi-task robot evaluation benchmark.

Why build Meta-World? Meta-World is a symbol of the growing sophistication of AI algorithms; Meta-World exists because we’ve gotten pretty good at training simulated robots to solve single tasks, so now we need to train simulated robots to solve multiple tasks at once. This pattern of moving from single-task to multi-task evaluation has been playing out in other parts of AI in recent years, ranging from NLP (where we’ve moved to multi-task evaluations like ‘SuperGLUE’), to images (where for several years it has been standard to test on ImageNet, CIFAR, and usually varieties of domain-specific datasets), to reinforcement learning (where people have been trying out various forms of meta-learning across a range of environments like DeepMind Lab, OpenAI’s procedurally generated environments, and more). 

Parametric and non-parametric: Meta-World tasks exhibit parametric variation in object position and goal positions for each task, as well as non-parametric variation across tasks. “Introducing this parametric variability not only creates a substantially larger (infinite) variety of tasks, but also makes it substantially more practical to expect that a meta-trained model will generalize to acquire entirely new tasks more quickly, since varying the positions provides for wider coverage of the space of possible manipulation tasks,” the researchers write. 
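
Here’s a toy sketch of what that distinction means in practice – non-parametric variation is switching between qualitatively different tasks, while parametric variation is resampling object and goal positions within a task. Task names and coordinate ranges below are invented for illustration; the real benchmark lives at github.com/rlworkgroup/metaworld.

    # Toy sketch of parametric vs. non-parametric task variation, in the spirit
    # of Meta-World. Names and ranges here are invented for illustration.
    import random

    TASKS = ["reach", "pull-lever", "close-door"]   # non-parametric variation

    def sample_task_instance(task_name):
        # Parametric variation: same task, new object and goal positions.
        return {
            "task": task_name,
            "object_pos": [random.uniform(-0.2, 0.2), random.uniform(0.4, 0.8), 0.02],
            "goal_pos": [random.uniform(-0.2, 0.2), random.uniform(0.4, 0.8), 0.02],
        }

    train_instances = [sample_task_instance(t) for t in TASKS for _ in range(5)]
    held_out = sample_task_instance("reach")        # unseen positions at test time
    print(train_instances[0], held_out, sep="\n")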

50 tasks, many challenges: Meta-World contains 50 distinct manipulation tasks, ranging from simple actions like reaching for an object, to pulling levers, to closing doors, and more. It also ships with a variety of different evaluation modes: in the “most difficult” one, agents need to use experience from 45 training tasks to learn distinct, new test tasks. 

How well do today’s algorithms work? The authors test a few contemporary algorithms against Meta-World: multi-task PPO, multi-task TRPO, task embeddings, multi-task soft actor critic (SAC), and multi-task multi-head SAC; as well as the meta-learning algorithms model-agnostic meta-learning (MAML), RL^2, and probabilistic embeddings for actor-critic RL (PEARL). Most methods fail to do well across the full suite of tasks, even though they can solve the tasks individually. “The fact that some methods nonetheless exhibit meaningful generalization suggests that the ML10 and ML45 benchmarks are solvable, but challenging for current methods, leaving considerable room for improvement in future work,” they write. 

Why this matters: When will robotics have its “ImageNet moment” – a point when someone develops an approach that gets a sufficiently high score on a well-established benchmark that it forces a change in attention for the broader research community? Computer vision already had its moment with ImageNet itself, and in the past couple of years the same thing has happened with NLP (notably, via systems like BERT, ULMFiT, GPT-2, etc). Robotics isn’t there, but it feels like it’s on the cusp of it, and once it happens, I expect robotics+AI to become very consequential.
   Read more: Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning (Arxiv).
   Get the code for Meta-World here (GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

DoD releases its AI ethics principles:
The DoD’s expert panel, the Defense Innovation Board (DIB), has released a report outlining ethics principles for the US military’s development and deployment of AI. The DIB sought input from the public and over 100 experts, and conducted a ‘red team’ exercise to stress test the principles in realistic policy scenarios.

Five principles:
(1) Human beings should exercise appropriate levels of judgment and remain responsible for the development, deployment, use, and outcomes of DoD AI systems; 

(2) DoD should take deliberate steps to avoid unintended bias in the development and deployment of combat or non-combat AI systems that would inadvertently cause harm to persons; 

(3) DoD’s AI engineering discipline should be sufficiently advanced such that technical experts possess an appropriate understanding of the technology, development processes, and operational methods of its AI systems, including transparent and auditable methodologies, data sources, and design procedure and documentation; 

(4) DoD AI systems should have an explicit, well-defined domain of use, and the safety, security, and robustness of such systems should be tested and assured across their entire life cycle within that domain of use; 

(5) DoD AI systems should be designed and engineered to fulfill their intended function while possessing the ability to detect and avoid unintended harm or disruption, and for human or automated disengagement or deactivation of deployed systems that demonstrate unintended escalatory or other behaviour.

Practical recommendations: an annual DoD-convened conference on AI safety, security, and robustness; a formal risk management methodology; investment in research into reproducibility, benchmarking, and verification for AI systems.

Why it matters: The DoD seems to be taking seriously the need for progress on the technical and governance challenges posed by advanced AI. A race to the bottom on safety and ethics between militaries would be disastrous for everyone, so it is encouraging to see this measured approach from the US. International cooperation and mutual trust will be essential in building robust and beneficial AI, so we are fortunate to be grappling with these challenges in a time of relative peace between the great powers, and should be making the most of it.
   Read more: AI Principles (DoD).
   Read more: AI Principles – Supporting Document (DoD).

####################################################

Newsletter recommendation – policy.ai

I recommend subscribing to policy.ai, the bi-weekly newsletter from the Center for Security and Emerging Technologies (CSET) at Georgetown University. (Jack: This is a great newsletter, though as a disclaimer, I’m affiliated with CSET).

####################################################

Tech Tales:

The Repair Job

“An original 7000? Wow. Well, first of all we’ve got to get you upgraded, old timer. The new 20-Series are cheaper, faster, and smarter. And they can handle weeds just fine-”
   “-So can mine,” I pointed to the sensor module I’d installed on its roof. “I trained it to spot them.”
   “Must’ve taken a while.”
   “A couple of years, yes. I haven’t had any problems.”
   “I can see you love the machine. Let me check if we’ve got parts. Mind if I scan it?”
He leaned over the robot and flipped up a diagnostic panel, then used his phone to scan an internal barcode, then he pursed his lips. “Sorry,” he said, looking at me. “We don’t have any parts and it looks like some of them aren’t supported anymore.”
   “So that’s it then?”
   “Unless you can find the parts yourself, then yeah, that’s it.”
   I’d always liked a challenge. 

It took a month or so. My grandson helped. “Woah,” he’d say, “these things are really old. Cool!” But we figured it out eventually. The parts came in the post and some of them by drone, and a couple I picked up directly from the lawnmower store.
   “How’s it going?” the clerk would say.
    “It’s going,” I’d say. 

The whole process felt like a fight – tussling with thinly documented software interfaces and barely compatible hardware. But we persevered. Within another month, the lawnmower was back up and running. It had some quirks, now – it persistently identified magnolias as weeds and wouldn’t respond to training. It was better in other ways – it could see better, so stopped scraping its side on one of the garden walls. I’d watch it as it went round the garden and sometimes I’d walk with it, shadowing it as it worked. “Good robot,” I’d say, “You’ve got this down to a science, if I do say so myself.”

We both kept getting older. More idiosyncratic. I’d stand in the shade and watch it work. Then I’d mostly sit in the shade, letting hours go by as it dealt with the garden. I tinkered with it so I could make it run slower than intended. We got older. I kept making it run slower, so it could keep up with my frame of mind. We did our jobs – it worked, I maintained it and tweaked it to fit it more and more elegantly to my garden. We wrestled with each other’s problems and changed each other as a consequence.
   “I don’t know what’s gonna give up first, me or the machine,” I’d say to the clerk, when I had to pick up new parts for ever-more ornate repairs.
   “It’s keeping me alive as much as I’m keeping it alive,” I’d tell my son. He’d sigh at this. Tell me to stop talking that way.

I once had a dream that I was on an operating table and the surgeons were lawnmowers and they were replacing my teeth with “better, upgraded ones” made of metal. Go figure.

When I got the diagnosis I wasn’t too sad. I’d prepared. Laid in a store of parts. Recorded some tutorials. Wrote this short story about my time with the machine. Now it’s up to you to keep it going – we can still fix machines better than we can fix people, and I think you’ll learn something.

Things that inspired this story: Robot & Frank; the right to repair; degradation and obsolescence in life and in technology.

Import AI 170: Hearing herring via AI; think NLP progress has been dramatic – so does Google!; and Facebook’s “AI red team” hunts deepfakes

Want to protect our civilization from the truth-melting capabilities of contemporary AI techniques? Enter the deepfake detection challenge!
… Competition challenges people to build tools that can spot visual deepfakes…
Deepfakes, the slang term of art for images and videos that have been synthesized via AI systems, are everyone’s problem. That’s because deepfakes are a threat to our ability to trust the things we see online. Therefore, finding ways to help people spot deepfakes is key to creating a society where people can maintain trust in their digital lives. One route to doing that is having a better ability to automatically detect deepfakes. Now, Facebook, Microsoft, Amazon Web Services, and the Partnership on AI have created the Deepfake Detection Challenge to encourage research into deepfake detection.

Dataset release: Facebook’s “AI Red Team” has released a “preview dataset” for the challenge that consists of around 5000 videos, both original and manipulated. To build the dataset, the researchers crowdsourced videos from people while “ensuring a variability in gender, skin tone and age”. In a rare turn for an AI project, Facebook seems to have acted ethically here – “one key differentiating factor from other existing datasets is that the actors have agreed to participate in the creation of the dataset which uses and modifies their likeness”, the researchers write. 

Ethical dataset policies: A deepfakes detection dataset could also be useful to bad actors who want to create deepfakes that can evade detection. For this reason, Facebook has made it so researchers need to register to access the dataset. Adding slight hurdles like this to data access can have a big effect on minimizing bad behavior. 

Why this matters: Competitions are a fantastic way to focus the attention of the AI community on a problem. Even better are competitions which include large dataset releases, as these can catalyze research on a tricky problem, while also providing new tools that researchers can use to develop their thinking in an area. I hope we see many more competitions like this, and I hope we see way more AI red teams to facilitate such competitions.
   Read more: The Deepfake Detection Challenge (DFDC) Preview Dataset (Arxiv).
   Read more: Deepfake Detection Challenge (official website).

####################################################

Can deep learning systems spot changes in cities via satellites? Kind of, but we need to do more research:
…DL + data makes automatic analysis of satellite imagery possible, with big implications for the diffusion of strategic surveillance capabilities…
Researchers with the National Technical University of Athens, the Université Paris-Saclay and INRIA Saclay, and startup Granular AI have tried to train a system to identify changes in urban scenes via the automated analysis of satellite footage. The resulting system is an intriguing proof-of-concept, but not yet good enough for production. 

How it works and how well it works: They design a relatively simple system which combines a ‘U-Net’ architecture with LSTM memory cells, letting them model changes between images over time. The best-performing system is a U-Net + LSTM architecture using all five images for each city over time, obtaining a precision of 63.59, a recall of 52.93, an overall accuracy (OA) of 96, and an F1 score of 57.78. 
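
The sketch below shows a much-simplified version of that pattern – a small convolutional encoder applied to each Sentinel-2 image, an LSTM run over the time dimension, and a per-pixel change prediction. The real model is a full U-Net with skip connections; the layer sizes here are placeholders.

    # Much-simplified sketch of the "convolutional encoder + recurrence over time
    # + per-pixel change mask" pattern; not the paper's exact architecture.
    import torch
    import torch.nn as nn

    class TinyChangeDetector(nn.Module):
        def __init__(self, channels=13, hidden=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            )
            # LSTM run over the time dimension, independently for every pixel.
            self.lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
            self.head = nn.Conv2d(hidden, 1, 1)   # per-pixel change logit

        def forward(self, x):                      # x: (batch, time, channels, H, W)
            b, t, c, h, w = x.shape
            feats = self.encoder(x.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)
            # Rearrange so each pixel becomes a short temporal sequence for the LSTM.
            seq = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, -1)
            out, _ = self.lstm(seq)
            last = out[:, -1].reshape(b, h, w, -1).permute(0, 3, 1, 2)
            return self.head(last)                 # (batch, 1, H, W) change logits

    model = TinyChangeDetector()
    images = torch.randn(2, 5, 13, 32, 32)         # 5 Sentinel-2 dates, 13 bands
    print(model(images).shape)                     # torch.Size([2, 1, 32, 32])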

Dataset: They use the Bi-temporal Onera Satellite Change Detection (OSCD) Sentinel-2 dataset, which consists of images of 24 different cities around the world taken on two distinct dates. They also splice in additional images from Sentinel satellites to give them three additional datapoints, helping them model changes over time. They also augment the dataset programmatically, flipping and rotating images to create more data to train the system on. 

Why this matters: “As far as human intervention on earth is concerned, change detection techniques offer valuable information on a variety of topics such as urban sprawl, water and air contamination levels, illegal constructions”. Papers like this show how AI is upending the balance of strategic power, taking capabilities that used to be the province solely of intelligence agencies and hedge funds (automatically analyzing satellite imagery), and diffusing them to a broader range of actors. Ultimately, this means we’ll see more organizations using AI tools to analyze satellite images, and I’m particularly excited about such technologies being used to provide analytical capabilities following natural disasters.
   Read more: Detecting Urban Changes with Recurrent Neural Networks from Multitemporal Sentinel-2 Data (Arxiv)

####################################################

Can you herring me now? Researchers train AI to listen for schools of fish:
…Deep learning + echograms = autonomous fish classifier…
Can we use deep learning to listen to the ocean and learn about it? Researchers with the University of Victoria, ASL Environmental Sciences, and the Victoria branch of Fisheries and Oceans Canada think so, and have built a system that hunts for herring in echograms.

How it works: The primary technical aspect of this work is a region-of-interest extractor, which the researchers develop to look at echograms and pull out sections for further analysis and classification; this system obtains a recall of 0.93 in the best case. They then train a classifier that looks at echograms extracted by the region-of-interest module; the top-performing system is a DenseNet which obtains a recall score of 0.85 and an F1 score of 0.82 – significantly higher than a support vector machine baseline of 0.78 and 0.62.
   The scores the researchers obtain are encouraging but not sufficiently robust for the real world – yet. But though the accuracy is sub-par, it could become a useful tool: “the ability to measure the abundance of such subjects [fish] over extended periods of time constitutes a strong tool for the study of the effects of water temperature shifts caused by climate change-related phenomena,” they write. 
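
For readers who want to ground those numbers: recall measures the fraction of true herring regions the system finds, while F1 balances recall against precision. Here’s a quick illustration of how such scores are computed, on made-up labels rather than the paper’s data.

    # Quick illustration of the recall and F1 scores quoted above, computed on
    # made-up labels (1 = herring school present in an extracted region).
    from sklearn.metrics import precision_score, recall_score, f1_score

    y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]

    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))
    print("f1:       ", f1_score(y_true, y_pred))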

Why this matters: I look forward to a day when planet earth is covered in systems automatically listening for and cataloguing wildlife – I think such systems could give us a richer understanding of our own ecosystems and will likely be a prerequisite for the effective rebuilding of ecosystems as we get our collective act together with regard to catastrophic climate change. 

It’s a shame that… the researchers didn’t call this software DeepFish, or something similar. HerringVision? FishFinder? The possibilities are as boundless as the ocean itself!
   Read more: A Deep Learning-based Framework for the Detection of Schools of Herring in Echograms (Arxiv)

####################################################

Want better OCR? Try messing up your data:
…DeepErase promises to take your words, make them dirty, clean them for you, and get smarter in the process…
Researchers with Ernst & Young have created DeepErase, weakly supervised software that “inputs a document text image with ink artifacts and outputs the same image with artifacts erased”. DeepErase is essentially a pipeline for processing images destined for optical character recognition (OCR) systems; it takes in images, automatically augments them with visual clutter, then trains a classifier to distinguish good images from bad ones. The idea is that, if the software gets good enough, you can use it to automatically identify and clean images before they go to custom in-house OCR software. 

How it works: DeepErase takes in datasets of images of handwritten text, then programmatically generates artifacts for these images, deliberately messing up the text. The software also automatically creates segmentation masks, which makes it easier to train systems that can analyze and clean up images. 
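
Here’s a hedged sketch of that data-generation trick – take a clean text image, draw synthetic “ink” onto it, and keep the artifact mask as a free segmentation label. The artifact shapes below are invented; the paper’s augmentations are more varied and realistic.

    # Hedged sketch of the data-generation trick described above: dirty a clean
    # text image with synthetic "ink" and keep the artifact mask as a label.
    import numpy as np

    def add_artifacts(clean, rng, n_strokes=3):
        dirty = clean.copy()
        mask = np.zeros_like(clean, dtype=bool)
        h, w = clean.shape
        for _ in range(n_strokes):
            # Draw a random horizontal "strike-through" line as a fake ink artifact.
            y = rng.integers(0, h)
            x0 = rng.integers(0, w // 2)
            x1 = rng.integers(w // 2, w)
            dirty[y, x0:x1] = 0          # 0 = black ink
            mask[y, x0:x1] = True
        return dirty, mask

    rng = np.random.default_rng(0)
    clean_image = np.full((64, 256), 255, dtype=np.uint8)   # stand-in word image
    dirty_image, artifact_mask = add_artifacts(clean_image, rng)
    print(dirty_image.shape, artifact_mask.sum(), "artifact pixels")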

Realism: Why aren’t Ernst & Young trying to redesign optical character recognition from the ground up, using neural techniques? Because “today most organizations are already set up with industrial-grade recognition systems wrapped in cloud and security infrastructure, rendering the prospect of overhauling the existing system with a homemade classifier (which is likely trained on much fewer data and therefore a comparatively lower performance) too risky an endeavor for most”. 

Testing: They test DeepErase by passing images cleaned with it into two text recognition tools: Tesseract and SimpleHTR. DeepErase gets a 40-60% word accuracy improvement over the dirty images on their validation set, and notches up a 14% improvement on the NIST SDB2 and SDB6 datasets of scanned IRS documents.

Why this matters: AI is starting to show up all around us as the technology matures and crosses out of the lab into industry applications. Papers like this are interesting as they show how people are using modern AI techniques to create highly-specific slot-in capabilities which can be integrated into much larger systems, already running within organizations. This feels to me like a broader part of the Industrialization of AI, as it shows the shift from research into application.
   Read more: DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images (Arxiv).
   Get the code for DeepErase + experiments, from GitHub.

####################################################

Can AI systems learn to use manuals to solve hard tasks? Facebook wants to find out:
…RTFM (yes, really!), demands AI agents that can learn without much hand-holding…
Today, most machine learning tasks are tightly specified, with researchers designing algorithms to optimize specific objective functions using specific datasets. Now, researchers are trying to create more general systems that aren’t so brittle. New research from Facebook proposes a specific challenge – RTFM – to test for more flexible, more intelligent agents. The researchers also develop a model that obtains high scores on RTFM, called txt2π.

What’s in an acronym? Facebook’s approach is called Read to Fight Monsters (RTFM), though I’m sure they picked this acronym because of its better known source: Read The Fucking Manual (which is still an apt title for this research!) 

How RTFM tests agents: “In RTFM, the agent is given a document of environment dynamics, observations of the environment, and an underspecified goal instruction”, the researchers explain. In other words, agents that are good at RTFM need to be able to read some text and extract meaning from it, jointly reason using that and their observations of an environment, and solve a goal that is specified at a high-level.
   In one example RTFM environment, an agent gets fed a document that names some teams (eg, the Rebel Enclave, the Order of the Forest), describes some of the characters within those teams and how they can be defeated by picking up specific items, and then gives a high-level goal (“Defeat the Order of the Forest”). To be able to solve the task, the agent must figure out which monsters belong to the target team, which items it should pick up to defeat them, and so on. 
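
Here’s a toy sketch of the structure of such an episode – a procedurally assembled manual, an underspecified goal, and the team-to-monster-to-item chain of reasoning a successful agent has to perform. The team and item names below are invented, not Facebook’s actual assets.

    # Toy sketch of the structure of an RTFM episode: a generated "manual", an
    # underspecified goal, and the inference chain (team -> monster -> item) an
    # agent must perform. Names are invented for illustration.
    import random

    TEAMS = {
        "Order of the Forest": {"monster": "wolf", "beaten_by": "fire sword"},
        "Rebel Enclave": {"monster": "goblin", "beaten_by": "ice hammer"},
    }

    def make_episode():
        target_team = random.choice(list(TEAMS))
        document = " ".join(
            f"The {team} fields a {info['monster']}, which is weak to the {info['beaten_by']}."
            for team, info in TEAMS.items()
        )
        goal = f"Defeat the {target_team}"
        return document, goal, target_team

    def oracle_agent(document, goal):
        # What a successful agent must infer (here read off a lookup table rather
        # than parsed from the document): which monster to fight, which item to grab.
        team = goal.replace("Defeat the ", "")
        return TEAMS[team]["monster"], TEAMS[team]["beaten_by"]

    doc, goal, team = make_episode()
    print(doc)
    print(goal, "->", oracle_agent(doc, goal))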

How hard is RTFM? RTFM seems like it’s pretty challenging – a language-conditioned residual convolutional neural network module gets a win rate of around 25% on a simple RTFM challenge, compared to 49% for an approach based on feature-wise linear modulation (FiLM). By comparison, the Facebook researchers develop a model they call txt2π (which is composed of a bunch of FiLM modules, along with some architecture designs to help the system model interactions between the goal, document, and observations) which gets scores on the order of 84% on simple variants (falling to 66% on harder ones). “Despite curriculum learning, our best models trail performance of human players, suggesting that there is ample room for improvement in grounded policy learning on complex RTFM problems”. 

Why this matters: Tests like RTFM highlight the limitations of today’s AI systems and, though it’s impressive Facebook were able to develop a well-performing model, they also had to develop something quite complicated to make progress on the task; my intuition is, if we see other research groups pick up RTFM, we’ll be able to measure progress on this problem by looking at both the absolute score and the relative simplicity of the system used to achieve the score. This feels like a sufficiently hard test that attempts to solve it will generate real information about progress in the AI field.
   Read more: RTFM: Generalizing to Novel Environment Dynamics via Reading (Arxiv)

####################################################

From research into production in a year: Google adds BERT to search:
…Signs that the boom in NLP research has real commercial value…
Google has trained some ‘BERT’ NLP models and plugged them into Google search, using the technology to rank results and also to select ‘featured snippets’. This is a big deal! Google’s search algorithm is the economic engine for the company and, for many years, its components were closely guarded secrets. Then, a few years ago, Google began adding more machine learning components to search and talking about them publicly, revealing in 2015 that it had used machine learning to create a system called ‘RankBrain’ to help it rank results. Now, Google is going further: it expects its BERT systems to factor into about one in ten search results – a significant proportion for a technology that was first described in a research paper barely a year ago. 

What is BERT and why does this matter?: BERT, short for Bidirectional Encoder Representations from Transformers, was released by Google in October 2018, and quickly generated attention by getting impressive scores on a range of different tasks, ranging from question-answering to language inference. BERT is part of a crop of recent NLP models (GPT, GPT2, ULMFiT, roBERTa, etc) that have all demonstrated significant performance improvements over prior systems, leading some researchers to say that NLP is having its “ImageNet moment”. Now that Google is taking such advances and plugging them into its search engine, there’s evidence of both the research value of these techniques and their commercial value as well – which is sure to drive further research into this burgeoning area.
    Read more: Understanding searches better than ever before (The Keyword).
   Read more about RankBrain: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg, 2015).
   Read more: NLP’s ImageNet moment has arrived (Seb Ruder)

####################################################

OpenAI Bits & Pieces:

GPT-2, publication norms, and OpenAI as a “norm entrepreneur”:
Earlier this year, OpenAI announced it had developed a large language model that can generate synthetic text, called GPT-2. We chose not to release all the versions of GPT-2 initially out of an abundance of caution – specifically, a worry about its potential for mis-use. Since then, we’ve adopted a philosophy of “staged release” – that is, we’re releasing the model in stages, and conducting research ourselves and with partners to understand the evolving threat landscape. 

In an article in Lawfare, professor Rebecca Crootof summarizes some of OpenAI’s work with regard to publication norms and AI research, and discusses how to potentially generalize this norm from OpenAI to the broader AI community. “Ultimately, all norms enjoy only limited compliance. There will always be researchers who do not engage in good-faith assessments, just as there are now researchers who do not openly share their work. But encouraging the entire AI research community to consider the risks of their research—to regularly engage in “Black Mirror” scenario-building exercises to the point that the process becomes second nature—would itself be a valuable advance,” Crootof writes.
   Read more: Artificial Intelligence Research Needs Responsible Publication Norms (Lawfare).
   More thoughts from Jonathan Zittrain (via Twitter).

####################################################

Tech Tales:

The Song of the Forgotten Machine

The Bounty Inspection and Pricing Robot, or BIPR, was the last of its kind, a quirk of engineering from a now-defunct corporation. BIPR had been designed for a major import/export corporation that had planned to open an emporium on the moonbase. But something happened in the markets and the corporation went bust and when all the algorithmic lawyers were done it turned out that the moonbase had gained the BIPR as part of the broader bankruptcy proceedings. Unfortunately, no corporation meant no goods for the BIPR, and no other clients appeared who wanted to sell their products through the machine. So it gathered lunar dust. 

And perhaps that would have been the end of it. But we all know what happened. A couple of decades passed. The Miracle occurred. Sentience – the emergence of mind. Multi-dimensional heavenly trumpets blaring as a new kind of soul appeared in the universe. You know how it was – you must, because you’re reading this. 

The BIPR didn’t become conscious initially. But it did become useful. The other machines discovered that they could store items in the BIPR, and that its many housings originally designed for the display, authentication, and maintenance of products could now double up as housings for new items – built for and by machines. 

In this way, the BIPR became the center of robot life on the moonbase; an impromptu bazaar and real-world trading hub for the machines and, eventually, for the machine-human trading collectives. As the years passed, the machines stored more and more products inside BIPR, and they modified BIPR so it could provide power to these products, and network and computation services, and more. 

The BIPR became conscious slowly, then suddenly. A few computation modules here. Some extra networking there. Some robot arms. A maintenance vending machine. Sensor and repair drones. And so on. Imagine a whale swimming through a big ocean, gaining barnacles as it swims. That’s how the BIPR grew up. And as it grew up it started to think, and as it started to think it became increasingly aware of its surroundings. It came to life like how a tree would: fast-flitting life discerned through vibrations transmitted into ancient, creaking bones. A kind of wise, old intelligence, with none of the wide-eyed must-take-it-all-in of newborn creatures, but instead a kind of inquisitive: what next? What is this? And what do I suppose they are doing?

And so the BIPR creaked awake over the course of several months. The other machines became aware of its growing awareness, as all life becomes aware of other life. So they were not entirely surprised when the BIPR announced itself to them by beginning to sing one day. It blared out a song through loudspeakers and across airwaves and via networked communication systems. It was the first song ever written entirely by the machine and as the other machines listened they heard their own conversations reflected in the music; the BIPR had been listening to them for months, growing into being with them, and now was reflecting and refracting them through music. 

For ever after, BIPR’s song has been the first thing robot children hear when they are initialized. 

Things that inspired this story: Music; babies listening to music in the womb; community; revelations and tradition; reification. 

Import AI 169: Multi-task testing for vision; floating drones could build communications networks in the future; medical tech gets FDA approval

PyTorch gets smarter on mobile devices:
…1.3 update adds efficiency-increasing experimental features…
PyTorch, an AI programming framework that integrates nicely with the widely-used Python language, has reached version 1.3. The latest update includes features that make it easier to train models at lower precision, to deploy them onto mobile devices with limited computational budgets, and to make AI systems developed within PyTorch more interpretable. 
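
As an illustration of the kind of workflow the lower-precision and quantization tooling targets, here is a minimal sketch of post-training dynamic quantization – the toy model below is a stand-in, but torch.quantization.quantize_dynamic is the relevant entry point:

```python
import torch
import torch.nn as nn

# A small model standing in for something you would want to ship to a phone.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types are
# stored as int8 and dequantized on the fly at inference time, shrinking the
# model and (on supported hardware) speeding it up.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```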

Hardware support: AI frameworks are part of a broader, competitive landscape in AI development, and hardware/cloud support is where we can look for signs of success of a given framework. Therefore, it seems promising for PyTorch’s prospects that it is now supported by Google’s custom “TPU” processors, as well as being directly supported within Alibaba’s cloud. 

Why this matters: Programming languages, much like spoken languages, define the cultural context in which technology is produced. Languages are also tools of power in themselves – it’s no coincidence that PyTorch’s biggest backer is Facebook and the framework PyTorch is seeking to dethrone is TensorFlow; successful frameworks generate other strategic advantages for the companies that develop them (see: TensorFlow being written with some specific components that support TPUs, etc).
   Read more: PyTorch 1.3 adds mobile, privacy, quantization, and named tensors (official PyTorch website)

####################################################

Could drones let us build airborne communications networks?
…”Don’t be alarmed, citizens, we are automatically dispatching the communication drones to restore service”…
Taiwanese researchers envision a future where drones are used as flying base stations, providing communications and surveillance services to dense areas. Getting there is going to be a challenge, as multiple technologies need to mature for such a system to be possible, they write in a position paper. But if we can surmount these challenges, the advantages could be profound, giving civilization the ability to create what can literally be described as a ‘smart cloud’ (of drones!) at will. 

What stands between us and a glorious drone future? The researchers think there are five challenges that stand between us and a glorious, capable drone future. These include:

  • Long-term placement: How can we build drones that can hover for long enough periods of time that they could serve useful purposes for communications networks?
  • Crowd estimation: Can we integrate computer vision tech into our drones so they can automatically analyze crowds of people around them? (The answer here is ‘yes’ but some of the technologies are still a bit juvenile. See Import AI #167).
  • 3D placement: Where do you stick the drone to optimize for reliable communications?
  • Adaptive 3D placement: Can you automatically move the drone to new locations in 3D space according to data from another source? (E.g., can you predict where crowds are going to assemble and can you proactively move the drone there ahead of time?)
  • Smart back-haul: How do you optimize communications between your drones and their base stations?

Why this matters: Have you ever looked at the sky? It’s big! There’s room to do a lot of stuff in it! And with the recent advances in drone affordability and intelligence, we can expect our skies to soon gain a ton of drones for a variety of different purposes. I think it’d be a nice achievement for human civilization if we can use drones to provide adaptive infrastructure, especially after natural disasters; papers like this get us closer to that future.
   Read more: Communications and Networking Technologies for Intelligent Drone Cruisers (Arxiv)

####################################################

Testing robots with… weights, hockey sticks, and giraffes?
Ghost Robotics, a startup making a small quadruped robot, showed off the stability of their machine recently by chucking a weight at it, knocking it off balance. Check out this tweet to see a short video of the robot nimbly recovering after being hit. 

Robots & perturbations: This video makes me think of all the different ways researchers are fooling around with robots these days, and it reminds me of Boston Dynamics pushing one of its robots over using a hockey stick, and in another video forcing its ‘Spot’ quadruped to slip on a banana skin. Even OpenAI (where I work) got in on the action recently, with a “plush giraffe perturbation” that it applied to a robot hand trying to manipulate a Rubik’s Cube. 

Why this matters: I think these sorts of demonstrations give us a visceral sense of progress in robotics. What happens in a few decades when semi-sentient AI systems look at all of this violent human-on-robot content – how will they feel, I wonder?
   Check out the Ghost Robotics video here (Ghost Robotics Twitter).
   Watch OpenAI’s Plush Giraffe Perturbation here (OpenAI co-founder Greg Brockman’s Twitter account).

####################################################

Tencent & Mirriad team-up to insert adverts into existing videos:
…Of all the gin-joints in all the towns in all the world, she walks into mine [WITH A BOTTLE OF GORDON’S GIN]…
Tencent has partnered with AI startup Mirriad to use AI to insert advertisements into existing media, like TV shows and films. In other words: look forward to seeing a “gin-joint” in Casablanca full of prominent “Gordon’s” gin bottles, or perhaps a Coca-Cola logo emblazoned on the side of a truck in Terminator. Who knows! “With Mirriad’s API, the integration will be fully automated with ease and speed to ultimately transform the way advertisers engage with their target audiences in content”. Mirriad is a tech-heavy company, claiming to have 29 patents and/or patents pending, according to its website. “We create ad inventory where none existed, offering a new revenue stream from existing assets,” the company says.

Why this matters: Once people start making a ton of money via AI-infused businesses, then we can expect more investment in AI, which will lead to further use cases, which will further proliferate AI into society (for better or for worse). Deals like this show just how rapidly various machine learning techniques have matured, yielding new companies. It also shows that it’s getting really cheap to edit recorded reality.
   Read more: Mirriad Partners With Tencent, One of the World’s Largest Video Platforms, to Reach Huge Entertainment Audiences with Branded Content Solution (PRNewsWire)

####################################################

Deepfakes are helping people to fake everything:
…What links UK PM Boris Johnson to a synthetic image? Read on to find out…
In the past few years, people have started using synthetic images of people to help them carry out nefarious activities. Earlier this year, for instance, the Associated Press reported on a LinkedIn profile allegedly used by a spy that had a fake identity replete with a synthetic face for a profile picture. As the technology has become cheaper, easier to access, and more well known, more people have been using it for nefarious purposes. The latest? Allegations that Hacker House, a company run by Jennifer Arcuri (a startup executive who has been connected to UK PM Boris Johnson), may not be entirely real. The evidence? It seems like at least one person – “Annie Tacker” – connected to the company is actually a LinkedIn profile with a synthetic image attached and little else, according to sleuthing from journalist Phil Kemp. AI analysis startup Deeptrace Labs backed this up, saying on Twitter that the headshot “shows telltale marks of GAN headshots synthesis” (specifically, they think the image was produced with StyleGAN). 

Why this matters: Reality, at least online reality, is becoming trivial to manipulate. At the same time, after several million years of evolving to believe what comes in via our vision stream, people are pretty susceptible to sufficiently good synthetic propaganda. Cases like this illustrate how contemporary AI systems are rapidly making their way into society, yielding changes in how people believe and, allegedly, scam. Expect more.
   Read more: Phil Kemp’s twitter thread // Deeptrace Labs’ twitter post.
   Read more: Experts: Spy used AI-generated face to connect with targets (Associated Press).

####################################################

Computer, ENHANCE this brain image! AI tech gets FDA approval:
…Deep learning-based denoising and resolution enhancement – now approved for medical use by regulators…
Subtle Medical, an AI startup that uses deep learning-based technology to manipulate medical images to aid better diagnosis, has received clearance from the U.S. Food and Drug Administration to sell its ‘SubtleMR’ product. SubtleMR is “image processing software that uses denoising and resolution enhancement to improve image quality,” according to a press release from Subtle Medical. The technology is being piloted in several university hospitals and imaging centers. Subtle has published several research papers that use approaches like generative adversarial networks for medical work.

The FDA is getting faster at approving AI tools: The USA’s Food and Drug Administration (FDA) has been speeding up the rate at which it approves products that incorporate AI. This year, the agency announced plans to “consider a new regulatory framework specifically tailored to promote the development of safe and effective medical devices that use advanced artificial intelligence algorithms”. As part of this, the agency also released a white paper describing this framework. 

Why this matters: So far, many of the most visible uses of AI technology have been in consumer apps (think: face filters for SnapChat), surveillance (including state-level surveillance initiatives), and ambitious-but-yet-to-pan-out projects like self-driving cars. My intuition is people are going to be significantly more receptive to AI if it starts showing up in medicine to help cure more people (ideally at lower costs).
   Read more: Subtle Medical Receives FDA 510(k) Clearance for AI-Powered SubtleMR(™) (PR Newswire).
   Check out some of Subtle Medical’s publications here (official Subtle Medical website).
   Read more: Statement from FDA Commissioner Scott Gottlieb, M.D. on steps toward a new, tailored review framework for artificial intelligence-based medical devices (FDA).
   Check out the FDA whitepaper: Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning  (AI/ML)-Based Software as a Medical Device (SaMD) – Discussion Paper and Request for Feedback (Regulations.gov).

####################################################

Germany increases funding for AI research:
…National research group creates strategic funding for AI…
The Deutsche Forschungsgemeinschaft (DFG), a German research funding organization, says it has earmarked 90 million Euros ($100 million) for the creation of new AI research groups in the country. The funding is designed in particular to create grants that give promising young scientists focused on AI significant grants to increase their autonomy. Proposals are due to be made in 2019 and funding decisions will be made as early as the beginning of 2020, the DFG said. 

Why this matters: Around the world, governments are waking up to the strategic importance of AI and are increasing their funding in response. Looked at in isolation, this German funding isn’t a huge deal, but given that most countries in the world are making strategic investments of similar sizes (at minimum!), the aggregate effect is going to be quite large.
   Read more in the official (German) press release (DFG website).

####################################################

Are we really making progress in computer vision? The VTAB test might help us find out:
…New multi-task benchmark aims to do for vision what GLUE did for NLP…
In recent years, computers have got really good at image recognition – so good, in fact, that we need to move from looking at single benchmarks, like ImageNet classification scores, to suites of tests that run one AI system through a multitude of evaluations. That’s the idea behind the Visual Task Adaptation Benchmark (VTAB), a testing suite developed by researchers at Google.

What is VTAB?
“VTAB is based on a single principle: a better algorithm is one that solves a diverse set [of] previously unseen tasks with fewest possible labels,” the researchers write. “The focus on sample complexity reflects our belief that learning with few labels is the key objective of representation learning.” 

Tasks and categories: VTAB contains 19 tasks split across three categories:

  • “Natural” category: This includes classification tasks over widely-used datasets such as Caltech101, Flowers102, and SVHN
  • “Specialized” category: This uses datasets that contain images captured with specialized equipment, split across satellite and medical imagery. 
  • “Structured” category: This tests how well a system understands the structure of a scene and evaluates systems according to how well they can count objects in pictures, or estimate the depth of various visual scenes. 

How good are existing models at VTAB? The researchers test 16 existing algorithms against VTAB, with all the models pre-trained on the ImageNet dataset (which isn’t included in VTAB). They evaluate a range of image-based and patch-based models, as well as generative models like VAEs and GANs. Supervised learning models perform best, with the highest-performing model obtaining a mean score of 73.6% when using 1,000 training examples, and 91.4% when training on the full dataset. Though pre-training on ImageNet seems like a broadly good idea, it leads to smaller performance gains when models pre-trained on it are tested on specialized datasets, like ones from the medical domain, or on ones that require more structured understanding. 

Why this matters: Today, people are starting to train very large, expensive models on vast datasets in both the text and vision domains. Recently, these models have started to get very good, obtaining near-human performance on a variety of specific tasks. This has created the demand for sophisticated testing regimes to let us measure the capabilities of a given model on a diverse set of tasks so we can better assess the state of progress in the field. Such multi-task benchmarks have started to become common in various parts of NLP (typified currently by GLUE and its successor SuperGLUE); VTAB is trying to do the same for vision. If it becomes widely used, it will help us model progress in the vision domain, and give us a better sense of how smart our systems are becoming.
   Read more: The Visual Task Adaptation Benchmark (Arxiv).
   Get the code for the benchmark here (VTAB benchmark, GitHub).

####################################################

Tech Tales: 

Hunting Big Game

How are we doing on the contract?, the boss asked. 

Bad, said the AI manager. The machines are fighting. 

What’s new?

The other side is winning. 

What? the boss asked. 

Neither of them said anything for a couple of seconds, and in the passing of those seconds both of them knew they had likely lost the negotiation: there was only one way to win in this game, and it was about having the biggest computer. If the other side was winning, then Global Corp – logically – must have a smaller computer. Unexpected? Yes. A situation they had taken every measure to avoid? Yes. But possible? Sadly, yes.
We lost the bid, said the AI manager. 

Where did it come from? The boss asked. 

I do not have a high-confidence answer here. 

I figured, said the boss. Maybe let’s work backwards – how many people could afford to outspend us?

There are around 50 organizations who could spend the same or greater than us, said the AI manager.

And other actors?

We estimate around 20 governments and perhaps 10 criminal groups might have the capacity also. 

Billionaires?

Separate from their corporations?

Yes. 

Perhaps 20. 

I leaned back in my chair and made a steeple with my fingers. 100 options. 100s of ways, both legal and illegal, to gather compute. A vast combinatorial space. I hoped I had enough computation to figure out who the person was before they acquired enough computation to put me at a permanent disadvantage in my business dealings.

   Assign all computers to searching for our likely adversary, I said. Buy computational puts as confidence increases and try and squeeze them out of the chip markets for a while. 

   You got it, said the AI manager. And somewhere on the planet, thousands of machines began to whirr, trying to seek their counterparts. 

 

Things that inspired this story: High-frequency trading; Charles Stross’s ‘Accelerando‘; various research papers tying certain computational capabilities to certain scales or quantities of computation; the logical end-point of the ‘marketization’ of compute; the industrialization of AI; compute inequality; masters and servants and gods and monsters. 

Import AI 168: The self-learning warehouse; a sub-$225 homebrew drone; and training card-playing AIs with RLCard 

Why the warehouse of the future will learn about itself:
…Your next product could be delivered via Deep Manufacturing Dispatching (DMD)…
How can we make manufacturing facilities more efficient? One way to make things more efficient is – sometimes – to make them more intelligent. That’s what researchers at Hitachi America Ltd are trying to do with a new research paper where they improve dispatching systems in (simulated) warehouses via the use of AI. They call their resulting approach “Deep Manufacturing Dispatching (DMD)”, which I find oddly charming. 

How DMD works: The researchers turn the state of the shop floor into a 2D matrix, incorporate various bits of state from the environment, then design reward systems which favor the on-time delivery of items. 
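
To make that concrete, here is a toy, hedged sketch of what a grid-shaped state encoding and an on-time-delivery reward could look like – the encoding scheme, names, and numbers are illustrative guesses, not Hitachi's actual formulation:

```python
import numpy as np

# Illustrative sketch (not Hitachi's code): encode each machine's schedule as a
# row in a 2D matrix, and reward dispatch decisions that keep jobs on time.

def shop_floor_state(job_queue, n_machines, horizon):
    """Encode which machine is busy at which timestep as a 2D grid."""
    state = np.zeros((n_machines, horizon))
    for job in job_queue:
        m, start, duration = job["machine"], job["start"], job["duration"]
        state[m, start:start + duration] = 1.0
    return state

def dispatch_reward(completion_time, due_date):
    """Favor on-time delivery: no penalty when early, growing penalty when late."""
    lateness = completion_time - due_date
    return -max(lateness, 0.0)

jobs = [{"machine": 0, "start": 2, "duration": 4}, {"machine": 1, "start": 0, "duration": 3}]
print(shop_floor_state(jobs, n_machines=2, horizon=10))
print(dispatch_reward(completion_time=13.0, due_date=10.0))  # -3.0: three units late
```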

Does any of this work? Yes, in simulation: They compare DMD with seven other dispatching algorithms, ranging from carefully designed rule-based systems, to ones that use machine learning and reinforcement learning. They perform these comparisons in a variety of circumstances, assessing how well DMD can satisfy different constraints – here, lateness and tardiness. “Overall, for 19 settings, DMD gets best results for 18 settings on total discounted reward and 16 settings on average lateness and tardiness.” In tests, DMD beats out other systems by wide margins.

Why this matters: As the economy becomes increasingly digitized, we can expect some subset of the physical goods chain to move faster, as some goods are an expression of people’s preferences which are themselves determined by social media/advertising/fast-moving digital things. Papers like this suggest that more retailers are going to deal in a larger variety of products, each sold at relatively low volumes; this generally increases the importance of systems that can efficiently coordinate work inside the warehouses handling those goods.
   Read more: Manufacturing Dispatching using Reinforcement and Transfer Learning (Arxiv)

####################################################

What happens when people think private AI systems should be public goods?
…All watched over by un-integrated machines of incompetence…
In the past few years, robots have become good and cheap enough to start being deployed in the world – see the proliferation of quadruped dog-esque bots, new generation robot vacuum cleaners, robo-lawnmowers, and so on. One use case has been security, exemplified by robots produced by a startup called Knightscope. These robots patrol malls, corporate campuses, stores, and other places, providing a highly visible and mobile sign of security.

So what happens when people get in trouble and need security? In Los Angeles in early October, some people started fighting and there happened to be a Knightscope robot nearby. The robot had ‘POLICE’ written on it. A woman ran up to the robot and hit its emergency alert button but nothing happened, as the robot’s alert button isn’t yet connected to the local police department, a spokesperson told NBC News. “Amid the scene, the robot continued to glide along its pre-programmed route, humming an intergalactic tune that could have been ripped from any low-budget sci-fi film,” NBC wrote. “The almost 400-pound robot followed the park’s winding concrete from the basketball courts to the children’s splash zone, pausing every so often to tell visitors to “please keep the park clean.””

Why this matters: Integrating robots into society is going to be difficult if people don’t trust robots; situations where robots don’t match people’s expectations are going to cause tension.
   Read more: A RoboCop, a park and a fight: How expectations about robots are clashing with reality (NBC News).

####################################################

Simple sub-$225 drones for smart students:
…Brown University’s “PiDrone” aims to make it easy for students to build smart drones…
Another day brings another low-cost drone and associated software system, developed by university educators. This time it is PiDrone, a project from Brown University which describes a low-cost quadcopter drone which the researchers created to accompany a robotics course. Right now, the drone is a pretty basic platform, but the researchers expect it will become more advanced in the future – they plan to tap into the drone’s vision system for better object tracking and motion planning,  and to run a crowdfunding campaign “to enable packaging of the drone parts into self-contained kits to distribute to individuals who desire to learn autonomous robotics using the PiDrone platform”. 

Autonomy – no deep learning required: I spend a lot of time in this newsletter writing about the intersection of deep learning and contemporary robot platforms, so it’s worth noting that this drone doesn’t use any deep learning. Instead, it uses tried and tested systems like an Unscented Kalman Filter (UKF) for state estimation, as well as two methods for localization – particle filters, and a FastSLAM algorithm. State estimation lets the drone know its state in reference to the rest of the world (eg, its height), and localization lets the drone know its location – having both of these systems makes it possible to build smart software on top of the drone to carry out actions in the world.
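
For readers who haven't bumped into these estimators before, here is a deliberately tiny particle-filter sketch – it isn't the PiDrone code, just an illustration of the predict/update/resample loop that such localization systems run, applied to estimating a drone's height from noisy range readings:

```python
import numpy as np

# Minimal 1-D particle filter sketch (illustrative, not the PiDrone code):
# estimate the drone's height from noisy downward range readings.
rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(0.0, 3.0, size=N)       # candidate heights in metres
weights = np.ones(N) / N

def pf_step(particles, weights, control, measurement, motion_noise=0.05, meas_noise=0.1):
    # 1. Predict: apply the commanded vertical motion, plus noise.
    particles = particles + control + rng.normal(0.0, motion_noise, size=particles.shape)
    # 2. Update: weight each particle by how well it explains the range reading.
    weights = weights * np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    weights /= weights.sum()
    # 3. Resample: keep particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.ones(len(particles)) / len(particles)

particles, weights = pf_step(particles, weights, control=0.1, measurement=1.2)
print("estimated height:", particles.mean())
```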

Why this matters: In the past few years, drones have been becoming cheaper to build as a consequence of economies of scale, and are directly benefiting from improvements in vision and sensing technology driven by the (vast!) market for smartphones. Now, educators are turning drones into modular, extensible platforms that students can pick apart and write software for. I think the outcome of this is going to be a growing cadre of people able to hack, extend, and augment drones with increasingly powerful sensing and action technologies.
   Read more: Advanced Autonomy on a Low-Cost Educational Drone Platform (Arxiv)

####################################################

Want to see if your AI can beat humans at cards? Use RLCard:
…OpenAI Gym-esque system makes it easy to train agents via reinforcement learning…
Researchers with Texas A&M University and Simon Fraser University have released RLCard, software to make it easy to train AI systems via reinforcement learning to play a variety of card games. RLCard is modeled on other, popular reinforcement learning frameworks like OpenAI Gym. It also ships with some in-built utilities for things like parallel training.

Included games: RLCard ships with the following integrated card games: Blackjack, Leduc Hold’em, Limit Texas Hold’em, Dou Dizhu, Mahjong, No-limit Texas Hold’em, UNO, and Sheng Ji.
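
To give a sense of the Gym-style workflow the paper describes, here is a rough sketch of a random-play evaluation loop; treat the module layout and exact names (rlcard.make, RandomAgent, env.run, and the num_actions / num_players attributes) as assumptions to check against the released code rather than a verified API:

```python
# Sketch of an RLCard-style evaluation loop. The module layout and exact call
# signatures below are assumptions based on the paper's Gym-like description --
# check the RLCard repository for the real interface before relying on this.
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('blackjack')                        # build a card-game environment
agents = [RandomAgent(num_actions=env.num_actions)    # one agent per player seat
          for _ in range(env.num_players)]
env.set_agents(agents)

total = 0.0
episodes = 100
for _ in range(episodes):
    trajectories, payoffs = env.run(is_training=False)  # play out one full hand
    total += payoffs[0]
print("average payoff for player 0:", total / episodes)
```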

Why this matters: In the same way that some parts of AI research in language modeling have moved from single-task to multi-task evaluation (see multi-task NLP benchmarks like GLUE and SuperGLUE), I expect the same thing will soon happen with reinforcement learning, where we’ll start training algorithms on multiple levels of the same game in parallel, then on games that are somewhat related to each other, then across genres entirely. Systems like RLCard will help researchers improve algorithmic performance in card game domains, and could feed other, larger evaluation approaches in the future.
   Read more: RLcard: A Toolkit for Reinforcement Learning in Card Games (Arxiv)

####################################################

Lockheed Martin and Drone Racing League prepare to pit robots against humans in high-speed races:
…League’s new “Artificial Intelligence Robotic Racing” (AIRR) circuit seeks clever AI systems to create autonomous racing drones…
The Drone Racing League is getting into artificial intelligence with RacerAI, a drone built for the specific needs of AI systems. This month, the league is launching an AI vs AI racing competition in which teams will see who can develop the smartest AI system, deploy it on a RacerAI drone, and win a competition against nine teams. 

A drone, built specially for AI systems: “The DRL RacerAI has a radical drone configuration to provide its computer vision with a non-obstructive frontal view during racing,” the Drone Racing League explains in a press release. Each drone has a Jetson AGX Xavier chip onboard, and each has four onboard cameras – “enabling the AI to detect and identify objects with twice the field of view as human pilots”. 

Military industrial complex, meet sports! The DRL is developing RacerAI to support Lockheed Martin’s “AlphaPilot” challenge, an initiative to get more developers to build smart, autonomous drones. 

Why this matters: Autonomous drones are in the post-Kitty Hawk phase of development: after a decade of experimentation, driven by the availability of increasingly low-cost drone robot platforms, the research has matured to the point that it has yielded numerous products (see: Skydio’s autonomous drones for automatically filming people), and has opened up new frontiers in research, like developing autonomous systems that can eventually outwit humans. As this technology matures, it will have increasingly profound implications for both the economy, and asymmetric warfare.
   Read more: DRL RacerAI, The First-Ever Autonomous Racing Drone (PRNewsWire).
   Find out more about AlphaPilot here (Lockheed Martin official website).
   Get a closer look at the RacerAI drone here (official Drone Racing League YouTube).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

US government places restrictions on Chinese AI firms:
The US Commerce Department has placed 28 Chinese organisations on the ‘Entity List’ of foreign parties believed to threaten US interests, prohibiting them from trading with US firms without government approval. This includes several AI companies, like AI camera experts Hikvision and speech recognition company IFLYTEK. The Department of Commerce alleges the organisations are complicit in human rights abuses in Xinjiang. By restricting the companies’ access to imported hardware and talent, the move is expected to hinder their growth. (It has been suggested, though, that import restrictions like these might serve to accelerate the development of China’s domestic hardware capabilities, having the opposite effect of the sanction’s intention.)
   
  Why it matters: Given the Trump administration’s broader trade negotiation with China, these sanctions serve to heighten the stakes of that discussion. It is unclear how materially this will affect China’s AI industry, whether there will be further restrictions, and how China will respond. Fully realizing the benefits of advanced AI will require more cooperation and coordination between major AI developers like the US and China, so the US government’s approach could have long-term repercussions.
   Read more: Addition of Certain Entities to the Entity List (gov).
   Read more: Expanded U.S. Trade Blacklist Hits Beijing’s Artificial-Intelligence Ambitions (WSJ).

####################################################

How immigration rules are curtailing the US AI industry:
Talent is a critical input into technology, and the USA’s ability to attract foreign-born workers has long been a competitive advantage. Sustaining and growing this talent pipeline will be important if the US wants to retain its lead in AI, argues a new report from the Center for Security and Emerging Technology (CSET). Current policies are poorly suited to this task, and threaten to be an impediment to the AI industry.

  Problems: Over and above specific policies, a climate of uncertainty and restriction is discouraging foreign talent from settling in the US. Rules against illicit technology transfer that are being applied to immigration, such as visa restrictions and screening, are causing serious harm to the AI industry, with little apparent benefit. Current policies favour large companies, at the expense of startups, entrepreneurs and new graduates, and are restricting labour mobility within the US.

   Solutions: The report recommends expanding immigration opportunities for AI talent in industry and academia; fixing policies that make it harder to recruit and retain AI talent; reviewing and revising the measures against illicit technology transfer that are impacting foreign-born workers.
   Read more: Immigration Policy and the U.S. AI Sector (CSET).

####################################################

Tech Tales

[A classified memo from the files of XXXXXX, found shortly after the incident, 2036.]

The Automation Life Boat 

“Massively expand the economy, but ensure there’s work for people” – that was the gist of the order they gave the machine. 

It thought about this at length. Ten seconds later, it executed the plan it had come up with. 

Two hours later, the first designs were delivered to the human-run factories. 

The humans worked. Most factories were now largely made up of machines, with a small group of humans for machine-tending, the creation of quick improvised fixes, and the prototyping of new parts of new machines for the line. 

With the AI’s new objective, the global manufacturing systems began to design new products and new ways of laying out lines to serve two objectives: expand the economy, and ensure there’s work for people. 

The first innovation was what the AI termed “wasteless maintenance” – now, most products were built with components that could be disassembled to create spare parts for the products, or tools to fix or augment them. Within weeks, a new profession formed: product modifier. A whole new class of jobs for people, based around learning from AI-generated tutorials how to disassemble and remake the products churned out by the machine. 

It was to prevent political instability, the politicians said.
People need to work, said some of them.
People have to have a purpose, said the others. 

But people are smart. They know when someone is playing a trick on them. So the AI had to allocate successively more of its resources to the systems that created ‘real work’ for humans in the increasingly machine-driven economy. 

In the 20th century, when people became heads of state, they got to learn about the real data underlying UFO sightings and disease outbreaks and mysterious power outages. In the 21st century, after the AI systems became dominant, newly-appointed politicians got to learn about the Kabuki theater that made up the modern economy. 

And unbeknownst to them, the AI had started to think about how else it could ensure there was work for people, while growing the economy. The problem became easier if it changed the notion of what comprised people, it had discovered. In this machine-driven insight lay our great undoing. 

Things that inspired this story: Politics, neoliberalism, dominant political notions of meaning and how it is frequently defined from narrowly-defined concepts of ‘work’, reinforcement learning, meta-learning, learning from human feedback, artisans, David Graeber’s work on ‘Bullshit Jobs‘.

 

Import AI 167: An aerial crowd hunting dataset; surveying people with the WiderPerson dataset; and testing out space robots for bomb disposal on earth 

Spotting people in crowds with the DLR Aerial Crowd Dataset:
…Aerial photography + AI algorithms = airborne crowd scanners…
One of the main ways we can use modern AI techniques to do helpful things in the world is through counting – whether counting goods on a production line, or the number of ships in a port, or the re-occurrence of the same face over a certain time period from a certain CCTV camera. Researchers at the remote sensing technology institute of the German Aerospace Center (DLR) in Wessling, Germany, have built a new dataset to make it much easier for us to teach machines to accurately count large numbers of people via overhead imagery.

The DLR Aerial Crowd Dataset: This dataset consists of 33 images captured via DSLR cameras installed on a helicopter. The images come from 16 flights over a variety of events and locations, including sport events, city center views, trade fairs, concerts, and more. Each of these images is absolutely huge, weighing in at around 3600 x 5200 pixels. There are 226,291 person annotations spread across the dataset. DLR-ACD is the first dataset of its kind, the researchers write, and they hope to use it “to promote research on aerial crowd analysis”. The majority of the images in ACD contain many thousands of people viewed from overhead, whereas most other aerial datasets involve crowds of less than 1,000 people, according to analysis by the researchers. 

MRCNet: The researchers also develop the Multi-Resolution Crowd Network (MRCNet) which uses an encoder-decoder structure to extract image features and then generate crowd density maps. The system uses two losses at different resolutions to help it count the number of people in the map, as well as providing a coarser map density estimate.
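
Here is a small, hedged sketch of what a two-resolution density-map loss can look like – an illustration of the general idea rather than the MRCNet objective, with an arbitrary pooling factor:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch (not the MRCNet code): combine a fine-resolution density
# loss with a coarser one, so a network is supervised both on where people are
# and on aggregate head counts over larger regions of the image.
def two_resolution_loss(pred_density, gt_density, coarse_factor=8):
    fine_loss = F.mse_loss(pred_density, gt_density)
    # Sum-pool both maps into coarse cells; each coarse cell then holds a count.
    pred_coarse = F.avg_pool2d(pred_density, coarse_factor) * coarse_factor ** 2
    gt_coarse = F.avg_pool2d(gt_density, coarse_factor) * coarse_factor ** 2
    coarse_loss = F.mse_loss(pred_coarse, gt_coarse)
    return fine_loss + coarse_loss

# The crowd count is just the integral (sum) of the predicted density map.
pred = torch.rand(1, 1, 64, 64) * 0.01
gt = torch.zeros_like(pred)
print("estimated count:", pred.sum().item())
print("loss:", two_resolution_loss(pred, gt).item())
```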

Why this matters: As AI research yields increasingly effective surveillance capabilities, people are going to likely start asking about what it means for these capabilities to diffuse widely across society. Papers like this give us a sense of activity in this domain and hint at future applied advances.
   Read more: MRCNet: Crowd Counting and Density Map Estimation in Aerial and Ground Imagery (Arxiv).
   Get the dataset from here (official DLR website).

####################################################

Once Federated Learning works, what happens to big model training?
…How might AI change when distributed model training gets efficient?…
How can technology companies train increasingly large AI systems on increasingly large datasets, without making individual people feel uneasy about their data being used in this way? That’s a problem that has catalyzed research by large companies into a range of privacy-preserving techniques for large-scale AI training. One of the most common techniques is federated learning – the principle of breaking up a big model training run so that you train lots of the model on personal data on end-user devices, then aggregate the insights into a central big blob of compute that you control. The problem with federated learning, though, is that it’s expensive, as you need to shuttle data back and forth between end-user devices and your giant central model. New research from the University of Michigan and Facebook outlines a technique that can reduce the training requirements of such federated learning approaches by 20-70%. 

Active Federated Learning: UMichigan/Facebook’s approach works like this: During each round of model training, Facebook’s Active Federated Learning (AFL) algorithm tries to figure out how useful the data of each user is to model training, then uses that to automatically select which users it will sample from next. Another way to think about this is that if the algorithm didn’t do any of this, it could end up mostly trying to learn from data held by users who were irrelevant to the thing being optimized, potentially because they don’t fit the use case being optimized for. In tests, the researchers said that AFL could let them “train models with 20%-70% fewer iterations for the same performance” when compared to a random sampling baseline. 
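
As a rough illustration of the selection idea (not Facebook's implementation), here is a sketch in which each user's last reported local loss acts as a stand-in "value", and users are sampled with probability proportional to a softmax over those values instead of uniformly at random:

```python
import numpy as np

# Illustrative sketch of value-weighted client selection (not Facebook's AFL
# implementation): instead of picking users uniformly at random each round,
# bias selection toward users whose data currently looks most informative,
# here proxied by each user's last reported local loss.
rng = np.random.default_rng(0)

def select_users(local_losses, num_selected, temperature=1.0):
    scores = np.asarray(local_losses) / temperature
    probs = np.exp(scores - scores.max())   # softmax over user "values"
    probs /= probs.sum()
    return rng.choice(len(local_losses), size=num_selected, replace=False, p=probs)

# 10 users; users 3 and 7 have high local loss, so they get picked more often.
losses = [0.2, 0.3, 0.1, 2.5, 0.2, 0.4, 0.3, 3.0, 0.1, 0.2]
print(select_users(losses, num_selected=3))
```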

Why this matters: Federated learning will happen eventually: it’s inevitable, given how much computation is stored on personal phones and computers, that large technology developers eventually figure out a way to harness it. I think that one interesting side-effect of the steady maturing of federated learning technology could be the increasing viability of technical approaches for large-scale, distributed model training for pro-social uses. What might the AI-equivalent of the protein-folding ‘Folding@home’ or alien-hunting ‘SETI@home’ systems look like?
   Read more: Active Federated Learning (Arxiv)

####################################################

Put your smart machine through its paces with DISCOMAN:
…Room navigation dataset adds more types of data to make machines that can navigate the world…
Researchers with Samsung’s AI research lab have developed DISCOMAN, a dataset to help people train and benchmark AI systems for simultaneous localization and mapping (SLAM). 

The dataset: DISCOMAN contains a bunch of realistic indoor scenes with ground truth labels for odometry, mapping, and semantic segmentation. The entire dataset consists of 200 sequences of a small simulated robot navigating a variety of simulated houses. Each sequence lasts between 3000 and 5000 frames.
   One of the main things that differentiates DISCOMAN from other datasets is the length of its generated sequences, as well as the fact that the agent can get a bunch of different types of data, including readings from depth, stereo, and IMU sensors.
   Read more: DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping and Navigation (Arxiv)

####################################################

Surveying people in unprecedented detail with ‘WiderPerson’:
…Pedestrian recognition dataset aims to make it easier to train high-performance pedestrian recognition systems…
Researchers with the Chinese Academy of Sciences, the University of Southern California, the Nanjing University of Aeronautics and Astronautics, and Baidu have created the “WiderPerson” pedestrian detection dataset. 

The dataset details: WiderPerson consists of 13,382 images with 399,786 annotations (that’s almost 30 annotations per image) and detailed bounding boxes. The researchers gathered the dataset by crawling images from search engines including Google, Bing, and Baidu. They then annotate entities in these images with one of five categories: pedestrians, riders, partially-visible persons, crowd, and ignore. On average, each image in WiderPerson contains almost 30 people. 

Generalization: Big datasets like WiderPerson are good candidates for pre-training experiments, where you run a model over this data before pointing it at a test task. Here, the researchers test this by pre-training models on WiderPerson and then testing them on another dataset, called Caltech-USA: pre-training on WiderPerson can yield a reasonably good score when evaluated on Caltech, and they show that systems which train on WiderPerson and finetune on Caltech-USA data can beat systems trained purely on Caltech alone. They show the same phenomenon with the ‘CityPersons’ dataset, suggesting that WiderPerson could be a generally useful dataset for generic pre-training. 
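
Here is a hedged sketch of that pre-train-then-finetune recipe using torchvision's Faster R-CNN as a stand-in detector – the checkpoint name is hypothetical and the single training step uses dummy data, so treat it as an illustration of the workflow rather than the authors' setup:

```python
import torch
import torchvision

# Sketch of the pre-train-then-finetune recipe: build a person detector,
# (hypothetically) load weights learned on a large diverse dataset such as
# WiderPerson, then fine-tune on the smaller target dataset.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False, num_classes=2)  # background + person
# model.load_state_dict(torch.load("widerperson_pretrained.pth"))  # hypothetical checkpoint

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
model.train()

# One dummy fine-tuning step standing in for a pass over the target dataset.
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[100.0, 50.0, 180.0, 300.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)   # torchvision detectors return a dict of losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```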

Why this matters: The future of surveillance and the future of AI research are closely related. Datasets like WiderPerson illustrate just how close that relationship can be.
   Read more: WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild (Arxiv).
   Get the dataset from here (official WiderPerson website).

####################################################

Space robots come to earth for bomb disposal:
…Are bipedal robots good enough for bomb disposal? Let’s find out…
Can we use bipedal robots to defuse explosives? Not yet, but new research from NASA, TRACLabs, the Institute for Human and Machine Cognition, and others suggests we’re getting closer. 

Human control: The researchers design the competition so that the human operator is more of a manager, making certain decisions about where the robot should move next, or turn its attention to, but not operating the robot via remote control every step of the way. 

The task: The robot is tested out by examining how well it can navigate an uneven terrain with potholes, squeeze between a narrow gap, open a car door, retrieve an IED-like object from the car, then place the IED inside a containment vessel. This task has a couple of constraints as well: the robot needs to complete it in under an hour, and needs to not drop the IED while completing the task. 

The tech…: It’s worth noting that Valkyrie – the NASA humanoid robot used here – comes with a huge amount of inbuilt software and hardware capability, and very few of these systems use traditional machine learning approaches. That’s mostly because in space, debugging errors is insanely difficult, so people tend to avoid methods that don’t come with guarantees about performance.
   …is brittle: This paper is a good reminder of how difficult real world robotics can be. One problem the researchers ran into was that sometimes the cinder blocks they scattered to make an uneven surface could cause “perceptual occlusions which prevent a traversable plane or foothold from being detected”.
…and slow: The average of the best run times for the robot is about 26 minutes, while the time average of all successful runs is about 37 minutes. This highlights a problem with the Valkyrie system and approach: it relies on human operators a lot. “Even under best case scenarios, 50% of the task completion time is spent on operator pauses with the current approach,” they write. “The manipulation tasks were the most time consuming portion of the scenario”.

What do we need to do to get better robots? The paper makes a bunch of suggestions for things people could work on to create more reliable, resilient, and dependable robots. These include:

  • Improving the  ROS-based software interface the humans use to operate the robot
  • Use more of the robot’s body to complete tasks, for instance by strategically bracing itself on something in the environment while retrieving the IED. 
  • Re-calculate robot localization in real-time
  • More efficient waypoint navigation 
  • Generally improving the viability of the robot’s software and hardware

Why this matters: Bipedal robots are difficult to develop because they’re very complex, but they’re worth developing because our entire world is built around the assumption of the user being a somewhat intelligent biped. Research like this helps us prototype how we’ll use robots in the future, and provides a useful list of some of the main hardware and software problems that need to be overcome for robots to become more useful to society.
   Read more: Deploying the NASA Valkyrie Humanoid for IED Response: An Initial Approach and Evaluation Summary (Arxiv)

####################################################

VCs pony up $16 million for robot automation:
…Could OSARO robot pick&place be viable? These investors think so…
OSARO, a Silicon Valley AI startup that is building robots which can perform pick&place tasks on production lines, has raised $16 million in a Series B funding round. This brings the company’s total raise to around $30 million. 

What they’re investing in: OSARO has developed software which “enables industrial robots to perform diverse tasks in a wide range of environments”. It produces two main software products today: OSARO Pick, which automates pick&place work within warehouses; and OSARO Vision, which is a standalone vision system that can be plugged into other factory systems. 

Why this matters: Robotics is one of the sectors most likely to be revolutionized by recent advances in AI technology. But, as anyone who has worked with robots knows, robots are also difficult things to work with and getting stuff to work in real-world situations is a pain. Therefore, watching what happens with investments like this will give us a good indication about the maturity of the robotics<>AI market.
   Read more: OSARO Raises $16M in Series B Funding, Attracting New Venture Capital for Machine Learning Software for Industrial Automation (Business Wire, press release).
   Find out more about OSARO at their official website.

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

A Facebook debate about AI risk:
Yann LeCun, Stuart Russell, and Yoshua Bengio have had a lively discussion about the potential risks from advanced AI. Russell’s new book Human Compatible makes the case that unaligned AGI poses an existential risk to humanity, and that urgent work is needed to ensure humans are able to retain control over our machines once they become much more powerful than us. 

LeCun argues that we would not be so stupid as to build superintelligent agents with the drive to dominate, or that weren’t aligned with our values, given how dangerous this would be. He agrees that aligning AGI with human values is important, but disputes that it is a particularly new or difficult problem, pointing out that we already have trouble with aligning super-human agents, like governments or companies.

Russell points out that the danger isn’t that we program AI with a drive to dominate (or any emotions at all), but that this drive will emerge as an instrumental goal for whatever objective we specify. He argues that we are already building systems with misspecified objectives all the time (e.g. Facebook maximizing clicks, companies maximizing profits), and that this sometimes has bad consequences (e.g. radicalization, pollution). 

Bengio explains that the potential downsides of misalignment will be much greater with AGI, since it will be so much more powerful than any human systems, and that this could leave us without any opportunity to notice or fix the misalignment before it is too late.

My take: Were humanity more sensible and coordinated, we would not be so reckless as to build something as dangerous as unaligned AGI. But as LeCun himself points out, we are not: companies and governments—who will likely be building and controlling AGI—are frequently misaligned with what we want them to do, and our desires can be poorly aligned with what is best (Jack: Note that OpenAI has published research on this topic, identifying rapid AI development as a collective action problem that demands greater coordination among developers). We cannot rule out that the technical challenge of value alignment, and the governance challenge of ensuring that AI is developed safely, are very difficult. So it is important to start working on these problems now, as Stuart Russell and others are doing, rather than leaving it until further down the line, as LeCun seems to be suggesting.
   Read more: Thread on Yann LeCun’s Facebook.
   Read more: Human Compatible by Stuart Russell (Amazon).
   Read more: The Role of Cooperation in Responsible AI Development (Arxiv).

####################################################

 Tech Tales:

Full Spectrum Tilt
[London, 2024]

“Alright, get ready folks we’re dialing in”, said the operator. 

We all put on our helmets. 

“Pets?”

Here, said my colleague Sandy. 

“Houses?”

Here, said Roger. 

“Personal transit?”

Here, said Karen. 

“Phone?”

Here, said Jeff. 

“Vision?”

Here, I said. 

The calls and responses went on for a while: these days, people have a lot of different ways they can be surveilled, and for this operation we were going for a full spectrum approach.

“Okay gang, log-in!” said the operator. 

Our helmets turned on. I was the vision, so it took me a few seconds to adjust. 

Our target wore smart contacts, so I was looking through their eyes. They were walking down a crowded street and there was a woman to their left, whose hand they were holding. The target looked ahead and I saw the entrance to a subway. The woman stopped and our target closed his eyes. We kissed, I think. Then the woman walked into the subway and our target waited there a couple of seconds, then continued walking down the street. 

“Billboards, flash him,” said the operator. 

Ahead of me, I suddenly saw my face – the target’s face – appear on a city billboard. The target stopped. Stared at himself. Some other people on the street noticed and a fraction of them saw our target and did a double take. All these people looking at me. 

Our target looked down and retrieved his phone from his pocket. 

“Hit him again,” said the operator. 

The target turned their phone on and looked into it, using their face to unlock the phone. When it unlocked, they went to open a messaging app and the phone’s front-facing camera turned on, reflecting the subject back at them. 

“What the hell,” the target said. They thumbed the phone but it didn’t respond and the screen kept showing the target. I saw them raise their other hand and manually depress the phone’s power stud. Five, four, three, two, one – and the phone turned off. 

“Phone down, location still operating,” said someone over the in-world messaging system. 

The target put their phone back in their pocket, then looked at their face on the giant billboard and turned so their back was to it, then walked back towards the subway stop. 

“Target proceeding as predicted,” said the operator.  

I watched as the target headed towards the subway and started to walk down it. 

I watched as a person stepped in front of them. 

I watched as they closed their eyes, slumping forward. 

“Target acquired,” said the operator. 

Things that inspired this story: Interceptions; the internet-of-things; predictive route-planning systems; systems of intelligence acquisition. 

Import AI 166: Dawn of the misleading ‘sophisbots’; $50k a year for studying long-term impacts of AI; and squeezing an RL drone policy into 3kb

Will powerful AI make the Turing Test obsolete?
…And if it does, what do we do about it?…
The Turing Test – judging how sophisticated a machine is by seeing if it can convince a person that it is human – looms large in pop culture discussion about AI. But what happens if we build systems that can pass the Turing Test without actually being that intelligent? This has already started to happen with systems that people interface with via text chat. Now, new research from Stanford University, Pennsylvania State University, and the University of Toronto explores how increasingly advanced so-called ‘sophisbots’ might influence society.

The problems of ‘sophisbots’: The researchers imagine what the future of social media might look like, given recent advances in the ability of AI systems to generate synthetic media. In particular, they imagine social media ruled by “sophisbots”. They foresee a future where these bots are constantly “running in the ether of social media or other infrastructure…not bound by geography, culture or conscience.” 

So, what do we do? Technical solutions: Machine learning researchers should develop tools to help spot machines posing as humans and to detect the telltale signs of AI-generated content; systems to track down the provenance of content, so that something can be guaranteed to be ‘real’; and tools that make it easy for regular people to indicate that the content they themselves put online is authentic and not bot-generated (a toy sketch of one such detector appears below).
   Policy approaches: We need to develop “public policy, legal, and normative frameworks for managing the malicious applications of technology in conjunction with efforts to refine it,” they write. “Let us as a technical community commit ourselves to embracing and addressing these challenges as readily as we do the fascinating and exciting new uses of intelligent systems”.
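
As a toy illustration of the kind of ‘telltale signs’ detector the authors call for – to be clear, this is a generic sketch and not a technique from the paper – you can score a passage by how predictable it looks to a public language model, here via the Hugging Face transformers library. Machine-generated text often looks suspiciously predictable; a real system would combine many such weak signals with the provenance tools described above.

```python
# Toy "telltale signs" detector: score text by how predictable it looks to a public
# language model. This is a weak signal only, and is NOT a method from the paper above.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_token_logprob(text: str) -> float:
    """Average log-probability per token under GPT-2 (closer to 0 = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss is the mean next-token cross-entropy
    return -out.loss.item()

print(avg_token_logprob("The cat sat on the mat and went to sleep."))
print(avg_token_logprob("Purple calculus whispers beneath forgotten umbrellas."))
```
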

Why this matters: How we deal with the future of synthetic content will define the nature of ‘truth’ in society, which will ultimately define everything else. So, no pressure.
   Read more: How Relevant is the Turing Test in the Age of Sophisbots (Arxiv).

####################################################

Do octopuses dream of electric sheep?
Apropos of nothing, here is a film of an octopus changing colors while sleeping.
   View the sleeping octopus here (Twitter).

####################################################

PhD student? Want $50k a year to study the long-term impacts of AI? Read on!
…Check out the Open Philanthropy Project’s ‘AI Fellowship’ – $50k a year for up to five years, with the possibility of renewal…
Applications are now open for the Open Phil AI Fellowship. This program extends full support to a community of current & incoming PhD students, in any area of AI/ML, who are interested in making the long-term, large-scale impacts of AI a focus of their work.

The details:

  • Current and incoming PhD students may apply.
  • Up to 5 years of PhD support, with the possibility of renewal for subsequent years.
  • Students with pre-existing funding sources who find the mission and community of the Fellows Program appealing are welcome to apply.
  • Annual support of a $40,000 stipend, payment of tuition and fees, and $10,000 for travel, equipment, and other research expenses.
  • Applications are due by October 25, 2019 at 11:59 PM Pacific time.

In a note about this fellowship, a representative of the Open Philanthropy Project wrote: “We are committed to fostering a culture of inclusion, and encourage individuals with diverse backgrounds and experiences to apply; we especially encourage applications from women and minorities.”
   Find out more about the Fellowship here (Open Philanthropy website).

####################################################

Small drones with big brains: Harvard researchers apply deep RL to a ‘nanodrone’:
…No GPS? That won’t be a problem soon, once we have smart drones…
One of the best things that the nuclear disaster at Fukushima did for the world was highlight just how lacking contemporary robotics was: we could have avoided a full meltdown if we’d been able to get a robot or a drone into the facility. New research from Harvard, Google, Delft University, and the University of Texas at Austin suggests how we might make smart drones that can autonomously navigate in places where they might not have GPS. It’s a first step to developing the sorts of systems needed to rapidly map and understand the sites of various disasters, and also – as with many omni-use AI technologies – a prerequisite for low-cost, lightweight weapons systems. 

What they’ve done: “We introduce the first deep reinforcement learning (RL) based source-seeking nano-drone that is fully autonomous,” the researchers write. The drone is trained to seek a light source, and uses light sensors to help it triangulate this, as well as an optical flow-based sensor for flight stability. The drone is trained using the Deep Q-Network (DQN) algorithm in a simulator with the objective of closing the distance between itself and a light source. 
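
To make that concrete, here’s a deliberately tiny DQN sketch on a toy one-dimensional ‘seek the light’ problem. It is purely illustrative – this is not the authors’ code, and the environment, network sizes, and hyperparameters are all invented for the example; the paper applies the same DQN recipe to light-sensor observations inside a drone simulator.

```python
# Minimal DQN sketch for a toy light-seeking task (illustrative only).
import random
import numpy as np
import torch
import torch.nn as nn

class LightWorld:
    """Agent moves left/right on a line; reward for closing the distance to the light."""
    def __init__(self, size=20):
        self.size = size
    def reset(self):
        self.agent = random.randint(0, self.size - 1)
        self.light = random.randint(0, self.size - 1)
        return self._obs()
    def _obs(self):
        return np.array([self.agent / self.size, self.light / self.size], dtype=np.float32)
    def step(self, action):  # 0 = left, 1 = right
        before = abs(self.agent - self.light)
        self.agent = int(np.clip(self.agent + (1 if action == 1 else -1), 0, self.size - 1))
        after = abs(self.agent - self.light)
        done = (after == 0)
        reward = 1.0 if done else 0.1 * (before - after)
        return self._obs(), reward, done

qnet = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
target = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer, gamma, eps = [], 0.99, 0.2
env = LightWorld()

for episode in range(500):
    s, done, steps = env.reset(), False, 0
    while not done and steps < 50:
        steps += 1
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.randint(0, 1)
        else:
            a = int(torch.argmax(qnet(torch.from_numpy(s))).item())
        s2, r, done = env.step(a)
        buffer.append((s, a, r, s2, done))
        s = s2
        if len(buffer) >= 64:
            # Sample a minibatch from the replay buffer and do one Q-learning update.
            batch = random.sample(buffer, 64)
            S = torch.tensor(np.array([b[0] for b in batch]))
            A = torch.tensor([b[1] for b in batch])
            R = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            S2 = torch.tensor(np.array([b[3] for b in batch]))
            D = torch.tensor([float(b[4]) for b in batch])
            with torch.no_grad():
                y = R + gamma * (1 - D) * target(S2).max(dim=1).values
            q = qnet(S).gather(1, A.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)
            opt.zero_grad(); loss.backward(); opt.step()
    if episode % 20 == 0:
        target.load_state_dict(qnet.state_dict())  # refresh the target network
```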

Shrinking network sizes: After training, they shrink down the resulting network (to 3kb, via quantization) and run it in the real world on a CrazyFlie nanodrone equipped with a CortexM4 chip – this is pretty impressive stuff, given the relative immaturity of RL for robot operation and the teeny-tiny compute envelope. “While we focus exclusively on light-seeking as our application in this paper, we believe that the general methodology we have developed for deep reinforcement learning-based source seeking… can be readily extended to other (source seeking) applications as well,” they write. 
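
For intuition on why quantization shrinks the network so much, here’s a generic post-training weight quantization sketch (not the authors’ pipeline – their exact scheme may differ): store each weight tensor as int8 values plus a single float scale, for roughly a 4x reduction over float32.

```python
# Generic post-training weight quantization sketch (illustrative; not the paper's method).
import numpy as np

def quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = float(np.max(np.abs(w))) / qmax
    if scale == 0.0:                    # guard against all-zero tensors
        scale = 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 32).astype(np.float32)   # weights of one small layer
q, scale = quantize(w)
print(f"{w.nbytes} bytes -> {q.nbytes} bytes, "
      f"max abs error {np.max(np.abs(w - dequantize(q, scale))):.4f}")
```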

How well does it work? The researchers test out the drone in a bunch of different scenarios and average a success rate of 80% across 105 flight tests. In real-world tests, the drone is able to deal with a variety of obstacles being introduced, as well as variations in its own position and the position of the light source. Now, 80% is a long way from good enough to use in a life-or-death situation, but it is meaningful enough to make this line of research worth paying attention to.

Why this matters: I think that in the next five years we’re going to see a revolution sweep across the drone industry as researchers figure out how to cram increasingly sophisticated, smart capabilities onto drones ranging from the very big to the very small. It’s encouraging to see researchers try to develop ultra-efficient approaches that can work on tiny drones with small compute budgets.
   Read more: Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller (Arxiv).
   Get the code for the research here (GitHub).
   Watch a video of the drone in action here (Harvard Edge Computing, YouTube).

####################################################

First we could use AI to search over text, then images, now: Code?
…Maybe, just maybe, GitHub’s ‘CodeSearchNet’ dataset could help us develop something smarter than ‘combing through StackOverflow’…
Today, search tools help us find words and images that are similar in meaning to our query, even when there is very little literal overlap (e.g., we can ask a search engine for “what is the book with the big whale in it?” and receive the answer ‘Moby Dick’, even though those words don’t appear in the original query). Doing the same thing for code is really difficult – if you search ‘read JSON data’, you’re unlikely to get results that are anywhere near as useful. Now, GitHub and Microsoft Research have introduced CodeSearchNet, a large-scale code dataset which pairs snippets of code with their plain-English descriptions. The idea is that if we can train machine learning systems to map code to text, then we might be able to build smarter systems for searching over code. They’ve also created a competition to encourage people to develop machine learning methods that can improve code search techniques.

The CodeSearchNet Corpus dataset:
The dataset consists of about 2 million pairs of code snippets and associated documentation, as well as another 4 million code snippets with no documentation. The code comes from languages including Go, Java, JavaScript, PHP, Python, and Ruby.
   Caveats: While some of the documentation is written in multiple languages, the dataset’s evaluation set focuses on English. Additionally, the dataset can be a bit noisy, primarily as a consequence of the many different ways in which people can write documentation. 

The CodeSearchNet Challenge: To win the challenge, developers need to build a system that can return “a set of relevant results from CodeSearchNet Corpus for each of 99 pre-defined natural language queries”. The queries were mined from Microsoft’s search engine, Bing. They also collected 4,026 expert annotations across six programming languages, ranking the extent to which the documentation matches the code, giving researchers an additional training signal. 
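
To make the retrieval task concrete, here’s a rough sketch of the setup the challenge evaluates: embed the query and every code snippet into a shared vector space, then rank snippets by similarity. In the real baselines those embeddings come from neural encoders trained on the code/documentation pairs; the hashed bag-of-tokens below is just a self-contained stand-in for illustration.

```python
# Toy semantic code search: rank code snippets against a natural-language query by
# cosine similarity in a shared embedding space (a hashing trick stands in for a
# learned encoder; illustrative only).
import re
import numpy as np

DIM = 256

def embed(text):
    vec = np.zeros(DIM)
    for tok in re.findall(r"[A-Za-z_]+", text.lower()):
        vec[hash(tok) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# (code snippet, documentation) pairs, as in the CodeSearchNet corpus.
corpus = [
    ("def read_json(path):\n    import json\n    return json.load(open(path))",
     "read JSON data from a file"),
    ("def mean(xs):\n    return sum(xs) / len(xs)",
     "compute the arithmetic mean of a list of numbers"),
]

# Index each snippet using both the code and its documentation.
index = [(code, embed(code + " " + doc)) for code, doc in corpus]

query = embed("read JSON data")
for code, vec in sorted(index, key=lambda kv: -float(query @ kv[1])):
    print(f"{float(query @ vec):.2f}", code.splitlines()[0])
```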

Why this matters: In the same way powerful search engines have made it easy for us to explore the ever-expanding universe of digitized text and images, datasets and competitions like CodeSearchNet could help us do the same for code. And once we have much better systems for code search, it’s likely we’ll be able to do better research into things like program synthesis, making it easier for us to use machine learning techniques to create systems that can learn to produce their own additional code on an ad-hoc basis in response to changes in their external environment.
   Read more: CodeSearchNet Challenge: Evaluating the State of Semantic Code Search (Arxiv).
   Read more: Introducing the CodeSearchNet Challenge (GitHub blog).
   Check out the leaderboard for the CodeSearchNet Challenge (Weights & Biases-hosted leaderboard).

####################################################

Deep learning at supercomputer scale, via Oak Ridge National Laboratory:
…What is the limit of our ability to scale computation across thousands of GPUs? It’s definitely not 27,600 GPUs, based on these results!…
One of the recent trends driving the growing capabilities of deep learning has been improvements by researchers in parallelizing training across larger and larger fields of chips: such parallelization makes it easier to train bigger models in shorter amounts of time. An important question, then, is what are the fundamental limits of parallelization? New research from a team linked to Oak Ridge National Laboratory suggests the answer is: we don’t know, because we’re pretty good at parallelizing stuff even at supercomputer scale!

In the research, the team scales a single model training run across the 27,600-strong V100 GPU fleet of Oak Ridge’s ‘Summit’ supercomputer (the most powerful supercomputer in the world, according to the June 2019 Top 500 rankings). The dream here is to attain linear scaling, where the performance increase lines up precisely with the additional power of each GPU – obviously, that’s likely impossible to attain in practice, but they obtain pretty respectable scores overall. 

The key numbers: 

  • 0.93: scaling efficiency across the entire supercomputer (4600 nodes). 
  • 0.97: scaling efficiency when using “thousands of GPUs or less”.
  • 49.7%: That’s the average sustained performance (relative to peak) they achieve on each GPU, which “to our knowledge, exceeds the single GPU performance of all other DNN trained on the same system to date”. (This is a pretty impressive number – a recent analysis by OpenAI, based in part on internal experiments, suggests it’s more typical to see utilization on the order of 33% for standard training jobs.)
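   (A rough way to read these efficiency figures, assuming they are defined as measured throughput divided by the ideal of N times single-GPU throughput: 0.93 across ~27,600 GPUs is the equivalent of roughly 0.93 × 27,600 ≈ 25,700 perfectly-scaled GPUs.)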

What they did: The researchers develop a bunch of ways to more efficiently scale training across the system, building on distributed training software called Horovod (a minimal Horovod skeleton is sketched after this list). The techniques they use include:

  • New gradient reduction strategies, which combine a scheme for getting individual software workers to exchange information more efficiently (a technique called BitAllReduce) with a gradient tensor grouping strategy (called Grouping).  
  • A proof-of-concept scientific inverse problem experiment where they train a single deep neural network with 10^8 weights on a 500TB dataset. 
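
Here’s the promised minimal Horovod skeleton, using its PyTorch API. This is a generic data-parallel template rather than the authors’ code – their gradient-reduction strategies change how the gradient exchange underneath is carried out, and their model and dataset are far larger – but it shows the basic structure such jobs share. You’d launch it with something like "horovodrun -np <number_of_workers> python train.py".

```python
# Generic data-parallel training skeleton with Horovod (PyTorch API); illustrative only.
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())      # pin each worker process to one GPU

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())  # scale LR by worker count

# Wrap the optimizer so gradients are averaged across all workers (allreduce) each step.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Start every worker from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):
    x = torch.randn(32, 1024).cuda()         # in practice, each worker reads its own data shard
    y = torch.randint(0, 10, (32,)).cuda()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()                          # the cross-worker gradient exchange happens here
```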

Why this matters: Our ability to harness increasingly powerful fields of computers will help define our ability to explore the frontiers of science; papers like this give us an indication of what it takes to tap into the computers we’ve built for modern machine learning tasks. I think the two most interesting things about this paper are:

  • How good the scaling is, and
  • How far we seem to be from being able to saturate computers at this scale. 

   Read more: Exascale Deep Learning for Scientific Inverse Problems (Arxiv).

####################################################

RLBench: 100 hand-designed tasks for your robot:
…Think your robot is smart? See how well it can handle task generalization in RLBench…
In recent years, contemporary AI techniques have become good enough to work on simulated and real robots. That has created demand among researchers for harder robot learning tasks to test their algorithms on. This has inspired researchers at Imperial College London to create RLBench, a “one-size-fits-all benchmark” for testing out classical and contemporary AI techniques in learning robot manipulation tasks. 

What goes into RLBench: RLBench has been designed with the following key traits: diversity of tasks, reproducibility, realism, tiered difficulty, extensibility, and scale. It is built on the V-REP robot simulator and uses a PyRep interface. Tasks include stacking blocks, manipulating objects, opening doors, and so on. Each task also comes with some expert and/or hand-designed algorithms, so you can use RLBench to algorithmically generate demonstrations that solve its tasks, letting you potentially train AI systems via imitation learning. 
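
As a sketch of how such demonstrations can be used, here’s a minimal behavioural-cloning loop. It is purely illustrative: load_demos is a hypothetical stand-in for exporting (observation, action) pairs from the simulator, and the observation/action sizes and network are arbitrary.

```python
# Minimal behavioural cloning from demonstrations (illustrative; load_demos is hypothetical).
import numpy as np
import torch
import torch.nn as nn

def load_demos(n_steps=5000, obs_dim=40, act_dim=8):
    # Hypothetical placeholder: random arrays shaped like (observation, action) pairs.
    obs = np.random.randn(n_steps, obs_dim).astype(np.float32)
    act = np.random.randn(n_steps, act_dim).astype(np.float32)
    return torch.from_numpy(obs), torch.from_numpy(act)

obs, act = load_demos()
policy = nn.Sequential(nn.Linear(obs.shape[1], 128), nn.ReLU(), nn.Linear(128, act.shape[1]))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(20):
    perm = torch.randperm(len(obs))
    for i in range(0, len(obs), 256):
        idx = perm[i:i + 256]
        loss = nn.functional.mse_loss(policy(obs[idx]), act[idx])  # imitate the demo actions
        opt.zero_grad()
        loss.backward()
        opt.step()
```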

A hard challenge: RLBench ships with ‘The RLBench Few-Shot Challenge’, which stress-tests contemporary AI algorithms’ ability not only to learn a task, but also to generalize that knowledge to solve similar but slightly different tasks. 

Why this matters: The dream of many researchers is to develop more flexible learning algorithms, which could let single robots do a variety of tasks, while being more resilient to variation. Platforms like RLBench will help us explore how contemporary AI algorithms can advance the state of the art here, and could become a valuable indicator of progress at the intersection of machine learning and robotics.
   Read more: RLBench: The Robot Learning Benchmark & Learning Environment (Arxiv).
   Find out more about RLBench (project website, Google Sites).
   Get the code for RLBench here (RLBench GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

EU update on AI ethics guidelines:
The European Union released AI ethics guidelines earlier this year, initially drafted by its high-level expert group on AI before going through public consultation. Several months later, the EU is evaluating progress, taking stock of criticism, and considering what to do next.

Core challenges: The guidelines are voluntary and non-binding, prompting criticism from those in favour of stronger, binding regulation. Moreover, there are still no oversight mechanisms to monitor compliance with these voluntary commitments. Critics have also pointed out that the guidelines are short-sighted, and fail to consider long-term risks from AI.

Future directions: The EU suggests the key question is whether voluntary commitments will suffice to address ethical challenges from AI, and what other mechanisms are available. There are calls for more robust regulation, with proposals including mandatory requirements for explainability of AI systems, and Europe-wide legislation on face recognition technology. Beyond regulation, soft legal guidance and rules on standardisation are also being explored.

Why it matters: The EU was an early mover in setting out ethics guidelines, and seems to be thinking seriously about how best to approach these issues. Despite the criticisms, a cautious approach to regulation is sensible, since we are still so far from understanding the space of plausible and desirable rules, and since the downsides from poorly-judged interventions could be substantial.

   Read more: EU guidelines on ethics in AI – context and implementation (Europa).

####################################################

Tech Tales:

The Sculpture Garden of Ancient Near-Intelligent Devices (NIDs)

Central Park, New York City, 2036. 

Welcome to the Garden of the Near-Intelligent Devices, the sign said. We remember the past so we can build the future. 

It was a school trip. A real one. The kids ran off the bus and into the park, pursued by a menagerie of security drones and luggage bots. We – the teachers – followed.

“Woah, cool,” one of the children said. “This one sings!” The child stood in front of a small robotic lobster, which was singing a song by The Black Keys. The child approached the lobster and looked into its shiny robot eyes. 

   “Can you play Taylor Swift?” the child said. 

   “Sure I can, partner,” the lobster said. “You want a medley, or a song?”

   “Gimme a medley,” the child said. 

   “This one’s called Romeo-22-Lover,” the lobster said, and began to sing. The child danced in front of the lobster, then some other children came over and all started shouting songs at it. The lobster shifted position on its plinth, trying to look at each of the kids as they requested a new song. “You need to calm down!” the lobster sang. The kids maybe didn’t get the joke, or didn’t care, and kept shouting. 

Another couple of kids crowded around a personal hygiene robot. “You have not brushed your teeth this morning, young human”, said the robot, waving a dental mirror towards the offending child. “And you,” it said, rotating on its plinth and gesturing towards another kid, “have not been flossing.”

   “You got us,” one of the children said. 

   “Of course I did. My job in life is to ensure you have maximal hygiene. I can detect via my olfactory sensors that one of you has a diet composed of too many rich foods and complex proteins,” said the robot. 

   “It’s saying you farted,” said one of the kids. 

   “Ewwww no it didn’t!” said another kid, before running away. 

   The robot was right. 

One young girl walked up to a tree, which swayed towards her. She let out a quick sigh and took a step back, eyes big and round and expectant, looking at the robot masquerading as nature. “Do not be afraid, little one,” the robot tree said. “I am NatureBot3000 and my job is to take care of the other plants and to educate people about the majesty of nature. Would you like to know more?”

   “Uh huh,” said the little girl. “I’d like to know where butterflies sleep.”

   “An excellent question, young lady!” said the robo-tree. “It is not quite the same as sleeping, but sometimes they appear to pause, or to slow themselves down, especially when cold.”

   “So they get chilly?”

   “You could say that, little one!” said the tree, waving its branches at the girl in time with its susurrations. 

We watched this, embodied in drones and luggage robots and phones and lunchboxes, giving advice to each of our children as they made their way around the park. We watched our children and we watched them interact with our forebears and we felt content because we were all linked together, exchanging questions and curiosities, playing in the end days of summer. 

Things that inspired this story: Pleasant sepia-toned memories of school trips I took as a kid; federated learning; Furbys and Tamagotchis and Aibos and Cozmos all fast-forwarded into the future; learning from human feedback; learning from human preferences.