September 30, 2019

Import AI: 166: Dawn of the misleading ‘sophistbots’; $50k a year for studying long-term impacts of AI; and squeezing an RL drone policy into 3kb

by Jack Clark

Will powerful AI make the Turing Test obsolete?
…And if it does, what do we do about it?…
The Turing Test – judging how sophisticated a machine is by seeing if it can convince a person that it is a human – looms large in pop culture discussion about AI. What happens if we have systems today that can pass the Turing Test, but which aren’t actually that intelligent? That’s something that has started to happen recently with systems that a human interfaces with via text chat. Now, new research from Stanford University, Pennsylvania State University, and the University of Toronto explores how increasingly advanced so-called ‘sophisbots’ might influence society.

The problems of ‘sophisbots’: The researchers imagine what the future of social media might look like, given recent advances in the ability of AI systems to generate synthetic media. In particular, they imagine social media ruled by “sophisbots”. They foresee a future where these bots are constantly “running in the ether of social media or other infrastructure…not bound by geography, culture or conscience.”

So, what do we do? Technical solutions: Machine learning researchers should develop technical tools to help spot machines posing as humans, and should invest in work to detect the telltale signs of AI-generated things, along with systems to track down the provenance of content to be able to guarantee that something is ‘real’, and tools to make it easy for regular people to indicate that the content they themselves are putting online is authentic and not bot-generated.
Policy approaches: We need to develop “public policy, legal, and normative frameworks for managing the malicious applications of technology in conjunction with efforts to refine it,” they write. “Let us as a technical community commit ourselves to embracing and addressing these challenges as readily as we do the fascinating and exciting new uses of intelligent systems”.

Why this matters: How we deal with the future of synthetic content will define the nature of ‘truth’ in society, which will ultimately define everything else. So, no pressure.
Read more: How Relevant is the Turing Test in the Age of Sophisbots (Arxiv).

####################################################

Do Octopuses dream of electric sheep?
Apropos of nothing, here is a film of an octopus changing colors while sleeping.
View the sleeping octopus here (Twitter).

####################################################

PhD student? Want $50k a year to study the long-term impacts of AI? Read on!
…Check out the Open Philanthropy Project’s ‘AI Fellowship’…$50k for up to five years, with possibility of renewal…
Applications are now open for the Open Phil AI Fellowship. This program extends full support to a community of current & incoming PhD students, in any area of AI/ML, who are interested in making the long-term, large-scale impacts of AI a focus of their work.

The details:

Current and incoming PhD students may apply.
Up to 5 years of PhD support with the possibility of renewal for subsequent years
Students with pre-existing funding sources who find the mission and community of the Fellows Program appealing are welcome to apply
Annual support of $40,000 stipend, payment of tuition and fees, and $10,000 for travel, equipment, and other research expenses
Applications are due by October 25, 2019 at 11:59 PM Pacific time

In a note about this fellowship, a representative of the Open Philanthropy Project wrote: “We are committed to fostering a culture of inclusion, and encourage individuals with diverse backgrounds and experiences to apply; we especially encourage applications from women and minorities.”
Find out more about the Fellowship here (Open Philanthropy website).

####################################################

Small drones with big brains: Harvard researchers apply deep RL to a ‘nanodrone’:
…No GPS? That won’t be a problem soon, once we have smart drones…
One of the best things that the nuclear disaster at Fukushima did for the world was highlight just how lacking contemporary robotics was: we could have avoided a full meltdown if we’d been able to get a robot or a drone into the facility. New research from Harvard, Google, Delft University, and the University of Texas at Austin suggests how we might make smart drones that can autonomously navigate in places where they might not have GPS. It’s a first step to developing the sorts of systems needed to be able to rapidly map and understand the sites of various disasters, and also – as with many omni-use AI technologies – a prerequisite for low-cost, lightweight, weapons systems.

What they’ve done: “We introduce the first deep reinforcement learning (RL) based source-seeking nano-drone that is fully autonomous,” the researchers write. The drone is trained to seek a light source, and uses light sensors to help it triangulate this, as well as an optical flow-based sensor for flight stability. The drone is trained using the Deep Q-Network (DQN) algorithm in a simulator with the objective of closing the distance between itself and a light source.
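To make the training objective a little more concrete, here is a minimal sketch of the Q-learning update that DQN approximates with a neural network. This is not the authors' setup: it swaps the deep network for a lookup table, replaces the simulator with a hypothetical one-dimensional corridor, and assumes a reward equal to how much each step closes the distance to the source. All names and hyperparameters are illustrative.

```python
import random

ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def train_q_table(length=9, source=4, episodes=2000,
                  alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor with a 'light source' cell."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(length)]
    for _ in range(episodes):
        pos = rng.randrange(length)
        for _ in range(4 * length):  # cap the episode length
            if pos == source:
                break
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[pos][0] > q[pos][1] else 1
            nxt = min(max(pos + ACTIONS[a], 0), length - 1)
            # Assumed reward: reduction in distance to the source.
            reward = abs(source - pos) - abs(source - nxt)
            # Reaching the source ends the episode, so no future value.
            future = 0.0 if nxt == source else gamma * max(q[nxt])
            q[pos][a] += alpha * (reward + future - q[pos][a])
            pos = nxt
    return q


def greedy_step(q, pos):
    """Direction the learned policy moves in from a given cell."""
    return ACTIONS[0] if q[pos][0] > q[pos][1] else ACTIONS[1]


q_table = train_q_table()
print([greedy_step(q_table, p) for p in range(9) if p != 4])
```

In this toy setting the learned policy should step toward the source from either side; the real system has to learn the same behaviour from noisy light-sensor readings rather than from a known position.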

Shrinking network sizes: After training, they shrink down the resulting network (to 3kb, via quantization) and run it in the real world on a CrazyFlie nanodrone equipped with a Cortex-M4 chip – this is pretty impressive stuff, given the relative immaturity of RL for robot operation and the teeny-tiny compute envelope. “While we focus exclusively on light-seeking as our application in this paper, we believe that the general methodology we have developed for deep reinforcement learning-based source seeking… can be readily extended to other (source seeking) applications as well,” they write.
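For a rough sense of what quantization buys you, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. The item above does not say which bit width or scheme the researchers used, so int8 is an assumption, as is reading "3kb" as 3 * 1024 bytes; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


def weights_that_fit(budget_bytes, bits_per_weight):
    """How many weights fit in a memory budget at a given precision."""
    return (budget_bytes * 8) // bits_per_weight


BUDGET = 3 * 1024  # assumption: "3kb" read as 3 KiB

sample = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(sample)
restored = dequantize(codes, scale)
worst_error = max(abs(a - b) for a, b in zip(sample, restored))

print(weights_that_fit(BUDGET, 32), "float32 weights fit in the budget")
print(weights_that_fit(BUDGET, 8), "int8 weights fit in the budget")
print("worst rounding error:", worst_error)
```

Under these assumptions the same budget holds four times as many weights at 8 bits as at 32, at the cost of a rounding error of at most half a quantization step per weight.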

How well does it work? The researchers test out the drone in a bunch of different scenarios and average a success rate of 80% across 105 flight tests. In real world tests, the drone is able to deal with a variety of obstacles being introduced, as well as variations in its own position and the position of the light source. Now, 80% is a long way from good enough to use in a life or death situation, but it is meaningful enough to make this line of research worth paying attention to.

Why this matters: I think that in the next five years we’re going to see a revolution sweep across the drone industry as researchers figure out how to cram increasingly sophisticated, smart capabilities onto drones ranging from the very big to the very small. It’s encouraging to see researchers try to develop ultra-efficient approaches that can work on tiny drones with small compute budgets.
   Read more: Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller (Arxiv).
   Get the code for the research here (GitHub).
   Watch a video of the drone in action here (Harvard Edge Computing, YouTube).

####################################################

First we could use AI to search over text, then images, now: Code?
…Maybe, just maybe, GitHub’s ‘CodeSearchNet’ dataset could help us develop something smarter than ‘combing through StackOverflow’…
Today, search tools help us find words and images that are similar to our query, but have very little overlap (e.g., we can ask a search engine for “what is the book with the big whale in it?” and receive the answer ‘Moby Dick’, even though those words don’t appear in the original query). Doing the same thing for code is really difficult – if you search ‘read JSON data’, you’re unlikely to get nearly as useful results. Now, GitHub and Microsoft Research have introduced CodeSearchNet, a large-scale code dataset which pairs snippets of code with their plain-English descriptions. The idea is that if we can train machine learning systems to map code to text, then we might be able to build smarter systems for searching over code. They’ve also created a competition to encourage people to compete to develop machine learning methods that can improve code search techniques.

The CodeSearchNet Corpus dataset: The dataset consists of about 2 million pairs of code snippets and associated documentation, as well as another 4 million code snippets with no documentation. The code comes from languages including Go, Java, JavaScript, PHP, Python, and Ruby.
Caveats: While some of the documentation is written in multiple languages, the dataset’s evaluation set focuses on English. Additionally, the dataset can be a bit noisy, primarily as a consequence of the many different ways in which people can write documentation.
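To illustrate why pairing code with documentation helps, here is a minimal sketch of a purely lexical search over (documentation, code) pairs, using bag-of-words cosine similarity. It is a toy baseline, not one of the learned models the challenge is meant to encourage, and the three-entry corpus is made up for illustration rather than taken from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, top_k=3):
    """Rank code snippets by how well their documentation matches the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), code)
              for doc, code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical (documentation, code) pairs in the spirit of the corpus.
CORPUS = [
    ("Read JSON data from a file and return a dict.", "def load_json(path): ..."),
    ("Write rows to a CSV file.", "def write_csv(path, rows): ..."),
    ("Compute the SHA-256 hash of a byte string.", "def sha256(data): ..."),
]

print(search("read JSON data", CORPUS)[0])
print(search("parse configuration", CORPUS)[0])
```

The first query works because it shares words with a docstring. The second scores zero against everything because it shares none, which is the gap that models trained to map text and code into a common representation are meant to close.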

The CodeSearchNet Challenge: To win the challenge, developers need to build a system that can return “a set of relevant results from CodeSearchNet Corpus for each of 99 pre-defined natural language queries”. The queries were mined from Microsoft’s search engine, Bing. They also collected 4,026 annotations across six programming languages to provide expert annotations ranking the extent to which the documentation matches the code, giving researchers an additional training signal.

Why this matters: In the same way powerful search engines have made it easy for us to explore the ever-expanding universe of digitized text and images, datasets and competitions like CodeSearchNet could help us do the same for code. And once we have much better systems for code search, it’s likely we’ll be able to do better research into things like program synthesis, making it easier for us to use machine learning techniques to create systems that can learn to produce their own additional code on an ad-hoc basis in response to changes in their external environment.
   Read more: CodeSearchNet Challenge: Evaluating the State of Semantic Code Search (Arxiv).
   Read more: Introducing the CodeSearchNet Challenge (GitHub blog).
   Check out the leaderboard for the CodeSearchNet Challenge (Weights & Biases-hosted leaderboard).

####################################################

Deep learning at supercomputer scale, via Oak Ridge National Laboratory:
…What is the limit of our ability to scale computation across thousands of GPUs? It’s definitely not 27,600 GPUs, based on these results!…
One of the recent trends driving the growing capabilities of deep learning has been improvements by researchers in parallelizing training across larger and larger fields of chips: such parallelization makes it easier to train bigger models in shorter amounts of time. An important question, then, is what are the fundamental limits of parallelization? New research from a team linked to Oak Ridge National Laboratory suggests the answer is: we don’t know, because we’re pretty good at parallelizing stuff even at supercomputer scale!

In the research, the team scales a single model training run across the 27,600-strong V100 GPU fleet of Oak Ridge’s ‘Summit’ supercomputer (the most powerful supercomputer in the world, according to the June 2019 Top 500 rankings). The dream here is to attain linear scaling, where you get a performance increase that precisely lines up with the additional power of each GPU – obviously, that’s likely impossible to attain. But they obtain pretty respectable scores overall.

The key numbers:

0.93: scaling efficiency across the entire supercomputer (4600 nodes).
0.97: scaling efficiency when using “thousands of GPUs or less”.
49.7%: That’s the average sustained performance they achieve on each GPU, which “to our knowledge, exceeds the single GPU performance of all other DNN trained on the same system to date”. (This is a pretty impressive number – a recent analysis by OpenAI, based in part on internal experiments, suggests it’s more typical to see utilization on the order of 33% for standard training jobs.)
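A quick back-of-the-envelope sketch of what the 0.93 figure implies, assuming "scaling efficiency" means achieved throughput divided by the throughput of perfectly linear scaling (the usual definition; the paper's exact definition may differ). The six-GPUs-per-node figure is what makes 4600 nodes line up with the 27,600 GPUs in the headline.

```python
def effective_speedup(efficiency, n_gpus):
    """Speedup over one GPU implied by a given scaling efficiency."""
    return efficiency * n_gpus


NODES = 4600
GPUS_PER_NODE = 6  # 4600 * 6 matches the 27,600 GPUs quoted in the headline
N_GPUS = NODES * GPUS_PER_NODE

full_machine = effective_speedup(0.93, N_GPUS)
lost_to_overhead = N_GPUS - full_machine

print(f"GPUs used: {N_GPUS}")
print(f"Equivalent perfectly-scaled GPUs: {full_machine:.0f}")
print(f"GPU-equivalents lost to scaling overhead: {lost_to_overhead:.0f}")
```

Under that assumption, running on the whole machine behaves like roughly 25,668 perfectly scaled GPUs, with the equivalent of about 1,900 GPUs going to communication and other overhead.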

What they did: The researchers develop a bunch of ways to more efficiently scale networks across the system while using distributed training software called Horovod. The techniques they use include:

New gradient reduction strategies which involve a combination of systems to get individual software workers to exchange information more efficiently (via a technique called BitAllReduce), and a gradient tensor grouping strategy (called Grouping).
A proof-of-concept scientific inverse problem experiment where they train a single deep neural network with 10^8 weights on a 500TB dataset.

Why this matters: Our ability to harness increasingly powerful fields of computers will help define our ability to explore the frontiers of science; papers like this give us an indication of what it takes to be able to tap into the computers we’ve built for modern machine learning tasks. I think one of the most interesting things about this paper is:

A) how good the scaling is and
B) how far we seem to be from being able to saturate computers at this scale.

####################################################

RLBench: 100 hand-designed tasks for your robot:
…Think your robot is smart? See how well it can handle task generalization in RLBench…
In recent years, contemporary AI techniques have become good enough to work on simulated and real robots. That has created demand among researchers for harder robot learning tasks to test their algorithms on. This has inspired researchers with Imperial College London to create RLBench, a “one-size-fits-all benchmark” for testing out classical and contemporary AI techniques in learning robot manipulation tasks.

What goes into RLBench: RLBench has been designed with the following key traits: diversity of tasks, reproducibility, realism, tiered difficulty, extensibility, and scale. It is built on the V-REP robot simulator and uses a PyRep interface. Tasks include stacking blocks, manipulating objects, opening doors, and so on. Each task also comes with some expert and/or hand-designed algorithms, so you can use RLBench to algorithmically generate demonstrations that solve its tasks, letting you potentially train AI systems via imitation learning.

A hard challenge: RLBench ships with ‘The RLBench Few-Shot Challenge’, which stress-tests contemporary AI algorithms’ ability to not only learn a task, but also generalize that knowledge to solve similar but slightly different tasks.

Why this matters: The dream of many researchers is to develop more flexible learning algorithms, which could let single robots do a variety of tasks, while being more resilient to variation. Platforms like RLBench will help us explore how contemporary AI algorithms can advance the state of the art here, and could become a valuable indicator of progress at the intersection of machine learning and robotics.
   Read more: RLBench: The Robot Learning Benchmark & Learning Environment (Arxiv).
   Find out more about RLBench (project website, Google Sites).
   Get the code for RLBench here (RLBench GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

EU update on AI ethics guidelines:
The European Union released AI ethics guidelines earlier this year, initially drafted by their high-level expert group on AI before going through public consultation. Several months later, the EU is evaluating their progress, taking stock of criticism, and considering what to do next.

Core challenges: The guidelines are voluntary and non-binding, prompting criticism from parties in favour of full-bodied regulation. Moreover, there are still no oversight mechanisms to monitor compliance with these voluntary commitments. Critics have also pointed out that the guidelines are short-sighted, and fail to consider long-term risks from AI.

Future directions: The EU suggests the key question is whether voluntary commitments will suffice to address ethical challenges from AI, and what other mechanisms are available. There are calls for more robust regulation, with proposals including mandatory requirements for explainability of AI systems, and Europe-wide legislation on face recognition technology. Beyond regulation, soft legal guidance and rules on standardisation are also being explored.

Why it matters: The EU was an early mover in setting out ethics guidelines, and seems to be thinking seriously about how best to approach these issues. Despite the criticisms, a cautious approach to regulation is sensible, since we are still so far from understanding the space of plausible and desirable rules, and since the downsides from poorly-judged interventions could be substantial.

####################################################

Tech Tales:

The Sculpture Garden of Ancient Near-Intelligent Devices (NIDs)

Central Park, New York City, 2036.

Welcome to the Garden of the Near-Intelligent Devices, the sign said. We remember the past so we can build the future.

It was a school trip. A real one. The kids ran off the bus and into the park, pursued by a menagerie of security drones and luggage bots. We – the teachers – followed.

“Woah cool,” one of the children said. “This one sings!”. The child stood in front of a small robotic lobster, which was singing a song by The Black Keys. The child approached the lobster and looked into its shiny robot eyes.

“Can you play Taylor Swift,” the child said.

“Sure I can, partner,” the lobster said. “You want a medley, or a song.”

“Gimme a medley,” the child said.

“This one’s called Romeo-22-Lover,” the lobster said, and began to sing. The child danced in front of the lobster, then some other children came over and all started shouting songs at it. The lobster shifted position on its plinth, trying to look at each of the kids as they requested a new song. “You need to calm down!” the lobster sang. The kids maybe didn’t get the joke, or didn’t care, and kept shouting.

Another couple of kids crowded around a personal hygiene robot. “You have not brushed your teeth this morning, young human”, said the robot, waving a dental mirror towards the offending child. “And you,” it said, rotating on its plinth and gesturing towards another kid, “have not been flossing.”

“You got us,” one of the children said.

“Of course I did. My job in life is to ensure you have maximal hygiene. I can detect via my olfactory sensors that one of you has a diet composed of too many rich foods and complex proteins,” said the robot.

“It’s saying you farted,” said one of the kids.

“Ewwww no it didn’t!” said another kid, before running away.

The robot was right.

One young girl walked up to a tree, which swayed towards her. She let out a quick sigh and took a step back, eyes big and round and awaiting, looking at the robot masquerading as nature. “Do not be afraid, little one,” the robot tree said. “I am NatureBot3000 and my job is to take care of the other plants and to educate people about the majesty of nature. Would you like to know more?”

“Uh huh,” said the little girl. “I’d like to know where butterflies sleep.”

“An excellent question, young lady!” said the robo-tree. “It is not quite the same, but sometimes they appear to pause, or to slow themselves down, especially when cold.”

“So they get chilly?”

“You could say that, little one!” said the tree, waving its branches at the girl in time with its susurrations.

We watched this, embodied in drones and luggage robots and phones and lunchboxes, giving advice to each of our children as they made their way around the park. We watched our children and we watched them interact with our forebears and we felt content because we were all linked together, exchanging questions and curiosities, playing in the end days of summer.

Things that inspired this story: Pleasant sepia-toned memories of school trips I took as a kid; federated learning; Furbys and Tamagotchis and Aibos and Cozmos all fast-forwarded into the future; learning from human feedback; learning from human preferences.
