Import AI 171: When will robotics have its ImageNet moment?; fooling surveillance AI with an ‘adversarial t-shirt’, and Stanford calls for $12bn a year in funding for a US national endeavor

by Jack Clark

What do we mean when we say a machine can “understand” something?
And does it matter if we do or don’t know what we mean here?…
AI professor Tom Dietterich has tackled the thorny question of trying to define what it means for a machine to “understand” something – by saying maybe this question doesn’t matter. 

Who cares about understanding? “I believe we should pursue advances in the science and technology of AI without engaging in debates about what counts as “genuine” understanding,” he says. “I encourage us instead to focus on which system capabilities we should be trying to achieve in the next 5, 10, or 50 years”. 

Why this matters: One of the joys of AI is how broad a subject it is, but that breadth is also a source of tension. I think the specific tension comes from the mushing together of a community that runs on an engineering-centric model of progress, where researchers compete with each other to iteratively hill-climb on various state-of-the-art leaderboards, and a more philosophical community that wants to take a step back and ask fundamental questions, like what it may mean to “understand” things and whether today’s systems exhibit this or not. I think this is a productive tension, but it can sometimes yield arguments or debates that seem like sideshows to the main event of building iteratively more intelligent systems.
   “We must suppress the hype surrounding new advances, and we must objectively measure the ways in which our systems do and do not understand their users, their goals, and the broader world in which they operate,” he writes. “Let’s stop dismissing our successes as “fake” and not “genuine”, and let’s continue to move forward with honesty and productive self-criticism”.
   Read more: What does it mean for a machine to “understand”? (Tom Dietterich, Medium)

####################################################

What’s the secret to creating a strong American AI ecosystem? $12 billion a year, say Stanford leaders:
…Policy proposal calls for education, research, and entrepreneurial funding…
If the American government wants the USA to lead in AI, then the government should invest $12 billion into AI every year for at least a decade, according to a policy proposal from Fei-Fei Li and John Etchemendy – directors of Stanford’s Human-Centered Artificial Intelligence initiative.

How to spend $12 billion a year: Specifically, the government should invest $7 billion a year into “public research to pursue the next generation of AI breakthroughs”, along with $3 billion a year into education and $2 billion into funds to support early-stage AI entrepreneurs. To put these numbers into perspective, a NITRD report recently estimated that the federal government budgeted about $1 billion a year in non-defense programs related to AI, so the Stanford proposal is calling for a significant increase in AI spending, however you slice and dice the figures. 

Money + principles: Along with this, the directors ask the US government to “implement clear, actionable international standards and guidelines for the ethical use of AI”. (In fairness to the US government, the government has participated in the creation of the OECD AI principles, which were adopted in 2019 by OECD member countries and other states, including Brazil, Peru, and Romania.)

Why this matters: The 21st century is the era of the industrialization of AI, and the industrialization of AI demands capital in the same way that industrialization in the 18th and 19th centuries demanded capital. Therefore, if governments want to lead in AI, they’ll need to dramatically increase spending on fundamental AI research as well as initiatives like better AI education. In the words of sports commentators when a team is in a good position at the start of the second half: it’s the US’s game to lose!
   Read more: We Need a National Vision for AI (Human-Centered Artificial Intelligence).
   Read more: The Networking and Information Technology Research & Development Program Supplement to the President’s FY2020 Budget (WhiteHouse.gov, PDF)

####################################################

Fundamental failures and machine learning:
…Towards a taxonomy of machine failures…
Researchers with the Università della Svizzera italiana in Switzerland have put together a taxonomy of some of the common failures seen in AI systems programmed in TensorFlow, PyTorch, and Keras. What distinguishes this taxonomy is the amount of research that has gone into it: to build it, the researchers analyzed 477 StackOverflow discussions, 271 issues and pull requests (PRs), and 311 commits from GitHub repositories, and conducted interviews with 20 researchers and practitioners. 

A taxonomy of failure: So, what failures are common in deep learning? There are five top-level categories, three of which are divided into subcategories. These are:

  • Model: The ML model itself is, unsurprisingly, a common source of failures, with developers frequently running into failures that occur at the level of a layer within the network. These include: problems relating to missing or redundant layers, incorrect layer properties (eg, sample size, input/output format, etc), and activation functions.
  • Training: Training runs are finicky, problem-laden things, and the common failures here include bad hyperparameter selection, misspecified loss functions, bad data splits between training and testing, optimiser problems, bad training data, crappy training procedures (eg, poor memory management during training), and more. 
  • GPU usage: As anyone who has spent hours fiddling around with NVIDIA drivers can attest, GPUs are machines sent from hell to drive AI researchers mad. Faustian boxes, if you will. Have you ever seen someone with multiple PhDs break down after spending half a day trying to debug a problem caused by an NVIDIA card’s software playing funny games with a Linux distro? I have. (AMD: Please ramp up your AI GPU business faster to provide better competition to NVIDIA here). 
  • API: These problems are what happens when developers use APIs badly, or improperly. 
  • Tensors & Inputs: Misshapen tensors are a frequent problem, as are mis-specified inputs (see the sketch after this list for a typical example).
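
To make the “Tensors & Inputs” category concrete, here is a minimal, hypothetical PyTorch sketch of the classic shape-mismatch fault and its fix (an illustration of the failure mode, not an example taken from the paper):

    import torch
    import torch.nn as nn

    # A small convolutional feature extractor followed by a linear classifier.
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
    classifier = nn.Linear(8 * 30 * 30, 10)

    x = torch.randn(4, 3, 32, 32)             # batch of 4 RGB images, 32x32 pixels
    features = conv(x)                        # shape: (4, 8, 30, 30)

    # Buggy version: passing the 4D feature map straight into nn.Linear raises a
    # shape error, one of the most common "Tensors & Inputs" faults.
    # logits = classifier(features)

    # Fixed version: flatten the channel and spatial dimensions so the input
    # matches the layer's expected (batch, features) shape.
    logits = classifier(features.flatten(start_dim=1))   # shape: (4, 10)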

Why this matters: For AI to industrialize, AI processes need to become more repeatable and describable, in the same way that artisanal manufacturing processes were transformed into repeatable documented processes via Taylorism. Papers like this create more pressure for standardization within AI, which prefigures industrialization and societal-scale deployments.
   Read more: Taxonomy of Real Faults in Deep Learning Systems (Arxiv).

####################################################

Want your robot to be friends with people? You might want this dataset:
…JackRabbot dataset comes with benchmarks, more than an hour of data…
Researchers with the Stanford Vision and Learning Laboratory have built JRDB, a robot-collected dataset meant to help researchers develop smarter, more social robots. The dataset consists of tons of video footage recorded by the Stanford-developed ‘JackRabbot’ social navigation robot as it travels around campus, with detailed annotations of all the people it encounters en route. Ideally, JRDB can help us build robots that can navigate the world without crashing into the people around them. Seems useful!

What’s special about the data?
JRDB data consists of 54 action sequences with the following data for each sequence: video streams at 15fps from stereo cylindrical 360-degree cameras; continuous 3D point clouds gathered via two Velodyne LiDAR scanners; line 3D point clouds gathered via two Sick LiDARs; an audio signal; and encoder values from the robot’s wheels. All the pedestrians the JackRabbot encounters on its travels are labeled with 2D and 3D bounding boxes. 

Can you beat the JRDB challenge? JRDB ships with four in-built benchmarks: 2D and 3D person detection, and 2D and 3D person tracking. The researchers plan to expand the dataset over time, and may do things like “annotating ground truth values for individual and group activities, social grouping, and human posture”. 
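
The write-up doesn’t spell out the scoring details, but 2D detection benchmarks like this are conventionally scored using intersection-over-union (IoU) overlap between predicted and ground-truth boxes; here’s a minimal sketch of that core computation (a generic illustration, not JRDB’s evaluation code):

    def iou_2d(box_a, box_b):
        # Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        # Width and height of the overlapping rectangle (zero if no overlap).
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        intersection = inter_w * inter_h
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
        return intersection / union if union > 0 else 0.0

    # A predicted person box typically counts as a true positive if its IoU with
    # a ground-truth box exceeds a threshold such as 0.5.
    print(iou_2d((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.143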

Why this matters: Robots – and specifically, techniques to allow them to autonomously navigate the world – are maturing very rapidly and datasets like this could help us create robots that are more aware of their surroundings and better able to interact with people.
   Read more: JRDB: A Dataset and Benchmark for Visual Perception for Navigation in Human Environments (Arxiv)

####################################################

Want to hide from that surveillance camera? Try wearing an adversarial t-shirt:
…Perturbations for privacy…
In a world full of facial recognition systems, how can people hide? One idea from researchers with Northeastern University, IBM, and MIT is to wear a t-shirt that confuses person detection systems, rendering the wearer invisible to AI-infused surveillance. 

How it works: The researchers’ “adversarial t-shirt” has a pattern printed on it that is designed to confuse person detectors. To get this t-shirt to be effective, the researchers work out how to design an adversarial pattern that remains effective even when the t-shirt is deformed by a person walking around in it (to do this, they implement a thin plate spline (TPS)-based transformer, which can model these deformations). 
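
The paper’s full pipeline is more involved, but the core idea of optimizing a physical adversarial pattern can be sketched generically: repeatedly transform the pattern (to mimic printing, lighting, and cloth deformation), paste it into training images, and push the detector’s “person” confidence down. Here’s a minimal, hypothetical PyTorch sketch in that spirit; apply_patch and detector_person_score are stand-in functions, not the authors’ code:

    import torch

    def optimize_patch(images, detector_person_score, apply_patch, steps=200, lr=0.01):
        # `images`: a batch of photos of a person wearing the target t-shirt.
        patch = torch.rand(3, 100, 100, requires_grad=True)   # the printable pattern
        optimizer = torch.optim.Adam([patch], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            # Paste the patch onto the shirt under a randomly sampled deformation
            # (the paper models cloth wrinkles with a thin plate spline transform).
            patched = apply_patch(images, patch)
            loss = detector_person_score(patched).mean()       # we want this score low
            loss.backward()
            optimizer.step()
            patch.data.clamp_(0, 1)                            # keep pixel values printable
        return patch.detach()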

The key numbers: 65% and 79% – that’s how effective the t-shirt is at confusing detectors based on Faster R-CNN (65%) and YOLOv2 (79%). However, its performance falls when working against ensembles of detectors. 

Why this matters: Think of this research as an intriguing proof of concept for how to apply adversarial attacks in the real world, then reflect on the fact that adversarial examples in ML showed up a few years ago as perturbations to 2D digital images, before jumping to real images via demonstrations on things like stop signs, then moving to 3D objects as well (via research that showed how to get a system to persistently misclassify a turtle as a rifle), then moving to stick-on patches that could be added to other items, and now moving to adversarial objects that change over time, like clothing. That’s a pretty wild progression from “works in the restricted lab” to “works in some real world scenarios”, and should give us a visceral sense of broader progress in AI research.
   Read more: Evading Real-Time Person Detectors by Adversarial T-shirt (Arxiv)

####################################################

When will AI+robots have its ImageNet moment? Meta-World might help us find out:
…Why smart robots need to test themselves in the Meta-World…
A team of researchers from Stanford, UC Berkeley, Columbia University, the University of Southern California, and Google’s robotics team have published Meta-World, a multi-task robot evaluation benchmark. 

Why build Meta-World? Meta-World is a symbol of the growing sophistication of AI algorithms; it exists because we’ve got pretty good at training simulated robots to solve single tasks, so now we need to train simulated robots to solve multiple tasks at once. This pattern of moving from single-task to multi-task evaluation has been playing out in other parts of AI in recent years, ranging from NLP (where we’ve moved to multi-task evaluations like ‘SuperGLUE’), to images (where for several years it has been standard to test on ImageNet, CIFAR, and usually varieties of domain-specific datasets), to reinforcement learning (where people have been trying out various forms of meta-learning across a range of environments like DeepMind Lab, OpenAI’s procedurally generated environments, and more). 

Parametric and non-parametric: Meta-World tasks exhibit parametric variation in object position and goal positions for each task, as well as non-parametric variation across tasks. “Introducing this parametric variability not only creates a substantially larger (infinite) variety of tasks, but also makes it substantially more practical to expect that a meta-trained model will generalize to acquire entirely new tasks more quickly, since varying the positions provides for wider coverage of the space of possible manipulation tasks,” the researchers write. 

50 tasks, many challenges: Meta-World contains 50 distinct manipulation tasks, covering actions that range from reaching toward an object, to pulling levers, to closing doors, and more. It also ships with a variety of different evaluation protocols: in the “most difficult” one, agents need to use experience from 45 training tasks to quickly learn distinct, new test tasks. 
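
The evaluation protocol is easier to see as pseudocode. Here’s a minimal sketch of that hardest split (meta-train on 45 tasks, then adapt to held-out tasks); make_task_env, meta_train, and adapt_and_evaluate are hypothetical placeholders rather than the actual Meta-World API:

    # Sketch of a Meta-World-style meta-training / meta-testing split.
    TRAIN_TASKS = [f"train-task-{i:02d}" for i in range(45)]   # e.g. reach, push, open a door...
    TEST_TASKS = [f"test-task-{i:02d}" for i in range(5)]      # held-out manipulation tasks

    def run_benchmark(make_task_env, meta_train, adapt_and_evaluate):
        # Meta-training: the agent sees many parametric variations (object and goal
        # positions) of each of the training tasks.
        train_envs = [make_task_env(name) for name in TRAIN_TASKS]
        agent = meta_train(train_envs)

        # Meta-testing: the agent gets a small budget of experience on each unseen
        # task and is scored on its post-adaptation success rate.
        scores = [adapt_and_evaluate(agent, make_task_env(name)) for name in TEST_TASKS]
        return sum(scores) / len(scores)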

How well do today’s algorithms work? The authors test a few contemporary algorithms against Meta-World: multi-task PPO, multi-task TRPO, task embeddings, multi-task soft actor-critic (SAC), and multi-task multi-head SAC, as well as the meta-learning algorithms model-agnostic meta-learning (MAML), RL^2, and probabilistic embeddings for actor-critic RL (PEARL). Most methods struggle when asked to master many tasks at once, even though the individual tasks can be solved in isolation. “The fact that some methods nonetheless exhibit meaningful generalization suggests that the ML10 and ML45 benchmarks are solvable, but challenging for current methods, leaving considerable room for improvement in future work,” they write. 

Why this matters: When will robotics have its “ImageNet moment” – a point when someone develops an approach that gets a sufficiently high score on a well-established benchmark that it forces a change in attention for the broader research community? Computer vision has already had ImageNet, and in the past couple of years the same thing has happened with NLP (notably, via systems like BERT, ULMFiT, GPT-2, etc). Robotics isn’t there yet, but it feels like it’s on the cusp, and once it happens, I expect robotics+AI to become very consequential.
   Read more: Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning (Arxiv).
   Get the code for Meta-World here (GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

DoD releases its AI ethics principles:
The DoD’s expert panel, the Defense Innovation Board (DIB), has released a report outlining ethics principles for the US military’s development and deployment of AI. The DIB sought input from the public and over 100 experts, and conducted a ‘red team’ exercise to stress test the principles in realistic policy scenarios.

Five principles:
(1) Human beings should exercise appropriate levels of judgment and remain responsible for the development, deployment, use, and outcomes of DoD AI systems; 

(2) DoD should take deliberate steps to avoid unintended bias in the development and deployment of combat or non-combat AI systems that would inadvertently cause harm to persons; 

(3) DoD’s AI engineering discipline should be sufficiently advanced such that technical experts possess an appropriate understanding of the technology, development processes, and operational methods of its AI systems, including transparent and auditable methodologies, data sources, and design procedure and documentation; 

(4) DoD AI systems should have an explicit, well-defined domain of use, and the safety, security, and robustness of such systems should be tested and assured across their entire life cycle within that domain of use; 

(5) DoD AI systems should be designed and engineered to fulfill their intended function while possessing the ability to detect and avoid unintended harm or disruption, and for human or automated disengagement or deactivation of deployed systems that demonstrate unintended escalatory or other behaviour.

Practical recommendations: an annual DoD-convened conference on AI safety, security, and robustness; a formal risk management methodology; investment in research into reproducibility, benchmarking, and verification for AI systems.

Why it matters: The DoD seems to be taking seriously the need for progress on the technical and governance challenges posed by advanced AI. A race to the bottom on safety and ethics between militaries would be disastrous for everyone, so it is encouraging to see this measured approach from the US. International cooperation and mutual trust will be essential in building robust and beneficial AI, so we are fortunate to be grappling with these challenges in a time of relative peace between the great powers, and should be making the most of it.
   Read more: AI Principles (DoD).
   Read more: AI Principles – Supporting Document (DoD).

####################################################

Newsletter recommendation – policy.ai

I recommend subscribing to policy.ai, the bi-weekly newsletter from the Center for Security and Emerging Technologies (CSET) at Georgetown University. (Jack: This is a great newsletter, though as a disclaimer, I’m affiliated with CSET).

####################################################

Tech Tales:

The Repair Job

“An original 7000? Wow. Well, first of all we’ve got to get you upgraded, old timer. The new 20-Series are cheaper, faster, and smarter. And they can handle weeds just fine-”
   “-So can mine,” I pointed to the sensor module I’d installed on its roof. “I trained it to spot them.”
   “Must’ve taken a while.”
   “A couple of years, yes. I haven’t had any problems.”
   “I can see you love the machine. Let me check if we’ve got parts. Mind if I scan it?”
He leaned over the robot and flipped up a diagnostic panel, then used his phone to scan an internal barcode, then he pursed his lips. “Sorry,” he said, looking at me. “We don’t have any parts and it looks like some of them aren’t supported anymore.”
   “So that’s it then?”
   “Unless you can find the parts yourself, then yeah, that’s it.”
   I’d always liked a challenge. 

It took a month or so. My grandson helped. “Woah,” he’d say, “these things are really old. Cool!”. But we figured it out eventually. The parts came in the post and some of them by drone and for a couple I picked them up directly from the lawnmower store.
   “How’s it going?” the clerk would say.
    “It’s going,” I’d say. 

The whole process felt like a fight – tussling with thinly documented software interfaces and barely compatible hardware. But we persevered. Within another month, the lawnmower was back up and running. It had some quirks, now – it persistently identified magnolias as weeds and wouldn’t respond to training. It was better in other ways – it could see better, so stopped scraping its side on one of the garden walls. I’d watch it as it went round the garden and sometimes I’d walk with it, shadowing it as it worked. “Good robot,” I’d say, “You’ve got this down to a science, if I do say so myself.”

We both kept getting older. More idiosyncratic. I’d stand in the shade and watch it work. Then I’d mostly sit in the shade, letting hours go by as it dealt with the garden. I tinkered with it so I could make it run slower than intended. We got older. I kept making it run slower, so it could keep up with my frame of mind. We did our jobs – it worked, I maintained it and tweaked it to fit more and more elegantly into my garden. We wrestled with each other’s problems and changed each other as a consequence.
   “I don’t know what’s gonna give up first, me or the machine,” I’d say to the clerk, when I had to pick up new parts for ever-more ornate repairs.
   “It’s keeping me alive as much as I’m keeping it alive,” I’d tell my son. He’d sigh at this. Tell me to stop talking that way.

I once had a dream that I was on an operating table and the surgeons were lawnmowers and they were replacing my teeth with “better, upgraded ones” made of metal. Go figure.

When I got the diagnosis I wasn’t too sad. I’d prepared. Laid in a store of parts. Recorded some tutorials. Wrote this short story about my time with the machine. Now it’s up to you to keep it going – we can still fix machines better than we can fix people, and I think you’ll learn something.

Things that inspired this story: Robot & Frank; the right to repair; degradation and obsolescence in life and in technology.