Import AI 212: Robots are getting smart; humans+machines = trouble, says DHS; a 10k product dataset

by Jack Clark

Faster, robots! Move! Move! Move! Maybe the robot revolution is really imminent?
…DeepMind shows one policy + multi-task training = robust robot RL…
Researchers with DeepMind have used a single algorithm – Scheduled Auxiliary Control (SAC) – to get multiple simulated robots and a real robot to learn robust movement behaviors. That’s notable compared to the state of the art a few years ago, when you might see a multitude of different algorithms used for a bunch of different robots. DeepMind did this without changing the reward functions across different robots. Their approach can learn to operate new robots in a couple of hours.

Learning is easier when you’re trying to learn a lot: DeepMind shows that it’s more efficient to try and learn multiple skills for a robot at once, rather than learning skills in sequence. In other words, if you’re trying to learn to walk forwards and backwards, it’s more efficient to learn a little bit of walking forwards and then a little bit of walking backwards and alternate till you’ve got it down to a science, rather than just trying to learn to walk forward and perfecting that, then learning to move backward.
  Hours of savings: DeepMind was able to learn a range of movements on one robot in about 1590 episodes, netting out to around five hours of work. If they’d tried to learn the same skills in a single-task setting, they estimate it’d take about 3050 episodes, adding another five hours. That’s an encouraging sign with regard to both the robustness of SAC and the utility of multi-task learning.
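  A sketch of the idea: below is a minimal, illustrative Python sketch (not DeepMind’s code, and the helper functions are placeholders for whatever RL algorithm you plug in) of the difference between the two schedules. The sequential loop perfects one skill before starting the next, while the interleaved loop round-robins through the tasks every episode, so the robot gathers a little data for every skill and all updates draw on one shared replay buffer.

```python
TASKS = ["walk_forward", "walk_backward", "turn_left", "turn_right"]

def collect_episode(policy, task):
    """Placeholder: roll out `policy` on the robot for one episode of `task`."""
    return {"task": task, "transitions": []}

def update_policy(policy, buffer):
    """Placeholder: one off-policy update step from the shared replay buffer."""
    pass

def train_sequential(policy, buffer, episodes_per_task=750):
    # Learn each skill to completion before moving on to the next one.
    for task in TASKS:
        for _ in range(episodes_per_task):
            buffer.append(collect_episode(policy, task))
            update_policy(policy, buffer)

def train_interleaved(policy, buffer, total_episodes=1590):
    # Alternate tasks every episode: a bit of walking forward, then a bit of
    # walking backward, and so on, all feeding the same replay buffer.
    for episode in range(total_episodes):
        task = TASKS[episode % len(TASKS)]   # simple round-robin schedule
        buffer.append(collect_episode(policy, task))
        update_policy(policy, buffer)
```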

Which robots? DeepMind uses simulated and real robots made by HEBI Robotics, a spinout from Carnegie Mellon University.

Why this matters: Papers like this give us a sense of how researchers, after probably half a decade of various experiments, are starting to turn robots+RL from a lab-borne curiosity into something that might be repeatable and reliable enough that we can further develop the techniques and inject smart robots into the world.
  Read more: Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion (arXiv).
  Check out the video (YouTube).

Learning a good per-robot policy? That’s nice. How about learning one policy that works across a bunch of robots without retraining?
…What if we treat a robot as an environment and its limbs as a multi-agent learning problem?…
Here’s a different approach to robotic control, from researchers with Berkeley, Google, CMU, and Facebook AI Research. In One Policy to Control Them All, the researchers build a system that lets them train a bunch of different AI agents in parallel, which yields a single flexible policy that generalizes to new (simulated) robots.

How it works: They do this by learning control policies that help different joints coordinate with each other, then sharing these controllers across all the motors/limbs of all the agents. “Now the policies are fully modular, it’s just like lego blocks,” says one of the authors in a video about the research. Individual agents learn to move coherently through a message passing approach, where the control policies for different limbs/motors propagate information to nearby limb/motor nodes in a cascade until they’ve passed messages through the whole entity, which then tries to pass a prediction message back – by having the nodes propagate information and the agent try to predict their actions, the authors are able to inject some emergent coherence into the system.

Message passing: The message passing approach is similar to how some multi-agent systems are trained, the authors note. This feels quite intuitive – if you want to train a bunch of agents to solve a task, you need to design a learning approach that lets the agents figure out how to coordinate with each other in an unsupervised way. Here, the same thing happens when you treat the different limbs/actuators in a robot as a collection of distinct agents in a single world (the robot platform) – over time, you see them figure out how to coordinate with each other to achieve an objective, like moving a robot.
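  A sketch of the mechanism: here is a rough, illustrative PyTorch sketch (not the authors’ code) of the shared-module idea for a simple chain of limbs. Every limb runs the same weight-shared network, messages flow up the chain from the leaves toward the torso and then back down, and each limb emits its own torque from its local observation plus the downward message.

```python
import torch
import torch.nn as nn

class SharedLimbModule(nn.Module):
    """One small network, reused (weight-shared) by every limb of every robot."""
    def __init__(self, obs_dim=8, msg_dim=16):
        super().__init__()
        self.msg_dim = msg_dim
        self.up = nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.Tanh(),
                                nn.Linear(64, msg_dim))    # child -> parent message
        self.down = nn.Sequential(nn.Linear(msg_dim + msg_dim, 64), nn.Tanh(),
                                  nn.Linear(64, msg_dim))  # parent -> child message
        self.torque = nn.Linear(obs_dim + msg_dim, 1)      # action for this joint

def control_step(module, limb_obs):
    """Two message-passing sweeps over a chain of limbs (leaf -> root -> leaf)."""
    zero = torch.zeros(module.msg_dim)
    # Upward pass: each limb summarises itself plus its child for its parent.
    up_msgs, msg = [], zero
    for obs in limb_obs:                     # limb_obs is ordered leaf -> root
        msg = module.up(torch.cat([obs, msg]))
        up_msgs.append(msg)
    # Downward pass: the root's message cascades back toward the leaves, and
    # each limb produces its torque from its observation plus the message.
    actions, msg = [], zero
    for obs, up in zip(reversed(limb_obs), reversed(up_msgs)):
        msg = module.down(torch.cat([up, msg]))
        actions.append(module.torque(torch.cat([obs, msg])))
    return list(reversed(actions))           # one torque per limb, leaf -> root

# Toy usage: a four-limb chain, each limb described by an 8-dim observation.
module = SharedLimbModule()
torques = control_step(module, [torch.randn(8) for _ in range(4)])
```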

Testing (in simulation): In tests on simulated MuJoCo robots, the researchers show their approach can outperform some basic multi-task baselines (similar to the sorts of powerful SAC models trained by DeepMind elsewhere in this issue of Import AI). They also show their approach can generalize to simulated robots different to what they were trained on, highlighting the potential robustness of this technique. (Though note that the policy fails if they dramatically alter the weight/friction of the limbs, or change the size of the creatures).
  Read more: One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control (arXiv).
  Find out more: ICML 2020 Oral Talk: One Policy to Control Them All (YouTube).

###################################################

Don’t have a supercomputer to train your big AI model? No problem – use the Hivemind!
…Folding@Home, but for AI…
The era of large AI models is here, with systems like GPT-3, AlphaStar, and others starting to yield interesting things via multi-million dollar training regimes. How can researchers with few financial resources compete with entities that can train large AI systems? One idea is to decentralize: specifically, develop software that makes it easy to train large models via a vast sea of distributed computation.
  That’s the idea behind Hivemind, a system to help people ‘run crowdsourced deep learning using compute from volunteers or decentralized participants’. Hivemind uses a decentralized Mixture-of-Experts approach, and the authors have tried to build some models using this approach, but note: “one can train immensely large neural networks on volunteer hardware. However, reaching the full potential of this idea is a monumental task well outside the restrictions of one publication”.
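  A toy illustration of the underlying idea (this is not Hivemind’s actual API, and it shows simple periodic parameter averaging rather than Hivemind’s decentralized mixture-of-experts): each volunteer trains its own copy of a model on whatever data and hardware it has, and every so often the reachable peers average their parameters.

```python
import copy
import torch
import torch.nn as nn

def local_steps(model, batches, lr=1e-3):
    """Each volunteer trains its local copy on its own data for a while."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for x, y in batches:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def average_parameters(models):
    """Periodic sync: average parameters across all reachable peers.
    (Real systems do this fault-tolerantly over a network; this is in-memory.)"""
    with torch.no_grad():
        for params in zip(*(m.parameters() for m in models)):
            mean = torch.mean(torch.stack([p.data for p in params]), dim=0)
            for p in params:
                p.data.copy_(mean)

# Toy run: three 'volunteers', each starting from the same small model.
base = nn.Linear(4, 1)
peers = [copy.deepcopy(base) for _ in range(3)]
for _ in range(10):                      # ten rounds of train-then-sync
    for peer in peers:
        data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(20)]
        local_steps(peer, data)
    average_parameters(peers)
```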

Why this matters: Infrastructure – and who has access to it – is inherently political. In the 21st century, some forms of political power will accrue to the people who build computational infrastructure and those who can build artefacts on top of it. Systems like Hivemind hint at ways a broader set of people could gain access to large infrastructure, potentially altering who does and doesn’t have political power as a consequence.
  Read more: Learning at Home (GitHub).
  Read more: Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts (arXiv).

###################################################

Beware of how humans react to algorithmic recommendations, says DHS study:
…Facial recognition test shows differences in how humans treat algo vs human recommendations…
Researchers with the Maryland Test Facility (MdTF), a Department of Homeland Security-affiliated laboratory, have investigated how humans work in tandem with machines when doing a facial recognition task. They tested about 300 volunteers, in three groups of a hundred, at the task of working out whether two greyscale pictures of people show the same person or a different person. One group got no prior information, while another group got suggested answers – priors – for the task provided by a human, and the final group got suggested answers from an AI. The test showed that the presence of a prior, unsurprisingly, improved performance. But humans treated computer-provided priors differently to human-provided priors…
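  To make the design concrete, here is a toy simulation (illustrative Python with entirely made-up accuracy and trust numbers, not the study’s data) of how a suggested answer can shift a volunteer’s responses: the more a volunteer defers to the prior, the more the prior’s quality, rather than the volunteer’s own judgement, drives the overall score.

```python
import random

def run_condition(n_trials=10_000, base_acc=0.8, prior_acc=0.85, follow_rate=0.0):
    """Simulate one group. follow_rate = chance the volunteer defers to the prior."""
    correct = 0
    for _ in range(n_trials):
        truth = random.choice([True, False])              # same person or not?
        prior = truth if random.random() < prior_acc else not truth
        if random.random() < follow_rate:
            answer = prior                                 # defer to the suggestion
        else:
            answer = truth if random.random() < base_acc else not truth
        correct += (answer == truth)
    return correct / n_trials

print("no prior:        ", run_condition(follow_rate=0.0))
print("human prior:     ", run_condition(follow_rate=0.3))  # hypothetical trust level
print("algorithm prior: ", run_condition(follow_rate=0.5))  # hypothetical trust level
```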

When we trust machines versus when we trust humans: The study shows that “volunteers reported distrusting human identification ability more than computer identification ability”, though it notes that both sources led to similar overall scores. “Overall, this shows that face recognition algorithms incorporated into a human process can influence human responses, likely limiting the total system performance,” they write.

Why this matters: This study is intuitive – people get biased by priors, and people trust or distrust those priors differently, depending on whether they’re a human or a computer. This suggests that deploying advanced human-AI teaming applications – especially ones where an AI’s advice is presented to a decisionmaker – will require a careful study of the inherent biases with which people already approach those situations, and how those biases may be altered by the presence of a machine prior, such as the recommendation of an AI system.
  Read more: Human-algorithm teaming in face recognition: How algorithm outcomes cognitively bias human decision-making (PLoS ONE, Open Access).

###################################################

What’s cute, orange, and is cheaper than Boston Dynamics? Spot Mini Mini!
…$600 open source research platform, versus $75,000 Black Mirror Quadruped…
Researchers at Northwestern University have built Spot Mini Mini, a $600 open source implementation of Boston Dynamics’ larger and more expensive (~$75k) ‘Spot’ robot. There’s an interesting writeup of the difficulties of designing the platform and of developing leg and body inverse kinematics models so the robot can be controlled. The researchers also train the system in simulation in an OpenAI Gym environment, then transfer it to reality.
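  The train-in-sim part follows the standard Gym pattern, which looks roughly like the sketch below (the environment name here is hypothetical; the real environment class ships with the repo linked below, and a trained policy would replace the random action sampler).

```python
import gym

# Hypothetical registration name for illustration; the actual environment
# class is provided by the spot_mini_mini repo.
env = gym.make("SpotMiniMini-v0")

obs = env.reset()
for step in range(1000):
    action = env.action_space.sample()   # swap in a trained policy's action here
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```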

Why this matters: Open source robots are getting cheaper and more capable. Spot Mini Mini sits alongside a ~$600-$1000 robot car named MuSHR (Import AI: 161, August 2019), a $3,000 quadruped named ‘Doggo’ from Stanford (Import AI: 147, May 2019), and Berkeley’s $5,000 robot arm named ‘Blue’ (Import AI: 142, April 2019). Of course, these robots are all for different purposes and have different constraints and affordances, but the general trend is clear – academics are figuring out how to make variants of industrial robots at a tenth of the cost, which will likely lead to cheaper robots in the future. (Coincidentally, one of the developers of Spot Mini Mini, Maurice Rahme, says he will shortly be joining the ‘Handle’ team at Boston Dynamics.)
  Read more about Spot Mini Mini here (Maurice Rahme’s website).
  Get the RL environment code: Spot Mini Mini OpenAI Gym Environment (GitHub).

###################################################

What does a modern self-driving car look like? Voyage gives us some clues:
…Third iteration of the company’s self-driving car gives us a clue about the future…
Voyage, a self-driving car startup which develops systems to be used in somewhat controlled environments, like retirement communities, has released the third version of its self-driving vehicle, the G3. The company is testing the vehicles in San Jose and has plans to trial them as production vehicles next year.

What goes into a self-driving car?
– Software: The software stack consists of a self-driving car brain, a dedicated collision avoidance system, and software to help a human pilot take over and control the vehicle from a remote location. These systems have been built into a Chrysler Pacifica Hybrid vehicle, co-developed with FCA.
– Hardware: Voyage is using NVIDIA’s DRIVE AGX system, highlighting NVIDIA’s continued success in the self-driving space.
– Cleaning & COVID: It’s 2020, so the G3 has been tweaked for a post-COVID world. Specifically, it incorporates systems for disinfecting the vehicle after and between rides via the use of ultraviolet-C light.

Why this matters: We’re still in the post-Wright Brothers, pre-747 years of self-driving cars; we’ve moved on from initial experimentation, have designs that roughly work, and are working to perfect the technology so it can be deployed safely to consumers. How long that takes is an open question, but watching companies like Voyage iterate in public gives us a sense of how the hardware ‘stack’ of self-driving cars is evolving.
  Read more: Introducing the Voyage G3 Robotaxi (Voyage).

###################################################

Products-10K: 150,000 images across 10,000 products:
…JD built a product classifier, what will you build?…
Researchers with Chinese tech giant JD have released Products-10K, a dataset containing images related to around 10,000 specific products. These are 10,000 products that are “frequently bought by online customers in JD.com”, they write. One thing that makes Products-10K potentially useful is that it contains a bunch of products that look similar to each other, e.g. different bottles of olive oil, or watches.

Dataset details: The dataset contains ~150,000 images split across ~10,000 categories. The product labels are organized as a graph, so it ships with an inbuilt hierarchy and connective virtual map of the products and how they relate to each other.

Accuracy: In tests, the researchers were able to ultimately train a high-resolution model to recognize objects in the dataset with 64.12% top-1 accuracy (the percentage of times it gets the correct label for a product on the first try).
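  For reference, top-1 accuracy is just the fraction of images whose single highest-scoring predicted label matches the ground truth; here is a minimal, illustrative PyTorch version (assuming logits of shape [batch, num_classes] and integer labels).

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of examples where the model's single best guess is correct."""
    preds = logits.argmax(dim=-1)          # highest-scoring class per image
    return (preds == labels).float().mean().item()

# Toy example: 4 images, 10,000 product classes.
logits = torch.randn(4, 10_000)
labels = torch.randint(0, 10_000, (4,))
print(top1_accuracy(logits, labels))
```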

Why this matters: Datasets like Products-10K are going to make it easier to develop machine learning classifiers that can be used for a variety of commercial use cases, so it should be robustly useful for applied AI applications. I also suspect someone is going to chain Products-10K + ten other retail datasets together to create an interesting artistic product, like a ‘ProductGAN’, or something.
  Read more: Products-10K: A Large-scale Product Recognition Dataset (arXiv).
  Get the dataset from here (GitHub site).
  Mess around with the dataset on Kaggle.

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

A Manhattan Project for AI?
The Manhattan Project (1942–6) and the Apollo Program (1961–75) are unparalleled among technological megaprojects. In each case, the US perceived a technological advance as being strategically important, and devoted enormous resources (~0.4% of GDP) towards achieving it first. AGI seems like the sort of powerful technology that could, at some point, see a similar push from key actors.

Surface area:  A key condition for embarking on such a project is that the technological problem has a large ‘surface area’ — it can be divided into sub-problems that can be parallelised. The US could not have begun the Manhattan Project in 1932, since the field of atomic physics was too immature. Progress in 1932 was driven by crucial insights from a few brilliant individuals, and would not easily have been sped up by directing thousands of scientists/engineers at the problem. By 1942 there was a ‘runway’ to the atomic bomb — a reasonably clear path for achieving the breakthrough. Note that there were a few unsuccessful AI megaprojects by states in the 1980s — the US Strategic Computing Initiative (~$2.5bn total in today’s money), Japan’s Fifth Generation (<$1bn), UK’s Project Alvey (<$1bn).

The ‘AGI runway’: There are good reasons to think we are not yet on an AGI runway: private actors are not making multi-billion dollar investments in AGI; there is no evidence that any states have embarked on AGI megaprojects; researchers in the field don’t seem to think we have a clear roadmap towards AGI.

Foresight: If one or more actors undertook a Manhattan-style sprint towards AGI, this could pose grave risks: adversarial dynamics might lead actors to neglect critical measures to ensure AGI is safe and that its benefits are broadly distributed; uncertainty and suspicion could create instability between great powers. Some of these risks could be mitigated with a shared set of metrics for measuring how close we are to a runway to AGI. This foresight would reduce uncertainty and provide crucial time to shape the incentives of key actors towards cooperation and beneficial outcomes.

Measurement: The authors suggest 6 features that could be measured to assess the surface area of AI research, and hence how close we are to an AGI runway: (1) mapping of sub-problems; (2) how performance is scaling with different inputs (data, compute, algorithmic breakthroughs); (3) capital intensiveness; (4) parallelism; (5) feedback speed; (6) behaviour of key actors.
  Read more: Roadmap to a Roadmap: How Could We Tell When AGI is a ‘Manhattan Project’ Away?

Can we predict the future of AI? Look at these forecasts and see what you think:
Metaculus is a forecasting platform, where individuals can make predictions on a wide range of topics. There is a rich set of forecasts on AI which should be interesting to anyone in the field.

Some forecasts: Here are some I found interesting:
– 75% that an AI system will score in the top quartile on an SAT math exam before 2025.
– 33% that a major AI company will commit to a ‘windfall clause’ (see Import AI: 181) by 2025.
– 50% chance that by 2035, a Manhattan/Apollo project for AGI will be launched (see above).
– 45% that GPT language models will generate <$1bn revenues by 2025.
– 50% that by mid-2022, a language model with >100bn parameters will be open sourced.

Matthew’s view: Foresight is a crucial ingredient for good decision-making, and I’m excited about efforts to improve our ability to make accurate forecasts about AI and other important domains. I encourage readers to submit their own predictions, or questions to be forecast.
  Read more: AI category on Metaculus

###################################################

Tech Tales:

The Only Way out is Through
[Extract from the audio of a livestream, published as part of the ‘Observer Truth Stream’ on streaming services in 2027]

Sometimes, all that is standing between you and success is yourself. Those are the hardest situations to solve. That’s why people use the Observers. An Observer is a bit of software you load onto your phone and any other electronics you have – drones, tablets, home surveillance systems, et cetera. It watches you and it tells you things you could be doing that might help you, or things you do that are self-sabotaging.
  “Have you considered that the reason you aren’t opening those emails is because you aren’t sure if you want to commit to that project?” the Observer might say.
  “Does it seem odd to you that you only ever drink heavily on the day before you have a workout scheduled, which means you only try half as hard?”
  “Why do you keep letting commitments stack up, go stale, and cancel themselves? Why not tell people how you really feel?”

Most people use Observers. Everyone needs someone to tell them the no-bullshit truth. And these days, not as many people have close enough human friends to do this.

But Observers aren’t perfect. We know sometimes an Observer might not be making exactly the best recommendations. There are rumors.
  Someone robbed a bank and it wasn’t because the Observer told them to do it, but maybe it was because the Observer helped them kick the drinking which gave them back enough spare brain time they could rob the bank.
  Someone beat someone else up and it wasn’t because the Observer told them to do it, it was because they went to the gym and got fit and their Observer hadn’t helped them work on their temper.

So that’s why Observers get rationed out now. After enough of those incidents, we had to change how often people spent time with their Observer. The Observer caps were instituted – you can spend only so much time a week with an Observer, unless you’re a “critical government official” or are able to pay the Observer Extension Surcharge. Of course there are rumors that these rules exist just so the rich can get richer by making better decisions, and so politicians can stay in power by outsmarting other people.

But what if the reason we have these laws is because of recommendations the Observers made to legislators and lobbyists and the mass of people that comprises the body politic? What if most of the stated reasons are just stories people are telling – compelling lies that, much like a tablecloth, drape over the surface of reality while hiding its true dimensions. Of course, legislation passed prior to the caps made the communications between Observers and Politicians not subject to Freedom of Information Act request (on the grounds of user ‘Mind Privacy’). So we’ll never know. But the Observers might.

Things that inspired this story: How people use AI systems as cognitive augments without analyzing how the AI changes their own decision-making; long-term second-order effects of the rollout of AI into society; multi-agent systems; the subtle relationship between the creation of technological tools and societal shifts; the modern legislative process; how people with money always seem to retain a ‘technology option’ in all but the most extreme circumstances; economic inequality translating to cognitive inequality as we go forward in time.