Import AI 138: Transfer learning for drones; compute and the “bitter lesson” for AI research; and why reducing gender bias in language models may be harder than people think

by Jack Clark

Why the unreasonable effectiveness of compute is a “bitter lesson” for AI research:
…Richard Sutton explains that “general methods that leverage computation are ultimately the most effective”…
Richard Sutton, one of the godfathers of reinforcement learning*, has written about the relationship between compute and AI progress, noting that pairing relatively simple algorithms with ever-larger amounts of computation has typically led to more varied and capable AI systems than approaches built on hand-crafted human knowledge. “The only thing that matters in the long run is the leveraging of computation”, Sutton writes.

Many examples, one rule: Some of the domains where general methods that leverage computation have beaten methods based on human knowledge include chess, Go, speech recognition, and many tasks in computer vision.

The bitter lesson: “We have to learn the bitter lesson that building in how we think we think does not work in the long run,” Sutton says. “The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.”

Why this matters: If compute is the main thing that unlocks new AI capabilities, then we can expect most of the strategic (and related geopolitical) landscape of AI research to reconfigure around a compute-centric model in the coming years, which will likely have significant implications for the AI community.
  Read more: The Bitter Lesson (Rich Sutton).
  * Richard Sutton literally (co)wrote the book on reinforcement learning.

#####################################################

AI + Comedy, with Naomi Saphra!
…Comedy set lampoons funding model in AI, capitalism, NLP…
Naomi Saphra, an NLP researcher, has put a video online of her doing stand-up AI comedy at a venue in Edinburgh, Scotland. Check out the video for her observations on working in AI research, funding AI research, tales about Nazi rocket researchers, and more.

  “You always have to ask yourself, who else finds this interesting? If you mean who reads my papers and cites my papers? The answer is nobody. If you mean who has given me money? The answer is mostly evil… you see I have the same problem as anyone in this world – I hate capitalism but I love money”.
  Watch her comedy set here: Naomi Saphra, Paying the Panopticon (YouTube).

#####################################################

Prototype experiment shows why robots might tag-team in the future:
…Use of a tether means 1+1 is greater than 2 here…
Researchers with the University of Tokyo, Japan, have created a two-robot team that can map its surroundings and traverse vertiginous terrain by using a tether that lets an airborne drone assist a ground vehicle.

The drone uses an NVIDIA Jetson TX2 chip to perform onboard localization, mapping, and navigation, and is equipped with a camera, a time-of-flight sensor, and a laser sensor for height measurement. The ground vehicle is “based on a commercially available caterpillar platform” and uses a UP Core processing unit; it runs the Robot Operating System (ROS), which the drone uses to communicate with it.

Smart robots climb with a dumb tether: The robots work together like this: the UAV flies above the UGV and maps the terrain, feeding data down to the ground robot, giving it awareness of its surroundings. When the robots detect an obstruction, the UAV wraps the tether (which has a grappling hook on its end) around a tall object, and the UGV uses the secured tether to climb the object.
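The cooperation procedure can be sketched as a simple control loop. The sketch below is purely illustrative: the objects and methods (`uav.map_terrain`, `ugv.climb`, and so on) are hypothetical stand-ins, not the authors’ actual ROS interfaces.

```python
# Illustrative sketch of the UAV/UGV cooperation loop described above.
# Every API here (the uav/ugv objects and their methods) is a hypothetical
# stand-in for the authors' actual ROS-based implementation.

def cooperate(uav, ugv, goal):
    while not ugv.at(goal):
        # 1. The UAV flies above the UGV and maps the terrain,
        #    feeding data down to the ground robot.
        terrain = uav.map_terrain(above=ugv.position())
        ugv.update_map(terrain)

        obstacle = terrain.next_obstacle(toward=goal)
        if obstacle is None:
            ugv.drive_toward(goal)
            continue

        # 2. On detecting an obstruction, the UAV wraps the tether
        #    (tipped with a grappling hook) around a tall nearby object.
        anchor = terrain.tallest_object_near(obstacle)
        uav.wrap_tether_around(anchor)

        # 3. The UGV uses the secured tether to climb the object.
        ugv.climb(tether_anchor=anchor)
```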

Real-world testing: The researchers test their system in a small-scale real-world experiment and find that the approach works, but has some problems: “Since we did not have a [tether] tension control mechanism due to the lack of sensor, the tether needed to be extended from the start and as the result, the UGV suffered from the entangled tether many times.”

Why this matters: In the future, we can imagine various robots of different types collaborating with each other, using their specialisms to operate as a unit and becoming more than the sum of their parts. But as this experiment indicates, we’re still at a relatively early stage of development here, and several kinks need to be worked out.
  Read more: UAV/UGV Autonomous Cooperation: UAV assists UGV to climb a cliff by attaching a tether (Arxiv).

#####################################################

Facebook tries to build a standard container for AI chips:
…New Open Compute Project (OCP) design supports both 12v and 48v inputs…
These days, many AI organizations are contemplating building data centers consisting of lots of different types of servers running many different chips, ranging from CPUs to GPUs to custom accelerator chips designed for AI workloads. Facebook wants to standardize the types of chassis used to house AI-accelerator chips, and has contributed an open source hardware schematic and specification to the Open Compute Project – a Facebook-born scheme to standardize the sorts of server equipment used by so-called hyperscale data center operators.

The proposed OCP accelerator module supports 12V and 48V inputs and can support up to 350W (12V) or up to 700W (48V) TDP (Thermal Design Power) for the chips in the module – a useful trait, given that many new accelerator chips guzzle significant amounts of power (though you’ll need to use liquid cooling for any servers consuming above 450W TDP). It can support single or multiple ASICs within each chassis, with support for up to eight accelerators per system.
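As a back-of-the-envelope illustration of what those figures imply at the system level (my arithmetic based on the numbers above, not figures from the spec):

```python
# Rough power budget for a fully populated OAM system, using the
# per-module figures quoted above (my arithmetic, not the spec's).
TDP_12V = 350     # watts per module on a 12V input
TDP_48V = 700     # watts per module on a 48V input
MAX_MODULES = 8   # up to eight accelerators per system

print(f"12V system, fully populated: {TDP_12V * MAX_MODULES} W")  # 2800 W
print(f"48V system, fully populated: {TDP_48V * MAX_MODULES} W")  # 5600 W
# Per the figures above, anything above 450W TDP needs liquid cooling,
# so modules running near the 700W (48V) ceiling imply liquid cooling.
```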

Check out the design yourself: You can read about the proposed OCP Accelerator Module (OAM) in more detail here at the Open Compute Project (OCP) site.

Why this matters: As AI goes through its industrialization phase, we can expect people to invest more in the fundamental infrastructure that AI equipment requires. It’ll be interesting to see the extent to which there is demand for a standardized AI accelerator module; signals of such demand will likely come from low-cost, Asia-based original design manufacturers (ODMs) producing standardized chassis that use this design.
  Read more: Sharing a common form factor for accelerator modules (Facebook Code).

#####################################################

Want to reduce gender bias in a trained language model? Existing techniques may not work in the way we thought they did:
…Analysis suggests that ‘debiasing’ language models is harder than we thought…
All human language encodes biases within itself. When we train AI systems on human language, those systems tend to reflect the biases inherent in the language and in the data they were trained on. For this reason, word embeddings trained on large corpora of news text will frequently associate people of color with the concept of crime, while linking white people to professions. Similarly, these embeddings tend to express gender biases: concepts close to ‘man’ include things like ‘king’ or ‘professional’, while ‘woman’ will typically sit close to concepts like ‘homemaker’ or ‘mother’. Tackling these biases is complicated, requiring a mixture of careful data selection at the start of a project and the application of algorithmic de-biasing techniques to trained models.
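One way to see such associations directly is to query a pretrained embedding for nearest neighbors and analogies. A minimal sketch using gensim’s downloader and the word2vec Google News vectors (the specific neighbors returned will vary with the embedding used):

```python
# Minimal sketch: probing a pretrained embedding for gendered associations.
# Requires gensim; api.load downloads the (large, ~1.6GB) Google News vectors.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# Classic analogy probe: man : doctor :: woman : ?
print(vectors.most_similar(positive=["woman", "doctor"],
                           negative=["man"], topn=3))

# Direct neighbors of gendered words often surface stereotyped concepts.
print(vectors.most_similar("man", topn=5))
print(vectors.most_similar("woman", topn=5))
```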

Now, researchers with Bar-Ilan University and the Allen Institute for Artificial Intelligence have conducted an analysis that calls into question the effectiveness of some of the algorithmic methods used to debias models. “We argue that current debiasing methods… are mostly hiding the bias rather than removing it”, they write.

The researchers compare the embeddings produced by two different methods – Hard-Debiased (Bolukbasi et al) and GN-GloVe (Zhao et al) – both of which have been modified to reduce apparent gender bias within trained models. They analyze the difference between the biased and debiased versions of each approach, essentially by examining the spatial relationships between embeddings from both versions. They find that these debiasing methods work mostly by shifting the problem to other parts of the models: though they may fix some biases, others remain.

Three failures of debiasing: The specific failure modes they observe are as follows (see the sketch after this list):

  • Words with strong previous gender bias are easy to cluster together
  • Words that receive implicit gender from social stereotypes (e.g. receptionist, hair-dresser, captain) still tend to group with other implicit-gender words of the same gender
  • The implicit gender of words with prevalent previous bias is easy to predict based on their vectors alone
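A rough sketch of this style of diagnosis, checking whether formerly gender-biased words still cluster by gender (the first failure) and whether a classifier can recover a word’s implicit gender from its vector alone (the third). The word lists and the random stand-in vectors below are placeholders; substitute real debiased embeddings to run the test properly:

```python
# Sketch of the diagnosis style described above: cluster formerly-biased
# words without labels, then try to predict implicit gender from vectors.
# The word lists and random stand-in vectors are placeholders only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

male_biased = ["captain", "boss", "engineer", "surgeon"]
female_biased = ["receptionist", "nurse", "maid", "hairdresser"]
words = male_biased + female_biased

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=300) for w in words}  # stand-in embeddings

X = np.array([vectors[w] for w in words])
y = np.array([0] * len(male_biased) + [1] * len(female_biased))

# Failure 1: do the two groups separate into clusters without labels?
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
purity = max((clusters == y).mean(), (clusters != y).mean())
print(f"cluster purity: {purity:.2f}")  # near 1.0 => bias still present

# Failure 3: can a classifier predict implicit gender from vectors alone?
clf = LinearSVC().fit(X[::2], y[::2])  # train on half the words
print(f"held-out accuracy: {clf.score(X[1::2], y[1::2]):.2f}")
```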

  Why this matters: The authors say that “while suggested debiasing methods work well at removing the gender direction, the debiasing is mostly superficial. The bias stemming from world stereotypes and learned from the corpus is ingrained much more deeply in the embeddings space.”
  Studies like this suggest that dealing with issues of bias will be harder than people had anticipated, and highlight how many of the biases expressed by AI systems come from the real-world data those systems are trained on.
  Read more: Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them (Arxiv).

#####################################################

Transfer learning with drones:
…Want to transfer something from simulation to reality? Add noise, and make some of it random…
Researchers with the University of Southern California have trained a drone flight-stabilization policy in simulation and transferred it to multiple different real-world drones.

Simulate, noisily: The researchers add noise to many aspects of the simulated quadcopter platform and vary the motor lag of the simulated drone, creating synthetic data which they use to train more flexible policies. “To avoid training a policy that exploits a physically implausible phenomenon of the simulator, we introduce two elements to increase realism: motor lag simulation and a noise process,” they write. They also model sensor and state-estimation noise.
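A minimal sketch of this kind of domain randomization: each training episode samples perturbed dynamics parameters and adds Gaussian noise to the observed state. The parameter names and ranges below are illustrative guesses, not the paper’s values:

```python
# Illustrative domain-randomization loop: per-episode randomized dynamics
# plus additive sensor noise. Names and ranges are guesses, not the paper's.
import numpy as np

def sample_randomized_dynamics(rng):
    return {
        "mass": rng.uniform(0.025, 0.040),         # kg, varies per episode
        "motor_lag_tau": rng.uniform(0.05, 0.25),  # s, simulated motor lag
        "thrust_to_weight": rng.uniform(1.5, 2.5),
    }

def noisy_state(true_state, rng, sigma=0.01):
    # Model sensor/state-estimation noise as additive Gaussian noise.
    return true_state + rng.normal(0.0, sigma, size=true_state.shape)

rng = np.random.default_rng(0)
for episode in range(3):
    dynamics = sample_randomized_dynamics(rng)
    state = np.zeros(12)  # e.g. position, velocity, attitude, angular rate
    obs = noisy_state(state, rng)  # the policy acts on obs, not state
    # ...run a PPO rollout in the simulator configured with `dynamics`...
    print(episode, dynamics)
```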

Transfer learning: They train the (simulated) drones using Proximal Policy Optimization (PPO) with a cost function designed to maximize stability of the drone platforms. They sanity-check the trained policies by running them in a different simulator (in this case, Gazebo using the RotorS package) and observing how well they generalize. “This sim-to-sim transfer helps us verify the physics of our own simulator and the performance of policies in a more realistic environment,” they write.
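The paper’s exact cost function isn’t reproduced here, but a stability-oriented cost of the general kind described might penalize position error, residual velocity, angular velocity, and control effort; the terms and weights below are illustrative guesses:

```python
# Hedged sketch of a stability-oriented cost; terms and weights are
# illustrative guesses, not the paper's exact cost function.
import numpy as np

def stability_cost(pos_err, vel, ang_vel, action):
    return (4.0 * pos_err @ pos_err      # stay at the hover setpoint
            + 1.0 * vel @ vel            # damp translational motion
            + 0.5 * ang_vel @ ang_vel    # damp rotation
            + 0.1 * action @ action)     # discourage aggressive thrust

print(stability_cost(np.ones(3) * 0.1, np.zeros(3),
                     np.zeros(3), np.ones(4) * 0.5))
```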

  They also validate their system on three real quadcopters built around the ‘Crazyflie 2.0’ platform. “We build heavier quadrotors by buying standard parts (e.g., frames, motors) and using the Crazyflie’s main board as a flight controller,” they explain. They demonstrate that their policy generalizes across the different drone platforms, and show through ablations that adding noise and physics-based modelling of the systems during training can further improve performance.

Why this matters: Approaches like this show how people are increasingly able to substitute (cheap) compute for (costly) real-world data; in this case, the researchers use compute to simulate drones, extend the simulated data with synthetically generated noise and other perturbations, and then transfer the result into the real world. Further exploring this kind of transfer learning will give us a better sense of the ‘economics of transfer’, and may let us build economic models that describe the tradeoffs between spending $ on compute for simulated data and collecting real-world data.
  Read more: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (Arxiv).
  Check out the video here: Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors (YouTube).

#####################################################

Tech Tales

The sense of being looked at

Every day, it looks at something different. I spend my time, like millions of other people on the planet, working out why it is looking at that thing. Yesterday, the system looked at hummingbirds, and so any AI-operable camera in the world not deemed “safety-critical” spent the day looking at – or searching for – hummingbirds. The same was true of microphones, pressure sensors, and the various other sensors and actuators that comprise the inputs and outputs of the big machine mind.

Of course we know why the system does this at a high level: it is trying to understand certain objects in greater detail, likely as a consequence of integrating some new information from somewhere else that increases the importance of knowing about these objects. Maybe the system saw a bunch of birds recently and is now trying to better understand hummingbirds as a consequence? Or maybe a bunch of people have been asking the system questions about hummingbirds and it now needs to have more awareness of them?

But we’re not sure what it does with its new insights, and it has proved difficult to analyze how the system’s observation of an object changes its relationship to it and representation of it.

So you can imagine my surprise when I woke up today to find the camera in my room trained on me, and a picture of me on my telescreen, and then as I left the house to go for breakfast all the cameras on the street turned to follow me. It is studying me, today, I suppose. I believe this is the first time it has looked at a human, and I am wondering what its purpose is.

Things that inspired this story: Interpretability, high-dimensional feature representations, the sense of being stared at by something conscious.