Import AI 248: Google’s megascale speech rec system; Dynabench aims to solve NLP benchmarking worries; Australian government increases AI funding

by Jack Clark

Google makes a better speech recognition model – and scale is the key:
…Multilingual models match monolingual models, given sufficient scale…
Google has figured out how to surmount a challenge in training machine learning systems to understand multiple languages. In the past, monolingual models have typically outperformed multilingual models, because when you train a model on a whole bunch of languages, you can sometimes improve performance for the small-data languages but degrade performance on the large-data ones. No more! In a new study, Google shows that if you just train a large enough network on a large enough amount of data, you can match the performance of a monolingual model while developing something that does well on multiple languages at once.

The data: Google trains its model on languages ranging from English to Hindi to Chinese, with per-language data ranging from ~55,000 hours of speech down to ~7,700 hours, for ~364,900 hours of speech in total across all languages. (To put it in perspective, it’s rare to get this much data – Spotify’s notably vast English-only podcast dataset clocks in at around 50,000 hours (Import AI 242), and a recent financial news dataset from Kensho weighs in at 5,000 hours (Import AI 244).)

Why large models are better: When Google trained a range of models on this dataset, it found that “larger models are not only more data efficient, but also more efficient in terms of training cost as measured in TPU days – the 1B-param model reaches the same accuracy at 34% of training time as the 500M-param model”. Google uses its ‘GShard’ infrastructure to train models with 220M, 370M, 500M, 1B, and 10B parameters.
There’s more work to do, though: “We do see on some languages the multilingual model is still lagging behind. Empirical evidence suggests it is a data balancing problem, which will be investigated in future.”
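The data balancing problem they flag is a familiar one in multilingual training, and one common generic mitigation is temperature-based sampling over languages. Here is a minimal sketch of that idea – purely illustrative, not necessarily what Google does, and the per-language hour counts below are made-up placeholders:

```python
import numpy as np

# Hypothetical per-language training set sizes, in hours of speech (placeholders).
hours = {"en": 55_000, "hi": 27_000, "zh": 22_000, "bn": 7_700}

def sampling_probs(hours_per_lang, tau=0.5):
    """Temperature-based language sampling.

    tau = 1.0 samples each language in proportion to its data size;
    tau -> 0 approaches uniform sampling, up-weighting low-resource languages.
    """
    sizes = np.array(list(hours_per_lang.values()), dtype=float)
    probs = (sizes / sizes.sum()) ** tau
    probs /= probs.sum()
    return dict(zip(hours_per_lang, probs))

print(sampling_probs(hours, tau=1.0))  # proportional: English dominates
print(sampling_probs(hours, tau=0.5))  # flatter: small languages sampled more often
```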
  Read more: Scaling End-to-End Models for Large-Scale Multilingual ASR (arXiv).

###################################################

Job opportunity! International policy lead for the UK’s CDEI:
Like AI policy? Want to make it better? Apply here…
The Centre for Data Ethics and Innovation, a UK government agency focused on AI and data-intensive technologies, is hiring an International Policy Lead. “This is a unique cross-organisational role focused on producing high-quality international AI and data policy advice for CDEI leadership and teams, as well as connecting the CDEI’s experts with international partners, experts, and institutions.”
  Find out more and apply here (CDEI site).

###################################################

Australian government increases AI funding:
…All ‘$’ in this section refer to the Australian dollar…
The Australian government is investing a little over $124 million ($96m USD) into AI initiatives over the next four to six years, as part of the country’s 2021-2022 Federal Budget.

Four things for Australian AI: 

  • $53m for a “National Artificial Intelligence Centre” which will itself create four “Digital Capability Centres” that will help Australian small and medium-sized businesses get connected to AI organizations and get advice on how to use AI.
  • $33.7 million to subsidize Australian businesses partnering with the government on pilot AI projects.
  • $24.7 million for the “Next Generation AI Graduates Program” to help it attract and train AI specialists. 
  • $12 million to be distributed across 36 grants to fund the creation of AI systems “that address local or regional problems” (Note: This is… not actually much money at all if you factor in things like data costs, compute costs, staff, etc.)

Why this matters: AI is becoming fundamental to the future technology strategy of most governments – it’s nice to see some outlay here. However, it’s notable to me how relatively small these amounts are when you consider the size of Australia’s population (~25 million), the fact that these grants pay out over multiple years, and the increasing cost of large-scale AI research projects.
  Read more: Australia’s Digital Economy (Australian Government).

###################################################

Instead of building AIs to command, we should build AIs to cooperate with:
…A new Cooperative AI Foundation aims to encourage research into building more collaborative machines…
A group of researchers think we need to build cooperative AI systems to get the greatest benefits from the nascent technology. That’s the gist of an op-ed in Nature by scientists with the University of Oxford, DeepMind, the University of Toronto, and Microsoft. The op-ed is accompanied by the establishment of a new Cooperative AI Foundation, which has an initial grant of $15m USD.

Why build cooperative AI? “AI needs social understanding and cooperative intelligence to integrate well into society”, the researchers write. “Cooperative intelligence is unlikely to emerge as a by-product of research on other kinds of AI. We need more work on cooperative games and complex social spaces, on understanding norms and behaviours, and on social tools and infrastructure that promote cooperation.”

Three types of cooperation: There’s room for research into systems that lead to better AI-AI collaboration, systems that improve AI-human cooperation, and tools that can help humans cooperate with each other better.

Why this matters: Most of today’s AI research involves building systems that we delegate tasks to, rather than actively cooperate with. If we change this paradigm, I think we’ll build smarter systems and also have a better chance of developing ways for humans to learn from the actions of AI systems.
  Read more: Cooperative AI: machines must learn to find common ground (Nature).
  Find out more about the Cooperative AI Foundation at the official website.

###################################################

Giant team of scientists tries to solve NLP’s benchmark problem:
…Text-processing AI systems are blowing up benchmarks as fast as they are being built. What now?…
A large, multi-org team of researchers has built Dynabench, software meant to support a new way to test and build text-processing AI systems. Dynabench exists because in recent years NLP systems have started to saturate most of the benchmarks available to them – SQuAD was quickly superseded by SQuADv2, GLUE was superseded by SuperGLUE, and so on. At the same time, we know that these benchmark-smashing systems (e.g., BERT, GPT2/3, T5) contain significant weaknesses which we aren’t able to test for today, the authors note.

Dynabench – the dynamic benchmark: Enter Dynabench. Dynabench is a tool to “evaluate models and collect data dynamically, with humans and models in the loop rather than the traditional static way”. The system lets people run models on a platform where, if a model performs very poorly on one task, humans can generate new data for those weak areas; that data is fed back into the model, which then runs through the benchmark again. “The data collected through this process can be used to evaluate state-of-the-art models, and to train even stronger ones, hopefully creating a virtuous cycle that helps drive progress in the field,” they say.
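In outline, the loop looks something like the sketch below – illustrative Python only; the callables `train`, `evaluate`, and `collect_from_humans` are hypothetical stand-ins, not Dynabench’s actual API:

```python
from typing import Any, Callable, List, Tuple

def dynamic_benchmark(
    model: Any,
    seed_data: List[Any],
    train: Callable[[Any, List[Any]], Any],
    evaluate: Callable[[Any, List[Any]], Tuple[float, List[Any]]],
    collect_from_humans: Callable[[Any, List[Any]], List[Any]],
    rounds: int = 3,
):
    """Human-and-model-in-the-loop benchmarking cycle (illustrative)."""
    dataset = list(seed_data)
    for r in range(rounds):
        model = train(model, dataset)               # fit the model on the current benchmark
        score, failures = evaluate(model, dataset)  # find where it performs poorly
        # Humans author new examples that target the weak spots
        # (on Dynabench, annotators try to fool the model in real time).
        dataset.extend(collect_from_humans(model, failures))
        print(f"round {r}: score={score:.3f}, examples={len(dataset)}")
    return model, dataset
```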

What you can use Dynabench for today: Today, Dynabench is designed around four core NLP tasks – testing how well AI systems can perform natural language inference, how well they can answer questions, how well they analyze sentiment, and the extent to which they can detect hate speech.
…and tomorrow: In the future, the researchers want to shift Dynabench from being English-only to being multilingual. They also want to carry out live model evaluation – “We would be able to capture not only accuracy, for example, but also usage of computational resources, inference time, fairness, and many other relevant dimensions.”
  Read more: Dynabench: Rethinking Benchmarking in NLP (arXiv).
  Find out more about Dynabench at its official website.

###################################################

Google gets a surprisingly strong computer vision result using surprisingly simple tools:
…You thought convolutions mattered? You are like a baby. Now read this…
Google has demonstrated that you can get results on a computer vision task similar to those from systems that use convolutional neural networks, while instead using multi-layer perceptrons (MLPs) – far simpler AI components.

What they did: Google has developed a computer vision classifier called MLP-Mixer (‘Mixer’ for short), a “competitive but conceptually and technically simple alternative” to contemporary systems that use convolutions or self-attention (e.g., transformers). Of course, this has some costs – Mixer costs dramatically more in terms of compute than the things it is competing with. But it also highlights how, given sufficient data and compute, a lot of the architectural innovations in AI can get washed away simply by scaling up dumb components to mind-bendingly large scales.
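For the curious, here is a compact sketch of the core Mixer layer – a token-mixing MLP applied across patches, followed by a channel-mixing MLP applied within each patch, with layer norm and skip connections – written in PyTorch. The hidden sizes and patch counts below are illustrative choices, not the paper’s exact configurations:

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP applied over the last dimension of its input."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer layer: token-mixing across patches, then channel-mixing per patch."""
    def __init__(self, num_patches, channels, token_hidden=256, channel_hidden=1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mix = MlpBlock(num_patches, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mix = MlpBlock(channels, channel_hidden)

    def forward(self, x):                              # x: (batch, num_patches, channels)
        y = self.norm1(x).transpose(1, 2)              # (batch, channels, num_patches)
        x = x + self.token_mix(y).transpose(1, 2)      # mix information across patches
        x = x + self.channel_mix(self.norm2(x))        # mix information across channels
        return x

# Example: 196 patches (a 14x14 grid of 16x16 patches from a 224x224 image), 512 channels.
block = MixerBlock(num_patches=196, channels=512)
out = block(torch.randn(2, 196, 512))
```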

Why this matters: “We believe these results open many questions,” they write. “On the practical side, it may be useful to study the features learned by the model and identify the main differences (if any) from those learned by CNNs and Transformers. On the theoretical side, we would like to understand the inductive biases hidden in these various features and eventually their role in generalization. Most of all, we hope that our results spark further research, beyond the realms of established models based on convolutions and self-attention.”
  Read more: MLP-Mixer: An all-MLP Architecture for Vision (arXiv).

###################################################

What do superintelligent-AI-risk skeptics think, and why?
…Worried about the people who aren’t worried about superintelligence? Read this…
Last week, we wrote about four fallacies leading to a false sense of optimism about progress in AI research (Import AI 247). This week, we’re looking at the opposite issue – why some people worry that others are insufficiently worried about the possibility of superintelligence. That’s the gist of a new paper from Roman Yampolskiy, a longtime AI safety researcher at the University of Louisville.

Why be skeptical? Some of the reasons to be skeptical of AI risks include: general AI is far away, there’s no obvious path from here to there, even if we built it and it was dangerous we could turn it off, superintelligence will be in some sense benevolent, AI regulation will deal with the problems, and so on.

Countermeasures to skepticism: So, how are AI researchers meant to rebut skeptics? One approach is to build more consensus among scientists about what constitutes superintelligence; another is to educate people about the technical priors that inform superintelligence-wary researchers; a third is to appeal to authorities like Bill Gates and Elon Musk, who have talked about the issue; and, perhaps most importantly, “do not reference science-fiction stories”.

Why this matters: The superintelligence risk debate is really, really messy, and it’s fundamentally bound up with the contemporary political economy of AI (where many of the people who worry about superintelligence are also the people with access to resources and who are least vulnerable to the failures of today’s AI systems). That means talking about this stuff is hard, prone to ridicule, and delicate. Doesn’t mean we shouldn’t try, though!
  Read more: AI Risk Skepticism (arXiv).

###################################################

Tech Tales:
[Department of Prior World Analysis, 2035]
During a recent sweep of REDACTED we discovered a cache of papers in a fire-damaged building (which records indicate was used as a library during the transition era). Below we publish in full the text from one undamaged page. (For access to the full archive of 4332 full pages and 124000 partial scraps, please contact your local Prior World Analysis administrator.)

On the ethics of mind-no-body transfer across robot morphologies
Heynrick Schlatz, Lancaster-Harbridge University
Published to arXiv, July, 2028.

Abstract:
In recent years, the use of hierarchical models for combined planning and movement has revolutionized robotics. In this paper we investigate the effects of transferring either a planning or a movement policy – but not both – from one robot morphology to another. Our results show that planning capabilities are predominantly invariant to movement policy transfer due to few-shot calibration, but planning policy transfer can lead to pathological instability.

Paper:
Hierarchical models for planning and movement have recently enabled the deployment of economically useful, reliable, and productive robots. Typical policies see both movement and planning systems trained in decoupled simulation environments, only periodically being trained jointly. This has created flexible policies that show better generalization and greater computational efficiency than policies trained jointly, or systems where planning and movement is distilled into the same single policy.

In this work, we investigate the effects of transferring either movement or planning policies from one robot platform to another. We find that movement policies can typically be transferred with negligible performance degradation – even on platforms with more than twice as many actuators as the originating platform. We find the same is not true for planning policies. In fact, planning policies demonstrate a significant degradation after being transferred from one platform to another.

Feature activation analysis indicates that planning policies suffer degradation to long-term planning, self-actualization, and generalization capabilities as a consequence of such transfers. We hypothesize the effect is analogous to what has been seen recently in biology – motor control policies can be transferred or fine-tuned from one individual to another, while attempts to transfer higher-order mental functions have proved unsuccessful and in some cases led to loss of life or mental function.

In Figure 1, we illustrate the transfer of a planning policy from robot morphology a) – a four-legged, two-arm Toyota ground platform – to robot morphology b), a two-legged, six-arm Mitsubishi construction platform. Table 1 reports performance of the movement policy under transfer; Table 2 reports performance of the planning policy. We find significant degradation of performance when conducting planning transfer. Analysis of feature activations shows within-distribution activations when deployed on originating platform a); upon transfer to platform b) we immediately see activation patterns shift to features previously identified as correlating with ‘confusion’ and ‘dysmorphia’, and to circuit activations linked to self-modelling, world-modelling, and memory retracement.

We recorded the planning policy transfer via four video cameras placed at the corners of the demonstration room. As stills from the video analysis in Figure 2 show, the planning policy transfer leads to physical instability in the robot platform – it can be seen attempting to scale the walls of the demonstration room, then repeatedly moving with force into the wall. The robot was deactivated following attempts to use four of its six arms to remove one of its other arms. Features activated on the robot during this time showed high readings for dysphoria, as well as a spike in ‘confusion’ activation which we have not replicated since, due to ethical concerns raised by the experiment.

Things that inspired this story: Reading thousands of research papers over the years for Import AI and thinking about what research from other timelines or worlds might look like; playing around with different forms for writing online; thinking about hierarchical RL which was big a few years ago but then went quiet and wondering if we’re due for an update; playing around with notions of different timelines and plans.