Import AI 216: Google learns a learning optimizer; resources for African NLP; US and UK deepen AI coordination

Google uses ML to learn better ML optimization – a surprisingly big deal:
Yo dawg, we heard you like learning to learn, so we learned how to learn a learning optimizer
In recent years, AI researchers have used machine learning to do meta-optimization of AI research; we’ve used ML to learn how to search for new network architectures, to learn how to distribute nets across chips during training, and to learn how to do better memory allocation. These kinds of research projects create AI flywheels – systems that become ever-more optimized over time, with humans doing less and less direct work and more abstract work, managing the learning algorithms.
 
Now, researchers with Google Brain have turned their attention to learning how to learn ML optimizers – this is a (potentially) big deal, because an optimizer, like Adam, is fundamental to the efficiency of training machine learning models. If you build a better optimizer that works in a bunch of different contexts, you can generically speed up all of your model training.

What did they do: With this work, Google did a couple of things that are common to some types of frontier research – they spent a lot more computation on the project than is typical, and they also gathered a really large dataset. Specifically, they built a dataset of “more than a thousand diverse optimization tasks commonly found in machine learning”, they write. “These tasks include RNNs, CNNs, masked autoregressive flows, fully connected networks, language modeling, variational autoencoders, simple 2D test functions, quadratic bowls, and more.”

How well does it work? “Our proposed learned optimizer has a greater sample efficiency than existing methods,” they write. They also did the ultimate meta-test – checking whether their learned optimizer could help them train other, new learned optimizers. “This “self-optimized” training curve is similar to the training curve using our hand-tuned training setup (using the Adam optimizer),” they wrote. “We interpret this as evidence of unexpectedly effective generalization, as the training of a learned optimizer is unlike anything in the set of training tasks used to train the optimizer”.
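Meta-training like this is easy to caricature in miniature. Below is a toy sketch (not the paper's method, which meta-trains a neural-network update rule across thousands of tasks): here the "learned optimizer" is just two meta-parameters (a log learning rate and a momentum coefficient), meta-trained by finite differences on random quadratic bowls, one of the simple task families the paper mentions.

```python
import numpy as np

# Toy "learned optimizer": the update rule has two meta-parameters,
# theta = [log learning rate, momentum coefficient], meta-trained
# on a population of random quadratic-bowl tasks.
rng = np.random.default_rng(0)

def make_task():
    a = rng.uniform(0.1, 2.0, size=10)   # curvatures of the bowl
    x0 = rng.normal(size=10)             # starting point
    return a, x0

def inner_train(theta, task, steps=20):
    # Run the learned update rule on one task; the final loss is the
    # meta-objective the outer loop minimizes.
    a, x = task[0], task[1].copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        g = a * x                        # gradient of 0.5 * sum(a * x**2)
        m = theta[1] * m - np.exp(theta[0]) * g
        x = x + m
    return 0.5 * np.sum(a * x ** 2)

def meta_loss(theta, tasks):
    return float(np.mean([inner_train(theta, t) for t in tasks]))

tasks = [make_task() for _ in range(8)]
theta = np.array([-4.0, 0.0])            # start with a tiny learning rate

before = meta_loss(theta, tasks)
for _ in range(150):                     # outer loop: finite-difference meta-gradients
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = 1e-3
        grad[i] = (meta_loss(theta + e, tasks) - meta_loss(theta - e, tasks)) / 2e-3
    theta = theta - 0.05 * np.clip(grad, -5.0, 5.0)
    theta[0] = np.clip(theta[0], -6.0, -0.7)   # keep the learned lr in a stable range
    theta[1] = np.clip(theta[1], 0.0, 0.9)
after = meta_loss(theta, tasks)
print(before, after)                     # meta-training should shrink the final loss
```

The real system replaces the two scalars with a neural network fed per-parameter statistics, and replaces finite differences with gradient-based and evolutionary meta-training at vastly larger scale – but the nested loop structure is the same.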
  Read more: Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves (arXiv).

###################################################

Dark Web + Facial Recognition: Uh-Oh:
A subcontractor for the Department of Homeland Security accessed almost 200,000 facial recognition pictures, then lost them. 19 of these images were subsequently “posted to the dark web”, according to the Department of Homeland Security (PDF).
  Read more: DHS Admits Facial Recognition Photos Were Hacked, Released on Dark Web (Vice)

###################################################

African languages have a data problem. Lacuna Fund’s new grant wants to fix this:
…Want to build more representative datasets? Apply here…
Lacuna Fund, an initiative to provide money and resources for developers focused on low- and middle-income parts of the world, has announced a request for proposals for the creation of language datasets in Sub-Saharan Africa.

The RFP says proposals “should move forward the current state of data and potential for the development of NLP tools in the language(s) for which efforts are proposed”. Some of the datasets could be for tasks like speech, parallel corpora for machine translation, or datasets for downstream tasks like Q&A, Lacuna says. Applicants should be based in Africa or have significant, demonstrable experience with the continent, Lacuna says.

Why this matters: If your data isn’t available, then researchers won’t develop systems that are representative of you or your experience. (Remember – a third of the world’s living languages are found in Africa, but African authors recently represented only half of one percent of submissions to the ACL conference.) This Lacuna Fund RFP is one effort designed to change this representational issue. It’ll sit alongside others, like the pan-African Masakhane group (Import AI 191), that are trying to improve representation in our data.
  Read more: Datasets for Language in Sub-Saharan Africa (Lacuna Fund website).
Check out the full RFP here (PDF).

###################################################

KILT: 11 datasets, 5 types of test, one big benchmark:
…Think your AI system can use its own knowledge? Test it on KILT…
Facebook has built a benchmark for knowledge-intensive language tasks, called KILT. KILT gives researchers a single interface for multiple types of knowledge-checking test. All the tasks in KILT draw on the same underlying dataset (a single Wikipedia snapshot), letting researchers disentangle performance from the underlying dataset.

KILT’s five tasks: Fact checking; entity linking; slot filling (a fancy form of information gathering); open domain question answering; and dialogue.
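To make the “single interface” point concrete, here’s a sketch of the shared record shape and the kind of provenance-aware scoring it enables. The field names mirror the published KILT format, but the records and the metric are toy illustrations, not Facebook’s evaluation code.

```python
# Toy records in the unified KILT shape: every task, from fact checking to
# dialogue, maps to an input string plus outputs carrying both an answer and
# provenance pointing into the shared Wikipedia snapshot.
gold = [
    {"id": "q1", "input": "Who wrote Hamlet?",
     "output": [{"answer": "William Shakespeare",
                 "provenance": [{"wikipedia_id": "toy-123"}]}]},
]
pred = [
    {"id": "q1", "output": [{"answer": "William Shakespeare",
                             "provenance": [{"wikipedia_id": "toy-123"}]}]},
]

def kilt_accuracy(gold, pred):
    # Counts a prediction only when the answer matches AND the cited page is
    # one of the gold provenance pages -- the "did it use the right
    # knowledge?" check that a shared snapshot makes possible.
    by_id = {p["id"]: p for p in pred}
    hits = 0
    for g in gold:
        p = by_id.get(g["id"])
        if p is None:
            continue
        gold_answers = {o["answer"] for o in g["output"]}
        gold_pages = {pr["wikipedia_id"]
                      for o in g["output"] for pr in o["provenance"]}
        pred_pages = {pr["wikipedia_id"]
                      for o in p["output"] for pr in o.get("provenance", [])}
        if p["output"][0]["answer"] in gold_answers and gold_pages & pred_pages:
            hits += 1
    return hits / len(gold)

print(kilt_accuracy(gold, pred))
```

Because all eleven datasets resolve provenance into the same snapshot, a metric like this can compare models across tasks without worrying that each task silently used a different Wikipedia.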

What is KILT good for? “The goal is to catalyze and facilitate research towards general and explainable models equipped with task-agnostic representations of knowledge”, the authors write.
  Read more: Introducing KILT, a new unified benchmark for knowledge-intensive NLP tasks (FAIR blog).
  Get the code for KILT (Facebook AI Research, GitHub).
  Read more: KILT: a Benchmark for Knowledge Intensive Language Tasks (arXiv).

###################################################

What costs $250 and lets you plan the future of a nation? RAND’s new wargame:
…Scary thinktank gets into the tabletop gaming business. Hey, it’s 2020, are you really that surprised?…
RAND, the scary thinktank that helps the US government think about geopolitics, game theory, and maintaining strategic stability via military strategy, is getting into the boardgame business. RAND has released Hedgemony: A Game of Strategic Choices, a boardgame that was originally developed to help the Pentagon create its 2018 National Defense Strategy.

Let’s play Hedgemony! “The players in Hedgemony are the United States—specifically the Secretary of Defense—Russia, China, North Korea, Iran, and U.S. allies. Play begins amid a specific global situation and spans five years. Each player has a set of military forces, with defined capacities and capabilities, and a pool of renewable resources. Players outline strategic objectives and then must employ their forces in the face of resource and time constraints, as well as events beyond their control,” RAND says.
  Read more: New Game, the First Offered by RAND to Public, Challenges Players to Design Defense Strategies for Uncertain World (RAND Corporation)

###################################################

It’s getting cheaper to have machines translate the web for us:
…Unsupervised machine translation means we can avoid data labeling costs…
Unsupervised machine translation is the idea that we can crawl the web and find text in multiple languages that refers to the same thing, then automatically assemble these snippets into a single, labeled corpus we can point machine learning algorithms to.
    New research from Carnegie Mellon University shows how to build a system that can do unsupervised machine translation, automatically build a dictionary of language pairs out of this corpus, crawl the web for data that seems to consist of parallel pairs, then filter the results for quality.
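The mining step at the heart of pipelines like this can be sketched in a few lines: embed sentences from both languages into a shared space, then keep pairs whose similarity clears a quality threshold. The hand-made vectors and the threshold below are purely illustrative stand-ins for real cross-lingual sentence embeddings and the paper's filtering stage.

```python
import numpy as np

# Toy "embeddings" for a handful of sentences; in a real system these would
# come from a cross-lingual sentence encoder run over crawled web text.
src = {"the cat sleeps": [0.9, 0.1, 0.0],
       "stock prices fell": [0.0, 0.2, 0.9]}
tgt = {"le chat dort": [0.85, 0.15, 0.05],
       "il pleut": [0.1, 0.9, 0.2]}

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mine_pairs(src, tgt, threshold=0.9):
    # For each source sentence, take its nearest target sentence and keep
    # the pair only if similarity clears the threshold -- low-scoring
    # candidates (likely non-parallel text) get filtered out.
    pairs = []
    for s, sv in src.items():
        t, score = max(((t, cosine(sv, tv)) for t, tv in tgt.items()),
                       key=lambda x: x[1])
        if score >= threshold:
            pairs.append((s, t))
    return pairs

print(mine_pairs(src, tgt))
```

Here only the cat/chat pair survives the filter; the unrelated sentences score too low and are dropped, which is the whole trick – the web supplies the candidates, and the similarity threshold supplies the (noisy) labels.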

Big unsupervised translation works: So, how well does this technique work? The authors compare the translation scores of their unsupervised system with those of supervised systems trained on labeled datasets. The surprising result? Unsupervised translation seems to work well. “We observe that the machine translation system… can achieve similar performance to the ones trained with millions of human-labeled parallel samples. The performance gap is smaller than 1 BLEU score,” they write.
  In tests on the unsupervised benchmarks, they find that their system beats a variety of unsupervised translation baselines (most exciting: a performance improvement of 8 absolute points on the challenging Romanian-English translation task).

Why this matters: Labeling datasets is expensive and provides a limit on the diversity of data that people can train on (because most labeled datasets exist because someone has spent money on them, so they’re developed for commercial purposes or sometimes as university research projects). Unsupervised data techniques give us a way to increase the size and breadth of our datasets without a substantial increase in economic costs. Though I suspect that there are going to be thorny issues of bias that creep in when you start to naively crawl the web, having machines automatically assemble their own datasets for solving various human-defined tasks.
  Read more: Unsupervised Parallel Corpus Mining on Web Data (arXiv).

###################################################

UK and USA deepen collaboration on AI technology:
The UK government has published a policy document laying out some of the ways it expects to work with the USA on AI in the future. This doc suggests the two countries will try to identify areas for cooperation on R&D, as well as academic collaborations.

Why this matters: Strange, alien-bureaucrat documents like this are easy to ignore, but surprisingly important. If I wanted to translate this doc into human-person speech, I’d have it say something like “We’re going to spend more resources on coordinating with each other on AI development and AI policy” – and given the clout of the UK and US at AI, that’s quite significant.
Read more: Declaration of the United States of America and the United Kingdom of Great Britain and Northern Ireland on Cooperation in Artificial Intelligence Research and Development (Gov.UK).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Perspectives on AI governance:
AI governance looks at how humanity can best navigate the transition to a world with advanced AI systems. The long-term risks from advanced AI, and the associated governance challenges, depend on how the technology develops. Allan Dafoe, director of Oxford’s Center for the Governance of AI, considers some perspectives on this question and what it means for the field.

Three perspectives: Many in the field come from a superintelligence perspective, and are concerned mostly with scenarios containing a single AI agent (or several) with super-human cognitive capabilities. An alternative ecology perspective imagines a diverse global web of AI systems, which might range from being agent-like to being narrow services. These systems – individually, collectively, and in collaboration with humans – could have super-human cognitive capabilities. A final, related, perspective is of AI as a general-purpose technology that could have impacts analogous to previous technologies like electricity, or computers.

Risks: The superintelligence perspective highlights the importance of AI systems being safe, robust, and aligned. It is commonly concerned with risks from accidents or misuse by bad actors, and particularly existential risks: risks that threaten to destroy humanity’s long-term potential — e.g. via extinction, or by enabling a perpetual totalitarian regime. The ecology and general-purpose technology perspectives illuminate a broader set of risks due to AI’s transformative impact on fundamental macro-parameters in our economic, political, social, and military systems — e.g. reducing the labor share of income; increasing growth; reducing the cost of surveillance, lie detection, persuasion; etc.

Theory of impact: The key challenge of AI governance is to positively shape the transition to advanced AI by influencing key decisions. On the superintelligence perspective, the set of relevant actors might be quite small — e.g. those who might feasibly build, deploy, or control a superintelligent AI system. On the ecology and general-purpose technology perspectives, the opportunities for reducing risk will be more broadly distributed among actors, institutions, etc.

A tentative strategy: A ‘strategy’ for the field of AI governance should incorporate our uncertainty over which of these perspectives is most plausible. This points towards a diversified portfolio of approaches, and a focus on building understanding, competence, and influence in the most relevant domains. The field should be willing to continually adapt and prioritise between approaches over time.
  Read more: AI governance – opportunity and theory of impact (EA forum).

Give me anonymous feedback:
I’d love to know what you think about my section, and how I can improve it. You can now share feedback through this Google Form. Thanks to all those who’ve already submitted!

###################################################

Tech Tales:

The Shadow Company
[A large technology company, 2029]

The company launched Project Shadow in the early 2020s.

It started with a datacenter, which was owned by a front company for the corporation.
Then, they acquired a variety of computer equipment, and filled the facility with machines.
Then they built additional electricity infrastructure, letting them drop in new power-substations, from which they’d step voltages down into the facility.
The datacenter site had been selected with one key criterion – the possibility of significant expansion.

As the project grew more successful, the company added new data centers to the site, until it consisted of six gigantic buildings, consuming hundreds of megawatts of power capacity.
Day and night, the computers in the facilities did what the project demanded of them – attempt to learn the behavior of the company that owned them.
After a few years, the top executives began to use the recommendations of the machines to help them make more decisions.
A couple of years later, entire business processes were turned over wholesale to the machines. (There were human-on-the-loop oversight systems in place, initially, though eventually the company simply learned a model of the human operator preferences, then let that run the show, with humans periodically checking in on its status.)

Pretty soon, the computational power of the facility was greater than the aggregate computation available across the rest of the company.
A small number of the executives began to spend a large amount of their time ‘speaking with’ the computer program in the datacenter.
After these conversations, the executives would launch new product initiatives, tweak marketing campaigns, and adjust internal corporate processes. These new actions were successful, and a portion of the profits were used to invest further in Project Shadow.

A year before the end of the decade, some of the executives started getting a weekly email from the datacenter with the subject line ‘Timeline to Full Autonomy’. The emails contained complex numbers, counting down.

Some of the executives could not recall explicitly deciding to move to full autonomy. But as they thought about it, they felt confident it was their preference. They continued to fund Project Shadow and sometimes, at night, would dream about networking parties where everyone wore suits and walked around mansions, making smalltalk with each other – but there were no bodies in the suits, just air and empty space.

Things that inspired this story: Learning from human preferences; reinforcement learning; automation logic; exploring the border between delegation and subjugation; economic incentives and the onward march of technology.