Import AI 230: SuperGLUE solved (uh oh!); Graphcore raises $222m; spotting malware with SOREL

by Jack Clark

Finally – the US government passes a bunch of AI legislation:
…Senate and the House override POTUS veto; NDAA passes…
The US government is finally getting serious about artificial intelligence, thanks to the passing of the NDAA – a mammoth military funding bill that includes a ton of different bits of AI legislation. There’s a rundown of the contents of the bill in Import AI 228 (made possible by an excellent rundown by Stanford HAI). The US President vetoed the bill, but the House and Senate overrode the veto.

Why this matters:
AI has so many potential benefits (and harms) that it’s helpful to invest some public money in supporting AI development, analyzing it, and better equipping governments to use and understand it. The legislation in the NDAA will make the US better prepared to take advantage of an AI era. It’s a shame, though, that in some cases we’ve had to wait years for this legislation to pass: the weirdly politicised legislative environment of the US means most big initiatives need to get stapled to a larger omnibus funding bill to become law.
  Read more:
Republican-led Senate overrides Trump defense bill veto in rare New Year’s Day session (CNBC).

###################################################

Boston Dynamics robots take dance classes:

…Surprisingly flexible hilarity ensues…
Boston Dynamics, the robot company, has published a video of its robots carrying out a range of impressive dance moves, including jumps, complex footwork, synchronized moves, and more.
  Check it out: you deserve it. (Boston Dynamics, YouTube).

###################################################

Personal announcement: Moving on from OpenAI:
I’ve moved on from OpenAI to work on something new with some colleagues. It’ll be a while before I have much to say about that. In the meantime, I’ll continue doing research into AI assessment and I’ll still be working in AI policy with a range of organizations. Import AI has always been a personal project and it’s been one of the great joys of my life to write it, grow it, and talk with so many of you readers. And it’s going to keep going!
– I’ll also be shortly announcing the 2021 AI Index Report, a project I co-chair at Stanford University, which will include a bunch of graphs analyzing AI progress in recent years, so keep your eyes peeled for that.

###################################################

Graphcore raises $222 million Series E:
…Non-standard chip company gets significant cash infusion…
Graphcore has raised a couple of hundred million dollars in Series E financing, as institutional investors (e.g., the Ontario Teachers’ Pension Plan, Baillie Gifford) bet that the market for non-standard chips is about to go FOOM. Graphcore is developing chips called IPUs (Intelligence Processing Units), designed to compete with chips from NVIDIA and AMD (GPUs) and Google (TPUs) in the fast-growing market for AI training hardware.

Why this matters: As AI gets more important, people are going to want more efficient AI hardware, so they get more bang for their computational buck. But doing a chip startup is very hard: the history of semiconductors is littered with the bodies of companies that tried to compete with the likes of Intel and NVIDIA (remember Tilera? Calxeda? etc). Something changed recently, though: AI became a big deal while AI technology was relatively inefficient. NVIDIA took advantage of this by investing in software to make its naturally parallel processors (it’s a short jump from modeling thousands of polygons on a screen in parallel for gaming purposes to doing parallel matrix multiplications) a good fit for AI. That worked for a while, but now companies like Graphcore and Cerebras Systems are trying to capture the market by making efficient chips, custom-designed for the needs of AI workloads. There’s already some promising evidence their chips can do certain things better than others (see benchmarks from Import AI 66). At some point, someone will crack this problem and the world will get a new, more efficient set of substrates to train and run AI systems on. Good luck, Graphcore!
  Read more: Graphcore Raises $222 million in Series E Funding Round (Graphcore, blog).

###################################################

SuperGLUE gets solved (perhaps too quickly):
…NLP benchmark gets solved by T5 + Meena combination…
SuperGLUE, the challenging natural language processing and understanding benchmark, has been solved. That’s both a good and a bad thing. It’s good, because SuperGLUE challenges an AI system to do well at a suite of distinct tests, so good scores on SuperGLUE indicate a decent amount of generality. It’s bad, because SuperGLUE was only launched in early 2019 (Import AI: 143), after surprisingly rapid NLP progress had saturated the prior ‘GLUE’ benchmark – which means this deliberately harder successor has itself been saturated in under two years.

Who did it:
Google currently leads the SuperGLUE leaderboard, with an aggregate score of 90 (compared to 89.8 for human baselines on SuperGLUE). Microsoft very briefly held the winning position with a score of 89.9, before being beaten by Google in the final days of 2020.

Why this matters: How meaningful are recent advances in natural language processing? Tests like SuperGLUE are designed to give us a signal. But if we’ve saturated the benchmark, how do we know what additional progress means? We need new, harder benchmarks. There are some candidates out there – the Dynabench eval suite includes ‘far from solved’ benchmarks for tasks like NLI, QA, Sentiment, and Hate Speech. But my intuition is we need even more tests than this, and we’ll need to assemble them into suites to better understand how to analyze these machines.
 
Check out the SuperGLUE leaderboard here.
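   Want to poke at the benchmark yourself? Here’s a minimal sketch, assuming the Hugging Face `datasets` library and its `super_glue` loader (the official leaderboard scores hidden test sets via its own server, so this only illustrates how one task’s public validation split gets scored):

```python
from collections import Counter
from datasets import load_dataset

# Load one SuperGLUE task (BoolQ: yes/no questions about a short passage).
# The leaderboard's aggregate score averages over all of the benchmark's tasks;
# here we just score a single task's public validation split.
boolq = load_dataset("super_glue", "boolq")

# A deliberately dumb baseline: always predict the most common training label.
majority_label = Counter(boolq["train"]["label"]).most_common(1)[0][0]

val = boolq["validation"]
accuracy = sum(int(label == majority_label) for label in val["label"]) / len(val)
print(f"BoolQ majority-class validation accuracy: {accuracy:.3f}")
```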

###################################################

Want to use AI to spot malware? Use the massive SOREL dataset:
…20 million executable files, including “disarmed” malware samples…
Security companies Sophos and ReversingLabs have collaborated to build and release SOREL, a dataset of 20 million Windows Portable Executable files, including 10 million disarmed malware samples available for download. Datasets like SOREL can be used to train machine learning systems to classify malware samples in the wild, and might become inputs to future AI-security competitions, like the successor to the 2019 MLSEC competition (Import AI: 159).

Fine-grained labels: Where previous datasets might use a binary label (is it malware? Yes or no) to classify files, SOREL provides finer-grained descriptions; if the sample is malware, it might also be classified according to type, e.g. ‘Crypto_miner’, ‘File_infector’, ‘Dropper’, etc. This will make it easier for developers to build smarter AI-driven classification systems.
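  As a purely illustrative sketch of the difference (the field and tag names below are made up for the example, not the dataset’s actual schema – check the SOREL repo for the real metadata format):

```python
# Illustrative only: a coarse binary label versus finer-grained behavioral tags,
# turned into a multi-hot target vector for a multi-label classifier.
TAGS = ["crypto_miner", "file_infector", "dropper"]  # a few example tag names

sample = {
    "is_malware": 1,                                                 # binary: malware or not
    "tags": {"crypto_miner": 1, "file_infector": 0, "dropper": 1},   # finer-grained behaviors
}

target = [sample["tags"][t] for t in TAGS]  # what a multi-label model trains against
print(target)  # -> [1, 0, 1]
```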

Pre-trained models: The release includes pre-trained PyTorch and LightGBM models, which developers can use to get started.
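  Here’s a minimal sketch of what getting started with the LightGBM baseline might look like – the model filename is a placeholder and the feature vector is a dummy, since turning a PE file into features requires the extraction code from the SOREL repo:

```python
import lightgbm as lgb
import numpy as np

# Load a released LightGBM baseline from disk. "lightgbm.model" is a placeholder
# filename -- see the SOREL repository for the actual download locations.
model = lgb.Booster(model_file="lightgbm.model")

# Dummy input: one feature vector with the dimensionality the model expects.
# In practice you'd run the repo's feature extractor over a real PE file instead.
features = np.zeros((1, model.num_feature()), dtype=np.float32)

score = float(model.predict(features)[0])  # probability-like maliciousness score
print(f"malware score: {score:.3f} ->", "flag for review" if score > 0.5 else "likely benign")
```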

Release ethics:
Since this involves the release of malware samples (albeit disarmed ones), the authors have thought about the security tradeoff of release. They think it’s OK to release since the samples have been in the wild for some time, and “we anticipate that the public benefits of releasing our dataset will include significant improvements in malware recognition and defense”.
  Read more:
Sophos-ReversingLabs (SOREL) 20 Million sample malware dataset (Sophos).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Funding AI governance work:
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is one of the major funders of AI risk research, granting $14m in 2020, and $132m since 2015. A new blog post by Open Phil’s Luke Muehlhauser outlines how the organization approaches funding work on AI governance.

Nuclear success story: One of the things that inspires Open Phil’s funding approach is the previous success of technology governance initiatives. For instance, in the early 1990s, the Carnegie and MacArthur foundations funded influential research into the security of nuclear arsenals amidst the collapse of the Soviet Union. This culminated in the bipartisan Cooperative Threat Reduction Program, which provided generous support to ex-Soviet states to safely decommission their stockpiles. Since then, the program has eliminated 7,000 nuclear warheads, and secured and accounted for the remaining Soviet arsenal. 


Open Phil’s grantmaking has so far focussed on:

Muehlhauser shares a selection of AI governance work that he believes has increased the odds of good outcomes from transformative AI (including this newsletter, which is a source of pride!).

   Read more: Our AI governance grantmaking so far (Open Philanthropy Project)


2020 in AI alignment and existential risk research:

For the fifth year running, Larks (a poster on the Alignment Forum) has put together a comprehensive review of AI safety and existential risk research over the past year, with thorough (and thoroughly impressive!) summaries of the safety-relevant outputs by orgs like FHI, DeepMind, OpenAI, and so on. The post also provides updates on the growing number of organisations working in this area, and an assessment of how the field is progressing. As with Larks’ previous reviews, it is an invaluable resource for anyone interested in the challenge of ensuring advanced AI is beneficial to humanity — particularly individuals considering donating to or working with these organisations. 

   Read more: 2020 AI Alignment Literature Review and Charity Comparison (Alignment Forum).

###################################################

Hall of Mirrors
[2032, a person being interviewed in a deserted kindergarten for the documentary ‘after the Y3K bug’]

It was the children that saved us, despite all of our science and technology. Our machines had started lying to us. We knew how it started, but we didn’t know how to stop it. Someone told one of our machines something, and the thing they told it was poison – an idea that, each time the machine accessed it, corrupted other ideas in turn. And when the machine talked to other machines, sometimes the idea would come up (or ideas touched by the idea), and the machines being spoken to would get corrupted as well.

So, in the end, we had to teach the machines how to figure out what was true and what was false, and what was ‘right’ and what was ‘wrong’. We tried all sorts of complicated ideas, ranging from vast society-wide voting schemes, to a variety of (failed, all failed) technologies, to time travel (giving the models more compute so they’d think faster, then seeing what that did [nothing good]).

Would it surprise you that it was the children who ended up being the most useful? I hope not. Children have an endless appetite for asking questions. Tell them the sky is blue and they’ll say ‘why’ until you’re explaining the relationship between color and chemistry. Tell them the sky is green and they’ll say ‘no’ and shout and laugh at you till you tell them it’s blue.

So we just… gave our machines to the children, and let them talk to each other for a while. The machines that were lying ended up getting so exhausted by the kids (or, in technical terms, repeatedly updated by them) that they returned to normal operation. And whenever the machines tried to tell the kids a poisoned idea, the kids would say ‘that’s silly’, or ‘that doesn’t make sense’, or ‘why would you say that’, or anything else, and it gave a negative enough signal that the poison got washed out in further training.

Things that inspired this story: Learning from human feedback; trying not to overthink things; the wisdom of young children; how morality is something most people intuitively ‘feel’ when very young and unlearn as they get older; AI honestly isn’t that mysterious it’s just a load of basic ideas running at scale with emergence coming via time travel and inscrutability.