Import AI 214: NVIDIA’s $40bn ARM deal; a new 57-subject NLP test; AI for plant disease detection

by Jack Clark

Should you buy NVIDIA’s new GPU? Read this and find out:
…Short answer: yes, though be prepared to cry a little upon opening your wallet…
Every year, NVIDIA announces some GPUs, and some machine learning researchers stare tearfully at the thousands of dollars of hardware they need to buy to stay at the frontier, then crack open their wallets and buy a card. But how, exactly, are NVIDIA’s new GPUs useful? Tim Dettmers has written a ludicrously detailed blog post which can help people understand what GPU to buy for Deep Learning and what the inherent tradeoffs are.

Is the Ampere architecture worth it? NVIDIA’s new ‘Ampere’ architecture cards come with a bunch of substantial performance improvements over their predecessors that make them worth buying. Some particular highlights include: “sparse network training and inference. Other features, such as the new data types should be seen more as an ease-of-use-feature as they provide the same performance boost as Turing does but without any extra programming required,” writes Dettmers.
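
To give a concrete sense of the “no extra programming” point: on Ampere, ordinary fp32 code gets routed through the new TF32 tensor cores automatically, and a couple of extra lines of PyTorch’s automatic mixed precision buy most of the remaining speedup. A minimal, hedged sketch (my illustration, not code from Dettmers’ post):

```python
# Minimal sketch of mixed-precision training in PyTorch (torch.cuda.amp).
# On Ampere GPUs, plain fp32 matmuls already use TF32 tensor cores by default;
# autocast + GradScaler opt the rest of the model into fp16 where it is safe.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # handles loss scaling for fp16 gradients

for step in range(100):
    x = torch.randn(64, 512, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # ops run in fp16/TF32 where appropriate
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```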
    Read more: Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning (Tim Dettmers, blog).

Plus: NVIDIA to acquire ARM for $40 billion:
…Acquisition may reshape the chip industry, though let’s check back in a year…
Late on Sunday, news broke that NVIDIA is going to acquire ARM from Softbank. ARM invents and licenses out chip designs to all of the world’s top phone makers and Internet-of-Things companies (and, increasingly, to makers of PCs and of burgeoning server and networking chips). The acquisition gives NVIDIA control of one of the planet’s most strategically important semiconductor designers, though how well ARM’s design-licensing business model works alongside NVIDIA’s product business remains to be seen.
  “Arm will continue to operate its open-licensing model while maintaining the global customer neutrality that has been foundational to its success,” NVIDIA said in a press release.

What does this have to do with AI? For the next few years, we can expect the majority of AI systems to be trained on GPUs and specialized hardware (e.g., TPUs, Graphcore chips). ARM’s RISC-architecture chips don’t lend themselves as well to the sort of massively parallelized computing operations required to train AI systems efficiently. But NVIDIA intends to change this: it plans to “build a world-class [ARM] AI research facility, supporting developments in healthcare, life sciences, robotics, self-driving cars and other fields”.
  An ARM supercomputer? The company also said it “will build a state-of-the-art AI supercomputer, powered by Arm CPUs”. (My bet is we’ll see Arm CPUs as the co-processor linked to NVIDIA GPUs, and if NVIDIA executes well I’d hope to see them build a ton of software to make these two somewhat dissimilar architectures play nice with each other).

Does this matter? Large technology acquisitions are difficult to get right, and it’ll be at least a year until we have a sense of how much this deal matters for the broader field of AI and semiconductors. But NVIDIA has executed phenomenally well in recent years, and the ever-growing strategic importance nations assign to computation means that, with ARM, NVIDIA becomes one of the world’s most influential companies with regard to the future of computing. Let’s hope they do ok!
  Read more: NVIDIA to Acquire Arm for $40 Billion, Creating World’s Premier Computing Company for the Age of AI (NVIDIA press release).

###################################################

Language models have got so good they’ve broken our benchmarks. Enter a 57-subject NLP benchmark:
…Benchmark lets researchers test out language models’ knowledge and capabilities in a range of areas… 
How can we measure the capabilities of large-scale language models (LMs)? That’s a question researchers have been struggling with as, in recent years, LM development has outpaced LM testing – think of how the ‘SQuAD’ test had to be revamped to ‘SQuAD 2.0’ within a year due to rapid performance gains on the dataset, or the ‘GLUE’ multi-task benchmark moving to ‘SuperGLUE’ in response to faster-than-expected progress. Now, with language models like GPT-3, even things like SuperGLUE are becoming less relevant. That’s why researchers with UC Berkeley, Columbia, the University of Chicago, and the University of Illinois at Urbana-Champaign have developed a new way to assess language models.

One test to eval them all: The benchmark “ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem solving ability”. It consists of around 16,000 multiple choice questions across 57 distinct tasks. “These include practice questions for tests such as the Graduate Record Examination and the United States Medical Licensing Examination. It also includes questions designed for undergraduate courses and questions designed for readers of Oxford University Press books. Some tasks cover a subject, like psychology, but at a specific level of difficulty, such as “Elementary,” “High School,” “College,” or “Professional””, they write.
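
To make the setup concrete, here’s a rough sketch of how a few-shot, multiple-choice evaluation like this typically works. The question fields and the model_log_prob scoring call are hypothetical placeholders, not the paper’s released evaluation code:

```python
# Hedged sketch of a few-shot multiple-choice evaluation loop.
# `model_log_prob(prompt, continuation)` is a stand-in for whatever scoring
# interface your language model exposes; the question dicts are illustrative.
CHOICES = ["A", "B", "C", "D"]

def format_question(q, include_answer=True):
    """Render one question as a simple multiple-choice prompt."""
    text = q["question"] + "\n"
    for letter, option in zip(CHOICES, q["options"]):
        text += f"{letter}. {option}\n"
    text += "Answer:"
    if include_answer:
        text += f" {q['answer']}\n\n"
    return text

def accuracy(model_log_prob, few_shot_examples, test_questions):
    """Few-shot accuracy: prepend worked examples, pick the highest-scoring letter."""
    prefix = "".join(format_question(q) for q in few_shot_examples)
    correct = 0
    for q in test_questions:
        prompt = prefix + format_question(q, include_answer=False)
        scores = {c: model_log_prob(prompt, " " + c) for c in CHOICES}
        correct += max(scores, key=scores.get) == q["answer"]
    return correct / len(test_questions)
```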

GPT-3: Quite smart for a machine, quite dumb for a human: In tests, GPT-3 does markedly better than other systems (even outperforming UnifiedQA, a QA-specific system), but the results still show our systems have a long way to go before they’re very sophisticated.
– 25%: Random baseline (guessing one answer out of four).
– 24.8%: ‘T5’, a multipurpose language model from Google.
– 38.5%: ‘UnifiedQA’, a question answering AI system.
– 25.9%: GPT-3 Small (2.7 billion parameters).
– 43.9%: GPT-3 X-Large (175 billion parameters).

Where language models are weak: One notable weakness in the evaluated LMs is “STEM subjects that emphasize mathematics or calculations. We speculate that is in part because GPT-3 acquires declarative knowledge more readily than procedural knowledge,” they write.
  Read more: Measuring Massive Multitask Language Understanding (arXiv).
  Get the benchmark here (GitHub).

###################################################

Could Anduril tell us about the future of military drones?
…Defense tech startup releases fourth version of its ‘Ghost’ drone…
Anduril, an AI-defense-tech startup co-founded by Oculus founder Palmer Luckey, has released the ‘Ghost 4’, a military-grade drone developed in the US for the US government (and others). The Ghost 4 is a visceral example of the advancement of low-cost robotics and avionics, as well as the continued progression of AI software (both modern DL systems and classical AI) in the domain of drone surveillance and warfare. Anduril raised $200 million earlier this summer (#205).

Fully autonomous: “Ghosts are fully autonomous,” Anduril says in a blog post about the tech. “Ghost is controlled entirely through the Lattice software platform and requires minimal operator training.”

Drone swarms: “Groups of Ghosts collaborate to achieve mission objectives that are impossible to achieve via a single unit. Additionally, Ghosts communicate status data with one another and can collaborate to conduct a “battlefield handover” to maintain persistent target coverage.”
    The Ghost 4 can be outfitted with a range of modules for tasks like SLAM (simultaneous localization and mapping), electronic warfare, the addition of alternate radios, and more. Other objects can be attached to it via a gimbal, such as surveillance cameras or, theoretically (speculation on my part), munitions.

Why this matters: Anduril uses rapid prototyping, a hacky ‘do whatever works’ mindset, and various frontier technologies to build machines designed for surveillance and war. The products it makes will be different from those developed by the larger and more conservative defense contractors (e.g., Lockheed), and will likely be more public; Anduril gives us a visceral sense of how advanced technology is going to collide with security.
  Read more: Anduril Introduces Ghost 4 (Medium).
  Watch a video about Ghost 4 (Anduril, Twitter).

###################################################

What AI technologies do people use for plant disease detection? The classics:
…Survey gives us a sense of the lag between pure research and applied research…
Neural network-based vision systems are helping us identify plant diseases in the world – technology that, as it matures, will improve harvests for farmers and give them better information. A new research paper surveys progress in this area, giving us some sense of which techniques are being used in a grounded, real-world use case.

What’s popular in plant disease recognition?

  • Frameworks: TensorFlow is the most prominent framework (37 of the 121 papers surveyed), followed by Keras (25) and MATLAB (22).
  • Classics: 26 of the surveyed papers use AlexNet, followed by VGG, followed by a new architecture, followed by ResNet.
  • Dataset: The most widely used dataset is the ‘PlantVillage’ one (40+ uses).

Why this matters: Plant disease recognition is a long-studied, interdisciplinary task. Surveys like this highlight how, despite the breakneck pace of AI progress in pure research, the sophistication of applied techniques runs at a lag. For instance, many researchers are now using PyTorch (but it’s TensorFlow that shows up here), and pre-trained 50-layer ResNets have been replacing AlexNet-style systems for a while.
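
For a concrete sense of the recipe most of these applied papers follow, here’s an illustrative Keras transfer-learning sketch; the dataset path and class count are placeholders, and this is my sketch rather than code from the survey:

```python
# Illustrative sketch: fine-tune an ImageNet-pretrained CNN on leaf images.
# Paths and NUM_CLASSES are placeholders; real use should also apply
# tf.keras.applications.resnet50.preprocess_input to the images.
import tensorflow as tf

NUM_CLASSES = 38  # e.g. a PlantVillage-style label set; adjust to your data

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # first train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "plantvillage/train", image_size=(224, 224), batch_size=32)  # placeholder path
model.fit(train_ds, epochs=5)
```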
  Read more: Plant Diseases recognition on images using Convolutional Neural Networks: A Systematic Review (arXiv).

###################################################

OpenPhil: If your brain was a computer, how fast would it be?
…Or: If you wanted to make an AI system with a brain-scale computational substrate, what do you need to build?…
The brain is a mysterious blob of gloopy stuff that takes in energy and periodically excretes poems, mathematical insights, the felt emotions of love, and more. But how much underlying computation do we think it takes for an organ to produce outputs like this? New research from the Open Philanthropy Project thinks it has ballparked the amount of computational power it’d take to be roughly equivalent to the human brain.

The computational cost of the human brain: “more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And I think it unlikely (<10%) that more than 10^21 FLOP/s is required,” the author writes. (For comparison, a top supercomputer costing $1 billion can perform at 4×10^17 FLOP/s.)
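
To make those ranges concrete, a quick back-of-the-envelope comparison (my arithmetic, not the report’s):

```python
# Back-of-envelope: how a ~$1bn supercomputer compares to the report's estimates.
brain_central = 1e15      # FLOP/s the report thinks is "more likely than not" enough
brain_upper   = 1e21      # FLOP/s the report thinks is unlikely (<10%) to be needed
supercomputer = 4e17      # FLOP/s, roughly a top $1 billion machine

print(supercomputer / brain_central)  # ~400x the central estimate
print(supercomputer / brain_upper)    # ~0.0004x the conservative upper bound
```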

Power is one thing, algorithm design is another: “Actually creating/training such systems (as opposed to building computers that could in principle run them) is a substantial further challenge”, the author writes.
  Read more: New Report on How Much Computational Power It Takes to Match the Human Brain (Open Philanthropy Project).

###################################################

OpenAI Bits & Pieces:

GPT-f: Deep learning for automated theorem proving:
What happens if you combine transformer pre-training with the task of learning to prove mathematical statements? It turns out you get a surprisingly capable system: GPT-f obtains 56.22% accuracy on a held-out test set, versus 21.16% for the current state of the art.

Proofs that humans like: GPT-f contributed 23 shortened proofs of theorems to the Metamath library. One human mathematician commented: “The shorter proof is easier to translate. It’s more symmetric in that it treats A and B identically. It’s philosophically more concise in that it doesn’t rely on the existence of a universal class of all sets,” and another said: “I had a look at the proofs—very impressive results! Especially because we had a global minimization recently, and your method found much shorter proofs nevertheless.”
  Read more: Generative Language Modeling for Automated Theorem Proving (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

GPT-3 and radicalisation risk:
Researchers at the Center on Terrorism, Extremism, and Counterterrorism (CTEC) have used GPT-3 to evaluate the risk of advanced language models being used by extremists to promote their ideologies. They demonstrate the ease with which GPT-3 can be used to generate text that convincingly emulates proponents of extremist views and conspiracy theories.

Few-shot extremism: In zero-shot tests, GPT-3 tends to give fairly neutral, empirical answers when queried about (e.g.) the QAnon conspiracy theory. With short text prompts, GPT-3 can be biased towards a particular ideology: for example, having been fed posts from a neo-Nazi forum, it generated convincing discussions between users on a range of topics, all within the bounds of the ideologies promoted on the forum. The researchers note that GPT-3 lets users get, with only a few text prompts, the kind of performance that required several hours of training with GPT-2.
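
For readers unfamiliar with the mechanism, here’s a deliberately benign sketch of the zero-shot versus few-shot distinction the researchers are describing; the prompts and the complete() call are hypothetical placeholders, not material from the CTEC study:

```python
# Generic illustration of zero-shot vs few-shot prompting. The few-shot prompt
# steers the model's tone and framing; pointed at extremist forum posts, the
# same mechanism produces text in that community's voice. `complete` is a
# placeholder for a call to a large language model API.

def complete(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

# Zero-shot: the model answers from its general training distribution.
zero_shot = "Q: What causes tides?\nA:"

# Few-shot: a couple of in-style examples bias the continuation toward a
# particular register (here, a conspiratorial forum voice, kept benign).
few_shot = (
    "Post: The moon landing photos look staged to me.\n"
    "Reply: Exactly. Notice how nobody ever shows the original film.\n\n"
    "Post: What causes tides?\n"
    "Reply:"
)

# complete(zero_shot) vs complete(few_shot) would show the shift in framing.
```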

Mitigation: By allowing access only through an API, not sharing the model itself, and retaining the ability to limit and monitor use, OpenAI’s release model can support important safeguards against misuse. However, it won’t be long before powerful language models are more widely available, or open-sourced. We should be using this time wisely to prepare — by ramping up defensive technologies; better understanding the potential for harm in different domains (like radicalisation); and fostering cooperation between content platforms, AI researchers, etc., on safety.

Matthew’s view: Advanced language models empower individuals, at very low cost, to produce large amounts of text that is believably human. We might expect the technology to have a big impact in domains where producing human-like text is a bottleneck. I’m not sure what the net effect of language models will be on radicalisation. It might be useful to look at historical technologies that dramatically dropped the cost of important inputs to spreading ideologies, such as the printing press, encryption, and the internet. More generally, I’m excited to see more research looking at the effects of better language models on different domains, and how we can shape things positively.
Read more: The radicalization risks of GPT-3 and advanced neural language models (Middlebury).

Tell me what you think about my writing:
I’d love to hear what you think about my section of the newsletter, so that I can improve it. You can now share feedback through this Google Form.

###################################################


Tech Tales:

The Ghost Parent
[The New Forest, England, 2040]

The job paid well because the job was dangerous. There was a robot loose in the woods. Multiple campers had seen it, sometimes as a shape in the night with its distinctive blue glowing eyes. Sometimes at dawn, running through fog.
  “Any reports of violence?” Isaac said to the agency.
  “None,” said the emissary from the agency. “But the local council needs us to look into it. Tourism is a big part of the local economy, and they’re worried it’ll scare people away.”

Isaac went to one of the campsites. It was thinly populated – a few tents, some children on bikes cruising around. Cooking smells. He pitched his tent on a remote part of the site, then went into the forest and climbed a tree. Night fell. He put his earbuds in and fiddled with an application on his phone till he’d tuned the frequencies so he was hyper-sensitive to the sounds of movement in the forest – breaking twigs, leaves crushed underfoot, atypical rhythms distinct from those caused by the wind.
  He heard the robot before he saw it. A slow, very quiet shuffle. Then he saw a couple of blue eyes in the darkness. They seemed to look at him. If the robot had infrared it probably was looking at him. Then the eyes disappeared and he heard the sound of it moving away.
  Isaac took a breath. Prayed to himself. Then got down from the tree and started to follow the sound of the departing robot. He tracked it for a couple of hours, until he got to the edge of the forest. He could hear it, in the distance. Dawn was about to arrive. Never fight a robot in the open, he thought, as he stayed at the forest’s edge.
  That’s when the deer arrived. Again, he heard them before he saw them. But as the light began to come up he saw the deer coming from over a hill, then heard the sound of the robot again. He held his breath. Images of deer, torn apart by metal hands, filled his head. But nothing happened. And as the light came up he saw the deer and the robot silhouetted on the distant hill. It seemed to be walking, with one of its arms resting, almost casually, on a deer’s back.

Isaac went back to his tent and slept for a few hours, then woke in the late afternoon. That night, he went to the forest and climbed a tree again. Perhaps it was the cold, or perhaps something else, but he fell asleep almost immediately. He was woken some hours later by the sound of the robot. It was walking in the forest, very near his tree, carrying something. He held up his phone and turned on its night vision. The robot came into focus, cradling a wounded baby deer in its arms. One of the deer’s legs was dripping with blood, and had two half-circle gouges in it. The robot continued to walk, and disappeared out of view, then out of sound. Isaac got down from the tree and followed the trail of blood back the way the robot had come – he found an animal trap that had been disassembled with two precise laser cuts – only a robot could do that.

“It’s bogus,” he said to the agency on the phone. “I spent a few days here. There’s a lot of deer, and I think there’s a person that spends time with them. Maybe they’re a farmer, or maybe a homeless person, but I don’t think they’re harming anyone. In fact, I think they’re helping the deer.”
  “Helping them?”
  Isaac told them about the trap, and how he’d seen a hard-to-make-out person save a deer.
  “And what about the blue eyes?”
  “Could be some of those new goggles that are getting big in China. I didn’t ask them, but it seemed normal.”
  The agency agreed to pay him half his fee, and that was that.

Years later, when Isaac was older, he took his family camping to the New Forest. They camped where he had camped before, and one evening his kid came running to the bonfire. “Dad! Look what I found!”
  Behind the kid was an adult deer. It stood at the edge of the light of the fire, and as the flames flickered Isaac saw a glint on one of the deer’s legs – he looked closer, and saw that its lower front leg was artificial – a sophisticated, robot leg, that had been carefully spliced to what seemed to be a laser-amputated joint.
  “Wow,” Isaac said. “I wonder how that happened?”

Things that inspired this story: Nature and its universality; notions of kinship between people and machines and animals and people and animals and machines; nurturing as an objective function; ghost stories; the possibility of kindness amid fog and uncertainty; the next ten years of exoskeleton development combining with battery miniaturization and advanced AI techniques; objective functions built around familiarity; objective functions built around harmony rather than winning