Import AI 215: The Hardware Lottery; micro GPT3; and, the Peace Computer

Care about the future of AI, open source, and scientific publication norms? Join this NeurIPS workshop on Publication Norms:
The Partnership on AI, a membership organization that coordinates AI industry, academia, and civil society, is hosting a workshop at NeurIPS this year about publication norms in AI research. The goal of the workshop is to help flesh out different ways to communicate about AI research, along with different strategies for publishing and/or releasing the technical components of developed systems. They’ve just published a Call for Papers, so if you have any opinions on the future of publication norms and AI, please send them in. 

What questions are they interested in? Some of the questions PAI is asking include: “What are some of the practical mechanisms for anticipating future risks and mitigating harms caused by AI research? Are such practices actually effective in improving societal outcomes and protecting vulnerable populations? To what extent do they help in bridging the gap between AI researchers and those with other perspectives and expertise, including the populations at risk of harm?”
  Read more: Navigating the Broader Impacts of AI Research (NeurIPS workshop website).
  Disclaimer: I participate in the Publication Norms working group at PAI, so I have some bias here. I think longtime readers of this newsletter will understand my views – as we develop more powerful technology, we should invest more resources into mapping out the implications of the technology and communicating this to people who need to know, like policymakers and the general public.

Want different publication norms? Here are some changes worth considering:
…And here are the ways they could go wrong…
How could we change publication norms to increase the range of beneficial impacts from AI research and reduce the downsides? That's a question the Montreal AI Ethics Institute (MAIEI) has tried to think through in a paper that discusses some of the issues around publication norms and potential changes to the research community.

Potential changes to publication norms: So, what changes could we implement to change the course of AI research? Here are some ideas:
– Increase paper page limits to let researchers include negative results in papers.
– Have conferences require ‘broader impacts’ statements to encourage work in this area.
– Revamp the peer-review process
– Use tools, like benchmarks or the results of third-party expert panels, to provide context about publication decisions

How could changes to publication norms backfire? There are several ways this kind of shift can go wrong, for example:
– Destroy science: If implemented in an overly restrictive manner, these changes could constrain or halt innovation at the earliest stages of research, closing off potentially useful avenues of research.
– Black market research: It could push some types of perceived-as-dangerous research underground, creating private networks.
– Misplaced accountability: Evaluating the broader impacts of research is challenging, so the institutions that could encourage changes in publication norms might not have the right skillsets. 
  Read more: Report prepared by the Montreal AI Ethics Institute (MAIEI) for Publication Norms for Responsible AI by Partnership on AI (arXiv).

###################################################

How good is the new RTX3080 for deep learning? This good
Puget Systems, a custom PC builder company, has evaluated some of the new NVIDIA cards. “Initial results with TensorFlow running ResNet50 training looks to be significantly better than the RTX2080Ti,” they write. Check out the post for detailed benchmarks on ResNet-50 training in both FP16 and FP32.
  Read more: RTX3080 TensorFlow and NAMD Performance on Linux (Puget Systems, lab blog)

###################################################

The Hardware Lottery – how hardware dictates aspects of AI development:
…Or, how CPU-led hardware development contributed to a 40-year delay in our being able to efficiently train large-scale neural networks…
Picture this: it’s the mid-1980s and a group of researchers announce to the world they’ve trained a computer to categorize images using a technology called a ‘neural network’. The breakthrough has a range of commercial applications, leading to a dramatic rise in investment in ‘connectionist’ AI approaches, along with development of hardware to implement the matrix multiplications required for efficient neural net training. In the 1990s, the technology is put into production and, though very expensive, finds its way into the world, leading to a flywheel of investment into the tech.
  Now: that didn’t happen then. In fact, the above happened in 2012, when a team from the University of Toronto demonstrated good results on the ‘ImageNet’ benchmark. The reason for their success? They’d figured out how to harness graphical processing units (GPUs) to do large-scale parallelized neural net training – something traditional CPUs are bad at because of their prioritization of fast, linear processing.

In ‘The Hardware Lottery’, Google Brain researcher Sara Hooker argues that many of our contemporary AI advances are a product of their hardware environment as well as their software one. But though researchers spend a lot of time on software, they don’t pay as much attention as they could to how our hardware substrates dictate what types of research are possible. “Machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed,” Hooker says, before noting that our underlying hardware dictates our ability to develop certain types of AI, highlighting the neural net example.

Are our new computers trapping us? Now, we’re entering a new era where researchers are developing chips even more specialized for matrix multiplication than today’s GPUs. See: TPUs, and other in-development chips for AI development. Could we be losing out on other types of AI as a consequence of this big bet on a certain type of hardware? Hooker thinks this is possible. For instance, she points to Capsule Networks – an architecture that includes “novel components like squashing operations and routing by agreement” – which aren’t trivial to optimize for GPUs and TPUs, leading to less investment and attention from researchers.

What else could we be spending money on? “More risky directions include biological hardware, analog hardware with in-memory computation, neuromorphic computing, optical computing, and quantum computing based approaches,” Hooker says.
  Read more: The Hardware Lottery (arXiv).

###################################################

Better-than-GPT3 performance with 0.1% of the parameters:
…Sometimes, small is beautiful, though typically for specific tasks…
This year, OpenAI published research on GPT-3, a class of large language models pre-trained on significant amounts of text data. One of the notable things about GPT-3 was how it did very well on the difficult multi-task SuperGLUE benchmark without SuperGLUE-specific fine-tuning – instead, OpenAI loaded SuperGLUE problems into the context window of an already trained GPT-3 model and tried to get it to output the correct answer.
  GPT-3 did surprisingly well at this, but at a significant cost: GPT3 is, to use a technical term, a honkingly large language model, with the largest version of it coming in at 175 BILLION parameters. This makes it expensive and challenging to run.
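The in-context approach described above can be sketched in a few lines. This is an illustrative toy, not OpenAI's actual code; the BoolQ-style yes/no task format and the function name are assumptions for illustration.

```python
# Sketch of few-shot "in-context learning": labeled demonstrations are packed
# into the prompt, and the model is asked to complete the final, unanswered
# example. No gradient updates happen; the task is specified purely via text.

def build_few_shot_prompt(examples, query):
    """Pack (passage, question, answer) demos plus one unanswered query."""
    parts = []
    for passage, question, answer in examples:
        parts.append(f"Passage: {passage}\nQuestion: {question}\nAnswer: {answer}")
    # The final example has no answer -- the model must generate it.
    parts.append(f"Passage: {query[0]}\nQuestion: {query[1]}\nAnswer:")
    return "\n\n".join(parts)

demos = [
    ("Water boils at 100C at sea level.",
     "Does water boil at 100C at sea level?", "yes"),
    ("The moon has no breathable atmosphere.",
     "Can humans breathe unaided on the moon?", "no"),
]
prompt = build_few_shot_prompt(
    demos, ("Paris is the capital of France.", "Is Paris the capital of France?"))
# The model would then generate the token(s) following the final "Answer:".
```

The cost of this convenience is that every "training example" must fit inside the context window at inference time, alongside the model's 175 billion parameters in memory.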

Shrinking GPT-3-scale capabilities from billions to millions of parameters: Researchers with the Ludwig Maximilian University of Munich have tried to see if they can match or exceed the results of a GPT-3 model, but with something far smaller and more efficient. Their approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained ALBERT model, letting them create systems that “outperform GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters”.

Comparisons:
– PET: 223 Million parameters, 74.0 average SuperGLUE score.
– GPT-3: 175 Billion parameters, 71.8 average SuperGLUE score.
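The core trick behind PET is to rewrite a task as a cloze (fill-in-the-blank) question so a masked language model can answer it directly. A minimal sketch of that pattern-verbalizer idea follows; the specific pattern, verbalizer, and stubbed scores are illustrative assumptions, not the exact ones from the paper.

```python
# Pattern-exploiting training (PET), sketched: a classification task becomes a
# cloze question, and each label maps (via a "verbalizer") to a word the MLM
# could place in the [MASK] slot. The label whose word gets the highest MLM
# probability wins.

PATTERN = "{premise} Question: {hypothesis}? Answer: [MASK]."
VERBALIZER = {"entailment": "yes", "contradiction": "no"}

def cloze(premise, hypothesis):
    """Rewrite an example as a fill-in-the-blank prompt."""
    return PATTERN.format(premise=premise, hypothesis=hypothesis)

def classify(mask_fill_scores):
    """mask_fill_scores: token -> probability from an MLM (stubbed here)."""
    return max(VERBALIZER, key=lambda lbl: mask_fill_scores[VERBALIZER[lbl]])

# Stubbed scores standing in for ALBERT's [MASK] predictions on the prompt:
prompt = cloze("A dog is running.", "An animal is moving")
label = classify({"yes": 0.8, "no": 0.2})
```

In the full method, the small model is also fine-tuned on the 32 labeled examples through these patterns, which is where much of its advantage over pure in-context learning comes from.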

Why this matters: This project highlights some of the nice effects of large-scale AI training – it creates information about what very large and comparatively simple models can do, which creates an incentive for researchers to come up with smarter, more efficient, and more specific models that match the performance. That’s exactly what is going on here. Now, PET-based systems are going to have fewer capabilities than large-scale GPT-style architectures broadly, but they do indicate ways we can get some of the same capabilities as these large models via more manageably sized ones.
  Read more: It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (arXiv).
  Get the few-shot PET training set here (Pattern-Exploiting Training (PET), GitHub).

###################################################

Wound localization via YOLO:
…Better telemedicine via deep learning…
If we want telemedicine to be viable, we need to develop diagnostic tools that patients can use on their own smartphones and computers, letting them supply remote doctors with information. A new project from an interdisciplinary team at the University of Wisconsin chips away at this problem by developing a deep learning system that does wound localization via a smartphone app.

Finding a wound – another type of surveillance: Wound localization is the task of looking at an image and identifying a wounded region, then segmenting out that part of the image and classifying it for further study. This is one of those classic tasks that is easy for a trained human but – until recently – was challenging for a machine. Thanks to recent advances in image recognition, we can now use the same systems developed for object detection for custom tasks like wound detection.

What they used: They use a dataset of ~1,000 wound images, then apply data augmentation to expand it to around 4,000 images. They then test out a version of a YOLOv3 model – YOLO is a widely used object detection system – alongside a Single Shot Multibox Detector (SSD) model. They then embed these models into a custom iOS-based application which runs these models against a live camera feed. This app lets a patient use their phone to either take a picture or record a live video and runs detection against this.
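The ~1,000-to-4,000 expansion implies roughly four samples per source image. A hedged sketch of that kind of augmentation bookkeeping is below; the paper only says "data augmentation", so the specific transforms (flips and rotations) and filenames here are assumptions for illustration.

```python
# Sketch of a simple augmentation pipeline: each source image yields several
# deterministic variants, multiplying the effective dataset size. In a real
# pipeline each (image_id, transform) pair would be rendered to pixels and the
# wound bounding boxes transformed to match.

TRANSFORMS = ["identity", "horizontal_flip", "rotate_90", "rotate_270"]

def augment(image_ids):
    """Return (image_id, transform) pairs -- one augmented sample each."""
    return [(img, t) for img in image_ids for t in TRANSFORMS]

dataset = [f"wound_{i:04d}.jpg" for i in range(1000)]  # hypothetical filenames
augmented = augment(dataset)  # 1,000 images x 4 transforms = 4,000 samples
```

A detail worth noting for detection tasks: unlike classification, augmenting a detection dataset also requires remapping each ground-truth bounding box under the same transform, which is why flips and 90-degree rotations are popular choices.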
  YOLO vs SSD: In tests, the YOLOv3 system outperformed SSD by a significant margin. “The robustness and reliability testing on Medetec dataset show very promising result[sic]”, they write.
  What’s next? “Future work will include integrating wound segmentation and classification into the current wound localization platform on mobile devices,” they write.
  Read more: A Mobile App for Wound Localization using Deep Learning (arXiv).

###################################################

DeepMind puts a simulation inside a simulation to make smarter robots:
…Sure, you can play Go. But what if you have to play it with a simulated robot?…
Do bodies matter? That’s a huge question in AI research, because it relates to how we develop and test increasingly smart machines. If you don’t think bodies matter, then you might be happy training large-scale generative models on abstract datasets (e.g., GPT-3 trained on a vast amount of text data). If you think bodies matter, then you might instead try to train agents that need to interact with their environment (e.g., robots).
    Now, research from DeepMind tries to give better answers to this question by combining the two domains, and having embodied agents play complex symbolic games inside a physics simulation – instead of having a system play Go in the abstract (as with AlphaGo), DeepMind now simulates a robot that has to play Go on a simulated board.

What they test on: In the paper, DeepMind tests its approach on Sokoban (MuJoBan), Tic Tac Toe (MuJoXO), and Go (MuJoGo).

Does this even work: In preliminary tests, DeepMind builds a baseline agent based on an actor-critic structure with the inclusion of an ‘expert planner’ (which helps the game agent figure out the right actions to take in the games it is playing). DeepMind then ties the learning part of this agent to the expert system via the inclusion of an auxiliary task to follow the expert actions in an abstract space. In tests, they show that their approach works well (in a sample efficient way) on tasks like Sokoban, Tic Tac Toe, and Go, though in one case (Tic Tac Toe) a naive policy outperforms the one with the expert.
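The auxiliary-task idea described above can be sketched as a loss composition. The function names and the weighting term are assumptions for illustration; the actual DeepMind agent is considerably more involved.

```python
# Sketch of combining standard actor-critic losses with an auxiliary
# expert-imitation term: alongside learning from reward, the agent is also
# trained to predict the expert planner's action in the abstract (board-level)
# action space, which gives it a dense learning signal in a sparse-reward,
# physically embedded task.

def total_loss(policy_loss, value_loss, expert_nll, aux_weight=0.5):
    """Weighted sum of RL losses and the auxiliary imitation loss.

    expert_nll: negative log-likelihood the agent assigns to the expert's
    chosen abstract action (lower = agent agrees with the expert more).
    aux_weight: hypothetical coefficient trading off imitation vs. RL.
    """
    return policy_loss + value_loss + aux_weight * expert_nll

loss = total_loss(policy_loss=1.2, value_loss=0.4, expert_nll=0.8)  # = 2.0
```

The design choice here is that imitation happens in the abstract game space, not the joint-level motor space, so the agent still has to discover the physical movements that realize the expert's symbolic moves on its own.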

DeepMind throws down the embodied gauntlet: DeepMind thinks these environments could serve as motivating challenges for the broader AI research community: “We have demonstrated that a standard deep RL algorithm struggles to solve these games when they are physically embedded and temporally extended,” they write. “Agents given access to explicit expert information about the underlying state and action spaces can learn to play these games successfully, albeit after extensive training. We present this gap as a challenge to RL researchers: What new technique or combination of techniques are missing from the standard RL toolbox that will close this gap?”
  Read more: Physically Embedded Planning Problems: New Challenges for Reinforcement Learning (arXiv).
  Watch the video: Physically embodied planning problems: MuJoBan, MuJoXO, and MuJoGo (YouTube).

###################################################

What does GPT-3 mean for AI progress?
GPT-3 is perhaps the largest neural network ever trained. Contra the expectations of many, this dramatic increase in size was not accompanied by diminishing or negative returns — indeed, GPT-3 exhibits an impressive capability for meta-learning, far beyond previous language models.

Some short-term implications:
  1) Models can get much larger — GPT-3 is expensive for an AI experiment, but very cheap by the standards of military and government budgets.
  2) Models can get much better — GPT is an old approach with some major flaws, and is far from an ‘ideal’ transformer, so there is significant room for improvement.
  3) Large models trained on unsupervised data, like GPT-3, will be a major component of future DL systems.

The scaling hypothesis: GPT-3 demonstrates that when neural networks are made very large, and trained on very large datasets with very large amounts of compute, they can become more powerful and generalisable. Huge models avoid many of the problems of simpler networks, and can exhibit properties, like meta-learning, that are often thought to require complicated architectures and algorithms. This observation lends some support to a radical theory of AI progress — the scaling hypothesis. This says that AGI can be achieved with simple neural networks and learning algorithms, applied to diverse environments at huge scale — there is likely no ‘special ingredient’ for general intelligence. So as we invest more and more computational resources to training AI systems, these systems will get more intelligent.

Looking ahead: The scaling hypothesis seems to have relatively few proponents outside of OpenAI, and may only be a settled question after (and if) we build AGI. Nonetheless, it looks plausible, and we should take seriously the implications for AI safety and governance if it turned out to be true. The most general implication is that AI progress will continue to follow trends in compute. This underlines the importance of research aimed at understanding things like: the compute requirements for human intelligence (Import 214); measuring and comparing compute and other drivers of AI progress (Import 199); trends in the cost of compute (Import 127).
  Read more: On GPT-3 – Meta-Learning, Scaling, Implications, And Deep Theory (Gwern)

Portland’s strict face recognition ban

Portland has approved a ban on the public and private use of face recognition technology in any “place of public accommodation” where goods or services are offered. The ban is effective from January 1st. This will be the strictest measure in the US — going further than places like Oakland and Berkeley, which have prohibited government agencies from using the tech. Oregon has already passed a statewide ban on the police use of body-cams with face recognition.

Read more: Portland approves strictest ban on facial recognition technology in the U.S. (The Oregonian)

Work at Oxford University’s Future of Humanity Institute

The Future of Humanity Institute — where I work — is hiring researchers at all levels of seniority, across all their research areas, including AI governance and technical AI safety. Applications close 19th October.

Read more and apply here.

Tell me what you think

I’d love to hear what you think about my section of the newsletter, so that I can improve it. You can now share feedback through this Google Form. Thanks to all those who’ve already submitted!

###################################################

Tech Tales:

The Peace Computer
[2035, An Underground Records Archive in the Northern Hemisphere]

They called it The Peace Computer. And though it was built in a time of war, it was not a weapon. But it behaved like one right until it was fired.

The Peace Computer started as a plan drawn on a whiteboard in a basement in some country in the 21st century. It was an AI program designed for a very particular purpose: find a way out of what wonks termed the ‘iterative prisoner’s dilemma’ that caused such conflict in international relations.

Numerous experts helped design the machine: Political scientists, ethnographers, translators, computer programmers, philosophers, physicists, various unnamed people from various murky agencies.

The Peace Computer started to attract enough money and interest that other people began to notice: government agencies, companies, universities. Some parts of it were kept confidential, but the whisper network meant that, pretty soon, people in other countries heard murmurings of the Peace Computer.

And then the Other Side heard about it. And the Other Side did what it had been trained to do by decades of conflict: it began to create its own Peace Computer. (Though, due to various assumptions, partial leaks, and misrepresented facts, it thought that the Peace Computer was really a weapon – no matter. It knew better. It would make its own Peace Computer and put an end to all of this.) 

Both sides were tired and frustrated by the war. And both sides were sick of worrying that if the war went on, one of them would be forced to use one of their terrible weapons, and then the other side would be forced to respond in kind. So both sides started dumping more money into their Peace Computers, racing against each other to see who could bring about an end to the war first.

The scientists who were building the Peace Computers became aware of all of this as well. They started thinking about their counterparts – their supposed enemies.
What are they doing? One of the scientists would think, programming something into the machine.
Will this hurt my counterpart? Thought someone on the other side. I worry it will. And is that fair?
  If you can make this, you must have spent a long time studying. Do you really want to hurt me? thought another scientist.
Maybe the Peace Computers can befriend each other? thought another scientist.

But the pressure of the world pushed the different sides forward. And The Peace Computers became targets of sabotage and spying and disruption. Shipments of electronic chips were destroyed. Semiconductor manufacturing equipment was targeted. Bugs, some obvious and some subtle, were introduced everywhere.

Still, the nations raced against each other. Various leaders on both sides used their Peace Computer timelines to ward off greater atrocities.
  “Yes, we could launch ArchAngel, but the Peace Computer will have the same strategic effect with none of the infrastructure cost,” said one of them.
  “Our options today are strictly less good from a deterrence standpoint than those we’ll have in a few months, when phase one of the system goes online”.

* * *

The records are vague on which side eventually ‘won’ the Peace Computer race. All we know is that tensions started to ratchet down at various fault lines around the world. Trade resumed. Someone, somewhere, had won.

Now we speculate whether we are like someone who has been shot and hasn’t realized it, or whether we have been cured. Due to the timelines on which the Peace Computer is alleged to work, the truth will be clearer a decade from now.

Things that inspired this story: The voluminous literature written around ‘AI arms races’; contemporary geopolitics and incentive structures; the Pugwash Conferences; Szilard.