January 18, 2021

Import AI 232: Google trains a trillion parameter model; South Korean chatbot blows up; AI doesn’t use as much electricity as you think

by Jack Clark

Uh-oh, Parler is about to step on a big ‘ol algorithm rake:
…CEO says algorithms can filter hate speech. Good luck with that!…
Parler, the social network used by far right activists and subsequently pulled offline due to failing to meet T&Cs from a variety of infrastructure services (including Amazon Web Services), has a plan to come back: it’s going to use algorithms to filter hate speech on the service. Uh oh!

“We will be taking more algorithmic approaches to content but doing it to respect people’s privacy, too,” Parler CEO John Matze told FOX News. “Will be having algorithms look at all the content … to try and predict whether it’s a terms-of-service violation so we can adjust quicker and the most egregious things can get taken down”.

Algorithms != editors: If you want to use algorithms to moderate hate speech, you’re going to get into the fun questions that entails. These include:
– Can your algorithms effectively tell the difference between hate speech and satire of hate speech?
– Are you comfortable making judgement calls about the heuristics you will use to give initial biases to these algorithms?
– How do you distinguish between acceptable and unacceptable words and phrases?

Why this matters: Parler highlights the challenge of scale combined with contemporary economics – Parler operate(d) at a scale equivalent to things like large television networks, but did so with a tiny investment into its own humans. Traditional media organizations deal with issues of speech by having an editorial line which gets enforced by thousands of individual journalists and editors making subjective, qualitative decisions. It’s imperfect, but put it this way: when you watch Fox, you know what you’re getting, and when you watch the BBC, you know what you’re getting, and you can intuit the biases of the humans behind the editorial decisions. Now, tiny companies are trying to use algorithms to substitute for this varied multitude of different human perspectives. Will it work? Who knows, but it feels like a risky thing to bet a company on.
Read more: Parler CEO says platform will ‘come back strong’ with changes to keep users safe while respecting free speech (FOX News).

###################################################

Google breaks the trillion-parameter ceiling with the Switch Transformer:
…The best part? It seems to be reasonably efficient…
Google has built the Switch Transformer, a more efficient variant of the Transformer. Switch Transformers are designed “to maximize the parameter count of a Transformer model in a simple and computationally efficient way”. The idea is that you can keep compute constant and cram more parameters into your network and still see performance gains.
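To make the "more parameters, same compute" idea concrete, here is a minimal sketch of sparse top-1 routing, where each token is sent to a single expert feed-forward block. This is a simplified illustration rather than Google's implementation: the layer sizes, the function names (`switch_route`, `layer_stats`), and the choice to ignore biases are all assumptions made for the example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(router_logits):
    """Pick the single highest-probability expert for one token (top-1 routing)."""
    probs = softmax(router_logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]

def ffn_params(d_model, d_ff):
    # Two weight matrices per feed-forward block; biases ignored for simplicity.
    return 2 * d_model * d_ff

def layer_stats(d_model, d_ff, num_experts):
    """Return (total parameters, parameters touched per token) for one layer."""
    router = d_model * num_experts
    total = num_experts * ffn_params(d_model, d_ff) + router
    active = ffn_params(d_model, d_ff) + router  # only one expert runs per token
    return total, active

# Hypothetical sizes, chosen only for illustration.
for n in (1, 8, 64):
    total, active = layer_stats(d_model=512, d_ff=2048, num_experts=n)
    print(f"experts={n:3d}  total={total:,}  active per token={active:,}")

print(switch_route([0.1, 2.0, -1.0]))  # the token goes to expert 1
```

Total parameters grow roughly linearly with the number of experts, while the parameters used for any single token stay almost flat (only the small router grows).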

Does it work: Switch Transformers seem to be more efficient than standard ones; in a bakeoff between a model trained using a few of these ‘Switch’ layers versus ones that use dense layers (T5-Base and T5-Large), Google shows the Switch is more efficient. The company also experiments with distilling Switch Transformers (which seems to work). They also show significant performance improvements on challenging tasks like GLUE, SQuAD, Winogrande, and ARC, with Switch-based systems outperforming T5 ones consistently.

One treeeelion parameters: Google tests out its ideas by training 395 billion and 1.6 trillion parameter Switch Transformers (far in excess of GPT-3, which at 175 billion parameters is the largest (publicly) deployed language model on the planet). These mammoth systems display good performance properties (as one would expect), while also appearing to have some efficiency gains over systems trained solely on standard dense transformers.

Why this matters: AI is moving into its industrial era – big companies are developing far more capable AI systems than in the past. Studies like this give us a sense of the limits of scaling (there don’t seem to be many yet) as well as outlining some ways to improve efficiency while scaling. It might seem odd to call this an intrinsically political act, but it kind of is – right now, a variety of AI systems are being trained on slices of the internet, developed using substantial amounts of capital by a tiny set of people, then deployed widely. We live in interesting times!
Read more: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv).
Check out a thread on Twitter from Google Brain’s Barret Zoph for more (Twitter).
Get code related to this paper here (GitHub).

###################################################

South Korean chatbot blows up in public:
…Luda chatbot gives off-color responses around sex, race…
South Korean startup Scatter Lab has pulled an AI-based chatbot offline after the system started spewing sexist and racist comments in response to user inputs. “‘Yuck, I really hate them,’ the bot said in response to a question about transgender people,” according to Vice.

What went wrong: Luda was trained on the chatlogs from ‘Science of Love’, an earlier project developed by Scatter Lab. Based on a skim of a few (Google Translated) Korean documents, it seems like the problem was that the underlying generative language model responded to user inputs with responses that varied from the benign to the highly offensive – this could have been because of the data. Prior to the problems, Scatter Lab said in a press release that ‘Luda’ was better at conversation than Google’s “Meena” system (about Meena: Import AI 183).

What went EXTREMELY wrong: Scatter Lab is currently under investigation by the Korean Internet & Security Agency (KISA) and the Personal Information Protection Committee, due to using user data to train its chatbot. Scatter Lab had also used this user data in an earlier model published to GitHub (which is currently not available).
Read more: AI Chatbot Shut Down After Learning to Talk Like a Racist Asshole (VICE World News).
Read Scatter Lab’s statement about Luda (official website, Korean).
Find out more via the official apology FAQ (official website, Korean).
Check out the press release where they compare their technology to Google’s ‘Meena’ bot (Artificial Intelligence Times, Korean).

###################################################

Need help evaluating your NLP model? Try robustness gym:
…Toolkit aims to turn model evaluation from an art to a science…
Language models have got pretty good recently (see: BERT, GPT2, GPT3, Google’s above-mentioned Switch Transformer being used for pre-trained models, etc). That means people are beginning to deploy them for a variety of purposes, ranging from classifying text to generating text. But these language models are huge generative models with complex capability surfaces, which means it is challenging to characterize their safety for a given use case without doing a lot of direct experimentation.
As all scientists know, setting up experiments is finicky work, and different labs and companies will have their own approaches to doing experimental design. This makes it hard to develop common standards for evaluating models. Enter: Robustness Gym, software built by people at Stanford, Salesforce, and UNC-Chapel Hill to provide a standard system for testing and evaluating models.

What can Robustness Gym do? The software helps people do experimental design, initial evaluations of models across a range of dimensions (safety, different evaluation sets, resilience to various types of ‘attack’), and it produces a ‘robustness report’ for any given model being analyzed. You can get the code for Robustness Gym from GitHub.
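To give a flavor of what a per-slice ‘robustness report’ might look like, here is a toy sketch in plain Python. It does not use the Robustness Gym API; the function `robustness_report`, the toy model, the examples, and the slice names are all hypothetical, made up purely to illustrate evaluating one model across several subsets of the data.

```python
from collections import defaultdict

def robustness_report(model, examples, slicers):
    """Accuracy of `model` on each named slice of `examples`."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for text, label in examples:
        correct = model(text) == label
        for name, belongs in slicers.items():
            if belongs(text):
                counts[name] += 1
                hits[name] += int(correct)
    return {name: hits[name] / counts[name] for name in counts}

# Toy sentiment "model": predicts positive iff the word "good" appears.
def toy_model(text):
    return "pos" if "good" in text.lower() else "neg"

examples = [
    ("Good film", "pos"),
    ("Not good at all", "neg"),
    ("Terrible", "neg"),
    ("A good, long, winding story that pays off", "pos"),
]

# Each slice is a named predicate over the input text.
slicers = {
    "all": lambda t: True,
    "contains_negation": lambda t: "not" in t.lower().split(),
    "short (<= 2 words)": lambda t: len(t.split()) <= 2,
}

report = robustness_report(toy_model, examples, slicers)
for name, accuracy in report.items():
    print(f"{name:20s} accuracy={accuracy:.2f}")
```

The aggregate number looks respectable, but the negation slice exposes a failure that the overall score hides, which is the kind of signal a slice-based report is meant to surface.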

Does Robustness Gym tell us anything useful? They use the tech to evaluate seven different summarization models and find out that most models struggle to distill sparse information, that some models display a bias towards the start of the text (and others to the end), and that the errors are generally correlated across the different models (despite them being built with different underlying techniques).
How useful are these insights? I guess I’d say they’re kind of useful. Tools like Robustness Gym can help generate some signals for developers to use to further develop their application, but I think we need more underlying evals and tests to perfect this stuff.
Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (official project site).
Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (arXiv).

###################################################

Think news stories will get written by AI? Axios disagrees:
…Media company’s bill of rights gestures at AI deployment issues…
Axios, the short-form news company, has published a ‘Bill of Rights’ ahead of the organization expanding into local news. It’s got all the standard stuff you’d expect from journalists – transparency, truth, a bias against opinion, etc. But it also has one unusual thing: no AI.
Axios’s first bill of rights item: “Every item will be written or produced by a real person with a real identity. There will be NO AI-written stories. NO bots. NO fake accounts”, Axios writes.

Why this matters: We’re living in the age where AI systems are producing cultural artefacts, ranging from audio to text to images. There’s a lot to like about this. There’s also a lot to be wary about. It seems pretty notable for a prominent news organization to take a stance like this on this issue at this time. Which organization might take the other side?
Read more: Our promises to you: Axios Bill of Rights (Axios).

###################################################

AI doesn’t use as much electricity as you think it does:
… And neither does anything else that uses a computer…
In recent years, there’s been a growing line of research laying out the CO2 costs inherent to training AI models. The ‘Green AI’ paper, for instance, critiques various large-scale AI systems on the basis of them costing a lot of resources to train. This kind of criticism is helpful, but it can also obscure the larger context – the data centers being used to train AI systems have become far more efficient in recent years, substantially reducing the environmental costs of AI development. That’s the finding of a research paper by Northwestern University, the University of California at Santa Barbara, Lawrence Berkeley National Laboratory, and Koomey Analytics. The paper came out last year but I finally got around to reading it – and it sheds some much-needed light on a contentious issue.

Datacenters use 1% of global electricity: Datacenters used ~1% of global electricity in 2018 (205 Terawatt Hours). This is a 6% increase compared with 2010. That’s a tiny jump considering the explosion in usage of digital computation in the past decade. At the same time data center IP traffic has grown 10-fold and data center storage capacity has gone up by 25X, so the relatively slight increase in power consumption seems to reflect significant progress in algorithm and hardware efficiency up and down the globe-spanning compute ‘stack’.
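A quick back-of-the-envelope calculation shows how large the implied efficiency gain is. This sketch only uses the figures quoted above; treating IP traffic and storage capacity as proxies for ‘workload’ is a simplifying assumption of the example, not a claim from the paper.

```python
# Figures quoted in the paragraph above.
energy_2018_twh = 205.0   # datacenter electricity use in 2018
energy_growth = 1.06      # 6% increase versus 2010
traffic_growth = 10.0     # data center IP traffic, 2010 -> 2018
storage_growth = 25.0     # data center storage capacity, 2010 -> 2018

# Work backwards to the implied 2010 baseline.
energy_2010_twh = energy_2018_twh / energy_growth

def relative_intensity(energy_ratio, workload_ratio):
    """Energy per unit of workload in 2018, as a fraction of the 2010 level."""
    return energy_ratio / workload_ratio

traffic_intensity = relative_intensity(energy_growth, traffic_growth)
storage_intensity = relative_intensity(energy_growth, storage_growth)

print(f"Implied 2010 usage: {energy_2010_twh:.1f} TWh")
print(f"Energy per unit of traffic vs 2010: {traffic_intensity:.1%}")
print(f"Energy per unit of storage vs 2010: {storage_intensity:.1%}")
```

Under these assumptions, each unit of traffic in 2018 cost roughly a tenth of the electricity it did in 2010, and each unit of storage capacity cost less than a twentieth.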

Big companies have made data centers more efficient: Big companies like Google and Microsoft compete with each other on a metric called Power Usage Effectiveness (PUE). PUE is basically a measure of how much electricity you spend on the stuff supporting your computation (e.g., cooling), versus the computation itself. A PUE of 1.5 means for every watt of computation, you spend half a watt on the stuff around the computation. The lower your PUE number, the more bang for your compute-power buck you’re getting. As of 2020, Google has a trailing twelve-month PUE of 1.10. Why does this matter? Because many of the largest datacenters also have among the lowest PUEs, so in recent years as more workloads have moved to the cloud, we’ve consumed less power than if they’d stayed on premise.
In 2018 89% of computation took place in these larger and more well-optimized datacenters, whereas in 2010 79% took place in smaller (far more inefficient, frequently non-cloud-oriented) datacenters.
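PUE is defined as total facility power divided by the power that reaches the IT equipment, so the overhead falls straight out of the number. Here is a small sketch of that arithmetic; the 1 MW load and the PUE of 2.0 for a small on-premise datacenter are hypothetical values chosen for illustration, while 1.5 and 1.10 are the figures mentioned above.

```python
def overhead_watts(it_watts, pue):
    """Power spent on cooling, power delivery, etc. for a given IT load."""
    return it_watts * (pue - 1.0)

def facility_watts(it_watts, pue):
    """Total power drawn by the facility for a given IT load."""
    return it_watts * pue

# A PUE of 1.5 means half a watt of overhead per watt of computation.
print(overhead_watts(1.0, 1.5))

it_load = 1_000_000   # hypothetical: 1 MW of computation
on_prem_pue = 2.0     # hypothetical figure for a small, older datacenter
cloud_pue = 1.10      # Google's trailing twelve-month figure quoted above

saved = facility_watts(it_load, on_prem_pue) - facility_watts(it_load, cloud_pue)
print(f"Moving the same workload saves about {saved / 1000:.0f} kW")
```

With these made-up inputs, the same computation draws roughly 900 kW less once it moves to the more efficient facility, which is the mechanism behind the cloud-migration savings described above.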

Want even more efficient computation? Use clouds: The researchers think policymakers should encourage further efficiency improvements by rewarding companies that drive down PUE and by finding ways to incentivize greater shifts to the efficient clouds operated by Google et al, and that regulators should promote more energy efficiency standards for data center equipment.

Why this matters: It may be counterintuitive, but the use of technologies like AI and the construction of football-field-sized datacenters may ultimately lead to net efficiency improvements in overall electricity usage – despite researchers training more and more AI systems over time. It’s crucial we consider the larger system in which these innovations take place. Next time someone tells you that a model is bad because it uses a lot of electricity, ask yourself how much is a lot, and whether this model might substitute for something pre-existing and more inefficient (e.g., Google and DeepMind used machine learning to train a model to improve PUE across Google’s data centers – here, the upfront energy cost of training the model is amortized on the backend by improving the aggregate efficiency of Google’s computers. DeepMind also recently did the same thing for improving the efficiency of Google’s wind turbines (Import AI 136), as well).
Read more: Recalibrating global data center energy-use estimates (Science, Feb 2020).
Read more: Green AI (Communications of the ACM).

###################################################

Tech Tales:

High School News:
[The South Bay, California, the early 2020s]

He’d hated Teddy for a couple of years. Teddy was tall and had hit puberty early and all the other kids liked him. Because Teddy was kind of smart and kind of handsome, the girls were fascinated with him as well. He had a lot of the same classes as Teddy and he’d sit in the back, staring at Teddy as he answered questions and flashed smiles to the other kids.

One night, he read a tutorial about how to use some AI stuff to generate stories. He built a website called The Winchester News and set up the AI stuff to scrape the web and copy news articles about the school, then subtly tweak them to avoid plagiarism allegations. Then he set it up so one out of every hundred news stories would mention Teddy in connection to stories about drugs and pornography circulating among children at the school.

It was fiction, of course. The most serious stuff at Winchester was cheap hash which they called soapbar. Kids would smoke it in the bushes near the sports fields at lunch. And Teddy wasn’t one of those kids.

But after a few days, other kids thought Teddy was one of those kids. He’d sit in the back of class and watch the phonescreens of his classmates and look at them reading The Winchester News and sometimes glancing over to Teddy. He watched as Teddy opened his phone, checked a messaging app, clicked on a link, and started reading a “news” article about Teddy dealing drugs and pornography. Teddy didn’t react, just fiddled with his phone a bit more, then returned to studying.

Days went by and he watched the traffic on his website go up. He started getting news “tips” from people who had read the AI-generated articles.
– Teddy is sleeping with an underage girl from the lower school.
– Teddy cheated on his science exam, he had the answers written on some paper which was curled up inside his pen lid.
– Teddy is addicted to pornography and watches it in class.

Of course, he published these tips – gave them as the priming device to his AI system, then let it do the rest. The news stories took a few minutes to generate – he’d get his machine to spit out a bunch of variants, then select the ones that felt like they might get a rise out of people. That night he dreamed that his website started publishing stories about him rather than Teddy, dreamed that someone threw a brick through his window.

Teddy wasn’t at school the next day. Or the day after that.

The teachers had been meeting with Teddy and Teddy’s parents, concerned about the news stories. He’d anonymized The Winchester News enough that people thought it was a low-rent legitimate news outfit – one that had sprung up to serve the kids and parents around the school, likely backed by some private equity firm.

After he heard about the meetings, he stopped generating articles about Teddy. But he didn’t delete the old ones – that might seem suspicious. How would the news site know to delete these? What would cause it? So he left them up.

Like all kids, he wasn’t very good at imagining what it was like to be other kids. So he just watched Teddy, after Teddy came back to school. Noticed how he wasn’t smiling so much, and how the girls weren’t talking to him in the same way. Teddy checked his phone a lot, after the news stories had been circulating for months. He became more distracted in class. He seemed to be distracted a lot, looking out the window, or messaging people on his phone.

One night, he dreamed that Teddy came into his room and started reading out the news stories. “Teddy is alleged to have been the key dealer behind the spike in drug consumption at the Winchester School,” Teddy said, holding up a giant piece of paper and reading headlines from it.
“Teddy was reprimanded for circulating pornography to younger children,” Teddy said.
“Teddy’s continued actions call into question the moral and ethical standing of the school,” Teddy said.
And then Teddy put the paper down and stared at him, in his dream. “What do you think?” Teddy said. “It’s in the news so I guess it must be true”.

Things that inspired this story: Generative models and the potential abuses of them; teenagers and how they use technology; thinking about what happens when news stories get generated by AI systems; a rumor I heard about some kid who used a language model to generate some ‘fake news’ to settle some grievances; the incentive structure of technology; how our networks connect us and also open us to different forms of attack.
