Import AI 279: Baidu adds knowledge to a language model; US military + AI; how China thinks about AI governance

Happy New Year! I took the end of 2021 off to think, read, relax, and eat. I hope readers found some time to do the same. I expect I’m going to change some things up around Import AI this year – it’s going to get weirder, more specific, and hopefully more valuable! I’m also going to finesse the short story collection I’ve been putting together, based on the tech tales in this newsletter. Good luck to all readers for their own 2022 plans – we’ll go on this journey together!

####################################################

Here’s how to build GPT-3 in the open:
…What’s it like replicating GPT-3? It’s extremely difficult!…
BigScience, an initiative to train a GPT-3-scale model on a public supercomputer, is currently trying to train a 104B model. Training models at this scale is something of an artisanal science, with lots of researchers working from hard-won rules of thumb in tandem with things like scaling laws. Here’s a nice ‘lessons learned’ writeup from BigScience on the challenges it has faced in training 13B- and 104B-scale models so far.
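As a taste of the kind of rule of thumb these teams lean on: the scaling laws published by Kaplan et al. (2020) predict language-model loss as a power law in parameter count, L(N) = (N_c / N)^α_N. The sketch below uses the published fitted constants for autoregressive language models – these are illustrative numbers from that paper, not BigScience’s own measurements:

```python
# Kaplan et al. (2020) scaling law for loss vs. (non-embedding) parameter
# count: L(N) = (N_c / N)^alpha_N. Constants are the paper's published fits,
# used here purely for illustration.
N_C = 8.8e13      # fitted constant
ALPHA_N = 0.076   # fitted exponent

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy loss (nats/token) for a model of n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Rough predictions at the two scales BigScience has trained so far.
for n in (13e9, 104e9):
    print(f"{n / 1e9:.0f}B params -> predicted loss {predicted_loss(n):.2f}")
```

The point of such curves in practice is planning: they let a team estimate, before burning compute, roughly how much a jump from 13B to 104B parameters should buy them.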
  Read more: Lessons learned (BigScience, GitHub).

####################################################

Baidu shows how to inject more knowledge into a language model:
…ERNIE 3.0 shows how to teach a big neural net to use an external knowledge base…
Baidu has developed ERNIE 3.0, an AI model that can use an external knowledge base to help it provide more accurate answers. Last year, an ERNIE 3.0 model won the highly competitive SuperGLUE challenge (Import AI 259). The special thing about ERNIE is that it fuses a big GPT-3-esque language model with a large external knowledge base.

Massive scale:
Baidu has also developed ERNIE 3.0 ‘Titan’, a 260 billion parameter model that, Baidu says, “is the largest Chinese dense pre-training model as far as we know”. In tests, ERNIE 3.0 Titan gets state-of-the-art results on a vast set of benchmarks that evaluate skills as diverse as question answering, text generation, text summarization, interpretation, and dialogue.

Novel, heterogeneous chip cluster:
Another interesting thing about this paper is the chips they train on – Nvidia V100s and Huawei ‘Ascend’ processors. It’s quite unusual to see hybrid training of this form, and it seems like Baidu felt it was interesting enough to invest some engineering resources in making it possible – the company augmented its ‘PaddlePaddle’ AI framework with “distributed training technology, including fine-grained parallelism, heterogeneous hardware-aware training, and fault tolerance mechanism to train the 260B model on both Nvidia V100 GPU and Ascend 910 NPU clusters.”

Why this matters:
Most people seem to act like GPT-3 models are exclusively being developed by a small set of Western actors, most of whom get tagged using the pejorative ‘big tech’ brush. But papers like this show that GPT-3 models are a global phenomenon. We should remember that the world we live in is going to be increasingly defined by different cultures expressing themselves through increasingly large, sophisticated AI models.
  Read more: ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (arXiv).

####################################################

Why smaller can be smarter for real-world AI (here: computer vision for quality control on solar panels):
…When 1 million parameters can beat 100 million parameters…
The past few years of AI have been distinguished by the ‘bigger is better’ phenomenon, as companies develop ever-larger models that consume ever-larger amounts of compute and data. Now, a paper from researchers with Friedrich-Alexander University Erlangen-Nuremberg in Germany reminds us that bigger isn’t always better – especially when it comes to real-world, applied AI. In this paper, they compare different approaches to building an image classifier that can spot defects in solar panels.

What they did: They trained a simple 8-layer convolutional neural net on a dataset of 4341 original, labeled images from a solar plant. The ~4000 images were each labeled with one of eight classes (e.g. ‘good’, ‘crack’, ‘splinter’, et cetera). They then applied a significant amount of data augmentation to enhance the size of the dataset.

How well did it do? Their custom, simple network held its own against a network based on a VGG-architecture model pre-trained on the vast ImageNet dataset. This is interesting, because a common practice in AI research is to finetune domain-specific classifiers from generic ones based on ImageNet. Here, their system gives up a little precision (0.971 versus 0.990) in exchange for ~80X fewer parameters (1,707,208 versus 138,357,544) and a significantly smaller memory footprint (~16MB versus 800MB). All this nets out to a network that is smaller, as well as more performant (inference of 0.50ms, versus 9.13ms).

Why this matters: Papers like this remind us that a little bit of thoughtful engineering goes a long way in AI, and we should bear in mind that while increasingly large networks are interesting, they’re not the only game in town when it comes to building things that have real economic value. “We expect that the following years will demand for more research on edge analytics. This means that more research will be needed on small, yet powerful artificial neural networks for industry cases”, they write.
  Read more: A Light in the Dark: Deep Learning Practices for Industrial Computer Vision (arXiv).

####################################################

What’s the US military going to do about AI? The NDAA holds a clue.
…AI education! Procurement! Data storage! And more…
Every year, the somewhat dysfunctional US Congress manages to pass one bill – the National Defense Authorization Act. This bill (which weighs in at around $800bn in annual outlay) is the thing that funds the US military. Therefore, the NDAA has become one of the main pieces of legislation to look at when trying to understand how the US military thinks about – and will work on – frontier technologies like AI. An analysis of the NDAA from Stanford’s ‘HAI’ center gives us a sense of what’s happening in AI and the US military.

What the NDAA says is going to happen: Some highlights from this year’s NDAA include:
– The DoD will trial different ways of procuring AI technology.
– The DoD will create ‘executive education activities’ to help senior officials understand AI.
– The DoD will conduct comparative analyses of US and Chinese efforts to deploy directed energy systems, hypersonics, cyberspace capabilities, and other frontier technologies.
– The DoD will assess the “current and emerging offensive and defensive cyber posture of U.S. adversaries”.
– The DoD will build infrastructure to “support state-of-the-art tools and modern processes to enable the testing of AI capabilities”.
– The DoD will “evaluate the feasibility and advisability of creating DOD data repositories, available to public and private entities, to facilitate the development of AI capabilities.”

Why this matters: The US military is a lot like a supertanker – it’s slow, huge, and unwieldy. But once it starts to turn, boy does it turn! This NDAA analysis shows us the DoD is beginning to turn its attention and significant resources towards AI, which will have significant downstream implications for the nature of conflict and the way that future wars are conducted (and, eventually, learned).
  Read more: Summary of AI Provisions from the National Defense Authorization Act 2022 (Stanford HAI blog).

####################################################

What is China going to do about AI governance?
…China might do more ambitious tech regulations than the West…
Here’s a nice summary from the Carnegie Endowment for International Peace about what three prominent Chinese policy organizations are doing with regard to AI governance.

Cyberspace Administration of China (CAC): Last year, it released 30 rules for regulating internet recommendation algorithms, and also developed a three-year roadmap for governing other complex algorithms deployed at internet scale. This would be analogous to a Western government publishing a list of specific rules for regulating, for example, Facebook’s recommendation engine. Ambitious!

China Academy of Information and Communications Technology (CAICT): This organization released a whitepaper on trustworthy AI – this is mostly notable because it’s in-distribution with what major regulators in other geographies are thinking about.

Ministry of Science and Technology (MOST): This organization released some guidelines for universities and companies on internal reviews around ethics issues relating to technology, as well as a fairly high-level description of some ethical norms for AI development.

Why this matters: “The potential impact of these regulatory currents extends far beyond China. If the CAC follows through on certain requirements for algorithmic transparency and explainability, China will be running some of the world’s largest regulatory experiments on topics that European regulators have long debated,” Matt Sheehan of Carnegie writes. Running regulatory experiments is a big deal – Western governments did a tiny bit of this after the great financial crisis in 08/09, but have done relatively little about technology governance. I think China has a good chance of defining what ambitious, applied tech regulation looks like.
  Read more: China’s New AI Governance Initiatives Shouldn’t Be Ignored (Carnegie Endowment for International Peace).

####################################################

The Last Tower Defense Fighter

[Historical analysis written in 2080 and stored in the archives at Iceland, at the Orbital Archive, and in the hardened repositories on Moon4 and Mars1.]

Back in the late 2020s there were a bunch of tower defense games that got pretty big. They always worked in the same way: you, the player, can see a landscape from overhead, and you need to place various weapons around it. Meanwhile, the enemies make their way across the landscape, following loosely described paths across a variety of different scenes – narrow trenches dug between mountains, wide roads across countryside, right-angled streets in urban centers.

With these games, you get a high score in relation to how many enemies you kill, and if any of the enemies get to the ‘end’ of a course (usually, the bottom of a screen), you lose – the implication is that you die. 

Anyway, in around 2028 one of the big games built some add-ins for its league. Now, if you were one of the players in the elite-tier of the game, you’d get the opportunity to play in matches where there were cash prizes – these matches were advertised as being extraordinarily difficult, with more enemies on screen than in the normal game, larger and more complex maps, and sometimes the enemies were able to use powerups that meant they could attack your own towers and take them down. 

It was a sensation. Everyone wanted to play the game within a game. Kids all around the world streamed themselves playing the game for hours, as they tried to get good enough to have a shot at entering the league within the league. 

By the end of 2028, streams of league players were pulling in millions of concurrent viewers. A whole industry formed where people commentated about the games. Sometimes people overcame great odds and won – then they’d publish videos of themselves with their cash prizes and what they spent them on; sports cars, fine dining, nice hotels, and all the usual tchotchkes of people who come into some fast money.

In 2029, there was a leak out of the Department of Defense that pulled the cover off. It turned out this game was actually a stealth DoD project. The normal games were helping the DoD train various strategic AI systems, which it used in planning for logistics and munitions placement during conflict. No one was very surprised by this. Back in that decade, most things that got big on the internet were fronts. 

What did surprise people was the leak about the cash league – the cash league was real. Real in the sense that the ‘monsters’ in the game were real – they were real people that the United States happened to be fighting. Whenever someone was playing the game, their commands were being converted to a different, real-world environment, where they marshalled the combined munitions of drones, sniper teams, artillery, tanks, jets, and all the other machinery of the military. And when their towers were destroyed, real Americans were dying – blown up by grenades or IEDs or RPGs, or any of the other ways people killed each other, back then.

Of course, there was an outcry – for a while. Player numbers dipped, too. But the number of spectators increased. And the US military, having struggled publicly for years with backwards technology and difficulty in recruitment, doubled down.
  “We need these people to protect our country,” the Pentagon said, one day. “Without the next generation, we’ll lose the next generation”.

A few years later, the enemies of the US followed in its footsteps. There were games where you stole people. Games where you had to try and find a spy moving through a bustling, crowded urban area. Games where you had to execute someone and then exfiltrate the footage of their execution to a friendly intermediary.

What inspired this story: Tower defense games like Bloons and Kingdom Rush; domain randomization; the remorseless logic of multi-country non-hot military conflict; the Last Starfighter (movie); fine-tuning; pre-training and few-shot adaptation; propaganda and the need to present the most dangerous beliefs via play or theatre or anything else that can elide and delight.