Import AI 287: 10 exaflop supercomputer; Google deploys differential privacy; humans can outsmart deepfakes pretty well

Graphcore plans a 10 exaflop supercomputer:
…And you thought Facebook’s 5 exaflops were cool…
Graphcore has announced a plan to build the so-called “Good Computer” in 2024. This computer will have 10 exaflops of what Graphcore calls AI floating point compute (and what literally everyone else calls mixed-precision compute, meaning the computer mostly does a lot of 16-bit ops with a smattering of 32-bit ops, versus the 64-bit ops done by typical supercomputers). The ‘Good Computer’ will also have 4 petabytes of memory, support AI models of up to 500 trillion parameters, and will cost ~$120 million, depending on configuration.
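As a quick sanity check on those headline specs (my own back-of-the-envelope arithmetic, not anything Graphcore has published), 4 petabytes spread across 500 trillion parameters works out to 8 bytes per parameter:

```python
# Back-of-the-envelope check on the Good Computer's headline specs
# (the figures come from the announcement; the arithmetic is mine).
memory_bytes = 4e15   # 4 petabytes of memory
params = 500e12       # claimed maximum model size: 500 trillion parameters

bytes_per_param = memory_bytes / params
print(bytes_per_param)  # 8.0 – enough for, e.g., one fp32 weight plus
                        # one fp32 optimizer value per parameter
```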

Why this matters: Graphcore is one of the small number of companies that design their own processors. Graphcore’s so-called Intelligence Processing Units (IPUs) have been around for a while, but it’s not clear yet how much traction the company has in the market. The Good Computer is a sign of its ambitions (and to put it into perspective, Facebook this year announced plans to build its own 5 exaflop ‘AI supercomputer’ over the next couple of years (#282)). The future is going to be ruled by the people that can wield this vast amount of computational power effectively.
  Read more: Graphcore Announces Roadmap To Ultra Intelligence AI Supercomputer (Graphcore blog).

####################################################

AI industrialization: Cutting AlphaFold training time from 11 days to 67 hours:
…First you make the new thing, then others refine it…
One common hallmark of industrialization is process refinement – first you build a thing, like a new type of engine, then you work out how to make it cheaper and easier to produce in a repeatable way. New research from the National University of Singapore, HPC-AI Technology Inc, Helixon, and Shanghai Jiao Tong University applies this to AlphaFold – specifically, the authors built FastFold, which reduces the time it takes to train the open source version of DeepMind’s AlphaFold from ~11 days to ~67 hours (it also gets a 7.5–9.5× speedup for inference on long sequences). This isn’t remarkable in itself, but it’s notable as a stand-in for what happens with pretty much every AI system that gets released – it comes out, then people make it way cheaper. “To the best of our knowledge, FastFold is the first performance optimization work for the training and inference of protein structure prediction models,” they write.

What they did: This paper is basically a kitchen sink of improvements based on a detailed study of the architecture of AlphaFold.

One caveat: This compares the official DeepMind AlphaFold implementation on 128 TPUv3 cores against FastFold on 512 A100s (with the further caveat that the aggregate compute differs: 20,738 GPU-hours versus 33,792 TPU-hours). The tl;dr: it’s likely a significant reduction in training time (and the code is available), though it’d be nice to see some third parties benchmark this further.
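For what it’s worth, here’s the arithmetic on those quoted numbers (my own calculation, not the paper’s analysis):

```python
# Speedup arithmetic from the numbers quoted above (my calculation,
# not the paper's own analysis).
alphafold_hours = 11 * 24   # official implementation: ~11 days on 128 TPUv3 cores
fastfold_hours = 67         # FastFold: ~67 hours on 512 A100 GPUs

print(round(alphafold_hours / fastfold_hours, 1))  # 3.9 -> ~3.9x wall-clock speedup

# Aggregate device-hours tell a more conservative story:
tpu_hours = 33_792          # official run
gpu_hours = 20_738          # FastFold run
print(round(tpu_hours / gpu_hours, 2))  # 1.63 -> ~1.63x fewer device-hours
```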

Why this matters: For AI to truly influence the world, AI models need to become reliable and repeatable to train – and, for those willing to spend on the hardware, fast to train as well. That’s what’s going on here.
  Read more: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours (arXiv).
  Get the code here: FastFold (GitHub).

####################################################

Cohere announces its latest language model – but doesn’t say much about it:
…’Extremely Large’ is, tautologically, Extremely Large…
Language-model-as-a-service startup Cohere has announced its newest model, ‘Extremely Large’. Extremely Large outperforms Cohere’s ‘Large’ model on tasks ranging from named entity recognition to common sense reasoning, though Cohere hasn’t said much about the model itself. Cohere recently announced a new fundraise (#285), and CEO Aidan Gomez told Fortune that “Getting into a ‘largest model’ battle isn’t productive”. It seems Cohere is living by its values here.

Why this matters: Like it or not, Cohere is in a competitive market, as it tries to sell access to its language model and out-compete rivals like AI21 Labs, OpenAI, CoreWeave, and others. It’ll be interesting to see if ‘Extremely Large’ makes a splash, and I’d be curious to see more benchmarks that evaluate its performance more broadly.
  Read more: Cohere launches Extremely Large (Beta) (Cohere blog).

####################################################

Google puts differential privacy into (prototype) production:
…Here’s one way the company can get ahead of regulators…
Federated learning is where you train a neural network model on a mixture of local devices (e.g. phones) and central devices (e.g. servers). Differential privacy (DP) is where you fuzz this data such that you can’t infer the original data, thus protecting user privacy. Google has just announced it has successfully smushed these two technologies together, having “deployed a production ML model using federated learning with a rigorous differential privacy guarantee.”

What they did: For their first proof-of-concept deployment, they used a DP-respecting algorithm called DP-FTRL “to train a recurrent neural network to power next-word-prediction for Spanish-language Gboard users.”

How they did it: “Each eligible device maintains a local training cache consisting of user keyboard input, and when participating computes an update to the model which makes it more likely to suggest the next word the user actually typed, based on what has been typed so far. We ran DP-FTRL on this data to train a recurrent neural network with ~1.3M parameters. Training ran for 2000 rounds over six days, with 6500 devices participating per round. To allow for the DP guarantee, devices participated in training at most once every 24 hours.”
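For intuition, here’s a minimal sketch of the clip-then-noise pattern at the heart of differentially private federated training. Note this is a simplification: Google’s deployment uses DP-FTRL, which adds noise that is correlated across rounds via tree aggregation, whereas this sketch adds independent Gaussian noise each round, and all the constants below are invented for illustration.

```python
import math
import random

# A minimal sketch of the clip-then-noise pattern behind differentially
# private federated training. NOTE: a simplification of DP-FTRL – the real
# algorithm adds noise that is *correlated* across rounds via tree
# aggregation; this sketch adds independent Gaussian noise per round.
# All constants here are invented for illustration.

CLIP_NORM = 1.0              # L2 bound enforced on each device's update
NOISE_STD = 0.5 * CLIP_NORM  # noise scale, set by the target DP guarantee

def clip_update(update, clip_norm=CLIP_NORM):
    """Scale a device's model update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [x * scale for x in update]

def dp_aggregate(device_updates, rng):
    """Sum clipped updates, add Gaussian noise, and average."""
    clipped = [clip_update(u) for u in device_updates]
    dim = len(clipped[0])
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    noisy = [t + rng.gauss(0.0, NOISE_STD) for t in total]
    return [x / len(device_updates) for x in noisy]

# One simulated round with 6,500 participating devices (the per-round
# cohort size Google reports) and a toy 4-parameter "model".
rng = random.Random(0)
updates = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6500)]
new_delta = dp_aggregate(updates, rng)
```

Because each device’s contribution is clipped before noise is added, no single user’s data can move the aggregate by more than a bounded amount – that bound is what makes a formal DP guarantee possible.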


Why this matters: In recent years, policymakers (particularly those in Europe) have started to write increasingly detailed recommendations about the need for tech companies to protect user privacy (e.g. GDPR). These regulations don’t align very well with how contemporary AI systems are developed and trained, given their dependency on vast amounts of user data. Techniques like a combination of federated learning and DP may let companies get ahead of the regulatory landscape – though it’s early days. “We are still far from being able to say this approach is possible (let alone practical) for most ML models or product applications,” Google writes. Consider this an intriguing proof of concept.
  Read more: Federated Learning with Formal Differential Privacy Guarantees (Google Blog).


####################################################

Humans: More robust against deepfakes than you feared:
…MIT study suggests we should be worried, but not panicking…
MIT researchers have conducted a 5,000+ person study to figure out how susceptible people are to deepfakes. The good news? If you show someone a faked video along with synthetic audio and text, there’s a reasonable chance they’ll guess that it’s fake. The bad news? People’s ability to identify deepfakes gets worse as you strip back modalities – so a silent video accompanied by a text transcript is hard, a silent video is harder, and just some text is hardest.

What they did: MIT recruited ~500 people to see how well they could identify deepfakes displayed on an MIT-created public website, and got more than 5,000 internet passers-by to take the same test. Then, it grouped the cohorts together, filtered them for the ones paying attention, and ultimately got 5,727 participants who provided 61,792 truth discernment judgments across a bunch of different videos of Trump and Biden saying things. The data for this experiment came from the Presidential Deepfake Dataset, which consists of 32 videos of Trump and Biden making political speeches – half the videos are real, and half are fake. MIT then perturbed the videos further, swapping out audio tracks, text, and so on.

What they found: “Participants rely more on how something is said – the audio-visual cues – rather than what is said – the speech content itself,” they write. “Political speeches that do not match public perceptions of politicians’ beliefs reduce participants’ reliance on visual cues.”
  Text is harder than video: “Across the 32 text transcripts, the least accurately identified one is identified correctly in 27% of trials, the most accurately identified one is identified correctly in 75% of trials, and the median accurately identified one is identified correctly in 45% of trials.”
  So are silent videos: Similarly, for silent videos without subtitles, the median accurately identified one is identified correctly in 63% of trials, and the range of accurate identification from the least to the most accurately identified is 38% to 87% of trials.

Why this matters: The more modalities you have, the better people do. “Ordinary people can sometimes, but not always, recognize visual inconsistencies created by the lip syncing deepfake manipulations. As such, the assessment of multimedia information involves both perceptual cues from video and audio and considerations about the content (e.g., the degree to which what is said matches participants’ expectations of what the speaker would say, which is known as the expectancy violation heuristic). With the message content alone, participants are only slightly better than random guessing at 57% accuracy on average.”

One fly in the ointment: There’s one problem that unites these things – AI keeps on getting better. My fear is that in two years, people will find it a lot more challenging to identify fake videos with audio. Therefore, we’ll need to rely on people’s inner-media-critic to help them figure out if something is real or fake, and the way the world is going, I’m not sure that’s a robust thing to rely on. 

  Read more: Human Detection of Political Deepfakes across Transcripts, Audio, and Video (arXiv).
  Check out the website used in the experiment: DeepFakes, Can You Spot Them? (MIT Website).


####################################################


Have some crazy ideas? Want money? Check out FTX’s new fund:
…Plans to deploy between $100m and $1 billion this year…
Crypto trading firm FTX has announced the FTX Future Fund (FFF). FFF is a philanthropic fund that will concentrate on “making grants and investments to ambitious projects in order to improve humanity’s long-term prospects”. The fund has also published some of its areas of interest, so people can have a sense of what to pitch it. It has a bunch of ideas but, this being Import AI, I’ll highlight the AI stuff.

What FTX is interested in giving grants on: AI alignment (specifically via “well-designed prizes for solving open problems in AI alignment”), AI-based cognitive aids, and bridging gaps in the AI and ethics ecosystem by studying “fairness and transparency in current ML systems alongside risks from misaligned superintelligence.”

Why this matters: It’s starting to feel like the development of a good AI ecosystem is less blocked on funding than on talent – initiatives like the FTX Future Fund show there’s ample money for projects in this area. Now, the question is finding the talent to absorb the money. Perhaps some of the readers of this newsletter can be that talent!
  Read more: Announcing the Future Fund (FTX).
  Find out more about the projects: Project Ideas (FTX).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

System Cards: an approach to improving how we report the capabilities and limitations of AI systems

…In building on Model Cards and Datasheets, System Cards take into account the surrounding software and AI components…

Researchers from Facebook (technically Meta AI Research, but I currently refuse to entertain this cynical hiding-from-controversy rebrand – Jack) have published a case study on ways to document Instagram feed-ranking via a concept they call System Cards. System Cards are designed to “increase the transparency of ML systems by providing stakeholders with an overview of different components of an ML system, how these components interact, and how different pieces of data and protected information are used by the system.” In this way, System Cards are philosophically similar to Model Cards (#174), datasheets for datasets, and ways to label reinforcement learning systems (#285).

System Cards: “A System Card provides an overview of several ML models that comprise an ML system, as well as details about these components, and a walkthrough with an example input.” System cards can be accompanied by step-by-step guides for how an input into a system leads to a certain output. 

How this is different: System Cards account for non-ML components of a system, and also describe the relationships between these systems (for instance, how data moves through a service). System cards are also meant to highlight upward and downward dependencies. They’re designed to be used by both technical and non-technical people.

Why it matters: System Cards contain a lot more information than other things like Model Cards and Datasheets, and they may make it easier for people to understand not only the system in question, but the larger technical context in which it is deployed and in which it has dependencies. If System Cards become more widely used, they could also generate valuable metadata for analyzing the field of deployed ML systems more broadly.

  Read more: System-Level Transparency of Machine Learning (Facebook AI Research).

####################################################

Tech tales:

Some things that were kind of holy

[Recollections of the 2025-2030 period]

The 21st century was a confusing time to be religious – the old gods were falling away as fewer people believed in them, and the new gods hadn’t been born. But we did get protogods: AI systems that could speak with beautiful and persuasive rhetoric to almost anyone. Over time, these AI systems got more and more personalized, until people could ask them very specific questions, and get very specific answers that only made sense in the context of that person. Once this capability came online, we had the flash-problem of the ‘micro religions’. All kinds of micro identities had been brewing for years, like a fungus that took root on early social platforms like MySpace and Tumblr and Facebook and Instagram and TikTok, and then blossomed from there. Now, all these people with micro identities – the space wiccans, the anarcho-primitivists, the neo-cath-libertarians, the tankie-double-agents – got their own religions. Gods for space witches. Demons for anarchist Neanderthals. The flaming faces of god spraying money at the neo-Catholics.
  This, predictably, caused problems. The greatest problem was when the religious wars started. These weren’t traditional wars – nation states still had a premium on violence, and micro-identities barely touched the physical world. But they were information wars. People repurposed AI systems to generate and magnify the outputs of their own gods, then pointed them at the shared social media platforms people used. Twitter conversations would get taken over by pseudo-identities preaching the need to return to a simpler time, and then they would be quote-tweeted into oblivion by the witches claiming that now was the time for ascendance. Screenshots of these quote tweets would get magnified on the more overtly religious social networks by the neo-Catholics and circulated as evidence that the great Satan was walking the earth. And these conversations would then be recycled back into Twitter and commented on by the anti-Pascal’s-wager atheist identities, which would trigger another cycle of religious preaching, and so on.
    The synthetic-theology accords were passed soon after.

Things that inspired this story: How the more one becomes an island, the more one creates a demon and an angel for that specific island; the need for humans to have beliefs; the commodification of belief into a symbol of identity; social networks as a hybrid of organic social needs and capitalist attention-harvesting; generative AI models like GPT3 and the logical consequences of their successors; watching Raised by Wolves and thinking about Future Christianity.