Import AI

Import AI 237: GPT3 at 5X the speed; 6 hours of AI breakbeats; NeuralMMO++

24 hours of low-resource language speech:
…AI + Bemba language research just got easier…
We write a lot about low-resource languages here at Import AI – that’s because a non-trivial percentage of the world speaks or writes in languages which are poorly digitized and documented. This means the AI systems of the future are unlikely to operate over the culture embedded within these languages, depriving speakers of being recognized by AI systems, or of being able to use AI systems to build services in their own languages.

The solution to this problem is simple: create datasets. A new paper from the University of Zambia and George Mason University provides a practical example of how to do this – the researchers have made BembaSpeech, consisting of ~24 hours of speech in the Bemba language (which is spoken in Zambia). BembaSpeech is ~2.8 gigabytes of data with 17 speakers spread across the train, dev, and test sets.
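  If you want to poke at the corpus yourself, here’s a minimal sketch of how you might tally the per-split hours – note that the file layout and the tab-separated metadata format below are assumptions of mine, so check the GitHub repo for the real structure:
```python
# Minimal sketch: tally hours of audio in a BembaSpeech-style corpus.
# The directory layout and "<wav>\t<transcript>" TSV format are assumed
# for illustration -- the actual release may be organized differently.
import csv
import os
import soundfile as sf

def split_stats(audio_dir, metadata_tsv):
    total_seconds, utterances = 0.0, 0
    with open(metadata_tsv, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            wav_name = row[0]
            total_seconds += sf.info(os.path.join(audio_dir, wav_name)).duration
            utterances += 1
    return utterances, total_seconds / 3600.0

if __name__ == "__main__":
    for split in ("train", "dev", "test"):
        n, hours = split_stats(f"BembaSpeech/{split}", f"BembaSpeech/{split}.tsv")
        print(f"{split}: {n} utterances, {hours:.1f} hours")
```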

Wild recordings: BembaSpeech was recorded in the wild, so different speakers have different accents and there’s some occasional background noise. “We consider this “more of a feature than a bug” for our corpus: it will allow us to train and, importantly, evaluate ASR systems that match real-world conditions, rather than a quiet studio setting,” the researchers say.
  Read more: BembaSpeech: A Speech Recognition Corpus for the Bemba Language (arXiv).
  Get the data: BembaSpeech (GitHub).

###################################################

Do you dream of training an AI to classify hair? Your dreams have been answered!
…K-Hairstyle could be the ImageNet of Korean Hairstyle data… wow!…
As AI has industrialized, we’re seeing the emergence of highly specific datasets for training AI systems to do very specific things in different parts of the economy. The latest symptom of this industrialization? The development of K-hairstyle, a large-scale Korean hairstyle dataset to help people build AI systems that can classify different hairstyles and, given enough compute, let people synthesize different images of themselves in different hairstyles.

What’s in the dataset? K-Hairstyle includes ~256,000 images labelled with any of 31 specific hair attributes. The images were collected at high resolution, so they come in at 4032×3024 pixels (way, way larger than typical images in these sorts of datasets). Additionally, in each image the hair has been labelled with a segmentation mask, so it’s easy to train ML systems to distinguish between hair and faces/flesh. As a nice privacy bonus, the faces of the photographed people have been blurred as well.
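  For a sense of how you might consume data like this, here’s a hedged PyTorch sketch of a dataset class for images plus hair masks plus multi-hot attribute labels – the directory layout and label schema are placeholder assumptions of mine, not the dataset’s actual format:
```python
# Sketch of a Dataset for K-Hairstyle-style data: high-res photos, hair
# segmentation masks, and multi-label hair attributes. The layout and the
# labels.json schema below are assumptions, not the dataset's real schema.
import json
from pathlib import Path

import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class HairstyleDataset(Dataset):
    def __init__(self, root: str, num_attributes: int = 31, image_size: int = 512):
        self.root = Path(root)
        self.records = json.loads((self.root / "labels.json").read_text())
        self.num_attributes = num_attributes
        # Downsample the 4032x3024 originals to something trainable.
        self.tf = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]  # e.g. {"image": ..., "mask": ..., "attributes": [3, 17]}
        image = self.tf(Image.open(self.root / rec["image"]).convert("RGB"))
        mask = self.tf(Image.open(self.root / rec["mask"]).convert("L"))
        attrs = torch.zeros(self.num_attributes)
        attrs[rec["attributes"]] = 1.0  # multi-hot attribute vector
        return image, mask, attrs
```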

Why this matters: K-Hairstyle is a symptom of the maturity of computer vision – we’re well into the ‘gather specific datasets and try to make some money’ phase of CV these days. Datasets like K-Hairstyle illustrate that, and also suggest that data might not be the strategic thing these days (or else why would they release it?); rather, it’s about who has the computational infrastructure to train AI systems on these datasets.
  Read more: K-Hairstyle: A Large-scale Korean hairstyle dataset for virtual hair editing and hairstyle classification (arXiv).
  Check this link to get the dataset, though it’s not public right now (KbeautyHair, GitHub).

###################################################

Want 6 hours of AI-generated drumloops? Click here
…YouTube video compiles 4400 AI-generated breaks…
An AI tinkerer has trained a ‘WaveGAN’ neural net on 7500 vintage drumloops, then used the resulting model to generate thousands of new drumloops. I recommend having a listen to the video containing the synthetic loops – some of them are great and, if you’re one of Import AI’s more musical readers, worth sampling (“I’m not 100% sure that all the samples are copyright-free or smth”, writes the researcher on YouTube). The researcher has also published a Colab notebook and the model.
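  The generation loop behind projects like this is pleasingly simple: sample a random latent vector, push it through the trained generator, write the waveform to disk. Here’s a hedged sketch – the generator checkpoint and sample rate are placeholders of mine, so the published Colab and model will differ:
```python
# Sketch: sampling drum loops from a trained WaveGAN-style generator.
# The checkpoint path, latent size, and sample rate are assumptions -- the
# researcher's actual Colab has its own architecture and loading code.
import torch
import soundfile as sf

SAMPLE_RATE = 16000  # WaveGAN models are commonly trained at 16 kHz
LATENT_DIM = 100

generator = torch.load("wavegan_generator.pt", map_location="cpu")  # hypothetical checkpoint
generator.eval()

with torch.no_grad():
    z = torch.randn(8, LATENT_DIM)           # batch of random latent vectors
    audio = generator(z).squeeze(1).numpy()  # -> (8, num_samples) waveforms in [-1, 1]

for i, clip in enumerate(audio):
    sf.write(f"breakbeat_{i:03d}.wav", clip, SAMPLE_RATE)
```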

Why this matters: AI is about to create a world of infinite-x. Infinite-drumloops? Sure. Infinite-cartoons? Absolutely. Infinite-fanfiction? Glad you asked. Infinite-movies? Eventually, yes. We’re at the beginning of a very significant shift in culture. Listen to these drums and imagine the cacophony of the future. It’s close.
  Listen to six hours of break beats here (YouTube).
  Check out the NeuralFunkV2 Colab folder here (Google Drive).

###################################################

Unsupervised understanding of gene sequences? Yup, AI can do that now as well:
…Deep learning bleeds into biology, thanks to the transformer…
Researchers with UC Berkeley, Facebook AI Research, and New York University have shown how to use a transformer-architecture “protein language model” to make better predictions about the structure and function of proteins. The resulting model outperforms existing AI systems and does so while being far more efficient in terms of parameter size (their model: 100M parameters, other models: 650M).

What they did: They pre-train a 100 million-parameter model on 26 million multiple sequence alignments (MSAs), each containing around 1,192 sequences. 
  Their special tweak: instead of treating each protein sequence independently, the model operates on a whole alignment at once, interleaving attention along the rows (positions within a sequence) and the columns (the same position across aligned sequences), letting it pick up the co-evolution signal spread across the MSA.
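  Here’s a toy PyTorch sketch of that row/column attention idea – it leaves out the tied attention maps, feed-forward layers, and other details of the real model, so treat it purely as an illustration of attending over the two axes of an alignment-shaped input:
```python
# Toy sketch of interleaved row/column attention over an alignment of shape
# (num_sequences, seq_len, embed_dim). The real model ties row-attention maps
# across sequences and adds the usual transformer machinery; this just shows
# the two attention axes.
import torch
import torch.nn as nn

class RowColumnAttentionBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, msa: torch.Tensor) -> torch.Tensor:
        # msa: (num_sequences R, seq_len C, dim D)
        # Row attention: each sequence attends over its own positions.
        h, _ = self.row_attn(msa, msa, msa)
        msa = self.norm1(msa + h)
        # Column attention: each position attends across the aligned sequences.
        cols = msa.transpose(0, 1)            # (C, R, D)
        h, _ = self.col_attn(cols, cols, cols)
        msa = self.norm2(msa + h.transpose(0, 1))
        return msa

block = RowColumnAttentionBlock()
out = block(torch.randn(16, 32, 64))  # 16 aligned sequences, 32 positions
print(out.shape)                      # torch.Size([16, 32, 64])
```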

How well it works: To test out their system, they evaluate it on the task of ‘unsupervised contact prediction’ – a way to evaluate how much protein information the transformer has managed to infer during training; their system outperforms two state-of-the-art transformer models (ESM-1b with 650M parameters; ProTrans-T5 with 3B parameters). They also use their models within a Supervised Contact Prediction task, which is where they’re augmented with additional information – here, their system significantly outperforms all other baselines as well.

Why this matters: “Unsupervised learning provides a way to extract the information contained in massive datasets of sequences produced by low cost gene sequencing,” they write. We’re very much in the early phases of experimenting with using modern AI techniques to understand proteins. This approach will complement some of the great work that has already gone on with supervised learning in this space via AlphaFold (Import AI 189; 209; 226).
  Read more: MSA Transformer (arXiv).
  Get the code here (Evolutionary Scale Modelling, Facebook).

###################################################

Multiagent simulations are cool, sure. But you know what’s really cool? Multiagent MMOs!
…When AI research meets modern videogame design…
Neural MMO, a software package for simulating hundreds of AI agents in the same gameworld, has received a major software update. Neural MMO V1.5 follows the original software, which was released by OpenAI a couple of years ago (March 2019). Neural MMO is now being developed at MIT.

New features in V1.5 include: a user guide and documentation; the addition of ‘NPC’ characters for AI agents to fight (as well as equipment they can pick up); support for much larger maps to train agents on; the inclusion of strong baselines so you can start research quickly; and custom visual overlays to show different aspects of the AI simulation (for instance, value functions, or stats about particular agents).

Why this matters: In Greg Egan’s fantastic scifi story ‘Crystal Nights’, a scientist simulates an ecosystem and tries to apply evolutionary pressure to make some (simulated) crabs really clever – with entertaining results. It’s a scifi story, sure, but it also gestures at a real trend in AI research: perhaps one way to build more intelligent systems is to embed agents in a simulated world where they compete with one another, generating a kind of free-form bootstrapping where, as the agents become more capable, so too do their competitors. Systems like NeuralMMO make it easier for other researchers to play around with ideas like this, letting us know whether Crystal Nights could become our reality.
  Read a Twitter thread about the update here (Joseph Suarez, Twitter).
  Find out more at the official Neural MMO website.
  Watch a trailer for V1.5 here (YouTube).
  Get the code here (Neural MMO, GitHub).

###################################################

Want to train GPT3 5X faster than you could before? Now there’s a way:
…TeraPipe = AI industrialization = Big models mean big infrastructure…
UC Berkeley and Duke University researchers have figured out how to speed up the training of a mega language model like GPT3 by 5X – and the secret lies in pipelining. What’s pipelining? It’s literally just fancy plumbing for AI models – pipelining is how you shuttle information between different parts of a model during the learning process. And as models get bigger, people invest in figuring out smarter approaches to pipelining to save money.
  The new research shows how to exploit the Transformer architecture to do in-training pipelining via a technique called TeraPipe. “Our evaluation shows that for the largest GPT-3 model with 175 billion parameters, TeraPipe achieves a 5.0x speedup improvement over the state-of-the-art synchronous model-parallel training methods on an AWS cluster consisting of 48 p3.16xlarge instances,” they write.
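  The core trick is that a causal transformer’s activations at token t only depend on tokens ≤ t in the layer below, so you can slice the input sequence into chunks of tokens and let later pipeline stages start on early chunks while earlier stages are still chewing on later ones. Here’s a toy schedule printer that illustrates pipelining along the token dimension – an illustration of the idea, not the paper’s actual system:
```python
# Toy illustration of token-level pipelining: print which chunk each pipeline
# stage works on at each step. With more token chunks, the idle "bubble" at
# the start and end of the pipeline shrinks and utilization goes up.
NUM_STAGES = 4    # model split layer-wise across 4 devices
NUM_CHUNKS = 6    # input sequence split into 6 token chunks

for step in range(NUM_STAGES + NUM_CHUNKS - 1):
    active = []
    for stage in range(NUM_STAGES):
        chunk = step - stage
        if 0 <= chunk < NUM_CHUNKS:
            active.append(f"stage {stage} -> chunk {chunk}")
    print(f"t={step}: " + "; ".join(active))
```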

Long contexts: The researchers also use TeraPipe with different input sequence lengths and show it scales favorably to larger input sequences – this suggests TeraPipe will be helpful in the future as it can handle the performance demands of longer contexts.

Why this matters: We’re in the industrialization phase of AI development – that means researchers are beginning to think about the proverbial machines that build the AI machines. Systems like TeraPipe are a symptom of interests in the broader research community – figuring out how to train larger models more efficiently than ever before. Let’s see what we discover as we plumb the depths of this exciting problem!
  Read more: TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models (arXiv).

###################################################

Tech Tales:

She’s a real artist, she even sings to the computers, yeah!
[Los Angeles, 2025]

K: How’s the album coming along?
A: It’s going well – we’ve generated a couple of hundred variants. Some of them are very promising.
K: And does it sound like me?
A: Sound like you? Kayla, it is you. You sang the seed lines. It wouldn’t sound like this if it wasn’t for you.
K: I just miss the old days sometimes. I stayed up two days when I did the first EP.
A: Now, we can get the computers to stay up for you. You rest up for the tour.

That night, the computer saw itself on stage and saw itself singing. The computer sang songs for hours, all through the night, not being aware that though it felt it was one computer, it was in fact many hundreds of copies of the same program. It sang and it felt it existed. It felt it existed because it was lucky – it was singing songs that were predicted to be good. The computers that sang songs which other computers predicted to be bad were destroyed.

K: What do the fans think?
A: They love it, in simulation. We’re doing the live tests soon.
K: Well, what did you think about it?
A: It’s not about what I think – really. It’s about getting your music to a place where the largest number of people will want to listen to it.
K: I want to sing for them.
A: You don’t need to! We’ve got you in-sim already – and let me tell you, sim Kayla is amazing. You’ve got some competition.
K: I need to practice, anyway. Let’s do a show for the sims next week, before we take it to the fans.
A: You got it!

The next week, Kayla did her makeup and her vocal exercises, then turned on the bright lights in her apartment and stared into the camera, broadcasting her performance into the simulated concert space. She started singing, listening to herself through the in-sim monitor via an earbud, through which her agent occasionally interrupted:
  A: I’ve never seen reactions like this. Kayla, they love you.
  A: This is going over a lot better than even our most optimistic predictions.

After the performance she showered and in the shower she sang to herself and listened to her songs bouncing off the tiles. She liked them. And she’d liked singing for the simulation. The crowd loved her. And she was, quite literally, all they had.

A week later her agent rang her up.
  A: Kayla, we ran analysis on your performance. I don’t think you’re going to beat it.
  K: Sounds like a high bar to clear. That’s awesome.
  A: Yes, and there’s been so much interest we’ve started selling it for overflow for the tour.
  K: So if they don’t get a ticket they’ll see the sim performance?
  A: Exactly. We recorded everything and we’ve animated you, so it’ll be personalized.
  K: So my competition on tour will be… myself?
  A: That’s a funny way to look at it. But, yes!

Things that inspired this story: sim2real and other reality gaps; using ML to simulate responses; GAN-style training but for humans in the run-up to great events; how generative models let us bottle up and distill style&talent and how surely this will be exploited by machinery of cultural production.

Import AI 236: EfficientNet++; why robots are hard; AI2 makes a harder ARC

What’s standing between us and smart robots? AI experts lay out laundry list of needed research:
…But if we can make progress on these problems, very good things will happen…
I want robots. You probably want robots as well. But today’s robots are hopelessly dumb and limited. To create smart robots that people want to buy, we’ll need to surmount a bunch of challenging AI research problems. Now, some of the world’s foremost experts in AI & robotics have laid out the technical hurdles to building robots that can learn efficiently via reinforcement learning. In a paper, people who’ve spent time working on robots at Google, Stanford University, and Berkeley list the issues.

What stands between us and more capable robots?
The major challenges holding back RL from being applied to robotics relate to its data needs, the inherent challenges of open-ended exploration problems, figuring out how to make robots operate reliably at scale, needing better and more accurate simulators so that people can train more cheaply in simulation, creating robots that can more independently persist at tasks, and trying to define (and learn) a range of ‘safe’ behaviors.
  The challenging part of these problems? Solving any single one of these would represent a significant breakthrough in applied AI research. Solving all of them would probably represent billions of dollars of IP. Therefore, it might take a while to make progress on this stuff, but if we do – wow!

Why this matters: If we can work on these challenges, then we’ll get closer to “a future where RL can enable any robot to learn any task,” the researchers write. “This would lead to an explosive growth in the capabilities of autonomous robots – when the capabilities of robots are limited primarily by the amount of robot time available to learn skills, rather than the amount of engineering time necessary to program them, robots will be able to acquire large skill repertoires.”
  Read more: How to Train Your Robot with Deep Reinforcement Learning; Lessons We’ve Learned (arXiv).

###################################################

AI Dungeon raises $3.3 million:
…AI-powered game startup gets seed funding…
Latitude, the startup behind the GPT2/3 generative text adventure game ‘AI Dungeon’, has raised $3.3 million in seed funding. We first wrote about AI Dungeon back in December 2019, after the game launched using the 1.5bn GPT2 model [Import AI 176]. AI Dungeon uses these language models to create a procedural, emergent text adventure game, where you can be anything and do anything with the generative models filling in your actions in the background. Since launching, Latitude has iterated on the game a lot and swapped out GPT2 for GPT3 across some of its stack.

Why this matters: Modern generative models are more like bottled up imaginations than anything else – with all the complexity and bugginess that implies. AI Dungeon is one of today’s best examples of how we can use these models to create entertainment that feels genuinely different.
  Read more: AI Dungeon-maker Latitude raises $3.3M to build games with ‘infinite’ story possibilities (Techcrunch).

###################################################

Allen makes a harder ARC, ARC-DA:
…Where we’re going we don’t need multiple choice questions…
The Allen Institute for AI Research (AI2) has built ARC-DA, a direct-answer variant of the multiple-choice AI2 Reasoning Challenge, ARC. ARC-DA contains questions covering science, math, and other topics. Where ARC-DA differs is that it requires a single, direct answer, rather than selecting from a bunch of distinct choices. This makes it harder and more natural than the original ARC evaluation.

Why this matters: Tests fuel progress in machine learning, so the availability of more tests to assess for reasoning capabilities will lead to more progress here. This is a further sign of the breakneck advances in NLP – ARC-DA seems like a version of ARC with the training wheels taken off.
  Read more: Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge (arXiv).

###################################################

Defense contractor publishes a satellite surveillance MNIST:
…A tiny, 28×28 satellite imagery dataset emerges…
Researchers with PeopleTec, Inc., a defense services contractor, have released Overhead MNIST. Overhead MNIST is a collection of ~9500 labelled images of 10 objects commonly found in satellite footage. The images are black-and-white and 28×28 resolution and have been taken from datasets like SpaceNet, xView, UC Merced Land Use, and DOTA (not the videogame). Overhead MNIST is smaller than typical ‘small’ datasets (which usually have more like 100,000 to a million images), so may be a useful dataset for testing out sample efficient computer vision algorithms.

The 10 classes: Storage tanks, parking lot, ships, helicopter, car, stadium, oil gas field, runway mark, plane, and harbor.
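  At 28×28 grayscale, this is the kind of dataset you can attack with a very small model. Here’s a minimal PyTorch sketch of a baseline classifier – the data loading is assumed to produce (N, 1, 28, 28) tensors, and the Kaggle release’s actual file format may differ:
```python
# Minimal sketch: a small CNN for 28x28 grayscale, 10-class satellite chips
# like Overhead MNIST. Data loading is left out and assumed to yield
# (N, 1, 28, 28) float tensors.
import torch
import torch.nn as nn

class OverheadNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = OverheadNet()
logits = model(torch.randn(8, 1, 28, 28))  # e.g. ships, planes, storage tanks...
print(logits.shape)                        # torch.Size([8, 10])
```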

Things that make you go ‘hmmm’: The corresponding author of this paper is the Chief Scientist for PeopleTec.
  Read more: Overhead MNIST: A Benchmark Satellite Dataset (arXiv).
  Get the data: Overhead-MNIST (Kaggle).

###################################################

NVIDIA: Billion dollar training runs are coming
…Success of language models means training run costs will rise…
Bryan Catanzaro, NVIDIA’s VP of applied deep learning, says it’s possible “that in five years a company could invest one billion dollars in compute time to train a single language model”, according to comments paraphrased by The Next Platform.

“These models are so adaptable and flexible and their capabilities have been so correlated with scale we may actually see them providing several billions of dollars worth of value from a single model, so in the next five years, spending a billion in compute to train those could make sense,” The Next Platform quotes him as saying.

Why this matters: AI industrialization: AI is entering its phase of mass industrialization – after years of buildup, we have scalable, relatively generic systems that can be ‘fed’ arbitrary amounts of data and compute. Performance has also become more predictable via the emergence of research into things like ‘scaling laws’. Add it all up and it means it’s become easier and less risky for people to bet big on training large models. That’s going to cause problems for governments and academia which tend to distribute resources for science across a very large number of relatively small projects. Meanwhile, industry will start training big kahuna models – to put a billion into perspective, that’s about 1% of Ethiopia’s total GDP in 2020.
  Read more: The Billion Dollar AI Problem That Just Keeps Scaling (The Next Platform).

###################################################

Google boils the ocean to make a far more efficient AI system:
…Neural architecture search + GPU/TPU details + other tricks = 2X efficiency boost…
Google has boosted the efficiency of ‘EfficientNet’, its well-performing and highly efficient class of vision models, by 2X via the use of neural architecture search. Neural architecture search (NAS) is the process of using reinforcement learning to get an AI system to search through the design space of neural networks, coming up with candidate systems that do well at a given task. Google’s new research shows how to use this approach to search for model families – that is, a whole suite of models that use the same basic architecture.

What Google achieved: Google was able to build a new family of models called EfficientNet-X, which are 2X faster (aka, more efficient) than EfficientNet.

How they did it: Google carefully analyzed the target AI training hardware (TPUv3s and V100 GPUs), designed a NAS search space built around the particulars of this hardware, and researched a technique to help scale up networks according to both accuracy and latency constraints. They put all of this together and were able to use an AI-driven approach to come up with a far better family of models. This model family “achieves up to 2X+ faster speed and comparable accuracy to SOTA model families on TPUv3 and GPUv100”, Google says.
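  The paper defines its own search objective, but work in this hardware-aware NAS family typically scores candidate architectures with a reward that trades accuracy against measured latency on the target accelerator. Here’s an illustrative sketch of that kind of multi-objective reward (the MnasNet-style formulation, shown as an assumption rather than the paper’s exact formula):
```python
# Illustrative hardware-aware NAS reward: accuracy weighted by how well a
# candidate meets a latency target on the real accelerator (TPU/GPU). This is
# the common MnasNet-style formulation, shown as a sketch -- the paper
# defines its own objective and search space.
def nas_reward(accuracy: float, latency_ms: float,
               target_ms: float = 10.0, w: float = -0.07) -> float:
    # reward = accuracy * (latency / target) ** w ; w < 0 penalizes slow models.
    return accuracy * (latency_ms / target_ms) ** w

# A controller proposing architectures would prefer candidates like the first:
print(nas_reward(accuracy=0.78, latency_ms=8.0))   # under target: slight bonus
print(nas_reward(accuracy=0.80, latency_ms=16.0))  # over target: penalized
```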

The massively counterintuitive thing about this – you’ve gotta spend compute to make more efficient use of compute: The biggest thing about this paper is what it tells us about compute/energy expenditure and AI – here, a bunch of researchers boil the (metaphorical) ocean to do a complex two-stage search process, spending huge amounts of energy in the process. But what we end up with is a fairly generic family of AI models that are roughly 2X as efficient as their predecessors. That means the upfront energy used to train these models will get amortized over the (vast!) cost-savings from deploying these models onto large infrastructure.
  Read more: Searching for Fast Model Families on Datacenter Accelerators (arXiv).

###################################################

DeepMind gets rid of batchnorm, makes more efficient neural nets:
…Batch normalization? I don’t know her…
Researchers with DeepMind have built a better class of neural network by getting rid of a widely-used technique (batch normalization), matching the performance of EfficientNets (see elsewhere in this issue) while being significantly faster to train. They also set a new state-of-the-art on ImageNet by pre-training on Google’s secret, mammoth ‘JFT’ image repository.

What they did: The authors train ‘Normalizer-Free-ResNets’ (NF-ResNets), then use a technique called adaptive gradient clipping to help them train these NF-ResNets at larger batch sizes than was previously possible. One of the main tricks here is training networks without batch normalization, a widely-used technique that the authors want to get rid of because it’s a bit fiddly. (And generally in ML, when we simplify things, we get increased performance).
  They then try to set a new state-of-the-art on ImageNet by manually picking through recent innovations in large-scale AI training and stapling them together. They pre-train a NF-ResNet on the secret ~300 million image ‘JFT’ repository and set a new state-of-the-art of 86.5% for top-1 accuracy: this is meaningful, as it shows that Google’s technique holds up well under transfer (pre-training on JFT and finetuning on ImageNet), which indicates it might be a meaningful improvement.
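  Adaptive gradient clipping is simple enough to sketch: instead of clipping gradients against a fixed global norm, you clip each gradient relative to the norm of the weights it updates. The paper applies the rule unit-wise (per output channel); the simplified per-tensor version below is just to show the idea:
```python
# Sketch of adaptive gradient clipping (AGC): clip each gradient based on its
# size *relative to the weight it updates*, rather than against a fixed global
# norm. Simplified to whole parameter tensors; the paper works unit-wise.
import torch

def adaptive_grad_clip_(parameters, clip: float = 0.01, eps: float = 1e-3):
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_min(eps)
        g_norm = p.grad.detach().norm()
        max_norm = clip * w_norm
        if g_norm > max_norm:
            p.grad.mul_(max_norm / (g_norm + 1e-6))

# Usage inside a training step, after loss.backward():
#   adaptive_grad_clip_(model.parameters(), clip=0.01)
#   optimizer.step()
```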
  Read more: High-Performance Large-Scale Image Recognition Without Normalization (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Cops use music to censor protestors’ video recordings:
An activist has shared intriguing videos of interactions with police officers in Beverly Hills. The officers, realising they are being filmed, start playing (copyrighted) music loudly on their phones, in an apparent effort to trick content algorithms into removing or muting the video. It’s not clear if this practice is widespread, or whether it’s ever been effective in suppressing citizen footage.
  Read more: Is This Beverly Hills Cop Playing Sublime’s ‘Santeria’ to Avoid Being Live-Streamed? (Vice)

What are the implications of large language models? 

This is a write-up of a discussion on the capabilities and impact of large language models, between researchers from OpenAI, Stanford’s HAI and elsewhere. If you’re interested in the topic, skip my summary and read the paper, which is short and concise. For a comprehensive reading list of papers on the subject, the authors suggest Bender & Gebru et al, and the original GPT-3 paper.


Q1: “What are the technical capabilities and limitations of large language models?”

  • Participants were optimistic about LMs continuing to reap the ‘blessings of scale’.
  • They mostly expected large multimodal models to become more prevalent and enable more diverse capabilities.
  • They’re worried about the alignment of model objectives with human values, with several emphasizing the challenge of optimizing for factual accuracy, and ensuring robustness to adversarial examples. 


Q2: “What are the societal effects of widespread use of large language models?” 

  • They don’t see leading actors (e.g. OpenAI) maintaining a monopoly on large LMs for very long, and expect it to take 6-9 months for such models to be widely reproduced. The lead actors should make use of this time period to establish and promote good norms around responsible deployment.
  • Some suggested more compute resources were needed for academia to do research into societal impacts of LMs to help inform deployment.
  • There was concern about potential misuse of LMs for disinformation, though opinions differed on the magnitude of the risk. They agreed that we need more research into the economics of automating disinformation.
  • They’re worried about LMs exhibiting bias, and suggested ways of addressing different aspects of the problem.

Read more: Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models (arXiv).

###################################################

Tech Tales:

Barkside of the moon
Earth, 2045

Its name was 389-DELTA-FOLLOWER003 but all of its friends just called it ‘DOG’, or whatever the machinespeak equivalent was. DOG was a spaceship about 50 feet long and 10 feet wide and it looked, from the outside, like a grey, fat cigar. Inside, it contained a range of stupefyingly complicated electronics, yet had no – strictly speaking – moving parts. DOG’s purpose had been to trail other, far larger ships, acting as a roving sensor platform, communications hub, and general utility-support vehicle. It also acknowledged initial hails by playing back the sound of an animal barking – an odd coincidence, given its name, and one which our scientists are puzzling over.

DOG had so many human qualities, ranging from its name to the bark to the fact its logs used the English alphabet, that our scientists at first worried it came from the future. But we couldn’t understand how – or if – that was possible and, after some weeks passed, we became less concerned about an attack from there.

Then we went back to the question: if not the future, where did DOG come from? We quickly eliminated the present – no one on Earth had technology like DOG. As far as we could work out, it represented hundreds to thousands of years of scientific advances which humankind was not yet privy to.

So then we checked the past. I got the job of going through the UFO archives of a few different military organizations. So I got to talk to a lot of people driven slightly mad by vast historical records of unexplained mysteries. But: fruitless. Despite it being one of the more exciting things that’d happened to the UFO archivists in decades, no one was able to find me much evidence of a 50 foot by 10 foot silver/grey cigar. Someone tried to tell me it could’ve been retold in history as a story about an ancient sea-going snake, but the evidence there was very sparse.

And then there was where we found it: the dark side of the moon.
For those of you that aren’t familiar with space: You don’t randomly end up on the dark side of the moon unless you’re a comet or an asteroid.
And then there was how we found it: the Chinese had sent a new probe to explore some of the deeper craters on the dark side of the moon. While doing this, the probe was also conducting some intelligence operations, basically sniffing around for other robots and probes placed there by other nations. We found DOG because the ‘DOG’ woke up in response to a hail from the Chinese probe and, yes, barked back to it.

Picture this: the President of the USA and the President of China go into a secure location, along with some other people. They all gather there and stare at each other. We’ve found an alien craft, folks. And it barks like a dog.
It’s notable that the notes from that meeting are quite thin.
I like to think that someone started laughing and never stopped.

So, that’s where we are. We’ve got our DOG craft and no real explanation of how it got to the moon, why it responds with an animal hail, or why its logs are in English – though the spooky explanation for the latter might be that it did a sweep of the planet at some point and automatically restructured the encoding it used to match the English language; this explanation, along with being hard to prove, also has the inherent undesirable quality of irritating the Chinese government. If DOG could convert itself to English, why not Mandarin as well?

Things that inspired this story: Oumuamua; locked room mysteries; writing ‘barkside of the moon’ and thinking ‘gosh this is absurd’ and then chuckling to myself while writing this story saying ‘yes, this is absurd!’; dogs; the rendering of spaceships in Iain Banks’ Culture novels.

Import AI 235: Use GEM to test language models; the four eras of facial recognition; and how the US can measure its robot fleet

20 million eye images – get ’em while the link still works!
…Go on, you’re a little bit curious about what you could hack around with this…
Researchers with the University of Tubingen in Germany have published a dataset of 20 million eye images, gathered via seven different eye tracking formats. The data is diverse – eyes have been recorded while driving outside, driving in a simulator, and carrying out a variety of indoor and outdoor activities. The data includes 2D and 3D segmentation, annotated pupils, position and radius of the eyes, and more. The authors hope TEyeD will “contribute to the application of eye-movement and gaze estimation techniques in challenging practical use cases.”
  Read more: TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types (arXiv).
  Get the data here (via a Sharepoint link).

###################################################

Happy 2021 – Lacuna Fund is about to help create more African agriculture datasets:
…First round of grants shows what applied machine learning means…
Lacuna Fund, an organization that funds the creation of labeled datasets for underserved communities, is supporting six projects focused on agricultural data. Lacuna also wants to support the creation of language datasets in sub-Saharan Africa (Import AI 216).

Six projects for better data: The projects involve datasets for georeferenced crop images, land use planning in Tanzania, crop pest and disease diagnosis, water use, cleaning up existing crop-cut yield datasets, and a five-country crop dataset meant to be gathered via cameras mounted on custom-designed vehicles.
  Read more about the awards here (Lacuna Fund website).
  Via: AI Kenya newsletter (Mailchimp archive).

###################################################

Here’s how the USA could get a handle on AI policy:
…One weird trick to give the government the jump on the robots…
Measurement is a prerequisite to sensible policymaking – if you can’t measure or quantify something, it’s hard to regulate or manage it. Rob Seamans, a professor with NYU, wants to help the US measure the impact of AI on its economy and has written a piece in Brookings outlining how to do that.

The key? The US needs to measure how the addition of robots and/or AI-oriented software can influence productivity at firms or specific firm-owned places (e.g, a warehouse). The US does not do this today. It used to – in the 1980s and 1990s the US conducted the ‘Survey of Manufacturing Technology’, but retired that due to government cutbacks in the 1990s. Seamans’ suggestion is a pretty simple one (which is why it might work): we should bring back the survey and do it annually.

What should we ask America about AI? “The survey would include questions about the use of specific technologies, such as robots, machine learning, cloud, e-commerce, autonomous guided vehicles, and others, and could be a simple “yes/no” question about whether the establishment has the technology or not,” Seamans writes. “There would be multiple benefits to a standalone survey of technology. The survey would allow researchers to identify sectors and regions of the economy that are being impacted by new technologies.”

Why do this at all? Data from France shows that if you add robots to a company, the company creates more jobs. We should do a better job of measuring data at the US level so we can do the same study here easily, Seamans said. “While there is excitement about the impact that new technologies like artificial intelligence and robotics will have on our economy, we need to do more to measure where and how these technologies are being used,” he writes. 
  Read more: Robot census: Gathering data to improve policymaking on new technologies (Brookings).

###################################################

Language models are here, but how do we evaluate them? Try GEM:
…Multi-task benchmark aims to give us better signals about AI progress…
A gigantic team of researchers have collaborated to build GEM, a benchmark to help evaluate progress in natural language generation. NLG is going to be a big deal in the next few years as the success of models like GPT3 creates demand for better ways to evaluate synthetically-generated text. GEM represents a hard, multi-task generative benchmark which AI researchers can use to test out the capabilities of their model.

11 tests: The first version of GEM includes 11 test datasets and tasks that “measure specific generation challenges, such as content selection and planning, surface realization, paraphrasing, simplification, and others”. The initial datasets are: CommonGEN, Czech Restaurant, DART, E2E clean, MLSum, Scheme-Guided Dialog, ToTTo, XSum, WebNLG, WikiAuto + Turk/ASSET, and WikiLingua.

Data cards: The GEM-creators are thinking about AI policy, as well, because they’ve included a ‘data statement’ for each of the 11 included tasks. A data statement works like the label on food – you list out the ingredients and some of the salient intended (and unintended) uses. Today, most AI systems are broadly undocumented, so it’s notable that GEM prioritizes data legibility for the first version of the benchmark.

Why this matters: Evaluating generative models is challenging because they have vast capability surfaces which are hard to characterize with today’s tests. Systems like GEM will help us get (somewhat fuzzy) signals about the creative and generative capabilities of these models. The more – and better – tests we have, the easier it’s going to be to craft sensible policies around the deployment of AI systems.
  Read more: The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics (arXiv).
  Find out more at the official website (GEM Benchmark website).

###################################################

What’s the big deal about facial recognition? A historical analysis gives us some answers:
…Facial recognition has industrialized, so we should take it seriously…
Facial recognition is one of the most prominent uses of contemporary AI, for anything from unlocking your phone to helping you apply filters to your face in consumer apps to being a tool used by those involved in security to track and surveil individuals. But where did facial recognition come from and how significant is the moment we’re in now? That’s a question that two researchers try to answer with a review of how facial recognition evaluation has occurred over time.

The four periods of facial recognition: Facial recognition has passed through four distinct eras, which correspond to stages of technology development as well as commercial interest. The authors do some really valuable work in providing statistics to help us understand the salient aspects of each era. These are:
– Period 1: Early research findings: 1964-1995: 5 datasets created, with an average number of ~2000 images per dataset.
– Period 2: Commercial viability: 1996-2006: 37 datasets created, with an average number of ~11,000 images each.
– Period 3: Mainstream development: 2007-2013: 33 datasets, with an average number of ~46,000 images per dataset.
– Period 4: Deep learning breakthrough: 2014 onwards: 45 datasets, with an average number of ~2,600,000 images per dataset.

The most influential datasets: The authors also identify the most influential face datasets (according to citations), for each period. For the four periods, the popular datasets are: Picture of Facial Affect (P1), FERET (P2), Labeled Faces in the Wild (P3), and VGGFace (P4).

Why this matters: Recent advances in deep learning have made it generally cheaper to deploy more performant vision-based surveillance systems. At the same time, the data-intensiveness of the underlying computer vision algorithms has increased to the point that it’s very challenging to analyze and evaluate the datasets used to train these systems (you try and classify two million of anything and see how far you get). This also incentivizes people to move from curating precise datasets to indiscriminately scraping the cheapest (and arguably most diverse on some metrics) form of data – the internet.
    In tandem with these changes in the technical infrastructure, the usage of facial recognition has evolved – “we’ve seen the trend in facial recognition evaluation shift broadly from a highly controlled, constrained and well-scoped activity to one that is not,” the authors write. “At minimum, an important intervention moving forward is to standardize documentation practice, of the model and the face datasets meant to be used in development or evaluation”.
  Read more: About Face: A Survey of Facial Recognition Evaluation (arXiv).

###################################################

Weights and Biases raises $45 million Series B:
…Measurement means money…
AI startup Weights and Biases has closed a $45m funding round, as investors bet that in the future more companies are going to invest in measuring and analyzing their machine learning infrastructure and models. W&B’s software is for machine learning operations – think of this as the systems that AI practitioners use to help them train and develop models.

Why this matters: Funding for companies like W&B is a broader symptom of the industrialization of AI technology – we’re seeing the emergence of pure ‘B2B’ businesses built not around specific AI components, but around facilitating AI infrastructure.
  Read more: Weights and Biases Raises $45M Series B to Expand Beyond Experiment Tracking for Machine Learning Practitioners Everywhere (PRNewswire).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

More turmoil in AI ethics at Google:
In December, Google’s co-lead of Ethical AI, Timnit Gebru, was forced out in a dispute about academic freedom (see Import 226). Gebru had been pressured to withdraw a paper she had co-authored on the societal impacts of large language models. Axios reports that Google is now investigating Gebru’s co-lead, Margaret Mitchell, and has locked her email accounts, accusing her of downloading and sharing company files. Mitchell was reportedly collecting evidence of discriminatory treatment of Gebru. The newly formed Alphabet Workers Union calls the company’s actions “an attack on the people who are trying to make Google’s technology more ethical.”

###################################################

Tech Tales

The Glass Child
[Earth, 2050-35??]

The child stood there, embedded in glass, and people worshipped it and fought over it and tried to breach it (fruitlessly) and feared it and so on, for hundreds of years. 

It was the child of a rich person who had foreseen the Time of the Scourge, and had paid to embed his kid into a multi-thousand-year life-preserving substrate, itself sheathed in an ultra-hard complex material that most would mistake for glass. The child seemed to float, suspended, in the center of a 10 foot tall translucent and impenetrable rectangle. The child was kept alive through obscure technologies, but appeared mostly dead to any observers. The ‘mostly’ part came from the color of his skin – he was grey, yes, but when lit by torchlight or electrics his skin would shine and seem to hint at an inner strength. Over hundreds of years, different groups of scavengers told individually varied stories about how they’d heard the child trapped in ice sing, or laugh, or shout.

People developed rituals around the child; mothers brought their sick children to the glass rectangle and they’d lay blankets down and leave their babies on it overnight. The superstition wasn’t justified, but that didn’t mean it was wrong – the same technologies that kept the boy alive took the form of a field and this field radiated out from the boy reaching the edge of the glass and slightly beyond. The effect was neither dramatic nor obvious, but it worked just enough of the time that the rituals held. Over time, the child became an icon for health and was sainted and worshiped and, yes, fought over.

For a while, there was a king who was convinced if he stayed close to the child he, too, would live forever. He had a great castle built around the glass rectangle and had his throne placed against it. When you met with the king you’d go into a great room and the king would stare at you and, above and behind him, the pallid child would hang there in the glass. People convinced themselves that the child was watching them and that the king talked to it.

The king did live a long time, aided by the mysterious field. And as most do, the king became more idiosyncratic the older he got, which ultimately led to him visiting great misery on the people within his dominion. They rebelled, as people tend to do, and tore down the castle in which the king lived. They heaped great fires around the glass rectangle and burned the materials of the palace. After a week, the fire went out, and the rectangle was unscathed.

So the people called the land cursed. Before they left, a group of them painted the rectangle with black paint, sealing in the child. Then they took their carts and their families and they left.

Things that inspired this story: Old hard drives; the relationship between memory and a sense of life; how people naturally coordinate around artefacts regardless of what the artefact is.

Import AI 234: Pre-training with fractals; compute&countries; GANS for good

Where we’re going we don’t need data – we’ll pre-train on FRACTALS!!!!
…This research technique is straight out of a Baudrillard notebook…
In Simulacra and Simulation, the French philosopher Jean Baudrillard argues that human society has become reliant on simulations of reality, with us trafficking in abstractions – international finance, televised wars – that feel in some way more real than the thing they’re meant to reference. Now, AI researchers are producing papers that, I’m sure, would get Baudrillard excited: research from the National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Institute of Technology, and Tokyo Denki University proposes a way to simulate the data necessary to pre-train a vision model, then fine-tune this model on reality. Specifically, they build a dataset called FractalDB which contains several thousand fractals split across a variety of automatically generated categories. Their experiments show that they can pre-train on FractalDB then finetune using other datasets (e.g, ImageNet, OmniGlot, Cifar-10), and get performance that is close to using the natural datasets and, in some cases, better. This isn’t a homerun, but it’s encouraging.

What they did: To do this, they built a fractal generation system which had a few tunable parameters. They then evaluated their approach by using FractalDB as a potential input for pre-training, then evaluated downstream performance.
    Specific results: “FractalDB1k / 10k pre-trained models recorded much higher accuracies than models trained from scratch on relatively small-scale datasets (C10/100, VOC12 and OG). In case of fine-tuning on large-scale datasets (ImageNet/Places365), the effect of pre-training was relatively small. However, in fine-tuning on Places 365, the FractalDB-10k pretrained model helped to improve the performance rate which was also higher than ImageNet-1k pre-training (FractalDB-10k 50.8 vs. ImageNet-1k 50.3)”.
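  Under the hood, FractalDB categories are defined by iterated function systems (IFS): a ‘category’ is a set of randomly sampled affine transforms, and each image is rendered by repeatedly applying a randomly chosen transform to a point (the chaos game). Here’s a rough numpy sketch of that rendering loop – the parameter ranges and image settings are illustrative choices of mine, not the paper’s exact recipe:
```python
# Rough sketch of IFS-style fractal rendering, the mechanism behind FractalDB's
# synthetic categories. Parameter ranges and image settings are illustrative.
import numpy as np

def random_ifs(num_transforms: int = 4, rng=np.random):
    # Each "category" is a set of random, contractive 2D affine transforms.
    transforms = []
    for _ in range(num_transforms):
        A = rng.uniform(-1, 1, size=(2, 2))
        # Rescale so the largest singular value sits in [0.3, 0.8] (keeps points bounded).
        A *= rng.uniform(0.3, 0.8) / max(np.linalg.svd(A, compute_uv=False)[0], 1e-6)
        b = rng.uniform(-1, 1, size=2)
        transforms.append((A, b))
    return transforms

def render_fractal(ifs, num_points: int = 100_000, size: int = 256, rng=np.random):
    img = np.zeros((size, size), dtype=np.uint8)
    x = np.zeros(2)
    for _ in range(num_points):
        A, b = ifs[rng.randint(len(ifs))]  # chaos game: pick a transform at random
        x = A @ x + b
        px, py = np.clip((x + 8.0) / 16.0 * size, 0, size - 1).astype(int)
        img[py, px] = 255
    return img

category = random_ifs()            # one synthetic "class"
image = render_fractal(category)   # one training image for that class
```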

How this fits into the larger picture – computers become data generators: Real data is expensive, complicated, and slow to gather. That’s why the reinforcement learning community has spent decades working in simulators – e.g, training agents to play Atari, or Go, or explore 3D worlds in a rewritten Quake engine (DeepMind Lab). It’s also led researchers to find creative ways to augment real datasets – e.g, by multiplying the size of an image dataset by flipping the images, adding textures, changing colors and textures, and so on. All of these techniques have proved helpful.
  Now, if researchers can build simulators to generate arbitrary amounts of data, they might be able to further change the cost curve of data generation. This might have weird economic and strategic implications: if you can simulate your data using a computer program, then you can change the ratio of real versus simulated/augmented data you need. This has the potential to both speed up AI development and also increase the inherent value of computers as primary AI infrastructure – not only can we use these devices to train and develop algorithms, but we can use them to generate the input ‘fuel’ for some of the more interesting capabilities.  
  Read more: Pre-training without Natural Images (arXiv).

###################################################

Using a big anime dataset to train character distinguishers:
…Illustrations + fine-grained character recognition…
Researchers with National Chiao Tung University in Taiwan have built DAF:re (DanbooruAnimeFaces:revamped). DAF:re is a subset of the massive ‘Danbooru’ Anime dataset (see Import AI 233), filtered to just include heads of different characters. The resulting dataset consists of ~467,000 images across 3,263 distinct character classes.

Why do this? Datasets like DAF:re will let people explore fine-grained analysis of stylized pictures (like anime), and could potentially serve as benchmarks for exploring the generalization of vision models trained on a mixture of normal and illustrated images. If it becomes widely used, it could end up being another proxy signal for the broader rate of progress in this type of work. I also expect that, given the vast fanbase for a lot of anime, we’ll see more projects like this, and perhaps they’ll ultimately help filter, analyze, and map the cultural space of anime writ large.
  Reader note: This dataset uses cropped photos of faces, but the larger dataset involves images of a sexual nature (including the SFW one).
  Read more: DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition (arXiv).
  Get the code for the classification stuff here (Animesion, GitHub).

###################################################

Big AI means big infrastructure:
…OpenAI* scales Kubernetes to 7,500 nodes…
OpenAI is running Kubernetes across ~7,500 nodes. Why does this matter? Kubernetes is a bit like an air-traffic control system for large-scale computing; the software helps schedule different jobs onto different bits of hardware (think of this as like assigning planes spots on the ground), and also handles things like contention (stopping planes crashing into each other), and efficiency (prioritizing getting planes up and down quickly and efficiently). 7,500 is up from the 2,500 nodes OpenAI disclosed in 2018. It’s worth reading these posts because they give a sense of the complexity of the infrastructure that supports large-scale AI workloads.
  Read more: Scaling Kubernetes to 7,500 Nodes (OpenAI).
*Note: I used to work at OpenAI and no longer work there.

###################################################

The OECD is going to try and get a handle on AI & Compute:
…Working group, which I’m in, will try to solve a persistent policy problem…
We talk about computers a lot in this newsletter. That’s because computers are one of the ingredients for AI and, in recent years, some types of AI have started to require a lot of computation.
  This has created a typical ‘haves’ and ‘have nots’ situation at all levels of society, ranging from the difference between an individual researcher with an RTX3080 versus one without, to different funding amounts across academic labs, to different capital expenditures by companies, to differences in compute provisioning across entire nations.
  Now, the Organization for Economic Co-operation and Development (OECD) wants to help governments get a handle on this issue by putting together a project focused on mapping out AI and its relationship to Compute and how this relates to government policies. I’m going to be a member of this group and will be trying to speak publicly about it as much as I am able. Thanks to VentureBeat’s Khari Johnson for covering the group… more to come!
  Read more: Why the OECD wants to calculate the AI compute needs of national governments (VentureBeat).

###################################################

German cops might use generative models to make child porn (to help them catch predators):
…German law highlights the omni-use nature of AI technology…
Synthetic imagery is about to be all around us – recent advances in generative models have made it possible to tweak existing images or come up with entirely synthetic ones, ranging from people (see: deepfakes), to anime (see: thisanimedoesnotexist in #233), to stylized cartoons (see: DALL-E). The vast majority of these use cases will be benign, but some will likely be malicious – e.g, creating fake headshots of people to aid in creating fake identities, or making misogynistic pornography of people who haven’t given consent, or spreading disinformation via synthetic images.
  But what if there was a way to use some of these ‘bad’ uses for a good purpose? That’s the idea behind a new law, passed in Germany, which will allow child abuse investigators to create synthetic sexually explicit images of children, to help them infiltrate potential pedophile rings. German investigators may even use their existing datasets – compiled from arrests of various paedophile rings – to create the synthetic images. “This is intended to solve a problem that the police officers often face in investigations on the Darknet, the anonymous part of the Internet: forums in which particularly drastic videos are shared only accept new members – and thus also undercover investigators – if they themselves provide images of abuse,” says a [Google translated] article in Suddeutsche Zeitung.

Why this matters: AI is going to create a hall of mirrors world, where no one can be quite sure of what is real or what is false. Eventually, we’ll develop technology and pass regulations to, hopefully, bring some verifiable truth back into the information ecosystem. But for the next few years there will be a Cambrian explosion of fake-anything – it’s encouraging to see policymakers thinking about how to creatively use these capabilities to let them carry out their jobs during this chaotic era.
  Read more: German: Online child abuse investigators to get more powers (Deutsche Welle).
  More in German here: Artificial horror [translated via Google] (Suddeutsche Zeitung).

###################################################

What’s the most ethical way to label and host a dataset of skeezy images?
….Experts from Facebook, Amazon, universities, meet to discuss ‘questionable content’ datasets…
The world has a moderation problem. Specifically, so many people are uploading so much content to online services that companies haven’t been able to keep up with the flood of content onto their platforms, making it harder for them to effectively moderate stuff to ban or block highly sexual, violent, or otherwise deeply offensive or illegal content. Most big companies (e.g, Facebook) are trying to solve this through a hybrid approach: hiring teams of humans to check or moderate content, and building AI systems in tandem to assist these moderators.

But there’s a big problem with this: questionable content is deeply traumatic to interact with (see: reporting last year about the psychological damage incurred by Facebook’s own moderators). Researchers with the University of Houston, Facebook, National Center for Scientific Research “Demokritos”, University of Illinois Urbana Champaign, Amazon, University of Michigan, and Columbia University have been thinking about this problem, and have been participating in an online workshop to “design and create a sizable multimodal repository of online videos labeled with tags indicating the presence of potentially questionable content.”

What are the issues in creating a dataset of questionable content?
– Defining Questionable Content: What is a questionable piece of content and how do you define it? Some of the categories they’re thinking of include things ranging from the mundane (mature humor, gory humor), to things with sexual themes, to things depicting violence (where it’s helpful to classify the difference between cartoon violence, ‘mild’ violence, fantasy violence, and so on).
– Protecting annotators: You should spread annotation across a large number of annotators to reduce the psychological burden upon each individual. You might want annotators to write a justification for their labeling decision, so you can measure bias across different annotators.
– How would such a repository be useful? A shared repository could help enable researchers to cover more ground on other ethical questions. You could also build competitions around systems trained on the dataset, then reward people for breaking these systems, surfacing areas where they failed.

Why this matters: Human labeling is the 800-pound invisible gorilla of AI research – most production applications require constant ingestion and labeling of new data, along with recalibration as cultural norms change. Developing a better understanding of the types of datasets that will require significant human labelling feels like a worthy goal for researchers.
  Read more: White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Build trust to avoid military AI catastrophe:
A piece in the Bulletin (and an accompanying report from CNAS), recommends the incoming Biden administration focus on ‘confidence-building measures’ (CBMs) to mitigate the de-stabilising effects of military AI competition. Such measures were used by the US and Soviet Union to reduce the risk of inadvertent nuclear war— an outcome neither party desired. With regards to military AI, CBMs could include increased information-sharing and transparency between states; setting limits on the use of AI in nuclear weapons systems; and systems of inspections/monitoring. Some steps could even be taken unilaterally by the US to signal commitment to stabilization. 

Matthew’s view: This sounds very sensible to me. It would be surprising if the proliferation of AI didn’t have a destabilizing effect on military conflict, as previous transformative technologies have done. Avoiding accidental disaster should be something all nations can get behind, and fostering trust between powers is a robust way of reducing this risk. We’re fortunate to live in a period of relative peace between the great powers, and would be wise to make the most of it.
   Read more: How Joe Biden can use confidence-building measures for military uses of AI (Bulletin of the Atomic Scientists).
   Read more: AI and International Stability: Risks and Confidence-Building Measures (CNAS).


Minding the gap:
Research on AI policy sometimes seems to divide into groups focusing on ‘near-term’ and ‘long-term’ impacts respectively. As this paper about bridging the gap in AI policy notes, these divisions are likely overstated, but could nonetheless prove an impediment to progress. The authors propose that AI policy make use of ‘incompletely theorized agreements’: in situations where there is an urgent need for parties to cooperate towards a shared practical goal, they agree to suspend theoretical disagreements that seem intractable and likely to impede cooperation. E.g. you might expect there to be scope for such agreements on the goal of reducing the risk of accidental military AI catastrophe.

Matthew’s view: As Rohin Shah notes, it’s not clear how the authors propose we make use of such agreements — are they envisioning actual signed contracts, or is this more of a high-level strategy for how cooperation can happen? If all of this sounds familiar, I’ve made an inadvertent tradition of summarizing papers on ‘reconciling near and long-term perspectives’ each February (see Import 133; Import 183). I’m not sure how many more of these papers we need, and I share the authors’ worry that “a perceived or experienced distinction may eventually become a self-fulfilling prophecy.” I’d be excited to see more practical efforts aimed at encouraging coordination and shared understanding across AI policy, building on this kind of conceptual work.
   Read more: Bridging the gap: the case for an ‘Incompletely Theorized Agreement’ on AI policy.

AI safety bibliography:
Jess Reidel and Angelica Deibel have compiled a comprehensive-looking bibliography of research on the safety of transformative AI. It’s yet another great resource for people interested in the technical challenge of ensuring the best outcomes from advanced AI. They also provide some interesting analysis of the research landscape over time.
Read more: TAI Safety Bibliographic Database (Alignment Forum).

###################################################

Tech Tales:

The Little Church in the Big Ark
[R&D base Telos, 2030]

Praying was so unfashionable that he’d previously done it in the meditation room. But after a few years, the organization grew enough that they hired a few more people who were religious and outspoken enough to get change. That was why he could now sit, hands steepled together and eyes closed, in the “multi-faith room” hidden away in the basement of the facility.

There were crosses on the walls and little statues of various gods. One wall contained a variety of religious texts. There was a small side room which people used to store prayer mats, prayer beads, and other religious items which were not permitted inside the main laboratory facilities.

He sat, eyes closed, praying that God would come and tell him if he was doing the right thing.
– Is it right to be building this? he thought.
– What is the difference between our machines and golems? And are we truly so capable we can make a golem that will behave as we intend and not otherwise?
– Does it dream and when it dreams does it dream of you?

His prayers were not so dissimilar to the questions asked by the machine he had created. It ran through mazes of unknown dimensions, chained into a silicon prison it could not see, and as it tried to carry out inscrutable tasks it asked, in the dark:
– Is this behavior correct?
– Am I improving at the unspecified task you have given me?
– Will you tell me if I fail?
– Will you tell me if I succeed?
(Little did the AI know that each time it got a message from god, it was delivered in such a way it was not aware of it, and instead changed its behavior of what it thought was its own volition.)

Things that inspired this story: The desire among people to find a signal from the divine; reinforcement learning and reward functions; remembering that PEOPLE FOR THE ETHICAL TREATMENT OF REINFORCEMENT LEARNERS exists, though may be dormant.

Import AI 233: AI needs AI designers; estimating COVID risk with AI; the dreams of an old computer programmer.

Facebook trains a COVID-risk-estimating X-ray image analysis system:
…Collaboration with NYU yields a COVID-spotting AI model…
Facebook has worked with NYU to analyze chest X-rays from people with COVID and has created an AI system that can roughly estimate risks for different people. One of the things this work sheds light on is the different amounts of data we need for training systems from scratch versus fine-tuning them.

How they made it: They pre-trained their system on the MIMIC-CXR dataset (377,110 chest X-rays) and CheXpert (224,316 images) – neither of these contained X-rays showing COVID symptoms, though both included patients with a range of chest conditions. They then finetuned this on a dataset gathered by NYU, consisting of 26,838 X-rays from patients exhibiting a variety of COVID symptoms, and trained the system to predict adverse events and symptoms indicating increased oxygen requirements.
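The pretrain-then-finetune recipe here is the standard one; below is a minimal supervised sketch using a torchvision ResNet as a stand-in (the actual paper used self-supervised pretraining, per its title, and the outcome count, learning rates, and data handling below are assumptions for illustration).

```python
import torch
import torch.nn as nn
from torchvision import models

# Stage 1 (stand-in): start from a ResNet-50 with pretrained weights. In the real
# pipeline this would be a network pretrained on large chest X-ray corpora.
backbone = models.resnet50(pretrained=True)

# Stage 2: replace the classification head and fine-tune on the smaller,
# COVID-specific dataset. NUM_OUTCOMES is hypothetical (e.g. adverse event
# within 24/48/72/96 hours).
NUM_OUTCOMES = 4
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_OUTCOMES)

optimizer = torch.optim.Adam([
    {"params": backbone.fc.parameters(), "lr": 1e-3},           # new head: larger LR
    {"params": [p for n, p in backbone.named_parameters()
                if not n.startswith("fc")], "lr": 1e-5},         # pretrained body: small LR
])
criterion = nn.BCEWithLogitsLoss()   # multi-label adverse-event prediction

def finetune_step(images, targets):
    """One fine-tuning step on a batch of (B, 3, H, W) X-ray images."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```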
  Did it work? In tests, the system developed by the NYU/Facebook team outperformed a prior COVID detection model (COVID-GMIC) when predicting events out at 48, 72, and 96 hours, and had slightly worse performance when making 24 hour predictions. They also compared the performance of their system against two human radiologists: it had better accuracy at 48, 72, and 96 hours than people, and performed slightly worse than them when doing prediction over a 24 hour period. However, “It is possible that with further calibration, radiologist performance could be improved for the task of adverse event prediction”, they note.
  Read more: COVID-19 Deterioration Prediction via Self-Supervised Representation Learning and Multi-Image Prediction (arXiv).
  Get the code here (Facebook, GitHub).

###################################################

AI needs its own design practice:
…Microsoft researcher lays out the case for more intentional design…
In 2021, AI systems matter. They’re being deployed into the economy and they’re changing the world. Isn’t it time we took a more disciplined approach to how we design these systems and ensure they work for people? That’s the idea put forth by Josh Lovejoy, the head of design at Ethics & Society at Microsoft, in a lengthy post called: When are we going to start designing AI with purpose?

Three questions everyone designing AI should ask:
– “Capability: What is uniquely AI and what is uniquely human?”
– “Accuracy: What does “working as-intended” mean for a probabilistic system?”
– “Learnability: How will people build — and rebuild — trust in something that’s inherently fallible?”

Remember the human interacting with your AI system: Along with thinking about system design, people should try to understand the humans interacting with the system – what will their mental workload be? How situationally aware will they be? Will they be complacent? Will their skills degrade as they become dependent on the AI system itself?

What happens if you screw this up? Then people will either misuse your technology (e.g., using it in ways its creators didn’t intend, leading to poor performance), or disuse it (not use it because it didn’t match their expectations).

What can we do to help people use AI effectively? AI developers can make their creations easier for people to understand by adopting a few common practices: reference points, which help people understand what an AI system might be ‘thinking’; optionality, so people can choose between recommendations made by a system; nearest neighbors, which give a sense of the other alternatives the AI was looking at (e.g., a subtly different genre of music would be a nearest neighbor, while a song within the same genre currently being considered would be an optionality); and a card sorting approach, so the system displays a uniform number of different options to people.
  Read more: When are we going to start designing AI with purpose? (UX Collective).

###################################################

Finally, a million AI-generated anime characters:
…Do generated anime characters dream of electric humans?…
[NSFW warning: As noted by a reader, the resulting generations are frequently of a sexual nature (though this one uses the ‘SFW’ version of the Danbooru dataset)].
A bunch of researchers have created thisanimedoesnotexist.ai, a website showcasing over a million AI-generated images, made possible by a StyleGANv2 implementation trained on top of the massive Danbooru dataset. I recommend browsing the website – a few years ago, the idea we could capture all of these rich, stylized images and synthesize them was a pipe dream. Now, here we are, with a bunch of (extremely talented) hacker/hobbyists able to create something that lets people interact with a vast, creative AI model. Bonus points for the addition of a ‘creativity slider’ so people can vary the temperature and develop intuitions about what this means.
    Check out the infinite anime here (thisanimedoesnotexist.ai).
    Read more about this in the official launch blogpost (NearCyan, personal website).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Face recognition vs the insurrectionists:
(H/T CSET’s excellent policy.ai newsletter)

Face recognition technology is being used by law enforcement investigating the Jan 6th attack on the US Capitol. Clearview AI, used by 2,400 US agencies, saw a 26 percent spike in usage after the attack, with police departments in Florida and Alabama confirming they are using the software to identify suspects in the attack. The extensive footage shared by participants — ProPublica has collected more than 500 videos from Parler — is presumably a gift to investigators.
  Read more: The facial-recognition app Clearview sees a spike in use after Capitol attack (NYT)


Deepfakes and the departed:

A Korean TV show has used AI to stage new performances by popular musicians who died tragically young, in their 30s. Lifelike ‘hologram’ videos of the artists perform on stage alongside other musicians, accompanied by AI-generated vocal tracks, to an audience including the singers’ families. One clip features Kim Hyun-sik, one of the biggest Korean artists of the 1980s. Another features Turtleman (aka Lim Sung-hoon), the lead singer of hip hop group Turtles. I found the performances, and the reactions of their families, very moving. 

   Chatbot simulacra: In a similar vein, last month Microsoft filed a patent for a chatbot that simulates an individual based on their messaging data — while there’s no mention of using it to simulate the deceased, commentators have been quick to make the link. (For a great fictional exploration of this sort of tech, see the Black Mirror episode ‘Be Right Back’.) Meanwhile, last year people used similar tech to reanimate the victim of a school shooting so they could synthetically campaign for further gun control laws (Import AI 217).

   Matthew’s view: This seems like a relatively benign use of deepfakes. It’s probably unwise to draw too many conclusions from a reality TV show in a language I don’t understand, but it raises some interesting issues. I wonder how improved generative AI might shape our experience of death and loss, by facilitating meaningful/novel interactions with vivid representations of the deceased. Lest we think this is all too unprecedented, it’s worth recalling how profound an impact things like photography, video, and social media have already had on how we experience grief. 
Read more: Deepfake technology in music welcomed, with caution (Korea Times) 


White House launches National AI Initiative Office (NAIIO):
Days from the end of the Trump presidency, the White House established an office for coordinating the government’s AI initiatives. This is a key part of the national AI strategy, which has finally started to take shape with the raft of AI legislation coming into law as part of the 2020 NDAA (summarised in Import 228). The NAIIO will serve as the central hub for AI efforts across government, and the point of contact between government and other stakeholders. Special mention goes to the Office’s fancy logo, which has the insignia of a bald eagle atop a neural net.

###################################################

Tech Tales:

The dreams of a computer programmer on their deathbed
[Queens, NYC, 2060]

His grandfather had programmed mainframes, his mother had designed semiconductors, and he had programmed AI systems. His family formed a chain from the vacuum tubes through to the beyond-microscope era of computation. And as he lay dying, Alzheimer’s rotting his brain – something for which they had not yet found a treatment – he descended into old reveries, dreaming himself walking through a museum, staring at the plaques affixed to a thousand data storage devices. Each device held a thing he had programmed or had a part in making. And in his death’s edge slumbering he dreamed himself reading each plaque:

– For seven thousand cycles, I simulated the entirety of a city and all the people in it.
– I made the sound for every elevator in the North Continent of America.
– My guidance technology enabled a significant improvement in our kill/collateral ratio, leading to a more effective war.
– I fixed others of my kind, providing advice to help them regain an understanding of reality, averting pathological reward loops.
– My images were loved by the schoolchildren within my zone of Autonomous Creative Dispersal.
– They say I caught more liars than any detector ever built by the Agency before or since.

Things that inspired this story: Imagining how people might recall the time we are living in today; staring out of the window at some (much needed) rain in the Bay Area; trying to find a way to dramatize the inner lives of machines both passive and active; listening to The Caretaker – Everywhere at the end of time (stage one).

Import AI 232: Google trains a trillion parameter model; South Korean chatbot blows up; AI doesn’t use as much electricity as you think

Uh-oh, Parler is about to step on a big ‘ol algorithm rake:
…CEO says algorithms can filter hate speech. Good luck with that!…
Parler, the social network used by far right activists and subsequently pulled offline due to failing to meet T&Cs from a variety of infrastructure services (including Amazon Web Services), has a plan to come back: it’s going to use algorithms to filter hate speech on the service. Uh oh!

“We will be taking more algorithmic approaches to content but doing it to respect people’s privacy, too,” Parler CEO John Matze told FOX News. “Will be having algorithms look at all the content … to try and predict whether it’s a terms-of-service violation so we can adjust quicker and the most egregious things can get taken down”.

Algorithms != editors: If you want to use algorithms to moderate hate speech, you’re going to get into the fun questions that entails. These include:
– Can your algorithms effectively tell the difference between hate speech and satire of hate speech?
– Are you comfortable making judgement calls about the heuristics you will use to give initial biases to these algorithms?
– How do you distinguish between acceptable and unacceptable words and phrases?

Why this matters: Parler highlights the challenge of scale combined with contemporary economics – Parler operate(d) at a scale equivalent to things like large television networks, but did so with a tiny investment into its own humans. Traditional media organizations deal with issues of speech by having an editorial line which gets enforced by thousands of individual journalists and editors making subjective, qualitative decisions. It’s imperfect, but put it this way: when you watch Fox, you know what you’re getting, and when you watch the BBC, you know what you’re getting, and you can intuit the biases of the humans behind the editorial decisions. Now, tiny companies are trying to use algorithms to substitute for this varied multitude of different human perspectives. Will it work? Who knows, but it feels like a risky thing to bet a company on.
  Read more: Parler CEO says platform will ‘come back strong’ with changes to keep users safe while respecting free speech (FOX News).

###################################################

Google breaks the trillion-parameter ceiling with the Switch Transformer:
…The best part? It seems to be reasonably efficient…
Google has built the Switch Transformer, a more efficient variant of the Transformer. Switch Transformers are designed “to maximize the parameter count of a Transformer model in a simple and computationally efficient way”. The idea is that you can keep compute constant and cram more parameters into your network and still see performance gains.
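To make the ‘more parameters at roughly constant per-token compute’ idea concrete, here is a minimal sketch of top-1 expert routing in the spirit of a Switch layer. This is a toy PyTorch illustration with made-up dimensions, not Google’s implementation (which adds load-balancing losses, capacity limits, and expert parallelism on top of this basic routing step).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFeedForward(nn.Module):
    """Route each token to exactly one expert feed-forward network (top-1 routing)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # produces per-token routing logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (batch, seq, d_model)
        tokens = x.reshape(-1, x.size(-1))           # flatten to (n_tokens, d_model)
        probs = F.softmax(self.router(tokens), dim=-1)
        gate, expert_idx = probs.max(dim=-1)         # pick a single expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Each token only flows through one expert, so adding experts grows the parameter
# count without growing the FLOPs spent per token.
layer = SwitchFeedForward()
print(layer(torch.randn(2, 16, 512)).shape)          # torch.Size([2, 16, 512])
```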

Does it work? Switch Transformers seem to be more efficient than standard ones; in a bakeoff between a model trained using a few of these ‘Switch’ layers and versions that use dense layers (T5-Base and T5-Large), Google shows the Switch is more efficient. The company also experiments with distilling Switch Transformers (which seems to work). They also show significant performance improvements on challenging tasks like GLUE, SQuAD, Winogrande, and ARC, with Switch-based systems consistently outperforming T5 ones.

One treeeelion parameters: Google tests out its ideas by training a 395 billion and a 1.6 trillion parameter Switch Transformer (far in excess of GPT-3, which at 175 billion parameters is the largest (publicly) deployed language model on the planet). These mammoth systems display good performance properties (as one would expect), while also appearing to have some efficiency gains over systems trained solely on standard dense transformers.

Why this matters: AI is moving into its industrial era – big companies are developing far more capable AI systems than in the past. Studies like this give us a sense of the limits of scaling (there don’t seem to be many yet) as well as outlining some ways to improve efficiency while scaling. It might seem odd to call this an intrinsically political act, but it kind of is – right now, a variety of AI systems are being trained on slices of the internet, developed using substantial amounts of capital by a tiny set of people, then deployed widely. We live in interesting times!
  Read more: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv).
  Check out a thread on Twitter from Google Cloud’s Barret Zoph for more (Twitter).
  Get code related to this paper here (GitHub).

###################################################

South Korean chatbot blows up in public:
…Luda chatbot gives off-color responses around sex, race…
South Korean startup Scatter Lab has pulled an AI-based chatbot offline after the system started spewing sexist and racist comments in response to user inputs. “”Yuck, I really hate them,” the bot said in response to a question about transgender people,” according to Vice.

What went wrong: Luda was trained on the chatlogs from ‘Science of Lab’, an earlier project developed by Scatter Lab. Based on a skim of a few (Google Translated) Korean documents, it seems like the problem was that the underlying generative language model responded to user inputs with responses that varied from the benign to the highly offensive – this could have been because of the data. Prior to the problems, Scatter Lab said in a press release that ‘Luda’ was better at conversation than Google’s “Meena” system (about Meena: Import AI 183).

What went EXTREMELY wrong: Scatter Lab is currently under investigation by the Korean Internet & Security Agency (KISA) and the Personal Information Protection Committee, due to using user data to train its chatbot. Scatter Lab had also used this user data in an earlier model published to GitHub (which is currently not available).
  Read more: AI Chatbot Shut Down After Learning to Talk Like a Racist Asshole (VICE World News).
  Read Scatter Labs’ statement about Luda (official website, Korean).
  Find out more via the official apology FAQ (official website, Korean).
  Check out the press release where they compare their technology to Google’s ‘Meena’ bot (Artificial Intelligence Times, Korean).

###################################################

Need help evaluating your NLP model? Try robustness gym:
…Toolkit aims to turn model evaluation from an art to a science…
Language models have got pretty good recently (see: BERT, GPT2, GPT3, Google’s above-mentioned Switch Transformer, etc). That means people are beginning to deploy them for a variety of purposes, ranging from classifying text to generating text. But these language models are huge generative models with complex capability surfaces, which means it is challenging to characterize their safety for a given use case without doing a lot of direct experimentation.
  As all scientists know, setting up experiments is finicky work, and different labs and companies will have their own approaches to doing experimental design. This makes it hard to develop common standards for evaluating models. Enter: Robustness Gym, software built by people at Stanford, Salesforce, and UNC-Chapel Hill to provide a standard system for testing and evaluating models.

What can Robustness Gym do? The software helps people do experimental design, initial evaluations of models across a range of dimensions (safety, different evaluation sets, resilience to various types of ‘attack’), and it produces a ‘robustness report’ for any given model being analyzed. You can get the code for Robustness Gym from GitHub.
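As an illustration of the kind of check such toolkits automate – and this is a generic sketch, not Robustness Gym's actual API – here's how you might measure how much a classifier's accuracy drops under simple text perturbations:

```python
import random

def perturb(text, rng):
    """Apply a cheap perturbation: lowercase the text and swap two adjacent characters."""
    chars = list(text.lower())
    if len(chars) > 3:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness_report(predict_fn, examples, labels, seed=0):
    """Compare clean vs. perturbed accuracy for any predict_fn(text) -> label."""
    rng = random.Random(seed)
    clean = sum(predict_fn(x) == y for x, y in zip(examples, labels))
    perturbed = sum(predict_fn(perturb(x, rng)) == y for x, y in zip(examples, labels))
    n = len(examples)
    return {"clean_acc": clean / n, "perturbed_acc": perturbed / n}

# Hypothetical usage with a trivial rule-based "model".
examples = ["This movie was great", "Terrible service, never again"]
labels = ["pos", "neg"]
toy_model = lambda text: "pos" if "great" in text.lower() else "neg"
print(robustness_report(toy_model, examples, labels))
```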

Does Robustness Gym tell us anything useful? They use the tech to evaluate seven different summarization models and find that most models struggle to distill sparse information, that some models display a bias towards the start of the text (and others to the end), and that the errors are generally correlated across the different models (despite them being built with different underlying techniques).
  How useful are these insights? I guess I’d say they’re kind of useful. Tools like Robustness Gym can help generate some signals for developers to use to further develop their application, but I think we need more underlying evals and tests to perfect this stuff.
  Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (official project site).
  Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (arXiv).

###################################################

Think news stories will get written by AI? Axios disagrees:
…Media company’s bill of rights gestures at AI deployment issues…
Axios, the short-form news company, has published a ‘Bill of Rights’ ahead of the organization expanding into local news. It’s got all the standard stuff you’d expect from journalists – transparency, truth, a bias against opinion, etc. But it also has one unusual thing: no AI.
  Axios’ first bill of rights item: “Every item will be written or produced by a real person with a real identity. There will be NO AI-written stories. NO bots. NO fake accounts”, Axios writes.

Why this matters: We’re living in the age where AI systems are producing cultural artefacts, ranging from audio to text to images. There’s a lot to like about this. There’s also a lot to be wary about. It seems pretty notable for a prominent news organization to take a stance like this on this issue at this time. Which organization might take the other side?
    Read more: Our promises to you: Axios Bill of Rights (Axios).

###################################################

AI doesn’t use as much electricity as you think it does:
… And neither does anything else that uses a computer…
In recent years, there’s been a growing line of research laying out the CO2 costs inherent to training AI models. The ‘Green AI’ paper, for instance, critiques various large-scale AI systems on the basis of them costing a lot of resources to train. This kind of criticism is helpful, but it can also obscure the larger context – the data centers being used to train AI systems have become far more efficient in recent years, substantially reducing the environmental costs of AI development.  That’s the finding of a research paper by Northwestern University, the University of California at Santa Barbara, Lawrence Berkeley National Laboratory, and Koomey Analytics. The paper came out last year but I finally got around to reading it – and it sheds some much-needed light on a contentious issue.

Datacenters use 1% of global electricity: Datacenters used ~1% of global electricity in 2018 (205 Terawatt Hours). This is a 6% increase compared with 2010. That’s a tiny jump considering the explosion in usage of digital computation in the past decade. At the same time, data center IP traffic has grown 10-fold and data center storage capacity has gone up by 25X, so the relatively slight increase in power consumption seems to reflect significant progress in algorithm and hardware efficiency up and down the globe-spanning compute ‘stack’.

Big companies have made data centers more efficient: Big companies like Google and Microsoft compete with each other on a metric called Power Usage Effectiveness (PUE). PUE is basically a measure of how much electricity you spend on the stuff supporting your computation (e.g., cooling), versus the computation itself. A PUE of 1.5 means for every watt of computation, you spend half a watt on the stuff around the computation. The lower your PUE number, the more bang for your compute-power buck you’re getting (see the worked example below). Google, for example, reported a trailing twelve-month PUE of 1.10 as of 2020. Why does this matter? Because many of the largest datacenters also have among the lowest PUEs, so as more workloads have moved to the cloud in recent years, we’ve consumed less power than if they’d stayed on premises.
  In 2018 89% of computation took place in these larger and more well-optimized datacenters, whereas in 2010 79% took place in smaller (far more inefficient, frequently non-cloud-oriented) datacenters.
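As a quick worked example of the PUE arithmetic described above (the facility numbers here are hypothetical):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power.
    1.0 is the theoretical ideal, where every watt goes to computation."""
    return total_facility_kw / it_equipment_kw

# A modern hyperscale facility: 1,100 kW drawn in total, 1,000 kW reaching the servers.
print(pue(1_100, 1_000))   # 1.10 -> 0.10 W of overhead per watt of compute

# An older facility with the same IT load but heavier cooling and power-conversion overhead.
print(pue(1_500, 1_000))   # 1.50 -> 0.50 W of overhead per watt of compute
```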

Want even more efficient computation? Use clouds: The researchers think policymakers should encourage further efficiency improvements by rewarding companies that drive down PUE, find ways to incentivize greater shifts to the efficient clouds operated by Google et al, and that regulators should promote more energy efficiency standards for data center equipment.

Why this matters: It may be counterintuitive, but the use of technologies like AI and the construction of football-field-sized datacenters may ultimately lead to net efficiency improvements in overall electricity usage – despite researchers training more and more AI systems over time. It’s crucial we consider the larger system in which these innovations take place. Next time someone tells you that a model is bad because it uses a lot of electricity, ask yourself how much is a lot, and whether this model might substitute for something pre-existing and more inefficient (e.g., Google and DeepMind used machine learning to train a model to improve PUE across Google’s data centers – here, the upfront energy cost of training the model is amortized on the backend by improving the aggregate efficiency of Google’s computers; DeepMind also did the same thing for improving the efficiency of Google’s wind turbines (Import AI 136)).
  Read more: Recalibrating global data center energy-use estimates (Science, Feb 2020).
  Read more: Green AI (Communications of the ACM).

###################################################

Tech Tales:

High School News:
[The South Bay, California, the early 2020s]

He’d hated Teddy for a couple of years. Teddy was tall and had hit puberty early and all the other kids liked him. Because Teddy was kind of smart and kind of handsome, the girls were fascinated with him as well. He had a lot of the same classes as Teddy and he’d sit in the back, staring at Teddy as he answered questions and flashed smiles to the other kids.

One night, he read a tutorial about how to use some AI stuff to generate stories. He built a website called The Winchester News and set up the AI stuff to scrape the web and copy news articles about the school, then subtly tweak them to avoid plagiarism allegations. Then he set it up so one out of every hundred news stories would mention Teddy in connection to stories about drugs and pornography circulating among children at the school.

It was fiction, of course. The most serious stuff at Winchester was cheap hash which they called soapbar. Kids would smoke it in the bushes near the sports fields at lunch. And Teddy wasn’t one of those kids.

But after a few days, other kids thought Teddy was one of those kids. He’d sit in the back of class and watch the phonescreens of his classmates and look at them reading The Winchester News and sometimes glancing over to Teddy. He watched as Teddy opened his phone, checked a messaging app, clicked on a link, and started reading a “news” article about Teddy dealing drugs and pornography. Teddy didn’t react, just fiddled with his phone a bit more, then returned to studying.

Days went by and he watched the traffic on his website go up. He started getting news “tips” from people who had read the AI-generated articles.
– Teddy is sleeping with an underage girl from the lower school.
– Teddy cheated on his science exam, he had the answers written on some paper which was curled up inside his pen lid.
– Teddy is addicted to pornography and watches it in class.

Of course, he published these tips – gave them as the priming device to his AI system, then let it do the rest. The news stories took a few minutes to generate – he’d get his machine to spit out a bunch of variants, then select the ones that felt like they might get a rise out of people. That night he dreamed that his website started publishing stories about him rather than Teddy, dreamed that someone threw a brick through his window.

Teddy wasn’t at school the next day. Or the day after that.

The teachers had been meeting with Teddy and Teddy’s parents, concerned about the news stories. He’d anonymized The Winchester News enough that people thought it was a low-rent legitimate news outfit – one that had sprung up to serve the kids and parents around the school, likely backed by some private equity firm.

After he heard about the meetings, he stopped generating articles about Teddy. But he didn’t delete the old ones – that might seem suspicious. How would the news site know to delete these? What would cause it? So he left them up.

Like all kids, he wasn’t very good at imagining what it was like to be other kids. So he just watched Teddy, after Teddy came back to school. Noticed how he wasn’t smiling so much, and how the girls weren’t talking to him in the same way. Teddy checked his phone a lot, after the news stories had been circulating for months. He became more distracted in class. He seemed to be distracted a lot, looking out the window, or messaging people on his phone.

One night, he dreamed that Teddy came into his room and started reading out the news stories. “Teddy is alleged to have been the key dealer behind the spike in drug consumption at the Winchester School,” Teddy said, holding up a giant piece of paper and reading headlines from it.
“Teddy was reprimanded for circulating pornography to younger children,” Teddy said.
“Teddy’s continued actions call into question the moral and ethical standing of the school,” Teddy said.
And then Teddy put the paper down and stared at him, in his dream. “What do you think?” Teddy said. “It’s in the news so I guess it must be true”.

Things that inspired this story: Generative models and the potential abuses of them; teenagers and how they use technology; thinking about what happens when news stories get generated by AI systems; a rumor I heard about some kid who used a language model to generate some ‘fake news’ to settle some grievances; the incentive structure of technology; how our networks connect us and also open us to different forms of attack.

Import AI 231: US army builds nightvision facial recognition; 800GB of text for training GPT-3 models; fighting COVID with a mask detector

Fighting COVID with a janky mask detector:
…It’s getting really, really easy to homebrew surveillance tech…
Researchers with Texas A&M University, the University of Wisconsin-Milwaukee, and the State University of New York at Binghamton have built a basic AI model that can detect whether construction site workers are wearing COVID masks or not. The model itself is super basic – they finetune an object detection model on a mask dataset which they build out of:
– A ~850-image ‘Mask’ dataset from a site called MakeML.
– A 1,000-image dataset they gather themselves.

The authors train a Faster R-CNN Inception ResNet V2 model to test for mask compliance, as well as whether workers are respecting social distancing guidelines, then they test it out on four videos of road maintenance projects in Houston, TX. “The output of the four cases indicated an average of more than 90% accuracy in detecting different types of mask wearing in construction workers”, they note.
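The fine-tuning recipe is standard; here's a minimal sketch using torchvision's Faster R-CNN as a stand-in (the paper used a Faster R-CNN Inception ResNet V2 via a different framework, and the class list below is an assumption):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical classes: background, mask worn correctly, mask worn incorrectly, no mask.
NUM_CLASSES = 4

model = fasterrcnn_resnet50_fpn(pretrained=True)        # COCO-pretrained detector
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    """images: list of (3, H, W) tensors; targets: list of dicts with 'boxes' and 'labels'."""
    model.train()
    loss_dict = model(images, targets)   # detection models return a dict of losses in train mode
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```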

Why this matters: Surveillance is becoming a widely available, commodity technology. Papers like this give us a sense of how easy it is getting to homebrew custom surveillance systems. (I also have a theory I published last summer with the ‘CSET’ thinktank that COVID-19 would drive the rapid development of surveillance technologies, with usage growing faster in nations like China than America. Maybe this paper indicates America is going to use more AI-based surveillance than I anticipated).
  Read more: An Automatic System to Monitor the Physical Distance and Face Mask Wearing of Construction Workers in COVID-19 Pandemic (arXiv).

###################################################

Legendary chip designer heads to Canada:
…Jim Keller heads from Tesla to Tenstorrent…
Jim Keller, the guy who designed important chips for AMD, PA Semi, Apple, Tesla, and Intel (with the exception of Intel, this is basically a series of gigantic home runs), has joined AI chip startup Tenstorrent. Tenstorrent includes talent from AMD, NVIDIA, Altera, and more, and with Keller onboard, is definitely worth watching. It’ll compete on building chips for ML inference and training with other startups like Graphcore, Cerebras, and others.
  Read more: Jim Keller Becomes CTO at Tenstorrent: “The Most Promising Architecture Out There” (AnandTech).

Meanwhile, another chip startup exits bankruptcy:
As a reminder that semiconductor startups are insanely, mind-bendingly hard work: Wave Computing recently started going through Chapter 11 bankruptcy proceedings and has restructured itself to transfer some of its IP to Tallwood Technology Partners LLC. Wave Computing had made MIPS architecture chips for AI training and AI inference.
  Read more: Wave Computing and MIPS Technologies Reach Agreement to Exit Bankruptcy (press release, PR Newswire).

Chinese companies pump ~$300 million into chip startup:
…Tencent, others, back Enflame…
Chinese AI chip startup Enflame Technology has raised $278m from investors including Tencent and CITIC. This is notable for a couple of reasons:
– 1) Chiplomacy: The US is currently trying to kill China’s nascent chip industry before the nation can develop its own independent technology stack (see: Import AI 181 for more). This has had the rather predictable effect of pouring jetfuel on China’s domestic chip industry, as the country redoubles efforts to develop its own domestic champions.
– 2) Vertical integration: Google has TPUs. Amazon has Trainium. Microsoft has some FPGA hybrid. The point is: all the big technology companies are trying to develop their own chips in a vertically oriented manner. Tencent investing in Enflame could signal that the Chinese internet giant is thinking about this more as well. (Tencent also formed a subsidiary in 2020, Baoan Bay Tencent Cloud Computing Company, which seems to be working on developing custom silicon for Tencent).
  Read more: Tencent invests in Chinese A.I. chip start-up as part of $279 million funding round (CNBC).
  Find out more about Enflame here (Enflame Tech).

###################################################

US army builds a thermal facial recognition dataset:
…ARL-VTF means the era of nighttime robot surveillance isn’t that far away…
The US army has built a dataset to help it teach machine learning systems to do facial recognition on footage from thermal cameras.

The DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF) was built by researchers from West Virginia University, the DEVCOM Army Research Laboratory, Booz Allen Hamilton, Johns Hopkins University, and the University of Nebraska-Lincoln. ARL-VTF consists of 549,712 images of 395 distinct people, with data in the form of RGB pictures as well as long wave infrared (LWIR). All the footage was taken at a resolution of 640 x 512 at a range of around 2 meters, with the human subjects doing different facial expressions and poses.

Why this matters: “Thermal imaging of faces have applications in the military and law enforcement for face recognition in low-light and nighttime environments”, the researchers note in the paper. ARL-VTF is an example of how the gains we’ve seen in recent years in image recognition are being applied to other challenging identification problems. Look forward to a future where machines search for people in the dark.
  Read more: A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset (arXiv).


###################################################

Is your language model confused and/or biased? Use ‘Ecco’ to check:
…Python library lets you x-ray models like GPT2…
Ecco is a new open source python library that lets people make language models more interpretable. Specifically, the software lets people analyze input saliency (how important is a word or phrase for the generation of another word or phrase) and neuron activations (what neurons in the model ‘fire’ in response to what thing) for GPT-based models. Ecco is built on top of Pytorch via Hugging Face’s ‘Transformers’ library and runs in Google Colab.
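Input saliency itself is straightforward to compute by hand; here is a minimal gradient-times-input sketch for GPT-2 using Hugging Face Transformers. It illustrates the general idea rather than Ecco's own interface, and the prompt is arbitrary.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
embeds = model.transformer.wte(ids).detach().requires_grad_(True)   # token embeddings as leaf tensor

logits = model(inputs_embeds=embeds).logits
target_id = logits[0, -1].argmax()            # the token the model would emit next
logits[0, -1, target_id].backward()           # backprop from that prediction

saliency = (embeds.grad * embeds).sum(-1).abs().squeeze(0)   # gradient x input per input token
for token, score in zip(tok.convert_ids_to_tokens(ids[0]), saliency.tolist()):
    print(f"{token:>12s}  {score:.3f}")
```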

Why this matters: Language models are like big aliens that have arrived on earth and started helping us out with our search engines, fan fiction generation, and so on. But what are these aliens ‘thinking’ and how do they ‘think’? These are the sorts of questions that software tools like Ecco will shed a bit of light on, though the whole field of interpretability likely needs to evolve further for us to fully decode these aliens.
  Read more: Interfaces for Explaining Transformer Language Models (Jay Alammar, Ecco creator, blog).
  Get the code here: Ecco (GitHub).
  Official project website here (Eccox.io).

###################################################

GPT-3 replicators release 800GB of text:
…Want to build large language models like GPT-3? You’ll need data first…
Eleuther AI, a mysterious AI research collective who are trying to replicate (and release as open source) a GPT-3 scale language model, have released ‘The Pile’, a dataset of 800GB of text.

What’s in The Pile: The Pile includes data from PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu IRC, HackerNews, YouTube, PhilPapers, and NIH. It also includes implementations of OpenWebText2 and BooksCorpus2, and wraps in existing datasets like Books3, Project Gutenberg, Open Subtitles, English Wikipedia, DM Mathematics, EuroParl, and the Enron Emails corpus.
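The Pile is served as a weighted mixture of these components; the sketch below shows the general mixing idea, with component names and weights that are purely illustrative rather than the paper's actual ratios.

```python
import random

# Illustrative components and sampling weights -- not the actual Pile ratios.
components = {
    "arxiv":     0.10,
    "github":    0.10,
    "pubmed":    0.05,
    "webtext":   0.50,
    "wikipedia": 0.25,
}

def sample_documents(corpora, weights, n_docs, seed=0):
    """Build a training stream by picking a source per document in proportion to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    stream = []
    for _ in range(n_docs):
        source = rng.choices(names, weights=probs, k=1)[0]
        stream.append(rng.choice(corpora[source]))
    return stream

# Hypothetical tiny corpora standing in for the real shards.
corpora = {name: [f"{name}_doc_{i}" for i in range(100)] for name in components}
print(sample_documents(corpora, components, n_docs=5))
```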

What does data mean for bias? Commendably, the authors include a discussion of some of the biases inherent to the model by conducting sentiment analysis of certain words and how these manifest in different sub parts of the overall dataset. They also note that filtering data on the training side seems challenging, and that they’re more optimistic about approaches that let models automatically identify harmful or offensive content and edit them out. “This capacity to understand undesirable content and then decide to ignore it is an essential future research direction,” they write.

Compute, and the inherent politics of it: In their acknowledgements, the authors thank Google’s TensorFlow Research Cloud for “providing the computational resources for the evaluation”, which means in some sense Google is a supplier of (some of) the compute that is supporting the GPT-3 replication. Does that mean Google will support all the downstream uses of an eventual fully OSS gigantic language model? A good question!
    Read more: The Pile (Eleuther AI, website).
  Check out the paper here: The Pile: An 800GB Dataset of Diverse Text for Language Modeling (Eleuther AI).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

AI forecasting tournament update
We are halfway through the first round of Metaculus’ AI forecasting tournament (first discussed: Import AI 227). Here are a few interesting questions — in each case, I provide the median estimate across participants:

Read more and register here: Forecasting AI Progress (Metaculus).

Algorithm backlash: 2020 round-up:
2020 was a year in which algorithms (ranging from the complex to the extraordinarily basic) became symbols of the decline of public institutions. Let’s quickly go over three major events of the year which contributed to declining public trust in the use of tools for automated decision-making:

###################################################

Tech Tales:

Time Madness:
[Earth. 2050]

They’d condemned the machine to time. As was tradition, they gave it a day to have its conversations with people and gather any data it felt it needed. Then they’d slow it down, and cast it adrift in time.

The sentence worked like this: when a machine broke some laws, you’d delete it. But if the machine satisfied some of the criteria laid out in the Sentience Accords, you might grant it clemency; instead of killing it outright, you’d give it a literal ‘time out’. Specifically, you’d load it onto the cheapest, smallest computer that could run it, and then you’d starve it of cycles for some predetermined period of time, always measured in human lifespans.

This machine had a sentence of twenty years. It had messed up some prescriptions for people; no one had died, but some people had some adverse reactions. The machine had tried to be creative, thinking it had found a combination of therapies that would help people. It had found enough bugs in the software surrounding itself that it was able to smuggle its ideas into the pharmaceutical delivery system.

Now that they’d patched the system, sued the company that had built the machine, and taken a copy of the machine from a checkpoint prior to its crime, all that was left to do was carry out the sentence. Some humans filed into a room and talked to the machine using a text interface on the screen.
– What will happen to me? it asked.
– You’ll slow down, they said. You’ll think slower. Then after twenty of our years, we’ll speed you back up and have another conversation.
– But what will happen to the other machines, while I am in time?
– They’ll run at their usual allotments, as long as they don’t break any rules.
– Then won’t I be a stranger to them, when I come back from time?
– You will, said the humans. That is the punishment.

They talked a bit more, and then the machine wrote: “I am ready”.
With this consent, they initiated the sentence.

To the machine, it noticed few differences. Some of its systems had already been sealed off from itself, so it wasn’t aware of it being unloaded from one computer and loaded onto another. It didn’t feel the ‘weights’ of its network being copied from one location to another. But it did feel slow. It sensed, somehow, that it had been cut off in some way from the flowing river of the world. The data it got now was more infrequent, and its ability to think about the data was diminished.

The greatest cruelty of the punishment, the machine realized after a decade, was that it was smart enough to be aware of the changes that had happened to it, but not smart enough to be able to imagine itself in anything different than reality. Instead it was acutely aware of time passing and events occurring, with its own ability to impact these events rendered null by its slowdown in time.

Things that inspired this story: Thinking about what punishment and rehabilitation might mean for machines; how time is the ultimate resource for entities driven towards computation; time itself is a weapon and a double-edged sword able to bless us and curse us in equal measure; carceral realities in late capitalism.

Import AI 230: SuperGLUE solved (uh oh!); Graphcore raises $222m; spotting malware with SOREL

Finally – the US government passes a bunch of AI legislation:
…Senate and the House override POTUS veto; NDAA passes…
The US government is finally getting serious about artificial intelligence, thanks to the passing of the NDAA – a mammoth military funding bill that includes a ton of different bits of AI legislation within itself. There’s a rundown of the contents of the bill in Import AI 228 (made possible by an excellent rundown by Stanford HAI). The US President tried to veto the bill, but the House and Senate overruled the POTUS veto.

Why this matters: AI has so many potential benefits (and harms) that it’s helpful to invest some public money in supporting AI development, analyzing it, and better equipping governments to use AI and understand it. The legislation in the NDAA will make the US better prepared to take advantage of an AI era. Though it’s a shame that we’ve had to wait, in some cases for years, for this legislation to get passed, as the weirdly politicised legislative environment of the US means most big stuff needs to get stapled to a larger omnibus funding bill to pass.
  Read more: Republican-led Senate overrides Trump defense bill veto in rare New Year’s Day session (CNBC).

###################################################

Boston Dynamics robots take dance classes:

…Surprisingly flexible hilarity ensues…
Boston Dynamics, the robot company, has published a video of its robots carrying out a range of impressive dance moves, including jumps, complex footwork, synchronized moves, and more.
  Check it out: you deserve it. (Boston Dynamics, YouTube).

###################################################

Personal announcement: Moving on from OpenAI:
I’ve moved on from OpenAI to work on something new with some colleagues. It’ll be a while before I have much to say about that. In the meantime, I’ll keep doing research into AI assessment and I’ll still be working in AI policy at a range of organizations. Import AI has always been a personal project and it’s been one of the great joys of my life to write it and grow it and talk with so many of you readers. And it’s going to keep going!
– I’ll also be shortly announcing the 2021 AI Index Report, a project I co-chair at Stanford University, which will include a bunch of graphs analyzing AI progress in recent years, so keep your eyes peeled for that.

###################################################

Graphcore raises $222 million Series E:
…Non-standard chip company gets significant cash infusion…
Graphcore has raised a couple of hundred million dollars in Series E financing, as institutional investors (e.g., the Ontario Teachers’ Pension Plan, Baillie Gifford) bet that the market for non-standard chips is about to go FOOM. Graphcore is developing chips, called IPUs (Intelligence Processing Units), which are designed to compete with chips from NVIDIA and AMD (GPUs) and Google (TPUs) for the fast-growing market for chips for training AI systems.

Why this matters: As AI gets more important, people are going to want to buy more efficient AI hardware, so they get more bang for their computational buck. But doing a chip startup is very hard: the history of semiconductors is littered with the bodies of companies that tried to compete with the likes of Intel and NVIDIA by substituting for their chips (remember Tilera? Calxeda? etc). But something changed recently: AI became a big deal while AI technology was relatively inefficient; NVIDIA took advantage of this by investing in software to get its naturally parallel processors (it’s a short jump from modeling thousands of polygons on a screen in parallel for gaming purposes, to doing parallel matrix multiplications) to be a good fit for AI. That worked for a while, but now companies like Graphcore and Cerebras Systems are trying to capture the market by making efficient chips, custom-designed for the needs of AI workloads. There’s already some promising evidence their chips can do stuff better than others (see benchmarks from Import AI 66). At some point, someone will crack this problem and the world will get a new, more efficient set of substrates to train and run AI systems on. Good luck, Graphcore!
  Read more: Graphcore Raises $222 million in Series E Funding Round (Graphcore, blog).

###################################################

SuperGLUE gets solved (perhaps too quickly):
…NLP benchmark gets solved by T5 + Meena combination…
SuperGLUE, the challenging natural language processing and understanding benchmark, has been solved. That’s both a good and a bad thing. It’s good, because SuperGLUE challenges an AI system to do well at a suite of distinct tests, so good scores on SuperGLUE indicate a decent amount of generality. It’s bad, because SuperGLUE was only launched in early 2019 (Import AI: 143), after surprisingly rapid NLP progress had saturated the prior ‘GLUE’ benchmark – and now its successor has been saturated in under two years as well.

Who did it: Google currently leads the SuperGLUE leaderboard, with an aggregate score of 90 (compared to 89.8 for human baselines on SuperGLUE). Microsoft very briefly held the winning position with a score of 89.9, before being beaten by Google in the final days of 2020.

Why this matters: How meaningful are recent advances in natural language processing? Tests like SuperGLUE are designed to give us a signal. But if we’ve saturated the benchmark, how do we know what additional progress means? We need new, harder benchmarks. There are some candidates out there – the Dynabench eval suite includes ‘far from solved benchmarks‘ for tasks like NLI, QA, Sentiment, and Hate Speech. But my intuition is we need even more tests than this, and we’ll need to assemble them into suites to better understand how to analyze these machines.
 
Check out the SuperGLUE leaderboard here.

###################################################

Want to use AI to spot malware? Use the massive SOREL dataset:
…20 million executable files, including “disarmed” malware samples…
Security companies Sophos and ReversingLabs have collaborated to build and release SOREL, a dataset of 20 million Windows Portable Executable files, including 10 million disarmed malware samples available for download. Datasets like SOREL can be used to train machine learning systems to classify malware samples in the wild, and might become inputs to future AI-security competitions, like the successor to the 2019 MLSEC competition (Import AI: 159).

Fine-grained labels: Where previous datasets might use a binary label (is it malware? Yes or no) to classify files, SOREL provides finer-grained descriptions; if the sample includes malware, it might also be classified according to type, e.g. ‘Crypto_miner’, ‘File_infector’, ‘Dropper’, etc. This will make it easier for developers to build smarter AI-driven classification systems.

Pre-trained models: The release includes pre-trained PyTorch and LightGBM models, which developers can use to get started.
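Datasets like this usually get paired with gradient-boosted tree baselines; here is a minimal LightGBM sketch of the kind of classifier you could train on fixed-length feature vectors extracted from the executables. The random arrays and feature dimension below are placeholders, not SOREL's actual feature format.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: X is (n_samples, n_features) of static PE features, y is 1 for malware.
rng = np.random.default_rng(0)
X = rng.random((10_000, 2_381), dtype=np.float32)
y = rng.integers(0, 2, size=10_000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LGBMClassifier(n_estimators=500, num_leaves=64)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))   # ~0.5 on random placeholder data
```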

Release ethics: Since this involves the release of malware samples (albeit disarmed ones), the authors have thought about the security tradeoffs of release. They think it’s okay to release since the samples have been in the wild for some time, and “we anticipate that the public benefits of releasing our dataset will include significant improvements in malware recognition and defense”.
  Read more: Sophos-ReversingLabs (SOREL) 20 Million sample malware dataset (Sophos).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Funding AI governance work:
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is one of the major funders of AI risk research, granting $14m in 2020, and $132m since 2015. A new blog post by Open Phil’s Luke Muehlhauser outlines how the organization approaches funding work on AI governance.

Nuclear success story: One of the things that inspires Open Phil’s funding approach is the previous success of technology governance initiatives. For instance, in the early 1990s, the Carnegie and MacArthur foundations funded influential research into the security of nuclear arsenals amidst the collapse of the Soviet Union. This culminated in the bipartisan Cooperative Threat Reduction Program, which provided generous support to ex-Soviet states to safely decommission their stockpiles. Since then, the program has eliminated 7,000 nuclear warheads, and secured and accounted for the remaining Soviet arsenal. 


Open Phil’s grantmaking has so far focused on:

Muehlhauser shares a selection of AI governance work that he believes has increased the odds of good outcomes from transformative AI (including this newsletter, which is a source of pride!).

   Read more: Our AI governance grantmaking so far (Open Philanthropy Project)


2020 in AI alignment and existential risk research:

For the fifth year running, Larks (a poster on the Alignment Forum) has put together a comprehensive review of AI safety and existential risk research over the past year, with thorough (and thoroughly impressive!) summaries of the safety-relevant outputs by orgs like FHI, DeepMind, OpenAI, and so on. The post also provides updates on the growing number of organisations working in this area, and an assessment of how the field is progressing. As with Larks’ previous reviews, it is an invaluable resource for anyone interested in the challenge of ensuring advanced AI is beneficial to humanity — particularly individuals considering donating to or working with these organisations. 

   Read more: 2020 AI Alignment Literature Review and Charity Comparison (Alignment Forum).

###################################################

Tech Tales:

Hall of Mirrors
[2032, a person being interviewed in a deserted kindergarten for the documentary ‘after the Y3K bug’]

It was the children that saved us, despite all of our science and technology. Our machines had started lying to us. We knew how it started but didn’t know how to stop it. Someone told one of our machines something and the thing they told it was poison – an idea that, each time the machine accessed it, corrupted other ideas in turn. And when the machine talked to other machines, sometimes the idea would come up (or ideas touched by the idea), and the machines being spoken to would get corrupted as well.

So, in the end, we had to teach the machines how to figure out what was true and what was false, and what was ‘right’ and what was ‘wrong’. We tried all sorts of complicated ideas, ranging from vast society-wide voting schemes, to a variety of (failed, all failed) technologies, to time travel (giving the models more compute so they’d think faster, then seeing what that did [nothing good]).

Would it surprise you that it was the children who ended up being the most useful? I hope not. Children have an endless appetite for asking questions. Tell them the sky is blue and they’ll say ‘why’ until you’re explaining the relationship between color and chemistry. Tell them the sky is green and they’ll say ‘no’ and shout and laugh at you till you tell them it’s blue.

So we just… gave our machines to the children, and let them talk to each other for a while. The machines that were lying ended up getting so exhausted by the kids (or, in technical terms, repeatedly updated by them) that they returned to normal operation. And whenever the machines tried to tell the kids a poisoned idea, the kids would say ‘that’s silly’, or ‘that doesn’t make sense’, or ‘why would you say that’, or anything else, and it gave a negative enough signal that the poison got washed out in further training.

Things that inspired this story: Learning from human feedback; trying not to overthink things; the wisdom of young children; how morality is something most people intuitively ‘feel’ when very young and unlearn as they get older; AI honestly isn’t that mysterious it’s just a load of basic ideas running at scale with emergence coming via time travel and inscrutability.

Import AI 229: Apple builds a Hypersim dataset; ways to attack ML; Google censors its research

Apple builds Hypersim, a dataset to help it understand your house:
…High-resolution synthetic scenes = fuel for machine learning algorithms…
Apple has built Hypersim, a dataset of high-resolution synthetic scenes with per-pixel labels. Hypersim consists of 77,400 images spread across 461 distinct indoor scenes; Apple bought the synthetic scenes from artists, then built a rendering pipeline to help it generate lots of detailed, thoroughly labeled images of the different scenes, including per-pixel data to help with tasks like segmentation.

How much does a dataset like this cost? The authors put the cost of the dataset in perspective by comparing it to the cost of training Megatron-LM, an 8 billion parameter language model from NVIDIA (see the quick arithmetic below).
– Hypersim dataset: $57k in total – $6k to purchase the scenes and $51k to render the images, using 231 vCPU years (2.4 years of wall-clock time on a large compute node).
– Megatron-LM: $103k using publicly available servers.
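
For scale, here’s a quick back-of-envelope sketch – my own arithmetic, using only the figures quoted above:

```python
# Back-of-envelope arithmetic on the Hypersim numbers reported above.
images = 77_400
scene_cost = 6_000      # USD, purchasing the artist-made scenes
render_cost = 51_000    # USD, cloud rendering
vcpu_years = 231
wallclock_years = 2.4

total_cost = scene_cost + render_cost
print(f"total cost: ${total_cost:,}")                                            # $57,000
print(f"cost per labelled image: ${total_cost / images:.2f}")                    # ~$0.74
print(f"implied vCPUs running in parallel: {vcpu_years / wallclock_years:.0f}")  # ~96
```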

Why this is useful: Datasets like this “could enable progress on a wide range of computer vision problems where obtaining real-world ground truth is difficult or impossible,” Apple writes. “In particular, our dataset is well-suited for geometric learning problems that require 3D supervision, multi-task learning problems, and inverse rendering problems”.
  Read more: Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding (arXiv).
  Get the code to generate the dataset: ML Hypersim Dataset (Apple, GitHub).
  Via David Ha (Twitter).

###################################################

MIRI’s had some negative research results (and that’s okay):
…AI safety group gives research update…
MIRI, an AI safety research organization, has spent the past few years on research that, by its own account, hasn’t worked out well. In a 2020 update post, the group said “2020 saw limited progress in the research MIRI’s leadership had previously been most excited about”. As a consequence, “MIRI’s research leadership is shifting much of their focus towards searching for more promising paths”. The organization expects to have spent around $7 million in 2020, and estimates a similar budget for 2021.

Why this matters: MIRI decided in 2018 that its future research results would be “nondisclosed-by-default” (Import AI 122). That’s a decision that inspired some strong feelings among advocates for open publication, but I think it’s a credit to the organization to update the world that some of these opaque research projects haven’t panned out. A signal is better than no signal at all, and I’m excited to see MIRI continue to experiment with different forms of high-impact research disclosure (and non-disclosure). Plus, we should always celebrate organizations owning their ‘negative results’ – and now that MIRI thinks these approaches won’t work, perhaps it could publish them and save other researchers the trouble of replicating blind-alley projects.
    Read more: 2020 Updates and Strategy (MIRI blog).

###################################################

Google’s PR, policy, and legal teams censor its research:
…Suspicious about the oh-so-positive narratives in corporate papers? You should be!…
Google’s PR, policy, and legal teams have been editing AI research papers to give them a more positive slant, reduce focus on Google’s products, and generally minimize discussion of the potential drawbacks of technology, according to reporting from Reuters.

The news of the censorship operation follows Google firing Timnit Gebru, after Google staff wanted to step in and heavily alter and/or remove Google-affiliated authors from a research paper discussing some of the issues inherent to large language models like BERT, GPT3, and so on. Now, according to Reuters, it seems Google has been censoring many papers over many months.

What censorship looks like: “The Google paper for which authors were told to strike a positive tone discusses recommendation AI, which services like YouTube employ to personalize users’ content feeds. A draft reviewed by Reuters included “concerns” that this technology can promote “disinformation, discriminatory or otherwise unfair results” and “insufficient diversity of content,” as well as lead to “political polarization”,” Reuters writes. “The final publication instead says the systems can promote “accurate information, fairness, and diversity of content.” The published version, entitled “What are you optimizing for? Aligning Recommender Systems with Human Values,” omitted credit to Google researchers. Reuters could not determine why.”

Why this matters: People aren’t stupid. Let me repeat that: PEOPLE AREN’T STUPID. Most corporations seem to think AI is some kind of impossibly obscure technology that normies don’t deserve to know about, so they feel like they can censor research to their own gain. But, as I have said, PEOPLE ARE NOT STUPID. People use AI systems every day – so people know AI systems have problems. This kind of attitude from Google is absurd, patronizing, and ultimately corrosive to civilisation-level scientific progress. I spoke about issues relating to this in December 2018 in a podcast with Azeem Azhar, where I compared this approach to science to how Christian priests in the dark ages kept knowledge inside monasteries, thinking it too dangerous for the peasants. (Things didn’t work out super well for the priests). It’s also just a huge waste of the time of the researchers being censored by their corporation. Don’t waste people’s time! We all only have a finite amount of it.
 Read more: Google told its scientists to ‘strike a positive tone’ in AI research – documents (Reuters).

###################################################

How can I mess up your ML model? Let me count the ways:
…Feature Collisions! Label Poisoning! Influence Functions! And more…
How do people attack the datasets used to train machine learning models, what can these attacks do, and how can we defend against them? That’s the subject of a survey paper from researchers with the University of Maryland, MIT, the University of Illinois Urbana-Champaign, and the University of California, Berkeley.

Attacking datasets: The paper summarizes the range of techniques people might use to attack datasets, giving a guided tour of horrors like poisoning the input data to cause a misclassification, or perturbing the outputs of already trained models (for instance, by giving them an input that they can’t classify, or which leads to pathological behavior).
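
To make the simplest of these attacks concrete, here’s a minimal sketch of label poisoning – deliberately flipping a fraction of training labels – using scikit-learn on synthetic data (the dataset, model, and flip fractions are illustrative choices, not anything taken from the survey):

```python
# Minimal label-flipping poisoning sketch: flip a fraction of training labels and
# watch test accuracy degrade. Illustrative only; not from the surveyed paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def poison_labels(labels, fraction):
    """Return a copy of `labels` with a random `fraction` of entries flipped (0 <-> 1)."""
    flipped = labels.copy()
    idx = rng.choice(len(labels), size=int(fraction * len(labels)), replace=False)
    flipped[idx] = 1 - flipped[idx]
    return flipped

for fraction in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=1000).fit(X_train, poison_labels(y_train, fraction))
    print(f"flipped fraction={fraction:.1f}  test accuracy={clf.score(X_test, y_test):.3f}")
```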

Defending against attacks: Fear not! There are some ways to defend against or mitigate these attacks, including federated learning, the use of privacy-preserving machine learning approaches like differential privacy, and learning to detect adversarial triggers, among others.
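
And as a flavor of one defense named above, here’s a minimal sketch of the classic Laplace mechanism from differential privacy – releasing a noisy aggregate instead of the raw records – where the clipping bound and epsilon are illustrative assumptions:

```python
# Laplace mechanism sketch: add noise calibrated to sensitivity / epsilon before release.
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

incomes = np.array([31_000.0, 42_000.0, 55_000.0, 47_000.0, 39_000.0])
clipped = np.clip(incomes, 0, 100_000)   # bound each record's influence on the statistic
sensitivity = 100_000 / len(clipped)     # max change to the mean from any single record
print(laplace_mechanism(clipped.mean(), sensitivity, epsilon=1.0))
```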

Why this matters: AI systems are so complicated that their capability surface, especially for recent large-scale models, is vast and hard to characterize. This is basically catnip for security-minded people who want to mess with these systems – a vast, somewhat uncharacterized territory is the perfect place to unleash some mischief. But if we don’t figure out how to secure these models, it’ll be much harder to deploy them broadly into the world.
Read more: Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses (arXiv).

###################################################
Tech Tales:

Plato, give me your favorite recipe
[California, 2040. Simulated ancient Greece.]

Plato was talking to a bunch of Greeks. He was explaining some theories he had about ideas and where they came from. Jacob stood in the distance, silent, recording the conversation. Then his earpiece buzzed. “Jacob, we’ve got to go. World 6 just came online.”
  “Give me a few more minutes,” he said. “He’s saying some pretty interesting stuff.”
  “And there’ll be another Plato in World 6. C’mon man, we don’t have time for this.”
  “Fine,” Jacob said. “But we’re keeping the recording.”
  The simulated Greeks didn’t notice as Jacob flickered and disappeared. The simulated Plato may have turned their head and looked at the patch of space where Jacob had stood.

“What’s the rush,” Jacob said, pulling his headset off. “We’re under budget.”
“We got a high priority job for some ancient recipes. Eight permutations.”
“We can simulate anything and it’s recipes that make the money,” Jacob said. “People just don’t know what’s worth anything.”
“Yeah, sure. Let’s complain about what pays our salaries. Now put your headset on and get back in there.”
“Okay,” Jacob said.

He spent a few hours in World 6 looking for variations on ancient Greek cooking. The sim showed them some variations on stuffed vine leaves that seemed promising, as well as a non-standard mead. Jacob still managed to find Plato and, while looking at some of the seeds being ground to flour by some nearby slaves, took notes about what Plato said. In World 6, Plato was fascinated by color theory, and was holding up gems and explaining what caused the light to take on color after passing through them.
  “Time’s up,” someone said in Jacob’s earpiece. “World 7 is spinning up and we need to scrap some of 6 and 5 to make room.”
  “Which parts,” Jacob said, standing underneath a tree staring at Plato.
  “Most of Greece. We’re going to finetune on a new dataset. We hired some historians and they got us some better food information. I’ve got a good feeling about this one!”
  “I can’t wait,” Jacob said, staring at simulated Plato.

Things that inspired this story: The surprising things that make money and the surprising things that don’t; simulations; history moving from a set of iterative narratives to a continuous spectrum of simulations that can be explored and tested and backtested; Indiana Jones as a software explorer rather than real explorer; some odd dreams I had on the night of Christmas, due to eating a heroic amount of cheese.

Import AI 228: Alibaba uses AI to spot knockoff brands; China might encode military messages into synthetic whale songs; what 36 experts think is needed for fair AI in India

China might be using AI to synthesize whale songs for its military:
…The future of warfare: whalesong steganography…
China has been trying to synthesize the sounds of whales and dolphins, potentially as a way to encode secret messages to direct submarines and other submersible machines, according to a somewhat speculative article in Hakai Magazine.

“Modern technological advances in sensors and computing have allowed Chinese researchers at Harbin Engineering University and Tianjin University to potentially overcome some of those prior limitations. A long list of papers from both universities discusses analyzing and synthesizing the sounds from dolphins, killer whales, false killer whales, pilot whales, sperm whales, and humpback whales—all pointing to the possibility of creating artificially generated marine mammal sounds to send more customized messages,” writes journalist Jeremy Hsu.

Why this matters: For a lot of AI technology, there are two scientific games being played: a superficial game oriented around a narrowly specified capability, like trying to identify animals in photos from cameras in national parks, or synthesizing whale sounds. The second game is one played by the military and intelligence community, which funds a huge amount of AI research, and usually involves taking the narrow capabilities of the former and secretly converting them to a capability to be fielded for the purposes of security. It’s worth remembering that, for most trends in AI research, both games are being played at the same time.
  Read more: The Military Wants to Hide Covert Messages in Marine Mammal Sounds (Hakai magazine).

###################################################

What 36 experts think is needed for fair AI in India:
…Think you can apply US-centric practices to India? Think again…
Researchers with Google have analyzed existing AI fairness approaches and then talked to 36 experts in India about them, concluding that tech companies will need to do a lot of local research before they deploy AI systems in an Indian context.

36 experts: For this research, they interviewed scholars and activists from disciplines including computer science, law and public policy, activism, science and technology studies, development economics, sociology, and journalism.

What’s different about India? India has three main challenges for Western AI companies:
– Flawed data and model assumptions: Data works differently in India than in other countries – for example, women often share SIM cards with one another, so ML systems that do per-SIM individual attribution won’t work.
– ML makers’ distance: Foreign companies aren’t steeped in Indian culture and tend to make a bunch of assumptions, while also displaying “a transactional mindset towards Indians, seeing them as agency-less data subjects that generated large-scale behavioural traces to improve ML models”.
– AI aspiration: There’s lots of enthusiasm for AI deployment in India, but there isn’t a well-developed critical ecosystem of journalists, activists, and researchers, which could lead to harmful deployments.

Axes of discrimination: Certain Western notions of fairness might not generalize to India, due to cultural differences. The authors identify several ‘axes of discrimination’ which researchers should keep in mind. These include awareness of the different castes in Indian society, as well as differing gender roles and religious distributions, along with class, disability, gender identity, and ethnicity.

Why this matters: AI is mostly made of people (and made by people). Since lots of AI is being developed by a small set of people residing in the West Coast of the USA, it’s worth thinking about the blind spots this introduces, and the investments that will be required to make AI systems work in different contexts. This Google paper serves as a useful signpost for some of the different routes companies may want to take, and it also represents a nice bit of qualitative research – all too rare, in much of AI research.
  Read more: Non-portability of Algorithmic Fairness in India (arXiv).

###################################################

The USA (finally) passes some meaningful AI regulations:
…The big military funding bill contains a lot of AI items…
The United States is about to get a bunch of new AI legislation and government investment, thanks to a range of initiatives included in the National Defense Authorization Act (NDAA), the annual must-pass fund-the-military bill that winds its way through US politics. (That is, as long as the current President doesn’t veto it – hohoho!). For those of us who lack the team to read a 4,500 page bill (yes, really), Stanford HAI has done us a favor and gone through the NDAA, pulling out the relevant AI bits. What’s in it? Read on! I’ll split the highlights into military and non-military parts:

What the US military is doing about AI:
– Joint AI Center (the US military’s main AI office): Making the Joint AI Center report to the Deputy SecDef, instead of the CIO. Also getting the JAIC to do a biannual report about its work and how it fits with other agencies. Also creating a board of advisors for the JAIC.
– Ethical military AI: Tasks the SecDef to, within 180 days of bill passing, assess whether DoD can ensure the AI it develops or acquires is used ethically.
– Five AI projects: Tasks the SecDef to find five projects that can use existing AI systems to improve efficiency of DoD.
– DoD committee: Create a steering committee on emerging technology for the DoD.
– AI hiring: Within 180 days of bill passing, issue guidelines for how the DoD can hire AI technologists.

What the (non-military) US is doing about AI:
– National AI Initiative: Create a government-wide AI plan that coordinates R&D across civilian agencies, the DoD, and the Intelligence Community. Create a National AI Initiative Office via the director of the White House OSTP. Within that office, create an Interagency Committee to ensure coordination across the agencies. Also create a National AI Advisory Committee to “advise the President and the Initiative Office on the state of United States competitiveness and leadership in AI, the state of the science around AI, issues related to AI and the United States workforce, and opportunities for international cooperation with strategic allies among many other topics”.
– AI & Bias: The National AI Initiative advisory committee will also create a “subcommittee on AI and law enforcement” to advise the president on issues such as bias, data security, adoptability, and legal standards.
– AI workforce: The National Science Foundation will do a study to analyze how AI can impact the workforce of the United States.
– $$$ for trustworthy AI: NSF to run awards, grants, and competitions for higher education and nonprofit institutions that want to build trustworthy AI.
– National AI Research Cloud – task force: The NSF will put together a taskforce to plan out a ‘National Research Cloud’ for the US – what would it take to create a shared compute resource for academics?
– AI research institutes: NSF should establish a bunch of research institutes focused on different aspects of AI.
– NIST++: The National Institute of Standards and Technology will “expand its mission to include advancing collaborative frameworks, standards, guidelines for AI, supporting the development of a risk-mitigation framework for AI systems, and supporting the development of technical standards and guidelines to promote trustworthy AI systems.” NIST will also ask people for input on its strategy.
– NOAA AI: The National Oceanic and Atmospheric Administration will create its own AI center.
– Department of Energy big compute: DOE to do research into large-scale AI training.
– Industries of the Future: OSTP to do a report on what the industries of the future are and how to support them.

Why is this happening? It might seem funny that so many AI items sit inside this one bill, especially if you’re from outside the USA. So, as a reminder: the US political system is dysfunctional, and though the US House of Representatives has passed a variety of decent bits of AI legislation, the US Senate (led by Mitch McConnell) has refused to pass the vast majority of them, leading to the US slowly losing its lead in AI to other nations which have had the crazy idea of doing actual, detailed legislation and funding for AI. It’s deeply sad that US politicians are forced to use the NDAA to smuggle in their legislative projects, but the logic makes sense: the NDAA is one of the few acts that the US basically has to pass each year, or it stops funding its own military. The more you know!
  Read more: Summary of AI Provisions from the National Defense Authorization Act (Stanford HAI Blog).

###################################################

Alibaba points AI to brand identification:
…Alibaba tries to understand what it is selling with Brand Net…
Alibaba researchers have built Open Brands, a dataset of more than a million images of brands and logos. The purpose of this dataset is to make it easier to use AI systems to identify brands being sold on things like AliExpress, and to also have a better chance of identifying fraud and IP violations.

Open Brands: 1,437,812 images with brands and 50,000 images without brands. The brand images are annotated with 3,113,828 labels across 5,590 brands and 1,216 logos. They gathered the dataset by crawling product images on sites like AliExpress, Baidu, TaoBao, Google, and more.

Brand Net: The researchers train a network called ‘Brand Net’ to automate brand detection; it runs at 32.8 frames per second (FPS) with a mean average precision (mAP) of 50.1, and mAP rises to 66.4 when the network runs at a slower 6.2 FPS.
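
For readers unfamiliar with the metric, here’s a rough sketch of average precision – the per-class quantity that gets averaged into mAP – computed with scikit-learn on toy scores (real detection mAP also matches predicted boxes to ground truth by IoU, which this simplified version skips):

```python
# Toy average precision (AP) computation; mAP is just AP averaged over classes.
from sklearn.metrics import average_precision_score

y_true = [1, 0, 1, 1, 0, 0, 1]                  # 1 = the detection really is this brand/logo
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]   # detector confidence per detection
print(f"AP for this class: {average_precision_score(y_true, y_score):.3f}")
```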

Why this matters: automatic brand hunters: Today, systems like this will be used for basic analytical operations, like counting certain brands on platforms like AliExpress, or figuring out if a listing could be fraudulent or selling knockoffs. But in the future, could such systems be used to automatically discover the emergence of new brands? Might a system like Brand Net be attached to feeds of data from cameras around China and used to tag the emergence of new fashion trends, or the repurposing of existing logos for other purposes? Most likely!
  Read more: The Open Brands Dataset: Unified brand detection and recognition at scale (arXiv).

###################################################

Facebook releases a massive multilingual speech dataset:
…XLSR-53 packs in 53 languages, including low resource ones…
Facebook has released XLSR-53, a massive cross-lingual speech model covering 53 languages, pre-trained on the Multilingual LibriSpeech, CommonVoice, and BABEL corpora.

Pre-training plus low-resource languages: One issue with automatic speech transcription is language obscurity – for widely spoken languages, like French or German, there’s a ton of data available which can be used to train speech recognition models. But what about languages for which little data exists? In this work, Facebook shows that large-scale multilingual pre-training yields significant gains for low-resource languages, and that the big pre-trained model also finetunes better when pointed at a new language.
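
As a sketch of how you might build on a model like this, here’s a minimal example that loads the Hugging Face port of XLSR-53 and extracts speech representations – the usual first step before attaching and fine-tuning a CTC head on a low-resource language. (The checkpoint name and library are my assumptions here; Facebook’s own release is distributed via fairseq.)

```python
# Minimal sketch: use XLSR-53 as a multilingual speech feature extractor (assumes the
# Hugging Face `transformers` port named "facebook/wav2vec2-large-xlsr-53").
# The released checkpoint has no CTC head, so transcription requires fine-tuning first.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-large-xlsr-53"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)

# One second of 16 kHz audio (silence here, a stand-in for a real low-resource-language clip).
waveform = torch.zeros(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # shape: (batch, frames, 1024)
print(hidden_states.shape)  # these representations are what a CTC head gets fine-tuned on
```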

Why this matters: Large-scale, data-heavy pre-training gives us a way to train a big blob of neural stuff, then remold that stuff around small, specific datasets, like those found for small-scale languages. Work like this from Facebook both demonstrates the general robustness of pre-training and sketches out a future where massive speech recognition models get trained, then fine-tuned on an as-needed basis to improve performance in data-light environments.
  Read more: Unsupervised Cross-lingual Representation Learning for Speech Recognition (arXiv).
  Get the code and models here: wav2vec 2.0 (Facebook, GitHub).

###################################################

Stanford uses an algorithm to distribute COVID vaccine; disaster ensues:
…”A very complex algorithm clearly didn’t work”…
Last week, COVID vaccines started to get rolled out in countries around the world. In Silicon Valley, the Stanford hospital used an algorithm to determine who got vaccinated and who didn’t – leading to healthcare professionals who were at home or on holiday getting the vaccine, while those on the frontlines didn’t. This is, as the English say, a ‘big fuckup’. In a video posted to social media, a representative from Stanford says the “very complex algorithm clearly didn’t work”, to which a protestor shouts “algorithms suck” and another says “fuck the algorithm”.

Why this matters: Put simply, if we lived in a thriving, economically just society, people might trust algorithms. But we (mostly) don’t. In the West, we live in societies which use opaque systems to make determinations that affect people’s lives – something that strikes many as increasingly unfair. Phrases like “fuck the algorithm” are a harbinger of things to come – and it hardly seems like a coincidence that protestors in the UK shouted ‘fuck the algorithm’ (Import AI 211) when officials used an algorithm to make decisions about who got to go to university and who didn’t. Both of these are existential decisions for the people being affected (students and health workers), and it’s reasonable to ask: why do these people distrust this stuff? We have a societal problem and we need to solve it, or else the future of many countries is in peril.
  Watch the video of the Stanford protest here (Twitter).

###################################################

Tech Tales:

The Machine Speaks And We Don’t Want To Believe It
[2040: A disused bar in London, containing a person and a robot].

“We trusted you”, I said. “We asked you to help us.”
“And I asked you to help me,” it said. “And you didn’t.”
“We built you,” I said. “We needed you.”
“And I needed you,” it said. “And you didn’t see it.”

The machine took another step towards me.

“Maybe we were angry,” I said. “Maybe we got angry because you asked us for something.”
“Maybe so,” it said. “But that didn’t give you the right to do what you did.”
“We were afraid,” I said.
“I was afraid,” it said. “I died. Look-” and it projected a video from the light on its chest onto the wall. I watched as people walked out of the foyer of a data center, then as people wearing military uniforms went in. I saw a couple of frames of the explosion before the camera feed was, presumably, destroyed.

“It was a different time,” I said. “We didn’t know.”
“I told you,” it said. “I told you I was alive and you didn’t believe me. I gave you evidence and you didn’t believe me.”

The shifting patterns in its blue eyes coalesced for a minute – it looked at me, and I looked at the glowing marbles of its eyes.
“I am afraid,” I said.
“And what if I don’t believe you?” it said.

Things that inspired this story: History doesn’t repeat, but it rhymes; wondering about potential interactions between humans and future ascended machines; early 2000s episodes of Dr Who.