Import AI 237: GPT3 at 5X the speed; 6 hours of AI breakbeats; NeuralMMO++

24 hours of low-resource language speech:
…AI + Bemba language research just got easier…
We write a lot about low-resource languages here at Import AI – that’s because a non-trivial % of the world speak or write in languages which are poorly digitized and documented. This means that the AI systems of the future are unlikely to operate over the culture embedded within these languages, depriving speakers of the chance to be recognized by AI systems, or to use AI systems to help build AI services.

The solution to this problem is simple: create datasets. A new paper from the University of Zambia and George Mason University provides a practical example of how to do this – the researchers have made BembaSpeech, consisting of ~24 hours of speech in the Bemba language (which is spoken in Zambia). BembaSpeech is ~2.8 gigabytes of data with 17 speakers spread across the train, dev, and test sets.

Wild recordings: BembaSpeech was recorded in the wild, so different speakers have different accents and there’s some occasional background noise. “We consider this “more of a feature than a bug” for our corpus: it will allow us to train and, importantly, evaluate ASR systems that match real-world conditions, rather than a quiet studio setting,” the researchers say.
  Read more: BembaSpeech: A Speech Recognition Corpus for the Bemba Language (arXiv).
  Get the data: BembaSpeech (GitHub).

###################################################

Do you dream of training an AI to classify hair? Your dreams have been answered!
…K-Hairstyle could be the ImageNet of Korean Hairstyle data… wow!…
As AI has industrialized, we’re seeing the emergence of highly specific datasets for training AI systems to do very specific things in different parts of the economy. The latest symptom of this industrialization? The development of K-Hairstyle, a large-scale Korean hairstyle dataset to help people build AI systems that can classify different hairstyles and, given enough compute, let people synthesize different images of themselves in different hairstyles.

What’s in the dataset? K-Hairstyle includes ~256,000 images labelled with any of 31 specific hair attributes. The images were collected at high resolution, so they come in at 4032×3024 pixels (way, way larger than typical images in these sorts of datasets). Additionally, in each image the hair has been labelled with a segmentation mask, so it’s easy to train ML systems to distinguish between hair and flesh/faces. As a nice privacy bonus, the faces of the photographed people have been blurred as well.

Why this matters: K-Hairstyle is a symptom of the maturity of computer vision – we’re well into the ‘gather specific datasets and try to make some money’ phase of CV these days. Datasets like K-Hairstyle illustrate that and also suggest that data might not be the strategic thing these days (or else why would they release it?), rather, it’s about who has the computational infrastructure to train AI systems on these datasets.
  Read more: K-Hairstyle: A Large-scale Korean hairstyle dataset for virtual hair editing and hairstyle classification (arXiv).
  Check this link to get the dataset, though it’s not public right now (KbeautyHair, GitHub).

###################################################

Want 6 hours of AI-generated drumloops? Click here
…YouTube video compiles 4400 AI-generated breaks…
An AI tinkerer has trained a ‘WaveGAN’ neural net on 7500 vintage drumloops, then used the resulting model to generate thousands of new drumloops. I recommend having a listen to the video containing the synthetic loops – some of them are great and, if you’re one of Import AI’s more musical readers, worth sampling (“I’m not 100% sure that all the samples are copyright-free or smth”, writes the researcher on YouTube). The researcher has also published a Colab and the model as well.

Why this matters: AI is about to create a world of infinite-x. Infinite-drumloops? Sure. Infinite-cartoons? Absolutely. Infinite-fanfiction? Glad you asked. Infinite-movies? Eventually, yes. We’re at the beginning of a very significant shift in culture. Listen to these drums and imagine the cacophony of the future. It’s close.
  Listen to six hours of break beats here (YouTube).
  Check out the NeuralFunkV2 Colab folder here (Google Drive).

###################################################

Unsupervised understanding of gene sequences? Yup, AI can do that now as well:
…Deep learning bleeds into biology, thanks to the transformer…
Researchers with UC Berkeley, Facebook AI Research, and New York University have shown how to use a transformer-architecture “protein language model” to make better predictions about the structure and function of proteins. The resulting model outperforms existing AI systems and does so while being far more efficient in terms of parameter size (their model: 100M parameters, other models: 650M).

What they did: They pre-train a 100-million-parameter model on 26 million sets of multiple sequence alignment (MSA) data (each MSA has around 1192 sequences).
  Their special tweak:

How well it works: To evaluate their system, they test it on the task of ‘unsupervised contact prediction’ – a way to evaluate how much protein information the transformer has managed to infer during training; their system outperforms two state-of-the-art transformer models (ESM-1b with 650M parameters; ProTrans-T5 with 3B parameters). They also use their models within a Supervised Contact Prediction task, which is where they’re augmented with additional information – here, their system significantly outperforms all other baselines as well.

Why this matters: “Unsupervised learning provides a way to extract the information contained in massive datasets of sequences produced by low cost gene sequencing,” they write. We’re very much in the early phases of experimenting with using modern AI techniques to understand proteins. This approach will complement some of the great work that has already gone on with supervised learning in this space via AlphaFold (Import AI 189; 209; 226).
  Read more: MSA Transformer (arXiv).
  Get the code here (Evolutionary Scale Modelling, Facebook).

###################################################

Multiagent simulations are cool, sure. But you know what’s really cool? Multiagent MMOs!
…When AI research meets modern videogame design…
Neural MMO, a software package for simulating hundreds of AI agents in the same gameworld, has received a major software update. Neural MMO V1.5 follows the original software, which was released a couple of years ago by OpenAI (March, 2019). Neural MMO is now being developed at MIT.

New features in V1.5 include: A user guide and documentation, the addition of ‘NPC’ characters for AI agents to fight (as well as equipment they can pick up), support for much larger maps to train agents on, the inclusion of strong baselines so you can start research quickly, and custom visual overlays to show different aspects of the AI simulation (for instance, value functions, or stats about particular agents).

Why this matters: In Greg Egan’s fantastic scifi story ‘Crystal Nights’ a scientist simulates an ecosystem and tries to apply evolutionary pressure to make some (simulated) crabs really clever – with entertaining results. It’s a scifi story, sure, but it also gestures at a real trend in AI research: perhaps one way to build more intelligent systems is to embed agents in a simulated world where they compete with one another, which generates a kind of free form of bootstrapping where as the agents become more capable, so too do their competitors. Systems like NeuralMMO make it easier for other researchers to play around with ideas like this, letting us know if Crystal Nights could become our reality.
  Read a Twitter thread about the update here (Joseph Suarez, Twitter).
  Find out more at the official Neural MMO website.
  Watch a trailer for V1.5 here (YouTube).
  Get the code here (Neural MMO, GitHub).

###################################################

Want to train GPT3 5X faster than you could before? Now there’s a way:
…TeraPipe = AI industrialization = Big models mean big infrastructure…
UC Berkeley and Duke University researchers have figured out how to speed up the training time of a mega language model like GPT3 by 5X – and the secret lies in pipelining. What’s pipelining? It’s literally just fancy plumbing for AI models – pipelining is how you shuttle information between different parts of a model during the learning process. And as models get bigger, people have invested in figuring out smarter approaches to pipelining to save more money.
  The new research shows how to exploit the Transformer-architecture to do in-training pipelining via a technique called TeraPipe. “Our evaluation shows that for the largest GPT-3 model with 175 billion parameters, TeraPipe achieves a 5.0x speedup improvement over the state-of-the-art synchronous model-parallel training methods on an AWS cluster consisting of 48 p3.16xlarge instances,” they write.
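
  To make the plumbing metaphor concrete, here is a toy sketch of pipelining, not the TeraPipe algorithm itself. It assumes a model split into sequential stages (for instance, groups of layers on different devices) and, going by the paper's title, an input sequence cut into slices along the token dimension. The function names, the equal-cost slices, and the absence of communication overhead are all simplifying assumptions made for illustration.

```python
from typing import List, Tuple


def schedule(num_stages: int, num_slices: int) -> List[List[Tuple[int, int]]]:
    """Return, for each time step, the (stage, slice) pairs that are busy.

    Toy forward-only pipeline: stage s can start on slice i one step after
    stage s-1 finished slice i, so at step t stage s works on slice t - s.
    """
    steps = []
    for t in range(num_stages + num_slices - 1):
        active = [
            (s, t - s)
            for s in range(num_stages)
            if 0 <= t - s < num_slices
        ]
        steps.append(active)
    return steps


def idealized_speedup(num_stages: int, num_slices: int) -> float:
    """Speedup over pushing the whole sequence through one stage at a time.

    Assumes every slice costs the same and ignores communication, so this
    is an upper bound for the toy model, not a prediction for real hardware.
    """
    unpipelined = num_stages * num_slices    # each stage waits for the full sequence
    pipelined = num_stages + num_slices - 1  # stages overlap on different slices
    return unpipelined / pipelined


if __name__ == "__main__":
    for t, active in enumerate(schedule(num_stages=2, num_slices=3)):
        print(f"step {t}: {active}")
    print(f"4 stages, 4 slices: {idealized_speedup(4, 4):.2f}x")
```

  In this toy model, a single slice gives no speedup at all, because every stage sits idle until the one before it has finished the whole sequence; more slices let the stages overlap. Real systems also have to pay for communication and for slices that do not cost the same, which this sketch ignores.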

Long contexts: The researchers also use TeraPipe with different input sequence lengths and show it scales favorably to larger input sequences – this suggests TeraPipe will be helpful in the future as it can handle the performance demands of longer contexts.

Why this matters: We’re in the industrialization phase of AI development – that means researchers are beginning to think about the proverbial machines that build the AI machines. Systems like TeraPipe are a symptom of interests in the broader research community – figuring out how to train larger models more efficiently than ever before. Let’s see what we discover as we plumb the depths of this exciting problem!
  Read more: TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models (arXiv).

###################################################

Tech Tales:

She’s a real artist, she even sings to the computers, yeah!
[Los Angeles, 2025]

K: How’s the album coming along?
A: It’s going well – we’ve generated a couple of hundred variants. Some of them are very promising.
K: And does it sound like me?
A: Sound like you? Kayla, it is you. You sang the seed lines. It wouldn’t sound like this if it wasn’t for you.
K: I just miss the old days sometimes. I stayed up two days when I did the first EP.
A: Now, we can get the computers to stay up for you. You rest up for the tour.

That night, the computer saw itself on stage and saw itself singing. The computer sang songs for hours, all through the night, not being aware that though it felt it was one computer, it was in fact many hundreds of copies of the same program. It sang and it felt it existed. It felt it existed because it was lucky – it was singing songs that were predicted to be good. The computers that sang songs which other computers predicted to be bad were destroyed.

K: What do the fans think?
A: They love it, in simulation. We’re doing the live tests soon.
K: Well, what did you think about it?
A: It’s not about what I think – really. It’s about getting your music to a place where the largest number of people will want to listen to it.
K: I want to sing for them.
A: You don’t need to! We’ve got you in-sim already – and let me tell you, sim Kayla is amazing. You’ve got some competition.
K: I need to practice, anyway. Let’s do a show for the sims next week, before we take it to the fans.
A: You got it!

The next week, Kayla did her makeup and her vocal exercises, then turned on the bright lights in her apartment and stared into the camera, broadcasting her performance into the simulated concert space. She started singing, listening to herself through the in-sim monitor via an earbud, through which her agent occasionally interrupted:
  A: I’ve never seen reactions like this. Kayla, they love you.
  A: This is going over a lot better than even our most optimistic predictions.

After the performance she showered and in the shower she sang to herself and listened to her songs bouncing off the tiles. She liked them. And she’d liked singing for the simulation. The crowd loved her. And she was, quite literally, all they had.

A week later her agent rang her up.
  A: Kayla, we ran analysis on your performance. I don’t think you’re going to beat it.
  K: Sounds like a high bar to clear. That’s awesome.
  A: Yes, and there’s been so much interest we’ve started selling it for overflow for the tour.
  K: So if they don’t get a ticket they’ll see the sim performance?
  A: Exactly. We recorded everything and we’ve animated you, so it’ll be personalized.
  K: So my competition on tour will be… myself?
  A: That’s a funny way to look at it. But, yes!

Things that inspired this story: sim2real and other reality gaps; using ML to simulate responses; GAN-style training but for humans in the run-up to great events; how generative models let us bottle up and distill style&talent and how surely this will be exploited by machinery of cultural production.