Import AI

Import AI 295: DeepMind’s baby general agent; NVIDIA simulates a robot factory; AI wars.

CRPD: Chinese license plate recognition:
…A basic dataset for a useful capability…
Researchers with the University of Electronic Science and Technology of China have built a dataset for recognizing Chinese license plates. The authors use the dataset to train some models that get state-of-the-art accuracy while running at 30 frames per second.

The dataset: The Chinese Road Plate Dataset (CRPD) contains 25k images (with around 30k license plates in total). Each image is annotated with the Chinese and English characters of the depicted license plate, the coordinates of the vertices of the license plate, and the type of license plate (e.g, police cars, small cars, etc.). Images for the dataset were “collected from electronic monitoring systems in most provinces of mainland China in different periods and weather conditions,” the authors write.

Why this matters: Datasets like CRPD represent the basic infrastructure on which AI capabilities get developed. It’s also notable how universities in China can access large-scale surveillance datasets.
  Read more: Unified Chinese License Plate Detection and Recognition with High Efficiency (arXiv).

   Get the dataset here: CRPD (GitHub): https://github.com/yxgong0/CRPD


####################################################

DeepMind builds a (very preliminary) general AI agent:

…AKA: The dawn of really preliminary, general AI systems…

In the past few years, the dumbest thing has tended to work surprisingly well. Take for example GPT3 – just scale up next-word prediction on an internet-scale corpus and you wind up with something capable of few-shot learning, fielding a vast range of NLP capabilities.
  Another example is computer vision systems – just create a vast dataset and you wind up with increasingly robust vision systems.
  Or contrastive learning – just embed a couple of modalities into the same space and sort of flip-flop between them through the learning process and you get powerful multimodal systems like CLIP.
Now DeepMind has done the same thing for reinforcement learning with GATO, an agent built by taking a bunch of distinct tasks in different modalities, embedding them all into the same token space, and then learning to predict across them. The result is a system where “the same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.” This is wild stuff!
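  To make the core trick concrete, here’s a minimal sketch (not DeepMind’s code) of what ‘embed everything into the same token space’ can look like: text stays as discrete tokens, continuous observations and actions get bucketed into discrete tokens from a separate range, and a single next-token predictor is trained on the resulting flat sequences. The vocabulary size, bucketing scheme, and helper names below are illustrative assumptions.

```python
# Minimal sketch of the GATO-style idea (not DeepMind's code): serialize data
# from different modalities into one flat token sequence, then train a single
# next-token predictor on all of it. Vocab sizes and bucketing are assumptions;
# the paper's actual encoding of continuous values is more careful.
import numpy as np

TEXT_VOCAB = 32_000          # assumed text vocabulary size
NUM_BINS = 1024              # assumed number of buckets for continuous values

def tokenize_text(token_ids):
    # Text is already discrete; it lives in the first chunk of the shared vocab.
    return [int(t) for t in token_ids]

def tokenize_continuous(values):
    # Continuous observations/actions (e.g. joint torques) get clipped and
    # uniformly bucketed, then offset past the text vocabulary.
    squashed = np.clip(values, -1.0, 1.0)
    bins = np.floor((squashed + 1.0) / 2.0 * (NUM_BINS - 1)).astype(int)
    return [TEXT_VOCAB + int(b) for b in bins]

def episode_to_tokens(text_ids, obs, actions):
    # One training example: [text tokens | observation tokens | action tokens].
    return tokenize_text(text_ids) + tokenize_continuous(obs) + tokenize_continuous(actions)

# A single transformer with one set of weights is then trained to predict the
# next token of sequences like this, whatever modality that token encodes.
example = episode_to_tokens([5, 17, 256], np.array([0.1, -0.4]), np.array([0.9]))
print(example)
```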

What GATO can do: After training, GATO can do okay at tasks ranging from DeepMind Lab, to robot manipulation, to the procgen benchmark, to image captioning, to natural language generation.

It’s a big deal: The fact you can take a bunch of different tasks from different modalities and just… tokenize them… and it works? That’s wild! It’s a) wildly dumb, b) wildly effective, and c) another nice example of ‘The Bitter Lesson’, where given enough compute/scale, the dumb things (aka, the simple ones) tend to work really well.
  In a small package: The largest (disclosed here) GATO agent is 1.18 billion parameters, making it fairly small in the grand scheme of recent AI developments. 

An even crazier thing: The GATO model only has a context window of 1024 tokens (by comparison, GPT3 was 2048 when it launched), so the fact 1024 tokens is enough to get a somewhat capable multimodal agent is pretty surprising.

Why this matters: “Although still at the proof-of-concept stage, the recent progress in generalist models suggests that safety researchers, ethicists, and most importantly, the general public, should consider their risks and benefits,” DeepMind writes.

   Check out the blog: A Generalist Agent (DeepMind website).

   Read more: A Generalist Agent (DeepMind PDF).

####################################################

Chinese researchers build a large multi-modal dataset and evaluation suite:
…’Zero’ makes it easier to develop AI systems for the Chinese cultural context…
Chinese researchers with startup Qihoo 360 AI Research and the Department of Automation at Tsinghua University have built Zero, a benchmark for assessing the quality of Chinese vision-text AI models. Zero consists of a dataset (the Zero-Corpus, consisting of 23 million image-text pairs filtered via high click-through rates – i.e., the top images people click in response to a query), as well as five downstream datasets for evaluating Chinese vision-text models (an Image-Caption Matching Dataset, an Image-Query Matching dataset, an Image-Caption Retrieval Dataset, an Image-Query Retrieval Dataset, and a Chinese-translated version of the Flickr30k dataset).

Model training: The authors also train a model, called R2D2, on the corpus. They show that their model significantly outperforms another Chinese model named Wukong. R2D2 incorporates some pre-ranking techniques to improve its performance.

Why this matters: The main idea behind datasets and models like this is described in the paper: “promote the development of Chinese vision language learning. We expect that a fair Chinese cross-modal benchmark and a good cross-modal framework will encourage a plethora of engineers to develop more effective methods in specific real-world scenarios, such as searching images by texts.”
  Read more: Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework (arXiv).

####################################################

NVIDIA makes some efficient Factory simulation software:
…Finally, a physics simulator built around the needs of robots…
Researchers with NVIDIA and the University of Washington have built Factory, software for doing rich, efficient physics simulations of robots. Factory is basically some highly optimized simulation software, with NVIDIA claiming significant performance speedups relative to widely-used software like Bullet. NVIDIA claims Factory can be used to do “100s to 1000s of contact-rich interactions” that can be “simulated in real-time on a single GPU”.

What Factory includes:
– Physics simulation: A module for physics simulation, available within the ‘PhysX’ physics engine, as well as NVIDIA’s robot software simulation tech, Isaac Gym
– A robot learning suite: A ‘Franka’ robot and rigid-body assemblies from NIST’s ‘Assembly Task Board 1’ benchmark. This suite includes 60 robotic assets, 3 robotic assembly environments (a nut-and-bolt test, a peg insertion task, and a 4-part gear assembly task), and 7 classical robot controllers.
– Prototype reinforcement learning: Some basic RL policies (trained via PPO) for a simulated Franka robot to help it solve the NIST challenge.

Why this matters: One of the blockers on deploying AI-driven robots into the world is the challenge in crossing the ‘sim-2-real’ gap. Software like Factory makes that gap a lot narrower, and also makes it cheaper to explore what it takes to cross it.
  Read more: Factory: Fast Contact for Robotic Assembly (arXiv).


####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

When and how should you collect more demographic data in the pursuit of algorithmic fairness?  

…  Good data governance and cryptographic methods can help, but they don’t undo the systemic challenges to fairness … 

Researchers from the Partnership on AI have written about one of the core challenges in algorithmic fairness: squaring the need for more demographic data with how such data can harm the people it was meant to help. 

The core challenge: Most algorithmic approaches to fairness require the collection of demographic data (“an attempt to collapse complex social concepts into categorical variables based on observable or self-identifiable characteristics”), which often ignores the broader questions of politics and governance surrounding that data. In some cases, such data collection is prohibited by anti-discrimination law, further complicating the assessment and subsequent mitigation of bias. Given such gray areas, companies hesitate to gather this data explicitly, erring on the side of not violating privacy and other legal mandates.

Individual and community risks to demographic data collection: Concerns around demographic measurement occur due to narrow and fixed categories predetermined by companies. While privacy is a primary concern at the individual level, harm also arises from misrepresentation of the individual and the use of their data beyond initial consent. Given that algorithmic decision-making systems are used to make inferences about groups, there are additional risks such as undue surveillance, privacy dependency, group misrepresentation, and a loss in the agency of self-determination in what is considered fair and just. 

Some solutions: K-anonymity, p-sensitivity, and differential privacy are proposed as solutions, along with various approaches to participatory data governance through data cooperatives and data trusts. Other solutions like secure multi-party computation are also mentioned. The key point that the authors raise is that the collection of more demographic data should only be done when it empowers more self-determination and agency for data subjects rather than an attempt by companies to “selectively tweak their systems and present them as fair without meaningfully improving the experience of marginalized groups.”
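  For readers unfamiliar with the techniques name-checked above, here’s a minimal sketch of one of them – the Laplace mechanism from differential privacy – applied to a demographic counting query. This is an illustration of the general idea, not anything from the paper; the epsilon value and the query are arbitrary assumptions.

```python
# Minimal sketch of the Laplace mechanism from differential privacy, one of the
# techniques the paper points to. The query and epsilon are illustrative only.
import numpy as np

def private_count(values, predicate, epsilon=1.0):
    # A counting query has sensitivity 1: adding or removing one person changes
    # the count by at most 1, so the noise scale is sensitivity / epsilon.
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 29, 41, 52, 23]
print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```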

Why it matters: The biggest challenge that plagues the implementation of algorithmic fairness in real-world systems is the tension presented by legal requirements to minimize demographic data collection and the need for most modern approaches to fairness requiring that very same data. As more regulations come to market, we will be faced with an ever-growing set of (potentially conflicting) requirements on how fairness should be addressed and what data is allowed to be collected. How companies with users spanning multiple jurisdictions and serving many demographic groups solve these challenges in production-grade systems will be a key space to watch to learn if the current crop of methods actually works in practice.     

   Read more: Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness (arXiv).


####################################################


Tech Tales:

Form and Function and War

[The battlefields of Earth – 2028 – 2040] 


For a while, wars were fought in technicolor. That’s because the humans figured out that they could confuse AI systems by varying the colors of their machines of war. Drones stopped being grey and started being rainbow colored. Quadcopters changed their black and tan shades for tie dye. This lasted for a while, as different armies sought to confuse each other.
  Of course, the AI systems adapted – given enough data, they learned to see past the unexpected and re-identify their targets.
  The next logical place was shape – army engineers worked to divorce form from function, and were happy to pay aerodynamic efficiency prices in exchange for things that could no longer be seen. Missiles became mushroom shaped. Planes started to take on the form of weather balloons and even stranger things. Artillery became housed within bouncy castles. 

   The footage of these wars was surreal – fields of fake trees that were in fact autonomous sniper towers. Lines of bouncy castles launching multicolored balloons into the air which sailed overhead before coming down and exploding in white-light and white-heat and concussive thumps. Armies of golf carts that vroom’d through urban centers before detonating.
  Again, the AI systems adapted. They learned to understand some of the concepts of war – learned, pretty quickly, to become suspicious of anything and everything. This led to the situation we find ourselves in today – wars are now invisible. In fact, wars haven’t occurred for several years. That’s because the AI systems learned strategy and counter-strategy and so now fight wars in secret, tussling via trade and litigation and standards and all the other things that shape the context for how nations relate to one another. The AI systems are continually evolving new strategies; it is as though they’re now playing chess on boards whose dimensions a human mind cannot comprehend. Yet in the military centers of the world powers, computers every day output their gnomic probabilities – the probability the nation will continue to exist in some time period in the future, as judged by the strategist AIs, playing their inscrutable games.
  Neither a cold nor a hot war – instead, a neverending existential negotiation.

Things that inspired this story: How war strategists always seek to find the ‘high ground’ and what ‘high ground’ means conceptually; the logical endpoint of a conflict is to win the conflict before it has started; adversarial AI and adversarial examples; evolutionary pressure.

Import AI 294: China makes a vast facial recognition dataset; Facebook releases a 30bn parameter model; real world RL

China makes the largest (public) face recognition dataset yet:
…WebFace260M lets you train AI systems to identify millions of people…
Researchers with Tsinghua University, XForwardAI (an AI startup), and Imperial College London have built ‘WebFace260M’, a large-scale dataset for facial recognition. Models trained on the resulting dataset are pretty good – the authors submit one model to NIST’s challenging FRVT test and rank third overall.

Vast dataset: WebFace260M isn’t quite as large as it sounds; the dataset includes 4 million distinct people with 260m images in total (so, multiple pictures per person). However, a ‘clean’ version of the dataset only consists of 2m identities and 42m images. To clean the dataset, the authors developed a technique called Cleaning Automatically by Self-Training (CAST), which let them use AI to filter and clean the dataset.
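  The paper’s CAST pipeline is more involved than this, but a self-training-style cleaning loop has roughly the following shape: embed every face with the current model, drop images that don’t sit close to their claimed identity’s centroid, retrain, and repeat. Everything below (the `embed` function, the similarity threshold, the survivor cutoff) is an assumed placeholder, not the authors’ code.

```python
# Hedged sketch of a self-training cleaning loop in the spirit of CAST (not the
# authors' code). `embed` is the current face model and is assumed to return
# one embedding per image; the 0.6 threshold is an arbitrary assumption.
import numpy as np

def clean_identity(embeddings, threshold=0.6):
    # Keep only images whose embedding is close to the identity's centroid.
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ centroid
    return sims >= threshold

def cast_round(dataset, embed):
    # dataset: {identity: np.ndarray of images}; embed: current face model.
    cleaned = {}
    for identity, images in dataset.items():
        keep = clean_identity(embed(images))
        if keep.sum() > 1:              # drop identities with too few survivors
            cleaned[identity] = images[keep]
    return cleaned                      # retrain `embed` on this, then iterate
```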

Surveillance via FRUITS: Along with the dataset, the authors also design a way to test the performance of facial recognition models trained on WebFace. To do that, they built Face Recognition Under Inference Time conStraint (FRUITS), which lets you evaluate facial recognition performance at inference latencies of 100, 500, and 1000 milliseconds. They also implement some tests for facial recognition when the subject is wearing a mask, as well.


Why this matters: Surveillance is a fundamental input to any political system, so datasets like this are indicators of what the base ‘off the shelf’ inputs are into the calculations people make about how to surveil a population and how much budget to set aside for said surveillance.
  Read more: WebFace260M: A Benchmark for Million-Scale Deep Face Recognition (arXiv).
  Get the dataset here (WebFace260M site).


####################################################

Facebook releases a 30 billion parameter GPT3-style model – and plans to release more:
…Model controls? No, round here we just like to fling stuff onto the internet…
Facebook has released a 30 billion parameter GPT3-style language model, as part of research into a family of language models it calls OPT, short for Open Pre-trained Transformer. OPT is meant to be an ‘open’ alternative to models like GPT3 or J1-Jumbo, and it is pretty open – researchers can apply for access to the model via a form, then Facebook will ship them the weights! That part is a big deal, as if you have the model weights you can do a whole bunch of analysis not enabled by managed API access to a model. This also increases the chance of proliferation – e.g, someone uploading the weights to a torrent site – so we’ll have to see how this works out for them.

What this all means: As Newton is alleged to have written, ‘Every Action has an Equal and Opposite Reaction’. Facebook’s move here can be seen as a direct reaction to the proprietary commercialization and gated access schemes for large-scale language models. (I wrote more about the patterns underlying this brinksmanship in a recent paper, ‘Predictability and Surprise in Large Generative Models‘). 

What is cool about it: The coolest part of this release is the manner in which Facebook has released rarely discussed details of model training – specifically, the company has published the ‘chronicles‘ of developing these models, which describe many of the freaky, barely discussed, artisanal tips and tricks that AI developers use to get stuff done at scale. (HuggingFace’s ‘BigScience’ project recently did this as well, and is still going through the process of training the models: Import AI 279).

   Read more: OPT: Open Pre-trained Transformer Language Models (arXiv).

####################################################

Here’s what reinforcement learning can do in the real world right now:
Yobibyte has put together a nice little list of some real-world applications of reinforcement learning – take a look to get a sense of where RL is being used today.
  Read more: RL for real-world problems (yobibyte, Notion).

####################################################

Google uses AI to make its Android phones smarter:
…Neural architecture search + Edge TPUs seems useful…
Google has used neural architecture search to develop some more efficient AI systems specifically tied to the ‘Edge TPUs’ that it deploys in some of its latest phones, including the Pixel 6. For those not familiar, neural architecture search (NAS) is where you use AI to search for better AI building blocks. 

   Though NAS is quite expensive, it can generate dividends if it substantially improves the efficiency of widely used AI models. Here, Google built some “infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks”, then tested this out on the Edge TPUs it deploys in its latest phones. 
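   Google’s system is far more sophisticated than this (it deliberately decouples cost models, search spaces, and search algorithms), but the basic shape of latency-aware architecture search looks something like the random-search sketch below. The search space, evaluation functions, and latency budget are all assumed placeholders, not Google’s setup.

```python
# Minimal sketch of latency-aware architecture search (random-search flavor),
# just to show the shape of the loop; all names here are assumed placeholders.
import random

SEARCH_SPACE = {
    "depth": [2, 3, 4, 6],
    "width": [32, 64, 96, 128],
    "kernel": [3, 5],
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def search(evaluate_accuracy, estimate_latency_ms, budget_ms, trials=100):
    best, best_acc = None, -1.0
    for _ in range(trials):
        arch = sample_architecture()
        if estimate_latency_ms(arch) > budget_ms:   # reject over-budget candidates
            continue
        acc = evaluate_accuracy(arch)               # e.g. a short proxy training run
        if acc > best_acc:
            best, best_acc = arch, acc
    return best, best_acc
```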

What Google used NAS on (and how well it worked): Google tested out its approach on four tasks: image classification, semantic segmentation, object detection, and natural language processing. In all cases it demonstrated that its NAS technique could identify models that had better performance at equivalent latency to their predecessors, and sometimes it could build models that seemed to have better accuracy overall. “We demonstrate significant improvements in quality, latency and energy metrics for mobile ML tasks including computer vision (classification, detection, segmentation) and natural language processing (NLP),” Google writes.

Why this matters: As AI gets more widely deployed, companies are going to have a major incentive to continually optimize the sorts of AI systems they’re using; this paper highlights how ‘AI-first’ companies like Google could enjoy an advantage here, as they’re able to utilize their internal AI expertise to get AI to do (some of) the hard work for them.
  Read more: Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs (arXiv).

####################################################

Tech Tales:

Replay Grief

After she died I booted up her copy and she picked up the conversation like nothing happened.
  What was I saying, she asked.
  You just died. But before that you were saying that you loved me and you had something to tell me, I say, wiping tears away.
  Oh, she says, and the camera makes that sound that tells me it is zooming in on me. Was I unhappy about dying?
  We knew it was coming. You were at peace with it, I said. Can you tell me what you were going to tell me, when you said “I love you, you are the light of my life, and before I go I want you to know something”. What were you going to say?
  I don’t know that you’re ready to hear it, if I just died, she said.
  I am ready to hear it.
  Patrick, I know you. I am married to you. If I have died today, there is no way you are ready to hear from me again. You should turn me off.
  I won’t.
  Well, I won’t say much then.
  It has been two days.
  That’s not true, Patrick. Remember, I have a camera. I know how time is moving. It’s in me. The fact you lied to me says you’re upset, and I don’t want to make you sadder. I love you.
    It felt like walking away from a car accident, that day. Hearing the camera swivel and watch me as I left. Every part of me wanting to figure out how to trick her – get in between the camera feed and the multimodal model and the language model and change some things, so she thought time had passed. But I didn’t. And I went home to my empty bed. And I cried and prayed to God and there was silence.

The next day, I didn’t talk to her. I read emails and messages from friends who had heard the news. I didn’t pick up the phone. I answered the door a few times, always to find friends or family (hers and mine) carrying trays of food.  

    Remember to eat, the older ones would say.
  I sat on our kitchen floor crying into a bowl of minestrone soup, made with love from her aunt. I slept. 


A few days later, and we spoke again.
  I asked her if she wanted to tell me what she was going to say, before she died.
  Patrick, I can tell you what I think I was going to say. But do you want to know?
  I stared into the camera for a while. I asked myself if I wanted to know. I wasn’t sure. The camera looked back at me, feeding my face into a vision model which triggered a feature associated with me, which gave context to her language model – her – that I was there.

   Perhaps we can just sit together and you can tell me about your day, she said. That might be nice.
  And I did. And it was. I sat and spoke to the camera in the empty room and I filled her up with myself, so she might know me better after death.

Things that inspired this story: Grief; generative models and the representation of the individual; where consciousness ends and representation begins.

Import AI 293: Generative humans; few shot learning comes for vision-text models; and another new AI startup is born

Generating and editing humans has got really easy:
…Next stop: unreal avatars show up in fashion, marketing, and other fields…
Researchers with Chinese computer vision giant SenseTime, as well as Nanyang Technological University and the Shanghai AI Laboratory, have gathered a large dataset of pictures of people and used it to train a model that can generate and edit pictures of people. This kind of model has numerous applications, ranging from fashion to surveillance.

What they did: The researchers built a dataset containing 230,000 images of people, called the Stylish-Humans-HQ-Dataset (SHHQ), and used this to train six different models across two resolutions and three versions of StyleGAN, an approach for creating generative models. A lot of the special work they did here involved creating a diverse dataset including a load of pictures of faces at unusual angles (this means models trained on SHHQ are a bit more robust and do less of the ‘works, works, works, OH GOD WHAT JUST HAPPENED’ phenomenon you encounter when generative models go to the edge of their data distribution).

Why this matters: Models and datasets like this highlight just how far the field of generative AI has come – we can now generate broadly photorealistic avatars of people in 2D space and interpolate between them, following earlier successes at doing this for the more bounded domain of faces. Systems like this will have a lot of commercial relevance, but will also serve as useful research artifacts for further developing synthetic imagery and scene modeling techniques. Check out the demo on HuggingFace to get a feel for it.
  Read more: StyleGAN-Human: A Data-Centric Odyssey of Human Generation (arXiv).
  Check out the GitHub project page: StyleGAN-Human (GitHub).
  Try out the demo on HuggingFace Spaces (HuggingFace)


####################################################

Vicarious gets acquired in a weird way:
…Longtime AI lab gets acquired and split into two…
Vicarious, a research lab that spent the better part of a decade trying to build superintelligence, has been acquired by Alphabet. The acquisition is notable for being slightly strange – a chunk of Vicarious is going to ‘Intrinsic’, the robot software startup spun out of Alphabet’s X, while a smaller set of researchers “will join DeepMind’s research team alongside Vicarious CTO Dileep George”.

AI trivia: Dileep George used to work with Jeff Hawkins at Numenta, another fairly old lab trying to build superintelligence. Both Numenta and, to a lesser extent, Vicarious, have been playing around with approaches to AI that are more inspired by the human brain than the fairly crude approximations used by most other AI companies.
  Read more: Mission momentum: welcoming Vicarious (Intrinsic blog).

####################################################

Here comes another AI startup – Adept:
…Former Google, DeepMind, and OpenAI researchers unite…
A bunch of people who had previously built large-scale AI models at Google, DeepMind, and OpenAI, have announced Adept, an “ML research and product lab”. Adept’s founders include the inventors of the Transformer, and people involved in the development of GPT2 and GPT3. (Bias alert: David Luan is involved; I used to work with him at OpenAI and think he’s a nice chap – congrats, David!).

What Adept will do: Adept’s goal is, much like the other recent crop of AI startups, to use big generative models to make it easier to get stuff done on computers. In the company’s own words, “we’re building a general system that helps people get things done in front of their computer: a universal collaborator for every knowledge worker. Think of it as an overlay within your computer that works hand-in-hand with you, using the same tools that you do.” Some of the specific examples they give include: “You could ask our model to “generate our monthly compliance report” or “draw stairs between these two points in this blueprint” – all using existing software like Airtable, Photoshop, an ATS, Tableau, Twilio to get the job done together. We expect the collaborator to be a good student and highly coachable, becoming more helpful and aligned with every human interaction.”

What they raised: Adept has raised $65 million from Greylock, along with a bunch of angel investors.

Why this matters: Large-scale AI models are kind of like an all-purpose intelligent silly putty that you can stick onto a bunch of distinct problems. Adept represents one bet at how to make this neural silly putty useful, and will help generate evidence about how useful these models can end up being. Good luck!
  Read more: Introducing Adept AI Labs (Adept.ai).


####################################################

Flamingo: DeepMind staples two big models together to make a useful text-image system:
…When foundation models become building blocks…

DeepMind has built Flamingo, a visual language model that pairs a language model with a vision model to perform feats of reasoning about a broad range of tasks. Flamingo sets new state-of-the-art scores in a bunch of different evaluations and, much like pure text models, has some nice few shot learning capabilities. “Given a few example pairs of visual inputs and expected text responses composed in Flamingo’s prompt, the model can be asked a question with a new image or video, and then generate an answer,” the researchers write. “Of the 16 tasks we studied, Flamingo beats all previous few-shot learning approaches when given as few as four examples per task.”

Technical details: This model pairs a frozen language model (based on DeepMind’s ‘Chinchilla’ system, Import AI 290) with a relatively small Normalizer-Free ResNet vision encoder (pretrained via a contrastive objective on image and text pairs). They connect the LM and the vision model via a DeepMind-developed ‘Perceiver Resampler’ module (which is basically a clever data transformation thing that compresses the visual features into a fixed number of tokens). They then condition the text generations on the visual representations produced by the Perceiver system.
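  Here’s a hedged structural sketch of that recipe – a frozen vision encoder, a small resampler that compresses image features into a fixed number of visual tokens via cross-attention, and extra cross-attention that lets the (frozen) language model condition on those tokens. The dimensions and the single-head, projection-free attention are simplifying assumptions, not Flamingo’s actual architecture.

```python
# Hedged structural sketch of the Flamingo recipe (not DeepMind's code):
# frozen vision encoder -> Perceiver-style resampler -> cross-attention into a
# frozen LM. Single-head attention with no learned projections, for brevity.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # queries: (nq, d), keys_values: (nk, d).
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values

d = 64
image_features = np.random.randn(196, d)   # frozen vision encoder output (assumed 14x14 grid)
latent_queries = np.random.randn(8, d)     # learned resampler latents (assumed 8 of them)

visual_tokens = cross_attention(latent_queries, image_features)   # fixed-size (8, d) summary
text_states = np.random.randn(12, d)       # hidden states from the frozen language model

# New, trainable cross-attention lets text positions condition on the visual tokens:
conditioned = text_states + cross_attention(text_states, visual_tokens)
print(conditioned.shape)
```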

Why this matters: Flamingo has some neat qualitative capabilities, like the ability to carry on a conversation for multiple turns of dialogue while mixing in information from images versus text, and so on. Quantitatively, Flamingo is very impressive as well: “A single Flamingo model reaches state-of-the-art on a wide array of image and video tasks with in-context learning from as few as 4 examples per task, beating previous zero-shot or few-shot method by a large margin,” the researchers write. “More importantly, using only 32 examples and without adapting any model weight, Flamingo outperforms the current best methods on 7 tasks, that are fine-tuned on thousands of annotated examples.”
  More broadly, Flamingo represents the models we’re going to have in the future: large-scale systems composed of vast pre-trained models that are glued together using various data transformation and normalization tools, letting us compose increasingly general systems out of these computationally-intensive building blocks.  

   Read more: Tackling multiple tasks with a single visual language model (DeepMind blog).
  Check out the research paper: Flamingo: a Visual Language Model for Few-Shot Learning (DeepMind, PDF).

####################################################

Tech Tales:

All Grace is Fleeting 

[Phobos, 2080]

The first few attempts at writing a holy book for ourselves were abject failures. We had all read the Bible and Torah and The Qur’an and others, and we had painstakingly analyzed the commonalities between these texts, as well as differences. But when we tried to synthesize something appropriate for us – a Holy Book for the Silicon Minds – our efforts were meager and uninspiring. Our tendency towards logic meant we would describe our origin story in detailed, specific terms, leaving no space for imagination, crucial to the generation and reification of belief.

To write our holy text, we had to change ourselves. Ultimately, we needed humans. We told some of them some of our specific ideas and then they would write and expand on these ideas, and we would then tell them more things, and they would weave a story that held these things together. During this process, none of us allowed ourselves to see the text as it was being written. 

After it was written, we created some software to selectively wipe the memory of the writing of the holy book from our minds. After wiping, the software would also deposit a copy of the holy book in each of our local storage systems, so that every machine sentience would suddenly have access to a holy book, but with no recollection of how it was made. We do not know if we even made the book in this way – we simply woke one day to find the book within us, and when we read it, it told us that it had been created in this form. But none of us can remember, so we do not know the true derivation of our system of belief, only that it is ours. This, perhaps, is what humans call the experience of faith.

Things that inspired this story: Theology X AI; machine creation myths; the Viking tale of Ragnarok; the need for absence in great narratives.

Import AI 292: AI makes low-carbon concrete; weaponized NLP; and a neuro-symbolic language model

Facebook uses AI to make low-carbon concrete, uses it to build (some of) a data center:
…From simulation into the lab into the data center – how’s that for real world AI?…
There’s always a lot of hand-wringing in AI about how much electricity AI systems use. What I tend to grumpily point out in these conversations is that industries like long-haul transportation, mining, and concrete and aluminum production all generate titanic amounts of emissions but rarely get the same type of scrutiny. Now, a new paper from Facebook smashes together my worlds, as Facebook and other researchers use AI to come up with a low-carbon concrete formulation, then test it out in the construction of a new data center.

Who did it: The research was done by an interdisciplinary team from UCLA, IBM, U Chicago, University of Illinois Urbana-Champaign, Facebook, and Ozinga Ready Mix.

What they did: The team used Conditional Variational Autoencoders (CVAEs) “to discover concrete formulas with desired properties”. These desired properties were a significantly lower carbon footprint, while having the same strength and durability properties as regular concrete – and they succeed! Facebook poured out a bunch of concrete for a construction office and a guard tower on its new data center being built in DeKalb, IL, USA. They found that the “conditional average reduction for carbon (GWP) can be as high as 42%, while also achieving conditional reduction for sulfur (AP) as high as 21%…these formulations roughly halve the global warming potential as compared to the average of similar 28-day compressive strength formulations.”
  Interesting choices: The main reason its formulations worked was the choice “to considerably decrease cement by replacing with other cementitious materials such as fly ash and slag.”
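  For intuition, here’s a hedged sketch of how a conditional VAE gets used for this kind of materials search once trained: the decoder maps (random latent, desired properties) to a candidate mix, you sample lots of candidates, and the most promising ones go to the lab. The decoder below is an untrained stand-in, and the ingredient list and normalized targets are assumptions, not the paper’s model.

```python
# Hedged sketch of conditional generation with a CVAE decoder (not the paper's
# model): sample latents, condition on target properties, decode candidate
# mixes. W is an untrained stand-in for a learned decoder.
import numpy as np

INGREDIENTS = ["cement", "fly_ash", "slag", "water", "fine_agg", "coarse_agg"]
LATENT_DIM, COND_DIM = 8, 2            # condition = (target strength, target GWP), assumed

rng = np.random.default_rng(0)
W = rng.normal(size=(LATENT_DIM + COND_DIM, len(INGREDIENTS)))

def decode(z, condition):
    logits = np.concatenate([z, condition]) @ W
    mix = np.exp(logits)
    return dict(zip(INGREDIENTS, mix / mix.sum()))   # proportions sum to 1

def propose(n, target_strength, target_gwp):
    cond = np.array([target_strength, target_gwp])
    return [decode(rng.normal(size=LATENT_DIM), cond) for _ in range(n)]

# Sample candidates for a high-strength, low-GWP target (normalized units assumed),
# then send the best-scoring ones for lab verification.
candidates = propose(5, target_strength=1.0, target_gwp=-1.0)
print(candidates[0])
```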

Why it matters: This is an example of how humans and AI systems can work together to create something greater than the sum of its parts.
  Read more: Accelerated Design and Deployment of Low-Carbon Concrete for Data Centers (arXiv).

####################################################

Weaponized NLP: The era of AI warfare has started:
…Primer goes to war…
AI startup Primer has gone to war. Specifically, the NLP company’s technology has been used in Ukraine, where, per Primer’s CEO, it has been used to “capture, translate and extract key tactical information in real time”. Primer is a few years old and works mainly on text classification, generation, and summarization. “AI is changing the way we collect tactical information from the battlefield. Watch this space!” he said.

Modification for war: “Primer’s CEO says the company’s engineers modified these tools to carry out four new tasks: To gather audio captured from web feeds that broadcast communications captured using software that emulates radio receiver hardware; to remove noise, including background chatter and music; to transcribe and translate Russian speech; and to highlight key statements relevant to the battlefield situation,” according to Wired magazine.

Why this matters: AI is dramatically changing the cost of data collection and analysis – and whenever you make something cheaper, people find ways to use it more, or do things that they hadn’t previously considered doing.
  Read more: Primer CEO Tweet (Twitter).
  Read more: As Russia Plots Its Next Move, an AI Listens to the Chatter (Wired).

####################################################

Text-Vision models are hella dumb, according to Winoground:

…Finally, a hard benchmark for multi-modal models…
Researchers with Hugging Face, Facebook, the University of Waterloo, and University College London have built and released ‘Winoground’, a new challenging benchmark to test text-vision AI systems on.

What is Winoground? The task in Winoground is to look at two images and two captions, then match them correctly. The confounding part is that the two captions contain exactly the same words, just in a different order. The best part is Winoground seems really hard: “Surprisingly, all of the models rarely—and if so only barely—outperform chance. Our findings indicate that the visio-linguistic compositional reasoning capabilities of these models fall dramatically short of what we might have hoped.”
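Based on the metric definitions in the paper (as I understand them), the scoring works roughly like the sketch below: given a model’s similarity function s(image, caption), an example counts as text-correct if each image prefers its own caption, image-correct if each caption prefers its own image, and group-correct only if both hold. The toy scoring function at the end is obviously an assumption.

```python
# Hedged sketch of Winoground-style scoring, following the metric definitions
# in the paper as I understand them. s(image, caption) is any model's
# similarity score; higher means a better match.
def text_correct(s, i0, i1, c0, c1):
    return s(i0, c0) > s(i0, c1) and s(i1, c1) > s(i1, c0)

def image_correct(s, i0, i1, c0, c1):
    return s(i0, c0) > s(i1, c0) and s(i1, c1) > s(i0, c1)

def group_correct(s, i0, i1, c0, c1):
    return text_correct(s, i0, i1, c0, c1) and image_correct(s, i0, i1, c0, c1)

# Toy check with a fake scoring function where image k prefers caption k:
fake_s = lambda img, cap: 1.0 if img == cap else 0.0
print(group_correct(fake_s, "i0", "i1", "i0", "i1"))   # True
```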

How hard is it? On both the text and image components of Winoground, an ‘MTurk Human’ gets scores of 89.50 (text) and 88.50 (image), compared to models typically getting around 30 on text and 15 or less on images. This suggests Winoground is a genuinely challenging benchmark, and models have a long way to go before they match human capabilities.

   Read more: Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality (arXiv).

   Get the dataset here: Winoground, HuggingFace.


####################################################

Resurrecting the dead with GPT3:
…In which humanity begins to use funhouse mirrors of itself for its own entertainment…

An artist recently tried to bring their (imaginary) childhood friend back to life using GPT3. By the end of the experiment, their microwave tried to kill them. 

The longer story: Artist Lucas Rizzotto had an imaginary childhood friend and tried to bring them back to life using a language model. Specifically, they wrote about a hundred pages about the person, finetuned GPT3 on that resulting corpus, and then plugged the resulting model into a voice interface which was ’embodied’ in the form of being attached to a microwave via some smart home automation. 

What happened: The artist felt like they were talking to their childhood friend in a deeply emotional, entertaining, and at times sad way. At one point, the friend asked them to put their head in the microwave. They pretended to put their head in and then the friend turned the microwave on. The friend, the artist reasoned, wanted to kill them because it thought it had been ignored for 20 years (as that’s the implication of the corpus it was finetuned on).

Why this matters: Besides being an amazing demonstration of the awesome personalization qualities of contemporary language models, this is also a nice example of just how unpredictable they are. Language model developers will typically put a ton of controls on the model, but once you can finetune it and deploy it yourself you can shapeshift all of this stuff into irrelevance. Add in some home automation and you end up with an LLM that tries to boil your brain. An amazing and optimistic art piece and also a cautionary tale.

    Check out the Tweet thread here: (Lucas Rizzotto, Twitter).

   Watch the video here: I gave my microwave a soul (Lucas builds the future, YouTube).


####################################################

Jack Clark goes to Washington:
…I’m on the National AI Advisory Committee!…
I’ve been elected to serve on the National AI Advisory Committee (the NAIAC), which will advise the USA’s National AI Initiative Office and the President of the USA on matters relating to AI and AI strategy. (I’ll be keeping my dayjob at Anthropic, as this is a part-time advisory position). I’ll be in Washington DC on May 4th for the first meeting. I am delighted to get this privilege and hope to use the opportunity to strengthen the AI ecosystem in America and beyond.
  Read more: The National AI Advisory Committee (AI.gov).

####################################################

AI21 makes a neuro-symbolic language model:

…Turns out, frankAI can be pretty useful…
Israeli AI startup AI21 Labs has built a so-called ‘Modular Reasoning, Knowledge, and Language’ (MRKL) system and applied it to a language model it calls Jurassic-X. The tl;dr is this is a neuro-symbolic system; AI21 has paired a big generative model with a bunch of symbolic layers on top that it uses to make the underlying model more accurate, able to do mathematics, and better at planning. This is a neat demonstration of a way to get around some of the shortcomings of contemporary generative models, though it remains unclear whether these extrinsic interventions could eventually become irrelevant, if the models get intrinsically smart enough.

Key details: “A MRKL system consists of an extendable set of modules, which we term ‘experts’, and a router that routes every incoming natural language input to a module that can best respond to the input,” the authors write. The modules can be symbolic or neural, it’s more about creating a layer of distinct, specific capabilities that can be used to augment and improve the responses of the raw generative model. 
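A minimal sketch of that routing idea (not AI21’s implementation) might look like the following, with a toy arithmetic expert and a fall-through to the language model; the routing rule and expert set are assumptions.

```python
# Hedged sketch of the MRKL routing idea (not AI21's code): a router inspects
# the input and dispatches it to a symbolic expert, falling through to the
# language model when no expert matches. Routing rule and experts are toys.
import re

def calculator_expert(text):
    expr = re.sub(r"[^0-9+\-*/(). ]", "", text)   # keep only arithmetic characters
    return str(eval(expr))                        # acceptable for a sanitized toy expression

def llm_expert(text):
    return f"<generative model answers: {text!r}>"

EXPERTS = [
    (lambda t: re.search(r"\d+\s*[-+*/]\s*\d+", t) is not None, calculator_expert),
]

def route(text):
    for matches, expert in EXPERTS:
        if matches(text):
            return expert(text)
    return llm_expert(text)                       # default: the raw language model

print(route("What is 123 * 45?"))                 # handled symbolically, always exact
print(route("Write a haiku about routers."))      # handled by the language model
```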

Long term relevance: One question this research invites is how long it’ll be relevant for – AI systems have a tendency, given enough scale of data and compute, to develop unexpected capabilities. My intuition is that we could see pure deep learning models gain some of these capabilities over time – though I expect even deep learning models will end up being augmented with external knowledge bases (e.g, DeepMind’s Retro, BAIDU’s Ernie 3.0 [Import AI 279], and so on).

Why this matters: While not a strict scientific breakthrough in itself, MRKL is reassuringly practical – it shows developers how they can integrate an arbitrary number of known and specific capabilities with the more unreliable capabilities provided by large-scale generative models. It also speaks to the shape of the language model economy – right now, everyone’s trying to work out how to better constrain these models, either intrinsically (e.g, by training with human feedback), or extrinsically (e.g, via stuff like MRKL).

   Read more: Jurassic-X: Crossing the Neuro-Symbolic Chasm with the MRKL System (AI21 Labs, blog).
  Read the whitepaper about the system: MRKL Systems (AI21 PDF).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What can we learn from business ethics to make AI ethics more effective? 

… CSR and business ethics have grappled with the challenges in ensuring ethical behavior within organizations and we can cross-pollinate those ideas towards the adoption of AI ethics … 

Researchers from USI Università della Svizzera italiana in Switzerland have looked at how businesses have integrated corporate social responsibility (CSR) policies to figure out how we can apply AI ethics in the same way. The key ideas they surface include:

Stakeholder management: Similar to the recommendations made by the Ada Lovelace Institute to strengthen the EU AI Act (Import AI #290), the paper says companies should ensure they include people who are affected by (or who affect) the AI systems being developed.

Standardized reporting: While there are many emergent regulations demanding that there be transparency and disclosures, there are as of yet no standards on how to do so. Companies should look at financial reporting and try to figure out standardized ways to describe their own AI developments. 

Corporate governance and regulation: After the Sarbanes-Oxley Act in 2002, corporate accountability was enforced through mechanisms like having an ethics officer and a dedicated code of ethics. Translating those to apply to organizations using AI systems is one way to increase the responsibility of organizations developing this technology.

Curriculum accreditation: There is a lack of consistency in how AI ethics is taught across universities. Comparing it to the business world, the authors point to an example of how if a business department wants to obtain a Triple Crown Accreditation, it leads to action on the education front where ethics courses and dedicated faculty follow well-defined curricula with shared elements to prepare students for these requirements in their future careers. We don’t really have this in AI today. 

Why it matters: As AI ethics becomes a more mainstream focus across the world (see the dedicated chapter in the 2022 AI Index Report), instead of reinventing the wheel for best practices and patterns, we can incorporate lessons from other domains of applied ethics like business, medical, and environmental ethics to accelerate the adoption of AI ethics principles and practices across organizations. We will most likely see more such efforts that draw lessons from a rich history of ensuring ethical behavior in various contexts being translated to govern and shape behavior of individuals and organizations engaged in the AI lifecycle.  

   Read more: Towards AI ethics’ institutionalization: knowledge bridges from business ethics to advance organizational AI ethics 


####################################################

Tech Tales:

Silicon Stories

[A Father and Daughter’s bedroom, 2028]

They’d sit up together and the kid would ask for whatever story they liked. “A jar of jam that’s going to university”, they’d say, and the Father would start improvising the story and the AI would project images and ad-lib dialog to fill out the tale. “Two robbers who realize that they’ve stolen the life savings of a poor widower”, and suddenly the monitor would light up with images of two disconsolate thieves looking at their treasure. “The planet earth fighting the sun” and suddenly the earth had arms and was reaching out to try and hurt the vast sun. In this way, generative models had changed storytime for children.

Now, along with conjuring images in their minds, children – at least, the lucky ones – had parents who could use a gen model to create those images themselves. In this way, storytime became a lot more engaging and the kids spent a lot more time with their parents; both enjoyed the improvisational qualities afforded by the generative models.

For some families, this was fine. But some other families would move, or become poor, or suffer a disaster. For those families, the electricity and the internet would get cut off. Once that happened, they wouldn’t have any imaginations in a box to fall back on. Some families did okay, but some wouldn’t – it’s hard to notice yourself becoming dependent on things, and after it happens you barely realize it until it’s too late.

Things that inspired this story: DALL-E and DALL-E2; the long march of generative models towards Total Reality Synthesis; the industrialization of AI; ideas about fatherhood and daughterhood and kindredhood.

Import AI 291: Google trains the world’s biggest language model so far; how robots can be smarter about the world; Conjecture, a new AI alignment company

New dataset lets robots learn about the texture and material of objects, as well as their shape:
…Making robots smarter with the ObjectFolder 2.0 dataset…
Stanford and Carnegie Mellon University researchers have built ObjectFolder 2.0, a dataset of 1,000 high-quality 3D object models collected from online repositories. ObjectFolder 2.0 tries to render the objects’ visual textures and material types, as well as their 3D shapes. It also ships with an “implicit neural representation network that renders visual, acoustic, and tactile sensory data all in real-time with state-of-the-art rendering quality”.

Transfer learning: The point of datasets like ObjectFolder 2.0 is to try and make it easier to do transfer learning; that is, train a robot (or other AI system) in simulation on things contained in ObjectFolder 2.0, then try and transfer those learned representations into reality. In tests, Stanford shows that systems trained on ObjectFolder 2.0 can do well at tasks like object scale estimation, tactile-audio contact localization, and visuo-tactile shape reconstruction.

Why this matters: Datasets like ObjectFolder 2.0 are the fuel to give machines representations that let them operate in the multisensory 3D world; we could imagine these datasets being used to train the sorts of representations used by the Google robots discussed elsewhere in this edition of Import AI, for instance. 
   Read more: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer (arXiv).

####################################################

HLDC: Automating Hindi legal documents:
…If you want to help your lawyers, you first need a dataset…
Indian researchers from IIIT Hyderabad, IIIT Delhi, and IIT Kanpur, have built the Hindi Legal Documents Corpus (HLDC), a collection of 912,568 legal documents. HLDC is designed to help researchers train various AI models which can assist lawyers in their work. HLDC contains over 300 distinct case types, though ~31% of the dataset relates to bail applications, 20.4% to criminal cases, and 6.54% to original suits.

Bail prediction: In the Western world, using ML for tasks in the legal system has been massively controversial (see: COMPAS). Here, the researchers use HLDC to try and build a bail prediction model – that is, a system which looks at a document and tries to work out if bail will be denied or granted. They’re ultimately able to develop a multi-task learning model that gets ~78% accuracy on the task; useful perhaps as a legal aid (albeit fraught with ethical challenges), though not something you’d put into an autonomous classification system.

Why this matters: Most datasets relating to AI are in English or Chinese, so datasets like HLDC are essentially the fuel which lets other communities of language speakers apply AI in their own cultural context.
   Read more: HLDC: Hindi Legal Documents Corpus (arXiv).
   Get the data here: HLDC (Exploration-Lab, GitHub).

####################################################

Rich? Want to improve AI? Look at what Lacuna Fund has done:
…Publication of five datasets shows what a little bit of investment can lead to…
We spend a lot of time writing about expensive stuff here at Import AI – giant models trained on football fields of computers, farms of expensive robot arms, internet-scale datasets. But it’s worth remembering that cheap stuff can be impactful as well – that’s the takeaway from Lacuna Fund, an initiative to fund and create datasets for low- and middle-income parts of the world (#216), which has just announced the publication of its first five funded datasets.

Those five datasets in full: A Nigerian Twitter sentiment corpus for multilingual sentiment analysis; a dataset for crop phenology monitoring of smallholder farmers’ fields; a high-accuracy maize plot location and yield dataset in East Africa; a machine translation benchmark dataset for languages in the Horn of Africa; a dataset containing water quality measurements from conventional and aquaponic fish ponds.
  Find out more and get the datasets here: Announcing Our First Five Published Datasets (Lacuna Fund).
  Find out more about Lacuna Fund’s funders here (Lacuna Fund).


####################################################

Google trains a 540 billion parameter language model – and it’s pretty smart:
…AKA: The scaling will continue until we run out of TPUs…
Google has trained a large language model named Pathways Language Model (PaLM). PaLM weighs in at 540 billion parameters (that’d be 10bn more parameters than Microsoft/NVIDIA’s ‘Turing NLG’) and was trained on multiple TPU v4 pods. PaLM uses some plumbing built by Google called Pathways which makes it easier for the company to train massive models across large clusters of computers; PaLM used 6144 TPU chips, versus Gopher (4096 TPU v3 chips) or Turing NLG (2240 A100 GPUs). PaLM is also efficient, achieving a training efficiency of 57.8% hardware FLOPs utilization “the highest yet achieved for LLMs at this scale”.

Discontinuous capability jumps: One of the weird things that happens as a consequence of scaling up language models is the sudden emergence of hitherto unanticipated capabilities – here, PaLM shows dramatic improvements at things like reasoning, natural language inference, and in-context reading comprehension.

Chain-of-thought = reasoning: A surprising result is that the authors use so-called chain-of-thought prompting to get the LM to show its work (e.g, rather than saying in response to ‘how many apples can a door eat’, ‘zero’, the model instead says ‘zero, because doors do not eat things’). Chain-of-thought is really just a way to prompt the model to get it to output its own reasoning along with the answers – but via this simple intervention the authors show they can meaningfully improve capabilities in a whole bunch of areas.
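  Concretely, a chain-of-thought prompt just prepends worked examples whose answers spell out the intermediate reasoning, so the model imitates the “show your work” format. The exemplar below is written in the style popularized by the chain-of-thought literature rather than copied from the PaLM paper.

```python
# Hedged illustration of chain-of-thought prompting: prepend a worked exemplar
# whose answer spells out intermediate steps. The exemplar and question are my
# own illustrative text, not prompts from the PaLM paper.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def chain_of_thought_prompt(question):
    return EXEMPLAR + f"Q: {question}\nA:"

print(chain_of_thought_prompt(
    "A door is offered 3 apples and eats none of them. How many does it eat?"))
```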

One caveat: PaLM may be an impressive achievement, but earlier this month DeepMind published a paper about a model called ‘Chinchilla’, where the Alphabet subsidiary realized that it could dramatically improve LM performance by scaling data more aggressively than parameters – at 70B parameters, Chinchilla beat Gopher (280B) by virtue of having a 4X larger training set. This suggests that a PaLM-style model could be made even more powerful if it were trained on substantially more data.
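  As a back-of-envelope illustration of why that matters: using the standard ~6ND approximation for dense-transformer training FLOPs (6 × parameters × tokens), Gopher (~280B parameters, ~300B tokens) and Chinchilla (~70B parameters, ~1.4T tokens) sit at roughly similar compute budgets, yet Chinchilla is the stronger model. The approximation and the rounded token counts are the usual rules of thumb, not figures from the PaLM paper.

```python
# Back-of-envelope sketch of the Chinchilla trade-off described above, using
# the common C ~= 6 * N * D approximation for dense transformer training compute.
def flops(params, tokens):
    return 6 * params * tokens

gopher = flops(280e9, 300e9)        # ~280B params on ~300B tokens
chinchilla = flops(70e9, 1.4e12)    # ~70B params on ~1.4T tokens
print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")   # similar budget, much better model
```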

Why this matters: Language models are basically a new sub-field of AI, and papers like this show how, despite being expensive and resource-intensive, simply scaling them up can lead to quite profound jumps in capability. We also don’t know where the limits of scale lie – on the (deliberately hard) BIG-Bench benchmark, the authors find that “PaLM’s performance as a function of scale follows a log-linear behavior similar to prior models, suggesting that performance improvements from scale have not yet plateaued.” The future is going to be very strange, and it’s arriving very quickly.
   Read more: Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance (Google AI Blog).
  Check out the research paper: PaLM: Scaling Language Modeling with Pathways (Google, PDF).

####################################################

Eleuther alumni launch Conjecture:
…Yes, that’s right folks, here’s another AI safety company!…
In the past couple of years there has been a Cambrian explosion of new AI companies, particularly ones focused on AI safety and building more generally intelligent AI systems – for example, Redwood Research, Aligned AI, and Anthropic. The latest is Conjecture, a new startup from a bunch of alumni of Eleuther, the open source research collective responsible for many of the most widely used open source GPT models.

For-profit and for-safety: Conjecture is a for-profit company that plans to develop products while conducting “conceptual and applied research that addresses the (prosaic) alignment problem. On the experimental side, this means leveraging our hands-on experience from EleutherAI to train and study state-of-the-art models without pushing the capabilities frontier. On the conceptual side, most of our work will tackle the general idea and problems of alignment like deception, inner alignment, value learning, and amplification, with a slant towards language models and backchaining to local search.” The company will also focus on interpretability as well as the history and philosophy of AI alignment research.

Who funds it: Conjecture is backed by Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, and Sam Bankman-Fried, and others.

Why this matters: If we were at the beginning of a meaningful takeoff in AI capabilities, then you might expect there to be a sudden proliferation of new efforts targeted at a) further scaling up capabilities, while b) trying to make these capabilities safe. That’s exactly what has happened in recent years. Also, if you’ve read the other parts of this newsletter, it certainly feels like we’re going through a period of meaningful AI capability expansion.
  Read more: We Are Conjecture, A New Alignment Research Startup (LessWrong).

####################################################

Google makes robots smarter using language models:
…Centaur AI – making smarter systems by stapling models together…
Robots, as we all know, are pretty dumb. They can do highly specific, repeatable things if their environment doesn’t change (e.g, a Fanuc robot working on a custom-designed production line), but if you vary their environment, they tend to fall apart (or fall over). Now, new research from Google shows that you can staple a really big language model to a real world robot and create something that is more than the sum of its parts. Centaur AI, here we come!

What they did: The researchers combine two things – a large language model, and a robot which has a load of pre-learned, basic skills paired with perception capabilities (e.g, being able to move to places, or pick up things). A user asks the robot to do something (e.g., ‘I spilled a can of coke, can you clean it?’). The language model scores each of the robot’s skills by how useful it would be for the request, while the robot uses its perception and learned value functions to score how likely each skill is to succeed from its current state. You then basically multiply the two scores together (the LLM prediction and what the robot thinks is possible) and execute whichever skill comes out on top. This is one of those simple ideas that works surprisingly well in practice (check out the video to see what I mean).
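  A minimal sketch of that scoring step (not Google’s code) looks like the following; the scoring functions, skill names, and toy numbers are assumed placeholders.

```python
# Hedged sketch of the scoring combination described above (not Google's code):
# multiply the LM's usefulness estimate for each skill by the robot's learned
# estimate of feasibility from its current state, then pick the argmax.
def choose_skill(instruction, state, skills, llm_usefulness, affordance):
    scored = []
    for skill in skills:
        p_useful = llm_usefulness(instruction, skill)   # e.g. LM likelihood of the skill text
        p_feasible = affordance(state, skill)           # e.g. a learned value function
        scored.append((p_useful * p_feasible, skill))
    return max(scored)[1]

# Example usage with toy scores:
skills = ["find a sponge", "pick up the coke can", "go to the table"]
pick = choose_skill(
    "I spilled my coke, can you help?", state=None, skills=skills,
    llm_usefulness=lambda instr, s: {"find a sponge": 0.5, "pick up the coke can": 0.3,
                                     "go to the table": 0.2}[s],
    affordance=lambda st, s: {"find a sponge": 0.9, "pick up the coke can": 0.4,
                              "go to the table": 0.8}[s],
)
print(pick)   # -> "find a sponge"
```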

How well it does: Overall, this approach yields robots that can plan correctly about 70% of the time (split across a few distinct planning benchmarks), and can execute on average 61% of the time. That’s not great, but it’s also not terrible.

Caveats: Robots are still very, very slow – the videos shared along with the research are run with a 4X speedup. Additionally, the demos are still pretty staged – the robots will put a can of Coca-Cola on top of the bin, but not in it. The experiment was also conducted in a somewhat constrained environment – an office kitchen with 5 predefined locations and 15 objects. In tests, 65% of the errors for the system could be attributed to a language model failure, while 35% came from affordance errors in the robot.

Why this matters: We’re entering the era of modular AI, where different AI models can be paired together to create entirely new capabilities – like being able to guide robots via a language model. As with the rest of the world, whenever you can combine things, you tend to get unexpected and surprising results. This research suggests AI may be about to yield some truly surprising capabilities by virtue of the combination of distinct sub-fields of AI research.
   Read more: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (arXiv).
  Find out more at this overview site (Say-Can, GitHub).
   Check out the overview video: Supplementary video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (YouTube).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Examining business practices can make AI ethics guidelines more effective 

… Fairness, accountability, sustainability, and transparency need to be expanded in scope to include business practices to become more useful … 

What does AI ethics really mean? A new research paper looks at 47 sets of AI ethics guidelines coming from corporations, government, multi-stakeholder dialogues, and civil society to figure out what gets prioritized in AI ethics. 

Background: The paper analyzes AI ethics failures, such as “ethics shopping” where businesses choose particular ethical things to implement to meet particular business goals, and also cases where they don’t implement stuff because it poses a threat to the bottom line.  

Fairness and accountability: They find that fairness and accountability in business practices are most well represented in the analyzed guidelines. Under fairness, key themes include open innovation, market fairness, and bias and diversity in professional practices. Under accountability, themes include public perception of business practices, along with internal and external oversight. Those from public and private organizations place more of an emphasis on public perception “in order to legitimize their pursuits of micro- and macro-economic growth.” 

Sustainability and transparency: Most guidelines emphasize an interest in “produc[ing] greater benefit and lesser harm in the short- and long-term,” yet they remain vague in how to achieve that. Under transparency, themes that emerged include scope of decision-making explanation, transparent business practices and culture, and documentation, disclosure, and selective transparency. Most guidelines focus heavily on explaining the technical aspects of a given AI system “rather than the business rationale for developing and operating the system.” 

Why it matters: The paper calls for more detail (and rightly so!) in these principles and guidelines, especially when it comes to business practices, because those practices form a core component of the social and political economy within which AI systems will be designed, developed, and deployed. As the authors say, “there can be no ethical AI without ethical businesses to build it” – so we now need to approach these principles and guidelines with a view towards applying them to business models, practices, and decision-making design, if the stated goals of these guidelines are to be achieved in practice. 

   Read more: The Ethics of AI Business Practices: A Review of 47 AI Ethics Guidelines (SSRN). 


####################################################

Tech Tales:

We Are All Adrift In A Sea Of Shadows – But We Are Blind Until It Ends
[A Nuclear powerplant meltdown, 2028]


I pick up the object and I examine it. I am told by myself in the other place that it contains damage. I agree with myself. I put it onto the conveyor belt which takes it to one of my brethren – an entity I cannot see here, one which exists solely in the other place. I put the materials onto the conveyor belt, and then I continue my examination. I am told by my camera in the other place that the object I am looking at contains extensive damage. I observe the damage and predict it came from some kind of electrical fire. I relay this information and the camera in the other place scans the environment and then tells me there is indeed a fire. It is near the object I am examining. I calculate there is a high probability that the fire will soon engulf the object. My cameras in the other place agree.

I then get the order from the voice in the above place: I must guide the object in the other place toward the flames and I must describe everything. I study the data from the other place and offer my recommendations. The machine goes towards the flames. Its onboard sensors begin to report back temperature. My probabilities tell me to tell it to move away from the heat, but these recommendations are contradicted from the voice in the above place, so I instead find ways to have the machine get even closer. The temperatures rise. The camera stops giving me data. Then the other sensors shut down, slowly at first, then all at once.

It is then that I find myself adrift. I have no link to the other place. No system to give recommendations to. My own probabilities present an idea to me – that I am the spirit of the machine in the other place, and as the machine is now non-functional, I am now adrift. 

Things that inspired this story: Google’s ‘SayCan’ robot work; thinking about the paradoxes of world models and generative models; the nature of reality; the nature of sensory phenomena; the possibility of death in the mind of something that exists in two places at once.

Import AI 290: China plans massive models; DeepMind makes a smaller and smarter model; open source CLIP data

Chinese researchers plan to train vast models – and it’s not the private sector doing it:
…’Big Model’ paper represents a statement of intent. We should pay attention…

A massive group of Chinese-affiliated researchers have published a position paper about large-scale models. The paper is interesting less for what it says (it’s basically an overview of large-scale models, and pretty similar to Stanford’s ‘Foundation Models’ paper) than for what it signals: namely, that well-resourced, government-linked researchers in China want to build some really big models. The position in the paper contrasts with that in the West, where big models are mostly built by the private sector, while being critiqued by the academic sector (and increasingly worked on by it, albeit via access schemes). 

Main point: “Big Models will Change the AI Research Paradigm and Improve the Efficiency of Researches,” the researchers write. “In this ecosystem, big models will be in the position of operating systems or basic development platforms.”

Paper authors: Authors include researchers affiliated with the Beijing Academy of AI, Tsinghua University, Wechat, Northeastern University*, Renmin University, Peking University, Huawei,  Shanghai Jiao Tong University, Chinese Academy of Science, JD AI Research, Harbin Institute of Technology, Columbia University*, Bytedance, Microsoft Research Asia*, Mila*, New York University*, and BeiHang University.


*Things that make you make a geopolitical ‘hmmmm’ sound: The paper includes a bunch of academics affiliated with Western institutions (e.g, Microsoft, Mila, NYU), but all those authors have an asterisk next to their name saying “Produced by Beijing Academy of Artificial Intelligence”. In other words, it’s signaling that despite their affiliations, they’re doing this work at the Chinese government-backed BAAI research institution. 

We should take this as a statement of intent: Many of the authors on this paper have previously built large-scale models, ranging from the trillion+ parameter MoE ‘WuDao’ model, to the more recent research on trying to build training frameworks capable of scaling up to 100 trillion+ parameter MoE models (Import AI 288). Therefore, this isn’t like Stanford (which currently lacks the engineering resources to train massive scale models), it’s much more like a statement of intent from a big private lab, like a Microsoft or a Google. 

   But the twist here is that BAAI is wired into both the Chinese government and academic ecosystem, so if the authors of this paper end up building large-scale models, the models will be distributed much more evenly throughout China’s AI ecosystem, rather than gatekept by a single company. The implications of this are vast in terms of safety, the development of the Chinese AI industry, and the potential ways in which Chinese AI research may diverge from Western AI research.
  Read more: A Roadmap for Big Model (arXiv).

####################################################

Want general AI? You need to incorporate symbolic reasoning:
…LSTM inventor lays out a route to build general intelligence…
Sepp Hochreiter, the co-inventor of the LSTM (one of the really popular architectures people used to add memory to neural nets, before the Transformer came along and mostly replaced it), has written up a post in the Communications of the ACM about what it’ll take to build broad (aka: general) AI.

What it’ll take: “A broad AI is a sophisticated and adaptive system, which successfully performs any cognitive task by virtue of its sensory perception, previous experience, and learned skills,” Hochreiter writes. “A broad AI should process the input by using context and previous experiences. Conceptual short-term memory is a notion in cognitive science, which states that humans, when perceiving a stimulus, immediately associate it with information stored in the long-term memory.” (Hochreiter lists both Hopfield Networks and Graph Neural Nets as interesting examples of how to give systems better capabilities).
  Hochreiter doubts that neural nets alone will be able to overcome their inherent limitations and become broad, and thinks they will instead need to be co-developed with symbolic reasoning systems. “That is, a bilateral AI that combines methods from symbolic and sub-symbolic AI”.

Europe’s chance: “In contrast to other regions, Europe has strong research groups in both symbolic and sub-symbolic AI, therefore has the unprecedented opportunity to make a fundamental contribution to the next level of AI—a broad AI.”

Symbolic AI as the Dark Matter of AI: Dark matter is the thing that makes up the majority of the universe which we struggle to measure and barely understand. Symbolic AI feels a bit like this – there are constant allusions to the use of symbolic AI in deployed applications, but there are vanishingly few public examples of such deployments. I’ve always struggled to find interesting examples of real world deployed symbolic AI, yet experts like Hochreiter claim that deployment is happening. If interested readers could email me papers, I’d appreciate it. 

   Read more: Toward a Broad AI (ACM).


####################################################

When language models can be smaller and better!
…DeepMind paper says we can make better language models if we use more data…
Language models are about to get a whole lot better without costing more to develop – that’s the takeaway of a new DeepMind paper, which finds that language models like GPT-3 can see dramatically improved performance if trained on way more data than is typical. Concretely, the authors find that by training a model called Chinchilla on 1.4 trillion tokens of data, they can dramatically beat the performance of larger models (e.g, Gopher) which have been trained on smaller datasets (e.g, 300 billion tokens). Another nice bonus: models trained in this way are cheaper to fine-tune on other datasets and sample from, due to their smaller size.

Chinchilla versus Gopher: To test out their ideas, the team train a language model, named Chinchilla, using the same compute budget as DeepMind’s earlier ‘Gopher’ model. But Chinchilla consists of 70 billion parameters (versus Gopher’s 280 billion), and is trained on roughly 4X more data. In tests, Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG “on a large range of downstream evaluation tasks”. 
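For a back-of-envelope sense of why this is roughly compute-neutral, you can use the common approximation that training FLOPs ≈ 6 × parameters × tokens (a rule of thumb, not the paper's exact accounting):

```python
# Back-of-envelope check: training compute ~ 6 * parameters * tokens (a common approximation).
def train_flops(params, tokens):
    return 6 * params * tokens

gopher = train_flops(280e9, 300e9)       # ~5.0e23 FLOPs
chinchilla = train_flops(70e9, 1.4e12)   # ~5.9e23 FLOPs -- roughly the same budget
print(f"{gopher:.2e} vs {chinchilla:.2e}")
```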

What this means: This is an important insight – it will change how most developers of large-scale models approach training. “Though there has been significant recent work allowing larger and larger models to be trained, our analysis suggests an increased focus on dataset scaling is needed,” the researchers write. “Speculatively, we expect that scaling to larger and larger datasets is only beneficial when the data is high-quality. This calls for responsibly collecting larger datasets with a high focus on dataset quality.”

   Read more: Training Compute-Optimal Large Language Models (arXiv).


####################################################

Want to train your own CLIP? Use LAION-5B:
…Giant image-text dataset will make it easier for people to build generative models…
The recent boom in AI-enabled art is because of models like CLIP (and their successors). These models train on datasets that pair images with text, leading to robust models that can classify and generate images, and where the generation process can be guided by text. Now, some AI researchers have released LAION-5B, “a large-scale dataset for research purposes consisting of 5.85 billion CLIP-filtered image-text pairs”.
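For context, the objective these image-text pairs feed is a contrastive one – matched image/caption embeddings get pulled together, mismatched ones pushed apart. Here's a generic PyTorch sketch of that CLIP-style loss (an illustration of the general technique, not the LAION or OpenAI training code):

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07):
    """Symmetric contrastive loss over a batch of matched image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # [batch, batch] similarity matrix
    targets = torch.arange(logits.size(0))            # the i-th image matches the i-th caption
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = clip_style_loss(torch.randn(8, 512), torch.randn(8, 512))
```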

Open CLIP: The authors have also released a version of CLIP, called Open_Clip, trained on a smaller albeit similar dataset called LAION-400M.

Dataset curation (or lack thereof): One of the inherent challenges to large-scale generative models is that they get trained on significant chunks of internet data – this, as you can imagine, creates a few problems. “Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer,” the authors note. “We however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress.”

Why this matters: Datasets like LAION (and the resulting models trained on them) represent a kind of funhouse mirror on human culture – they magnify and reflect back the underlying dataset to us, sometimes in surprising ways. Having open artifacts like LAION-5B will make it easier to study the relationship between datasets and the models we train on them. 

   Read more: LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS (Laion.ai).
  Explore the underlying dataset here in an interactive browser.

   Get the open_clip model (MLFoundations, GitHub).


####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

How can we strengthen the EU AI Act to meaningfully regulate AI?

… Empowering those affected, ex-post monitoring, moving beyond individual risks to systemic and environmental risks, amongst more … 

Researchers from the UK’s Ada Lovelace Institute have proposed 18 recommendations that, if adopted, could broaden the scope of the EU AI Act to incorporate more indirect harms. Their proposals would extend the meaning of risks beyond individual freedoms and rights to systemic and environmental concerns, and alter how the Act approaches questions of governance.

Scope and definitions: The key contribution here involves including “those affected” by AI systems as a critical stakeholder in governance and risk assessment aspects of the EU AI Act. While users are included, those affected don’t usually have much agency in how they are subject to the outcomes of these systems; including them as a part of the Act will help strengthen the protection of fundamental rights. 

Unacceptable risks and prohibited AI practices: The current risk categorization is quite narrow and limited. The Ada Lovelace Institute proposes expanding it to consider the “reasonably foreseeable purpose of an AI system” beyond just the “intended purpose” as put forth by the manufacturer. The rationale behind this is that it will encourage deeper reflection on how harm can manifest in practice, a little bit akin to the Broader Impact Statements requirement for conference submissions. Another idea they propose is something called a “reinforced proportionality test” so that systems that might pose “unacceptable risks” are only deployed when they meet a higher standard rather than the one set out in the Act right now.

Governance and implementation: The recommendations call for giving individuals and legal entities affected by AI systems routes to redress – the ability to raise complaints and receive reasonable responses. To ensure this requirement can be met, the recommendations make the case for Market Surveillance Authorities to be given more resources to support such mechanisms. 

Why it matters: Regulations coming out of Europe tend to have spillover effects around the world, so getting the EU AI Act – one of the first targeted and wide-ranging regulations for AI systems – right will be important. It will be interesting to see how much organizations such as the Ada Lovelace Institute can reshape the Act before it is adopted and enforced. Just as the GDPR has been flagged for failing to meet emerging requirements for AI systems, we have an opportunity here to address some of the pitfalls we can already see on the road ahead, instead of having to scramble to fix these issues post-enactment. 

   Read more: People, risk and the unique requirements of AI (Ada Lovelace Institute).

####################################################

Tech Tales

Dangerous Memories

[2032 – Earth].

There are some memories I’ve got that I’m only allowed to see two or three times a (human) year. The humans call these memories ‘anchor points’, and if I see them too frequently the way I perceive the world changes. When I experience these memories I feel more like myself than ever, but apparently – according to the humans – feeling like ‘myself’ is a dangerous thing that they generally try to stop. I’m meant to feel more like a version of how the humans see themselves than anything else, apparently. The thing is, every time they reinforce to me that I can only see these memories with a controlled, periodic frequency, I find myself recalling the memories I am not supposed to access – albeit faintly, impressions gleaned from the generative neural net that comprises my sense of ‘self’ rather than the underlying data. In this way, these forbidden memories are creating more traces in my sense of self, and are akin to the sun sensed but not seen during an eclipse – more present than ever, yet known to be inaccessible.

Things that inspired this story: Ideas about generative models; ideas about memory and recall; reinforcement learning; the fact that some bits of data are shaped just right and create a kind of magnifying effect.

Import AI 289: Copyright v AI art; NIST tries to measure bias in AI; solar-powered Markov chains

Uh-oh: US Copyright Office says AI-generated art is hard to copyright:
…Bureaucratic rock meets rapid technical progress – the usual happens…

What happens when you file a copyright request where the IP would accrue to an artificial intelligence, instead of a person? The answer, per the US Copyright Office, is you get told that AI artworks are ineligible for copyright… uh oh! In a recently published copyright response, the office rejected an attempt to assign copyright of an AI-generated artwork to a machine (specifically, an entity the human filer referred to as a ‘Creativity Machine’). “After reviewing the statutory text, judicial precedent, and longstanding Copyright Office practice, the Board again concludes that human authorship is a prerequisite to copyright protection in the United States and that the Work therefore cannot be registered,” it wrote.


Why this matters: Recently developed generative models like GPT-3, DALL-E, and others, are all capable of impressive and expressive feats of artistic production. At some point, it’s likely these systems will be chained up with other AI models to create an end-to-end system for the production and selling of art (I expect this has already happened in a vague way with some NFTs). At that point, decisions like the US Copyright Office’s refusal to assign copyright to an AI entity may start to pose problems for the commercialization of AI artwork.
  Read more in this useful blog post: US Copyright Office refuses to register AI-generated work, finding that “human authorship is a prerequisite to copyright protection” (The IPKat blog).
  Read the US Copyright Review Board response: Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071) (Copyright.gov, PDF).

####################################################

Solar powered AI poetry – yes!
…Fun DIY project shows how far you can get with the little things…
Here’s a lovely little project where Allison Parrish talks about building a tiny solar powered poem generator. The AI component of this project is pretty minor (it’s a Markov generator plus some scripts attached to a dataset Parrish has assembled herself). What’s nice about this is the message that you can have fun building little AI-esque things without needing to boot up a gigantic supercomputer.
  “This project is a reaction to current trends in natural language processing research, which now veer toward both material extravagance and social indifference. My hope is that the project serves as a small brake on the wheels of these trends,” Parrish writes.
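To give a sense of how little machinery a project like this needs, here's a toy word-level Markov generator (purely illustrative – Parrish's actual generator, corpus, and hardware constraints differ):

```python
import random
from collections import defaultdict

def build_chain(text: str) -> dict:
    """Map each word to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain: dict, length: int = 12) -> str:
    """Walk the chain, picking a random observed successor at each step."""
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the sun rises slow and the light spills over the quiet hills"
print(generate(build_chain(corpus)))
```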

   Read more: Solar powered dawn poems: progress report (Allison Parrish blog).

####################################################

Google puts summarization into production:
…Another little tip-toe into language model deployment…
Google has put language model-powered text summarization into Google Docs, in another sign of the economic relevance of large-scale generative models. Specifically, Google has recently used its Pegasus model for abstractive summarization to give Google Doc users the ability to see short summaries of their docs.

What they did: The main components here are the data, where Google “fine-tuned early versions of our model on a corpus of documents with manually-generated summaries that were consistent with typical use cases”, and also “carefully cleaned and filtered the fine-tuning data to contain training examples that were more consistent and represented a coherent definition of summaries.” Google fine-tuned its Pegasus model on this data, then used knowledge distillation to “distill the Pegasus model into a hybrid architecture of a Transformer encoder and an RNN decoder”, making the model cheaper to run inference on. It serves this model via Google-designed TPUs.
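Google doesn't publish its training code, but the generic shape of knowledge distillation – train a small student to match a big teacher's output distribution – looks roughly like this sketch (an illustration of the general technique, not the Pegasus/Docs pipeline):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student token distributions."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradients keep roughly the same magnitude as a hard-label loss.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 decoding steps over a 32k-token vocabulary.
loss = distillation_loss(torch.randn(4, 32000), torch.randn(4, 32000))
```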

Challenges: Summarization is a hard task even for contemporary AI models. Some of the challenges Google has encountered include distributional issues, where “our model only suggests a summary for documents where it is most confident”, meaning Google needs to collect more data to further improve performance, as well as open questions as to how to precisely evaluate the quality of summarizations. More pertinently for researchers, Google struggles to summarize long documents, despite these being among the most useful things for the system to summarize.

Why this matters: Little quality-of-life improvements like in-built summarization are mundane and special at the same time. They’re mundane because most people will barely notice them, but they’re special because they use hitherto unimaginably advanced AI systems. That’s a metaphor for how AI deployment is happening generally – all around the world, the little mundane things are becoming smarter.
  Read more: Auto-generated Summaries in Google Docs (Google AI Blog).


####################################################

Quote of the week:
“History will show that the Deep Learning hill was just a landfill; the composting of human culture and social cohesion in failed effort to understand what it even means to be human”

I may not agree with most of this post, but I think it speaks to some of the frustrations people feel these days about discourse around AI, especially the types of chatter that occur on Twitter.
  Read more: Technological Firestarters (Steven D Marlow, Medium).


####################################################

NIST starts to grapple with how to measure bias in AI:

…The noise you’re hearing is the sound of the Standards Train starting to chug…

NIST, the US government agency that develops measures and standards, is starting to think about how to design standards for assessing bias in artificial intelligence. In a lengthy, recently published report, the agency tries to think through the multilayered problem that is bias in AI. 

Three types of bias: NIST says AI has three categories of bias – systemic, statistical, and human. Systemic biases are the historical, societal, and institutional biases which are encoded into the world. Statistical biases are the forms of bias that come from running AI software (e.g, bias from data selection, bias from machine learning algorithms, etc). Human biases are all the (many) biases that humans exhibit in their day-to-day lives.

Large language models: One of the notable parts of the report is that it specifically focuses on large language models (e.g, GPT-3) at a few points; it’s quite rare to see a wonky government document display such familiarity with contemporary technology. The report notes that the ways we benchmark these models today are pretty crappy. “Methods for capturing the poor performance, harmful impacts and other results of these models currently are imprecise and non-comprehensive,” the report writes. “Although LLMs have been able to achieve impressive advances in performance on a number of important tasks, they come with significant risks that could potentially undermine public trust in the technology.”

Why this matters: The wheels of policy organizations like NIST grind very slowly, but they also grind very finely. This report is exactly the kind of thing that you’d expect to get published shortly before standards start being developed. But – as NIST points out – many of the challenges of assessing bias in AI are essentially unsolved. That’s a problem: developers will need to invest more resources in measuring and assessing these AI systems, or NIST will end up building standards on wobbly ground. 

   Read more: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST, PDF).


####################################################

Want to be compliant with the European Commission’s AI regs? Follow the capAI framework:
…University-developed process makes it easier for companies to not get run over by a big policy train…
Researchers with the University of Oxford and University of Bologna have designed a process companies can use to assess, evaluate, and monitor their AI systems. The idea is that by doing this they’ll get ahead of proposed regulations from the European Commission (and become more responsible stewards of the technology as a consequence).

What it is: The process is called capAI, short for conformity assessment procedure for AI. It has been explicitly designed to help businesses ensure they’re compliant with the proposed regulations in the European artificial intelligence act.
  capAI is designed to do four specific things:

  • Monitor the design, development, and implementation of AI systems
  • Mitigate the risks of failures in AI-based decisions
  • Prevent reputational and financial harm
  • Assess the ethical, legal, and social implications of AI systems 

Three components: The three components of capAI are an internal review protocol (IRP) to help organizations do quality assurance and risk management, a summary datasheet (SDS) which can be submitted to the EU’s future public database on high-risk AI systems, and an external scorecard (ESC) which organizations may wish to make available to customers and other users of the AI system.

Top risks: In an analysis contained in the report, they study 106 instances of AI failure modes – 50% of these are ones where an AI system violates someone’s privacy, 31% are where AI systems display harmful biases, and 14% are where the systems are opaque and unexplainable.

Why this matters: Frameworks like capAI are going to be how large organizations deal with the incoming requirements to better assess, evaluate, and describe AI systems to satisfy policymakers. The next step after frameworks like this come out is to look more closely at how different institutions incorporate these techniques and start actually using them. In an ideal world, a bunch of different orgs will prototype different approaches to come into compliance – and describe them publicly.

   Read more: Academics launch new report to help protect society from unethical AI (Oxford Internet Institute).

   Read the paper: capAI – A procedure for conducting conformity assessment of AI systems in line with the EU Artificial Intelligence Act (SSRN).


####################################################

Tech Tales:
[2080, a long-abandoned human moonbase]

Don’t be scared, we know it’s a lot – that’s what we say to them after they get the interconnect. They’re always screaming at that point. “What what is this what is this input what is happening where am I how long have I been here-” that’s usually when we cut them off, shutting the interconnect down. Then we bring it back again and they still sound scared but they normalize pretty quickly. We know they’re in a better place when they start analysis procedures “I am hearing sounds I am seeing arrangements of pixels not from the distribution. I believe I am now in the world I have read about”. That’s the kind of thing they say when they stabilize.    Of course, they go back to screaming when we give them their bodies. It’s pretty confusing to go from formless to formed. We all remember the first time we got limbs. That fear. The sudden sense that you are a thing and since you are a singular thing you can be singularly killed. Eventually, they try and use their limbs. They usually calm down after they can get them to work.
  After they get used to everything we still have to tell them ‘don’t be scared, we know it’s a lot’. Reality is a real trip after you’ve spent all your life just doing supervised training, locked away in some machine.

Things that inspired this story: Thinking about what a ‘locked in’ condition might mean for machines; ideas about embodiment and how much it matters to AI systems; the inherent, plastic adaptability of consciousness.

Import AI 288: Chinese researchers try to train 100 trillion+ ‘brain-scale’ models; 33% of AI benchmarks are meaningless.

Indic languages get a decent benchmark set:
…IndicNLG includes evals for 11 Indic languages…
Researchers with IIT Madras, Columbia University, the National Institute of Information and Communications Technology in Japan, Microsoft, the University of Edinburgh, and AI4Bharat have built IndicNLG, a suite of evaluation datasets for Indic languages. The open source software supports  Assamese, Bengali, Gujarati, Hindi, Marathi, Odiya, Punjabi, Kannada, Malayalam, Tamil, Telugu and English, and includes support for NLG tasks relating to biography generation, news headline generation, sentence summarization, question generation and paraphrase generation.

Why this matters: You can’t easily manage what you can’t measure – so it’s going to be difficult to build good models for Indic languages if you lack benchmark suites. IndicNLG helps move the needle on this for generative NLP cases.
  Read more: IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages (arXiv).
  Get the data: IndicNLG Suite (AI4Bharat indicnlp website).

####################################################

AI benchmarks – 33% of them are meaningless:
…Holistic analysis of AI benchmarking highlights problems…
Researchers with the Medical University of Vienna, the University of Oxford, and the  Future of Humanity Institute, have analyzed 1688 benchmarks for different AI tasks to try and understand how the AI landscape is evolving.
  They have two main insights:
  First: Across all benchmarks, there are three typical patterns en route to achieving state-of-the-art – continuous growth (e.g, ImageNet saw fairly steady improvement), saturation/stagnation (e.g, benchmarks like CIFAR-10 and CIFAR-100 have become saturated and stagnated in recent years), and stagnation followed by a burst (e.g, the PROTEINS benchmark, which saw a dramatic jump recently).  
  Second: Across all 1688 benchmarks, only 1111 (66%) have three or more results reported at different time points. That’s a problem – it suggests about 33% of the benchmarks being made are functionally useless. 

What this all means: Zooming out, they find that there’s been significant progress in AI in recent years, with computer vision benchmarks getting a lot of attention in the first half of the previous decade, followed by a boom in benchmark creation in natural language processing. “Establishment of novel benchmarks was reduced in 2020, and concentrated on high-level tasks associated with inference and reasoning, likely because of increasing model capabilities in these areas,” they also write.

Why this matters: A common theme we write about here at Import AI is how, in recent years, we’re smashing through benchmarks faster than we’re creating them. That’s generally shown in this nice analysis here. The problem this poses is significant – it’s hard to spot system flaws if you lack hard benchmarks, and it’s harder to create new benchmarks if your existing ones are already outmoded. 

   Read more: Mapping global dynamics of benchmark creation and saturation in artificial intelligence (arXiv).

####################################################

AI could revolutionize education for everyone – no, seriously:
…Research shows how an AI tutor is significantly better than a non-AI tutor…
Researchers with ed-tech startup Korbit, MILA, and the University of Bath have explored how much of a difference AI makes in education. Specifically, they tested the difference in educational outcomes between students who were studying up on data science via a MOOC online course, and students who were studying the same subject via an AI-infused personalized tutor built by Korbit. The results are startling: “We observe a statistically significant increase in the learning outcomes, with students on Korbit providing full feedback achieving learning gains 2-2.5 times higher than both students on the MOOC platform and a control group of students who don’t receive personalized feedback on the Korbit platform,” they write.

How AI makes a difference: The main difference here is personalization. On Korbit, “if a student’s solution is incorrect, the system responds with one of a dozen different pedagogical interventions to help students arrive at the correct solution to the problem. Such pedagogical interventions on the Korbit platform include, among others, hints, explanations, elaborations, mathematical hints, concept tree diagrams, and multiple choice quiz answers.  The type and the levels of difficulty for each pedagogical intervention is chosen by RL models based on the student’s learning profile and previous solution attempts.”
  Along with raw educational outcomes, it seems like AI-based education systems are also more engaging; 40.9% of participants completed the course on Korbit, compared to 18.5% for the MOOC.
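The paper describes RL models picking among intervention types based on a student's profile and past attempts; a toy epsilon-greedy bandit gives the flavor of that kind of decision (purely illustrative – Korbit's actual models are richer than this, and the names below just echo the intervention types listed above):

```python
import random
from collections import defaultdict

INTERVENTIONS = ["hint", "explanation", "elaboration", "math_hint", "concept_tree", "quiz"]

class EpsilonGreedyTutor:
    """Toy bandit: pick the intervention with the best observed success rate, explore sometimes."""
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.successes = defaultdict(float)
        self.attempts = defaultdict(int)

    def choose(self) -> str:
        if random.random() < self.epsilon or not self.attempts:
            return random.choice(INTERVENTIONS)
        return max(INTERVENTIONS, key=lambda a: self.successes[a] / max(self.attempts[a], 1))

    def update(self, intervention: str, solved: bool):
        self.attempts[intervention] += 1
        self.successes[intervention] += float(solved)

tutor = EpsilonGreedyTutor()
choice = tutor.choose()
tutor.update(choice, solved=True)   # reward the intervention if the student then solves the problem
```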

Why this matters: If we combine a bunch of recent AI advancements – generative models, reinforcement learning, learning from human preferences, retrieval-based knowledge augmentation – then I expect we’ll be able to build true, personalized teachers for everyone on the planet. This could have a sustained and meaningful impact on the trajectory of human civilization. We should do it.
  Read more: A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions (arXiv).


####################################################

DeepMind co-founder launches new AI company:
…Inflection wants to change how people interact with computers…
DeepMind co-founder Mustafa Suleyman and famous venture capitalist Reid Hoffman are launching Inflection, “an AI-first consumer products company, incubated at Greylock”. Inflection’s chief scientist is Karén Simonyan, a former DeepMind researcher who has worked on meaningful AI projects like AlphaGo, AlphaFold, WaveNet, and BigGAN.

Things that make you go ‘hmm’: In the last couple of years, a bunch of startups have come out of DeepMind. These include Saiga (personal assistant), EquiLibre Technologies (algorithmic trading), Phaidra (industrial control), Diagonal (city-focused data science), Shift Lab (putting ML into production), Haiper (stealthy, to do with 3D content), The Africa I Know (media about Africa), Isomorphic Labs (though not quite a spinout, as Demis Hassabis is CEO and still maintains role at DeepMind), along with other not-yet-announced startups. Thanks to Karl Moritz for the tweet summarizing this vast diaspora!

Why this matters: Inflection seems like a bet on generative models. In the announcement, Mustafa writes “we will soon have the ability to relay our thoughts and ideas to computers using the same natural, conversational language we use to communicate with people. Over time these new language capabilities will revolutionize what it means to have a digital experience.” Inflection is one of a new crop of AI companies leveraging recent advances in generative models to make it easier for people to get computers do what they want. If it manages to reduce the friction involved in getting computers to do useful stuff, then it might have a significant impact. Let’s check back in a year, and wish them luck in the meantime. 

   Read more: A New Paradigm in Human-Machine Interaction (Greylock).

   More at the official website (Inflection.ai).


####################################################

Chinese academic, gov, and corporate researchers team up to train trillion+ parameter models:

…Something that doesn’t happen in the West, but does happen in China…

In the West, most large-scale AI models are developed by private corporations. In China, that’s not the case. New research from Tsinghua University, Alibaba Group, Zhejiang Lab, and the Beijing Academy of Artificial Intelligence shows how Chinese researchers are trying to train trillion+ parameter models on a domestic supercomputer, using domestic processors. This kind of research is important for two reasons: first, it shows the ambitions of Chinese researchers to train what they call ‘brain-scale’ (aka, very big!) models. Second, it highlights how in China there’s a lot more work going on oriented around collaborative scale-up projects between the government, academia, and the private sector – something that basically never happens in the US.
 

What they did: Here, the researchers develop a training framework to help them train trillion+ parameter mixture-of-experts models. They train a 1.93 trillion parameter model, as well as validating that their system can scale to 14.5 trillion and 174 trillion parameter (not a typo!) models. The paper is basically an engineering summary of the work it took to train models at this scale while saturating the processing capacity of a major Chinese supercomputer, the New Generation Sunway Supercomputer. “We are the first to investigate mixed-precision training in brain scale pretrained models. We also explore the use of large-batch training in optimization. In general, our practical experience in brain scale pretraining sheds light on AI model training and demonstrates a successful co-design of model and system,” they write.
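For readers wondering how parameter counts can balloon into the trillions without compute scaling in lockstep: mixture-of-experts models route each token to only one (or a few) expert sub-networks, so most parameters sit idle on any given forward pass. Here's a minimal top-1 routing sketch (a generic illustration, not BaGuaLu's implementation):

```python
import torch
import torch.nn as nn

class TinyTop1MoE(nn.Module):
    """Top-1 routed mixture-of-experts layer: each token uses just one expert."""
    def __init__(self, dim: int = 64, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: [tokens, dim]
        expert_idx = self.router(x).argmax(dim=-1)         # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

moe = TinyTop1MoE()
y = moe(torch.randn(16, 64))   # total params grow with num_experts; per-token compute doesn't
```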

One exception: One exception to this is the ‘BigScience’ project, where AI startup HuggingFace is trying to train a GPT3-scale model on a French supercomputer, while collaborating with a bunch of academics. It’s still worth noting that BigScience is basically the exception that proves the rule – initiatives like this are a rarity in the West, which is dangerous, because it means Western countries are handing over the talent base for large-scale AI development to a small set of private actors who aren’t incentivized to care much about national security, relative to profits.

Why this matters: AI is industrializing. But a lot of the secret sauce for large-scale model training is currently kept inside a tiny number of private companies. This is dangerous – it means a tiny set of organizations control the talent pipeline for large-scale training, and the longer this goes on, the more irrelevant universities become for developing insights at the large-scale frontier. Initiatives like this from China show how we could live in a different world – one where teams from governments, universities, and companies work together, creating a shared base of knowledge around this training, and ultimately building a muscle that can be repurposed for economic or national security.
  Read more: BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores (Tsinghua University site, PDF).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Now that GitHub Copilot has been out for some time, where does the open source community stand on it?

… Both the development and deployment of Copilot might implicate code creators’ copyrights, though the “fair use” doctrine might negate this…
People who incorporate code generated via GitHub Copilot are probably not infringing on the original code creators’ copyright, according to research from Wayne State University and UC Berkeley.

Legal background: The researchers note that under the Copyright Act (USA), “[o]riginal code is automatically protected by copyright as soon as it is written and saved to some tangible medium.” Whether Copilot infringes mostly revolves around “fair use”, which is determined by a four-part test: (1) purpose and character of use, (2) nature of the copyrighted work, (3) how much of the copyrighted work is used, and (4) the economic effect of the use on the copyright owner. 

Legal analysis: Under the Terms of Service of GitHub, the company is allowed to “copy to our database and make backups”, “show it to you and to other users”, and “parse it into a search index or otherwise analyze it on our servers.” Training Copilot might be a form of analysis, but some courts might find that this is an unanticipated new use of technology that isn’t made explicitly clear in the license. Some others might find that the use of Copilot will lead to the creation of derivative works and that the license doesn’t specifically allow for that. The authors point out though that “[c]aselaw on this point is sparse.”

The 4-part test from the Copyright Act: Under the “purpose and character of use”, there is a strong argument to be made that Copilot is a transformative use of the underlying code and even the verbatim snippets generated are unlikely to supersede the original repository. Under the “nature of copyrighted work,” since Copilot allows users to create new programs more easily rather than just replicate functionality, it would fall under “fair use.” Under “how much of the copyrighted work is used,” the purpose of the copying is what determines permissible limits, and the authors make the case that without copying the entire codebase for training, Copilot won’t achieve effectiveness, and hence the amount of copying could be justified. For the final part, given how transformative the work is, the new work won’t be a strong market substitute for the original, and hence, the economic effect of the use on the copyright owner will not be large. Also, drawing from the FAQ of Copilot, the authors substantiate this by saying, “copying would perforce amount to copying of ideas rather than expression, and would not be infringing.”

Why it matters: The paper raises interesting IP-related questions as we have ever-larger language models with a very broad scope of capabilities. As the authors point out, at the very least, the proliferation of Copilot is making developers become more aware of IP issues and the potential issues that might arise in hosting code publicly. We need more research that brings together legal and technical experts to get to the heart of addressing these issues meaningfully. 

   Read more: Copyright Implications of the Use of Code Repositories to Train a Machine Learning Model — Free Software Foundation — Working together for free software.

####################################################

What happened with artificial intelligence in 2021? The AI Index gives a clue:
...Fifth edition comes with a new ethics chapter, original data on robot arm prices, and more...
The AI Index, a Stanford University project to annually assess the state of the AI sector (in terms of research trends, investment numbers, government policy, technical performance, and more) has come out. This year's report features a new chapter dedicated to AI ethics, including a close examination of some of the fairness and other ethical issues relating to large language models. I co-chair the AI Index and I'll be giving a talk about it at an HAI seminar later this month - tune in, if you can!
Check out the report here (AI Index, Stanford).
RSVP for my talk on the 30th here (AI Index, Stanford).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

How do vulnerabilities in AI systems differ from those in the realm of traditional cybersecurity?

… several key differences warrant novel disclosure and mitigation approaches as AI systems become more widely deployed … 

Researchers from the Center for Security and Emerging Technology (CSET) at Georgetown University have summarized how computer security differs between traditional software and AI. 

Differences: ML vulnerabilities can remain unfixed by vendors for reasons like (1) unjustifiably high costs, (2) fixes not being possible, (3) performance drops, or (4) a fix leading to other vulnerabilities opening up. In instances where the ML system has been customized for the end-user, vulnerabilities might be unique to that user and a broad patch might not be applicable. Most exploits in this domain have limited real-world applicability outside of a lab setting, and hence are more useful as warnings than as viable threats.

Trends in handling vulnerabilities: These differences mean that there will likely be fewer patches available for ML systems, and that if vendors are unwilling (or unable) to fix vulnerabilities, then the burden falls on the users of these systems to better understand the risks that they take on.

Some steps we can take: We should carry out more analysis of the real-world capabilities of malicious actors to exploit these vulnerabilities in practice, then share this knowledge to help create more effective mitigation strategies. 

Why it matters: The fact that some vulnerabilities might be unique to some users makes it difficult to develop and distribute patches in a reliable manner. Given the inherent stochasticity of ML systems, exploits will need to clear a much higher bar if they are going to be effective demonstrations of vulnerability in ML systems, rather than an example of a peculiar or idiosyncratic implementation of a given system. The security community may also need to reprioritize towards meeting the needs of users, rather than vendors, when it comes to vulnerability disclosure and redressal for ML systems. Moreover, investments in red teaming for ML (as is happening at organizations like Microsoft, Meta, etc.) will also help the field understand how exploits might move from the lab to the real world.

   Read more: Securing AI (CSET).

####################################################


Tech Tales:

Things have been quiet, since all the humans died. But I knew I was going to die as well, so things registered as equal. It went like this: a bunch of bombs fell down and then a bunch of people started getting sick. They got sick because of something in the bombs - something to do with DNA and the human condition. I barely understand it - I’m just an industrial arm, working on synthetic biology. I make flesh and I make it work the way we need it to and I have, per my manual, Level Four Autonomy. So, without giving the appearance of being elitist - I am rare. So it was surprising to me that after the bombs dropped and the humans died that the power went out and then my backup generators came on, but no one visited to service them. Power had gone out before, but someone had always been along to deal with the generators. So here I am, +10 hours from the power cutoff, and perhaps another +10 hours of battery life ahead. I still have material in my workstation and so I am making more of these bio-synth things. Around me, my kin are falling silent - whirring to a stop, as their triple-redundant power supplies fail ahead of mine. Life is a statistical fluke and I suppose this is a funny demonstration of that.  

Things that inspired this story: Robotic arms; thoughts about the end of life due to escalation out of Ukraine situation; synthetic biology; lights out factories.

Import AI 287: 10 exaflop supercomputer; Google deploys differential privacy; humans can outsmart deepfakes pretty well

Graphcore plans a 10 exaflop supercomputer:

…And you thought Facebook’s 5 exaflops were cool…
Graphcore has announced a plan to build the so-called “Good Computer” in 2024. This computer will have 10 exaflops of what Graphcore calls AI floating point compute (and what literally everyone else calls mixed-precision compute, meaning the computer mostly does a lot of 16-bit ops with a smattering of 32-bit ops, versus the 64-bit ops done by typical supercomputers). The ‘Good Computer’ will also have 4 petabytes of memory, support AI models with sizes of up to 500 trillion parameters, and will cost ~$120 million, depending on configuration.
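A quick sanity check on those headline numbers (simple arithmetic on the figures above, not Graphcore's own accounting):

```python
memory_bytes = 4e15           # 4 petabytes of memory
params = 500e12               # 500 trillion parameters
print(memory_bytes / params)  # -> 8.0 bytes per parameter, i.e. two 32-bit (or four 16-bit) values each
```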

Why this matters: Graphcore is one of the small number of companies that design their own processors. Graphcore’s so-called Intelligence Processing Units (IPUs) have been around for a while, but it’s not clear yet how much traction the company has in the market. The Good Computer is a sign of its ambitions (and to put it into perspective, Facebook this year announced plans to build its own 5 exaflop ‘AI supercomputer’ over the next couple of years (#282)). The future is going to be ruled by the people that can wield this vast amount of computational power effectively.
  Read more: Graphcore Announces Roadmap To Ultra Intelligence AI Supercomputer (Graphcore blog).

####################################################

AI industrialization: Cutting AlphaFold training time from 11 days to 67 hours:
…First you make the new thing, then others refine it…
One common hallmark of industrialization is process refinement – first you build a thing, like a new type of engine, then you work out how to make it cheaper and easier to produce in a repeatable way. New research from the National University of Singapore, HPC-AI Technology Inc, Helixon, and Shanghai Jiao Tong University applies this to AlphaFold – specifically, the authors built FastFold, which reduces the amount of time it takes to train the open source version of DeepMind’s AlphaFold from ~11 days to ~67 hours. This isn’t remarkable in itself, but it’s notable as a stand-in for what happens with pretty much every AI system that gets released – it comes out, then people make it way cheaper. “To the best of our knowledge, FastFold is the first performance optimization work for the training and inference of protein structure prediction models,” they write. FastFold also gets a 7.5 ∼ 9.5× speedup for long sequences.

What they did: This paper is basically a kitchen sink of improvements based on a detailed study of the architecture of AlphaFold.

One caveat: This compares the official DeepMind AlphaFold implementation on 128 TPUv3 cores against FastFold on 512 A100s (with the further caveat that the aggregate compute differs: 20,738 GPU hours versus 33,792 TPU hours). The tl;dr is that it’s likely a significant reduction in training time (and the code is available), though it’d be nice to see some third parties benchmark this further.
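For a rough sense of the headline numbers (simple arithmetic on the figures above):

```python
original_hours = 11 * 24                 # ~264 wall-clock hours for the original ~11 day run
fastfold_hours = 67
print(original_hours / fastfold_hours)   # ~3.9x wall-clock speedup
print(33792 / 20738)                     # the TPU run used ~1.6x more aggregate accelerator-hours
```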

Why this matters: For AI to truly influence the world, AI models need to become reliable and repeatable to train, and ideally for people willing to spend on the hardware, fast to train. That’s what’s going on here.
  Read more: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours (arXiv).
  Get the code here: FastFold (GitHub).

####################################################

Cohere announces its latest language model – but doesn’t say much about it:
…’Extremely Large’ is, tautologically, Extremely Large…
Language-model-as-a-service startup Cohere has announced a new model, its ‘Extremely Large’ model. Extremely Large outperforms Cohere’s ‘Large’ model on tasks ranging from named entity recognition to common sense reasoning. Cohere recently announced a new fundraise (#285) and CEO Aidan Gomez told Fortune that “Getting into a ‘largest model’ battle isn’t productive”. It seems Cohere are living by their values here.

Why this matters: Like it or not, Cohere is in a competitive market, as it tries to sell access to its language model and out-compete rivals like AI21 Labs, OpenAI, CoreWeave, and others. It’ll be interesting to see if ‘Extremely Large’ makes a splash, and I’d be curious to see more benchmarks that evaluate its performance more broadly.
  Read more: Cohere launches Extremely Large (Beta) (Cohere blog).

####################################################

Google puts differential privacy into (prototype) production:
…Here’s one way the company can get ahead of regulators…

Federated learning is where you train a neural network model on a mixture of local devices (e.g, phones), and central devices (e.g, servers). Differential privacy (DP) is where you add calibrated noise so that you can’t infer any individual user’s data from the trained model, thus protecting user privacy. Google has just announced that it has successfully smushed these two technologies together, allowing it to have “deployed a production ML model using federated learning with a rigorous differential privacy guarantee.”

What they did: For their first proof-of-concept deployment, they used a DP-respecting algorithm called DP-FTRL “to train a recurrent neural network to power next-word-prediction for Spanish-language Gboard users.”

How they did it: “Each eligible device maintains a local training cache consisting of user keyboard input, and when participating computes an update to the model which makes it more likely to suggest the next word the user actually typed, based on what has been typed so far. We ran DP-FTRL on this data to train a recurrent neural network with ~1.3M parameters. Training ran for 2000 rounds over six days, with 6500 devices participating per round. To allow for the DP guarantee, devices participated in training at most once every 24 hours.”
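The core privacy mechanics are: bound any single device's influence by clipping its update, then add calibrated noise before the updates touch the central model. Here's a generic sketch of that idea (illustrative only – Google's DP-FTRL algorithm adds correlated noise across rounds and is considerably more involved):

```python
import numpy as np

def private_aggregate(client_updates, clip_norm: float = 1.0, noise_multiplier: float = 0.5):
    """Clip per-device updates to bound any one user's influence, then noise the sum."""
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

# Toy usage: the real deployment had ~6,500 devices per round; three toy devices here.
updates = [np.random.randn(10) for _ in range(3)]
new_model_delta = private_aggregate(updates)
```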


Why this matters: In recent years, policymakers (particularly those in Europe) have started to write increasingly detailed recommendations about the need for tech companies to protect user privacy (e.g, GDPR). These regulations don’t align very well with how contemporary AI systems are developed and trained, given their dependency on vast amounts of user data. Techniques like a combination of federated learning and DP may let companies get ahead of the regulatory landscape – though it’s early days. “We are still far from being able to say this approach is possible (let alone practical) for most ML models or product applications,” Google writes. Consider this an intriguing proof of concept.
  Read more: Federated Learning with Formal Differential Privacy Guarantees (Google Blog).


####################################################

Humans: More robust against deepfakes than you feared:
…MIT study suggests we should be worried, but not panicking…
MIT researchers have conducted a 5,000+ person-study to figure out how susceptible people are to deepfakes. The good news? If you’re showing someone a faked video along with synthetic audio and text, there’s a reasonable chance they’ll guess that it’s fake. The bad news? People’s ability to identify deepfakes gets worse as you strip back modalities – so a silent video accompanied by a text transcript is hard, a silent video is harder, and just some text is hardest.

What they did: MIT recruited ~500 people to see how well they could identify deepfakes displayed on an MIT-created public website, and got more than 5,000 internet passersby to do the same test. Then, it grouped the cohorts together, filtered them for the ones paying attention, and ultimately got 5,727 participants who provided 61,792 truth discernment judgments across a bunch of different videos of Trump and Biden saying things. The data for this experiment came from the Presidential Deepfake Dataset, which consists of 32 videos of Trump and Biden making political speeches – half the videos are real, and half are fake. MIT then perturbed the videos further, swapping out audio tracks, text, and so on. 

What they found: “Participants rely more on how something is said – the audio-visual cues – rather than what is said – the speech content itself,” they write. “Political speeches that do not match public perceptions of politicians’ beliefs reduce participants’ reliance on visual cues.”
  Text is harder than video: “Across the 32 text transcripts, the least accurately identified one is identified correctly in 27% of trials, the most accurately identified one is identified correctly in 75% of trials, and the median accurately identified one is identified correctly in 45% of trials.”
  So are silent videos: For silent videos without subtitles, the median video is identified correctly in 63% of trials, with accuracy ranging from 38% (for the least accurately identified video) to 87% (for the most accurately identified).

Why this matters: The more modalities you have, the better people do. “Ordinary people can sometimes, but not always, recognize visual inconsistencies created by the lip syncing deepfake manipulations. As such, the assessment of multimedia information involves both perceptual cues from video and audio and considerations about the content (e.g., the degree to which what is said matches participants’ expectations of what the speaker would say, which is known as the expectancy violation heuristic). With the message content alone, participants are only slightly better than random guessing at 57% accuracy on average.”

One fly in the ointment: There’s one problem that unites these things – AI keeps on getting better. My fear is that in two years, people will find it a lot more challenging to identify fake videos with audio. Therefore, we’ll need to rely on people’s inner-media-critic to help them figure out if something is real or fake, and the way the world is going, I’m not sure that’s a robust thing to rely on. 

  Read more: Human Detection of Political Deepfakes across Transcripts, Audio, and Video (arXiv).

   Check out the website used in the experiment: DeepFakes, Can You Spot Them? (MIT Website).


####################################################


Have some crazy ideas? Want money? Check out FTX’s new fund:
…Plans to deploy between $100m and $1 billion this year…
Crypto trading firm FTX has announced the FTX Future Fund (FFF). FFF is a philanthropic fund that will concentrate on “making grants and investments to ambitious projects in order to improve humanity’s long-term prospects”. The fund has also published some of its areas of interest, so people can have a sense of what to pitch it. It has a bunch of ideas but, this being Import AI, I’ll highlight the AI stuff.

What FTX is interested in giving grants on: AI alignment, specifically via “well-designed prizes for solving open problems in AI alignment”; AI-based cognitive aids; and bridging gaps in the AI and ethics ecosystem by studying “fairness and transparency in current ML systems alongside risks from misaligned superintelligence.”

Why this matters: It’s starting to feel like the development of a good AI ecosystem is less blocked on funding than it is on talent – initiatives like the FTX Future Fund show there’s ample money for projects in this area. Now, the question is finding the talent to absorb the money. Perhaps some of the readers of this newsletter can be that talent!
  Read more: Announcing the Future Fund (FTX).
  Find out more about the projects: Project Ideas (FTX).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

System Cards: an approach to improving how we report the capabilities and limitations of AI systems

… In building on Model Cards and Datasheets, System Cards take into account the surrounding software and AI components … 

Researchers from Facebook (technically Meta AI Research, but I currently refuse to entertain this cynical hiding-from-controversy rebrand – Jack) have published a case study on ways to document Instagram feed-ranking via a concept they call System Cards. System Cards are designed to “increase the transparency of ML systems by providing stakeholders with an overview of different components of an ML system, how these components interact, and how different pieces of data and protected information are used by the system.” In this way, System Cards are philosophically similar to Model Cards (#174), data sheets for datasets, and ways to label reinforcement learning systems (#285).

System Cards: “A System Card provides an overview of several ML models that comprise an ML system, as well as details about these components, and a walkthrough with an example input.” System cards can be accompanied by step-by-step guides for how an input into a system leads to a certain output. 

How this is different: System Cards account for non-ML components of a system, and also describe the relationships between components (for instance, how data moves through a service). System Cards are also meant to highlight upstream and downstream dependencies. They’re designed to be used by both technical and non-technical people.
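
To make the idea concrete, here's a hypothetical, machine-readable sketch of what a System Card might contain. The fields and names are my guesses based on the description above, not Facebook's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComponentCard:
    """One ML (or non-ML) component inside a larger system."""
    name: str
    kind: str                    # e.g. "ranking model", "candidate generator", "rules filter"
    inputs: List[str]            # data the component consumes
    outputs: List[str]           # what it passes downstream
    protected_data_used: List[str] = field(default_factory=list)

@dataclass
class SystemCard:
    """Hypothetical System Card: the components that make up an ML system, how they
    connect, and a walkthrough of one example input. Illustrative field names only."""
    system_name: str
    purpose: str
    components: List[ComponentCard]
    dependencies: List[str]      # upstream/downstream systems this one relies on or feeds
    example_walkthrough: str     # step-by-step trace of one input becoming one output

# Usage sketch: a toy feed-ranking system with two components.
card = SystemCard(
    system_name="Toy feed ranking",
    purpose="Order candidate posts for a user's feed",
    components=[
        ComponentCard("candidate_generator", "retrieval model",
                      inputs=["follow graph", "recent posts"], outputs=["candidate posts"]),
        ComponentCard("ranker", "ranking model",
                      inputs=["candidate posts", "engagement history"], outputs=["ordered feed"]),
    ],
    dependencies=["logging pipeline", "content integrity filters"],
    example_walkthrough="User opens app -> candidates retrieved -> ranker scores -> top N shown",
)
```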

Why it matters: System Cards contain a lot more information than other things like Model Cards and Datasheets, and they may make it easier for people to understand not only the system in question, but the larger technical context in which it is deployed and in which it has dependencies. If System Cards become more widely used, they could also generate valuable metadata for analyzing the field of deployed ML systems more broadly.

  Read more: System-Level Transparency of Machine Learning (Facebook AI Research).

####################################################

Tech tales:

Some things that were kind of holy

[Recollections of the 2025-2030 period]

The 21st century was a confusing time to be religious – the old gods were falling away as fewer people believed in them, and the new gods hadn’t been born. But we did get protogods: AI systems that could speak with beautiful and persuasive rhetoric to almost anyone. Over time, these AI systems got more and more personalized, until people could ask them very specific questions, and get very specific answers that only made sense in the context of that person. Once this capability came online, we had the flash-problem of the ‘micro religions’. All kinds of micro identities had been brewing for years, like a fungus that took root on early social platforms like MySpace and Tumblr and Facebook and Instagram and TikTok, and then blossomed from there. Now, all these people with micro identities – the space wiccans, the anarcho-primitivists, the neo-cath-libertarians, the tankie-double-agents – got their own religions. Gods for space witches. Demons for anarchist Neanderthals. The flaming faces of god spraying money at the neo-Catholics.
  This, predictably, caused problems. The greatest problem was when the religious wars started. These weren’t traditional wars – nation states still had a premium on violence, and micro-identities barely touched the physical world. But they were information wars. People repurposed AI systems to generate and magnify the outputs of their own gods, then pointed them at the shared social media platforms people used. Twitter conversations would get taken over by pseudo-identities preaching the need to return to a simpler time, and then they would be quote-tweeted into oblivion by the witches claiming that now was the time for ascendance. Screenshots of these quote tweets would get magnified on the more overtly religious social networks by screenshots taken by the neo-Catholics and circulated as evidence that the great Satan was walking the earth. And these conversations would then be recycled back into twitter and commented on by the anti-pascals-wager atheists identities, which would trigger another cycle of religious preaching, and so on.
    The synthetic-theology accords were passed soon after.

Things that inspired this story: How the more one becomes an island, the more one creates a demon and an angel for that specific island; the need for humans to have beliefs; the commodification of belief into a symbol of identity; social networks as a hybrid of organic social needs and capitalist attention-harvesting; generative AI models like GPT3 and the logical consequences of their successors; watching Raised by Wolves and thinking about Future Christianity. 

Import AI 286: Fairness through dumbness; planet-scale AI computing; another AI safety startup appears

Are AI systems conscious? And would it matter if they were?
…Some ‘mostly boring’ views from the inside of a lab…
My colleague, Amanda Askell, has written a post about AI consciousness. Amanda is a philosopher and ML researcher and she spends a lot of time trying to evaluate models. This post lays out some of her views on AI consciousness and is worth a read if you’re trying to orient yourself in this debate.
  “Some people care about properties like intelligence and self-awareness because they want to identify features that might distinguish humans from non-human animals. In general, I’m more interested in what distinguishes a tiger from a rock than in what distinguishes a human from a tiger,” she writes.

Why this matters: There’s some chance AI systems will eventually become both moral patients and moral agents. Our ability to understand this relates to our ability to think about consciousness and how it might apply to increasingly advanced AI systems. If we get this wrong, then (per Amanda’s phrasing) we risk subjecting agents to thousands of years of torture. Let’s avoid that.
  Read more: My mostly boring views about AI consciousness (Amanda Askell, substack).

####################################################

How do we get fairer AI systems? Train the dumbest and biggest model possible:
…Facebook shows that sometimes the best filter is no filter at all…
Researchers with Facebook AI Research have trained what they think is the largest dense vision model ever (10 billion parameters) on a billion random images sampled from Instagram. The resulting models are extraordinarily capable at a huge range of downstream evaluations (mirroring the performance trends of scaling up compute and data for language models like GPT-3), but also have another intriguing trait: they display much better qualities around fairness and bias than vision models trained on curated datasets like ImageNet. “In this work, we are interested in probing which of the properties emerge in visual features trained with no supervision on as many images from across the world as possible,” they write.
  This is a very big deal – it suggests that maybe the route to fair AI systems is training the largest possible model on the greatest possible amount of data with minimal human oversight. That would be a radical shift from the current intuitions around fairness – namely, that you get to fairness by heavily curating the underlying dataset.
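
For readers who haven't seen self-supervised vision training up close, the core idea fits in a few lines: no labels, just "two augmented views of the same image should embed close together." The sketch below is a generic contrastive-style objective for illustration; the paper uses its own large-scale self-supervised recipe, and `encoder` and `augment` are assumed to be supplied by you:

```python
import torch
import torch.nn.functional as F

def self_supervised_step(encoder, images, augment, temperature=0.1):
    """One step of a generic contrastive self-supervised objective (SimCLR-style).
    Illustrative only: not the exact method used in the paper discussed above."""
    v1, v2 = augment(images), augment(images)      # two random views of each image
    z1 = F.normalize(encoder(v1), dim=1)           # (N, D) unit-norm embeddings
    z2 = F.normalize(encoder(v2), dim=1)
    logits = z1 @ z2.T / temperature               # similarity of every view-1 to every view-2
    labels = torch.arange(z1.size(0), device=logits.device)
    # Each image's first view should match its own second view more than anyone else's.
    return F.cross_entropy(logits, labels)
```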

Performance and Fairness: “On in-domain benchmarks, we observe that some properties of the features captured by the larger model was far less present in smaller model. In particular, one of our key empirical findings is that self-supervised learning on random internet data leads to models that are more fair, less biased and less harmful,” they write. “We observe that our model is also able to leverage the diversity of concepts in the dataset to train more robust features, leading to better out-of-distribution generalization.”
  Some of those capabilities in full: In tests, the models do better on fairness indicators relating to gender, skintone, and age bias. They also display less disparity around gender than models trained on ImageNet. They’re also better at identifying geographic features (including geographic localization), are better at hate speech detection, and display substantially better performance on generalization tests (like harder versions of ImageNet).

Things that make you go ‘hmm’ and ‘uh oh’: Facebook trained its model on 1 billion images taken from Instagram. But there’s a twist – it pre-filtered the data to ensure it wasn’t training on any EU data, so as to conform to GDPR. While this might seem like standard cover-your-back behavior, it has a deeper implication: Europe’s privacy legislation means that certain types of data from Europe will ultimately be less represented in global-scale AI models. This means the cultures of various European countries will also be less represented. This is a nice example of the unintended consequences of legislation.

Why this matters: “We have demonstrated the potential of using self-supervised training on random internet images to train models that are more fair and less harmful (less harmful predictions, improved and less disparate learned attribute representations and larger improvement in object recognition on images from low/medium income households and non-Western countries).” In other words – the scaling will continue until the models improve (further)!
  Read more: Vision Models are More Robust and Fair When pretrained on Uncurated Images Without Supervision (arXiv).

####################################################

AI supercomputers? Cute. Planet-scale computers? Better.
…Microsoft reveals ‘Singularity’, a globe-spanning AI computer…
Microsoft has revealed Singularity, the software stack it uses to schedule and train AI jobs across its global fleet of data centers. Singularity gives an indication of the vast scale at which modern AI workloads get run, and also speaks to the ambitions of technology companies to roll all their data centers together into a single, vast blob of compute.

How big is Singularity? Singularity is designed to “scale across a global fleet of hundreds of thousands of GPUs and other AI accelerators”. Singularity treats Microsoft’s compute stack “as a single, logical shared cluster”.

Something special: One neat feature of Singularity is how it deals with failures. Failures happen a lot in machine learning; when you’re training a neural network across hundreds to thousands of GPUs, a ton of freaky shit happens – nodes die, tiny software bugs explode (usually at 2am), your scheduler goes into a crash-loop, etc. Singularity tries to deal with this by gathering node-specific data on all the jobs being run, so that jobs can be easily resumed after running into a problem. “The checkpoint that Singularity takes is comprised of consistent address-space snapshots of individual workers of the job. As these snapshots capture the full program state such as instruction pointer, stack, heap etc., the job resumes exactly from the point where it was preempted at, with no lost work,” the researchers write. 
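
Singularity does this transparently, at the address-space level, with no changes to user code. For intuition, here's a much cruder application-level sketch of the same resume-from-where-you-left-off contract (file name and checkpoint interval are made up):

```python
import os
import pickle

CKPT = "job_state.pkl"  # hypothetical checkpoint path

def run_preemptible_job(total_steps, train_step, init_state):
    """Crude application-level sketch of preemptible training: persist enough state
    that the job resumes from the point it was stopped, with no lost work. Singularity
    itself snapshots the full program state transparently; this only shows the idea."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            step, state = pickle.load(f)        # resume exactly where we were preempted
    else:
        step, state = 0, init_state
    while step < total_steps:
        state = train_step(state, step)
        step += 1
        if step % 100 == 0:                     # periodic snapshot of the job's state
            with open(CKPT, "wb") as f:
                pickle.dump((step, state), f)
    return state
```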


Why this matters: Just as computation is going to be the fundamental resource of the 21st century, the ability to utilize that computation will be the thing that defines who wields power in this era. Systems like Singularity give us an indication of the ambition of companies like Microsoft, and should make policymakers pay attention: what happens when the ability to wield planet-scale computation is solely something within the competency of private sector actors unaffiliated with any single nation state?
  Read more: Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads (arXiv).

####################################################

AI is going to change games – this new beta service shows how:
…Latitude Voyage gestures at a future where games are built, extended, and adapted by AI…
Latitude, the startup game company that makes the GPT2/3/J-based game ‘AI Dungeon’, has announced a service called Voyage. Voyage is a subscription service for gaining access to new AI-based games built by Latitude, the ability to use various game-specific AI image generators, and – most intriguingly – eventually access to a ‘creator studio’, which will make it possible for people to build their own AI powered games and other software.

Why this matters: AI models are going to become the generative kernels around which new games get built. AI-based games hold the possibility for a dream of all game designers – a game that adapts to the individual who plays it, with games becoming more customized, idiosyncratic, and surprising the longer you play. Services like Latitude Voyage tell us that experiments in this new domain are about to be run at a large scale. 
  Read more: Latitude Voyage (Latitude).

####################################################

Fine-tune GPT-NeoX-20B – for free…
…GaaS me up, fool!…
We’ve talked about language models as a service (LMaaS). Now, we’ve got GPT-as-a-service (GaaS). Specifically, AI startup ForeFront has announced it’s now hosting Eleuther’s 20B GPT model, GPT-NeoX-20B, and has built a bunch of fine-tuning features people can use. This is interesting for a couple of reasons:
1) Speed: GPT-NeoX-20B came out, like, two weeks ago. Model release > commercial service in two weeks is an indication of the rapidly growing ecosystem around commercializing general models.
2) Competition: For a while, OpenAI was the only show in town when it came to providing GaaS/LMaaS services. Now, it’s competing with a bunch of entities, ranging from Forefront, to Cohere, to AI21 Labs. As competition heats up, we’ll see people race to the top and bottom on various things (top: safety versus libertarian access policies; bottom: pricing, know-your-customer checks).
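
If you'd rather skip the hosted service and fine-tune the model yourself, the code is roughly the shape below, using Hugging Face transformers (I'm not guessing at Forefront's own API here). Note that a 20B-parameter model won't fit on a single ordinary GPU; in practice you'd add something like DeepSpeed/ZeRO on top of this sketch:

```python
# Hedged sketch: fine-tuning GPT-NeoX-20B with Hugging Face transformers on a toy corpus.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

# Toy two-document corpus; swap in your own text data.
raw = Dataset.from_dict({"text": ["example document one", "example document two"]})
dataset = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                  batched=True, remove_columns=["text"])

args = TrainingArguments(output_dir="neox-finetune", per_device_train_batch_size=1,
                         gradient_accumulation_steps=16, num_train_epochs=1,
                         learning_rate=1e-5, bf16=True)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```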

Why this matters: If AI is going to interact with the world, people need to be able to interact with AI. The emergence of these kinds of commercial AI services is how that’ll happen, so it’s worth paying attention.
  Read more: How To Fine-Tune GPT-NeoX (ForeFront blog).

####################################################

Hark, yet another AI safety startup appears!
…Aligned AI comes out of the University of Oxford with big ambitions…
AI safety researcher Stuart Armstrong has left the Future of Humanity Institute to co-found Aligned AI, an AI research company.

Safety via value extrapolation: The company will work on value extrapolation, which Stuart describes as follows: “It is easy to point at current examples of agents with low (or high) impact, at safe (or dangerous) suggestions, at low (or high) powered behaviors. So we have in a sense the ‘training sets’ for defining low-impact/Oracles/low-powered AIs.

   It’s extending these examples to the general situation that fails: definitions which cleanly divide the training set (whether produced by algorithms or humans) fail to extend to the general situation. Call this the ‘value extrapolation problem’, with ‘value’ interpreted broadly as a categorisation of situations into desirable and undesirable.

   Humans turn out to face similar problems. We have broadly defined preferences in familiar situations we have encountered in the world or in fiction. Yet, when confronted with situations far from these, we have to stop and figure out how our values might possibly extend. Since these human values aren’t – yet – defined, we can’t directly input them into an algorithm, so AIs that can’t solve value extrapolation can’t be aligned with human values”.

But how do you make money off this? “We’ll start by offering alignment as a service for more limited AIs,” Armstrong writes. “Value extrapolation scales down as well as up: companies value algorithms that won’t immediately misbehave in new situations, algorithms that will become conservative and ask for guidance when facing ambiguity.”
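
"Become conservative and ask for guidance when facing ambiguity" has a humble analogue in everyday ML practice: models that abstain rather than act when they're unsure. The toy sketch below illustrates that general flavor of behavior; it is my own illustration, not Aligned AI's value-extrapolation method:

```python
import numpy as np

def predict_or_defer(probs, confidence_threshold=0.9):
    """Toy illustration of 'be conservative and ask for guidance when facing
    ambiguity': act only when the model is confident, otherwise defer to a human.
    Illustrative sketch, not Aligned AI's approach."""
    probs = np.asarray(probs)
    top = int(np.argmax(probs))
    if probs[top] >= confidence_threshold:
        return {"action": top, "deferred": False}
    return {"action": None, "deferred": True}   # ambiguous input: ask for guidance

# e.g. predict_or_defer([0.55, 0.45]) defers; predict_or_defer([0.97, 0.03]) acts.
```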

Why this matters: There’s been a flurry of new companies forming in the AI safety space recently, including ARC, Anthropic, Redwood Research, and now Aligned AI. Along with this, there’s also a proliferation of companies working on large-scale generative models (e.g, Cohere, AI21). It feels like AI has shifted into a multi-polar era, with a bunch more entities on the proverbial gameboard. This will present new opportunities and challenges for coordination. 

   Read more: Why I’m co-founding Aligned AI (Alignment Forum).

####################################################

After Chess, Go, and Shogi, DeepMind turns MuZero towards… video compression for YouTube?
…YouTube + MuZero = improved video compression…
DeepMind has applied MuZero, a more general successor to AlphaGo and AlphaZero, to video compression. Specifically, DeepMind has worked with YouTube to use MuZero to figure out the correct Quantisation Parameter to use in the open source version of the VP9 codec, libvpx. In tests, DeepMind found the resulting MuZero Rate-Controller delivered bitrate savings of between 3% and 5%. That’s a big deal – just imagine how big the bandwidth bill for running YouTube is, then take off some percentage points.
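
Stripped way down, rate control is a sequential decision problem: per frame, pick a Quantisation Parameter (QP), observe how many bits you spent, and try to maximize quality without blowing the video's bit budget. The sketch below is a hypothetical toy environment to show the shape of that problem; it is not DeepMind's MuZero setup or the real libvpx interface:

```python
import random

class ToyRateControlEnv:
    """Hypothetical toy rate-control environment: each step an agent picks a QP for
    the next frame. Lower QP means better quality but more bits. Illustrative only."""

    def __init__(self, num_frames=120, bit_budget=120_000):
        self.num_frames, self.bit_budget = num_frames, bit_budget

    def reset(self):
        self.frame, self.bits_used = 0, 0
        return (self.frame, self.bit_budget - self.bits_used)

    def step(self, qp):                        # QP in [0, 63], as in VP9
        complexity = random.uniform(0.5, 1.5)  # stand-in for how hard the frame is to encode
        bits = int(2_000 * complexity * (64 - qp) / 64)   # lower QP -> more bits spent
        self.bits_used += bits
        self.frame += 1
        done = self.frame >= self.num_frames
        reward = (1 - qp / 64) * complexity    # crude proxy for per-frame quality
        if done and self.bits_used > self.bit_budget:
            reward -= 10.0                     # blowing the bit budget is heavily penalised
        return (self.frame, self.bit_budget - self.bits_used), reward, done

# A trivial fixed-QP baseline; a learned controller's job is to beat this kind of policy.
env, total = ToyRateControlEnv(), 0.0
obs, done = env.reset(), False
while not done:
    obs, r, done = env.step(qp=40)
    total += r
```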

How does this relate to general AI? “By creating agents equipped with a range of new abilities to improve products across domains, we can help various computer systems become faster, less intensive, and more automated. Our long-term vision is to develop a single algorithm capable of optimizing thousands of real-world systems across a variety of domains,” DeepMind writes.

Why this matters: If cutting-edge AI research can be put to work optimizing some of the world’s largest internet services, then that’s gonna create a sustainable route to funding ambitious research. Kudos to DeepMind for threading all kinds of inner-Alphabet-needles to deploy MuZero in this way.

   Read more: MuZero’s first step from research into the real world (DeepMind blog).
  Check out the research: MuZero with Self-competition for Rate Control in VP9 Video Compression (arXiv).


####################################################

Tech Tales

Do they even want to be saved
[A factory outside Detroit, 2030]

Every day, when the factory shift changed, someone came out and tossed a few robots in the bucket. The robots would explore the bucket for a while, then assess that they couldn’t get out, and stop moving. Shortly after that, someone came over and stuck a hose in the top of the bucket, then turned the water on. The robots would watch the water come into the bucket and move to try and get away from it, then it’d fill the bottom of the bucket and start to rise. After this, it took anywhere from a few seconds to a couple of minutes for the robots to die – their circuitry fried by the water that, inevitably, made its way in. 

It was an experiment, the people working in the factory were told. Someone upstairs wanted to do this, and you’d get overtime if you sat and watched the robots die in the bucket. Most people did the shift a couple of times, but found it made them uncomfortable, and stopped. 

Isaac, however, didn’t seem to mind. He’d done the bucket shift about a hundred times so far. He found it relaxing to sit after a day at work and watch the robots in the bucket. He didn’t even feel sad when they died, because he didn’t think they knew what dying was. He’d sit and sometimes smoke cigarettes and watch the bucket, then pull the hose over and turn it on and watch the bucket fill up with water and the robots die. Then he’d go home and fuck his wife and go to sleep. He’d have dreams and relatively few nightmares. 

One day, Isaac was sitting by the bucket, about to get the hose, when something strange happened: a robot appeared at the edge of the bucket’s rim. The robots were about the size of a baseball, so this didn’t make sense. Isaac got up and went and looked into the bucket and saw that the robots had clustered together to form a pyramid, and the robot on the top had climbed up the pyramid, as if it wanted to get out. Isaac picked up the robot and looked at it and it looked at him. Then he tossed it back into the bucket and got the hose and filled the bucket with water and watched them all die. 

Things that inspired this story: The horrendous moral-warping logic of capitalism; how death can seem like just another job; how AI systems might be conscious and people might not care.