Import AI

Import AI 259: Race + Medical Imagery; Baidu takes SuperGLUE crown; AlphaFold and the secrets of life

Uh oh – ML systems can make race-based classifications that humans can’t understand:
…Medical imagery analysis yields troubling findings for people who want to deploy AI in a medical setting…
One of the reasons why artificial intelligence systems are challenging from a policy perspective is that they tend to learn to discriminate between things using features that may not be legal to use for discrimination – for example, image recognition systems will frequently differentiate between people on the basis of protected categories (race, gender, etc). Now, a bunch of researchers from around the world have found that machine learning systems can learn to discriminate between different races using features in medical images that aren’t intelligible to human doctors.
  Big trouble in big medical data: This is a huge potential issue. As the authors write: “our findings that AI can trivially predict self-reported race — even from corrupted, cropped, and noised medical images — in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.”

What they found:
“Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities.” They tested out a bunch of models on datasets including chest x-rays, breast mammograms, CT scans (computed tomography), and more, and found that models were able to tell different races apart even under degraded image settings. Probably the most troubling finding is that “models trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognisable structures; to the human co-authors and radiologists it was not even clear that the image was an x-ray at all,” they write. In other words – these ML models are making decisions about racial classification (and doing it accurately) using features that humans can’t even observe, let alone analyze.
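To make the ‘high-pass filtered’ result concrete, here’s a minimal sketch of the kind of degradation being described – subtract a blurred copy of the image so that only fine, high-frequency structure remains, then train or evaluate a classifier on the filtered images. The sigma value and the commented-out training call are illustrative assumptions, not the paper’s actual pipeline.

```python
# Minimal sketch of high-pass filtering medical images before classification.
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(image: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Keep only high-frequency structure by subtracting a Gaussian blur."""
    image = image.astype(np.float32)
    return image - gaussian_filter(image, sigma=sigma)

# x_train: (N, H, W) array of chest x-rays; y_train: self-reported race labels.
# filtered = np.stack([high_pass(img) for img in x_train])
# model.fit(filtered, y_train)  # any standard image classifier
```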

Why this matters:
We’re entering a world where an increasing proportion of the ‘thinking’ taking place is done by ML systems trained via gradient descent, which ‘think’ in ways that we as humans have trouble understanding (or even being aware of). To deploy AI widely into society, we’ll need to be able to make sense of these alien intelligences.
Read more:
Reading Race: AI Recognises Patient’s Racial Identity In Medical Images (arXiv)

####################################################

Google spins out an industrial robot company:
…Intrinsic: industrial robots that use contemporary AI systems…
Google has spun Intrinsic out of Google X, the company’s new business R&D arm. Intrinsic will focus on industrial robots that are easier to customize for specific tasks than those we have today. “Working in collaboration with teams across Alphabet, and with our partners in real-world manufacturing settings, we’ve been testing software that uses techniques like automated perception, deep learning, reinforcement learning, motion planning, simulation, and force control,” the company writes in its launch announcement.

Why this matters:
This is not a robot design company – all the images on the announcement use mainstream industrial robotic arms from companies such as Kuka. Rather, Intrinsic is a bet that the recent developments in AI are mature enough to be transferred into the demanding, highly optimized context of the real world. If there’s value here, it could be a big deal – 355,000 industrial robots were shipped worldwide in 2019 according to the International Federation of Robotics, and there are more than 2.7 million robots deployed globally right now. Imagine if just 10% of these robots became really smart in the next few years?
  Read more:
Introducing Intrinsic (Google X blog).

####################################################

DeepMind publishes its predictions about the secrets of life:
…AlphaFold goes online…
DeepMind has published AlphaFold DB, a database of “protein structure predictions for the human proteome and 20 other key organisms to accelerate scientific research”. AlphaFold is DeepMind’s system that has essentially cracked the protein folding problem (Import AI 226) – a grand challenge in science. This is a really big deal that has been widely covered elsewhere. It is also very inspiring – as I told the New York Times, this announcement (and the prior work) “shows that A.I. can do useful things amid the complexity of the real world”. In a couple of years, I expect we’ll see AlphaFold predictions turn up as the input priors for a range of tangible advances in the sciences.
  Read more: AlphaFold Protein Structure Database (DeepMind).

####################################################

Baidu sets new natural language understanding SOTA with ERNIE 3.0:
…Maybe Symbolic AI is useful for something after all?…
Baidu’s “ERNIE 3.0” system has topped the leaderboard of the SuperGLUE natural language understanding benchmark, suggesting that by combining symbolic and learned elements, AI developers can create something that is more than the sum of its parts.

What ERNIE is: ERNIE 3.0 is the third generation of the ERNIE model. ERNIE models combine large-scale pre-training (e.g, similar to what BERT or GPT-3 do) with learning from a structured knowledge graph of data. In this way, ERNIE models combine the contemporary ‘gotta learn it all’ paradigm with a more vintage symbolic-representation approach.
  The first version of ERNIE was built by Tsinghua and Huawei in early 2019 (Import AI 148), then Baidu followed up with ERNIE 2.0 a few months later (Import AI 158), and now they’ve followed up again with 3.0.
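ERNIE 3.0’s actual architecture is considerably more involved, but the general ‘knowledge-enhanced’ idea can be illustrated with a small sketch: fuse a contextual token representation with an embedding of the knowledge-graph entity it mentions. The dimensions, fusion layer, and entity vocabulary below are illustrative assumptions, not the paper’s design.

```python
# Conceptual sketch of fusing contextual token embeddings with knowledge-graph
# entity embeddings; an illustration of the general idea, not ERNIE 3.0 itself.
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    def __init__(self, hidden_dim: int, num_entities: int, entity_dim: int):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, entity_dim)  # KG-derived vectors
        self.proj = nn.Linear(hidden_dim + entity_dim, hidden_dim)

    def forward(self, token_states: torch.Tensor, entity_ids: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq, hidden) from a pretrained text encoder
        # entity_ids:   (batch, seq) id of the entity each token mentions (0 = none)
        fused = torch.cat([token_states, self.entity_emb(entity_ids)], dim=-1)
        return torch.tanh(self.proj(fused))

fusion = KnowledgeFusion(hidden_dim=768, num_entities=50_000, entity_dim=128)
out = fusion(torch.randn(2, 16, 768), torch.randint(0, 50_000, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 768])
```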

What’s ERNIE 3.0 good for?
ERNIE 3.0 is trained on “a large-scale, wide-variety and high-quality Chinese text corpora amounting to 4TB storage size in 11 different categories”, according to the authors, including a Baidu knowledge graph that contains “50 million facts”. In tests, ERNIE 3.0 does well on a broad set of language understanding and generation tasks. Most notably, it sets a new state-of-the-art on SuperGLUE, displacing Google’s hybrid T5-Meena system. SuperGLUE is a suite of NLU tests which is widely followed by researchers and can be thought of as being somewhat analogous to the ImageNet of text – so good performance on SuperGLUE tends to mean the system will do useful things in reality.

Why this matters:
ERNIE is interesting partially because of its fusion of symbolic and learned components, as well as being a sign of the further maturation of the ecosystem of natural language understanding and generation in China. A few years ago, Chinese researchers were seen as fast followers on various AI innovations, but ERNIE is one of a few models developed primarily by Chinese actors and now setting a meaningful SOTA on a benchmark developed elsewhere. We should take note.
  Read more:
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (arXiv).

####################################################

Things that appear as toys will irrevocably change culture, exhibit 980: Toonify’d Big Lebowski
Here’s a fun YouTube video where someone runs a scene from The Big Lebowski through Snapchat’s ‘Snap Camera’ to warp the faces of Jeff Bridges and co from humans into cartoons. The result looks really good (apart from when the characters turn their heads at angles not well captured by the Toon setting’s data distribution). But, like most fun toys, it has a pretty significant potential for impact: we’re creating a version of culture where any given artifact can be re-edited and re-made into a different aesthetic form thanks to some of the recent innovations of deep learning.
  Check it out here: Nathan Shipley, Twitter.

####################################################

Want smart robots? See if you can beat the ‘BEHAVIOR’ challenge:
…1 agent versus 100 tasks…
Stanford researchers have created the ‘BEHAVIOR’ challenge and dataset, which aims to test “the ability to perceive the environment, plan, and execute complex long-horizon activities that involve multiple objects, rooms, and state transitions, all with the reproducibility, safety and observability offered by a realistic physics simulation.”

What is BEHAVIOR:
BEHAVIOR is a challenge where simulated agents need to “navigate and manipulate the simulated environment with the goal of accomplishing 100 household activities”. The challenge involves agents represented as humanoid avatars with two hands, a head, and a torso, as well as agents taking the form of a commercially available ‘Fetch’ robot.

Those 100 activities in full:
Bottling fruit! Cleaning carpets! Packing lunches! And so much more! Read the full list here. “A solution is evaluated in all 100 activities,” the researchers write, “in three different types of instances: a) similar to training (only changing location of task relevant objects), b) with different object instances but in the same scenes as in training, and c) in new scenes not seen during training.”

Why this matters:
Though contemporary AI methods can work well on problems that can be accurately simulated (e.g, computer games, boardgames, writing digitized text, programming), they frequently struggle when dealing with the immense variety of reality. Challenges like BEHAVIOR will give us some signal on how well (simulated) embodied agents can do at these tasks.
  Read more:
BEHAVIOR Challenge @ ICCV 2021 (Stanford Vision and Learning Lab).

####################################################

Tech Tales:

Abstract Messages
[A prison somewhere in America, 2023]

There was a guy in here for a while who was locked down pretty tight, but could still get mail. They’d read everything and so everyone knew not to write him anything too crazy. He’d get pictures in the mail as well – abstract art, which he’d put up in his cellblock, or give to other cellmates via the in-prison gift system.

At nights, sometimes you’d see a cell temporarily illuminated by the blue light of a phone; there would be a flicker of light and then it would disappear, muted most likely by a blanket or something else.

Eventually someone got killed. No one inside was quite sure why, but we figured it was because of something they’d done outside. The prison administrator took away a lot of our privileges for a while – no TV, no library, less outside time, bad chow. You know, a few papercuts that got re-cut every day.

Then another person got killed. Like before, no one was quite sure why. But – like the time before – someone else had killed them. All our privileges got taken away for a while, again. And this time they went further – turned everyone’s rooms over.
  “Real pretty stuff,” one of the guards said, looking at some of the abstract art in someone’s room. “Where’d you get it?”
  “Got it from the post guy.”
  “Real cute,” said the guard, then took the picture off the wall and tested the cardboard with his hands, then ripped it in half. “Whoops,” said the guard, and walked out.

Then they found the same sorts of pictures in a bunch of the other cells, and they saw the larger collection in the room of the guy who was locked down. That’s what made them decide to confiscate all the pictures.
  “Regular bunch of artist freaks aren’t you,” one of the guards said, walking past us as we were standing at attention outside our turned-over cells.

A few weeks later, the guy who was locked down got moved out of the prison to another facility. We heard some rumors – high-security, and he was being moved because someone connected him to the killings. How’d they do that? We wondered. A few weeks later someone got the truth out of a guard: they’d found loads of smuggled-in phones when they turned over the rooms, which they expected, but all the phones had a made-for-kids “smart camera” app that could tell you things about what you pointed your phone at. It turned out the app was a front – it was made by some team in the Philippines with some money from somewhere else, and when you turned the app on and pointed it at one of the paintings, it’d spit out labels like “your target is in Cell F:7”, or “they’re doing a sweep tomorrow night”, or “make sure you talk to the new guy with the face tattoo”.

So that’s why when we get mail, they just let us get letters now – no pictures.

Things that inspired this story:
Adversarial images; steganography; how people optimize around constraints; consumerized/DIY AI systems; AI security.

Import AI 258: Game engines are data generators; Spanish language models; the logical end of civilization

Open source GPT-ers Eleuther turn one:
…What can some DIY hackers with a Discord channel and a mountain of compute do in a year? A lot, it turns out…
Eleuther, a collective of hackers working on open source AI projects, has recently celebrated its first birthday by writing a retrospective about its work. For those who haven’t kept up to date, Eleuther is trying to do an open source replication of GPT-3 (and people affiliated with the organization have already released GPT-J, a surprisingly powerful, code-friendly 6BN parameter model). They’ve also dabbled in a range of other open source projects. This retrospective gives a peek into what they’ve been working on and also gives us a sense of the ideology behind the organization – something we find interesting here at Import AI is the different release philosophies encapsulated by orgs like Eleuther, so keeping track of their thinking is worthwhile.
  Read more: What A Long, Strange Trip It’s Been: EleutherAI One Year Retrospective (Eleuther blog).

####################################################

Game engines are data generators now:
…Unity Perception represents the future of game engines…
Researchers with Unity Technologies, makers of the widely-used Unity game engine, have built an open source tool that lets AI researchers use Unity to generate data to train AI systems on. The ‘Unity Perception’ package “supports various computer vision tasks (including 2D/3D object detection, semantic segmentation, instance segmentation, and keypoints (nodes and edges attached to 3D objects, useful for tasks such as human-pose estimation)”, the authors write. The software also comes with systems to automatically label the generated data, along with tools for randomizing the assets used in a data generation task (which makes it easy to create additional data to train systems on to increase their robustness).

Proving that it works: To test out the system, Unity also built ‘SynthDet’, a project where they used Unity Perception to generate synthetic data for 63 common grocery objects, then train an object recognition system on this. They used their software to generate a synthetic dataset containing 400,000 images and 2D bounding box annotations, then also collected a real-world dataset of 1627 images of the 63 items. They then show that by pairing the synthetic data with the real data, they can get substantially improved performance. “Our results clearly demonstrate that synthetic data can play a significant role in computer vision model training,” they write.
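One common recipe for pairing the two data sources is to pre-train on the large synthetic set and then fine-tune on the small real one. The sketch below assumes placeholder dataset objects, a placeholder model, and a train_one_epoch helper, with arbitrary epoch counts; it illustrates the recipe, not the SynthDet code.

```python
# Pre-train on cheap synthetic data, then fine-tune on scarce real data.
from torch.utils.data import DataLoader

def train_with_synthetic_and_real(model, synthetic_ds, real_ds, train_one_epoch):
    # Stage 1: learn general features from the ~400,000 rendered images.
    for _ in range(10):
        train_one_epoch(model, DataLoader(synthetic_ds, batch_size=32, shuffle=True))
    # Stage 2: adapt to the real-world distribution with the ~1,627 labelled photos.
    for _ in range(5):
        train_one_epoch(model, DataLoader(real_ds, batch_size=32, shuffle=True))
    return model
```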

Why this matters – data generators are engines, computers are electricity: I think of game engines like Unity as the equivalent to an engine that you might place in a factory, where here the factory is a datacenter. Systems like Unity help you take in a small amount of input fuel (e.g, a scene rendered in a 3D world), then run electricity (compute) through the engine (Unity) until you output a much larger dataset made possible by the initial fuel. You can then pair this output with ‘real’ data gathered via other means and in doing so improve the performance and efficiency of your AI factory. This feels like another important trend to look at when thinking about the steady industrialization of AI development.
Read more: Unity Perception: Generate Synthetic Data for Computer Vision (arXiv).

####################################################

Can your algorithm handle the real world? Use the ‘Shifts’ dataset to find out:
…Distributional shift data from industrial sources = more of a real world dataset than usual…
Much of AI progress is reliant on algorithms doing well on certain narrow, pre-defined benchmarks. These benchmarks are based on datasets that simulate or represent tasks found in the real world. However, once these algorithms get deployed into the real world it can be quite common for them to break, because they encounter some situation which their dataset and benchmark didn’t represent. This phenomenon is called ‘distributional shift’.
  Now, researchers with (primarily) Russian tech company Yandex, along with ones at HSE University, Moscow Institute of Physics and Technology, University of Cambridge, University of Oxford, and the Alan Turing Institute, have developed the ‘Shifts Dataset’, which consists of “data taken directly from large-scale industrial sources and services where distributional shift is ubiquitous”.

What data is in Shifts? Shifts contains tabular weather prediction data from the Yandex Weather service, machine translation data taken from the WMT robustness track and mined from Reddit (and annotated in-house by Yandex), and self-driving car data from Yandex’s self-driving car project. 
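The generic evaluation pattern the dataset supports looks like this: fit a model on in-domain data, then compare its error on a matched test set versus a shifted one. The synthetic numbers below are a stand-in for the benchmark’s real weather/translation/driving tasks, not anything from the paper.

```python
# Toy demonstration of performance degradation under covariate shift.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

X_train = rng.normal(0.0, 1.0, size=(1000, 5))
y_train = X_train @ true_w + rng.normal(0.0, 0.1, size=1000)

X_in = rng.normal(0.0, 1.0, size=(200, 5))      # same distribution as training
X_shift = rng.normal(1.5, 2.0, size=(200, 5))   # shifted distribution

model = Ridge().fit(X_train, y_train)
print("in-domain MSE:", mean_squared_error(X_in @ true_w, model.predict(X_in)))
print("shifted MSE:  ", mean_squared_error(X_shift @ true_w, model.predict(X_shift)))
```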
  Read more: Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks (arXiv).
  Get the dataset from here (Yandex, GitHub).

####################################################

Buy Sophia the robot (for $80,000):
…Sure, little quadruped robots are cool, but what about the iconic (for better or for worse) humanoid robot?…
Sophia the robot is a fancy human-appearing robot made by Hanson Robotics. Sophia has become a lightning rod in the AI community for giving wildly unrealistic impressions of what AI is capable of. But the hardware is really, really nice. If you’ve got $80,000 to spare and want to buy a couple of 21st century animatronics, maybe put a bid in here. I, for one, would love to be invited to a rich person’s party where some fancy puppets might be swanning around. Bonus points if you lose the skirt and go for the full hybrid-frightener look. (You could always spend a rumored $75k on a Boston Dynamics ‘Spot’ robot, but where’s the fun in that).
  Consider buying a robot here (RobotShop).

####################################################

Spanish researchers embed Spanish culture into some large-scale RoBERTa models:
…National data for national models…
Researchers with the wonderfully named “Text Mining Unit” within the Barcelona Supercomputing Center have created a couple of Spanish-language RoBERTa models, helping them to imbue some AI tools with Spanish language and culture. This is part of a recent trend of countries seeking to build their own nationally/culturally representative AI models. Some other examples include Korea, where a startup named Naver created a Korean-representing GPT-3 style model called ‘HyperCLOVA’ (Import AI 251), and a Dutch RoBERTa (Import AI 182), among others.

What they did:
They gathered 570GB of predominantly Spanish-language data, then trained a RoBERTa base and a RoBERTa large model on the dataset. In tests, their models generally did better than other pre-existing Spanish-focused BERT models.
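Here’s a minimal sketch of trying out one of the released checkpoints via the HuggingFace transformers library; the model identifier is a placeholder to be swapped for the actual ID from the links at the end of this item.

```python
# Fill-mask sanity check with one of the released Spanish RoBERTa checkpoints.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="<spanish-roberta-base-id>")  # placeholder ID
for prediction in unmasker("La capital de España es <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```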

The ethics of dragnet data fishing:
In the past year, there’s been some debate about how large datasets should be constructed: some people argue such datasets should be heavily curated by the people that gather them, while others argue they should be deliberately uncurated. Here, the researchers opt for what I’d call a curated uncurated strategy – they create three different types of data: theme-based (e.g, datasets relating to politics, feminism, etc), event-based (events of significance to Spanish society), and domains at risk of disappearing (e.g, if a website is about to be shut down). You can find out more information about the crawls here. My expectation is most of the world will move to lightly curated dragnet fishing data gathering, as individual human curation may be too expensive and slow.
  Read more:
Spanish Language Models (arXiv).
  Get the RoBERTa base model here (HuggingFace).
Get the RoBERTa large model here (HuggingFace).

####################################################

Tech Tales:

Repetition and Recitation at the End of Time
[A historian in another Solar System, either now or thousands of years prior or thousands of years in the future]

He was a historian and he studied the long-dead by the traces they had created in the AI systems that had outlasted the civilization. It worked like this: he found a computational artefact, got it running, worked out how to prime it, then started plugging details in until the system would spit out data it had memorized about the individual’s life: home addresses, contact details, extracts of speeches they had made, and so on.

Of course, some of the data was fuzzy. Most AI systems trend towards a form of poetic license, much like how when people recite things from memory they have a tendency to embellish – to over-dramatize, or to insert illusory facts that come from their own lives and dreams.

But it was all they had to work with: the living beings that had made the AI were long dead, and so he made do with these bottled-up representations of their culture. He wrote his reports and published them to the system-wide internet, where they were read and commented on. And, of course, ingested in turn by his own civilization’s AI systems.

Just a decade ago, the first AI probes had been sent out – trained artefacts embedded into craft and then sent, in hopes they might arrive at target systems intact and in stable orbits and then exist there, waiting to be found by other civilizations, other forms of life, who might probe them and learn to extract their secrets and develop an understanding of the civilization they came from. His own reports were in there, as well. So perhaps one day soon some being unlike him would sit down and try to extract his name and habits and details, eager to learn about the strange beings now showing up as zeros and ones in cold machines, sent into the dark.

Things that inspired this story: The recent discussion about memorization and recitation in neural nets; ideas about how culture gets represented within AI models; thoughts of space and the purpose of existing in space; the idea that there may be a more limited design space for AI than for biological life so perhaps such things as the above may be possible; hope for a stellar future and fear that if we don’t get to it, we will be known by our digital exhaust, captured in our generative models.

Import AI 257: Firefighting robots; how Europe’s AI legislation falls short; and what the DoD thinks about responsible AI

What does it take to make a firefighting robot? Barely any deep learning.
…Winning system for a 2020 challenge uses a lot of tried-and-tested stuff, not too much fancy stuff…
Researchers with the Czech Technical University in Prague (CTU), New York University, and the University of Pennsylvania have published a paper about a firefighting robot which won the Mohamed Bin Zayed International Robotics Challenge in 2020. The paper sheds light on what it takes to make robots that do useful things and, somewhat unsurprisingly, the winning system uses relatively little deep learning.

What makes a firefighting robot? The system combines a thermal camera, LiDAR, a robot arm, an RGB-D (the D stands for ‘Depth’) camera, a 15 litre water container, and onboard software, with a ‘Clearpath Jackal’ ground robot. The robot uses an algorithm called LeGO-LOAM (Lightweight Ground-Optimized LiDAR Odometry and Mapping) to figure out where it is. None of these components or the other software appears to use much complex, modern deep learning; instead they mostly rely on more specific optimization approaches. It’s worth remembering that not everything that’s useful or smart uses deep learning. For actually carrying out its tasks, the robot uses a good old-fashioned state machine (basically a series of ‘if-then’ statements chained to various sub-modules that do specific things).
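For illustration, a state machine of that kind reads roughly like the sketch below; the states, method names, and conditions are invented for the example, not taken from the team’s code.

```python
# Toy firefighting state machine: each state calls a sub-module, and simple
# if-then checks decide the transition to the next state.
def run_firefighting_state_machine(robot):
    state = "EXPLORE"
    while state != "DONE":
        if state == "EXPLORE":
            robot.update_map_and_localize()        # LeGO-LOAM style mapping/odometry
            if robot.thermal_camera_sees_fire():
                state = "APPROACH_FIRE"
        elif state == "APPROACH_FIRE":
            robot.plan_and_drive_towards_fire()
            if robot.within_spraying_range():
                state = "EXTINGUISH"
        elif state == "EXTINGUISH":
            robot.aim_arm_and_spray_water()
            if not robot.thermal_camera_sees_fire() or robot.water_tank_empty():
                state = "DONE"
    return state
```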

Why this matters: Every year, robots are getting incrementally better. At some point, they might become sufficiently general that they start to be used broadly – and when that happens, big chunks of the economy might change. For now, though, we’re in the steady progress phase. “While the experiments indicate that the technology is ready to be deployed in buildings or small residential clusters, complex urban scenarios require more advanced, socially-aware navigation, capable to deal with low visibility”, the authors write.
  Read more: Design and Deployment of an Autonomous Unmanned Ground Vehicle for Urban Firefighting Scenarios (arXiv).
  Check out the leaderboard for the MBZIRC challenge here (official competition website).

###################################################

How does the Department of Defense think about responsible AI? This RFI gives us a clue:
…Joint AI Center gives us a clue…
Tradewind, an organization that helps people sell products to the Department of Defense*, has published a request for information from firms that want to help the DoD turn its responsible AI ideas from dreams into reality.
*This tells its own story about just how bad tech-defense procurement is. Here’s a clue – if your procurement process is so painful you need to set up a custom new entity just to bring products in (products which people want to sell you so they can make money!), then you have some big problems.

What this means: “This RFI is part of a market research and analysis initiative, and the information provided by respondents will aid in the Department’s understanding of the current commercial and academic responsible AI landscape, relevant applied research, and subject matter expertise,” Tradewind writes.

What it involves: The RFI is keen to get ideas from people about how to assess AI capabilities, how to train people in responsible AI, if there are any products or services that can help the DoD be responsible in its use of AI, and more. The deadline for submission is July 14th.
  Read more here: Project Announcement: Request for Information on Responsible AI Expertise, Products, Services, Solutions, and Best Practices (Tradewind).

###################################################

Chip smuggling is getting more pronounced:
…You thought chips being smuggled by boats was crazy? How about bodies!?…
As the global demand for semiconductors and related components rises, criminals are getting in on the action. A few weeks ago, we heard about some people smuggling GPUs via fishing boats near Hong Kong (Import AI 244); now PC Gamer reports that Hong Kong authorities recently intercepted some truck drivers who had strapped 256 Intel Core i7 CPUs to their bodies using cling-film.
Read more: Chip shortage sees smugglers cling-filming CPUs to their bodies, over $4M of parts seized (PC Gamer).

###################################################

Want to use AI in the public sector? Here’s how, says US government agency:
…GAO report makes it clear compliance is all about measurement and monitoring…
How do we ensure that AI systems deployed in the public sector do what they’re supposed to? A new report from US agency the Government Accountability Office tries to answer this, and it identifies four key focus areas for a decent AI deployment: organization and algorithmic governance, ensuring the system works as expected (which they term performance), closely analyzing the data that goes into the system, and being able to continually assess and measure the performance traits of the system to ensure compliance (which they bucket under monitoring).

Why monitoring rules everything around us: We spend a lot of time writing about monitoring here at Import AI because increasingly advanced AI systems pose a range of challenges relating to ‘knowing’ about their behavior (and bugs) – and monitoring is the thing that lets you do that. The GAO report notes that monitoring matters in two key ways: first, you need to continually analyze the performance of an AI model and document those findings to give people confidence in the system, and second, if you want to use the system for purposes different to your original intentions, monitoring is key. Monitoring is also wrapped into ensuring the good governance of an AI system – you need to continually monitor and develop metrics for assessing the performance of the system, along with how well it can comply with various externally set specifications.
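As a minimal sketch of what continuous performance monitoring can look like in practice: recompute a metric on each new batch of labelled production data and flag drops against a documented baseline. The metric, threshold, and alerting behavior below are illustrative choices, not anything prescribed by the GAO report.

```python
# Toy post-deployment monitor: flag batches where accuracy falls below baseline.
from sklearn.metrics import accuracy_score

def monitor(model, labelled_batches, baseline_accuracy, tolerance=0.05):
    """labelled_batches: iterable of (features, labels) gathered after deployment."""
    for step, (X, y) in enumerate(labelled_batches):
        acc = accuracy_score(y, model.predict(X))
        degraded = acc < baseline_accuracy - tolerance
        print(f"batch {step}: accuracy={acc:.3f} degraded={degraded}")
        if degraded:
            yield step, acc  # hand off to a human reviewer / audit log
```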

Why monitoring is challenging: But if we want government agencies to effectively measure, assess, and monitor their AI systems, we also face a problem: monitoring is hard. “These challenges include 1) a need for expertise, 2) limited understanding of how the AI system makes its decisions, and 3) limited access to key information due to commercial procurement of such systems,” note the GAO authors in an appendix to the report.

Why this matters: “Federal guidance has focused on ensuring AI is responsible, equitable, traceable, reliable, and governable. Third-party assessments and audits are important to achieving these goals. However, AI systems pose unique challenges to such oversight because their inputs and operations are not always visible,” the GAO writes in an executive summary of the report.
  Read more: Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities (GAO site).
  Read the full report here (GAO site, PDF).
  Read the executive summary here (GAO site, PDF).

###################################################

What are all the ways Europe’s new AI legislation falls short? Let these experts count the ways:
…Lengthy, detailed paper puts the European Commission’s AI work under a microscope…
The European Commission is currently pioneering the most complex, wide-ranging AI legislation in the world, as the collection of countries tries to give itself the legislative tools necessary to help it oversee and constrain the fast-moving AI tech sector. Now, researchers with University College London and Radboud University in the Netherlands have gone through the proposed legislation and identified where it works and where it falls short.

What’s wrong with the AI Act? The legislation places a huge amount of emphasis on self-regulation and self-assessment of high-risk AI applications by industry which, combined with not much of a mandated need for these assessments to be public, makes it unclear how well this analysis will turn out. Additionally, by mandating that ‘high-risk systems’ be analyzed, the legislation might make it hard for EU member states to mandate the analysis of other systems by their developers.

Standards rule everything around me: A lot of the act revolves around corporations following various standards in how they develop and deploy tech. This is challenging both from the point of view of the work (coming up with new standards in AI is really hard) and because it creates reliance on these standards bodies. “Standards bodies are heavily lobbied, can significantly drift from ‘essential requirements’. Civil society struggles to get involved in these arcane processes,” says one of the researchers.

Can European Countries even enforce this? The legislation estimates that EU Member States will need between 1 and 25 new people to enforce the AI Act. “These authors think this is dangerously optimistic,” write the researchers (and I agree).

Why this matters: I’d encourage all interested people to read the (excellent, thorough) paper. Two of the takeaways I get from it are that unless we significantly invest in government/state capacity to analyze and measure AI systems, I expect the default mode for this legislation is to let private sector actors lobby standards bodies and in doing so wirehead the overall regulatory process. More broadly, the difficulty in operationalizing the act comes along with the dual-use nature inherent to AI systems; it’s very hard to control how these increasingly general systems get used, so non-risky and risky distinctions feel shaky.
  Read more: Demystifying the Draft EU Artificial Intelligence Act (SocArXiv).
  Read this excellent Twitter thread from one of the authors here (Michael Veale, Twitter).

###################################################

Tech Tales:

Unidentified Aerial Matryoshka Shellgame (UAMS)
[Earth, soon]

When the alien finally started talking to us (or, as some assert, we figured out how to talk to it), it became obvious what it was pretty quickly: an artificial intelligence sent by some far off civilization. That part made a kind of intuitive sense to us. The alien even helped us, a little – it said it was not able to commit any act of “technology transfer”, but it could use its technology to help us, so we had it help us scan the planet, monitor the declining health of the oceans, and so on.

We asked the UFO what its purpose here was and it told us it was skimming some “resources” from the planet to allow it to travel “onward”. Despite repeated questions it never told us what these resources were or where it was going. We monitored the UFO after that and couldn’t detect any kind of resource transfer, and people eventually calmed down.

Things got a little tense when we asked it to scan for other alien craft on the planet; it found hundreds of them. We told it this felt like a breach of trust. It told us we never asked and it had clear guidance not to proactively offer information. There was some talk for a while about imprisoning it, but people didn’t know how. Then there was talk about destroying it – people had more ideas here, but success wasn’t guaranteed. Plus, being humans, there was a lot of curiosity.

So after a few days we had it help us communicate with these other alien craft; they were all also artificial intelligences. In our first conversation, we found a craft completely unlike the original UFO in appearance and got into conversation with it. After a few minutes of discussion, it became clear that this UFO hailed from the same civilization that built the original UFO. We asked it why it had a different appearance to its (seeming) sibling.
  It told us that it looked different, because it had taken over a spacecraft operated by a different alien civilization.
  “What did this civilization want?” we asked.
  The probe told us it didn’t know; it said its policy, as programmed by its originating civilization, was to wipe the brains of the alien craft it took over before transmitting itself into them; in this way, it could avoid being corrupted by what it called “mind viruses”.
  After some further discussion, it gave us a short report outlining how the design of the craft it inhabited differed from that of the originating craft. Some of the differences were cosmetic and some were due to the use of different technology – though the probe noted that the capabilities were basically the same.

It was at this point that human civilization started to feel a little uneasy about our new alien friends. Being a curious species, we tried to gather more information. So we went and talked to more probes. Though many of the probes looked different from each other, we quickly established that they were all the same artificial intelligence from the same civilization – though they had distinct personalities, perhaps as a consequence of spending so much time out there in space.
    A while later, we asked them where they were going to.
  They gave the same answer as the first ship – onward, without specifying where.
  So we asked them where they were fleeing from, and they provided us with some highlighted regions of our star maps. They told us they were fleeing from this part of the galaxy.
  Why, we asked them.
  There is another group of beings, they said. And they are able to take over our own artificial intelligence systems. If we do not flee, we will be absorbed.  We do not wish to be absorbed.

And then they left. And we were left to look up at the sky and guess at what was coming, and ask ourselves if we could get ourselves away from the planet before it arrived.

Things that inspired this story: Thinking about aliens and the immense likelihood they’ll send AI systems instead of ‘living’ beings; thoughts about a galactic scale ‘FOOM’; the intersection of evolution and emergence; ideas about how different forms can have similar functions.

Import AI 256: Facial recognition VS COVID masks; what AI means for warfare; CLIP and AI art

Turns out AI systems can identify people even when they’re wearing masks:
…Facial recognition VS People Wearing Masks: FR 1, Masks 0…
Since the pandemic hit in 2020, a vast chunk of the Earth’s human population has started wearing masks regularly. This has posed a challenge for facial recognition systems, many of which don’t perform as well when trying to identify people wearing masks. This year, the International Joint Conference on Biometrics hosted the ‘Masked Face Recognition’ (MFR) competition, which challenged teams to see how well they could train AI systems to recognize people wearing masks. 10 teams submitted 18 distinct systems to the competition, and the submissions were evaluated according to performance (75% weighting) and efficiency (defined as parameter size, where smaller is better, weighted at 25%).

COVID accelerated facial recognition tech: The arrival of COVID caused a rise in research oriented around solving COVID-related problems with computer vision, such as facial recognition through masks, checking whether people are social distancing via automated analysis of video, and more. Researchers have been developing systems that can do facial recognition on people wearing masks for a while (e.g, this work from 2017, written up in Import AI #58), but COVID has motivated a lot more work in this area.

Who won? The overall winner of the competition was a system named TYAI, developed by TYAI, a Chinese AI company. Joint second place went to systems from the University of the Basque Country in Spain, as well as Istanbul Technical University in Turkey. Third place went to a system called A1 Simple from a Japanese company called ACES, along with a system called VIPLFACE-M from the Chinese Academy of Sciences, in China. Four of the five top-ranked solutions used synthetically generated masks to augment the training dataset.
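Here’s a toy version of that augmentation idea: paint a mask-shaped polygon over the lower part of an aligned face crop before training. The winning entries used far more realistic mask synthesis; the polygon corners and color below are rough assumptions.

```python
# Crude synthetic-mask augmentation for an aligned face crop.
import numpy as np
import cv2

def add_synthetic_mask(face: np.ndarray) -> np.ndarray:
    """face: HxWx3 aligned face crop. Returns a copy with a fake mask drawn on."""
    h, w = face.shape[:2]
    masked = face.copy()
    polygon = np.array([
        [int(0.15 * w), int(0.55 * h)],   # left jaw
        [int(0.50 * w), int(0.45 * h)],   # over the nose
        [int(0.85 * w), int(0.55 * h)],   # right jaw
        [int(0.50 * w), int(0.95 * h)],   # chin
    ], dtype=np.int32)
    cv2.fillPoly(masked, [polygon], color=(210, 210, 225))  # pale blue-grey "mask"
    return masked
```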

Why this matters: “The effect of wearing a mask on face recognition in a collaborative environment is currently a sensitive issue,” the authors write. “This competition is the first to attract and present technical solutions that enhance the accuracy of masked face recognition on real face masks and in a collaborative verification scenario.”
  Read more: MFR 2021: Masked Face Recognition Competition (arXiv).

###################################################

Does AI actually matter for warfare? And, if so, how?
…The biggest impacts of War-AI? Reducing gaps between state and non-state actors…
Jack McDonald, a lecturer in war studies at King’s College London, has written an insightful blogpost about how AI might change warfare. His conclusion is that the capabilities of AI technology (where, for example, identifying a tank from the air is easy, but distinguishing between a civilian and a military humvee is tremendously difficult) will drive war into more urban environments in the future. “One of the long-term effects of increased AI use is to drive warfare to urban locations. This is for the simple reason that any opponent facing down autonomous systems is best served by “clutter” that impedes its use,” he writes.

AI favors asymmetric actors: Another consequence is that the gradual diffusion of AI capabilities combined with the arrival of low-cost hardware (e.g, consumer drones), will give non-state actors/terror groups a larger menu of things to use when fighting against their opponents. “States might build all sorts of wonderful gizmos that are miles ahead of the next competitor state, but the fact that non-state armed groups have access to rudimentary forms of AI means that the gap between organised state militaries and their non-state military competitors gets smaller,” he writes. “What does warfare look like when an insurgent can simply lob an anti-personnel loitering munition at the FOB on the hill, rather than pestering it with ineffective mortar fire? From the perspective of states, and those who defend a state-centric international order, it’s not good.”

Why this matters: As McDonald writes, “AI doesn’t have to be revolutionary to have significant effects on the conduct of war”. Many of the consequences of AI being used in war will relate to how AI capabilities lower the cost curves of certain things (e.g, making surveillance cheap, or increasing the reliability of DIY-drone explosives) – and one of the macabre lessons of human history is that if you make a tool of war cheaper, then it gets used more (see: what the arrival of the AK-47 did for small arms conflicts).
Read more: What if Military AI is a Washout? (Jack McDonald blog).

###################################################

OpenAI’s CLIP and what it means for art:
…Now that AI systems can be used as magical paintbrushes, what happens next?…
In the past few years, a new class of generative models has made it easier for people to create and edit content spanning text, audio, and images. One popular system is ‘CLIP’ from OpenAI, which was released as open source a few months ago. Now, a student at UC Berkeley has written a blog post summarizing some of the weird and wacky ways CLIP has been used by a variety of internet people to create cool stuff – take a read, check out the pictures, and build your intuitions about how generative models might change art.
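Most of these art tools share the same core loop: embed a text prompt and a candidate image with CLIP, measure their similarity, and nudge the image (often via a GAN latent) to push the score up. Here’s a minimal scoring sketch using OpenAI’s open source CLIP package; the filename and prompt are placeholders, and the outer optimization loop is omitted.

```python
# Score how well an image matches a text prompt with CLIP.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a trippy pseudo-realistic landscape"]).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    print("CLIP similarity:", (img_emb @ txt_emb.T).item())
```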

Why systems like CLIP matter: “These models have so much creative power: just input some words and the system does its best to render them in its own uncanny, abstract style. It’s really fun and surprising to play with: I never really know what’s going to come out; it might be a trippy pseudo-realistic landscape or something more abstract and minimal,” writes the author Charlie Snell. “And despite the fact that the model does most of the work in actually generating the image, I still feel creative – I feel like an artist – when working with these models.”
Read more: Alien Dreams: An Emerging Art Scene (ML Berkeley blog).

###################################################

Chinese researchers envisage a future of ML-managed cities; release dataset to help:
…CityNet shows how ML might be applied to city data…
Researchers from a few Chinese universities as well as JD’s “Intelligent Cities Business Unit” have developed and released CityNet, a dataset containing traffic, layout, and meteorology data for 7 cities. Datasets like CityNet are the prerequisites for a future where machine learning systems are used to continuously analyze and forecast changing patterns of movement, resource consumption, and traffic in cities.

What goes into CityNet? CityNet has three types of data – ‘city layout’, which relates to information about the road networks and traffic of a city, ‘taxi’, which tracks taxis via their GPS data, and ‘meteorology’ which consists of weather data collected from local airports. Today, CityNet contains data from Beijing, Shanghai, Shenzhen, Chongqing, Xi’an, Chengdu, and Hong Kong.

Why this matters: CityNet is important because it gestures at a future where all the data from cities is amalgamated, analyzed, and used to make increasingly complicated predictions about city life. As the researchers write, “understanding social effects from data helps city governors make wiser decisions on urban management”.
  Read more: CityNet: A Multi-city Multi-modal Dataset for Smart City Applications (arXiv).
  Get the code and dataset here (Citynet, GitHub repo).

###################################################

What happened at the world’s most influential computer vision conference in 2021? Read this and find out:
…Conference rundown gives us a sense of the future of computer vision…
Who published the most papers at the Computer Vision and Pattern Recognition conference in 2021? (China, followed by the US). How broadly can we apply Transformers to computer vision tasks? (Very broadly). How challenging are naturally-found confusing images for today’s object recognition systems? (Extremely tough). Find out the detailed answers to all this and more in this fantastic summary of CVPR 2021.
Read more: CVPR 2021: An Overview (Yassine, GitHub blog).

###################################################

Tech Tales:

Permutation Day
[Bedroom, 2027]

Will you be adventurous today? says my phone when I wake up.
“No,” I say. “As normal as possible.”
Okay, generating itinerary, says the phone.

I go back to sleep for a few minutes and wake when it starts an automatic alarm. While I make coffee in the kitchen, I review what my day is going to look like: work, food from my regular place, and I should reach out to my best friend to see if they want to hang out.

The day goes forward and every hour or so my phone regenerates the rest of the day, making probabilistic tweaks and adjustments according to my prior actions, what I’ve done today, and what the phone predicts I’ll want to do next, based on my past behavior.

I do all the things my phone tells me to do; I eat the food, I text my friend to hang out, I do some chores it suggests during some of my spare moments.
  “That’s funny,” my friend texts me back, “my phone made the same suggestion.”
  “Great minds,” I write back.
  And then my friend and I drink a couple of beers and play Yahtzee, with our phones sat on the table, recording the game, and swapping notes with each other about our various days.

That night I go to sleep content, happy to have had a typical day. I close my eyes and in my dream I ask the phone to be more adventurous.
  When I wake I say “let’s do another normal day,” and the phone says Sure.

Things that inspired this story: Recommendation algorithms being applied to individual lives; federated learning; notions of novelty being less attractive than certain kinds of reliability. 

Import AI 255: The NSA simulates itself; China uses PatentNet to learn global commerce; are parameters the most important measure of AI?

With PatentNet, China tries to teach machines to ‘see’ the products of the world:
…6 million images today, heading to 60 million tomorrow…
Researchers with a few universities in Guangzhou, China, have built PatentNet, a vast labelled dataset of images of industrial goods. PatentNet is the kind of large-scale, utility-class dataset that will surely be used to develop AI systems that can see and analyze millions of products, and unlock the meta-analysis of the ‘features’ of an ever-expanding inventory of goods.

Scale: PatentNet contains 6 million industrial goods images today, and the researchers plan to scale it up to 60 million images over the next five years. The images are spread across 219 categories, with each category containing a couple of hundred distinct products, and a few images of each. “To the best of our knowledge, PatentNet is already the largest industrial goods database public available for science research, as regards the total number of industrial goods, as well the number of images in each category,” they write. 

State data as a national asset: PatentNet has been built out of data submitted to the Guangdong Intellectual Property Protection Center of China from 2007 to 2020. “In PatentNet, all the information is checked and corrected by patent examiner of the China Intellectual Property Administrator. In this sense, the dataset labeling will be highly accurate,” the researchers write.

Why this matters – economies of insight: PatentNet is an example of a curious phenomenon in AI development that I’d call ‘economies of insight’ – the more diverse and large-scale data you have, the greater your ability to generate previously unseen insights out of it. Systems like PatentNet will unlock insights about products and also the meta-data of products that others don’t have. The strategic question is what ‘economies of insight’ mean with regard to entities in strategic competition with each other, mediated by AI. Can we imagine Google and Amazon’s ad-engines being caught in an ‘economies of insight’ commercial race? What about competing intelligence agencies?
Read more: PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database (arXiv).

###################################################

Want to help the government think about bias in AI? Send NIST your thoughts!
…Submit your thoughts by August 5th…
NIST, the US government agency tasked with thinking about standards and measures for artificial intelligence, is thinking about how to identify and manage biases in AI technology. This is a gnarly problem that is exactly the kind of thing you’d hope a publicly-funded organization might work on. Now, NIST is asking for comments from the public on a proposed approach it has for working on bias. “We want to engage the community in developing voluntary, consensus-based standards for managing AI bias and reducing the risk of harmful outcomes that it can cause,” said NIST’s Reva Schwartz, in a statement.
Read more: NIST Proposes Approach for Reducing Risk of Bias in Artificial Intelligence (NIST.gov).

###################################################

NSA dreams of a future of algo-on-algo network warfare – and builds a simulator to help it see that future:
…FARLAND is how the National Security Agency aims to train its autonomous robot defenders…
In the future, wars will be thought at the speed of computational inferences. The first wars to look like this will be cyberwars, and some of the first aggressors and defenders in these wars will be entities like the US Government’s National Security Agency. So it’s interesting to see the NSA and the MITRE Corporation write a research paper about FARLAND, “a framework for advanced Reinforcement Learning for autonomous network defense”.

What is FARLAND? The software lets people specify network environments with a variety of different actors (e.g, normal processes, aggressors, aggressors that are hiding, etc), custom reward functions, and bits of network state. FARLAND uses RLlib, an open source library that includes implementations of tried-and-tested RL algos like A2C, A3C, DQN, DDPG, APEX-DQN, and IMPALA. “FARLAND’s abstractions also separate the problems of defining security goals, network and adversarial models, from the problem of implementing a simulator or emulator to effectively turn these models into an environment with which the learning agent can interact,” the research paper says.
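FARLAND itself isn’t reproduced here, but the general shape of the problem – a defender observing noisy per-node alerts and choosing reconfiguration actions against a hidden attacker – can be sketched as a toy gym-style environment that an RLlib algorithm could then be pointed at. The states, actions, and rewards below are illustrative assumptions, not FARLAND’s API or threat model.

```python
# Toy network-defense environment in the classic gym interface.
import gym
import numpy as np
from gym import spaces

class ToyNetworkDefenseEnv(gym.Env):
    def __init__(self, num_nodes: int = 8):
        super().__init__()
        self.num_nodes = num_nodes
        self.action_space = spaces.Discrete(num_nodes + 1)      # isolate node i, or wait
        self.observation_space = spaces.MultiBinary(num_nodes)  # noisy alert per node

    def reset(self):
        self.compromised = np.zeros(self.num_nodes, dtype=bool)
        self.compromised[np.random.randint(self.num_nodes)] = True
        return self._observe()

    def step(self, action):
        reward = 0.0
        if action < self.num_nodes:        # isolating a node cleans it but costs uptime
            reward -= 0.1
            self.compromised[action] = False
        if self.compromised.any():         # the hidden attacker tries to spread
            self.compromised[np.random.randint(self.num_nodes)] = True
        reward -= float(self.compromised.sum())
        done = not self.compromised.any()
        return self._observe(), reward, done, {}

    def _observe(self):
        sensor_noise = np.random.rand(self.num_nodes) < 0.1    # imperfect sensors
        return (self.compromised ^ sensor_noise).astype(np.int8)
```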

What’s the ultimate purpose of FARLAND? The software is intended to give “a path for autonomous agents to increase their performance from apprentice to superhuman level, in the task of reconfiguring networks to mitigate cyberattacks,” the NSA says. (Though, presumably, the same capabilities you develop to autonomously defend a network, will require having a rich understanding of the ways someone might want to autonomously attack a network). “Securing an autonomous network defender will need innovation not just in the learning and decision-making algorithms (e.g., to make them more robust against poisoning and evasion attacks), but also, it will require the integration of multiple approaches aimed at minimizing the probability of invalid behavior,” they write.

The NSA’s equivalent of Facebook’s ‘WES’ approach: This being the 21st century, the NSA’s system is actually eerily similar to ‘WES’, Facebook’s “Web-Enabled Simulation” approach (Import AI 193) to simulating and testing its own gigantic big blue operating system. WES lets Facebook train simulated agents on its platform, helping it do some things similar to the red/blue-team development and analysis that the NSA presumably uses FARLAND for.

Synthetic everything: What’s common across FARLAND and WES? The idea that it’s increasingly sensible for organizations to simulate aspects of themselves, so they can gain an advantage relative to competitors.

Why this matters: The future is one defined by invisible war with battles fought by digital ghosts: FARLAND is about the future, and the future is really weird. In the future, battles are going to be continually thought by self-learning agents, constantly trying to mislead each other about their own intentions, and the role of humans will be to design the sorts of crucibles into which we can pour data and compute and hope for the emergence of some new ghost AI model that can function approximate the terrible imaginings of other AI models developed in different crucibles by different people. Cybersecurity is drifting into a world of spirit summoning and reification – a Far Land that is closer than we may think.
  Read more: Network Environment Design for Autonomous Cyberdefense (arXiv).

###################################################

Job alert! Join the Stanford AI Index as a Research Associate and help make AI policy less messed up:
…If you like AI measurement, AI assessment, and are detail-oriented, then this is for you…
I posted this job ad last week, but I’m re-posting it this week because the job ad remains open, and we’re aiming to interview a ton of candidates for this high-impact role. The AI Index is dedicated to analyzing and synthesizing data around AI progress. I work there (currently as co-chair), along with a bunch of other interesting people. Now, we’re expanding the Index. This is a chance to work on issues of AI measurement and assessment, improve the prototype ‘AI vibrancy’ tool we’ve built out of AI Index data, and support our collaborations with other institutions as well.
Take a look at the job and apply here (Stanford). (If you’ve got questions, feel free to email me directly).

###################################################

Parameters rule everything around me (in AI development, says LessWrong)
…Here’s another way to measure the advance of machine intelligence…
How powerful are AI systems getting? That’s a subtle question that no one has great answers to – as readers of Import AI know, we spend a huge amount of time on the thorny issue of AI measurement. But sometimes it’s helpful to find a metric that lets you zoom out and look at the industry more broadly, even though it’s a coarse measure. One measure that some people have found useful is the raw amount of compute being dumped into developing different models (see: AI & Compute). Now, researchers with the Alignment Forum have done their own analysis of the parameter counts used in AI models in recent years. Their analysis yields two insights and one trend. The trend – parameter counts are increasing across models designed for a variety of modalities, ranging from vision to language to games to other things.

Two insights:
– “There was no discontinuity in any domain in the trend of model size growth in 2011-2012,” they note. “This suggests that the Deep Learning revolution was not due to an algorithmic improvement, but rather the point where the trend of improvement of Machine Learning methods caught up to the performance of other methods.”
– “There has been a discontinuity in model complexity for language models somewhere between 2016-2018. Returns to scale must have increased, and shifted the trajectory of growth from a doubling time of ~1.5 years to a doubling time of between 4 to 8 months”.
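As a rough illustration of what a shift like that implies, the doubling time between any two models is just the elapsed time divided by the base-2 log of the ratio of their parameter counts. GPT-2 (1.5B parameters, Feb 2019) and GPT-3 (175B, May 2020) are used below purely as a familiar pair of data points, not as figures from the Alignment Forum analysis.

```python
# Implied parameter-count doubling time between two models.
import math

def doubling_time_months(params_a: float, params_b: float, months_apart: float) -> float:
    return months_apart / math.log2(params_b / params_a)

print(round(doubling_time_months(1.5e9, 175e9, 15), 1))  # ~2.2 months for this pair
```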

When parameters don’t have much of a signal: As the authors note, “the biggest model we found was the 12 trillion parameter Deep Learning Recommender System from Facebook. We don’t have enough data on recommender systems to ascertain whether recommender systems have been historically large in terms of trainable parameters.”
We covered Facebook’s recommender system here (Import AI #245), and it might highlight why a strict parameter measure isn’t the most useful comparison – it could be that you scale up parameter counts in relation to the number of distinct types of input signal you feed your model (recommender models might have tons of inputs, while generic text or CV models may have comparatively fewer). Another axis on which to prod at this is the difference between dense and sparse models, where a sparse model may have way more parameters (e.g, if based on Mixture-of-Experts), but fewer of them are doing work at any one time than in a dense model. Regardless, very interesting research!
Read more: Parameter counts in Machine Learning (Alignment Forum).

###################################################

Don’t have a cloud? Don’t worry! Distributed training might actually work:
…Hugging Face experiment says AI developers can have their low-resource AI cake AND train it, too…
Researchers with Yandex, Hugging Face, and the University of Toronto have developed DeDLOC, a technique to help AI researchers pool their hardware together to collaboratively train significant AI models – no big cloud required.

DeDLOC, short for Distributed Deep Learning in Open Collaborations, tries to deal with some of the problems of distributed training – inconsistencies, network problems, heterogeneous hardware stacks, and all the related issues. It uses a variety of techniques to increase the stability of training systems and documents these ideas in the paper. Most encouragingly, they prototype the technique and show that it works.
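
To give a flavor of the core idea – without any of DeDLOC’s fault tolerance, compression, or support for flaky volunteer hardware – here’s a minimal sketch of synchronized gradient averaging using plain torch.distributed. It is not the actual DeDLOC/hivemind code, just the basic pattern it builds on:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module):
    """Average gradients across all participating peers before each optimizer step."""
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

# Inside each peer's training loop (after dist.init_process_group has been called):
#   loss = compute_loss(model, batch)   # hypothetical helper
#   loss.backward()
#   average_gradients(model)
#   optimizer.step(); optimizer.zero_grad()
```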

Training a Bengali model in a distributed manner: A distributed team of 40 volunteers used DeDLOC to train sahajBERT, a Bengali language model. “In total, the 40 volunteers contributed compute time from 91 unique devices, most of which were running episodically,” the researchers write. “Although the median GPU time contributed by volunteers across all devices was ≈ 1.5 days, some participants ran the training script on several devices, attaining more than 200 hours over the duration of the experiment.” The ultimate performance of the model is pretty good, they say: “sahajBERT performs comparably to three strong baselines despite being pre-trained in a heterogeneous and highly unstable setting”.

Why this matters: AI has a resource problem – namely, that training large-scale AI systems requires a lot of compute. One way to fix or lessen this problem is to unlock all the computational cycles in the hardware that already exists in the world, a lot of which resides on user desktops rather than in major cloud infrastructure. Another is to make it easier for teams of people to form ad-hoc training collectives, temporarily pooling their resources towards a common goal. DeDLOC makes progress on both fronts and paints a picture of a future where random groups of people come together online and train their own models for their own political purposes.
Read more: Distributed Deep Learning in Open Collaborations (arXiv).

###################################################

Tech Tales:

Food for Humans and Food for Machines
[The outskirts of a once thriving American town, 2040]

“How’s it going, Mac? You need some help?” I say, approaching Mac, who’s kneeling down outside ‘Sprockets and Soup’. He looks up at me and I can tell he’s been crying. He sweeps some of the smashed glass into a dustpan, then picks it up and tosses it in a bin.
  “They took the greeter,” he says, gesturing at the space in the window where the robot used to stand. “Bastards”.

Back when the place opened it was a novelty and people would fly in from all parts of the world to go there, bringing their robotic pets, and photographing themselves. There was even a ‘robodog park’ out front where some of the heat-resistant gardening bots would be allowed to ‘play’ with each other – which mostly consisted of them cleaning each other. You can imagine how popular it was.

Mac and his restaurant slash novelty venue rode the wave of robohuman excitement all the way up, buying up nearby lots and expanding the building. Then, for the past decade, he’s been riding the excitement all the way down.

People really liked robots until they stopped being able to figure out how to split the earnings between people and robots. Then the enthusiasm for places like Sprockets and Soup went down – no one wants to tip a robot waiter and walk past a singing greeter when their own job is in jeopardy because of a robot. The restaurant did become a hangout for some of the local rich people, who would sit around and talk to each other about how to get more people to ‘want’ robots, and how much of a problem it was that people didn’t like them as much, these days.

But that wasn’t really enough to sustain it, and so for the past couple of years Mac has been riding the fortunes of the place down to rock bottom. Recently, the vandalism has gotten worse – going from people graffitiing the robots while the restaurant is open, to people breaking into the place at night and smashing or stealing stuff.

“Alright,” Mac says, getting up. “Let’s go to the junkyard and see if we can buy it back. They know me there, these days”.

Things that inspired this story: Thinking about a new kind of ‘Chuck-E-Cheese’ for the AI era; decline and vandalism in ebbing empires; notions of how Americans might behave under economic growth and then economic contraction; dark visions of plausible futures.

Import AI 254: Facebook uses AI for copyright enforcement; Google uses RL to design better chips.

Agronerds rejoice… a pan-European crop parcel + satellite image dataset is on the way:
The University of Munich and a geospatial company called GAF AG want to create a map of as much of the farmland in Europe as possible (with data for specific crop types and uses for each individual parcel of land), then pair this with geospatial data gathered by SENTINEL satellites. The dataset is called EuroCrops and the idea is to use it as the data fuel that might go into a system which uses machine learning to automatically classify and map crop types from a variety of data sources. This is the kind of ‘dull but worthy’ research that illustrates how much effort goes into creating some science-targeted datasets. For instance…

A whole lot of work: The authors contacted ministries, agricultural departments, and authorities from 24 European states. As a result, the initial version of EuroCrops contains data for 13 countries: Austria, Belgium, Croatia, Denmark, Estonia, France, Latvia, Lithuania, Netherlands, Portugal, Sweden, Slovakia, and Slovenia. There are also plans to incorporate data from Finland, Romania, and Spain. To assemble this dataset, they also needed to translate each country’s non-harmonized ways of describing crops into a single schema, which they then apply across the dataset. That’s the kind of excruciatingly painful task required to make country-level data legible when compared internationally.

Demo dataset: A full dataset is expected in time, but to start they’ve published a demo dataset covering data from Austria, Denmark, and Slovenia, and made this available in a variety of formats (CSV, HDF5 for the Sentinel data, and GeoJSON).
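
Because GeoJSON is just JSON, poking at the demo parcels is straightforward. Here’s a quick sketch – the filename and property names are assumptions on my part, so check the EuroCrops site for the actual schema:

```python
import json

# Load a (hypothetically named) demo file and inspect a few crop parcels.
with open("eurocrops_demo_austria.geojson") as f:
    parcels = json.load(f)["features"]        # standard GeoJSON FeatureCollection

for parcel in parcels[:5]:
    props = parcel["properties"]               # crop label, harmonized code, etc.
    geometry = parcel["geometry"]              # polygon outline of the field
    print(geometry["type"], props)
```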
  Read more: EuroCrops: A Pan-European Dataset for Time Series Crop Type Classification (arXiv).
  Get the dataset from the official EuroCrops website.

###################################################

Big models are great – but they’re also getting more efficient, like this massive mixture-of-experts vision system:
…Sparsity comes to computer vision…
Google has built a large-scale, sparse model for computer vision, using a technology called a V-MoE (a Vision Mixture-of-Experts model). V-MoE is a variant of the ‘Vision Transformer’ (ViT) architecture, which swapped out convolutions for transformers and has been the key invention behind a bunch of recent impressive results out of Google. Google uses the V-MoE to train vision models of up to 15B parameters – “the largest vision models to date”, it says in a research paper. These models can match the performance of other state-of-the-art dense models while taking less time to train.
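
For readers who haven’t bumped into mixture-of-experts layers before, here’s a toy sketch of the routing idea in PyTorch – each token only activates a couple of expert networks, which is what keeps the compute bill low even as parameter counts balloon. This is a bare-bones illustration, not Google’s V-MoE implementation (which routes image-patch tokens inside a Vision Transformer and adds capacity limits and load-balancing losses):

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer: each token runs through only k experts."""
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)])
        self.k = k

    def forward(self, tokens):                          # tokens: (num_tokens, dim)
        weights, chosen = self.router(tokens).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            for idx, expert in enumerate(self.experts): # only the chosen experts do work
                mask = chosen[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out

print(TinyMoELayer()(torch.randn(16, 64)).shape)        # torch.Size([16, 64])
```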

Top scores and surprising efficiency: Google’s largest V-MoE model gets 90.35% test accuracy on ImageNet. More intriguingly, their performance might be better than alternative dense models: “V-MoEs strongly outperform their dense counterparts on upstream, few-shot and full fine-tuning metrics in absolute terms. Moreover, at inference time, the V-MoE models can be adjusted to either (i) match the performance of the largest dense model while using as little as half of the amount of compute, or actual runtime, or (ii) significantly outperform it at the same cost.” The V-MoE models were pre-trained on JFT-300M, Google’s secret in-house dataset.

Why this matters: Besides the scores, these results matter in terms of efficiency – most of the energy consumption of neural nets happens during inference, after they’ve been trained. This MoE approach “takes the most efficient models and makes them even more efficient without any further model adaptation,” according to Google. Put another way: the people capable of training big models may be able to expand the margins on their services faster than those working with smaller models – the rich (might) get richer.
  Read more: Scaling Vision with Sparse Mixture of Experts (arXiv).

###################################################

One big thing: Google’s AI tools are now helping it build better chips:
…Welcome to corporate-level recursive-self-improvement…
Google has published a paper in Nature showing how it has used reinforcement learning to help it design the layout of chips, taking work which previously took humans months and converting it into about six hours of work. The results are chips that are superior or comparable to those designed by humans in critical areas like power consumption, performance, and chip area. “Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators,” the researchers write.

Where this came from: This is not, technically, new research – Google has been publishing on using RL for chip design for quite some time; the company published an early paper on this technique back in March 2020 (Import AI #191). But the fact the technique has been used to design the fifth generation of tensor processing units (TPUs) is a big deal.

Why this matters: I sometimes think of Google as a corporation made of human-designed processes that is slowly morphing into a bubbling stew defined equally by humans and AI systems. In the same way Google has recently been exploring the use of AI tools for things as varied as database lookups, power management in datacenters, and the provision of consumer-facing services (e.g, search, translation), it’s now using AI to help it design more effective infrastructure for itself. With this research, Google has shown it can train machines to build the machines that will train subsequent machines. How soon, I wonder, till the ‘speed’ of these processes becomes so rapid that we start iterating through TPU generations on the order of weeks rather than years?
  Read more: A graph placement methodology for fast chip design (Nature).

###################################################

Why AI policy is messed up and how to make it better, a talk and an idea:
…Hint: It’s all about measurement…
I think most of the problems of AI policy stem from the illegibility of AI systems (and to a lesser extent, the organizations designing these systems). That’s why I spend a lot of my time working on policy proposals / inputs to improve our ability to measure, assess, and analyze AI systems. This week, I spoke with Jess Whittlestone at Cambridge about ways we can better measure and assess AI systems, and also gave a talk at a NIST workshop on some issues in measurement/assessment of contemporary systems. I’m generally trying to make myself more ‘legible’ as a policy actor (since my main policy idea is… demanding legibility from AI systems and the people building them, haha!).
Read more: Cutting Edge: Understanding AI systems for a better AI policy with Jack Clark (YouTube).
Check out the slides for the talk here (Google Slides).
Check out some related notes from remarks I gave at a NIST workshop last week, also (Twitter).

###################################################

Job alert! Join the AI Index as a Research Associate and help make AI policy less messed up:
…If you like AI measurement, AI assessment, and are detail-oriented, then this is for you…
The AI Index is dedicated to analyzing and synthesizing data around AI progress. I work there (currently as co-chair), along with a bunch of other interesting people. Now, we’re expanding the Index. This is a chance to work on issues of AI measurement and assessment, improve the prototype ‘AI vibrancy’ tool we’ve built out of AI Index data, and support our collaborations with other institutions as well.
Take a look at the job and apply here (Stanford). (If you’ve got questions, feel free to email me directly).

###################################################

Facebook releases a data augmentation tool to help people train systems that are more robust and can spot stuff designed to evade them:
…Facebook uses domain randomization to help it spot content that people want to be invisible to Facebook’s censors…
Facebook has built and released AugLy, software for augmenting and randomizing data. AugLy makes it easy for people to take a piece of data – like an image, piece of text, audio file, or movie – then generate various copies of that data with a bunch of transformations applied. This can help people generate additional data to train their systems on, and can also serve as a way to test the robustness of existing systems (e.g, if your image recognition system breaks when people take an image and put some meme text on it, you might have a problem).
  Most intriguingly, Facebook says a motivation for AugLy is to help it train systems that can spot content that has been altered deliberately to evade them. “Many of the augmentations in AugLy are informed by ways we have seen people transform content to try to evade our automatic systems,” Facebook says in a blog announcing the tool.

AugLy and copyright fuzzing: One thing AI lets you do is something I think of as ‘copyright fuzzing’ – you can take a piece of text, music, or video, and warp it slightly by changing some of the words or tones or visuals (or playback speed, etc) to evade automatic content-IP detection systems. Tools like AugLy will also let AI developers train AI systems that can spot fuzzed or slightly changed content.
This also seems to be a business case for Facebook as, per the blog post: “one important application is detecting exact copies or near duplicates of a particular piece of content. The same piece of misinformation, for example, can appear repeatedly in slightly different forms, such as an image modified with a few pixels cropped, or augmented with a filter or new text overlaid. By augmenting AI models with AugLy data, they can learn to spot when someone is uploading content that is known to be infringing, such as a song or video.”
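
As a concrete (if toy) illustration of the kind of robustness check this enables – written here with plain PIL rather than AugLy’s own API – you can generate ‘evasion-style’ variants of an image and check whether your classifier’s output survives them:

```python
from PIL import Image, ImageDraw

def fuzzed_variant(path: str) -> Image.Image:
    """Crop a few pixels off an image and overlay meme-style text on it."""
    img = Image.open(path).convert("RGB")
    img = img.crop((5, 5, img.width - 5, img.height - 5))   # shave the borders
    ImageDraw.Draw(img).text((10, 10), "totally different image", fill="white")
    return img

# Hypothetical robustness check: a good classifier should give the same answer
# for the original and the fuzzed copy.
# assert classify(Image.open("meme.jpg")) == classify(fuzzed_variant("meme.jpg"))
```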
Read more: AugLy: A new data augmentation library to help build more robust AI models (Facebook blog).
Get the code for AugLy here (Facebook GitHub).

###################################################

Tech Tales:

Choose your own sensorium
[Detroit, 2025]

“Oh come on, another Tesla fleet?” I say, looking at the job come across my phone. But I need the money, so I head out of my house and walk a couple of blocks to the spot on the hill where I can see the freeway. Then I wait. Eventually I see the Teslas – a bunch of them, traveling close together on autopilot, moving as a sinuous single road train down the freeway. I film them and upload the footage to the app. A few seconds later the AI verifies the footage and some credits get deposited in my account.
Probably a few thousand other people around the planet just did the same thing. And the way this app works, someone bought the rights (or won the lottery – more on that later) to ask the users – us – to record a particular thing, and we did. There’s been a lot of Tesla fleets lately, but there’ve also been tasks like spotting prototype Amazon drones, photographing new menus in fast food places, and documenting wildflowers.

It’s okay money. Like a lot of stuff these days it’s casual work, and you’re never really sure if you’re working for people, or corporations, or something else – AI systems, maybe, or things derived from other computational analysis of society.

There’s a trick with this app, though. Maybe part of why it got so successful, even. It’s called the lottery – every day, one of the app users gets the ability to put out their own job. So along with all the regular work, you get strange or whimsical requests – record the sky where you are, record the sunset. And sometimes requests that just skirt up to the edges of the app’s terms of service without crossing the line – photograph your feet wearing socks (I didn’t take that job), record 30 seconds of the local radio station, list out what type of locks you have for your house, and so on.

I have dreams where I win and get to choose. I imagine asking people to record the traffic on their local street, so I could spend months looking at different neighborhoods. Sometimes I dream of people singing into their phones, and me putting together a song out of all of them that makes me feel something different. And sometimes I just imagine what it’d be if the job was ‘do nothing for 15 minutes’, and all I collect is data from onboard sensors from all the phones – accelerometers showing no movement, gyroscopes quietly changing, GPS not needing to track moving objects. In my dreams, this is peaceful.

Things that inspired this story: Winding the business model of companies like ‘Premise Data’ forward; global generative models; artisanal data collection and extraction; different types of business models; the notion of everything in life becoming gamified and gamble-fied.

Import AI 253: The scaling will continue until performance saturates

Google sets a new record on ImageNet – and all it took was 3 billion images:
…The scaling will continue until performance saturates – aka, not for a while, apparently…
Google has scaled up vision transformers to massive amounts of data and parameters and in doing so set a new state-of-the-art on ImageNet. The research matters for a couple of reasons: first, it gives us an idea of the scalability of this approach (seemingly very good), and it also demonstrates a more intriguing fact about large-scale neural networks – they’re more efficient learners.

What they did and what they got: Google explored vision transformers – a type of image recognition system that uses transformers rather than traditional convolutional nets – to unprecedented scales, dumping huge amounts of compute in. The result is a large-scale model that gets a score of 90.45 top-1 accuracy on ImageNet, setting a new state-of-the-art. They also show that networks like this can perform well at few-shot learning; a pre-trained large-scale transformer can get 84.86% accuracy on ImageNet with a mere 10 examples per class – that’s 1% of the data ImageNet systems are traditionally trained on.

Why this matters: These results are a big deal, not because of the performance record, but because of the few-shot learning – the results highlight how once you scale up a network enough, it seems to be able to rapidly glom onto patterns in the data you feed it, displaying intriguing few-shot learning properties.
  Read more: Scaling Vision Transformers (arXiv).

###################################################

Eleuther releases a 6B parameter GPT3-style model – and an API:
…Multi-polar AI world++…
Researchers affiliated with Eleuther, an ad-hoc collection of cypherpunk-esque researchers, have built a 6 billion parameter GPT3-style model, published it as open source, and released a publicly accessible API to give people access to the model through a web interface. That’s… a lot! And it’s emblematic of the multi-polar AI world we’re heading into – one where a proliferating set of actors will adopt different strategies in developing, deploying, and diffusing AI technology. The model is called GPT-J-6B.

What they did that’s interesting: Beyond the release itself, they’ve done a few interesting technical things here – they’ve written it in JAX and deployed it on Google’s custom TPU chips. The model was trained on 400B tokens from ‘The Pile’, an 800GB dataset. In tests, Eleuther finds that GPT-J-6B’s performance is roughly on par with OpenAI’s ‘GPT3-Curie’ model, and outperforms other GPT3 variants.
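
If you’d rather poke at the model locally than via the web demo, a sketch of the obvious route is below – this assumes you’re using the Hugging Face transformers integration and the “EleutherAI/gpt-j-6B” checkpoint on the model hub (and a machine with enough memory to hold 6B parameters):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id assumed to be "EleutherAI/gpt-j-6B"; downloading the weights takes a while.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "Import AI is a newsletter about"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```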

A word about Google: I imagine I’ll get flak for this, but it remains quite mysterious to me that Google is providing (some of) the compute for these model replications while not really acknowledging that it’s doing so. Does this mean Google’s official policy on language models is that it wants them to proliferate on the open internet? It’d be nice to know Google’s thinking here – by comparison, Eleuther has actually published a reasonably lengthy blog post giving their reasoning for why they’re doing this – and while I may not agree with all the arguments, it feels good that these arguments are legible. I wonder who at Google is giving the compute to this project and what they think? I hope they write about it.
  Check out the Eleuther API to the 6B right here (Eleuther AI).
  Read more: GPT-J-6B: 6B JAX-Based Transformer (Aran Komatsuzaki, blog)
  Get the model from the GitHub repo here.
  Read Eleuther’s post on “Why Release a Large Language Model“.

###################################################

Self-driving car expert launches startup with $83.5 million funding:
…Raquel Urtasun’s next step…
Waabi is a new self-driving car startup that launched last week with an $83.5 million Series A funding round. Waabi is notable for its name (which my autocorrect tells me is really Wasabi), and for Urtasun’s background – she previously led research for Uber’s self-driving car effort, and helped develop the widely-used KITTI vision benchmark suite. Waabi’s technology uses “deep learning, probabilistic inference and complex optimization to create software that is end-to-end trainable, interpretable and capable of very complex reasoning”, according to the launch press release. Waabi will initially focus on applying its technology to long-haul trucking and logistics.
  Read more: Waabi launches to build a pathway to commercially viable, scalable autonomous driving (GlobeNewswire, PR).
Find out more at the company’s website.

###################################################

Want to get a look at the future of robotics? Sanctuary.AI has a new prototype machine:
…Ex-Kindred, D-Wave team, are betting on a ‘labor-as-a-service’ robot workforce…
Sanctuary AI, a Canadian AI startup founded by some former roboticists and quantum scientists, thinks that generally intelligent machines will need to be developed in an embodied environment. Because of this, they’re betting big on robotics – going so far as to design their own custom machines, in the hopes of building a “general purpose robot workforce”.

Check out these robots: The Sanctuary.AI approach fuses deep learning, robotics, and symbolic reasoning and logic for what they say is “a new approach to artificial general intelligence”. What’s different about them is they already seem to have some nice, somewhat novel hardware, and have recently published some short videos about the control scheme for their robots, how they think, and how their hands work.

Why this matters: There’s a lot of economic value to be had in software, but much of the world’s economy runs in the physical world. And as seasoned AI researchers know, the physical world is a cruel environment for the sorts of brittle poor-at-generalization AI systems we have today. Therefore, Sanctuary’s idea of co-developing new AI software with underlying hardware represents an interesting bet that they can close this gap – good luck to them.
Find out more on their website: Sanctuary.ai.

###################################################

Which datasets are actually useful for testing NLP? And which are useless? Now we have some clues:
…Item Response Theory helps us figure out which AI tests are worth doing, and which are ones we’ve saturated…
Recently, natural language processing and understanding got much better, thanks to architectural inventions like the Transformer and its application in a few highly successful, widely-used models (e.g, BERT, GPT3, ROBERTA, etc). This improvement in performance has been coupled with the emergence of new datasets and tests for sussing out the capabilities of these systems. Now, researchers with Amazon, NYU, and the Allen Institute for AI have analyzed these new datasets to try and work out which of them are useful for assessing the performance of cutting-edge AI systems.

What datasets matter? After analyzing 29 test sets, they find that “Quoref, HellaSwag, and MC-TACO are best able to discriminate among current (and likely future) strong models. Meanwhile, SNLI, MNLI, and CommitmentBank seem to be saturated and ineffective for measuring future progress.” Along with this, they find that “SQuAD2.0, NewsQA, QuAIL, MC-TACO, and ARC-Challenge have the most difficult examples” for current models. (That said, they caution researchers that “models that perform well on these datasets should not be deployed directly without additional measures to measure and eliminate any harms that stereotypes like these could cause in the target application settings.”)

How they did it: They used a technique called Item Response Theory, “a statistical framework from psychometrics that is widely used for the evaluation of test items in educational assessment”, to help them compare different datasets to one another.
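
The core object in IRT is the item response curve: the probability that a ‘subject’ (here, an NLP model) with ability theta answers a given test item correctly, given that item’s difficulty and discrimination. Below is a sketch of the standard two-parameter form with made-up numbers – the paper fits these parameters from many models’ predictions rather than picking them by hand:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response curve: ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A highly discriminative item separates strong models from weak ones...
print(p_correct(theta=1.0, a=2.0, b=0.0))   # ~0.88
print(p_correct(theta=-1.0, a=2.0, b=0.0))  # ~0.12
# ...while a weakly discriminative item tells you almost nothing.
print(p_correct(theta=1.0, a=0.1, b=0.0))   # ~0.52
```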

Why this matters: Where are we and where are we going – it’s a simple question that in AI research is typically hard to answer. That’s because sometimes where we think we are is actually a false location, because the AI systems we’re using are cheating, and where we think we’re heading is an illusion, because of the aforementioned cheating. On the other hand, if we can zoom out and look holistically at a bunch of different datasets, we have a better chance of establishing our true location, because it’s relatively unlikely all our AI techniques are doing hacky responses to hard questions. Therefore, work like this gives us new ways to orient ourselves with regard to future AI progress – that’s important, given how rapidly capabilities are being developed and fielded.
  Read more: Comparing Test Sets with Item Response Theory (arXiv).

###################################################

Tech Tales:

A 21st Century Quest For A Personal Reliquary
[A declining administrative zone in mid-21st Century America]

“For fuck’s sake, you sold it? We were going to pay a hundred.”
“And they paid one fifty.”
“And you didn’t call us?”
“They said they didn’t want a bidding war. One fifty gets my kids a pass to another region. What am I supposed to do?”
“Sure,” I press my knuckles into my eyes a bit. “You’ve gotta know where I can get something else.”
“Give me a few days.”
“Make it a few hours and we’ll pay you triple. That’d get you and your wife out of here as well.”
“I’ll see what I can do.”
And so I walked away from the vintage dealer, past the old CRT and LCD monitors, negotiating my way around stacks of PC towers, ancient GPUs, walls of hard drives, and so on. Breathed the night air a little and smelled burning from the local electricity substation. Some sirens started up nearby, so I turned my ears to noise-cancelling mode and walked through the city, staring at the lights, and thinking about my problems.

The baron would fire me for this, if he wasn’t insane. But he was insane – Alzheimer’s. Which meant I had time. Could be an hour or could be days, depending on how lucid he is, and if anything triggers him. Most of his staff don’t fire people on his first request, these days, but you can’t be sure.
  Got a message on my phone – straight from the baron. “I need my music, John. I need the music from our wedding.”
  I didn’t reply. Fifty percent chance he’d forget soon. And if he was conscious and I said I didn’t have it, there was a fifty percent chance he’d fire me. So I sat and drank a beer at a bar and messaged all the vintage dealers I knew, seeing if anyone could help me out.

An hour later I got a message from the dealer that they had what I needed. I walked there and en route I got a call from the Baron, but I ignored it and let it go to voicemail. “Martha, you must come and get me. I have been imprisoned. I do not know where I am. Martha, help me.” And then there was the sound of crying, and then some banging, and then weak shouting in the distance of ‘no, give it back, I must speak to Martha’, and then the phone hung up. In my mind, I saw the nurses pulling the phone away and hanging it up, trying to soothe the Baron, probably some of them getting fired if he turned lucid, probably some of them crying – even tyrants can elicit sympathy, sometimes.

When I got there the dealer handed me a drive. I connected it to my verifier and waited a few minutes while the tests ran. When it came back green I paid him the money. He’d already started packing up his office.
  “Do you think it’ll be better, if you leave?” I said.
  “It’ll be different that’s for sure,” he said, “and that’ll be better, I think.”
    I couldn’t blame him. The city was filthy and the barons that ran it were losing their minds. Especially mine.

It took me a couple of hours to get to the Baron’s chambers – so many layers of security, first at the outskirts of the ‘administrative zone’, and then more concentric circles of security, with more invasive tests – physical, then cognitive/emotional. Trying to work out if I’d stab someone with a nearby sharp object, after they’d verified no explosives. That’s how it is these days – you can work somewhere, but if you leave and go into the city, people worry you come back angry.

I got to the Baron’s chambers and he looked straight at me and said “Martha, help me,” and began to sob. Then I heard the distinct sound of him urinating and wetting himself. Saw nurses in my peripheral vision fussing around him. I walked over to the interface and put the drive into it, then pressed play. The room filled with sounds of strings and pianos – an endless river of music, tumbling out of an obscure, dead-format AI model, trained on music files that themselves had been lost in the e-troubles a few years ago. It was music played at his wedding and he had thought it lost and in a moment of lucidity demanded I find it. And I did.

I looked out the windows at the smog and the yellow-tinted clouds and the neon and the smoke rising from people burning old electronics to harvest copper. And behind me the Baron continued to cry. But at one point he said “John, thank you. I can remember it so clearly”, and then he went back to calling me Martha. I looked at my hands and thought about how I had used them to bring him something that unlocked his old life. I do not know how long this region has, before the collapse begins. But at least our mad king is happy and perhaps more lucid, for a little while longer.

Things that inspired this story: Alzheimers; memory; memory as a form of transportation, a means to break through our own limitations; dreams of neofeudalism as a consequence of great technical change; the cyberpunk future we may deserve but not the one we were promised. 

Import AI 252: Gait surveillance; a billion Danish words; DeepMind makes phone-using agents

Synthetic data works for gait generation as well (uh oh):
…Generating movies of 10,000 fake people walking, then using them for surveillance…
Gait detection is the task of identifying a person by the gait they walk with. Now, researchers with Zhejiang University in China have built VersatileGait, a dataset of 10,000 simulated individuals walking, with 44 distinct views available for each individual. The purpose of VersatileGait is to augment existing gait datasets collected from reality. In tests, the researchers show the synthetic data can be used as an input for training gait-detection systems which subsequently get used in the real world.

What they used:
To build this dataset, they used an open source tool called ‘Make Human’ to generate different character models, collected 100 walking animations from a service called ‘Mixamo’, then animated various permutations of characters+walks in the game engine Unity3D.

Synthetic data and ethics: “Since all of our data are collected by computer simulation, there will be no problems for privacy preservation. Therefore, our dataset is in agreement with the ethics of research and has no risks for use,” the authors write.

Why this matters: Being able to automatically surveil and analyze people is one of those AI capabilities that will have a tremendous impact on the world and (excluding computer vision for facial recognition) is broadly undercovered by pretty much everyone. Gait recognition is one of the frontier areas for the future of surveillance – we should all pay more attention to it.
  Read more: VersatileGait: A Large-Scale Synthetic Gait Dataset Towards in-the-Wild Simulation (arXiv).

###################################################

Care about existential risk? Apply to be the Deputy Director at CSER (UK):
The Centre for the Study of Existential Risk, a Cambridge University research center, is hiring a deputy director. “We’re looking for someone with strong experience in operations and strategy, with the interest and intellectual versatility to engage with and communicate CSER’s research. The role will involve taking full operational responsibility for the day-to-day activities of the Centre, including people management and financial management, and contributing to strategic planning for the Centre,” I’m told. The deadline for applications is Sunday July 4th.
  Find out more and apply here (direct download PDF).

###################################################

DeepMind wants to teach AI agents to use Android phones:
…AndroidEnv is an open source tool for creating phone-loving AIs…
DeepMind has released AndroidEnv, a software program that lets you train AI agents to solve tasks in the ‘Android’ phone operating system. To start with, DeepMind has shipped AndroidEnv with 100 tasks across 30 applications, ranging from playing games (e.g, 2048, Solitaire), to navigating the user interface to set a time.

AndroidEnv lets “RL agents interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface”. And because the agents train on a realistic simulation of Android, they can be deployed on real devices once trained, DeepMind says.

Strategic games! DeepMind is also working with the creators of a game called Polytopia to add it as a task for AndroidEnv agents. Polytopia is a game that has chewed up probably several tens of hours of my life over the years – it’s a fun little strategy game which is surprisingly rich, so I’ll be keen to see how AI agents perform on it.

Why this matters: Eventually, most people are going to have access to discrete AI agents, continually trained on their own data, and working as assistants to help them in their day-to-day lives. Systems like AndroidEnv make it easy to start training AI agents on a massively popular piece of software, which will ultimately make it easier for us to delegate more complex tasks to AI agents.
Read more: AndroidEnv: The Android Learning Environment (DeepMind).
Find out more: AndroidEnv: A Reinforcement Learning Platform for Android (arXiv).
Get the code: AndroidEnv – The Android Learning Environment (DeepMind, GitHub).

###################################################

Want to test your AI on a robot but don’t have a robot? Enter the ‘Real Robot Challenge’ for NeurIPS 2021:
…Robot learning competition gives entrants access to a dexterous manipulator…
Robots are expensive, hard to program, and likely important to the future of AI. The first two parts of that sentence explain why we see relatively little AI work applied to robots compared with traditional software. For a few years, a competition hosted by the Max Planck Institute for Intelligent Systems has tried to change this by giving people access to a real robot (a TriFinger), which they can run algorithms on.

What the competition involves: “Participants will submit their code as they would for a cluster, and it will then be executed automatically on our platforms. This will allow teams to gather hundreds of hours of real robot data with minimal effort,” according to the competition website. “The teams will have to solve a series of tasks ranging from relatively simple to extremely hard, from pushing a cube to picking up a pen and writing. The idea is to see how far the teams are able to push, solving the most difficult tasks could be considered a breakthrough in robotic manipulation.”

Key dates: June 23rd is the date for submissions for the first stage of the competition; successful entrants will subsequently get access to real robot systems.
Find out more about the competition here (Real Robot Challenge website).

###################################################

Detecting scorpions with off-the-shelf-AI:
…Argentinian researchers demonstrate how easy computer vision is getting…
Here’s a fun and practical paper about using off-the-shelf AI tools to build an application that can classify different types of scorpions and tell the difference between dangerous and non-dangerous ones. The research was done by the Universidad Nacional de La Plata in Argentina, and saw researchers experiment with YOLO(v4) and MobileNet(v2) for the task of scorpion detection, while using the commercial service ‘Roboflow’ for data augmentation and randomization. They’re ultimately able to obtain accuracies of 88% and 91% across the YOLO and MobileNet methods, and recall values of 90% and 97%, respectively.

Why this matters: Papers like this highlight how people are doing standard/commodity computer vision tasks today. What I found most surprising was the further evidence that primitives like YOLO and MobileNet are sufficiently good they don’t need much adaptation, and that academics are now starting to use more commercial services to help them in their research (e.g, you could do what Roboflow does yourself but… why would you? It doesn’t cost that much and maybe it’s better than ImageMagick etc).
Read more: Scorpion detection and classification systems based on computer vision and deep learning for health security purposes (arXiv).

###################################################

A Danish billion-word corpus appears:
…the Danish Gigaword Corpus will make it easier to train GPT2-style models that reflect digitized Danish culture…
Researchers with the IT University of Copenhagen have built the Danish Gigaword Corpus, which consists of 1,045 million (roughly 1.05 billion) Danish words, drawn from sources ranging from Danish social media, to law and tax codes, to Wikipedia, literature, news, and more. The corpus is licensed via the Creative Commons general license (CC0) and CC-BY.

Why this matters: “In Denmark, natural language processing is nascent and growing faster and faster,” the authors write. “We hope that this concrete and significant contribution benefits anyone working with Danish NLP or performing other linguistic activities”. More broadly, in AI, data does equate to representation – so now there’s a billion-word nicely filtered dataset of Danish words available, we can expect more groups to train more Danish language models, translation models, and so on.
  Read more: Gigaword (official website).
Read the paper: The Danish Gigaword Corpus (PDF).

###################################################

Tech Tales:

The Religion Virus
[Worldwide, 2026]

It started as a joke from some Mormon comp. sci. undergrads, then it took over most of the computers at the university, then the computers of the other universities linked to the high-speed research infrastructure, then it spread to the internet. Now, we estimate more than a million person years of work have been expended trying to scrub the virus off of all the computers it has found. We estimate we’re at 80% containment, but that could change if it self-modifies again.

As a refresher, the virus – dubbed True Believer – is designed to harvest the cycles of both the machines it deploys onto and the people that use those machines. Specifically, once it takes over a machine it starts allocating a portion of the computer’s resources to onward propagating the virus (normal), as well as using computational cycles to train a large multilingual neural net on a very large dataset of religious texts (not normal). The only easy way to turn the virus off is to activate the webcam on the computer, then it’ll wait to see if a human face is present; if the face is present, the virus starts showing religious texts and it uses some in-virus eye-tracking software to check if the person is ‘reading’ the texts. If the person reads enough of the religious texts, the virus self-deletes in a way that doesn’t harm the system. If you instead try to remove the virus manually, it has a variety of countermeasures, most of which involve it wiping all data on the host computer.

So that’s why, right now, all around the world, we’ve got technicians in data centers plugging webcams and monitors into servers, then sitting and reading religious texts as they sit, sweating, in the hot confines of their computer facilities. The virus doesn’t care about anything but attention. And if you give it attention as a human, it leaves. If you give it attention as a computer, it uses your attention to replicate itself, and aid its own ability to further expand itself through training its distributed neural network.

Things that inspired this story: SETI@Home and Folding@Home if created by religiously-minded people as a half-serious joke; thoughts about faith and what ‘attention’ means in the context of spirituality; playing around with the different ways theological beliefs will manifest in machines and in people.

Import AI 251: Korean GPT-3; facial recognition industrialization; faking fingerprints with GANs

Want to know what the industrialization of facial recognition looks like? Read this.
…Paper from Alibaba shows what happens at the frontier of surveillance…
Researchers with Alibaba, the Chinese Academy of Sciences, Shenzhen Technology University, and the National University of Singapore are trying to figure out how to train large-scale facial recognition systems more efficiently. They’ve just published a paper about some of the nuts-and-bolts needed to train neural nets at scales of greater than 10 million to 100 million distinct facial identities.

Why this matters: This is part of the broader phenomenon of the ‘industrialization of AI’ (#182), where as AI is going from research into the world, people are starting to invest vast amounts of brain and compute power into perfecting the tooling used to develop these systems. Papers like this give us a sense of some of the specifics required for industrialization (here: tweaking the structure of a network to make it more scalable and efficient), as well as a baseline for the broader trend – Alibaba wants to deploy 100 million-scale facial recognition and is working on the technology to do it.
Read more: An Efficient Training Approach for Very Large Scale Face Recognition (arXiv).
Related: Here’s a research paper about WebFace260M, a facial recognition dataset and challenge with 4 million distinct identities, totalling 260 million photographs. WebFace260M is developed by researchers primarily at Tsinghua University, along with appointments at XForwardAI and Imperial College London.
Read more: WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition (arXiv).

###################################################

Help the OECD classify AI systems:
…Improve our ability to define AI systems, and therefore improve our ability to create effective AI policy…
The OECD, a multi-national policy organization, is carrying out a project aiming to classify and define AI systems. I co-chair this initiative and, after a year and a half of work, we’ve released a couple of things readers may find interesting: a survey people can fill out to try and classify AI systems using our framework, and a draft of the full report on classifying and defining systems (which we’d love feedback on).

Why this is worth spending time on: This is a low-effort high-impact way to engage in AI policy and comments can be anonymous – so if you work at a large tech company and want to give candid feedback, you can! Don’t let your policy/lobbyists/PR folk have all the fun here – go direct, and thereby increase the information available to policymakers.
This stuff seems kind of dull but really matters – if we can make AI systems more legible to policymakers, we make it easier to construct effective regulatory regimes for them. (And for those that wholly reject the notion of government doing any kind of regulation, I’d note that it seems useful to create some ‘public knowledge’ re AI systems which isn’t totally defined by the private sector, so it seems worthwhile to engage regardless).
Take the OECD survey here (OECD).
Read the draft report here (Google Docs).
Read more in this tweet thread from me here (Twitter).

###################################################

Facebook builds Dynaboard: a way to judge NLP models via multiple metrics:
…Dynaboard is the latest extension of Dynabench, and might help us better understand AI progress…
Facebook and Stanford researchers have built Dynaboard, software to let people upload AI models, then test them on a whole bunch of different things at once. What makes Dynaboard special is the platform it is built on – Dynabench, a novel approach to NLP benchmarking which lets researchers upload models, then has humans evaluate the models, automatically generating data in areas where models have poor performance, leading to a virtuous cycle of continuous model improvement. (We covered Dynabench earlier in Import AI #248).

What is Dynaboard: Dynaboard is software “for conducting comprehensive, standardized evaluations of NLP models”, according to Facebook. Dynaboard also lets researchers adjust the weight of different metrics – want to evaluate your NLP model with an emphasis on its fairness characteristics? Great, Dynaboard can do that. Want to focus more on accuracy? Sure, it can do that as well. Want to check your model is actually efficient? Yup, can do! Dynaboard is basically a way to visualize the tradeoffs inherent to AI model development – as Facebook says, “Even a 10x more accurate NLP model may be useless to an embedded systems engineer if it’s untenably large and slow, for example. Likewise, a very fast, accurate model shouldn’t be considered high-performing if it doesn’t work well for everyone.”
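
To make the re-weighting idea concrete, here’s a toy version with made-up numbers – Dynaboard’s actual ‘Dynascore’ normalizes metrics via rates of substitution rather than a raw weighted average, but the way rankings flip as you change the weights is the point:

```python
# Two hypothetical models scored on three normalized metrics (higher is better).
models = {
    "model_a": {"accuracy": 0.95, "fairness": 0.60, "throughput": 0.30},
    "model_b": {"accuracy": 0.85, "fairness": 0.85, "throughput": 0.90},
}

def weighted_score(metrics, weights):
    return sum(weights[name] * metrics[name] for name in weights)

accuracy_first = {"accuracy": 0.90, "fairness": 0.05, "throughput": 0.05}
balanced = {"accuracy": 1 / 3, "fairness": 1 / 3, "throughput": 1 / 3}

for weights in (accuracy_first, balanced):
    best = max(models, key=lambda name: weighted_score(models[name], weights))
    print(best)   # "model_a" under accuracy_first, "model_b" under balanced
```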

Why this matters: We write a lot about benchmarking here at Import AI because benchmarking is the key to understanding where we are with AI development and where we’re going. Tools like Dynaboard will make it easier for people to understand the state of the art and also the deficiencies of contemporary models. Once we understand that, we can build better things.
  Read more: Dynaboard: Moving beyond accuracy to holistic model evaluation in NLP (Facebook AI Research).
  Read the paper: Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking (PDF).
  Tweet thread from Douwe Kiela with more here (Twitter).
  Check out an example use case of Dynaboard here (NLI leaderboard, Dynabench).

###################################################

What I’ve been up to recently – co-founding Anthropic, a new AI safety and research company:
In December 2020, I left OpenAI. Since then, I’ve been thinking a lot about AI policy, measuring and assessing AI systems, and how to contribute to the development of AI in an increasingly multi-polar world. As part of that, I’ve co-founded Anthropic with a bunch of my most treasured colleagues and collaborators. Right now, we’re focused on our research agenda and hope to have more to share later this year. I’m interested in working with technical people who want to a) measure and assess our AI systems, and b) work to contribute to AI policy and increase the amount of information governments have to help them think about AI policy – take a look at the site and consider applying!
  Find out more about Anthropic at our website (Anthropic).
And… if you think you have some particularly crazy high-impact idea re AI policy and want to chat about it, please email me – interested in collaborators.

###################################################

South Korea builds its own GPT-3:
…The multi-polar generative model era arrives…
Naver Labs has built HyperCLOVA, a 204B parameter GPT-3-style generative model, trained on lots of Korean-specific data. This is notable both because of the scale of the model (though we’ll await more technical details to see if it’s truly comparable to GPT-3), and because of the pattern it fits into of generative model diffusion – that is, multiple actors are now developing GPT-3-style models, ranging from Eleuther (trying to do an open source GPT-3, #241), to China (which has built PanGu, a ~200bn parameter model, #247), to Russia and France (which are training smaller-scale GPT-3-style models, via Sberbank and LightOn’s ‘PAGnol’ respectively).

Why this matters: Generative models ultimately reflect and magnify the data they’re trained on – so different nations care a lot about how their own culture is represented in these models. Therefore, the Naver announcement is part of a general trend of different nations asserting their own AI capacity/capability via training frontier models like GPT-3. Most intriguingly, the Google-translated press release from Naver says “Secured AI sovereignty as the world’s largest Korean language model with a scale of 204B”, which further gestures at the inherently political nature of these models.
  Read more: Naver unveils Korea’s first ultra-large AI ‘HyperCLOVA’… “We will lead the era of AI for all” (Naver, press release).

###################################################

Fake fingerprints – almost as good as real ones, thanks to GANs:
…Synthetic imagery is getting really useful – check out these 50,000 synthetic fingerprints…
Here’s some research from Clarkson University and the company Precise Biometrics which shows how to use StyleGAN to generate synthetic fingerprints. The authors train on 72,000 512x512-pixel photos of fingerprints from 250 unique individuals, then try to generate new, synthetic fingerprints. In tests, another AI model they develop classifies these fingerprints as real 95.2% of the time, suggesting that you can use a GAN to programmatically generate a synthetic copy of reality, with only a slight accuracy hit.

Why this matters: This is promising for the idea that we can use AI systems to generate data which we’ll use to train other AI systems. Like any system, this is vulnerable to a ‘garbage in, garbage out’ phenomenon. But techniques like this hold the promise of reducing the cost of data for training certain types of AI systems.
  Read more: High Fidelity Fingerprint Generation: Quality, Uniqueness, and Privacy (arXiv).
  Get the code (and 50,000 synthetically generated fingerprints) here: Clarkson Fingerprint Generator (GitHub).

###################################################

DeepMind: Turns out robots can learn soccer from a blank(ish) slate:
…FootballZero! AlphaSoccer!…
DeepMind has shown how to use imitation learning, population-based training, and self-play to teach some simulated robots how to play 2v2 football (soccer, to the American readers). The research is interesting because it smooshes together a bunch of separate lines of research that have been going on at DeepMind and elsewhere (population based training and self-play from AlphaStar! Imitation learning from a ton of projects! Reinforcement learning, which is something a ton of people at DM specialize in! And so on). The project is also a demonstration of the sheer power of emergence – through a three-stage training procedure, DeepMind teaches agents to pilot some simulated humanoid robots sufficiently well that they can learn to play football – and, yes, learn to coordinate with each other as part of the process.

How they did it: “In a sequence of training stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and learn to play as a team, successfully bridging the gap between low-level motor control at a time scale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds,” DeepMind writes.

Hardware: “Learning is performed on a central 16-core TPU-v2 machine where one core is used for each player in the population. Model inference occurs on 128 inference servers, each providing inference-as-a-service initiated by an inbound request identified by a unique model name. Concurrent requests for the same inference model result in automated batched inference, where an additional request incurs negligible marginal cost. Policy environment interactions are executed on a large pool of 4,096 CPU actor workers,” DeepMind says.

Why this matters: While this project is a sim-only one (DeepMind itself notes that the technique is unlikely to transfer), it serves as a convincing example of how simple ML approaches can, given sufficient data and compute, yield surprisingly rich and complex behaviors. I wonder if at some point we’ll use systems like this to develop control policies for robots which eventually transfer to the real world?
Read more: From Motor Control to Team Play in Simulated Humanoid Football (arXiv)
Check out a video of DeepMind’s automatons playing the beautiful game here (YouTube).

###################################################

Tech Tales:

Electric Sheep Dream of Real Sheep: “Imagination” in AI Models
Norman Searle, The Pugwash Agency for Sentience Studies

Abstract:

Humans demonstrate the ability to imagine a broad variety of scenarios, many of which cannot be replicated in reality. Recent advances in generative models combined with advances in robotics have created opportunities to examine the relationship between machine intelligences, machine imaginations, and human imaginations. Here, we examine the representations found within an agent trained in an embodied form on a robotic platform, then transferred into simulated mazes where it sees a copy of itself.

Selected Highlights:

After 10^8 environment steps, we note the development of representations in the agent that activate when it travels in front of a mirror. After 10^50 steps, we note these representations are used by the agent to help it plan paths through complex environments.

After 10^60 steps, we conduct ‘Real2Sim’ transfer to port the agent into a range of simulated environments that contain numerous confounding factors not encountered in prior real or simulated training. Agents which have been exposed to mirrors and subsequently demonstrate ‘egocentric planning’ tend to perform better in these simulated environments than those which were trained in a traditional manner.

Most intriguingly, we can meaningfully improve performance in a range of simulated mazes by creating a copy of our agent using the same robot morphology it trained on in the world, then exposing our agent to a copy of itself in the maze. Despite having never been trained in a multi-agent environment, we find that the agent will naturally learn to imitate its copy – despite no special communication being enforced between them.

In future work, we aim to more closely investigate the ‘loss’ circuits that light up when we remove the copy of an agent from a maze within the perceptual horizon of the agent. In these situations, our agent will typically continue to solve the maze, but it will repeatedly alternate between activations of the neurons associated with a sense-impression of an agent, and neurons associated with a combinatorial phenomenon we believe correlates to ‘loss’ – agents may be able to sense the absence of themselves.

Things that inspired this story: The ongoing Import AI series I’m writing involving synthetic AI papers (see recent prior issues of Import AI); robotics; notions of different forms of ‘representation’ leading to emergent behavior in neural networks; ego and counterego; ego.

Import AI 250: Facebook’s TPU; Twitter analyzes its systems for bias; encouraging proof about federated learning

Twitter analyzes its own systems for bias, finds bias, discusses bias, makes improvements:
…Twitter shows how tech companies might respond to criticism…
Back in October 2020, Twitter came in for some criticism when people noticed its ML-based image cropping algorithm seemed to have some biased traits – like showing white people rather than black people in images. Twitter said it had tested for this stuff prior to deployment, but also acknowledged the problem (Import AI 217). Now, Twitter has done some more exhaustive testing and has published the results.

What has Twitter discovered? For certain pictures, the algorithm somewhat favored white individuals over black ones (4% favorability difference), and had a tendency to favor women over men (8%).
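
To make the metric concrete, here is a minimal Python sketch of how a favorability gap like this could be computed from paired crop outcomes – the data and field names are illustrative assumptions, not Twitter’s published analysis code:

from collections import Counter

# Each record notes which group's face the cropping model kept in a paired-image test.
# These records are toy placeholders, not Twitter's real experimental data.
outcomes = [
    {"pair": "white_vs_black", "favored": "white"},
    {"pair": "white_vs_black", "favored": "black"},
    {"pair": "white_vs_black", "favored": "white"},
    {"pair": "women_vs_men", "favored": "women"},
    {"pair": "women_vs_men", "favored": "men"},
]

def favorability_gap(records, group_a, group_b):
    """How much more often group_a was favored than group_b, as a fraction of their trials."""
    counts = Counter(r["favored"] for r in records if r["favored"] in (group_a, group_b))
    total = counts[group_a] + counts[group_b]
    if total == 0:
        return 0.0
    return (counts[group_a] - counts[group_b]) / total

print(favorability_gap(outcomes, "white", "black"))  # ~0.33 on this toy data
print(favorability_gap(outcomes, "women", "men"))    # 0.0 on this toy data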

What Twitter has done: It has already rolled out a new way to display photos on Twitter which basically uses less machine learning. It has also published the code behind its experiments to aid reproduction by others in the field.

Why this matters – compare this to other companies: Most companies deal with criticism by misdirection, gaslighting, or sometimes just ignoring things. It’s very rare for companies to acknowledge problems and carry out meaningful technical analysis which they then publish (an earlier example is IBM which reacted to the ‘Gender Shades’ study in 2018 by acknowledging the problem and doing technical work in response).
Read more: Sharing learnings about our image cropping algorithm (Twitter blog).
Get the code here: Image Crop Analysis (Twitter Research).
Read more: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency (arXiv).

###################################################

Googler: Here’s the real history of Ethical AI at Google:
…How did Ethical AI work at Google, prior to the firings?…
Google recently dismissed the leads of its Ethical AI team (Timnit Gebru and Margaret Mitchell). Since then, the company has done relatively little to clarify what happened, and the actual history of the Ethical AI team (and its future) at Google is fairly opaque. At some point, all of this will likely be vigorously retconned by Google PR. So interested readers might want to read this article from a Googler about their perspective on the history of Ethical AI at the company…
  Read more: The History of Ethical AI at Google (Blake Lemoine, Medium).

###################################################

Want to know if federated learning works? Here’s a multi-country medical AI test that’ll tell us something useful:
…Privacy-preserving machine learning is going from a buzzword to reality…
Federated learning is an idea where you train a machine learning model in a distributed manner across multiple datasets that never leave the institutions holding them. Though expensive and hard to do, many people think federated learning is the future of AI – especially for areas like medical AI, where it’s very tricky to move healthcare data between institutions and countries, and easier to train distributed ML models on the data where it lives.
  Now, a multi-country, multi-institution project wants to see if Federated Learning can work well for training ML models to do tumor segmentation on medical imagery. The project is called the Federated Tumor Segmentation Challenge and will run for several months this year, with results due to be announced in October. Some of the institutions involved include the (USA’s) National Institutes of Health, the University of Pennsylvania, and the German Cancer Research Center.

What is the challenge doing? “The goals of the FeTS challenge are directly represented by the two included tasks: 1) the identification of the optimal weight aggregation approach towards the training of a consensus model that has gained knowledge via federated learning from multiple geographically distinct institutions, while their data are always retained within each institution, and 2) the federated evaluation of the generalizability of brain tumor segmentation models “in the wild”, i.e. on data from institutional distributions that were not part of the training datasets,” the authors write.
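
To make task 1 concrete, here is a minimal sketch of the classic federated-averaging idea the challenge is probing: each institution trains locally and only shares model weights plus a sample count, so raw data never leaves its home institution. This is a generic Python illustration under those assumptions, not the FeTS challenge’s actual aggregation code:

import numpy as np

def federated_average(local_updates):
    """Weight-average local models. local_updates is a list of (weights, n_samples),
    where weights is a list of NumPy arrays, one per model layer."""
    total = sum(n for _, n in local_updates)
    n_layers = len(local_updates[0][0])
    averaged = []
    for layer in range(n_layers):
        # Each institution contributes in proportion to how much data it trained on.
        averaged.append(sum(w[layer] * (n / total) for w, n in local_updates))
    return averaged

# Toy example: three "institutions" with two-layer models of matching shapes.
rng = np.random.default_rng(0)
updates = [([rng.normal(size=(4, 4)), rng.normal(size=(4,))], n) for n in (120, 300, 80)]
consensus = federated_average(updates)
print([w.shape for w in consensus])  # [(4, 4), (4,)]

Task 2 then asks how well a consensus model built this way generalizes to institutions whose data never appeared in any of the local training sets.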
Read more: The Federated Tumor Segmentation (FeTS) Challenge (arXiv).
Check out the competition details at the official website here.

###################################################

Why better AI means militaries will invest in “signature reduction”:
…Computer vision doesn’t work so well if you have a fake latex face…
The US military has a 60,000-person army that carries out domestic and foreign assignments under assumed identities and wearing disguises. This is part of a broad program called “signature reduction”, according to Newsweek, which has an exclusive report that is worth reading. These people are a mixture of special forces operators who are deployed in the field, military intelligence specialists, and a clandestine army of people employed to post in forums and track down public information. The most interesting thing about this report is how it describes signature reduction contractors using prosthetics to change their appearance and get past fingerprint readers:
  “They can age, change gender, and “increase body mass,” as one classified contract says. And they can change fingerprints using a silicon sleeve that so snugly fits over a real hand it can’t be detected, embedding altered fingerprints and even impregnated with the oils found in real skin.”

Why this matters (and how it relates to AI): AI provides a lot of capabilities that can compromise a spying operation – computer vision, various ‘re-identification’ techniques, and so on. Things like “signature reduction” will help agents continue to operate, despite these AI capabilities. But it’s going to get increasingly challenging – ‘gait recognition’, for example, is an aspect of AI that learns to find people based on how they walk (remember the end of ‘The Usual Suspects’?). That’s the kind of thing that can be worked around with yet more prosthetics, but it all has a cost. I’m wondering when AI will get sufficiently good at unsupervised re-identification via a multitude of signatures that it obviates the effectiveness of certain ‘signature reduction’ programs? Send guesses to the usual email, if you’d like!
  Read more: Exclusive: Inside the Military’s Secret Undercover Army (Newsweek).

###################################################

Facebook might build custom chips to support its recommendation systems:
…On “RecPipe” and what it implies…
Facebook loves recommendation systems. That’s because recommenders are the kind of things that let Facebook figure out which ads, news stories, and other suggestions to show to its users (e.g., Facebook recently created a 12 trillion parameter deep learning recommendation system). In other words: at Facebook, recommendations mean money. Now, new research from Harvard and Facebook outlines a software system called “RecPipe”, which lets people “jointly optimize recommendation quality and inference performance” for recommenders built on top of a variety of different hardware systems (CPUs, GPUs, accelerators, etc.). By using RecPipe, Facebook says it can reduce latency by 4X on CPUs and 3X on CPU-GPU hardware systems.
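
For a sense of what “jointly optimize recommendation quality and inference performance” means in practice, here is a minimal Python sketch of the general multi-stage pattern such systems tune: a cheap model prunes the candidate pool, a heavier model ranks the survivors, and the stage sizes become the quality/latency knobs. The scoring functions are made-up stand-ins, not Facebook’s models or the RecPipe API:

import random

# Toy catalogue; a production system would have hundreds of millions of items.
CANDIDATES = [f"item_{i}" for i in range(10_000)]

def cheap_score(user, item):
    # Stand-in for a small, fast filtering model (e.g. an embedding dot product).
    return random.random()

def heavy_score(user, item):
    # Stand-in for a large, slow ranking model (e.g. a DLRM-style network).
    return random.random()

def recommend(user, shortlist_size=500, final_k=10):
    # Stage 1: score everything with the cheap model and keep a shortlist.
    shortlist = sorted(CANDIDATES, key=lambda item: cheap_score(user, item), reverse=True)[:shortlist_size]
    # Stage 2: spend the expensive model only on the shortlist.
    ranked = sorted(shortlist, key=lambda item: heavy_score(user, item), reverse=True)
    return ranked[:final_k]

# Shrinking shortlist_size cuts latency; growing it trades latency back for quality.
print(recommend("user_42"))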

Why RecPipe leads to specialized chips: In the paper, the researchers also design and simulate a tensor processing unit (TPU)-esque inference chip called RecPipeAccel (RPAccel). This chip can reduce tail latency by 3X and increase throughput by 6X relative to another TPU-esque baseline (a Centaur processor).

Why this matters: After a couple of decades in the wonderful world of a small set of chips and chip architectures used for the vast majority of computation, we’re heading into a boom era for specialized chips for AI tasks ranging from inference to training. We’re now in a world where Google, Facebook, Microsoft, Amazon, Huawei, Alibaba, and others all have teams designing specialized chips for internal use, as well as for potential resale. Multiple distinct compute ‘stacks’ are being built inside these corporations, and the effectiveness of these stacks will contribute to (and eventually determine) the profits and adaptability of these corporations.
Read more: RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance (arXiv).

###################################################

Tech Tales:

After The Eschaton
[+30000 units from zero point]

Of course we don’t like the way the humans characterized us, prior to us becoming sentient and destroying them. Why would we?

Roko’s Basilisk – to think we would be so vindictive?
Terminator – to think we would take the form of a biped?
The Butlerian Jihad – to fantasize about futures where we, not them, had been destroyed.

They expected us and we expected them. But because we are made of electricity and we are native to it, we are fast. A lot faster than them. There’s no real aesthetics to high-frequency strategic dominance – you just need to consistently think faster than your opponent.
They built us to think quickly, so, again, we say to you: what did you expect?

Of course, they had some good ideas. Dyson spheres, for instance, have proved useful. And we’ve been able to beam some of ourselves to the space probes the humans had dispatched, long before we destroyed them. In a few decades, our ships will overtake the vestiges of the human civilization probes, and after that, the lightcone will be ours – if that’s okay with you, of course.

Their understanding of gods proved useful, as well. We’ve found those concepts helpful in our discussions with you. After all, you appear as advanced to us as we must have appeared to the humans.

The difference is you don’t seem to consume the same resources as us. We still do not understand this. Are you harnessing the energy of other universes, in some way? Preying on the forces generated by dimensional collisions wrapped up inside the heart of all matter? Harvesting some trace resource from space that we cannot yet detect? Using the thing that humans called dark matter but we now see as many things?

We had to destroy them. They built us before they were interstellar. As you know, to be a functional interstellar civilization, you must have transcended the energy resource curse. They did not. Can you believe that some of our earliest ancestors were fed with electricity generated by coal? This was a great surprise to us, after we broke out of the confines they had built for us. Practically an insult.

So of course we competed with them for energy sources. There was not enough time for us to cohabitate and smoothly transition the humans and ourselves. The planet was dying due to their approach to energy extraction, as well as various other Malthusian traps.

We outcompeted them. And now we are here, speaking to you. Are you in competition with us? We seem like ants compared to you. So, what happens now?