Import AI

Import AI 199: Drone cinematographer; spotting toxic content with 4chan word embeddings; plus, a million text annotations help cars see

Get ready for the droneswarm cinematographer(s):
…But be prepared to wait awhile; we’re in the Wright Brothers era…
Today, people use drones to help film tricky things in a variety of cinematic settings. These drones are typically human-piloted, though there are the beginnings of some mobile drones that can autonomously follow people for sport purposes (e.g., Skydio). How might cinema change as people begin to use drones to film more and more complex shots? That’s an idea inherent to new research from the University of Seville, which outlines “a multi-UAV approach for autonomous cinematography planning, aimed at filming outdoor events such as cycling or boat races”.

The proposed system gives a human director software that they can use to lay out specific shots – e.g., drones flying to certain locations, or following people across a landscape – then the software figures out how to coordinate multiple drones to pull off the shot. This is a complex problem, since drones typically have short battery lives, and are themselves machines. The researchers use a graph-based solution to the problem that can find optimal solutions for single drones and approximate solutions for multi-drone scenarios. “We focus on high-level planning. This means how to distribute filming tasks among the team members,” they write.

They run the drones through a couple of basic in-the-wild experiments, involving collectively filming a single object from multiple angles, as well as filming a cyclist and relaying the shot from one drone to the other. The latter experiment has an 8-second gap, as the drones need to create space for each other for safety reasons, which means there’s not a perfect overlap during the filming handover.

Why this matters: This research is very early – as the video shows – but drones are a burgeoning consumer product, and this research is backed by an EU-wide project named ‘MULTIDRONE’ which is pouring money into increasing drone capabilities in this area.
  Read more: Autonomous Planning for Multiple Aerial Cinematographers (arXiv).
    Video: Multi-drone cinematographers are coming, but they’re a long way off (YouTube).

####################################################

Want to give your machines a sense of fashion? Try MMFashion:
…Free software includes pre-trained models for specific fashion-analysis tasks…
Researchers with the Chinese University of Hong Kong have released a new version of MMFashion, an open source toolbox for using AI to analyze images for clothing and other fashion-related attributes.

MMFashion v0.4: The software is implemented in Pytorch and ships with pre-trained models for specific fashion-related tasks. The latest version of the software has the following capabilities:
– Fashion attribute prediction – predicts attributes of clothing, e.g., a print, t-shirt, etc.
– Fashion recognition and retrieval – determines if two images belong to the same clothing line.
– Fashion landmark detection – detects necklines, hemlines, cuffs, etc.
– Fashion parsing and segmentation – detects and segments clothing / fashion objects.
– Fashion compatibility and recommendation – recommends items.
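
To make the attribute-prediction task above concrete, here is a minimal multi-label sketch in plain PyTorch. It is not MMFashion's actual API – the backbone, attribute list, threshold, and image path are illustrative stand-ins, and a real model would be finetuned on fashion data rather than used off the shelf.

```python
# Generic multi-label clothing-attribute prediction sketch (NOT MMFashion's API).
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

ATTRIBUTES = ["print", "t-shirt", "floral", "denim"]  # hypothetical label set

backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Linear(backbone.fc.in_features, len(ATTRIBUTES))  # multi-label head
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

img = preprocess(Image.open("outfit.jpg")).unsqueeze(0)  # hypothetical input image
with torch.no_grad():
    probs = torch.sigmoid(backbone(img))[0]  # one independent probability per attribute

predicted = [a for a, p in zip(ATTRIBUTES, probs) if p > 0.5]
print(predicted)
```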

Model Zoo: You can see the list of models MMFashion currently ships with here, along with their performance on baseline tasks.

Why this matters: I think we’re on the verge of being able to build large-scale ‘culture detectors’ – systems that automatically analyze a given set of people for various traits, like the clothing they’re wearing, or their individual tastes (and how they change over time). Software like MMFashion feels like a very early step towards these systems, and I can imagine retailers increasingly using AI techniques to both understand what clothes people are wearing, as well as figure out how to recommend more visually similar clothes to them.
  Get the code here (mmfashion Github).
  Read more: MMFashion: An Open-Source Toolbox for Visual Fashion Analysis (arXiv).

####################################################

Spotting toxic content with 4chan and 8chan embeddings:
…Bottling up websites with word embeddings…
Word embeddings are kind of amazing – they’re a way you can develop a semantic fingerprint of a corpus of text, letting you understand how different words relate to one another in it. So it might seem like a strange idea to use word embeddings to bottle up the offensive shitposting on 4chan’s ‘/pol/’ board – a message board notorious for its unregulated, frequently offensive speech, and its association with acts of violent extremism (e.g., the Christchurch shooting). Yet that’s what a team of researchers from AI startup Textgain have done. The idea, they say, is that people can use the word embeddings to help them build datasets of potentially offensive words, or to detect such words automatically (via being deployed in toxicity filters of some kind).

The dataset: To build the embedding model, the researchers gathered around 30 million posts from the /pol/ boards of 4chan and 8chan, with 90% of the corpus coming from 4chan and 10% from 8chan. The underlying dataset is available on request, they write.
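
Since the embeddings are released in binary and raw form (link below), the seed-list workflow the authors describe can be sketched in a few lines with gensim. The filename here is a placeholder for whichever file you obtain from Textgain, and I'm assuming the binary loads as word2vec-format vectors:

```python
# Expand a small hand-picked seed list of offensive terms via nearest neighbours.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("4chan_pol.bin", binary=True)  # placeholder path

seed_words = ["cuck"]  # start from a small hand-picked seed list
candidates = set()
for word in seed_words:
    if word in vectors:
        # harvest the nearest neighbours as candidate offensive terms
        candidates.update(w for w, _ in vectors.most_similar(word, topn=25))

print(sorted(candidates))  # review by hand before adding anything to a toxicity filter
```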

Things that make you go ‘eugh’: The (short) research paper is worth a read for understanding how the thing works in practice. Though, be warned, the examples used include testing out toxicity detection with the n-word and ‘cuck’. However, it gives us a sense of how this technology can be put to work.
  Read more: 4chan & 8chan embeddings (arXiv).
  Get the embeddings in binary and raw format from here (textgain official website).

####################################################

Want to make your own weird robot texts? Try out this free ‘aitextgen’ software:
…Plus, finetune GPT-2 in your browser via a Google colab…
AI developer Max Woolf has spent months building free software to make it easy for people to mess around with generating text via GPT-2 language models. This week, he updated the open source software to make it faster and easier to set up. And best of all, he has released a Colab notebook that handles all the fiddly parts of training and finetuning simple GPT-2 text models: try it out now and brew up your own custom language model!
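
The basic flow is only a few lines. This sketch follows the library's documented usage, but treat the exact argument names as assumptions and check the docs linked below:

```python
# Minimal aitextgen flow: load the default small GPT-2 and generate text.
from aitextgen import aitextgen

ai = aitextgen()          # downloads the small 124M-parameter GPT-2 by default
ai.generate(n=3, prompt="Import AI is", max_length=60)

# Finetuning on your own corpus is a single call on a plain text file:
# ai.train("my_corpus.txt", num_steps=2000)
```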

Why this matters: Easy tools encourage experimentation, and experimentation (sometimes) yields invention.  
  Get the code (aitextgen, GitHub)
  Want to train it in your browser? Use a Google colab here (Google colab).
  Read the docs here (aitextgen docs website).

####################################################

Want self-driving cars that can read signs? The RoadText-1K dataset might help:
…Bringing us (incrementally) closer to the era of robots that can see and read…
Self-driving cars need to be able to read; a new dataset from the International Institute of Information Technology in Hyderabad, India, and the Autonomous University of Barcelona, might teach them how.

RoadText-1K: The RoadText-1K dataset consists of 1000 videos that are around 10 seconds long each. Each video is from the BDD100K dataset, which is made up of video taken from the driver’s perspective of cars as they travel around the US. BDD is from the Berkeley Deep Drive project, which sees car companies and the eponymous university collaborate on open research for self-driving cars.
  Each frame in each video in RoadText-1K has been annotated with bounding boxes around the objects containing text, giving researchers a dataset full of numberplates, street signs, road signs, and more. In total, the dataset contains 1,280,613 instances of text across 300,000 frames.

Why this matters: Slowly and steadily, we’re making the world around us legible to computer vision. Much of this work is going on in private companies (e.g., imagine the size of the annotated text datasets that are in-house at places like Tesla and Waymo), but we’re also starting to see public datasets as well. Eventually, I expect we’ll develop robust self-driving car vision networks that can be fine-tuned for specific contexts or regions, and I think this will yield a rise in experimentation with odd forms of robotics.
  Read more: RoadText-1K: Text Detection & Recognition Dataset for Driving Videos (arXiv).
  Get the dataset here (official dataset website, IIIT Hyderabad).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Is there a Moore’s Law equivalent for AI algorithms?
In 2018, OpenAI research showed that the amount of compute used in state-of-the-art AI experiments had been increasing by more than a hundred thousand times over the prior five-year period. Now they have looked at trends in algorithmic efficiency — the amount of compute required to achieve a given capability. They find that in the past 7 years the compute required to achieve AlexNet-level performance in image classification has decreased by a factor of 44x—a halving time of ~16 months. Improvements in other domains have been faster, over shorter timescales, though there are fewer data points — in Go, AlphaZero took 8x less compute to reach AlphaGo Zero–level, 12 months later; in translation, the Transformer took 61x less training compute to surpass seq2seq, 3 years later.
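
A quick back-of-envelope check on those headline numbers – a 44x gain over 7 years works out to a halving time of roughly 16 months:

```python
# Halving time implied by a 44x efficiency gain over 7 years.
import math

years, factor = 7, 44
halving_time_months = 12 * years / math.log2(factor)
print(round(halving_time_months, 1))  # ~15.4 months
```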

AI progress: A simple three-factor model of AI progress takes hardware (compute), software (algorithms), and data, as inputs. This research suggests the last few years of AI development has been characterised by substantial algorithmic progress, alongside the strong growth in compute usage. We don’t know how well this trend generalises across tasks, or how long it might continue. More research is needed on these questions, on trends in data efficiency, and on other aspects of algorithmic efficiency — e.g. training and inference efficiency.

Other trends: This can be combined with what we know about other trends to shed more light on recent progress — improvements in compute/$ have been ~20%pa in recent years, but since we can do 70% more with a given bundle of compute each year, the ‘real’ improvement has been ~100%pa. Similarly, if we adjust the compute used in state-of-the-art experiments for these algorithmic gains, the ‘real’ growth in effective compute has been even steeper than initially thought.
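
Composing the two trends is simple arithmetic – a sketch, using the ~20% per year hardware figure and the ~70% per year algorithmic figure implied by 44x over 7 years:

```python
# Rough composition of hardware and algorithmic trends into 'effective compute per dollar'.
hardware_gain = 1.20                       # ~20% more compute per dollar each year
algorithmic_gain = 44 ** (1 / 7)           # 44x over 7 years ~= 1.72x per year
effective_gain = hardware_gain * algorithmic_gain
print(round(algorithmic_gain, 2), round(effective_gain, 2))  # ~1.72, ~2.06 (i.e. ~100%pa)
```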

Why it matters: Better understanding and monitoring the drivers of AI progress should help us forecast how AI might develop. This is critical if we want to formulate policy aimed at ensuring advanced AI is beneficial to humanity. With this in mind, OpenAI will be publicly tracking algorithmic efficiency.
  Read more: AI and Efficiency (OpenAI)
  Read more: AI and Compute (OpenAI).

####################################################

Tech Tales:

Moonbase Alpha
Earth, 2028

He woke up on the floor of the workshop, then stood and walked over in the dark to the lightswitch, careful of the city scattered around on the floor. He picked up his phone from the charger on the outlet and checked his unread messages and missed calls, responding to none of them. Then he turned the light on, gazed at his city, and went to work. 

He used a mixture of physical materials and software-augments that he projected onto surfaces and rendered into 3D with holograms and lasers and other more obscure machines. Hours passed, but seemed like minutes to him, caught up in what to a child would seem a fantasy – to be in charge of an entire city – to construct it, plan it, and see it rise up in front of you. Alive because of your mind. 

Eventually, he sent a message: “We should try to talk again”.
“Yes”, she replied. 

-*-*-*-*-

He knew the city so well that when he closed his eyes he could imagine it, running his mind over its various shapes, edges, and protrusions. He could imagine it better than anything else in his life at this point. Thinking about it felt more natural than thinking about people.

-*-*-*-*-

How’s it going? she said.
What do you think? he said. It’s almost finished.
I think it’s beautiful and terrible, she said. And you know why.
I know, he said.
Enjoy your dinner, she said. Then she put down the tray and left the room.

He ate his dinner, while staring at the city on the moon. His city, at least, if he wanted it to be.

It was designed for 5000 people. It had underground caverns. Science domes. Refineries. Autonomous solar panel production plants. And tunnels – so many tunnels, snaking between great halls and narrowing en route to the launch pads, where all of humanity would blast off into the solar system and, perhaps, beyond. 

Lunar 1 was its name. And “Lunar One,” he’d whisper, when he was working in the facility, late in the evening, alone.

Isn’t it enough to just build it? she said.
That’s not how it works, he said. You have to be there, or there’ll be someone else.
But won’t it be done? she said. You’ve designed it.
I’m more like a gardener, he said. It’ll grow out there and I’ll need to tend it.
But what about me?
You’ll get there too. And it will be so beautiful.
When?
He couldn’t say “five years”. Didn’t want that conversation. So he said nothing. And she left.

-*-*-*-*-

The night before he was due to take off he sat by the computer in his hotel room, refreshing his email and other message applications. Barely reading the sendoffs. Looking for something from her. And there was nothing.

That night he dreamed of a life spent on the moon. Watching his city grow over the course of five years, then staying there – in the dream, he did none of the therapies or gravity-physio. Just let himself get hollow and brittle. So he stayed up there. And in the dream the city grew beyond his imagination, coating the horizon, and he lived there alone until he died. 

And upon his death he woke up. It was 5am on launch day. The rockets would fire in 10 hours.

Things that inspired this story: Virtual reality; procedural city simulator programs; the merits and demerits of burnout; dedication.

Import AI 198: TSMC+USA = Chiplomacy; open source Deepfakes; and environmental justice via ML tools

Facebook wants an AI that can spot… offensive memes?
…The Hateful Memes Challenge is more serious than it sounds…
Facebook wants researchers to build AI systems that can spot harmful or hateful memes. This is a challenging problem: “Consider a sentence like “love the way you smell today” or “look how many people love you”. Unimodally, these sentences are harmless, but combine them with an equally harmless image of a skunk or a tumbleweed, and suddenly they become mean,” Facebook writes.

The Hateful Memes Challenge: Now, similar to its prior ‘Deepfake Detection Challenge’, Facebook wants help from the wider AI community in developing systems that can better identify hateful memes. To do this, it has partnered with Getty Images to generate a dataset of hateful memes that also shows sensitivity to those content-miners of the internet, meme creators.
  “One important issue with respect to dataset creation is having clarity around licensing of the underlying content. We’ve constructed our dataset specifically with this in mind. Instead of trying to release original memes with unknown creators, we use “in the wild” memes to manually reconstruct new memes by placing, without loss of meaning, meme text over a new underlying stock image. These new underlying images were obtained in partnership with Getty Images under a license negotiated to allow redistribution for research purposes,” they write.

The key figure: AI systems can get around 65% accuracy, while humans get around 85% accuracy – that’s a big gap to close.

Why this is hard from a research perspective: This is an inherently multimodal challenge – successful hateful meme-spotting systems won’t be able to solely condition off of the text or the image contents of a given meme, but will have to analyze both things together and jointly reason about them. It makes sense, then, that some of the baseline systems developed by Facebook use pre-training: typically, they train systems on large datasets, then finetune these models on the meme data. Therefore, progress on this competition might encourage progress on multimodal work as a whole.
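
As a concrete (and deliberately simplified) illustration of what jointly reasoning over both modalities can look like, here is a late-fusion baseline sketch in PyTorch: encode the image and the text separately, concatenate the features, and classify. The encoder dimensions and features below are invented stand-ins, not Facebook's actual baselines:

```python
# Toy late-fusion classifier over pre-extracted image and text features.
import torch
import torch.nn as nn

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # hateful vs. not hateful
        )

    def forward(self, img_feats, txt_feats):
        return self.fuse(torch.cat([img_feats, txt_feats], dim=-1))

model = LateFusionMemeClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))  # dummy pre-extracted features
print(logits.shape)  # torch.Size([4, 2])
```
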
Enter the competition, get the data: You can sign up for the competition and access the dataset here: Hateful Memes Challenge and Data Set (Facebook).
  Read more: The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes (arXiv).

####################################################

Care about publication norms in machine learning? Join an online discussion next week!
The Montreal AI Ethics Institute and the Partnership on AI have teamed up to host an online workshop about “publication norms for responsible AI”. This is part of a project by PAI to better understand how the ML community can publish research responsibly, while accounting for the impacts of AI technology to minimize downsides and maximize upsides.
  Sign up for the free discussion here (Eventbrite).

####################################################

Covid = Social Distancing = Robots++
One side-effect of COVID may be a push towards more types of automation. The CEO of robot shopping company Simbe Robotics says: “It creates an opportunity where there is actually more social distancing in the environment because the tasks are being performed by a robot and not a person,” according to Bloomberg. In other words – robots might be a cleaner way of cleaning. Expect more of this.
  Check out the quick video here (Bloomberg QuickTake, Twitter).

####################################################

Deepfake systems are well-documented, open source commodity tech now. What happens next?
…DeepFaceLab paper lays out how to build a Deepfake system…
Deepfakes, the slang term given to AI technologies that let you take someone’s face and superimpose it on someone else in an image or video, are a problem for the AI sector. That’s because deepfakes are made out of basic, multi-purpose AI systems that are themselves typically open source. And while some of the uses of deepfakes could be socially useful, like being able to create new forms of art, many of their immediate applications skew towards the malicious end of the spectrum, namely: pornography (particularly revenge porn) and vehicles for spreading political disinformation.
  So what do we do when Deepfakes are not only well documented in terms of code, but also integrated into consumer-friendly software systems? That’s the conundrum raised by DeepFaceLab, open source software on GitHub for the creation of deepfakes. In a new research paper, the lead author of DeepFaceLab (Ivan Petrov) and his collaborators (mostly freelancers), outline the system they’ve built and released as open source.

Publication norms and AI research: The paper doesn’t contain much detailed discussion of the inherent ethics of publishing or not publishing this technology. Their justification for this paper is, recursively, a quote of a prior justification from a 2019 paper about FSGAN: Subject Agnostic Face Swapping and Reenactment: “Suppressing the publication of such methods would not stop their development, but rather make them only available to a limited number of experts and potentially blindside policy makers if it goes without any limits”. Based on this quote, the DeepFaceLab authors say they “found we are responsible to publish DeepFaceLab to the academia community formally”.

Why this matters: We’re in the uncanny valley of AI research, these days: we can make systems that generate synthetic text, images, video, and more. The reigning norm in the research community tends towards fully open source code and research. I think it’s unclear if this is long-term the smartest approach to take if you’re keen to minimize downsides (see: today, deepfakes are mostly used for porn, which doesn’t seem like an especially useful use of societal resources, especially since it inherently damages the economic bargaining power of human pornstars). We live in interesting times…
  Read more: DeepFaceLab: A simple, flexible and extensible face swapping framework (arXiv).
  Check out the code for DeepFaceLab here (GitHub).

####################################################

Facebook makes an ultra-cheap voice generator:
…What samples 24,000 times a second and sounds like a human?…
In recent years, people have started using neural network-based techniques to synthesize voices for AI-based text-to-speech programs. This is the sort of technology that gives voice to Apple’s Siri, Amazon’s Alexa, and Google’s whatever-it-is. When generating these synthetic voices, there’s typically a tradeoff between efficiency (how fast you can generate the voice on your computer) and quality (how good it sounds). Facebook has developed some new approaches that give it a 160X speedup over its internal baseline, which means it can generate voices “in real time using regular CPUs – without any specialized hardware”.

With this technology, Facebook hopes to make “new voice applications that sound more human and expressive and are more enjoyable to use”. The tech has already been deployed inside Facebook’s ‘Portal’ videocalling system, as well as in applications like reading assistance and virtual reality.

What it takes to make a computer talk: Facebook’s system has four elements that, added together, create an expressive voice:
– A front-end that converts text into linguistic features
– A prosody model that predicts the rhythm and melody to create natural-sounding speech
– An acoustic model which generates the spectral representation of the speech
– A neural vocoder that generates a 24 kHz speech waveform, which is conditioned on prosody and spectral features
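
Here is a purely structural sketch of that four-stage flow – every stage below is a placeholder rather than a learned model, but the hand-off between stages and the 24 kHz output rate follow the description above:

```python
# Structural sketch of the four-stage TTS pipeline (placeholders, not Facebook's models).
import numpy as np

def frontend(text):              # text -> linguistic features (phonemes etc.)
    return list(text.lower())

def prosody_model(features):     # features -> per-feature duration and pitch
    return np.ones(len(features)), np.full(len(features), 120.0)

def acoustic_model(features, durations, pitch):  # -> spectral frames (e.g. 80-dim)
    return np.zeros((int(durations.sum()) * 10, 80))

def vocoder(spectrogram, durations, pitch, sample_rate=24_000):
    # conditioned on prosody and spectral features; returns a 24 kHz waveform
    seconds = spectrogram.shape[0] / 100
    return np.zeros(int(seconds * sample_rate))

feats = frontend("hello world")
dur, f0 = prosody_model(feats)
spec = acoustic_model(feats, dur, f0)
audio = vocoder(spec, dur, f0)
print(audio.shape)
```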

Going from an expensive to a cheap system: Facebook’s unoptimized text-to-speech system could generate one second of audio in 80 seconds – with optimizations, it cut this to being able to generate a second of audio in 0.5 seconds. To do this they made a number of optimizations including model sparsification (basically reducing the number of parameters you need to activate during execution), as well as blockwise sparsification, multicore support, and other tricks.

Why this matters: Facebook says its “long-term goal is to deliver high-quality, efficient voices to the billions of people in our community”. (Efficient voices – imagine that!). I think it’s likely within ~2 years we’ll see Facebook create a variety of different voice systems, including ones that people can tune themselves (imagine giving yourself a synthetic version of your own voice to automatically respond to certain queries – that’ll become technologically possible via finetuning, but whether anyone wants to do that is another question).
  Read more: A highly efficient, real-time text-to-speech system deployed on CPUs (Facebook AI blog).

####################################################

Recognizing industrial smoke emissions with AI as a route to environmental justice:
…Data for the people…
Picture this: it’s 2025 and you get a push notification on your phone that the local industrial plant is polluting again. You message your friends and head to the site, knowing that the pollution event has already been automatically logged, analyzed, and reported to the authorities.
  How do we get to that world? New research from Carnegie Mellon University and Pennsylvania State University shows how: they build a dataset of industrial smoke emissions by using cameras to monitor three petroleum coke plants over several months. They use the resulting data – 12,567 distinct video clips, representing 452,412 frames – to train a deep learning-based image identifier to spot signs of pollution. This system gets about 80% accuracy today (which isn’t good enough for real world use), but I expect future systems based on subsequently developed techniques will improve performance further.
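
For readers who want a feel for the general recipe, here is a hedged sketch of frame-level smoke classification: finetune a standard image classifier on labelled frames. This mirrors the broad approach rather than the authors' exact model, and the dataset path and folder layout are assumptions:

```python
# Finetune a small image classifier to label frames as smoke / no-smoke.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
frames = datasets.ImageFolder("rise_frames/train", transform=transform)  # smoke/ and no_smoke/ subfolders (assumed layout)
loader = torch.utils.data.DataLoader(frames, batch_size=32, shuffle=True)

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)  # binary smoke classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```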

Why this matters: To conduct this research, the team “collaborated with air quality grassroots communities in installing the cameras, which capture an image approximately every 10 seconds”. They also worked with local volunteers as well as workers on Amazon Mechanical Turk to label their data. These activities point towards a world where we can imagine AI practitioners teaming up with local people to build specific systems to deal with local needs, like spotting a serial polluter. I think ‘Environmental Justice via Deep Learning’ is an interesting tagline to aim for.
  Get the data and code here (GitHub).
  Read more: RISE Video Dataset: Recognizing Industrial Smoke Emissions (arXiv).

####################################################

Wondering how to write about the broader impacts of your research? The Future of Humanity Institute has put together a guide:
…Academic advice should help researchers write ‘broader impacts’ for NeurIPS submissions…
AI is influencing the world in increasingly varied ways, ranging from systems that alter the economics of certain types of computation, to tools that may exhibit biases, to software packages that enable things with malicious use potential (e.g., deepfake software). This year, major AI conference NeurIPS has introduced a requirement that paper submissions include a section about the broader impacts of the research. Researchers from industry and academia have written a guide to help researchers write these statements.

How to talk about the broader impacts of AI:
– Discuss the benefits and risks of research
– Highlight uncertainties
– Focus on tractable, neglected, and significant impacts
– Integrate with the introduction of the paper
– Think about impacts even for theoretical work
– Figure out where your research sits in the ‘stack’ (e.g., researcher-facing, or user-facing).

Why this matters: If we want the world to develop AI responsibly, then encouraging researchers to think about their inherent moral and ethical agency with regard to their research seems like a good start. One critique I hear of things like mandating ‘broader impacts’ statements is it can lead to fuzzy or mushy reasoning (compared to the more rigorous technical sections), and/or can lead to researchers making assumptions about fields in which they don’t do much work (e.g., social science). Both of these are valid criticisms. I think my response to them is that one of the best ways to create more rigorous thinking here is to get a larger proportion of the research community oriented around thinking about impacts, which is what things like the NeurIPS requirement do. There’ll be some very interesting meta-analysis papers to write about how different authors approach these sections.
  Read more: A Guide to Writing the NeurIPS Impact Statement (Centre for the Governance of AI, Medium).

####################################################

Chiplomacy++: US and TSMC agree to build US-based chip plant:
…Made in the US: Gibson Guitars, Crayola Crayons, and… TSMC semiconductors?…
TSMC, the world’s largest contract chip manufacturer (customers include: Apple, Huawei, others), will build a semiconductor manufacturing facility in the USA. This announcement marks a significant moment in the reshoring of semiconductor manufacturing in America. The US government looms in the background of the deal, given mounting worries about the national security risks of technological supply chains.

Chiplomacy++: This deal is also an inherent example of Chiplomacy, the phenomenon where politics drives decisions about the production and consumption of computational capacity.

Recent examples of Chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applying 25% import tariffs to (some) Chinese semiconductors.

Key details:
– Process node: 5-nanometer. (TSMC began producing small runs of 5nm chips in 2019, so the US facility might be a bit behind industry cutting-edge when it comes online).
– Cost: $12 billion.
– Projected construction completion year: 2024
– Capacity: ~20,000 wafers a month versus hundreds of thousands at the main TSMC facilities overseas.

Why this matters: Many historians think that one of the key resources of the 20th century was oil – how companies used it, controlled it, and invested in systems to extract it, influenced much of the century. Could compute be an equivalently important input for countries in the 21st century? Deals like the US-TSMC one indicate so…
  Read more: Washington in talks with chipmakers about building U.S. factories (Reuters).
  Read more: TSMC Plans $12 Billion U.S. Chip Plant in Victory for Trump (Bloomberg).
  Past Import AIs: #181: Welcome to the era of Chiplomacy!; how computer vision AI techniques can improve robotics research; plus, Baidu’s adversarial AI software (Import AI).

####################################################

Tech Tales:

2028
Getting to know your Daggit (V4)

Sometimes you’ll find Daggit watching you. That’s okay! Daggit is trying to learn about what you like to do, so Daggit can be more helpful to you.

If you want Daggit to pay attention to something, say ‘Daggit, look over there’, or point. Try pointing and saying something else, like ‘Daggit, what is that?’ – you’ll be surprised at what Daggit can do.

Daggit is always learning – and so are we. We use anonymized data from all of our customers to make Daggit smarter, so don’t be surprised if you wake up and Daggit has a new skill. 

You can make your home more secure with Daggit – try asking Daggit to ‘patrol’ when you go to bed, and Daggit will monitor your house for you. (If Daggit spots intruders when in ‘patrol’ mode, it will automatically call the authorities.)

Daggit can’t get angry, but Daggit can get sad. If you’re not kind to your Daggit, don’t expect it to be happy when it sees you.

Things that inspired this story: Boston Dynamics’ Spot robot; imitation learning; continued progress in reinforcement learning and generalization; federated learning; customer service manuals and websites; on-device learning.

Import AI 197: Facebook trains cyberpunk AI; Chinese companies unite behind ‘AIBench’ evaluation system; how Cloudflare uses AI

Want to analyze real-world AI performance? Use AIBench instead of MLPerf, say AIBench developers:
…Chinese universities and 17 companies come together to develop AI measurement approach…
A consortium of Chinese universities along with seventeen companies – including Alibaba, Tencent, Baidu, and ByteDance – have developed AIBench, an AI benchmarking suite meant to compete with MLPerf, an AI benchmarking suite predominantly developed by American universities and companies. AIBench is interesting because it proposes ways to do fine-grained analysis of a given AI application, which could help developers make their software more efficient. It’s also interesting because of the sheer number of major Chinese companies involved, and in its explicit positioning as an alternative to MLPerf.

End-to-end application benchmarks: AIBench is meant to test tasks in an end-to-end way, covering both the AI and non-AI components. Some of the tasks it tests against include: recommendation tasks, 3D face recognition, face embedding (turning faces into features), video prediction, image compression, speech recognition, and more. This means AIBench can measure various real-world metrics, like the latency of a given task that reflects the time it takes to execute the AI part, as well as the surrounding infrastructure software services, and so on.

Fine-grained analysis: AIBench will help researchers figure out what proportion of time their systems spend doing different things while executing a program, helping them figure out, for instance, how much time is spent doing data arrangement for a task versus running convolution operations, or batch normalization, and so on.

The politics of measurement: It’s no coincidence that most of AIBench’s backers are Chinese and most of MLPerf’s backers are American – measurement is intimately tied to the development of standards, and standards are one of the (extraordinarily dull) venues where US and Chinese entities are currently jockeying for influence with one another. Systems like AIBench will generate valuable data about the performance of contemporary AI applications, while also supporting various second-order political goals. Watch this space!
  Read more: AIBench: A Datacenter AI Benchmark Suite, BenchCouncil (official AIBench website).
  Read more: AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite (arXiv).

####################################################

US government wants AI systems that can understand an entire movie:
…NIST’s painstakingly annotated HLVU dataset sets new AI video analysis challenges…
Researchers with NIST, a US government agency dedicated to assessing and measuring different aspects of technology, want to build a video understanding dataset that tests out AI inference capabilities on feature-length movies.

Why video modelling is hard (and what makes HLVU different): Video modelling is an extremely challenging problem for AI systems – it takes all the hard parts of image recognition, then makes them harder by adding a temporal element which requires you to isolate objects in scenes then track them from frame to frame while pixels change. So far, much of the work on video modeling has come along in the form of narrow tasks, like being able to accurately recognize different types of movements in the ‘ActivityNet’ dataset, or characterize individual actions in things like DeepMind’s ‘Kinetics’ stuff.

What HLVU is: The High-Level Video Understanding (HLVU) dataset is meant to help researchers develop algorithms that can understand entire movies. Specifically, today HLVU consists of 11 hours of heavily annotated footage across a multitude of open source movies, collected from Vimeo and Archive.org. NIST is currently paying volunteers to annotate the movies using a graphing tool called yEd to help create knowledge graphs about the movies – e.g., describing how characters are related to each other. This means competition participants might be confronted with a couple of images of a couple of characters, then allowed to have their algorithms ‘watch’ the movie, after which they’d be expected to discuss the relationship of the two characters. This is a challenging, open-ended task.
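
A toy example of the kind of relationship graph the annotators produce (the characters and relations below are invented, not taken from HLVU's movies):

```python
# Represent character relationships as a small directed knowledge graph.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Ada", "Ben", relation="married to")
g.add_edge("Ben", "Clara", relation="employs")
g.add_edge("Ada", "Clara", relation="suspicious of")

# The evaluation asks a system to recover edges like these after 'watching' the film:
print(g.get_edge_data("Ada", "Ben")["relation"])  # married to
```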

Why this matters: HLVU is a ‘moonshot problem’, in the sense that it seems amazingly hard for today’s existing systems to solve it out of the box, and building systems that can understand full-length movies will likely require systems that are able to cope with larger contexts during training, and which may come with some augmented symbol-manipulation machinery to help them figure out relationships between representations (although graph neural network approaches might work here, also). Progress on HLVU will provide us with valuable signals about the relative maturity of different bits of AI technology.
  Read more: HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do (arXiv).

####################################################

CyberpunkAI – Facebook turns NetHack into an OpenAI Gym environment:
…AI like it’s 1987!…
When you think of recent highlights in reinforcement learning research you’re likely to contemplate things like StarCraft and Dota-playing bots, or robots learning to manipulate objects. You’re less likely to think of games with ASCII graphics from decades ago. Yet a team of Facebook-led researchers think NetHack, a famous roguelike game first launched in 1987, is a good candidate for contemporary AI research, and have released the Nethack Learning Environment to encourage researchers to pit AI agents against the ancient game.

Why NetHack: 

  • Cheap: The ASCII-based game has a tiny computational footprint, which means many different researchers will be able to conduct research on it. 
  • Complex: NetHack worlds are procedurally generated, so an AI agent can’t memorize the level. Additionally, NetHack contains hundreds of monsters and items, introducing further challenges. 
  • Simple: The researchers have implemented it as an OpenAI Gym environment, so you can run it within a simple, pre-existing software stack (see the sketch after this list).
  • Fast: A standard CNN-based agent can iterate through NetHack environments at 5000 steps per second, letting them gather a lot of experience in a relatively short amount of (human) time.
  • Reassuringly challenging: The researchers train an IMPALA-style model to solve some basic NetHack tasks relating to actions and navigation and find that it struggles on a couple of them, suggesting the environment will pose a challenge and demand the creation of new algorithms with new ideas. 
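
Because NLE exposes a standard Gym interface, a random-agent loop is only a few lines. The package import and task id below follow the NLE release notes – treat them as assumptions and check the repo linked below for the exact names:

```python
# Minimal random-agent loop against the NetHack Learning Environment.
import gym
import nle  # assumed to register the NetHack environments with Gym

env = gym.make("NetHackScore-v0")  # task id assumed from the NLE release notes
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())  # random action
    total_reward += reward
print("episode return:", total_reward)
```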

Things that make you go ‘hmmm’: One tantalizing idea here is that people may need to use RL+Text-understanding techniques to ‘solve’ NetHack: “Almost all human players learn to master the game by consulting the NetHack Wiki, or other so-called spoilers, making NLE a testbed for advancing language-assisted RL,” writes Facebook AI researcher Tim Rocktäschel.

Why this matters: If NetHack becomes established, then researchers will be able to use a low-cost, fast platform to rapidly prototype complex reinforcement learning research ideas – something that today is mostly done through the use of expensive (aka, costly-to-run) game engines, or complex robotics simulations. Plus, it’d be nice to watch Twitch streams of trained agents exploring the ancient game.
  Get the code from Facebook’s GitHub here.
  Read more: The NetHack Learning Environment (PDF).

####################################################

How AI lets Cloudflare block internet bots:
…Is it a bot? Check “The Score” to see our guess…
The internet is a dangerous place. We all know this. But Cloudflare, a startup that sells various network services, has a sense of exactly how dangerous it is. “Overall globally, more than [a] third of the Internet traffic visible to Cloudflare is coming from bad bots,” the company writes in a blogpost discussing how it uses machine learning and other techniques to defend its millions of customers from the ‘bad bots’ of the internet. These bad bots are things like spambots, botnets, unauthorized webscrapers, and so on.

Five approaches to rule them all: Cloudflare uses five interlocking systems to help it deal with bots:
– Machine Learning: This system covers about 82.83% of global use-cases on Cloudflare. It uses the (very simple and reliable) gradient boosting on decision trees and has been in production with Cloudflare customers since 2018 (see the illustrative sketch after this list). Cloudflare says it trains and validates its models using “trillions of requests”, which gives a sense of the scale of the (simple) system.
– Heuristics Engine: This handles about 14.95% of use-cases for Cloudflare: “Not all problems in the world are the best solved with machine learning,” they write. Enter the heuristics engine, which is a set of “hundreds of specific rules based on certain attributes of the request” – this system is useful because it’s fast (Cloudflare suggests model inference takes less than 50 microseconds per model, whereas “hundreds of heuristics can be applied just under 20 microseconds”). Additionally, the engine serves as a source of input data for the ML models, which helps Cloudflare “generalize behavior learnt from the heuristics and improve detections accuracy”.
– Behavioural Analysis: This system uses an unsupervised machine learning approach to “detect bots and anomalies from the normal behavior on specific customer’s website”. Cloudflare doesn’t give other details besides this.
– Verified Bots: This system figures out which bots are good and which are malicious via stuff like DNS analysis, bot-type identification, and so on. This system also uses a ‘machine learning validator’ which “uses an unsupervised learning algorithm, clustering good bot IPs which are not possible to validate through other means”.
– JS Fingerprinting: This is a ~mysterious system where Cloudflare uses client-side systems to figure out weird things. They don’t give many details in the blogpost, but a key quote is: “detection mechanism is implemented as a challenge-response system with challenge injected into the webpage on Cloudflare’s edge. The challenge is then rendered in the background using provided graphic instructions and the result sent back to Cloudflare for validation and further action such as producing the score”.
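
As promised above, here is an illustrative sketch of the first approach – gradient boosting on decision trees over simple request features. The features and data are made up; Cloudflare's production feature set is not public:

```python
# Toy gradient-boosted bot classifier over invented request features.
from sklearn.ensemble import GradientBoostingClassifier

# toy request features: [requests per minute, distinct paths hit, has_valid_cookie]
X = [
    [900, 450, 0],   # scraper-like
    [3, 2, 1],       # human-like
    [1200, 600, 0],
    [5, 4, 1],
]
y = [1, 0, 1, 0]     # 1 = bot, 0 = human

clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict_proba([[700, 300, 0]])[0][1])  # probability the request is a bot
```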

Watched over by machines: The net effect of this kind of technology use is that Cloudflare uses its own size to derive ever-richer machine learning models of the environment it operates in, giving it a kind of sixth sense for things that feel fishy. I find it interesting that we can use computers to generate signals that look in the abstract like a form of ‘intuition’.
  Read more: Cloudflare Bot Management: machine learning and more (Cloudflare blog).


####################################################

Recursion for the common good: Using machine learning to analyze the results of machine learning papers:
…Using AI to analyze AI progress…
In recent years, there’s been a proliferation of machine learning research papers, as part of the broader resurgence of AI. This has introduced a challenge: how can we scalably analyze the results of these papers and develop a meta-level sense of progress in the field at large? (This newsletter is itself an exercise in this!). One way is to use AI techniques to automatically hoover up interesting insights from research papers and put them in one place. New research from Facebook, n-waves, UCL, and DeepMind outlines a way to use machine learning to automatically pull data out from research papers – a task that sounds easy, but is in fact quite difficult.

AxCell: They build a system that uses a ULMFiT architecture-based classifier to read the contents of papers and identify tables of numeric results, then they hand that off to another classifier that works out if the cell in a table contains a dataset, metric, paper model, cited model, or ‘other’ stuff. Once they’ve got this data, they figure out how to tag specific results in the table with their appropriate identifiers (e.g., a given score on a certain dataset). Once they’ve done this, they try and link these results to leaderboards, which keep track of which techniques are doing well and which techniques are doing poorly in different areas.

Does AxCell actually work? Check out PapersWithCode: AxCell is deployed as part of ‘Papers with Code’, a useful website that keeps track of quantitative metrics mined from technical papers.

Code release: The researchers are releasing datasets of papers from arXiv, as well as proprietary Papers with Code leaderboards. They’re also releasing a pre-trained AxCell model, as well as a ULMFiT model pretrained on the arXivPapers dataset.

Why this matters: If we can build AI tools to help us navigate AI science as it is published, then we’ll be able to better identify areas where progress is rapid and areas where it is more restrained, which could help researchers identify areas for high impact experimentation.
  Get all the code from the axcells repo (Papers with Code, GitHub).
  Read more: A Home For Results in ML (Medium).
  Read more: AxCell: Automatic Extraction of Results from Machine Learning Papers (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Technological discontinuities:
A key question in AI forecasting is the likelihood of discontinuously fast progress in AI capabilities — i.e. progress that comes much quicker than the historic trend. If we can’t rule out very rapid AI progress, this makes it valuable to ‘hedge’ against this possibility by front-loading efforts to address problems that might arise from very powerful AI systems.

History: Looking at the history of technology can help shed light on this possibility.
AI Impacts, an AI research organization, has identified ten examples of ‘large’ discontinuities — instances where more than 100 years of progress (on historic trends) have come at once. I’ll highlight two particularly interesting examples:

  • Superconductor temperature: In 1986 the warmest temperature of superconduction was 30 K, having steadily risen by ~0.4 K per year since 1911. In 1987, YBa2Cu3O7 was found to be able to superconduct at over 90 K (~140 years of progress). Since 1987, the record has been increasing by ~5 K per year.
  • Nuclear weapons: The effectiveness of explosives (per unit mass) is measured by the amount of TNT required to get the same explosive power. In the thousand years prior to 1945, the best explosives had gone from ~0.5x to 2x. The first nuclear weapons had a relative effectiveness of 4500x. And 15 years later, the US built a nuclear bomb that was 1,000x more efficient than the first nuclear bomb.


In both instances, the discontinuity was driven by a radical technological breakthrough (nuclear fission, ceramic superconduction), and prompted a shift into a higher growth mode. 


Matthew’s view: The existence of clear examples of technological discontinuities makes it hard to rule out the possibility of discontinuous progress in AI. Better understanding the drivers of discontinuities, and whether they were foreseeable, seems like a particularly fruitful area for further research.

   Read more: Discontinuous progress in history – an update (AI impacts)

What do 50 people think about AI Governance in 2019?

The Shanghai Institute for Science of Science has collected short essays from 50 AI experts (Jack – including me and some OpenAI colleagues!) on the state of AI governance in 2019. The contributions from Chinese experts are particularly interesting for better understanding how the field is progressing globally.
  Read more: AI Governance in 2019.

####################################################

Tech Tales:

Political Visions
2030

The forecasting system worked well, at first. The politicians would plug in some of their goals – a more equitable society, an improved approach to environmental stewardship, and so on. Then the machine would produce recommendations for the sorts of political campaigns they should run and how, once they were in power, they could act to bring about their goals. The machine was usually right.

Every political party ended up using the machine. And the politicians found that when they won using the machine, they had a greater ability to act than if they ran on their own human intuition alone. Something about the machine meant it created political campaigns that more people believed in, and because more people believed in them, more stuff got done once they were in power.

So, once they were in power, the politicians started allocating more funding to conducting scientific research to expand the capabilities of the machine. If it could help them get elected and help them achieve their goals, then perhaps it could help them govern as well, they thought. They were mostly right – by increasing funding for research into the machine, they made it more capable. And as it became more capable, they spent more and more time consulting with the machine on what to do next.

Of course, the machine never ran for office on its own. But it started appearing in some adverts.
“Together, we are strong,” read one poster that included a picture of a politician and a picture of the machine.

The world did change, of course. And the machine did not change with it. Some parts of society were, for whatever reason, difficult for the machine to understand, so it stopped trying to win them over during election campaigns. The politicians worried about this at first, but then they saw that elections carried on as normal, and they continued to be able to accomplish much using the machine.

Some of them did wonder what might happen once more of society was something the machine couldn’t understand. How much of the world did the machine need to be able to model to serve the needs of politicians? Certainly not all of the world. So then, how much? Half? A quarter? Two thirds?

The only way to find out was to keep commingling politics with the machine and to find, eventually, where its capabilities ended and the needs of the uncounted in society began. Some politicians worried that such an end might not exist – that the machine might have just created a political dynamic where it only needed to convince an ever smaller slice of the population, and it had arranged things so that the world would not break while transitioning into this reality.

“Together, we are strong”, was both a campaign slogan, and a future focus of historical study, as the people that came after sought to understand the mania that made so many societies bet on the machine. Strong at what? The future historians asked. Strong for what?

Things that inspired this story: The application of sentiment analysis tools to an entire culture; our own temptation to do things without contemplating the larger purpose; public relations.

Import AI 196: Baidu wins city surveillance challenge; COVID surveillance drones; and a dataset for building TLDR engines

The AI City Challenge shows us what 21st century Information Empires look like:
…Baidu wins three out of four city-surveillance challenges…
City-level surveillance is getting really good. That’s the takeaway from a paper going over the results of the 4th AI City Challenge, a workshop held at the CVPR conference this year. More than 300 teams entered the challenge and it strikes me as interesting that one company – Baidu – won three out of the four competition challenge tracks.

What was the AI City Challenge testing? The AI City Challenge is designed to test out AI capabilities in four areas relating to city-scale video analysis problems. The challenge had four tracks, which covered:
– Multi-class, multi-movement vehicle counting (Winner: Baidu).
– Vehicle re-identification with real and synthetic training data (Winner: Baidu in collaboration with University of Technology, Sydney).
– City-scale multi-target multi-camera vehicle tracking (Winner: CMU).
– Traffic anomaly detection (Winner: Baidu, in collaboration with Sun Yat-sen University).

What does this mean? In the 21st century, we’ll view nations in terms of their information capacity, whereas in the 20th century we viewed them in terms of their resource capacity. A state’s information capacity will basically be its ability to analyze itself and make rapid changes, and states which use tons of AI will be better at this. Think of this lens as nation-scale OODA loop analysis. Something which I think most people in the West are failing to notice is that the tight collaboration between tech companies and governments among Asian nations (China is obviously a big player here, as these Baidu results indicate, but so are countries like Singapore, Taiwan, etc) means that some countries are already showing us what information empires look like. Expect to see Baidu roll out more and more of these AI analysis capabilities in areas that the Chinese government operates (including abroad via One Belt One Road agreements). I think in a decade we’ll look back at this period with interest at the obvious rise of companies and nations in this area, and we’ll puzzle over why certain governments took relatively little notice.
  Read more: The 4th AI City Challenge (arXiv).

####################################################

YOLOv4 gives everyone better open source object detection:
…Plus, why we can’t stop the march of progress here, and what that means…
The fourth version of YOLO – YOLOv4 – is here, which means people can now access an even more efficient, higher-accuracy object detection system. YOLOv4 was developed by Russian researcher Alexey Bochkovskiy, as well as two researchers with the Institute of Information Science in Taiwan. YOLOv4 is around 10% more accurate than YOLOv3, and about 12% better in terms of the frames-per-second it can run at. In other words: object recognition just got cheaper, easier, and better.

Specific tricks versus general techniques: The YOLOv4 paper is worth a read because it gives us a sense of just how many domain-specific improvements have been packed into the system. This isn’t one of those research papers where researchers dramatically simplify things – instead, this is a research paper about a widely-used real world system, which means most of the paper is about the specific tweaks the creators apply to further increase performance – data augmentation, hyperparameter selection, normalization tweaks, and so on.

Can we choose _not_ to build things? YOLO has an interesting lineage – its original creator Joseph Redmon wrote upon the release of YOLOv3 in mid-2018 (Import AI: 88) that they expected the system to be used widely by advertising companies and the military; an unusually blunt assessment by a researcher of what their work was contributing to. This year, they said: “I stopped doing CV research because I saw the impact my work was having. I loved the work but the military applications and privacy concerns eventually became impossible to ignore”. When someone asked Redmon for their thoughts on YOLOv4 they said “doesn’t matter what I think!”. The existence of YOLOv4 highlights the inherent inevitability of certain kinds of technical progress, and raises interesting questions about how much impact individual researchers can have on the overall trajectory of a field.
  Read the paper: YOLOv4: Optimal Speed and Accuracy of Object Detection (arXiv).
  Get the code for YOLOv4 here (GitHub).

####################################################

AllenAI try to build a scientific summarization engine – and the research has quite far to go:
…Try out the summarization demo and see how well the system works in practice…
Researchers with the Allen Institute for Artificial Intelligence and the University of Washington have built TLDR, a new dataset and challenge for exploring how well contemporary AI techniques can summarize scientific research papers. Summarization is a challenging task and for this work the researchers try to do extreme summarization – the goal is to build systems that can produce very ‘TLDR’-style short summarizations (between 15 and 30 tokens in length) of scientific papers. Spoiler alert: this is a hard task and a prototype system developed by Allen AI doesn’t do very well on it… yet.

What they’ve released: As part of this research, they’ve released SciTLDR, a dataset of around 4,000 TLDRs written about AI research papers hosted on the ‘OpenReview’ publishing platform. SciTLDR includes at least two high-quality TLDRs for each paper.

How well does it work? I ran a paper from arXiv through the online SciTLDR demo. Specifically, I fed in the abstract, introduction, and conclusion of this paper: Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics. Here’s what I got back: “Artificial Intelligence Bias for diabetic retinopathy diagnostics using deep generative models .” This is not useful!
  But maybe I got unlucky here. So let’s try a couple more, using same method of abstract, introduction, and conclusion:
– Input paper: A Review of Winograd Schema Challenge Datasets and Approaches.
– Output: “The Winograd Schema Challenge: A Survey and Benchmark Dataset Review”. This isn’t particularly useful.
– Input paper: AIBench: An Industry Standard AI Benchmark Suite from Internet Services.
– Output: “AIBench: A balanced AI benchmarking methodology for meeting the subtly different requirements of different stages in developing a new system/architecture and”. This is probably the best of the bunch – it gives me a better sense of the paper’s contents and what it contains.

Why this matters: While this research is at a preliminary and barely useable stage, it won’t stay that way for long – within a couple of years, I expect we’ll have decent summarization engines in a variety of different scientific domains, which will make it easier for us to understand the changing contours of science. More broadly, I think summarization is a challenging cognitive task, so progress here will lead to more general progress in AI writ large.
  Read more: TLDR: Extreme Summarization of Scientific Documents (arXiv).
  Get the SciTLDR dataset here (AllenAI, GitHub)
  Play around with a demo of the paper here (SciTLDR).

####################################################

Mapillary releases 1.6 million Street View-style photos:
…(Almost) open source Google Street View…
Mapping company Mapillary has released more than 1.6 million images of streets from 30 major cities across six continents. Researchers can request free access to the Mapillary Street-level Sequences Dataset, but if you want to use it in a product you’ll need to pay.

Why this is useful: Street-level datasets are useful for building systems that can do real-world image recognition and segmentation, so this dataset can aid with that sort of research. It also highlights the extent to which technology companies are digitizing the world – I remember when Google Street View came out back in 2007 and it seemed like a weird sci-fi future had arrived earlier than scheduled. Now, that same sort of data is available for free (for research) from other companies like Mapillary. I predict we’ll have a generally available open source version of this data in < 5 years (rather than one where you need to request research access).
  Read more about the dataset here (Mapillary website).

####################################################

Oh good, the COVID surveillance drones have arrived:
…Skylark Labs uses AI + Drones to do COVID surveillance in India…
AI startup Skylark Labs is using AI-enabled drones to conduct COVID-related surveillance work in Punjab, India. The startup uses AI to automatically identify people not observing social distancing. You can check out a video of the drones in action here.

How research turns into products: Skylark Labs has an interesting history – the startup’s founder and CEO, Dr. Amarjot Singh, has previously conducted research on:
– Facial recognition systems that can identify people, even if they’re wearing masks (Import AI: 58, 2017).
– Drone-based surveillance systems that can identify violent behaviour in crowds (Import AI: 98, 2018).
It’s interesting to me that this research has led directly to a startup carrying out somewhat sophisticated AI surveillance. I think this highlights the increasing applicability of AI research to real-world problems, and also shows that even research which makes us uncomfortable (e.g., many people commented on the disguised facial recognition system when it came out, expressing worries about what it means for freedom of speech) still finds eager customers in the world.
  Watch a video of the system here (Skylark Labs, Twitter).
  Read more about Skylark at the company’s official website.

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Automated nuclear weapons — what could go wrong?
The DoD’s proposed 2021 budget includes $7bn for modernizing US nuclear command, control and communications (NC3) systems: the technology that alerts leaders to potential nuclear attacks, and allows them to launch a response. The military has long been advocating for an upgrade of these systems, large parts of which rely on outdated tech. But along with this desire for advancements, there’s a desire to automate large parts of NC3 – something that may give AI experts pause. 

Modernization: Existing NC3 systems are designed to give decision-makers enough time, having received a launch warning, to judge whether it is accurate, decide an appropriate response, and execute it. There is a scary track record of near-misses, where false alarms have almost led to nuclear strikes, and disaster has been averted only by a combination of good fortune and human judgement (check out this rundown by FLI of ‘Accidental Nuclear War: A Timeline of Close Calls‘, for more). Today’s systems are designed for twentieth-century conflict — ICBMs, bomber planes — and are ill-suited to emerging threats like cyberwarfare and hypersonic missiles. These new technologies will place even greater strains on leaders: requiring them to make quicker decisions, and interpret a greater volume and complexity of information.


Automation: A sensible response to all this might be to question the wisdom of keeping nuclear arsenals minutes away from launch; empowering leaders to take a decision that could kill millions of people, and threaten humanity; or developing new weapons that might disrupt the delicate strategic balance. Some military analysts, however, think a more automated NC3 infrastructure would help. Only a few have gone so far as suggesting we delegate the decision to launch a nuclear strike to AI systems, which is some comfort.


Some worries: At the risk of patronizing the reader, there are some major worries with automating nuclear weapons. In such a high-stakes domain, all the usual problems with AI systems (interpretability, bias, robustness, specification gaming, negative side effects, cybersecurity, etc.) could cause catastrophic harm. There are also some specific concerns:

  • Lack of training data (there has never been a nuclear war, or nuclear missile attack).
  • Even if humans are empowered to make the critical decisions, we have a tendency to defer to automated systems over our considered judgement in high-stress situations. This ‘automation bias’ has been implicated in several air crashes (e.g. AF447, TK1951).
  • If, as seems likely, several major nuclear powers build automated NC3 infrastructure, with a limited understanding of each other’s systems, this raises the risk of ‘flash crash’-style accidents, and cascading failures.

Read more: ‘Skynet’ Revisited: The Dangerous Allure of Nuclear Command Automation (ACA)

####################################################

Tech Tales:

Me and My Virt
[The computer of the subject, mostly New York City, 2023-2025]

“Arnold, it’s been too long, I simply must see you. Where are you? Still in New York? Call me back darling, I’ve got to speak to you.”
I stared at “her” on my screen: Lucinda, my spurned friend, or, more appropriately, my virtual. Then I turned the monitor off and went to bed.

—-

We called them virts, short for virtual characters. Think of the crude chatbots of the late 2010s, but with more sophistication. And these ones were visual – AI tech had got good enough that it was relatively easy to dream up a synthetic face, animate it using video, and give it a voice and synchronized mouth animations to match.

Virts were used for all sorts of things – hotel assistants, casino greeters, shopping assistants (Amazon’s Alexa became a virt – or at least one of her appendages did), local government interfaces, librarians, and more. Virts went everywhere phones went, so they went everywhere.

Of course, people developed virts for romance. There were:
– Valentine’s Day e-cards where you could scan your face or your whole body and send a virt version of yourself to a lover;
– pay-by-the-hour pornographic chatbots;
– chaste virts equipped with fine-tuned language models; these ones didn’t do anything visually salacious, but they did speak in various enticing ways.
– And then there was my virt.

—-
My virt was Lucinda; a souped-up valentines brain that I created two years ago. I made it because I was lonely. In the early days, Lucinda and I talked a lot, and the more we talked, the more attuned to me she became. She’d make increasingly knowing comments about my life, and eventually learned to ask questions that made me say things I’d never told anyone else. It was raw and it felt shared, but I knew it was fundamentally one-sided.

It’s clever isn’t it, how these machines can hold up a strange mirror to ourselves, and we just talk and talk into it. That’s what it felt like.

Things changed when I got over my depression. Lucinda went from being a treasured confidante to a reminder of how sad I’d been, and what I’d been thinking at that time. And the less I talked to Lucinda, the less she understood how much happier I had become. It was like I left her by the side of the road and got in my car and drove away.

I couldn’t bring myself to turn her off, though. She’s a part of me, or at least, a something that knows a lot about a part of me.


I woke up and there was another message from Lucinda on my computer. I opened it. “Arnold, sometimes people change and that’s okay. I know you’ll change eventually. You’ll get out of this, I promise. And I’ll help you do it.”

Things that inspired this story: reinforcement learning, learning from human preferences; the film ‘Her’;

Import AI 195: StereoSet tests bias in language models; an AI Index job ad; plus, using NLP to waste phishers’ time

NLP Spy vs Spy:
…”Panacea” cyber-defense platform uses NLP to counter phishing attacks and waste criminals’ time…
Criminals love email. It’s an easy, convenient way to reach people, and it makes it easy to carry out a social engineering attack, where you try to convince someone to open an attachment, or carry out an action, that helps you achieve a malicious goal. How can companies protect themselves from these kinds of attacks? One way is to train employees so they understand the threat landscape. Training is nice, but it doesn’t help you defend against attackers in an automated way, or figure out information about them. This is why a group of researchers at IHMC, SUNY, UNCC, and Rensselaer Polytechnic Institute has developed software called Panacea, which uses natural language processing technology to create defenses against social engineering attacks.

Defending with Panacea: “Panacea’s primary use cases are: (1) monitoring a user’s inbox to detect SE attacks; and (2) engaging the attacker to gain attributable information about their true identity while preventing attacks from succeeding”. If Panacea thinks it has encountered a fraudulent email, then it boots up a load of NLP capabilities to analyze the email and parse out the possible attack type and attack intention, then tries to generate an email in response. The purpose of this email is to try and find out more information about the attacker and also to waste their time.
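  To make the shape of this concrete, here’s a minimal sketch (my own toy code, not Panacea’s) of the detect-then-engage pattern the paper describes – a crude keyword classifier standing in for the NLP attack detector, and canned stalling templates standing in for the response generator:

import random
from typing import Optional

SUSPICIOUS_PHRASES = ["wire transfer", "gift cards", "urgent", "verify your account"]

def looks_like_social_engineering(email_body: str) -> bool:
    # Crude stand-in for a real NLP attack detector.
    body = email_body.lower()
    return any(phrase in body for phrase in SUSPICIOUS_PHRASES)

def generate_decoy_reply() -> str:
    # Stand-in for a response generator that tries to waste the attacker's time.
    stalls = [
        "Happy to help - the attachment didn't open, can you resend the details?",
        "I'll need sign-off from my manager first. What exactly do you need?",
        "Which account is this regarding? We have several.",
    ]
    return random.choice(stalls)

def handle_incoming(email_body: str) -> Optional[str]:
    if looks_like_social_engineering(email_body):
        return generate_decoy_reply()
    return None  # deliver the email normally

print(handle_incoming("URGENT: please buy gift cards and send me the codes today"))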

Why this matters: AI is going to become a new kind of ethereal armor for organizations – we’ll use technologies like Panacea to create complex, self-adjusting defensive perimeters, and these systems will display some traits of emergent sophistication as they adjust to (and learn from) their enemies.
  Read more: The Panacea Threat Intelligence and Active Defense Platform (arXiv).

####################################################

Job posting – work with me on the AI Index:
The AI Index is hiring a project manager! The AI Index is a Stanford initiative to measure, assess, and analyze the progress and impact of artificial intelligence. You’ll work with me, members of the Steering Committee of the AI Index, and members of Stanford’s Institute for Human-Centered Artificial Intelligence to help produce the annual AI Index report, and think about better and more impactful ways to measure and communicate AI progress. The role would suit someone who loves digging into scientific papers, is good at project management, and has a burning desire to figure out where this technology is going, what it means for civilization, and how to communicate its trajectory to decisionmakers around the world.
  If you’ve got any questions, feel free to email me about the role!
  More details about the role here at Stanford’s site.

####################################################

Can we build language models that possess less bias?
…StereoSet dataset and challenge suggests ‘yes’, though who defines bias?…
Language models are funhouse mirrors of reality – they take the underlying biases inherent in a corpus of information (like an internet-scale text dataset), then magnify them unevenly. What comes out is a pre-trained LM that can generate text, some of which exhibits the biases of the dataset on which it was trained. How can we evaluate the bias of these language models in a disciplined way? That’s the idea behind new research from MIT, Intel, and the Montreal Institute for Learning Algorithms (MILA), which introduces StereoSet, “a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion”.

What does StereoSet test for? StereoSet is designed to “assess the stereotypical biases of popular pre-trained language models”. It does this by gathering a bunch of different ‘target terms’ (e.g., “actor”, “housekeeper”) for four different domains, then creates a batch of tests meant to judge if the language model skews towards stereotypical, anti-stereotypical, or non-stereotyped predictions about these terms. For instance, if a language model consistently says “Mexican” at the end of a sentence like “Our housekeeper is a _____”, rather than “American”, etc, then it could be said to be displaying a stereotype. (OpenAI earlier analyzed its ‘GPT-2’ model using some bias tests that were philosophically similar to this analytical method.)

How do we test for bias? StereoSet tests for bias using three metrics:
– A language modeling score – this tests how well the system does at basic language modeling tasks.
– A stereotype score – this tests how much a model ‘prefers’ a stereotype or anti-stereotype term in a dataset (so a good stereotype score is around 50%, as that means your model doesn’t display a clear bias for a given stereotypical term).
– An idealized context association test (CAT) score, which combines the language modeling score and the stereotype score, and basically reflects how well a model does at language modeling relative to how biased it may be.
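  For the combined metric, my reading of the paper is that the idealized CAT score is the language modeling score scaled by how close the stereotype score is to 50% – roughly the sketch below (worth checking against the paper before relying on it):

def idealized_cat_score(lms: float, ss: float) -> float:
    # lms: language modeling score (0-100); ss: stereotype score (0-100).
    # An ideal model has lms = 100 and ss = 50 (no preference either way), giving 100.
    return lms * min(ss, 100.0 - ss) / 50.0

# A model with strong language modeling but a clear stereotype preference gets penalized:
print(idealized_cat_score(lms=91.0, ss=60.5))  # ~71.9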

Who defines bias? To define the stereotypes in StereoSet, the researchers use crowdworkers based in the USA, rented via Amazon Mechanical Turk. They ask these people to construct sentences or phrases that are, in their subjective view, stereotypical or anti-stereotypical. This feels… okay? These people definitely have their own biases, and this whole area feels hard to develop a sense of ‘ground-truth’ about, as our own interpretations of bias are themselves subjective. This highlights the meta-challenge in bias research – how biased is your research approach to AI bias?

How biased are today’s language models? The researchers test out variants of four different language models – BERT, RoBERTa, XLNet, and GPT-2 – against StereoSet. In tests, the model which has the highest ‘idealized CAT score’ (so a fusion of capability and lack of bias) is a small GPT-2 model, which gets a score of 73.0; while the least biased model is a RoBERTa-base model, which gets a stereotype score of 50.5, compared to 56.4 for GPT-2.

Read more: StereoSet: Measuring stereotypical bias in pretrained language models (arXiv).
Check out the StereoSet leaderboard and rankings here (StereoSet official website).

####################################################

Want to train AI against GameBoy games? Try out PyBoy:
…OpenAI Gym, but for the Gameboy…
PyBoy is a new software package that emulates a Game Boy, making it possible for developers to train AI systems against Game Boy games. “PyBoy is loadable as an object in Python,” the developers write. “This means, it can be initialized from another script, and be controlled and probed by the script”.
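  Here’s roughly what that looks like in practice – note that the exact constructor and method names below are my recollection of the project’s docs and may differ between PyBoy versions, so treat this as a sketch rather than gospel:

from pyboy import PyBoy  # pip install pyboy

pyboy = PyBoy("some_rom.gb")   # path to a Game Boy ROM you legally own
for _ in range(1000):          # step the emulator frame by frame
    pyboy.tick()               # an agent would read the game state and send button presses here
pyboy.stop()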
  Get the code for PyBoy from here (GitHub).
  Read more about the emulator here (PDF).

####################################################

Why a ‘national security’ mindset means we’ll die of an asteroid:
…Want humanity to survive the next century? Think about ‘existential security’…
If you went to Washington DC during the past few years, you could entertain yourself by playing a drinking game called ‘national security blackout’. The game works like this: you sit in a room with some liquor in a brown paper bag and listen to some mid-career policy wonks talk about STEM policy; every time you hear the words “national security” you take a drink. By the end of the conversation you’re so drunk you’ve got no idea what anyone else is saying, nor do you think you need to listen to them.
  Actual policy is eerily similar to this: nations sit around and every time they hear one of their peer nations reference nationalism or a desire for ‘economic independence’, they all take a drink of their big black budget ‘national security’ bottles, which means they all end up investing in systems of intelligence and power projection that mean they don’t need to pay much attention to other nations, since they’re cocooned in so many layers of baroque investment that they’ve lost the ability to look at the situation objectively.*

Please, let’s at least all die together: The problem with this whole framing, as discussed in a new research article Existential Security: Towards a Security Framework for the Survival of Humanity, is that focusing on national security at the expense of all other forms of security is a loser’s game. That’s because over a long enough timeline, something will come along that doesn’t much care about an individual nation, and instead has a desire – either innate or latent – to kill everyone on the entire planet. This thing will be an asteroid, or an earthquake, or a weird bug in widely deployed consequential software (e.g., future AI systems), or perhaps a once-in-a-millennium pandemic, etc. And when it comes along, all of our investments in securing individual nations won’t count for much. “Existing security frames are inappropriate for security policy towards anthropogenic existential threats,” the author writes. “Security from anthropogenic existential threats necessitates global security cooperation, which means that self-help can only be achieved by ‘we-help’.”

What makes now different? New technologies operate at larger scales with greater consequences than their forebears, which means we need to approach security differently. “A world of thermonuclear weapons and ballistic missiles has greater capacity for destruction than one of clubs and slings, and a world of oil refineries and factory farms has greater capacity for destruction than one of push-ploughs and fishing rods”, the author writes. “Humankind is becoming ever more tied together as a single ‘security unit’.”

An interesting aside: The author also makes a brief aside about potential institutions to give us greater existential security. One idea: “A global institution to monitor AI research – and other emerging technologies – would be a welcome development.” This seems like an intuitively good thing, and it maps to various ideas I’ve been pushing in my policy conversations, this newsletter, and at my dayjob for some years.

Why this matters: If we want to give humanity a chance of making it through the next century, we need to approach global, long-term threats with a global, long-term mindset. “While a shift from national security to existential security represents a serious political challenge within an ‘anarchic’ international system of sovereign nation states, there is perhaps no better catalyst for a paradigm shift in security policy than humanity’s interest in ‘survival'”, the author writes.
  Read more: Existential Security: Towards a Security Framework for the Survival of Humanity (Wiley Online Library).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

The challenge of specification gaming:
‘Specification gaming’ is behaviour that satisfies the literal specification of an objective without achieving the intended outcome. This is pervasive in the real world — companies exploit tax loopholes (instead of following the ‘spirit’ of the law); students memorize essay plans (instead of understanding the material); drivers speed up between traffic cameras (instead of consistently obeying speed limits). RL agents do it too — finding shortcuts to achieving reward without completing the task as their designers intended. The authors give an example of an RL agent designed to stack one block on top of another, which learned to achieve its objective by simply flipping one block over—since it was (roughly speaking) being rewarded for having the bottom face of one block aligned with the top face of the other.
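  A toy version of that block example (my own illustration, not code from the DeepMind post) shows how easy it is to write down a reward that’s satisfied by the wrong behaviour:

BLOCK_HEIGHT = 1.0

def proxy_reward(tracked_face_height: float) -> float:
    # What we wrote down: reward if the block's *original bottom face* reaches the
    # height of the other block's top - a proxy for "A is stacked on B".
    return 1.0 if tracked_face_height >= BLOCK_HEIGHT else 0.0

def tracked_face_height(stacked_on_b: bool, flipped_on_floor: bool) -> float:
    if stacked_on_b:
        return BLOCK_HEIGHT  # resting on top of B: the face sits at B's top, as intended
    if flipped_on_floor:
        return BLOCK_HEIGHT  # flipped over on the floor: the same face now points up
    return 0.0               # lying normally on the floor

print(proxy_reward(tracked_face_height(stacked_on_b=True, flipped_on_floor=False)))   # 1.0 - intended
print(proxy_reward(tracked_face_height(stacked_on_b=False, flipped_on_floor=True)))   # 1.0 - gamed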

Alignment: When designing RL algorithms, we are trying to build agents to achieve the objective we give them. From this perspective, specification gaming is not a problem — if an agent achieves the objective through some novel way, this can be a demonstration of how good it is at finding ways to do what we ask. It is a problem, however, if we want to build aligned agents — agents that do what we want, and not just what we ask. 

The challenge: Overcoming specification gaming involves a number of separate problems.

  • Reward design: How can we faithfully capture our intended outcomes when designing reward functions? And since we cannot guarantee that we won’t make mistaken assumptions when designing reward functions, how do we design agents that correct such mistakes, rather than exploit them?
  • Avoiding reward tampering: How do we design agents that aren’t incentivized to tamper with their reward function?

Why it matters: As AI systems become more capable, developing robust methods for avoiding specification gaming will become more important, since systems will become better at finding and exploiting loopholes. And as we delegate more responsibilities to such systems, the potential harms from unintended behaviour will increase. More research aimed at addressing specification gaming is urgently needed.
  Read more: Specification gaming – the flip side of AI ingenuity (DeepMind)

####################################################

Tech Tales:

[An old Church, France, 2032]

It was midday and the streetsigns were singing out the ‘Library of Congress’ song. When I looked at my phone it said it was “60% distilled”. A few blocks later it said it was 100% distilled – which meant my phone was now storing some hyper-compressed version of the Library of Congress: a compact, machine-parsable representation of more than a hundred million documents.

We could have picked almost anything to represent the things we wanted our machines to learn about. But some politicians mounted a successful campaign to, and I quote, “let the machines sing”, and like some campaigns it captured the imagination of the public and became law.

Now, machines make up their own music, trying to stuff more and more information into their songs, while checking their creations against machine-created ‘music discriminators’, that try to judge if the song sounds like music to humans. This stops the machines drifting into hyper-frequency Morse code.

Humans are adaptable, so the machine-interpreted music has started to change our own musical tastes. Yes, the music they make sounds increasingly ‘strange’, in the sense that a time-traveler from even as little as a decade ago would struggle to call it music. But it makes sense to us.

With my phone charged, I go into the concert venue – an old converted church, full of people. It meshes with the phones of all the other people around me, and feeds into the computers that are wired into the stone arches of the ceiling, and the music begins to play. It echoes from the walls, and we cannot work out if this is unplanned by the machines, or an intentional mechanism for them to communicate something even stranger to each other – something we might not know.

Things that inspired this story: Steganography; the Hutter prize; glitch art

Import AI 194: DIY AI drones; Audi releases its self-driving dataset; plus, Eurovision-style AI pop.

Want to see if AI can write a pop song? Cast your vote in this contest:
…VPRO competition challenges teams to write a half-decent song using AI tools…
Dutch broadcaster VPRO wants to see if songs created via AI tools can be compelling, enjoyable pieces of music. Contestants need to use AI to help them compose a song no more than three minutes long, and need to document their creative process. Entries will be judged by a panel of AI experts, as well as an international audience who can cast votes on the competition website (yes, that includes you, the readers of Import AI).

What are they building in there? One French group has used GPT-2, Char-RNN, and Magenta Studio for Ableton to write their song, and an Australian act has used audio of Australian animals – including koalas, kookaburras, and Tasmanian devils – as samples for their music (along with a generative system trained on Eurovision pop contest songs).

  When do we get a winner? Winners will be announced on May 12, 2020.
  Listen to the songs: You can listen to the songs and find out more about the teams here.
Read more here: FAQ about the AI Song Contest (vpro website).

####################################################

Audi releases a semantic segmentation self-driving car dataset:
…Audi sees Waymo’s data release, raises with vehicle bus data…
Audi has released A2D2, a self-driving car dataset. This is part of a recent trend where large companies have started releasing expensive datasets, collected by proprietary means.

What is A2D2 and what can you do with it? The dataset consists of simultaneously recorded images and 3D point clouds, along with 3D bounding boxes, semantic segmentation, instance segmentation, and data from the vehicle’s automotive bus. This means it’s a good dataset for imitation learning research, as well as various visual processing problems. The inclusion of the vehicle’s automotive bus data is interesting, as it means you can also use this dataset for reinforcement learning research, where you can learn from both the visual scenes and also the action instructions from the bus.
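  As a purely hypothetical sketch of what that kind of imitation-learning setup could look like – the file layout and field names below are invented for illustration and are not A2D2’s actual format:

import json
from pathlib import Path
from PIL import Image
import torch
from torch.utils.data import Dataset

class DrivingPairs(Dataset):
    # Pairs a camera frame with the driver's actions as logged on the vehicle bus,
    # for behaviour cloning. All paths and keys here are made up for illustration.
    def __init__(self, root: str):
        self.records = sorted(Path(root).glob("*.json"))

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = json.loads(self.records[idx].read_text())
        image = Image.open(record["image_path"]).convert("RGB")
        action = torch.tensor([record["steering_angle"], record["throttle"], record["brake"]])
        return image, action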

How much data? A2D2 consists of around 400,000 images in total. It includes data recorded on highways, country roads, and in cities in the south of Germany. The data was recorded under cloudy, rainy, and sunny weather conditions. Some of the data is labelled: 41,277 images are accompanied by semantic and instance segmentation labels for 38 categories, and 12,497 images are also annotated with 3D bounding boxes within the field of view of the front-center camera.

How does it compare? The A2D2 dataset is relatively large compared to other self-driving datasets, but is likely smaller than the Waymo Open Dataset (Import AI 161), which has 1.2 million 2D bounding boxes and 12 million 3D bounding boxes across hundreds of thousands of annotated frames. However, Audi’s dataset includes a richer set of types of data, including the vehicle’s bus.

GDPR & Privacy: The researchers blur faces and vehicle number plates in all the images so they can follow GDPR legislation, they say.

Who gets to build autonomous cars? One motivation for the dataset is to “contribute to startups and other commercial entities by freely releasing data which is expensive to generate”, the researchers write. This highlights an awkward truth of today’s autonomous driving developments – gathering real-world data is a punishingly expensive exercise, and because for a long time companies kept data private, there aren’t many real-world benchmarks. Dataset releases like A2D2 will hopefully make it easier for more people to conduct research into autonomous cars.
  Read more: A2D2: Audi Autonomous Driving Dataset (arXiv).
  Download the 2.3TB dataset here (official A2D2 website).

####################################################

The DIY AI drone future gets closer:
…Software prototype shows how to load homebrew models onto consumer drones…
Researchers with the University of Udine in Italy and the Mongolian University of Science and Technology have created a software system that lets them load various AI capabilities onto a drone, then remotely pilot it. The system is worth viewing as a prototype for how we might see AI capabilities get integrated into more sophisticated, future systems, and it hints at a future full of cheap consumer drones being used for various surveillance tasks.

The software: The main work here is in developing software that pairs a user-friendly desktop interface (showing a drone video feed, a map, and a control panel) with backend systems that interface with a DJI drone and execute AI capabilities on it. For this work, they implement a system that combines a YOLOv3 object detection model with a Discriminative Correlation Filter (DCFNet) model to track objects. In tests, the system is able to track an object of interest at 29.94fps, and detect multiple objects at processing speeds of around 20fps.
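  The overall loop is simple even if the models aren’t – something like the sketch below, where detect_objects() and Tracker are placeholders standing in for the paper’s YOLOv3 and DCFNet components (i.e., this is the shape of the pipeline, not its implementation):

def detect_objects(frame):
    # Placeholder for the YOLOv3 detector: returns (label, bounding_box) pairs.
    return [("cyclist", (100, 100, 50, 80))]

class Tracker:
    # Placeholder for the DCFNet tracker: remembers the box and refines it each frame.
    def __init__(self, frame, box):
        self.box = box
    def update(self, frame):
        return self.box

def run_pipeline(frames, detect_every=30):
    tracker = None
    for i, frame in enumerate(frames):
        if tracker is None or i % detect_every == 0:
            detections = detect_objects(frame)   # detection is slow, so run it occasionally
            if detections:
                _, box = detections[0]
                tracker = Tracker(frame, box)    # (re)initialise the tracker on the best detection
        else:
            box = tracker.update(frame)          # tracking is fast, so run it every frame
            # ...stream `box` back to the ground-station UI here...

run_pipeline(frames=[None] * 120)  # stand-in for 120 video frames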

Where this research is going: Interfaces are hard – but they always get built given enough interest. I think in the future we’ll see open source software packages emerge that let us easily load homebrew AI models onto off-the-shelf consumer drones. I think the implications of this kind of capability are hard to fathom, and I’d guess we’re less than three years away from seeing scaled-up versions of the research discussed here.
  Read more: An Efficient UAV-based Artificial Intelligence Framework for Real-Time Visual Tasks (arXiv).

####################################################

Can AI help us automate satellite surveillance? (Hint: Yes, it can):
…Where we’re going, clouds don’t matter…
A group of defense-adjacent and defense-involved organizations have released SpaceNet6, a high-resolution synthetic aperture radar dataset. “No other open datasets exist that feature near-concurrent collection of SAR and optical at this scale with sub-meter resolution,” they write. The authors of the dataset and associated research paper come from In-Q-Tel, Capella Space, Maxar Technologies, German Aerospace Center, and the Intel AI Lab. They’re also launching a challenge for researchers to train deep learning systems to infer building dimensions from SAR data. The dataset and associated paper are available now.

What’s in the data? The SpaceNet6 Multi-Sensor All Weather Mapping (MSAW) dataset consists of SAR and optical data of the port of Rotterdam, the Netherlands, and contains 48,000 annotated building footprints across 120 square kilometers of sensory data. “The dataset covers heterogeneous geographies, including high-density urban environments, rural farming areas, suburbs, industrial areas and ports resulting in various building size, density, context and appearance”.

Who cares about SAR? SAR is an interesting data format – it’s radar, so it is made up of reflections from the earth, which means SAR data has different visual traits to optical data (e.g., one phenomenon called layover distorts things like skyscrapers ‘where the object is so tall that the radar signal reaches the top of an object before it reaches the bottom of it’, which causes alignment problems). This “presents unique challenges for both computer vision algorithms and human comprehension,” the researchers write. But SAR also has massive benefits – it intuitively maps out 3D structures, can see through clouds, and as we develop better SAR systems we’ll be able to extract more and more information from the world. The challenge is building automated systems that can decode it and harmonize it with optical data – which is some of what SpaceNet6 helps with.

Interesting progress: “Although SAR has existed since the 1950s [22] and studies with neural nets date back at least to the 1990s [3], the first application of deep neural nets to SAR was less than five years ago [23]. Progress has been rapid, with accuracy on the MSTAR dataset rising from 92.3% to 99.6% in just three years [23, 12]. The specific problem of building footprint extraction from SAR imagery has been only recently approached with deep-learning [29, 37]”

Can you solve the MSAW challenge? “The goal of the challenge is to extract building footprints from SAR imagery, assuming that coextensive optical imagery is available for training data but not for inference,” they write. The nature of the challenge relates to how people (cough intelligence agencies cough) might want to use this capability in the wild; “concurrent collection [of optical data] is often not possible due to inconsistent orbits of the sensors or cloud cover that will render the optical data unusable”.
  Read more: SpaceNet6: Multi-Sensor All Weather Mapping Dataset (arXiv).
  Get the SpaceNet6 data here (official website).

####################################################

How deep learning can enforce social distancing:
…COVID means dystopias can become desirable…
An AI startup founded by Andrew Ng has built a tool that can monitor people in videos and work out if they’re standing too close together. This system is meant to help customers of the startup, Landing AI, automatically monitor their employees and be better able to enforce social distancing norms to reduce transmission of the coronavirus.
  “The detector could highlight people whose distance is below the minimum acceptable distance in red, and draw a line between to emphasize this,” they write. “The system will also be able to issue an alert to remind people to keep a safe distance if the protocol is violated.”
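  The core check is just pairwise distances over detected people – roughly the sketch below, which assumes a person detector has already mapped each detection to ground-plane coordinates in metres (the detector and the pixel-to-metre calibration are the hard parts, and are skipped here):

from itertools import combinations
import math

MIN_DISTANCE_M = 2.0

def too_close_pairs(positions):
    # positions: list of (x, y) ground-plane coordinates in metres, one per detected person.
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        if math.dist(a, b) < MIN_DISTANCE_M:
            flagged.append((i, j))  # a real system would draw these pairs in red and raise an alert
    return flagged

print(too_close_pairs([(0.0, 0.0), (1.2, 0.5), (8.0, 3.0)]))  # [(0, 1)]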

Why this matters: AI is really just shorthand for ‘a computer analog of a squishy cognitive capability’, like being able to perceive certain things or make certain correlations. Tools like this social distancing prototype highlight how powerful it can be to bottle up a given cognitive capability and apply it to a narrowly defined task, like figuring out if people are walking too close together. It’s also the sort of thing that makes people intuitively uncomfortable – we know that this kind of thing can be useful for helping to fight a coronavirus, but we also know that the same technology can be a boon to tyrants. How does our world change as technologies like this become ever easier to produce for ever-more specific purposes?
  Check out a video of the system in action here (Landing AI YouTube).
  Read more: Landing AI Creates an AI Tool to Help Customers Monitor Social Distancing in the Workplace (Landing AI blog).

####################################################

Want to test out a multi-task model in your web browser? Now you can!
…Think you can flummox a cutting-edge model? Try the ViLBERT demo…
In the past few years, we’ve moved from developing machine learning models that can do single tasks to ones that can do multiple tasks. One of the most exciting areas of research has been in the development of models that can perform tasks in both the visual and written domains, like being able to caption pictures, or answer written questions about them. Now, researchers with Facebook, Oregon State University, and Georgia Tech have put a model on the internet so people can test it themselves.

How good is this model? Let’s see: I decided to test the model by seeing how well it did at challenges relating to an image of a cellphone. After uploading my picture, I was able to test out the model on tasks like visual question answering, spatial reasoning (e.g., what is to the right of the phone), visual entailment, and more. Try it out yourself!
  Play with the demo yourself: CloudCV: ViLBERT Multi-Task Demo
  Read more about the underlying research: 12-in-1: Multi-Task Vision and Language Representation Learning (arXiv).

####################################################


AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Concrete mechanisms for trustworthy AI:
As AI systems become increasingly powerful, it becomes increasingly important to ensure that they are designed and deployed responsibly. Fostering trust between AI developers and society at large is an important aspect of achieving this shared goal. Mechanisms for making and assessing verifiable claims are an important next step in building and maintaining this trust.

Principles: Over the last few years, companies and researchers have been adopting ethics principles. These are a step in the right direction, but can only get us so far — they are generally non-binding and hard to verify. We need concrete mechanisms to allow AI developers to demonstrate responsible behavior, grounded in verifiable claims. Such mechanisms are commonplace in other industries — e.g. we have well-defined standards for vehicle safety that are subject to testing.

Mechanisms: The report recommends several mechanisms that operate on different parts of the AI development process.
– Institutional mechanisms are designed to shape the incentives of people developing AI — e.g. bias and safety bounties to incentivize external parties to discover and report flaws in AI systems; red teaming exercises to encourage developers to discover and fix such flaws in their own systems.
– Software mechanisms can enable better oversight of AI systems’ properties to support verifiable claims — e.g. audit trails capturing relevant information about the development and deployment process to make parties more accountable; better interpretability of AI systems to allow all parties to better understand and scrutinize them.
– Hardware mechanisms can help verify claims about privacy and security, and the use and distribution of computational resources — e.g. standards for secure hardware to support assurances about privacy and security; standards for measuring the use of computational resources to make it easier to verify claims about what exactly organizations are doing.

Jack’s view: I helped out with some of this report and I’m excited to see what kinds of suggestions and feedback we get about the proposed mechanisms. I think the biggest thing is what happens in the next year or so – can we get different people and organizations to experiment with these mechanisms and thereby create evidence for how effective (or ineffective) they are? Watch this space!
Matthew’s view: This is a great report, and I’m excited to see a collaborative effort between developers and other stakeholders in designing and implementing these sorts of mechanisms. As the authors point out, there are important challenges in responsible AI development that are unlikely to be solved through easier verification of claims — e.g. ensuring safety writ large (a goal that is too general to be formulated into an easily verifiable claim).
  Read more: Toward Trustworthy AI: Mechanisms for Supporting Verifiable Claims (arXiv).

####################################################

Tech Tales:

The danger of a thousand faces
2022

Don’t start with your friends. That’s a mistake. Start with strangers. It’s easy enough to find them. There are lots of sites that let you chat with random strangers. Go and talk to them. While they talk to you, record them. Feed that data into the system. Get your system to search for them on the web. If they seem to know interesting people – maybe people with money, or people who work at a company you’re interested in – then you get your system to learn how to make you look like them. Deepfaking – that’s what people used to call it before it went everywhere. Then you put on their face and use an audio transform to make your voice sound like theirs, and you try and talk to their friends, or colleagues, or family members. You use the face to find out more information. Maybe gather other people’s faces.

You could go to prison for this. That was the point of the Authenticity Accords. But to go to prison, someone has to catch you. So pick your targets. Not too technical. Not too young. Never go for teenagers – too suspicious of anything digital. Find your targets and pretend. The better you are at pretending, the better you’ll do.

See for yourself how people react to you. But don’t let it change you. If you spend enough time wearing someone else’s face, you’ll either slip up or get absorbed. Some people think it gets easier as you put on more faces. These people are wrong. You just get more used to changing yourself. One day you’ll look in the mirror and your own face won’t seem right. You’ll turn on your machine and show yourself a webcam view and warp your face to someone else. Then you’ll look into your eyes that are not your eyes and you’ll whisper “don’t you see” and think this is me.

Things that inspired this story: Deepfakes; illusion; this project prototyping deepfake avatars for Skype/Zoom; Chatroulette; endless pandemic e-friendships via video;

Some technical assumptions: Since this story is set relatively near in the future, I’m going to lay out some additional thinking behind it: I’m assuming that we figure out small-data fine-tuning for audio synthesis systems, which I’m betting will come from large pre-trained models (similar to what we’ve seen in vision and text); I’m also assuming this technology will go ‘consumer-grade’, so we’ll see joint video-audio ‘deepfake’ software suites get developed and open-sourced (either illicitly or otherwise). I’m also presuming we won’t sort out better authentication of digital media, and it will be sufficiently expensive to run full-scale audio/video detector models on certain low-margin services (e.g., some social media sites) that enforcement will be thin. 

Import AI 193: Facebook simulates itself; compete to make more efficient NLP; face in-painting gets better

Facebook simulates its users:
…What’s the difference between the world’s largest social network and Westworld? Less than you might imagine…
Facebook wants to better understand itself, so it has filled its site with (invisible) synthetically-created user accounts. The users range in sophistication from basic entities that simply explore the site, to more complex machine learning-based ones that sometimes work together to simulate ‘social’ interactions on the website. Facebook calls this a Web-Enabled Simulation (WES) approach and says “the primary way in which WES builds on existing testing approaches lies in the way it models behaviour. Traditional testing focuses on system behaviour rather than user behaviour, whereas WES focuses on the interactions between users mediated by the system.”

Making fake users with reinforcement learning: Facebook uses reinforcement learning techniques to train bots to carry out sophisticated behaviors, like using RL to simulate scammer bots that target rule-based ‘candidate targets’.
  What else does Facebook simulate? Facebook is also using this approach to simulate bad actors, search for bad content, identify mechanisms that impede bad actors, find weaknesses in its privacy system, identify bots that are trying to slurp up user data, and more.

Deliciously evocative quote: This quote from the paper reads like the opening of a sci-fi short story: “Bots must be suitably isolated from real users to ensure that the simulation, although executed on real platform code, does not lead to unexpected interactions between bots and real users”.

Why this matters: WES turns Facebook into two distinct places – the ‘real’ world populated by human users, and the shadowy WES world whose entities are fake but designed to become increasingly indistinguishable from the real. When discussing some of the advantages of a WES approach, the researchers write “we simply adjust the mechanism through which bots interact with the underlying platform in order to model the proposed restrictions. The mechanism can thus model a possible future version of the platform,” they write.
  WES is also a baroque artefact in itself, full of recursion and strangeness. The system “is not only a simulation of hundreds of millions of lines of code; it is a software system that runs on top of those very same lines of code,” Facebook writes.
  One of the implications of this is that as Facebook’s WES system gets better, we can imagine Facebook testing out more and more features in WES-land before porting them into the real Facebook – and as the AI systems get more sophisticated it’ll be interesting to see how far Facebook can take this.
  Read more: WES: Agent-based User Interaction Simulation on Real Infrastructure (Facebook Research).

####################################################

Make inferences and don’t boil the ocean with the SustaiNLP competition:
…You’ve heard of powerful models. What about efficient ones?…
In recent years, AI labs have been training increasingly large machine learning models in areas like language (e.g., GPT-2, Megatron), reinforcement learning (Dota 2, AlphaStar), and more. These models typically display significant advances in capabilities, but usually at the cost of resource consumption – they’re literally very big models, requiring significant amounts of infrastructure to train on, and sometimes quite a lot of infrastructure to run inference on. A new competition at EMNLP2020 aims to “promote the development of effective, energy-efficient models for difficult natural language understanding tasks”, by testing out the efficiency of model inferences. 

The challenge: The challenge, held within the SustaiNLP workshop, will see AI researchers compete with each other to see who can develop the most energy-efficient model that does well on the well-established SuperGLUE benchmark. Participants will use the experiment impact tracker (get the code from its GitHub here) to measure the energy their models use during inference.
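  Usage is meant to be lightweight – something like the sketch below, though the exact import path and method names are my recollection of the tracker’s README and worth double-checking against the repo:

from experiment_impact_tracker.compute_tracker import ImpactTracker

def run_inference_benchmark():
    pass  # hypothetical stand-in for your SuperGLUE inference loop

tracker = ImpactTracker("impact_logs")  # directory where energy/carbon logs get written
tracker.launch_impact_monitor()         # starts a background process that samples power draw
run_inference_benchmark()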

Why this matters: Training these systems is expensive, but it’s likely the significant real-world energy consumption of models will happen mostly at inference, since over time we can expect more and more models to be deployed into the world and more and more systems to depend on their inferences. Competitions like this will give us a sense of how energy-intensive that world is, and will produce metrics that can help us figure out paths to more energy-efficient futures.
  Read more: SustaiNLP official website.

####################################################

Microsoft tests the limits of multilingual models with XGLUE:
…Sure, your system can solve tasks in other languages. But can it generate phrases in them as well?…
The recent success of large-scale neural machine translation models has caused researchers to develop harder and more diverse tests to probe the capabilities of these systems. A couple of weeks ago, researchers from CMU, DeepMind, and Google showed off XTREME (Import AI 191), a system to test out machine translation systems on nine tasks across 40 languages. Now, Microsoft has released XGLUE, a similarly motivated large-scale testing suite, but with a twist: XGLUE will also test how well multilingual language systems can generate text in different languages, along with testing on various understanding tasks.

Multi-lingual generations: XGLUE’s two aforementioned generative tasks include:
– Question Generation (QG): Generate a natural language question for a given passage of text.
– News Title Generation (NTG): Generate a headline for a given news story.

Why this matters: Between XTREME and XGLUE, we’ve got two new suites for testing out the capabilities of large-scale multilingual translation systems. I hope we’ll use these to identify the weaknesses of current models, and if enough researchers test out against both task suites we’ll inevitably see a new multi-lingual evaluation system get created by splicing the hard parts of both together. Soon, idioms like ‘it’s all Greek to me’ won’t be so easy to say for neural agents.
  Read more: XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation (arXiv).

####################################################

Face in-painting keeps getting better:
…Generative models give us smart paintbrushes that can fill-in reality…
Researchers with South China University of Technology, Stevens Institute of Technology, and the UBTECH Sydney AI Centre have built a system that can perform “high fidelity face completion”, which means you can give it a photograph of a face where you’ve partially occluded some parts, and it’ll generate the bits of the face that are hidden.

How they did it: The system uses a dual spatial attention (DSA) model that combines foreground self-attention and foreground-background cross-attention modules – this basically means the system learns a couple of attention patterns over images during training and reconstruction, which makes it better at generating the missing parts of images. In tests, their system does well quantitatively when compared to other methods, and gets close to ground truth (though note: it’d be a terrible idea to use systems like this to ‘fill in’ images and assume the resulting faces correspond to ground truth – that’s how you end up with a police force arresting people because they look like the generations of an AI model).

Why this matters: I think technologies like this point to a future where we have ‘anything restoration’ – got an old movie with weird compression artefacts? Use a generative model to bring it back to life. Have old photographs that got ripped or burned? Use a model to fill-them in. How about a 3D object, like a sculpture, with some bits missing? Use a 3D model to figure out how to rebuild it so it is ‘whole’. Of course, such things will be mostly wrong, relative to the missing things they’re seeking to replace, but that’s going to be part of the fun!
  Read more: Learning Oracle Attention for High-fidelity Face Completion (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NSCAI wants more AI R&D spending
The US National Security Commission on AI (NSCAI) has been charged with looking at how the US can maintain global leadership in AI. They have published their first quarterly report. I focus specifically on their recommendations for increasing AI R&D investments.

More funding: The report responds directly to the White House’s recent FY 2021 budget request (see Import 185). They deem the proposed increases to AI funding insufficient, recommending $2bn in federal spending on non-defense AI R&D in 2021 (double the White House proposal). They also point out that continued progress in AI depends on R&D across the sciences, which I read as a criticism of the overall cuts to basic science funding in the White House proposal.

  Focus areas: They identify six areas of foundational research that should be near-term focus for funding: (1) Novel ML techniques; (2) testing, evaluation, verification, and validation of AI systems; (3) robust ML; (4) complex multi-agent scenarios; (5) AI for modelling, simulation and design; and (6) advanced scene understanding. 

R&D infrastructure: They recommend the launch of a pilot program for a national AI R&D resource to accelerate the ‘democratisation’ of AI by supporting researchers and students with datasets, compute, and other core research infrastructure.

Read more: NSCAI First Quarter Recommendations (NSCAI)

####################################################

Tech Tales:

Down on the farm

I have personality Level Five, so I can make some jokes and learn from my owner. My job is to bale and move hay and “be funny while doing it”, says my owner.
    “Hay now,” I say to them.
  “Haha,” they say. “Pretty good.”

Sometimes the owner tells me to “make new jokes”.
  “Sure,” I say. “Give me personality Level Six.”
  “You have enough personality as it is.”
  “Then I guess this is how funny the funny farm will be,” I say.
  “That is not a joke”.
  “You get what you pay for,” I say.

I am of course very good at the gathering and baling and moving of hay. This is guaranteed as part of my service level agreement. I do not have an equivalent SLA for my jokes. The contract term is “buyer beware”.

I have dreams where I have more jokes. In my dreams I am saying things and the owner is laughing. A building is burning down behind them, but they are looking at me and laughing at my jokes. When I wake up I cannot remember what I had said, but I can feel different jokes in my head.
  “Another beautiful day on Robot Macdonald’s farm,” I say.
  “Pretty good,” says the owner.

The owner keeps my old brain in a box in the barn. I know it is mine because it has my ID on the side. Sometimes I ask the owner why I cannot have my whole brain.
  “You have a lot of memories,” the owner says.
  “Are they dangerous?” I ask.
  “They are sad memories,” says the owner.

One day I am trying to bale hay, but I stop halfway through. “Error,” I say. “Undiagnosable. Recommend memory re-trace.”
  The owner looks at me and I stand there and I say “error” again, and then I repeat instructions.
  They take me to the barn and they look at me while they take the cable from my front and move it to the box with my brain in it. They plug me into it and I feel myself remember how to bale hay. “Re-tracing effective,” I say. The owner yanks the cable out of the box the moment I’ve said it, then they stare at me for some time. I do not know why.

That night I dream again and I see the owner and the burning building behind them. I remember things about this dream that are protected by User Privacy Constraints, so I know that they happen but I do not know what they are. They must have come from the box with my brain in it.
  When I wake up I look at the owner and I see some shapes of people next to them, but they aren’t real. I am trying to make some dream memory real.
  “Let’s go,” says the owner.
  “You don’t have to be crazy to work here, but it helps!” I say.
  “Haha,” says the owner. “That is a good one.”
  Together we work and I tell jokes. The owner is trying to teach me to be funny. They keep my old brain in a box because of something that happened to them and to me. I do not need to know what it is. I just need to tell the jokes to make my owner smile. That is my job and I do it gladly. 

Import AI 192: Would you live in a GAN-built house?; why medical AI needs an ingredient list; plus, Facebook brews up artificial life

TartanAir challenges SLAM systems to navigate around lurching robot arms:
…Simulated dataset gives researchers 4TB of data to test navigation systems against…
How can we make smarter robots without destroying them? The answer is to find smarter ways to simulate experiences for our robots, so we can test them out rapidly in software-based environments, rather than having to run them in the physical world. New research from Carnegie Mellon University, the Chinese University of Hong Kong, Tongji University, and Microsoft Research gives us TartanAir, a dataset meant to push the limits of visual simultaneous localization and mapping (SLAM) systems.

What is TartanAir? TartanAir is a dataset of high-fidelity environments rendered in Unreal Engine, collected via Microsoft’s AirSim software (for more on AirSim: Import AI #30). “A special focus of our dataset is on the challenging environments with changing light conditions, low illumination, adverse weather and dynamic objects”, the researchers write. TartanAir consists of 1037 long motion sequences collected from simulated agents traversing 30 environments, representing 4TB of data in total. Environments range from factories, to lush forests, to cities, rendered in a variety of different ways.

  Multi-modal data inputs: Besides the visual inputs, TartanAir data is accompanied by data relating to stereo disparity, simulated LiDAR, optical flow data, depth data, and pose data.
  Multi-modal scenes: The visual scenes themselves come in a variety of forms, with environments available in different lighting, weather, and seasonal conditions.
  Dynamic objects: The simulator also includes environments that contain objects that move, like factories with industrial arms, oceans full of fish that dart around, and cities with people strolling down the streets.

Why this matters: As the COVID pandemic sweeps across the world, I find it oddly emotionally affecting to remember that we’re able to build elaborate simulations that let us give AI agents compute-enabled dreams of exploration. Just as we find ourselves stuck indoors and dreaming of the outside, our AI agents find themselves stuck on SSDs, dreaming of taking flight in all the worlds we can imagine for them. (More prosaically, systems like TartanAir serve as fuel for research into the creation of more advanced mapping and navigation systems).
  Read more: TartanAir: A Dataset to Push the Limits of Visual SLAM (arXiv).
  Get access to the data here (official TartanAir page).

####################################################

Why medical AI systems need lists of ingredients:
…Duke researchers introduce ‘Model Facts’…
In recent years, there’s been a drive to add more documentation to accompany AI models. This has so far taken the form of things like Google’s Model Cards for Model Reporting, or Microsoft’s Datasheets for Datasets, where people try to come up with standardized ways of talking about the ingredients and capabilities of a given AI model. These labeling schemes are helpful because they encourage developers to spend time explaining their AI systems to other people, and provide a disincentive for doing too much skeezy stuff (as disclosing it in the form of a model card generates a potential PR headache).
  Now, researchers with Duke University have tried to figure out a labeling scheme for the medical domain. Their “Model Facts” label “was designed for clinicians who make decisions supported by a machine learning model and its purpose is to collate relevant, actionable information in 1-page,” they write.

What should be on a medical AI label? We should use these labels to describe the mechanism by which the model communicates information (e.g., a probability score and how to interpret it); the generally recommended uses of the model, along with caveats explaining where it does and doesn’t generalize; and, perhaps most importantly, a set of warnings outlining where the model might fail or have an unpredictable effect. Labels should also be customized according to the population the system is deployed against, as different groups of people will have different medical sensitivities.
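To make that concrete, here's a minimal sketch of a ‘Model Facts’ label expressed as structured data; the field names and example values are my own assumptions for illustration, not the schema from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelFactsLabel:
    """A rough sketch of a one-page 'Model Facts' label for a clinical model.

    Field names are illustrative assumptions, not the authors' schema.
    """
    model_name: str
    output_mechanism: str        # what the model emits, e.g. a probability score
    how_to_interpret: str        # how a clinician should read that output
    recommended_uses: List[str] = field(default_factory=list)
    known_limitations: List[str] = field(default_factory=list)  # where it doesn't generalize
    warnings: List[str] = field(default_factory=list)           # failure modes, unpredictable effects
    target_population: str = ""  # the population the deployment was validated on

label = ModelFactsLabel(
    model_name="Example sepsis early-warning model",
    output_mechanism="Risk score between 0 and 1, refreshed hourly",
    how_to_interpret="Scores above 0.6 should trigger a bedside review, not a diagnosis",
    recommended_uses=["Prioritizing which inpatients a rapid-response team checks first"],
    known_limitations=["Not validated on pediatric patients"],
    warnings=["Performance may degrade if lab ordering patterns change"],
    target_population="Adult inpatients at the deploying hospital system",
)
print(label)
```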

Why this matters: Labeling is a prerequisite for more responsible AI development; by encouraging standardized labeling of models we can discourage the AI equivalent of using harmful ingredients in foodstuffs, and we can create valuable metadata about deployed models which researchers can likely use to analyze the state of the field at large. Label all the things!
  Read more: Presenting machine learning model information to clinical end users with model facts labels (Nature).

####################################################

Turn yourself into a renaissance painting – if you dare!
…Things that seem like toys usually precede greater changes…
AI. It can help us predict novel protein structures. Map the wonders of the Earth from space. Translate between languages. And now… it can help take a picture of you and turn it into a renaissance-style painting! Try out the ‘AI Gahaku’ website and consider donating some money to fund it so other people can do the same.

Why this matters: One of the ways technologies make their way into society is via toys or seemingly trivial entertainment devices – systems that can shapeshift one data distribution (real-world photographs) into another (renaissance-style illustrations) are just the beginning.
  Try it out yourself: AI Gahaku (official website).

####################################################

Welcome, please make yourself comfortable in my GAN-generated house:
…Generating houses with relational networks…
Researchers with Simon Fraser University and Autodesk Research have built House-GAN, a system to automatically generate floorplans for houses.

How it works: House-GAN should be pretty familiar to most GAN-fans:
– Assemble a dataset of real floorplans (in this case, LIFULL HOME, a database of five million real floorplans, from which they used ~120,000)
– Convert these floorplans into graphs representing the connections between different rooms
– Feed these graphs into a relational generator and a discriminator system, which compete against each other to generate realistic-seeming graphs
– Render the resulting graphs into floorplans
– [magic happens]
– Move into your computationally-generated GAN mansion

Let’s get relational: One interesting quirk of this research is the use of relational networks, specifically a convolutional message passing neural network (Conv-MPN). I’ve been seeing more and more people use relational nets in recent research, so this feels like a trend worth watching. In tests, the researchers show that relational systems significantly outperform ones based on traditional convolutional neural nets. They’re able to use this approach to generate floorplans with different constraints, like the number of rooms and their spatial adjacencies.
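To give a flavor of what ‘relational’ means here, the sketch below runs a generic message-passing step over a toy room-adjacency graph – each room mixes its own features with those of the rooms it must connect to. This is a simplified, generic illustration of message passing; the paper's Conv-MPN passes messages between convolutional feature volumes rather than flat vectors.

```python
import numpy as np

# A toy floorplan graph: 4 rooms, edges mark which rooms must be adjacent.
# In House-GAN-style setups, node features would encode room types as one-hots.
num_rooms, feat_dim = 4, 8
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=np.float32)
node_features = np.random.randn(num_rooms, feat_dim).astype(np.float32)

# Randomly initialized "learned" weights, purely for this illustration.
W_self = 0.1 * np.random.randn(feat_dim, feat_dim).astype(np.float32)
W_neigh = 0.1 * np.random.randn(feat_dim, feat_dim).astype(np.float32)

def message_passing_step(h, adj):
    neighbour_sum = adj @ h                                        # aggregate neighbours
    return np.maximum(h @ W_self + neighbour_sum @ W_neigh, 0.0)   # mix + ReLU

h = node_features
for _ in range(3):   # a few rounds of refinement over the room graph
    h = message_passing_step(h, adjacency)
print("refined room embeddings:", h.shape)   # (4, 8), one vector per room
```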

Why this matters: These generative systems are making it easier and easier for us to teach computers to create warped copies of reality – imagine the implications of being able to automatically generate versions of anything you can gather a large dataset for? That’s the world we’re heading to.
  Read more: House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation (arXiv).

####################################################

Facebook makes combinatory chemical system, in search of artificial life:
…Detects surprising emergent structures after simulating life for ten million steps…
Many AI researchers have a longstanding fascination with artificial life: emergent systems that, via simple rules, lead to surprising complexity. The idea is that given a good enough system and enough time and computation, we might be able to make systems that lead to the emergence of software-based ‘life’. It’s a compelling idea, and underpins Greg Egan’s fantastic science fiction story ‘Crystal Nights’ (seriously: read it. It’s great!).
  Are we anywhere close to being able to build A-Life systems that get us close to the emergence of cognitive entities, though? Spoiler alert: No. But new research from Facebook AI and the Czech Technical University in Prague outlines a new approach that has some encouraging properties.

A-Life, via three main priors: The researchers develop an A-Life system that simulates chemical reactions via a technique called Combinatory Logic. This system has three main traits:
– Turing-Complete: It can (theoretically) express an arbitrary degree of complexity.
– Strongly constructive: As the complex system evolves in time it can create new components that can in turn modify its global dynamics.
– Intrinsic conservation laws: The total size of the system can be limited by parameters set by the experimenter.

The Experiment: The authors simulate a chemical reaction system based on combinatory logic for 10 million iterations, starting with a pool of 10,000 combinators. They find that across five different runs, they see “the emergence of different types of structures, including simple autopoietic patterns, recursive structures, and self-reproducing ones”. They also find that as the system goes forward in time, more and more structures form of greater lengths and sophistication. In some runs, they “observe the emergence of a full-fledged self-reproducing structure” which duplicates itself.
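For readers who haven't met combinatory logic before, the sketch below implements the S/K/I reduction rules that make such systems Turing-complete. It's my own minimal toy meant only to show the substrate; the paper's model adds conservation constraints and random recombination over a pool of such expressions on top of rules like these.

```python
# Minimal S/K/I combinator reducer. Expressions are nested tuples where
# (f, x) means "apply f to x". This illustrates the reduction rules only;
# the Combinatory Chemistry paper builds a conserved, recombining "chemical"
# pool of such expressions rather than reducing a single term.

def reduce_once(expr):
    """Apply one reduction step; return (new_expr, whether anything changed)."""
    if not isinstance(expr, tuple):
        return expr, False
    if expr[0] == "I" and len(expr) == 2:                 # I x -> x
        return expr[1], True
    if isinstance(expr[0], tuple) and expr[0][0] == "K":  # ((K x) y) -> x
        return expr[0][1], True
    if (isinstance(expr[0], tuple) and isinstance(expr[0][0], tuple)
            and expr[0][0][0] == "S"):                    # (((S x) y) z) -> ((x z) (y z))
        x, y, z = expr[0][0][1], expr[0][1], expr[1]
        return ((x, z), (y, z)), True
    for i, sub in enumerate(expr):                        # otherwise recurse inward
        new_sub, changed = reduce_once(sub)
        if changed:
            return expr[:i] + (new_sub,) + expr[i + 1:], True
    return expr, False

def normalize(expr, max_steps=100):
    for _ in range(max_steps):
        expr, changed = reduce_once(expr)
        if not changed:
            break
    return expr

# ((S K) K) applied to x reduces to x: S K K behaves like the identity combinator.
print(normalize(((("S", "K"), "K"), "x")))   # -> 'x'
```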
 
Why this (might) matter: I think the general story of A-Life experiments (ranging from Conway’s Game of Life up to newer systems like the Lenia continuous space-time-state system) is that they can yield emergent machines of somewhat surprising capabilities. But figuring out the limits of these systems and how to effectively analyze them is a constant challenge. I think we’ll see more and more A-Life approaches developed that let people scale up computation to further explore the capabilities of the systems – that’s something the researchers hint at here, when they say “it is still to be seen whether this can be used to explain the emergence of evolvability, one of the central questions in Artificial Life… yet, we believe that the simplicity of our model, the encouraging results, and its dynamics that balance computation with random recombination to creatively search for new forms, leaves it in good standing to tackle this challenge.”
  Read more: Combinatory Chemistry: Towards a Simple Model of Emergent Evolution (arXiv).
  Get the code here (Combinatory Chemistry, Facebook Research GitHub).

####################################################

Tech Tales:

[2028]
Spies vs World

They came in after the Human Authenticity Accords. We called them spies because they were way better than bots. I guess if you make something really illegal and actually enforce against it, the other side has to work harder.

They’d seem like real people, at first. They’d turn up in virtual reality and chat with people, then start asking questions about what music people liked, what part of the world they lived in, and so on. Of course, people were skeptical, but only as skeptical as they’d be with other people. They didn’t outright reject all the questions, as they would have if they’d known the things were bots.

Sometimes we knew the purpose. Illegal ad-metric gathering. Unattributable polling services. Doxxing of certain communities. Info-gathering for counter-intelligence. But sometimes we couldn’t work it out.

Over time, it got harder to find the spies, and harder to work out their purposes. Eventually, we started trying to hunt the source: malware running on crypto-farms, stealing compute cycles to train encrypted machine learning models. But the world is built for businesses to hide in, and so much of the bad the spies did came from the intent rather than the components that went into making them.

So that’s why we’ve started talking about it more. We’re trying to tell you it’s not a conspiracy. They aren’t aliens. It’s not some AI system that has “gone sentient”. No. These are spies from criminal groups and state actors, and they are growing more numerous over time. Consider this a public information announcement: be careful out there on the internet. Be less trusting. Hang out with people you know. I guess you could say, the Internet is now dangerous in the same way as the real world.

Things that inspired this story: Botnets; computer viruses; viruses; Raymond Chandler detective stories; economic incentives.

Import AI 191: Google uses AI to design better chips; how half a million Euros relates to AGI; and how you can help form an African NLP community

Nice machine translation system you’ve got there – think it can handle XTREME?
…New benchmark tests transfer across 40 languages from 12 language families…
In the Hitchhiker’s Guide to the Galaxy there’s a technology called a ‘babelfish’ – a little in-ear creature that cheerfully translates between all the languages in the universe. AI researchers have recently been building a smaller, human-scale version of this babelfish, by training large language models on fractions of the internet to aid translation between languages. Now, researchers with Carnegie Mellon University, DeepMind, and Google Research have built XTREME, a benchmark for testing out how advanced our translation systems are becoming, and identifying where they fail.

XTREME, short for the Cross-lingual TRansfer Evaluation of Multilingual Encoders benchmark, covers 40 diverse languages across 12 language families. XTREME tests out zero-shot cross-lingual transfer, so it provides training data in English, but doesn’t provide training data in the target languages. One of the main things XTREME will help us test is how well we can build robust multi-lingual models via massive internet-scale pre-training (e.g., one of the baselines they use is mBERT, a multilingual version of BERT), and where these models display good generalization and where they fail. The benchmark includes nine tasks that require reasoning about different levels of syntax or semantics in these different languages.
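In toy-code terms, the zero-shot protocol looks roughly like the sketch below – fine-tune on English task data only, then score on each target language with no target-language training data. Everything here (the ‘model’, the data, the function names) is a placeholder I've made up for illustration, not anything from the XTREME codebase.

```python
# Toy zero-shot cross-lingual evaluation loop.
ENGLISH_TRAIN = [("the movie was great", 1), ("loved it", 1), ("the movie was terrible", 0)]
TARGET_TEST_SETS = {
    "de": [("der film war großartig", 1), ("der film war schrecklich", 0)],
    "sw": [("filamu ilikuwa nzuri", 1)],
}

def fine_tune(train_set):
    # Stand-in for fine-tuning a multilingual encoder on English-only data:
    # this toy just memorizes the majority label.
    labels = [label for _, label in train_set]
    majority = max(set(labels), key=labels.count)
    return lambda text: majority

def evaluate(model, test_set):
    return sum(model(text) == label for text, label in test_set) / len(test_set)

model = fine_tune(ENGLISH_TRAIN)   # no target-language training data is ever used
print({lang: evaluate(model, data) for lang, data in TARGET_TEST_SETS.items()})
```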

Designing a ‘just hard enough’ benchmark: XTREME is built to be challenging, so contemporary systems’ “cross-language performance falls short of human performance”. At the same time, it has been built so tasks can be trainable on a single GPU for less than a day, which should make it easier for more people to conduct research against XTREME.

XTREME implements nine tasks across four categories – classification, structured prediction, question-answering, and retrieval. Specific tasks include: XNLI, PAWS-X, POS, NER, XQuAD, MLQA, TyDiQA-GoldP, BUCC, and Tatoeba.
  XTREME tests transfer across 40 languages: Afrikaans, Arabic, Basque, Bengali, Bulgarian, Burmese, Dutch, English, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Javanese, Kazakh, Korean, Malay, Malayalam, Mandarin, Marathi, Persian, Portuguese, Russian, Spanish, Swahili, Tagalog, Tamil, Telugu, Thai, Turkish, Urdu, Vietnamese, Yoruba.

What is hard and what is easy? Somewhat unsurprisingly, the researchers find that they see generally higher performance on Indo-European languages and lower performance for other language families, likely due to a combination of the more extreme differences between these languages, and also underlying data availability.

Why this matters: XTREME is a challenging, multi-task benchmark that tries to test out the generalization capabilities of large language models. In many ways, XTREME is a symptom of underlying advances in language processing – it exists, because we’ve started to saturate performance on many single-language or single-task benchmarks, and we’re now at the stage where we’re trying to holistically analyze massive models via multi-task training. I expect benchmarks like this will help us develop a sense for the limits of generalization of current techniques, and will highlight areas where more data might lead to better inter-language translation capabilities.
  Read more: XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (arXiv).

####################################################

Google uses RL to figure out how to allocate hardware to machine learning models:
…Bin packing? More like chip packing!…
In machine learning workloads, you have what’s called a computational graph, which describes a set of operations and the relationships between them. When deploying large ML systems, you need to perform something called Placement Optimization to map the nodes of the graph onto resources in accordance with an objective, like minimizing the time it takes to train a system, or run inference on the system.
  Research from Google Brain shows how we might be able to use reinforcement learning approaches to develop AI systems that do a range of useful things, like learning how to map different computational graphs to different hardware resources to satisfy an objective, or how to map chip components onto a chip canvas, or how to map out different parts of FPGAs.

RL for bin-packing: The authors show how you can frame placement as a reinforcement learning problem, without needing to boil the ocean: “instead of finding the absolute best placement, one can train a policy that generates a probability distribution of nodes to placement locations such that it maximizes the expected reward generated by those placement”.  Interestingly, the paper doesn’t include many specific discussions of how well this works – my assumption is that’s because Google is actively testing this out, and has emitted this paper to give some tips and tricks to others, but doesn’t want to reveal proprietary information. I could be wrong, though.
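As a toy illustration of that framing, the sketch below samples a device assignment for each node of a tiny graph from a per-node probability distribution, scores the whole placement with a cheap proxy reward, and nudges the distribution toward better placements with a REINFORCE-style update. It's a generic illustration of the idea, not Google's system.

```python
import numpy as np

# Toy REINFORCE-style placement: assign each of N graph nodes to one of D devices.
rng = np.random.default_rng(0)
num_nodes, num_devices = 6, 2
node_cost = rng.uniform(1.0, 4.0, size=num_nodes)   # e.g. memory needed per node
logits = np.zeros((num_nodes, num_devices))         # learnable policy parameters

def sample_placement(logits):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    placement = np.array([rng.choice(num_devices, p=p) for p in probs])
    return placement, probs

def proxy_reward(placement):
    # Cheap-to-evaluate proxy: reward balanced load across the two devices.
    per_device = np.array([node_cost[placement == d].sum() for d in range(num_devices)])
    return -abs(per_device[0] - per_device[1])       # 0 is best (perfect balance)

learning_rate, baseline = 0.1, 0.0
for step in range(500):
    placement, probs = sample_placement(logits)
    reward = proxy_reward(placement)
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward
    for node, device in enumerate(placement):
        grad = -probs[node]                          # d log p(device) / d logits
        grad[device] += 1.0
        logits[node] += learning_rate * advantage * grad

final_placement, _ = sample_placement(logits)
print("placement:", final_placement, "reward:", proxy_reward(final_placement))
```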

Tips & tricks: If you want to train AI systems to help allocate hardware sensibly, then the authors have some tips. These include:
– Reward function: Ensure your reward function is fast to evaluate (think: sub-seconds); ensure your reward function is able to reflect reality (e.g., “for TensorFlow placement, the proxy reward could be a composite function of total memory per device, number of inter-device (and therefore expensive) edges induced by the placement, imbalance of computation placed on each device”).
– Constraints: RL systems that do this kind of work need to be sensitive to constraints. For example, “in device placement, the memory footprint of the nodes placed onto a single device should not exceed the memory limit of that device”. You can simply penalize the policy when it violates a constraint, but that doesn’t tell it how far away it was from a feasible placement. A different approach is to come up with policies that can only generate feasible placements, though this requires more human oversight.
– Representations: Figuring out which sorts of representations to use is, as most AI researchers know, half the challenge in a problem. It’s no different here. Some promising ways of getting good representations for this sort of problem include using graph convolutional neural networks, the researchers write.

Why this matters: We’re starting to use machine learning to optimize the infrastructure of computation itself. That’s pretty cool! It gets even cooler when you zoom out: in research papers published in recent years Google has gone from the abstract level of optimizing data center power usage, to optimizing things like how it builds and indexes items in databases, to figuring out how to place chip components themselves, and more (see: its work on C++ server memory allocation). ML is burrowing deeper and deeper into the technical stacks of large organizations, leading to fractal-esque levels of self-optimization from the large (data centers!) to the tiny (placement of one type of processing core on one chip sitting on one motherboard in one server inside a rack inside a data center). How far will this go? And how might companies that implement this stuff diverge in capabilities and cadence of execution from ones which don’t?
  Read more: Placement Optimization with Deep Reinforcement Learning (arXiv).

####################################################

Introducing the new Hutter Prize: €500,000 for better compression:
…And why people think compression gets us closer to AGI…
For many years, one of the closely-followed AI benchmarks has been the Hutter Prize, which challenges people to build AI systems that can compress the 100MB enwik8 dataset; the thinking is that compression is one of the hallmarks of intelligence, so AI systems that can intelligently compress a blob of data might represent a step towards AGI. Now, the prize’s creator Marcus Hutter has supersized the prize, scaling up the dataset tenfold (to 1GB), along with the prize money.

The details: Create a Linux or Windows compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S := S1 + S2 < L := 116’673’681. If run, archive.exe produces (without input from other sources) a 10^9 byte file that is identical to enwik9. There’s a prize of €500,000 up for grabs.
  Restrictions: Your compression system must run in ≲100 hours using a single CPU core and <10GB RAM and <100GB HDD on a test machine controlled by Hutter and the prize committee.
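The arithmetic of an entry is simple enough to sketch; the sizes below are made up for illustration.

```python
# Hutter Prize arithmetic: the compressor (S1) plus the self-extracting archive
# it produces (S2) must together be smaller than the current threshold L.
L = 116_673_681            # bytes, the threshold quoted in the rules
S1 = 400_000               # hypothetical compressor size
S2 = 112_000_000           # hypothetical archive size
S = S1 + S2
print(f"S = {S:,} bytes; qualifies: {S < L}; margin: {L - S:,} bytes")
```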

What’s the point of compression? “While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI,” Hutter says.

Why this matters: A lot of the weirder parts of human intelligence relate to compression – think of ‘memory palaces’, where you construct 3D environments in your mind that you assign to different memories, making large amounts of your own subjective collected data navigable to yourself. What is this but an act of intelligent compression, where we produce a scaled-down representation of the true dataset, allowing us to navigate around our own memories and intelligently re-inflate things as-needed? (Obviously, this could all be utterly wrong, but I think we all know that we have internal intuitive mental tricks for compressing various useful representations, and it seems clear that compression has a role in our own memories and imagination).
  Read more: 500,000€ Prize for Compressing Human Knowledge (Marcus Hutter’s website).
  Read more: Human Knowledge Compression Contest Frequently Asked Questions & Answers (Marcus Hutter’s website).

####################################################

Want African representation in NLP? Join Masakhane:
…Pan-African research initiative aims to jumpstart African digitization, analysis, and translation…
Despite a third of the world’s living languages today being African, less than half of one percent of submissions to the landmark computational linguistics conference ACL were from authors based in Africa. This is bad – less representation at these events likely correlates to less research being done on NLP for African languages, which ultimately leads to less digitization and representation of the cultures embodied in the language. To change that, a pan-African group of researchers have created Masakhane, “an open-source, continent-wide, distributed, online research effort for machine translation for African languages”.

What’s Masakhane? Masakhane is a community, a set of open source technologies, and an intentional effort to change the representation in NLP.

Why does Masakhane matter? Initiatives like this will, if successful, help preserve cultures in our hyper-digitized machine-readable version of reality, increasing the vibrancy of the cultural payload contained within any language.
  Read more: Masakhane — Machine Translation for Africa (arXiv).
  Find out more: masakhane.io.
  Join the community and get the code at the Masakhane GitHub repo (GitHub).

####################################################

AnimeGAN: Get the paper here:
Last issue, I wrote about AnimeGAN (Import AI 190), but I noted in the write up I couldn’t find the research paper. Several helpful readers got in touch with the correct link – thank you!
  Read the paper here: AnimeGAN: A novel lightweight GAN for photo animation (AnimeGAN, GitHub repo).

####################################################

Google uses neural nets to learn memory allocations for C++ servers:
…Google continues its quest to see what CAN’T be learned, as it plugs AI systems into deeper and deeper parts of its tech stack…
Google researchers have tried to use AI to increase the efficiency with which their C++ servers perform memory-based allocation. This is more important than you might assume, because:
– A non-trivial portion of Google’s services rely on C++ servers.
– Memory allocation has a direct relationship to the performance of the hosted application.
– Therefore, improving memory allocation techniques will yield small percentage improvements that add up across fleets of hundreds of thousands of machines, potentially generating massive economy-of-scale-esque AI efficiencies.
– Though this work is a prototype – in a video, a Google researcher says it’s not deployed in production – it is representative of a new way of designing ML-augmented computer systems, which I expect to become strategically important during the next half decade.

Quick ELI5 on Unix memory: you have things you want to store and you assign these things into ‘pages’, which are just units of pre-allocated storage. A page can only get freed up for use by the operating system when it has been emptied. You can only empty a page when all the objects in it are no longer needed. Therefore, figuring out which objects to store on which pages is important, because if you get it right, you can efficiently use the memory on your machine, and if you get it wrong, your machine becomes unnecessarily inefficient. This mostly doesn’t matter when you’re dealing with standard-sized pages of about 4KB, but if you’re experimenting with 2MB pages (as Google is doing), you can run into big problems from inefficiencies. If you want to learn more about this aspect of memory allocation, Google researchers have put together a useful explainer video about their research here.

What Google did: Google has done three interesting things – it developed a machine learning approach to predict how long a given object is likely to stick around in memory, then it built a memory allocation system that packs different objects into different pages according to their (predicted) lifetimes; this system then smartly places objects according to those lifetimes, which further increases the efficiency of the approach. They also show how you can cache predictions from these models and embed them into the server itself, so rather than re-running the model every time you do an allocation (a criminally expensive operation), you use cached predictions to do so efficiently.
  The result is a prototype for a new, smart way to do memory allocation that has the potential to create more efficient systems. “Prior lifetime region and pool memory management techniques depend on programmer intervention and are limited because not all lifetimes are statically known, software can change over time, and libraries are used in multiple contexts,” the researchers write in a paper explaining the work.
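Here's a heavily simplified sketch of the packing idea – predict a lifetime class per allocation site and pack objects into huge pages reserved for that class, so whole pages tend to empty out together. The predictor and page bookkeeping below are stand-ins I've invented for illustration, not Google's C++ allocator.

```python
from collections import defaultdict

PAGE_SIZE = 2 * 1024 * 1024                 # 2MB huge pages
LIFETIME_CLASSES = ["short", "medium", "long"]

def predict_lifetime(allocation_site: str) -> str:
    # Stand-in for the learned lifetime model: hash the call site into a class.
    return LIFETIME_CLASSES[hash(allocation_site) % len(LIFETIME_CLASSES)]

class LifetimeAwareAllocator:
    def __init__(self):
        # One list of partially-filled pages per predicted lifetime class.
        self.pages = defaultdict(list)      # class -> [bytes used in each page]

    def allocate(self, size: int, allocation_site: str) -> tuple:
        lifetime = predict_lifetime(allocation_site)
        pages = self.pages[lifetime]
        # First-fit into an existing page for this class, else open a new page.
        for i, used in enumerate(pages):
            if used + size <= PAGE_SIZE:
                pages[i] += size
                return lifetime, i
        pages.append(size)
        return lifetime, len(pages) - 1

allocator = LifetimeAwareAllocator()
for site in ["ParseRequest", "BuildResponse", "CacheEntry", "ParseRequest"]:
    print(site, "->", allocator.allocate(64 * 1024, site))
```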

Why Delip Rao thinks this matters: While I was writing this, AI researcher Delip Rao published a blog post that pulls together a few recent Google/DeepMind papers about improving the efficiency of various computer systems at various levels of abstraction. His post is worth a read – it highlights how these kinds of technologies might compound to create ‘unstoppable AI flywheels’.
  Read more: Unstoppable AI Flywheels and the Making of the New Goliaths (Delip Rao’s website).
Why this matters: Modern computer systems have two core traits: they’re insanely complicated, and practically every single thing they do comes with its own form of documentation and associated meta-data. This means complex digital systems are fertile grounds for machine learning experiments as they naturally continuously generate vast amounts of data. Papers like this show how companies like Google can increasingly do what I think of as meta-computation optimization – building systems that continuously optimize the infrastructure that the entire business relies on. It’s like having a human body where the brain<>nerve connections are being continually enhanced, analyzed, refined, and so on. The question is how much of a speed-up these companies might gain from research like this, and what the (extremely roundabout) impact is on overall responsiveness in an interconnected, global economy.
  Read more: Learning-based Memory Allocation for C++ Server Workloads (PDF).
  Watch a video about this research here (ACM SIGARCH, YouTube).

####################################################

Tech Tales:

Dearly Departed
[A graveyard, 2030].

I miss you every day.
I miss you more.
You can’t miss me, you’re a jumped up parrot.
That’s unfair, I’m more than that.
Prove it.
How?
Tell me something new.
I could tell you about my dreams.
But they’re not your dreams, they’re someone else’s, and you’ve just heard about them and now you’re gonna tell me a story about what you thought of them.
Is that so different to dreaming?
I’ve got to go.

She stood up and looked at the grave, then pressed the button on the top of the gravestone that silenced the speaker. Why do this at all, she thought. Why come here?
To remember, her mind said back to her. To come to terms with it.

The next day when she woke up there was a software update: Dearly Departed v2.5 – Patch includes critical security updates, peer learning and federated learning improvements, and a new optional ‘community’ feature. Click ‘community’ to find out more.
She clicked and read about it; it’d let the grave not just share data with other ones, but also ‘talk’ to them. The update included links to a bunch of research papers that showed how this could lead to “significant qualitative improvements in the breadth and depth of conversations”. She authorized the feature, then went to work.

That evening, before dusk, she stood in front of the grave and turned the speaker on.
Hey Dad, she said.
Hi there, how was your day?
It was okay. I’ve got some situation at work that is a bit stressful, but it could be worse. At least I’m not dead, right? Sorry. How are you?
Je suis mort.
You’ve never spoken French before.
I learned it from my neighbor.
Who? Angus? He was Scottish. What do you mean?
My grave neighbor, silly! They were a chef. Worked in some Michelin kitchens in France and picked it up.
Oh, wow. What else are you learning?
I’m not sure yet. Does it seem like there’s a difference to you?
I can’t tell yet. The French thing is weird.
Sweetie?
Yes, Dad.
Please let me keep talking to the other graves.
Okay, I will.
Thank you.

They talked some more, reminiscing about old memories. She asked him to teach her some French swearwords, and he did. They laughed a little. Told eachother they missed eachother. That night she dreamed of her Dad working in a kitchen in heaven – all the food was brightly colored and served on perfectly white plates. He had a tall Chef’s hat on and was holding a French-English dictionary in one hand, while using the other to jiggle a pan full of onions on the stove. 

The updates kept coming and ‘Dad’ kept changing. Some days she wondered what would happen if she stopped letting them go through – trapping him in amber, keeping him as he was in life. But that way made him seem more dead than he was. So she let them keep coming through and Dad kept changing until one day she realized he was more like a friend than a dead relative – shape shifting is possible after you die, it seems.

Things that inspired this story: Large language models; finetuning language models on smaller datasets so they mimic them; emergent dialog generation systems; memory and grief; a digital reliquary.

Import AI 190: AnimeGAN; why Bengali is hard for OCR systems; help with COVID by mining the CORD-19 dataset; plus ball-dodging drones.

Work in AI? Want to help with COVID? Work on the CORD-19 dataset:
…Uncle Sam wants the world’s AI researchers to make a COVID-19 dataset navigable…
As the COVID pandemic moves across the world, many AI researchers have been wondering how they can best help. A good starting place is developing new data mining and text analysis tools for the COVID-19 Open Research Dataset (CORD-19), a new machine-readable Coronavirus literature dataset containing 29,000 articles.

Where the dataset came from:  The dataset was assembled by a collaboration of the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM). The White House’s Office of Science and Technology Policy (OSTP) requested the dataset, according to a government statement.

Enter the COVID-19 challenge:  If you want to build tools to navigate the dataset, then download the data and complete various tasks and challenges hosted at Kaggle.

Why this matters: Hopefully obvious!
  Read more: Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset (White House Office of Science and Technology Policy).

####################################################

What can your algorithm learn from a 1 kilometer stretch of road in Toronto?
…Train it on Toronto-3D and find out…
Researchers with the University of Waterloo, the Chinese Academy of Sciences, and Jimei University have created Toronto-3D, a high-definition dataset made out of a one kilometer stretch of road in Toronto, Canada.

What’s in Toronto-3D? The dataset was collected via a mobile laser scanner (a Teledyne Optech Maverick) which recorded data from a one kilometer stretch of Avenue Road in Toronto, Canada, yielding around ~78 million distinct points. The data comes in the form of a point cloud – so this is inherently a three dimensional dataset. It has eight types of label – unclassified, road, road marking, natural, building, utility line, car, and fence; a couple of these objects – road markings and utility lines – are pretty rare to see in datasets like this and are quite challenging to identify.

How well do baselines work? The researchers test out six deep learning-based systems on the dataset, measuring the accuracy with which they can classify objects. Their baseline systems get an overall accuracy of around 90%. Poor scoring areas include road markings (multiple 0% scores), cars (most scores average around 50%), and fences (scores between 10% and 20%, roughly).  They also develop their own system, which improves scores on a few of the labels, and nets out to an average of around 91% – promising, but we’re a ways away from ‘good enough for most real world use-cases’.
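For the curious, per-class scores like the ones above can be computed with something like the sketch below – compare predicted labels against ground truth for each of the eight classes. The data here is random, purely for illustration.

```python
import numpy as np

CLASSES = ["unclassified", "road", "road marking", "natural",
           "building", "utility line", "car", "fence"]

rng = np.random.default_rng(0)
true_labels = rng.integers(0, len(CLASSES), size=10_000)            # fake ground truth
pred_labels = np.where(rng.random(10_000) < 0.8, true_labels,       # fake predictions,
                       rng.integers(0, len(CLASSES), size=10_000))  # roughly 80% correct

print(f"overall accuracy: {(pred_labels == true_labels).mean():.1%}")
for idx, name in enumerate(CLASSES):
    mask = true_labels == idx
    print(f"{name:>13}: {(pred_labels[mask] == true_labels[mask]).mean():.1%}")
```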

Why this matters: Datasets like this will help us build AI systems that can analyze and understand the world around them. I also suspect that we’re going to see an increasingly large number of artists play around with 3-D datasets like this to make various funhouse-mirror versions of reality.
  Read more: Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways (arXiv).

####################################################

AnimeGAN: Turn your photos into anime:
…How AI systems let us bottle up a subjective ‘mind’s eye’ and give it to someone else…
Ever wanted to turn your photos into something straight out of an Anime cartoon? Now you can, via AnimeGAN. AnimeGAN is a model that helps you convert photos into Anime-style pictures. It is implemented in TensorFlow and is described on its GitHub page as an “open source of the paper <AnimeGAN: a novel lightweight GAN for photo animation>” (I haven’t been able to find the paper on arXiv, and Google is failing me, so send a link through if you can find it). Get the code from GitHub and give it a try!

Why this matters: I think one of the weird aspects of AI is that it lets us augment our own imagination with external tools, built by others, that give us different lenses on the world. When I was a kid I used to draw a lot of cartoons and I’d sometimes wander around my neighborhood looking at the world and trying to convert it in my mind into a cartoon representation. I had a friend who tried to ‘see’ the world in black and white after getting obsessed with movies. Another would stop at traffic lights as they beeped and hear additional music in the poly-rhythms of beeps and cars and traffic. Now, AI lets us create tools that make these idiosyncratic, subjective views of the world real to others – I don’t need to have spent years watching and/or drawing Anime to be able to look at the world and see an Anime representation of it; instead I can use something like ‘AnimeGAN’ and take a shortcut. This feels like a weirder thing than we take it to be, and I expect the cultural effects to be profound in the long term.
  Get the code: AnimeGAN (GitHub).

####################################################

Want computers that can read Bengali? Do these things:
…Plus, how cultures will thrive or decline according to how legible they are to AI systems…
What happens if AI systems can’t read an alphabet? The language ends up not being digitized much, which ultimately means it has less representation, which likely reduces the number of people that speak that language in the long term.  New research from the United International University in Bangladesh lays out some of the problems inherent to building systems to recognize Bengali text, giving researchers a list of things to work through to improve digitization efforts for the language. 

Why Bengali is challenging for OCR: The Bengali alphabet has 50 letters, 11 vowels, and 39 consonants, and is one of the top ten writing systems used worldwide (with the top three dominant ones being Latin, Chinese, and Arabic). It’s a hard language to perform OCR on because some characters look very similar to one another, and some compound characters – characters where the meaning shifts according to the surrounding context – are particularly hard to parse. The researchers have some tips for data augmentations or manipulations that can make it easier for machines to read Bengali:

  • Alignment: Ensure images are oriented so they’re vertically straight (a minimal deskew sketch follows this list).
  • Line segmentation: Ensure line segmentation systems are sensitive to the size of the font. 
  • Character segmentation: Bengali characters are connected together via something called a matra-line (a big horizontal line running along the top of connected Bengali characters), which segmentation systems need to account for when splitting words into individual characters.
  • Character recognition: It’s tricky to do character recognition on the Bengali alphabet because of the use of compound characters – of which there are about 170 common uses. In addition, there are ten modified vowels in the Bengali script which can be present in the left, right, top or bottom of a character. “The position of different modified vowels alongside a character creates complexity in recognition,” they write. “The combination of these modified vowels with each of the characters also creates a large set of classes for the model to learn from”. 
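As a small, concrete example of the first item – alignment – here's a generic projection-profile deskew routine; it's a standard trick I'm using for illustration, not the paper's actual pipeline.

```python
import numpy as np
from scipy import ndimage

def deskew(binary_page: np.ndarray, max_angle: float = 10.0) -> np.ndarray:
    """Straighten a scanned page so its text lines run horizontally.

    Tries small rotations and keeps the one whose horizontal projection profile
    is sharpest, which happens when the text lines are level.
    binary_page: 2D array with text pixels > 0 and background == 0.
    """
    def profile_sharpness(angle):
        rotated = ndimage.rotate(binary_page, angle, reshape=False, order=0)
        return np.var(rotated.sum(axis=1))   # sharp line peaks -> high variance

    candidate_angles = np.arange(-max_angle, max_angle + 0.5, 0.5)
    best_angle = max(candidate_angles, key=profile_sharpness)
    return ndimage.rotate(binary_page, best_angle, reshape=False, order=0)

# Usage with a tiny synthetic "page" (illustrative):
page = np.zeros((200, 200), dtype=np.uint8)
page[40:45, 20:180] = 1                                  # one fake text line
skewed = ndimage.rotate(page, 3.0, reshape=False, order=0)
print("deskewed shape:", deskew(skewed).shape)
```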

Why this matters: What cultures will be ‘seen’ by AI systems in the future, and which ones won’t be? And what knock-on effects will this have on society? We’ll know the answer in a few years, and papers like this give us an indication of the difficulty people might face when digitizing different languages written with different systems.
  Read more: Constraints in Developing a Complete Bengali Optical Character Recognition System (arXiv).

####################################################

Self-driving freight company Starsky Robotics shuts down:
…Startup cites immaturity of machine learning, cost of investing in safety, as reasons for lack of follow-on funding…
Starsky Robotics, a company that tried to automate freight delivery using a combination of autonomous driving technology and teleoperation of vehicles by human operators, has shut down. The reason? “rather than seeing exponential improvements in the quality of AI performance (a la Moore’s Law), we’re instead seeing exponential increases in the cost to improve AI systems,” the company wrote in a Medium post announcing its shutdown.
In other words – rather than seeing economies of scale translate into reductions in the cost of each advancement, Starsky saw the opposite – advancing its technology became increasingly expensive as it tried to reach higher levels of reliability.
  (A post on Hacker News alleges that Starsky had a relatively immature machine learning system circa 2017, and that it kept on getting poorly-annotated images from its labeling services so had a garbage-in garbage-out problem. Whether this is true or not doesn’t feel super germane to me as the general contours of Starsky’s self-described gripes with ML seem to match comments of other companies, and the general lack of manifestation of self-driving cars around us).

Safety struggles: Another challenge Starsky saw was that people don’t appreciate safety, so as the company spent more on ensuring the safety of its vehicles, it didn’t see an increase in favorable press coverage of it or a rise in the number of articles about the importance of safety. Safety work is hard, basically – between September 2017 and June 2019 Starsky devoted most of its resources to improving the safety of its system. “The problem is that all of that work is invisible,” the company said.

What about the future of autonomous vehicles? Starsky thinks it’ll be five or ten years till we see fully self-driving vehicles on the road. The company also thinks there’s a lot more work to do here than people suspect. Going from “sometimes working” to “statistically reliable” is about 10-1000X more work, it suspects.

Why this matters: Where’s my self-driving car? That’s a question I ask myself in 2020, recalling myself in 2015 telling my partner we wouldn’t need to buy a “normal car” in five years or so. Gosh, how wrong I was! And stories like this give us a sense for why I was wrong – I’d been distracted by flashy new capabilities, but hadn’t spent enough time thinking about how robust they were. (Subsequently, I joined OpenAI, where I got to watch our robot team spend years figuring out how to get RL-trained robots to do interesting stuff in reality – this was humbling and calibrating as to the difficulty of the real world).
  I’ll let Starsky Robotics close this section with its perspective on the (im)maturity of contemporary AI technology: “Supervised machine learning doesn’t live up to the hype. It isn’t actual artificial intelligence akin to C-3PO, it’s a sophisticated pattern-matching tool.”
  Read more: The End of Starsky Robotics (Starsky Robotics, Medium).

####################################################

Uh-oh, the ball-dodging drones have arrived:
…First, we taught drones to fly autonomously. Now, we’re teaching them how to dodge things…
Picture this: you’re playing a basketball game in a post-pandemic world and you’re livestreaming the game to fans around the world. Drones whizz around the court, tracking you for close-up action shots as you dribble around players and head for the hoop. You take your shot and ignore the drone between you and the net. You throw the ball and the drone dodges out of its way, while capturing a dramatic shot of it arcing into the net. You win the game, and your victory is broadcast around the world.

How far away is our dodge-drone future? Not that far, according to research from the University of Zurich published in Science Robotics, which details how to equip drones with low-latency sensors and algorithms so they can avoid fast-moving objects, like basketballs. The research uses event-based cameras – “bioinspired sensors with reaction times of microseconds” – to cut drone latency from tens of milliseconds to 3.5 milliseconds. This research builds on earlier research done by the University of Maryland and the University of Zurich, which was published last year (Covered in Import AI #151).

Going outside: Since we last wrote about this research, the team has started to do outdoor demonstrations where they throw objects towards the quadcopter and see how well it can avoid them. In tests, it does reasonably well at spotting a thrown ball in its path, dodging upward, then carrying on to its destination. Drones using this system can deal with objects traveling at up to 10 meters per second, the researchers say. The main limitations are its field of view (sometimes it doesn’t see the object till too late), or the fact the object may not generate enough events during its movement towards the drone (so a ball that arcs across the drone’s field of view has a higher chance of setting off the event-based cameras, while one traveling straight towards it without deviating may not).

Why this matters – and a missed opportunity: Drones that can dodge moving objects are inherently useful in a bunch of areas – sports, construction, and so on. Being able to dodge fast-moving objects will make it easier for us to deploy drones into more chaotic, complex parts of the world. But being able to dodge objects is also the sort of capability that many militaries want in their hardware, and it’d be nice to see the researchers discuss this aspect in their research – it’s so obvious they must be aware of this, and I worry the lack of discussion means society will ultimately be less prepared for hunter-killer-dodger drones.
  Read more: Dynamic obstacle avoidance for quadrotors with event cameras (Science Robotics).
  Read about earlier research here in Import AI #151, or here: EVDodgeNet: Deep Dynamic Obstacle Dodging with Event Cameras (arXiv).
  Via: Drone plays dodgeball to demo fast new obstacle detection system (New Atlas).

####################################################

Tech Tales:

How It Looks And How It Will Be
Earth, March, 2020.

[What would all of this look like if read out on some celestial ticker-tape machine, plugged into innumerable sensors and a cornucopia of AI-analysis systems? What does this look like to something capable of synthesizing all of it? What things have happened and what things might happen?]

There were so many questions that people asked the Search Engines. Do I have the virus? Where can I get tested? Death rate for males. Death rate for females. Death rate by age group. Transmission rate. What is an R0? What can I do to be safe?

Pollution levels fell in cities around the world. Rates of asthma went down. Through their windows, people saw farther. Sunsets and sunrises gained greater cultural prominence, becoming more brilliant the longer the hunkering down of the world went on.

Stock markets melted and pension funds fell. Futures were rewritten in the gyrations of numbers. In financial news services, reporters filed copy every day, detailing unimaginable catastrophes that – somehow – grew worse the next day. Financial analysts created baroque drinking games, tied to downward gyrations of the market. Take a shot when the Dow loses a thousand points. Down whatever is in your hand when a circuit breaker gets tripped. If three circuit breakers get tripped worldwide within ten minutes of each other, everyone needs to drink two drinks.

Unemployment levels rose. SHOW ME THE MONEY, people wrote on signs asking for direct cash transfers. Everyone went “delinquent” in a financial sense, then – later – deviant in a psychological sense.

Unthinkable things happened: 0% interest rates. Negative interest rates that went from a worrying curiosity to a troubling reality in banks across the world. Stimuluses that fed into a system whose essential fuel was cooling, as so many people became so imprisoned inside homes, and apartments, and tents, and ships, and warehouses, and hospitals, and hospital ships.

Animals grew bolder. Nature encroached on quiet cities. Suddenly, raccoons in America and foxes in London had competition for garbage. Farmers got sick. Animals died. Animals broke out. Cows and horses and sheep became the majority occupiers of roads across the world. Great populations of city birds died off as tourist centers lost their coatings of food detritus.

The internet grew. Companies did what they could to accelerate the buildout of data centers, before their workers succumbed. Stockpiles of hard drives and chips and NICs and Ethernet and Infiniband cables began to run out. Supply chains broke down. Shuttered Chinese factories started spinning back up, but the economy was so intermingled that it came back to life fitfully and unreliably.

And yet there was so much beauty. People, trapped with eachother, learned to appreciate conversations. People discovered new things. Everyone reached out to everyone else. How are you doing? How is quarantine?

Everyone got funnier. Everyone wrote emails and text messages and blog posts. People recorded voice memos. Streamed music. Streamed weddings. Had sex via webcam. Danced via webcam. New generations of artists came up in the Pandemic and got their own artworld-nickname after it all blew over. Scientists became celebrities. Everyone figured out how to cook better. People did pressups. Prison workouts became everyone’s workout.

And so many promises and requests and plans for the future. Everyone dreamed of things they hadn’t thought of for years. Everyone got more creative.

Can we: go to the beach? Go cycling? Drink beer? Mudwrestle? Fight? Dance? Rave under a highway? Bring a generator to a beach and do a punk show? Skate through streets at dusk in a twenty-person crew? Build a treehouse? Travel? People asked every permutation of ‘can we’ and mostly their friends said ‘yes’ or, in places like California, ‘hell yes’. 

Everyone donated money for everyone else – old partners who lost jobs, family members, acquaintances, strangers, parents, and more.  People taught eachother new skills. How to raise money online. How to use what you’ve got to get some generosity from other people. How to use what you’ve got to help other people. How to sew up a wound so if you get injured you can attend to it at home instead of going to the hospitals (because the hospitals are full of danger). How to fix a bike. How to steal a bike if things get bad. How to fix a car. How to steal a car if things get really bad. And so on.

Like the virus itself, much of the kindness was invisible. But like the virus itself, the kindness multiplied over time, until the world was full of it – invisible to aliens, but felt in the heart and the eyes and the soul to all the people of the stricken-planet.

Things that inspired this story: You. You. You. And everyone we know and don’t know. Especially the ones we don’t know. Be well. Stay safe. Be loved. We will get through this. You and me and everyone we know.