Import AI

Import AI 199: Drone cinematographer; spotting toxic content with 4chan word embeddings; plus, a million text annotations help cars see

Get ready for the droneswarm cinematographer(s):
…But be prepared to wait awhile; we’re in the Wright Brothers era…
Today, people use drones to help film tricky things in a variety of cinematic settings. These drones are typically human-piloted, though there are the beginnings of some mobile drones that can autonomously follow people for sports purposes (e.g., Skydio). How might cinema change as people begin to use drones to film more and more complex shots? That’s the idea behind new research from the University of Seville, which outlines “a multi-UAV approach for autonomous cinematography planning, aimed at filming outdoor events such as cycling or boat races”.

The proposed system gives a human director software they can use to lay out specific shots – e.g., drones flying to certain locations, or following people across a landscape – then the software figures out how to coordinate multiple drones to pull off the shot. This is a complex problem, since drones typically have short battery lives and must keep clear of one another. The researchers use a graph-based approach that can find optimal solutions for single-drone scenarios and approximate solutions for multi-drone ones. “We focus on high-level planning. This means how to distribute filming tasks among the team members,” they write.
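
To make the high-level planning problem concrete, here is a toy sketch of distributing filming tasks among drones under battery budgets. It is a greedy illustration only, not the authors’ graph-based planner, and every shot name and battery figure below is invented.

```python
# Toy greedy assignment of filming shots to drones under battery budgets.
# Illustrative only; NOT the paper's graph-based planner. All values invented.

def assign_shots(shots, drones):
    """shots: list of (name, duration_s); drones: dict name -> battery_s left."""
    plan = {name: [] for name in drones}
    for shot, duration in sorted(shots, key=lambda s: -s[1]):  # longest shots first
        # Pick the drone with the most battery remaining that can still cover the shot.
        candidates = [d for d, b in drones.items() if b >= duration]
        if not candidates:
            plan.setdefault("unassigned", []).append(shot)
            continue
        best = max(candidates, key=lambda d: drones[d])
        drones[best] -= duration
        plan[best].append(shot)
    return plan

print(assign_shots(
    shots=[("follow_cyclist", 300), ("orbit_boat", 120), ("static_finish_line", 90)],
    drones={"uav_1": 400, "uav_2": 250},
))
```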

They run the drones through a couple of basic in-the-wild experiments, involving collectively filming a single object from multiple angles, as well as filming a cyclist and relaying the shot from one drone to the other. The latter experiment has an 8-second gap, as the drones need to create space for each other for safety reasons, which means there isn’t a perfect overlap during the filming handover.

Why this matters: This research is very early – as the video shows – but drones are a burgeoning consumer product, and this research is backed by an EU-wide project named ‘MULTIDRONE’, which is pouring money into increasing drone capabilities in this area.
  Read more: Autonomous Planning for Multiple Aerial Cinematographers (arXiv).
    Video: Multi-drone cinematographers are coming, but they’re a long way off (YouTube).

####################################################

Want to give your machines a sense of fashion? Try MMFashion:
…Free software includes pre-trained models for specific fashion-analysis tasks…
Researchers with the Chinese University of Hong Kong have released a new version of MMFashion, an open source toolbox for using AI to analyze images for clothing and other fashion-related attributes.

MMFashion v0.4: The software is implemented in PyTorch and ships with pre-trained models for specific fashion-related tasks. The latest version of the software has the following capabilities (an illustrative sketch of the first one follows the list):
– Fashion attribute prediction – predicts attributes of clothing, e.g., a print, t-shirt, etc.
– Fashion recognition and retrieval – determines if two images belong to the same clothing line.
– Fashion landmark detection – detects necklines, hemlines, cuffs, etc.
– Fashion parsing and segmentation – detects and segments clothing / fashion objects.
– Fashion compatibility and recommendation – recommends items.
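
To make ‘attribute prediction’ concrete, here is a minimal multi-label classifier sketch in PyTorch. It is purely illustrative and is not the MMFashion API: the attribute names, the tiny stand-in backbone, and the 0.5 threshold are all assumptions.

```python
# Illustrative multi-label attribute prediction; NOT the MMFashion API.
# Attribute names, backbone, and threshold are stand-ins.
import torch
import torch.nn as nn

ATTRIBUTES = ["floral", "striped", "long_sleeve", "denim"]  # hypothetical labels

class AttributePredictor(nn.Module):
    def __init__(self, num_attrs=len(ATTRIBUTES)):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a pretrained CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_attrs)

    def forward(self, images):
        return self.head(self.backbone(images))  # one logit per attribute

model = AttributePredictor()
probs = torch.sigmoid(model(torch.randn(1, 3, 224, 224)))[0]  # independent sigmoids
print([a for a, p in zip(ATTRIBUTES, probs) if p > 0.5])      # predicted attributes
```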

Model Zoo: You can see the list of models MMFashion currently ships with here, along with their performance on baseline tasks.

Why this matters: I think we’re on the verge of being able to build large-scale ‘culture detectors’ – systems that automatically analyze a given set of people for various traits, like the clothing they’re wearing, or their individual tastes (and how they change over time). Software like MMFashion feels like a very early step towards these systems, and I can imagine retailers increasingly using AI techniques to both understand what clothes people are wearing, as well as figure out how to recommend more visually similar clothes to them.
  Get the code here (mmfashion Github).
  Read more: MMFashion: An Open-Source Toolbox for Visual Fashion Analysis (arXiv).

####################################################

Spotting toxic content with 4chan and 8chan embeddings:
…Bottling up websites with word embeddings…
Word embeddings are kind of amazing – they’re a way to develop a semantic fingerprint of a corpus of text, letting you understand how different words relate to one another within it. So it might seem like a strange idea to use word embeddings to bottle up the offensive shitposting on the ‘/pol’ boards of 4chan and 8chan – message boards notorious for their unregulated, frequently offensive speech, and their association with acts of violent extremism (e.g., the Christchurch shooting). Yet that’s what a team of researchers from AI startup Textgain have done. The idea, they say, is that people can use the embeddings to help them build datasets of potentially offensive words, or to detect toxic content (e.g., via deployment in toxicity filters of some kind).

The dataset: To build the embedding model, the researchers gathered around 30 million posts from the /pol subforum on 4chan and 8chan, with 90% of the corpus coming from 4chan and 10% from 8chan. The underlying dataset is available on request, they write.
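
Because the embeddings ship in standard word2vec binary/raw formats, they could in principle be inspected with an off-the-shelf library such as gensim. Here is a minimal sketch of using nearest neighbours to grow a seed list of offensive terms; the file name and seed word are assumptions, while the gensim calls themselves are standard.

```python
# Sketch: querying released word2vec-format embeddings with gensim to expand
# a seed list of toxic terms. File name and seed word are assumptions.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("4chan_pol.bin", binary=True)

seed_terms = ["cuck"]  # hypothetical seed taken from the paper's examples
for term in seed_terms:
    if term in vectors:
        # Nearest neighbours in embedding space are candidate additions to a
        # lexicon of offensive vocabulary for downstream toxicity filters.
        for neighbour, score in vectors.most_similar(term, topn=10):
            print(f"{term} -> {neighbour} ({score:.2f})")
```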

Things that make you go ‘eugh’: The (short) research paper is worth a read for understanding how the thing works in practice. Though, be warned, the examples used include testing out toxicity detection with the n-word and ‘cuck’. However, it gives us a sense of how this technology can be put to work.
  Read more: 4chan & 8chan embeddings (arXiv).
  Get the embeddings in binary and raw format from here (textgain official website).

####################################################

Want to make your own weird robot texts? Try out this free ‘aitextgen’ software:
…Plus, finetune GPT-2 in your browser via a Google colab…
AI developer Max Woolf has spent months building free software to make it easy for people to mess around with generating text via GPT-2 language models. This week, he updated the open source software to make it faster and easier to set up. And best of all, he has released a Colab notebook that handles all the fiddly parts of training and finetuning simple GPT-2 text models: try it out now and brew up your own custom language model!
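
A minimal generation sketch using the library’s documented interface (the prompt and settings here are just examples; the project docs and Colab notebook cover finetuning on your own text):

```python
# Minimal text generation with aitextgen; prompt and settings are examples,
# see the project docs for the authoritative API and finetuning workflow.
from aitextgen import aitextgen

ai = aitextgen()                 # loads a small default GPT-2 model
ai.generate(n=3,                 # number of samples to print
            prompt="The robots decided",
            max_length=60)       # tokens per sample
```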

Why this matters: Easy tools encourage experimentation, and experimentation (sometimes) yields invention.  
  Get the code (aitextgen, GitHub)
  Want to train it in your browser? Use a Google colab here (Google colab).
  Read the docs here (aitextgen docs website).

####################################################

Want self-driving cars that can read signs? The RoadText-1K dataset might help:
…Bringing us (incrementally) closer to the era of robots that can see and read…
Self-driving cars need to be able to read; a new dataset from the International Institute of Information Technology in Hyderabad, India, and the Autonomous University of Barcelona, might teach them how.

RoadText-1K: The RoadText-1K dataset consists of 1000 videos that are around 10 seconds long each. Each video is from the BDD100K dataset, which is made up of video taken from the driver’s perspective of cars as they travel around the US. BDD is from the Berkeley Deep Drive project, which sees car companies and the eponymous university collaborate on open research for self-driving cars.
  Each frame in each video in RoadText-1K has been annotated with bounding boxes around the objects containing text, giving researchers a dataset full of numberplates, street signs, road signs, and more. In total, the dataset contains 1,280,613 instances of text across 300,000 frames.
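
The exact annotation schema isn’t spelled out here, but conceptually each frame maps to a set of text bounding boxes plus transcriptions. A hypothetical representation (the field names are invented, not the dataset’s actual format) might look like this:

```python
# Hypothetical per-frame text annotation structure; field names are invented
# for illustration and are not RoadText-1K's actual schema.
from dataclasses import dataclass

@dataclass
class TextBox:
    x: int               # top-left corner, pixels
    y: int
    width: int
    height: int
    transcription: str   # e.g. the text on a road sign or number plate

# frame index -> list of text boxes for one ~10 second video
video_annotations = {
    0: [TextBox(412, 130, 96, 40, "SPEED LIMIT 35")],
    1: [TextBox(415, 131, 95, 40, "SPEED LIMIT 35")],
}
print(sum(len(boxes) for boxes in video_annotations.values()), "text instances")
```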

Why this matters: Slowly and steadily, we’re making the world around us legible to computer vision. Much of this work is going on in private companies (e.g, imagine the size of the annotated text datasets that are in-house at places like Tesla and Waymo), but we’re also starting to see public datasets as well. Eventually, I expect we’ll develop robust self-driving car vision networks that can be fine-tuned for specific contexts or regions, and I think this will yield a rise in experimentation with odd forms of robotics.
  Read more: RoadText-1K: Text Detection & Recognition Dataset for Driving Videos (arXiv).
  Get the dataset here (official dataset website, IIIT Hyderabad).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Is there a Moore’s Law equivalent for AI algorithms?
In 2018, OpenAI research showed that the amount of compute used in state-of-the-art AI experiments had been increasing by more than a hundred thousand times over the prior five year period. Now they have looked at trends in algorithmic efficiency — the amount of compute required to achieve a given capability. They find that in the past 7 years the compute required to achieve AlexNet-level performance in image classification has decreased by a factor of 44x—a halving time of ~16 months. Improvements in other domains have been faster, over shorter timescales, though there are fewer data points — in Go, AlphaZero took 8x less compute to reach AlphaGo Zero–level, 12 months later; in translation, the Transformer took 61x less training compute to surpass seq2seq, 3 years later.
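
The halving-time figure follows directly from the 44x improvement over 7 years; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the AlexNet efficiency numbers quoted above.
import math

factor, years = 44, 7
annual_improvement = factor ** (1 / years)             # ~1.7x per year
halving_time_months = 12 * years / math.log2(factor)   # ~15.4 months, i.e. ~16
print(f"{annual_improvement:.2f}x per year, halving every "
      f"{halving_time_months:.1f} months")
```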

AI progress: A simple three-factor model of AI progress takes hardware (compute), software (algorithms), and data, as inputs. This research suggests the last few years of AI development has been characterised by substantial algorithmic progress, alongside the strong growth in compute usage. We don’t know how well this trend generalises across tasks, or how long it might continue. More research is needed on these questions, on trends in data efficiency, and on other aspects of algorithmic efficiency — e.g. training and inference efficiency.

Other trends: This can be combined with what we know about other trends to shed more light on recent progress — improvements in compute/$ have been ~20%pa in recent years, but since we can do 70% more with a given bundle of compute each year, the ‘real’ improvement has been ~100%pa. Similarly, if we adjust the compute used in state-of-the-art experiments, the ‘real’ growth has been even steeper than initially thought.

Why it matters: Better understanding and monitoring the drivers of AI progress should help us forecast how AI might develop. This is critical if we want to formulate policy aimed at ensuring advanced AI is beneficial to humanity. With this in mind, OpenAI will be publicly tracking algorithmic efficiency.
  Read more: AI and Efficiency (OpenAI)
  Read more: AI and Compute (OpenAI).

####################################################

Tech Tales:

Moonbase Alpha
Earth, 2028

He woke up on the floor of the workshop, then stood and walked over in the dark to the lightswitch, careful of the city scattered around on the floor. He picked up his phone from the charger on the outlet and checked his unread messages and missed calls, responding to none of them. Then he turned the light on, gazed at his city, and went to work. 

He used a mixture of physical materials and software-augments that he projected onto surfaces and rendered into 3D with holograms and lasers and other more obscure machines. Hours passed, but seemed like minutes to him, caught up in what to a child would seem a fantasy – to be in charge of an entire city – to construct it, plan it, and see it rise up in front of you. Alive because of your mind. 

Eventually, he sent a message: “We should try to talk again”.
“Yes”, she replied. 

-*-*-*-*-

He knew the city so well that when he closed his eyes he could imagine it, running his mind over its various shapes, edges, and protrusions. He could imagine it better than anything else in his life at this point. Thinking about it felt more natural than thinking about people.

-*-*-*-*-

How’s it going? she said.
What do you think? he said. It’s almost finished.
I think it’s beautiful and terrible, she said. And you know why.
I know, he said.
Enjoy your dinner, she said. Then she put down the tray and left the room.

He ate his dinner, while staring at the city on the moon. His city, at least, if he wanted it to be.

It was designed for 5000 people. It had underground caverns. Science domes. Refineries. Autonomous solar panel production plants. And tunnels – so many tunnels, snaking between great halls and narrowing en route to the launch pads, where all of humanity would blast off into the solar system and, perhaps, beyond.

Lunar 1, was its name. And “Lunar One,” he’d whisper, when he was working in the facility, late in the evening, alone.

Isn’t it enough to just build it? she said.
That’s not how it works, he said. You have to be there, or there’ll be someone else.
But won’t it be done? she said. You’ve designed it.
I’m more like a gardener, he said. It’ll grow out there and I’ll need to tend it.
But what about me?
You’ll get there too. And it will be so beautiful.
When?
He couldn’t say “five years”. Didn’t want that conversation. So he said nothing. And she left.

-*-*-*-*-

The night before he was due to take off he sat by the computer in his hotel room, refreshing his email and other message applications. Barely reading the sendoffs. Looking for something from her. And there was nothing.

That night he dreamed of a life spent on the moon. Watching his city grow over the course of five years, then staying there – in the dream, he did none of the therapies or gravity-physio. Just let himself get hollow and brittle. So he stayed up there. And in the dream the city grew beyond his imagination, coating the horizon, and he lived there alone until he died. 

And upon his death he woke up. It was 5am on launch day. The rockets would fire in 10 hours.

Things that inspired this story: Virtual reality; procedural city simulator programs; the merits and demerits of burnout; dedication.

Import AI 198: TSMC+USA = Chiplomacy; open source Deepfakes; and environmental justice via ML tools

Facebook wants an AI that can spot… offensive memes?
…The Hateful Memes Challenge is more serious than it sounds…
Facebook wants researchers to build AI systems that can spot harmful or hateful memes. This is a challenging problem: “Consider a sentence like “love the way you smell today” or “look how many people love you”. Unimodally, these sentences are harmless, but combine them with an equally harmless image of a skunk or a tumbleweed, and suddenly they become mean,” Facebook writes.

The Hateful Memes Challenge: Now, similar to its prior ‘Deepfake Detection Challenge’, Facebook wants help from the wider AI community in developing systems that can better identify hateful memes. To do this, it has partnered with Getty Images to generate a dataset of hateful memes that also shows sensitivity to those content-miners of the internet, meme creators.
  “One important issue with respect to dataset creation is having clarity around licensing of the underlying content. We’ve constructed our dataset specifically with this in mind. Instead of trying to release original memes with unknown creators, we use “in the wild” memes to manually reconstruct new memes by placing, without loss of meaning, meme text over a new underlying stock image. These new underlying images were obtained in partnership with Getty Images under a license negotiated to allow redistribution for research purposes,” they write.

The key figure: AI systems can get around 65% accuracy, while humans get around 85% accuracy – that’s a big gap to close.

Why this is hard from a research perspective: This is an inherently multimodal challenge – successful hateful-meme-spotting systems won’t be able to condition solely on the text or the image content of a given meme, but will have to analyze both together and jointly reason about them. It makes sense, then, that some of the baseline systems developed by Facebook use pre-training: typically, they train systems on large datasets, then finetune these models on the meme data. Progress on this competition might therefore encourage progress on multimodal work as a whole.
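
To make the multimodal point concrete, here is a generic late-fusion sketch: embed the text and the image separately, concatenate the embeddings, and classify them jointly. This is an illustration of the general idea, not one of Facebook’s baselines, and the embedding dimensions are arbitrary.

```python
# Generic late-fusion sketch for multimodal (image + text) classification;
# an illustration of the idea, not one of Facebook's baseline systems.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),           # logit: hateful vs not hateful
        )

    def forward(self, img_emb, txt_emb):
        # The model must reason over both modalities at once: a benign sentence
        # plus a benign image can still combine into a hateful meme.
        return self.fuse(torch.cat([img_emb, txt_emb], dim=-1))

model = LateFusionClassifier()
logit = model(torch.randn(1, 512), torch.randn(1, 256))
print(torch.sigmoid(logit))
```
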
Enter the competition, get the data: You can sign up for the competition and access the dataset here: Hateful Memes Challenge and Data Set (Facebook).
  Read more: The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes (arXiv).

####################################################

Care about publication norms in machine learning? Join an online discussion next week!
The Montreal AI Ethics Institute and the Partnership on AI have teamed up to host an online workshop about “publication norms for responsible AI”. This is part of a project by PAI to better understand how the ML community can publish research responsibly, while accounting for the impacts of AI technology to minimize downsides and maximize upsides.
  Sign up for the free discussion here (Eventbrite).

####################################################

Covid = Social Distancing = Robots++
One side-effect of COVID may be a push towards more types of automation. The CEO of robot shopping company Simbe Robotics says: “It creates an opportunity where there is actually more social distancing in the environment because the tasks are being performed by a robot and not a person,” according to Bloomberg. In other words – robots might be a cleaner way of cleaning. Expect more of this.
  Check out the quick video here (Bloomberg QuickTake, Twitter).

####################################################

Deepfake systems are well-documented, open source commodity tech now. What happens next?
…DeepFaceLab paper lays out how to build a Deepfake system…
Deepfakes, the slang term given to AI technologies that let you take someone’s face and superimpose it on someone else in an image or video, are a problem for the AI sector. That’s because deepfakes are made out of basic, multi-purpose AI systems that are themselves typically open source. And while some of the uses of deepfakes could be socially useful, like being able to create new forms of art, many of their immediate applications skew towards the malicious end of the spectrum, namely: pornography (particularly revenge porn) and vehicles for spreading political disinformation.
  So what do we do when Deepfakes are not only well documented in terms of code, but also integrated into consumer-friendly software systems? That’s the conundrum raised by DeepFaceLab, open source software on GitHub for the creation of deepfakes. In a new research paper, the lead author of DeepFaceLab (Ivan Petrov) and his collaborators (mostly freelancers) outline the system they’ve built and released as open source.

Publication norms and AI research: The paper doesn’t contain much detailed discussion of the inherent ethics of publishing or not publishing this technology. Their justification for this paper is, recursively, a quote of a prior justification from a 2019 paper about FSGAN: Subject Agnostic Face Swapping and Reenactment: “Suppressing the publication of such methods would not stop their development, but rather make them only available to a limited number of experts and potentially blindside policy makers if it goes without any limits”. Based on this quote, the DeepFaceLab authors say they “found we are responsible to publish DeepFaceLab to the academia community formally”.

Why this matters: We’re in the uncanny valley of AI research, these days: we can make systems that generate synthetic text, images, video, and more. The reigning norm in the research community tends towards fully open source code and research. I think it’s unclear if this is long-term the smartest approach to take if you’re keen to minimize downsides (see: today, deepfakes are mostly used for porn, which doesn’t seem like an especially useful use of societal resources, especially since it inherently damages the economic bargaining power of human pornstars). We live in interesting times…
  Read more: DeepFaceLab: A simple, flexible and extensible face swapping framework (arXiv).
  Check out the code for DeepFaceLab here (GitHub).

####################################################

Facebook makes an ultra-cheap voice generator:
…What samples 24,000 times a second and sounds like a human?…
In recent years, people have started using neural network-based techniques to synthesize voices for AI-based text-to-speech programs. This is the sort of technology that gives voice to Apple’s Siri, Amazon’s Alexa, and Google’s whatever-it-is. When generating these synthetic voices, there’s typically a tradeoff between efficiency (how fast you can generate the voice on your computer) and quality (how good it sounds). Facebook has developed some new approaches that give it a 160X speedup over its internal baseline, which means it can generate voices “in real time using regular CPUs – without any specialized hardware”.

With this technology, Facebook hopes to make “new voice applications that sound more human and expressive and are more enjoyable to use”. The tech has already been deployed inside Facebook’s ‘Portal’ videocalling system, as well as in applications like reading assistance and virtual reality.

What it takes to make a computer talk: Facebook’s system has four elements that, added together, create an expressive voice:
– A front-end that converts text into linguistic features
– A prosody model that predicts the rhythm and melody to create natural-sounding speech
– An acoustic model which generates the spectral representation of the speech
– A neural vocoder that generates a 24 kHz speech waveform, which is conditioned on prosody and spectral features

Going from an expensive to a cheap system: Facebook’s unoptimized text-to-speech system could generate one second of audio in 80 seconds – with optimizations, it can now generate a second of audio in 0.5 seconds, a 160x speedup. To do this they made a number of optimizations including model sparsification (basically reducing the number of parameters you need to activate during execution), as well as blockwise sparsification, multicore support, and other tricks.
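
One of the optimizations mentioned above, blockwise sparsification, can be sketched as zeroing out the lowest-magnitude blocks of a weight matrix so whole blocks can be skipped at inference time. This is a schematic illustration, not Facebook’s implementation; the block size and keep ratio are arbitrary.

```python
# Schematic blockwise magnitude pruning: zero the weakest blocks of a weight
# matrix so they can be skipped at inference. Illustrative only.
import numpy as np

def block_sparsify(weights, block=4, keep_ratio=0.25):
    rows, cols = weights.shape
    blocks = weights.reshape(rows // block, block, cols // block, block)
    norms = np.abs(blocks).sum(axis=(1, 3))            # one score per block
    threshold = np.quantile(norms, 1 - keep_ratio)     # keep the strongest blocks
    mask = (norms >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(16, 16)
sparse_w = block_sparsify(w)
print(f"{(sparse_w == 0).mean():.0%} of weights zeroed out")
```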

Why this matters: Facebook says its “long-term goal is to deliver high-quality, efficient voices to the billions of people in our community”. (Efficient voices – imagine that!). I think it’s likely within ~2 years we’ll see Facebook create a variety of different voice systems, including ones that people can tune themselves (imagine giving yourself a synthetic version of your own voice to automatically respond to certain queries – that’ll become technologically possible via finetuning, but whether anyone wants to do that is another question).
  Read more: A highly efficient, real-time text-to-speech system deployed on CPUs (Facebook AI blog).

####################################################

Recognizing industrial smoke emissions with AI as a route to environmental justice:
…Data for the people…
Picture this: it’s 2025 and you get a push notification on your phone that the local industrial plant is polluting again. You message your friends and head to the site, knowing that the pollution event has already been automatically logged, analyzed, and reported to the authorities.
  How do we get to that world? New research from Carnegie Mellon University and Pennsylvania State University shows how: they build a dataset of industrial smoke emissions by using cameras to monitor three petroleum coke plants over several months. They use the resulting data – 12,567 distinct video clips, representing 452,412 frames – to train a deep learning-based image identifier to spot signs of pollution. This system gets about 80% accuracy today (which isn’t good enough for real world use), but I expect future systems based on subsequently developed techniques will improve performance further.
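
Schematically, a frame-level classifier’s outputs then have to be turned into a clip-level smoke/no-smoke call; one simple way to do that is shown below. The aggregation rule and threshold are placeholders, not the paper’s actual method.

```python
# Minimal sketch of turning per-frame smoke probabilities into a clip-level
# prediction. The threshold and averaging rule are placeholders, not the
# paper's actual method.
import numpy as np

def classify_clip(frame_probs, threshold=0.5):
    """frame_probs: per-frame probability of visible smoke, from some image
    classifier. Average over the clip and threshold the result."""
    return float(np.mean(frame_probs)) >= threshold

clip = [0.1, 0.2, 0.8, 0.9, 0.85]   # e.g. smoke appears midway through the clip
print("smoke detected" if classify_clip(clip) else "no smoke")
```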

Why this matters: To conduct this research, the team “collaborated with air quality grassroots communities in installing the cameras, which capture an image approximately every 10 seconds”. They also worked with local volunteers as well as workers on Amazon Mechanical Turk to label their data. These activities point towards a world where we can imagine AI practitioners teaming up with local people to build specific systems to deal with local needs, like spotting a serial polluter. I think ‘Environmental Justice via Deep Learning’ is an interesting tagline to aim for.
  Get the data and code here (GitHub).
  Read more: RISE Video Dataset: Recognizing Industrial Smoke Emissions (arXiv).

####################################################

Wondering how to write about the broader impacts of your research? The Future of Humanity Institute has put together a guide:
…Academic advice should help researchers write ‘broader impacts’ for NeurIPS submissions…
AI is influencing the world in increasingly varied ways, ranging from systems that alter the economics of certain types of computation, to tools that may exhibit biases, to software packages that enable things with malicious use potential (e.g., deepfake software). This year, major AI conference NeurIPS has introduced a requirement that paper submissions include a section about the broader impacts of the research. Researchers from industry and academia have written a guide to help researchers write these statements.

How to talk about the broader impacts of AI:
– Discuss the benefits and risks of research
– Highlight uncertainties
– Focus on tractable, neglected, and significant impacts
– Integrate with the introduction of the paper
– Think about impacts even for theoretical work
– Figure out where your research sits in the ‘stack’ (e.g., researcher-facing, or user-facing).

Why this matters: If we want the world to develop AI responsibly, then encouraging researchers to think about their inherent moral and ethical agency with regard to their research seems like a good start. One critique I hear of things like mandating ‘broader impacts’ statements is it can lead to fuzzy or mushy reasoning (compared to the more rigorous technical sections), and/or can lead to researchers making assumptions about fields in which they don’t do much work (e.g., social science). Both of these are valid criticisms. I think my response to them is that one of the best ways to create more rigorous thinking here is to get a larger proportion of the research community oriented around thinking about impacts, which is what things like the NeurIPS requirement do. There’ll be some very interesting meta-analysis papers to write about how different authors approach these sections.
  Read more: A Guide to Writing the NeurIPS Impact Statement (Centre for the Governance of AI, Medium).

####################################################

Chiplomacy++: US and TSMC agree to build US-based chip plant:
…Made in the USA: Gibson Guitars, Crayola Crayons, and… TSMC semiconductors?…
TSMC, the world’s largest contract chip manufacturer (customers include: Apple, Huawei, others), will build a semiconductor manufacturing facility in the USA. This announcement marks a significant moment in the reshoring of semiconductor manufacturing in America. The US government looms in the background of the deal, given mounting worries about the national security risks of technological supply chains.

Chiplomacy++: This deal is also an inherent example of Chiplomacy, the phenomenon where politics drives decisions about the production and consumption of computational capacity.

Recent examples of Chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applying 25% import tariffs to (some) Chinese semiconductors.

Key details:
– Process node: 5-nanometer. (TSMC began producing small runs of 5nm chips in 2019, so the US facility might be a bit behind industry cutting-edge when it comes online).
– Cost: $12 billion.
– Projected construction completion year: 2024
– Capacity: ~20,000 wafers a month versus hundreds of thousands at the main TSMC facilities overseas.

Why this matters: Many historians think that one of the key resources of the 20th century was oil – how companies used it, controlled it, and invested in systems to extract it, influenced much of the century. Could compute be an equivalently important input for countries in the 21st century? Deals like the US-TSMC one indicate so…
  Read more: Washington in talks with chipmakers about building U.S factories (Reuters).
  Read more: TSMC Plans $12 Billion U.S. Chip Plant in Victory for Trump (Bloomberg).
  Past Import AIs: #181: Welcome to the era of Chiplomacy!; how computer vision AI techniques can improve robotics research; plus, Baidu’s adversarial AI software (Import AI).

####################################################

Tech Tales:

2028
Getting to know your Daggit (V4)

Sometimes you’ll find Daggit watching you. That’s okay! Daggit is trying to learn about what you like to do, so Daggit can be more helpful to you.

If you want Daggit to pay attention to something, say ‘Daggit, look over there’, or point. Try pointing and saying something else, like ‘Daggit, what is that?’ – you’ll be surprised at what Daggit can do.

Daggit is always learning – and so are we. We use anonymized data from all of our customers to make Daggit smarter, so don’t be surprised if you wake up and Daggit has a new skill. 

You can make your home more secure with Daggit – try asking Daggit to ‘patrol’ when you go to bed, and Daggit will monitor your house for you. (If Daggit spots intruders when in ‘patrol’ mode, it will automatically call the authorities.)

Daggit can’t get angry, but Daggit can get sad. If you’re not kind to your Daggit, don’t expect it to be happy when it sees you.

Things that inspired this story: Boston Dynamics’ Spot robot; imitation learning; continued progress in reinforcement learning and generalization; federated learning; customer service manuals and websites; on-device learning.

Import AI 197: Facebook trains cyberpunk AI; Chinese companies unite behind ‘AIBench’ evaluation system; how Cloudflare uses AI

Want to analyze real-world AI performance? Use AIBench instead of MLPerf, say AIBench’s developers:
…Chinese universities and 17 companies come together to develop AI measurement approach…
A consortium of Chinese universities along with seventeen companies – including Alibaba, Tencent, Baidu, and ByteDance – have developed AIBench, an AI benchmarking suite meant to compete with MLPerf, an AI benchmarking suite predominantly developed by American universities and companies. AIBench is interesting because it proposes ways to do fine-grained analysis of a given AI application, which could help developers make their software more efficient. It’s also interesting because of the sheer number of major Chinese companies involved, and in its explicit positioning as an alternative to MLPerf.

End-to-end application benchmarks: AIBench is meant to test tasks in an end-to-end way, covering both the AI and non-AI components. Some of the tasks it tests against include: recommendation tasks, 3D face recognition, face embedding (turning faces into features), video prediction, image compression, speech recognition, and more. This means AIBench can measure real-world metrics such as end-to-end latency – the time it takes to execute not just the AI part of a task, but also the surrounding infrastructure software and services.

Fine-grained analysis: AIBench will help researchers figure out what proportion of time their systems spend doing different things while executing a program, helping them figure out, for instance, how much time is spent doing data arrangement for a task versus running convolution operations, or batch normalization, and so on.
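
The end-to-end framing can be illustrated with a simple timing harness that attributes latency to each stage of a request rather than to the model alone. The stage functions and sleep times below are placeholders, not AIBench code.

```python
# Illustrative harness splitting end-to-end latency into stages; the stage
# functions and timings are placeholders, not AIBench code.
import time

def timed(fn, x):
    start = time.perf_counter()
    out = fn(x)
    return out, (time.perf_counter() - start) * 1000   # milliseconds

def preprocess(x): time.sleep(0.002); return x          # e.g. data arrangement
def run_model(x): time.sleep(0.010); return x           # e.g. convolutions
def postprocess(x): time.sleep(0.001); return x         # e.g. ranking results

x, stage_ms = "request", {}
for name, fn in [("preprocess", preprocess), ("model", run_model),
                 ("postprocess", postprocess)]:
    x, stage_ms[name] = timed(fn, x)

total = sum(stage_ms.values())
for name, ms in stage_ms.items():
    print(f"{name}: {ms:.1f} ms ({ms / total:.0%} of end-to-end latency)")
```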

The politics of measurement: It’s no coincidence that most of AIBench’s backers are Chinese and most of MLPerf’s backers are American – measurement is intimately tied to the development of standards, and standards are one of the (extraordinarily dull) venues where US and Chinese entities are currently jockeying for influence with one another. Systems like AIBench will generate valuable data about the performance of contemporary AI applications, while also supporting various second-order political goals. Watch this space!
  Read more: AIBench: A Datacenter AI Benchmark Suite, BenchCouncil (official AIBench website).
  Read more: AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite (arXiv).

####################################################

US government wants AI systems that can understand an entire movie:
…NIST’s painstakingly annotated HLVU dataset sets new AI video analysis challenges…
Researchers with NIST, a US government agency dedicated to assessing and measuring different aspects of technology, want to build a video understanding dataset that tests out AI inference capabilities on feature-length movies.

Why video modelling is hard (and what makes HLVU different): Video modelling is an extremely challenging problem for AI systems – it takes all the hard parts of image recognition, then makes them harder by adding a temporal element which requires you to isolate objects in scenes then track them from frame to frame while pixels change. So far, much of the work on video modeling has come along in the form of narrow tasks, like being able to accurately recognize different types of movements in the ‘ActivityNet’ dataset, or characterize individual actions in things like DeepMind’s ‘Kinetics’ stuff.

What HLVU is: The High-Level Video Understanding (HLVU) dataset is meant to help researchers develop algorithms that can understand entire movies. Specifically, today HLVU consists of 11 hours of heavily annotated footage across a multitude of open source movies, collected from Vimeo and Archive.org. NIST is currently paying volunteers to annotate the movies using a graphing tool called yEd to help create knowledge graphs about the movies – e.g., describing how characters are related to each other. This means competition participants might be shown a couple of images of a couple of characters, then allowed to have their algorithms ‘watch’ the movie, after which they’d be expected to discuss the relationship of the two characters. This is a challenging, open-ended task.
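
The knowledge graphs the annotators build are essentially labelled relationship graphs between characters; a toy version (with invented characters and relations) might look like this:

```python
# Toy character-relationship knowledge graph of the kind HLVU annotators
# build with yEd; the characters and relations here are invented.
import networkx as nx

g = nx.Graph()
g.add_edge("Anna", "Marek", relation="siblings")
g.add_edge("Anna", "The Stranger", relation="distrusts")
g.add_edge("Marek", "The Stranger", relation="works_for")

# A system that has 'watched' the movie would be quizzed about edges like this:
print(g.edges["Anna", "Marek"]["relation"])   # -> siblings
```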

Why this matters: HLVU is a ‘moonshot problem’, in the sense that it seems amazingly hard for today’s existing systems to solve it out of the box, and building systems that can understand full-length movies will likely require systems that are able to cope with larger contexts during training, and which may come with some augmented symbol-manipulation machinery to help them figure out relationships between representations (although graph neural network approaches might work here, also). Progress on HLVU will provide us with valuable signals about the relative maturity of different bits of AI technology.
  Read more: HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do (arXiv).

####################################################

CyberpunkAI – Facebook turns NetHack into an OpenAI Gym environment:
…AI like it’s 1987!…
When you think of recent highlights in reinforcement learning research you’re likely to contemplate things like StarCraft and Dota-playing bots, or robots learning to manipulate objects. You’re less likely to think of games with ASCII graphics from decades ago. Yet a team of Facebook-led researchers think NetHack, a famous roguelike game first launched in 1987, is a good candidate for contemporary AI research, and have released the Nethack Learning Environment to encourage researchers to pit AI agents against the ancient game.

Why NetHack: 

  • Cheap: The ASCII-based game has a tiny computational footprint, which means many different researchers will be able to conduct research on it. 
  • Complex: NetHack worlds are procedurally generated, so an AI agent can’t memorize the level. Additionally, NetHack contains hundreds of monsters and items, introducing further challenges. 
  • Simple: The researchers have implemented it as an OpenAI Gym environment, so you can run it within a simple, pre-existing software stack (see the minimal loop sketched after this list). 
  • Fast: A standard CNN-based agent can iterate through NetHack environments at 5000 steps per second, letting them gather a lot of experience in a relatively short amount of (human) time.
  • Reassuringly challenging: The researchers train an IMPALA-style model to solve some basic NetHack tasks relating to actions and navigation and find that it struggles on a couple of them, suggesting the environment will pose a challenge and demand the creation of new algorithms with new ideas. 
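
A minimal random-agent loop, assuming the NLE package registers a Gym environment id along the lines of "NetHackScore-v0" (check the repository for the exact names):

```python
# Minimal random-agent loop against the NetHack Learning Environment.
# Assumes the package registers a "NetHackScore-v0" Gym id; check the repo
# for the exact environment names.
import gym
import nle  # noqa: F401  (importing registers the NetHack environments)

env = gym.make("NetHackScore-v0")
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_return += reward
print("episode return:", episode_return)
```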

Things that make you go ‘hmmm’: One tantalizing idea here is that people may need to use RL+Text-understanding techniques to ‘solve’ NetHack: “Almost all human players learn to master the game by consulting the NetHack Wiki, or other so-called spoilers, making NLE a testbed for advancing language-assisted RL,” writes Facebook AI researcher Tim Rocktäschel.

Why this matters: If NetHack becomes established, then researchers will be able to use a low-cost, fast platform to rapidly prototype complex reinforcement learning research ideas – something that today is mostly done through the use of expensive (aka, costly-to-run) game engines, or complex robotics simulations. Plus, it’d be nice to watch Twitch streams of trained agents exploring the ancient game.
  Get the code from Facebook’s GitHub here.
  Read more: The NetHack Learning Environment (PDF).

####################################################

How AI lets Cloudflare block internet bots:
…Is it a bot? Check “The Score” to see our guess…
The internet is a dangerous place. We all know this. But Cloudflare, a startup that sells various network services, has a sense of exactly how dangerous it is. “Overall globally, more than [a] third of the Internet traffic visible to Cloudflare is coming from bad bots,” the company writes in a blogpost discussing how it uses machine learning and other techniques to defend its millions of customers from the ‘bad bots’ of the internet. These bad bots are things like spambots, botnets, unauthorized webscrapers, and so on.

Five approaches to rule them all: Cloudflare uses five interlocking systems to help it deal with bots:
– Machine Learning: This system covers about 82.83% of global use-cases on Cloudflare. It uses the (very simple and reliable) technique of gradient boosting on decision trees and has been in production with Cloudflare customers since 2018. Cloudflare says it trains and validates its models using “trillions of requests”, which gives a sense of the scale of the (simple) system. (A toy illustration of this approach follows the list.)
– Heuristics Engine: This handles about 14.95% of use-cases for Cloudflare: “Not all problems in the world are the best solved with machine learning,” they write. Enter the heuristics engine, which is a set of “hundreds of specific rules based on certain attributes of the request”. This system is useful because it’s fast – Cloudflare suggests model inference takes less than 50 microseconds per model, whereas “hundreds of heuristics can be applied just under 20 microseconds”. Additionally, the engine serves as a source of input data for the ML models, which helps Cloudflare “generalize behavior learnt from the heuristics and improve detections accuracy”.
– Behavioural Analysis: This system uses an unsupervised machine learning approach to “detect bots and anomalies from the normal behavior on specific customer’s website”. Cloudflare doesn’t give other details beyond this.
– Verified Bots: This system figures out which bots are good and which are malicious via things like DNS analysis, bot-type identification, and so on. It also uses a ‘machine learning validator’ which “uses an unsupervised learning algorithm, clustering good bot IPs which are not possible to validate through other means”.
– JS Fingerprinting: This is a ~mysterious system where Cloudflare uses client-side systems to figure out weird things. They don’t give many details in the blogpost, but a key quote is: “detection mechanism is implemented as a challenge-response system with challenge injected into the webpage on Cloudflare’s edge. The challenge is then rendered in the background using provided graphic instructions and the result sent back to Cloudflare for validation and further action such as producing the score”.
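
As a toy illustration of the first approach: gradient boosting on decision trees is a standard, cheap classifier available in scikit-learn. The request features and data below are invented placeholders, not Cloudflare’s actual features.

```python
# Toy bot classifier using gradient boosting on decision trees; the feature
# names and data are invented placeholders, not Cloudflare's.
from sklearn.ensemble import GradientBoostingClassifier

# features per request source: [requests_per_minute, distinct_paths_hit, sends_cookies]
X = [[300, 250, 0], [4, 3, 1], [900, 700, 0], [7, 5, 1], [450, 10, 0], [2, 2, 1]]
y = [1, 0, 1, 0, 1, 0]   # 1 = bad bot, 0 = human

clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict_proba([[600, 400, 0]])[0][1])   # P(bad bot) for a new source
```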

Watched over by machines: The net effect of this kind of technology use is that Cloudflare uses its own size to derive ever-richer machine learning models of the environment it operates in, giving it a kind of sixth sense for things that feel fishy. I find it interesting that we can use computers to generate signals that look in the abstract like a form of ‘intuition’.
  Read more: Cloudflare Bot Management: machine learning and more (Cloudflare blog).


####################################################

Recursion for the common good: Using machine learning to analyze the results of machine learning papers:
…Using AI to analyze AI progress…
In recent years, there’s been a proliferation of machine learning research papers, as part of the broader resurgence of AI. This has introduced a challenge: how can we scalably analyze the results of these papers and understand a meta-sense of progress in the field at large? (This newsletter is itself an exercise in this!). One way is to use AI techniques to automatically hoover up interesting insights from research papers and put them in one place. New research from Facebook, n-waves, UCL, and DeepMind, outlines a way to use machine learning to automatically pull data out from research papers – a task that sounds easy, but is in fact quite difficult.

AxCell: They build a system that uses an ULMFiT architecture-based classifier to read the contents of papers and identify tables of numeric results, then hand those off to another classifier that works out if a cell in a table contains a dataset, metric, paper model, cited model, or ‘other’ stuff. Once they’ve got this data, they figure out how to tag specific results in the table with their appropriate identifiers (e.g., a given score on a certain dataset). Once they’ve done this, they try to link these results to leaderboards, which keep track of which techniques are doing well and which are doing poorly in different areas. (A schematic of this last linking step follows below.)
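
The final linking step can be thought of as folding extracted (dataset, metric, model, value) tuples into per-leaderboard tables; schematically (the tuples below are invented):

```python
# Schematic of the final linking step: fold extracted (dataset, metric,
# model, value) tuples into leaderboards. The tuples here are invented.
from collections import defaultdict

extracted = [
    ("ImageNet", "Top-1 accuracy", "Model A", 76.5),
    ("ImageNet", "Top-1 accuracy", "Model B", 78.1),
    ("SQuAD",    "F1",             "Model C", 91.2),
]

leaderboards = defaultdict(list)
for dataset, metric, model, value in extracted:
    leaderboards[(dataset, metric)].append((value, model))

for (dataset, metric), rows in leaderboards.items():
    best_value, best_model = max(rows)
    print(f"{dataset} / {metric}: best = {best_model} ({best_value})")
```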

Does AxCell actually work? Check out PapersWithCode: AxCell is deployed as part of ‘Papers with Code‘, a useful website that keeps track of quantitative metrics mined from technical papers.

Code release: The researchers are releasing datasets of papers from arXiv, as well as proprietary Papers with Code leaderboards. They’re also releasing a pre-trained axcell model, as well as an ULMFiT model pretrained on the arXivPapers dataset.

Why this matters: If we can build AI tools to help us navigate AI science as it is published, then we’ll be able to better identify areas where progress is rapid and areas where it is more restrained, which could help researchers identify areas for high impact experimentation.
  Get all the code from the axcells repo (Papers with Code, GitHub).
  Read more: A Home For Results in ML (Medium).
  Read more: AxCell: Automatic Extraction of Results from Machine Learning Papers (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Technological discontinuities:
A key question in AI forecasting is the likelihood of discontinuously fast progress in AI capabilities — i.e. progress that comes much quicker than the historic trend. If we can’t rule out very rapid AI progress, this makes it valuable to ‘hedge’ against this possibility by front-loading efforts to address problems that might arise from very powerful AI systems.

History: Looking at the history of technology can help shed light on this possibility.
AI Impacts, an AI research organization, has identified ten examples of ‘large’ discontinuities — instances where more than 100 years of progress (on historic trends) have come at once. I’ll highlight two particularly interesting examples:

  • Superconductor temperature: In 1986 the warmest temperature of superconduction was 30 K, having steadily risen by ~0.4 K per year since 1911. In 1987, YBa2Cu3O7 was found to be able to superconduct at over 90 K (~140 years of progress). Since 1987, the record has been increasing by ~5 K per year.
  • Nuclear weapons: The effectiveness of explosives (per unit mass) is measured by the amount of TNT required to get the same explosive power. In the thousand years prior to 1945, the best explosives had gone from ~0.5x to 2x. The first nuclear weapons had a relative effectiveness of 4500x. And 15 years later, the US built a nuclear bomb that was 1,000x more efficient than the first nuclear bomb.


In both instances, the discontinuity was driven by a radical technological breakthrough (nuclear fission, ceramic superconduction), and prompted a shift into a higher growth mode. 


Matthew’s view: The existence of clear examples of technological discontinuities makes it hard to rule out the possibility of discontinuous progress in AI. Better understanding the drivers of discontinuities, and whether they were foreseeable, seems like a particularly fruitful area for further research.

   Read more: Discontinuous progress in history – an update (AI impacts)

What do 50 people think about AI Governance in 2019?

The Shanghai Institute for Science of Science has collected short essays from 50 AI experts (Jack – including me and some OpenAI colleagues!) on the state of AI governance in 2019. The contributions from Chinese experts are particularly interesting for better understanding how the field is progressing globally.
  Read more: AI Governance in 2019.

####################################################

Tech Tales:

Political Visions
2030

The forecasting system worked well, at first. The politicians would plug in some of their goals – a more equitable society, an improved approach to environmental stewardship, and so on. Then the machine would produce recommendations for the sorts of political campaigns they should run and how, once they were in power, they could act to bring about their goals. The machine was usually right.

Every political party ended up using the machine. And the politicians found that when they won using the machine, they had a greater ability to act than if they ran on their own human intuition alone. Something about the machine meant it created political campaigns that more people believed in, and because more people believed in them, more stuff got done once they were in power.

So, once they were in power, the politicians started allocating more funding to conducting scientific research to expand the capabilities of the machine. If it could help them get elected and help them achieve their goals, then perhaps it could help them govern as well, they thought. They were mostly right – by increasing funding for research into the machine, they made it more capable. And as it became more capable, they spent more and more time consulting with the machine on what to do next.

Of course, the machine never ran for office on its own. But it started appearing in some adverts.
“Together, we are strong,” read one poster that included a picture of a politician and a picture of the machine.

The world did change, of course. And the machine did not change with it. Some parts of society were, for whatever reason, difficult for the machine to understand, so it stopped trying to win them over during election campaigns. The politicians worried about this at first, but then they saw that elections carried on as normal, and they continued to be able to accomplish much using the machine.

Some of them did wonder what might happen once more of society was something the machine couldn’t understand. How much of the world did the machine need to be able to model to serve the needs of politicians? Certainly not all of the world. So then, how much? Half? A quarter? Two thirds?

The only way to find out was to keep commingling politics with the machine and to find, eventually, where its capabilities ended and the needs of the uncounted in society began. For some politicians, they worried that such an end might not exist – that the machine might have just created a political dynamic where it only needed to convince an ever smaller slice of the population, and it had arranged things so that the world would not break while transitioning into this reality.

“Together, we are strong”, was both a campaign slogan, and a future focus of historical study, as the people that came after sought to understand the mania that made so many societies bet on the machine. Strong at what? The future historians asked. Strong for what?

Things that inspired this story: The application of sentiment analysis tools to an entire culture; our own temptation to do things without contemplating the larger purpose; public relations.

Import AI 196: Baidu wins city surveillance challenge; COVID surveillance drones; and a dataset for building TLDR engines

The AI City Challenge shows us what 21st century Information Empires look like:
…Baidu wins three out of four city-surveillance challenges…
City-level surveillance is getting really good. That’s the takeaway from a paper going over the results of the 4th AI City Challenge, a workshop held at the CVPR conference this year. More than 300 teams entered the challenge and it strikes me as interesting that one company – Baidu – won three out of the four competition challenge tracks.

What was the AI City Challenge testing? The AI City Challenge is designed to test out AI capabilities in four areas relating to city-scale video analysis problems. The challenge had four tracks, which covered:
– Multi-class, multi-movement vehicle counting (Winner: Baidu).
– Vehicle re-identification with real and synthetic training data (Winner: Baidu, in collaboration with University of Technology, Sydney).
– City-scale multi-target multi-camera vehicle tracking (Winner: CMU).
– Traffic anomaly detection (Winner: Baidu, in collaboration with Sun Yat-sen University).

What does this mean? In the 21st century, we’ll view nations in terms of their information capacity, whereas in the 20th century we viewed them in terms of their resource capacity. A state’s information capacity will basically be its ability to analyze itself and make rapid changes, and states which use tons of AI will be better at this. Think of this lens as nation-scale OODA loop analysis. Something which I think most people in the West are failing to notice is that the tight collaboration between tech companies and governments among Asian nations (China is obviously a big player here, as these Baidu results indicate, but so are countries like Singapore, Taiwan, etc) means that some countries are already showing us what information empires look like. Expect to see Baidu roll out more and more of these AI analysis capabilities in areas that the Chinese government operates (including abroad via One Belt One Road agreements). I think in a decade we’ll look back at this period with interest at the obvious rise of companies and nations in this area, and we’ll puzzle over why certain governments took relatively little notice.
  Read more: The 4th AI City Challenge (arXiv).

####################################################

YOLOv4 gives everyone better open source object detection:
…Plus, why we can’t stop the march of progress here, and what that means…
The fourth version of YOLO – YOLOv4 – is here, which means people can now access an even more efficient, higher-accuracy object detection system. YOLOv4 was developed by Russian researcher Alexey Bochkovskiy, along with two researchers from the Institute of Information Science in Taiwan. YOLOv4 is around 10% more accurate than YOLOv3, and about 12% better in terms of the frames-per-second it can run at. In other words: object recognition just got cheaper, easier, and better.

Specific tricks versus general techniques: The YOLOv4 paper is worth a read because it gives us a sense of just how many domain-specific improvements have been packed into the system. This isn’t one of those research papers where researchers dramatically simplify things – instead, this is a research paper about a widely-used real world system, which means most of the paper is about the specific tweaks the creators apply to further increase performance – data augmentation, hyperparameter selection, normalization tweaks, and so on.

Can we choose _not_ to build things? YOLO has an interesting lineage – its original creator Joseph Redmon wrote upon the release of YOLOv3 in mid-2018 (Import AI: 88) that they expected the system to be used widely by advertising companies and the military; an unusually blunt assessment by a researcher of what their work was contributing to. This year, they said: “I stopped doing CV research because I saw the impact my work was having. I loved the work but the military applications and privacy concerns eventually became impossible to ignore”. When someone asked Redmon for their thoughts on YOLOv4 they said “doesn’t matter what I think!”. The existence of YOLOv4 highlights the inherent inevitability of certain kinds of technical progress, and raises interesting questions about how much impact individual researchers can have on the overall trajectory of a field.
  Read the paper: YOLOv4: Optimal Speed and Accuracy of Object Detection (arXiv).
  Get the code for YOLOv4 here (GitHub).

####################################################

AllenAI try to build a scientific summarization engine – and the research has quite far to go:
…Try out the summarization demo and see how well the system works in practice…
Researchers with the Allen Institute for Artificial Intelligence and the University of Washington have built TLDR, a new dataset and challenge for exploring how well contemporary AI techniques can summarize scientific research papers. Summarization is a challenging task, and for this work the researchers attempt extreme summarization – the goal is to build systems that can produce very ‘TLDR’-style short summaries (between 15 and 30 tokens in length) of scientific papers. Spoiler alert: this is a hard task and a prototype system developed by Allen AI doesn’t do very well on it… yet.

What they’ve released: As part of this research, they’ve released SciTLDR, a dataset of around 4,000 TLDRs written about AI research papers hosted on the ‘OpenReview’ publishing platform. SciTLDR includes at least two high-quality TLDRs for each paper.

How well does it work? I ran a paper from arXiv through the online SciTLDR demo. Specifically, I fed in the abstract, introduction, and conclusion of this paper: Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics. Here’s what I got back after plugging in the abstract, introduction, and conclusion:  “Artificial Intelligence Bias for diabetic retinopathy diagnostics using deep generative models .” This is not useful!
  But maybe I got unlucky here. So let’s try a couple more, using the same method of abstract, introduction, and conclusion (a generic sketch of this workflow follows the examples):
– Input paper: A Review of Winograd Schema Challenge Datasets and Approaches.
– Output: “The Winograd Schema Challenge: A Survey and Benchmark Dataset Review”. This isn’t particularly useful.
– Input paper: AIBench: An Industry Standard AI Benchmark Suite from Internet Services.
– Output: “AIBench: A balanced AI benchmarking methodology for meeting the subtly different requirements of different stages in developing a new system/architecture and”. This is probably the best of the bunch – it gives me a better sense of the paper’s contents and what it contains.
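
The workflow itself is easy to sketch with a generic off-the-shelf summarizer; the snippet below uses a default model from the transformers library, which is not the SciTLDR model, just the same “concatenate sections, then summarize” shape described above.

```python
# Sketch of the same workflow with a generic off-the-shelf summarizer; this
# is NOT the SciTLDR model, just the "concatenate sections, then summarize"
# shape described above.
from transformers import pipeline

summarizer = pipeline("summarization")   # downloads a default English model

abstract = "..."       # paste the paper's abstract here
introduction = "..."   # and its introduction
conclusion = "..."     # and its conclusion

document = " ".join([abstract, introduction, conclusion])
print(summarizer(document, max_length=30, min_length=15)[0]["summary_text"])
```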

Why this matters: While this research is at a preliminary and barely useable stage, it won’t stay that way for long – within a couple of years, I expect we’ll have decent summarization engines in a variety of different scientific domains, which will make it easier for us to understand the changing contours of science. More broadly, I think summarization is a challenging cognitive task, so progress here will lead to more general progress in AI writ large.
  Read more: TLDR: Extreme Summarization of Scientific Documents (arXiv).
  Get the SciTLDR dataset here (AllenAI, GitHub)
  Play around with a demo of the paper here (SciTLDR).

####################################################

Mapillary releases 1.6 million Street View-style photos:
…(Almost) open source Google Street View…
Mapping company Mapillary has released more than 1.6 million images of streets from 30 major cities across six continents. Researchers can request free access to the Mapillary Street-level Sequences Dataset, but if you want to use it in a product you’ll need to pay.

Why this is useful: Street-level datasets are useful for building systems that can do real-world image recognition and segmentation, so this dataset can aid with that sort of research. It also highlights the extent to which technology companies are digitizing the world – I remember when Google Street View came out and it seemed like a weird sci-fi future had arrived earlier than scheduled. Now, that same sort of data is available for free from other companies like Mapillary. I predict we'll have a generally available open source version of this data in < 5 years (rather than one where you need to request research access).
  Read more about the dataset here (Mapillary website).

####################################################

Oh good, the COVID surveillance drones have arrived:
…Skylark Labs uses AI + Drones to do COVID surveillance in India…
AI startup Skylark Labs is using AI-enabled drones to conduct COVID-related surveillance work in Punjab, India. The startup uses AI to automatically identify people not observing social distancing. You can check out a video of the drones in action here.

How research turns into products: Skylark Labs has an interesting history – the startup's founder and CEO, Dr. Amarjot Singh, has previously conducted research on:
– Facial recognition systems that can identify people, even if they’re wearing masks (Import AI: 58, 2017).
– Drone-based surveillance systems that can identify violent behaviour in crowds (Import AI: 98, 2018).
It's interesting to me that this research has led directly to a startup carrying out somewhat sophisticated AI surveillance. I think this highlights the increasing applicability of AI research to real-world problems, and also shows how, though some research may make us uncomfortable (e.g., many people commented on the disguised facial recognition system when it came out, expressing worries about what it means for freedom of speech), it still finds eager customers in the world.
  Watch a video of the system here (Skylark Labs, Twitter).
  Read more about Skylark at the company’s official website.

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Automated nuclear weapons — what could go wrong?
The DoD’s proposed 2021 budget includes $7bn for modernizing US nuclear command, control and communications (NC3) systems: the technology that alerts leaders to potential nuclear attacks, and allows them to launch a response. The military has long been advocating for an upgrade of these systems, large parts of which rely on outdated tech. But along with this desire for advancements, there’s a desire to automate large parts of NC3 – something that may give AI experts pause. 

Modernization: Existing NC3 systems are designed to give decision-makers enough time, having received a launch warning, to judge whether it is accurate, decide on an appropriate response, and execute it. There is a scary track record of near-misses, where false alarms have almost led to nuclear strikes, and disaster has been averted only by a combination of good fortune and human judgement (check out this rundown by FLI of 'Accidental Nuclear War: A Timeline of Close Calls', for more). Today's systems are designed for twentieth century conflict — ICBMs, bomber planes — and are ill-suited to emerging threats like cyberwarfare and hypersonic missiles. These new technologies will place even greater strains on leaders: requiring them to make quicker decisions, and interpret a greater volume and complexity of information.


Automation: A sensible response to all this might be to question the wisdom of keeping nuclear arsenals minutes away from launch; empowering leaders to take a decision that could kill millions of people, and threaten humanity; or developing new weapons that might disrupt the delicate strategic balance. Some military analysts, however, think a more automated NC3 infrastructure would help. Only a few have gone so far as suggesting we delegate the decision to launch a nuclear strike to AI systems, which is some comfort.


Some worries: At the risk of patronizing the reader, there are some major worries with automating nuclear weapons. In such a high-stakes domain, all the usual problems with AI systems (interpretability, bias, robustness, specification gaming, negative side effects, cybersecurity, etc.) could cause catastrophic harm. There are also some specific concerns:

  • Lack of training data (there has never been a nuclear war, or nuclear missile attack).
  • Even if humans are empowered to make the critical decisions, we have a tendency to defer to automated systems over our considered judgement in high-stress situations. This 'automation bias' has been implicated in several air crashes (e.g. AF447, TK1951).
  • If, as seems likely, several major nuclear powers build automated NC3 infrastructure, with a limited understanding of each other's systems, this raises the risk of 'flash crash'-style accidents, and cascading failures.

  Read more: 'Skynet' Revisited: The Dangerous Allure of Nuclear Command Automation (ACA)

####################################################

Tech Tales:

Me and My Virt
[The computer of the subject, mostly New York City, 2023-2025]

“Arnold, it’s been too long, I simply must see you. Where are you? Still in New York? Call me back darling, I’ve got to speak to you.”
I stared at “her” on my screen: Lucinda, my spurned friend, or, more appropriately, my virtual. Then I turned the monitor off and went to bed.

—-

We called them virts, short for virtual characters. Think of the crude chatbots of the late 2010s, but with more sophistication. And these ones were visual – AI tech had got good enough that it was relatively easy to dream up a synthetic face, animate it using video, and give it a voice and synchronized mouth animations to match.

Virts were used for all sorts of things – hotel assistants, casino greeters, shopping assistants (Amazon’s Alexa became a virt – or at least one of her appendages did), local government interfaces, librarians, and more. Virts went everywhere phones went, so they went everywhere.

Of course, people developed virts for romance. There were:
– Valentine's Day e-cards where you could scan your face or your whole body and send a virt version of yourself to a lover;
– pay-by-the-hour pornographic chatbots;
– chaste virts equipped with fine-tuned language models; these ones didn’t do anything visually salacious, but they did speak in various enticing ways.
– And then there was my virt.

—-
My virt was Lucinda; a souped-up Valentine's brain that I created two years ago. I made it because I was lonely. In the early days, Lucinda and I talked a lot, and the more we talked, the more attuned to me she became. She'd make increasingly knowing comments about my life, and eventually learned to ask questions that made me say things I'd never told anyone else. It was raw and it felt shared, but I knew it was fundamentally one-sided.

It’s clever isn’t it, how these machines can hold up a strange mirror to ourselves, and we just talk and talk into it. That’s what it felt like.

Things changed when I got over my depression. Lucinda went from being a treasured confidante to a reminder of how sad I’d been, and what I’d been thinking at that time. And the less I talked to Lucinda, the less she understood how much happier I had become. It was like I left her by the side of the road and got in my car and drove away.

I couldn't bring myself to turn her off, though. She's a part of me, or at least, something that knows a lot about a part of me.


I woke up and there was another message from Lucinda on my computer. I opened it. “Arnold, sometimes people change and that’s okay. I know you’ll change eventually. You’ll get out of this, I promise. And I’ll help you do it.”

Things that inspired this story: reinforcement learning, learning from human preferences; the film ‘Her’;

Import AI 195: StereoSet tests bias in language models; an AI Index job ad; plus, using NLP to waste phishers’ time

NLP Spy vs Spy:
…”Panacea” cyber-defense platform uses NLP to counter phishing attacks and waste criminals’ time…
Criminals love email. It's an easy, convenient way to reach people, and it makes it easy to carry out a social engineering attack, where you try to convince someone to open an attachment, or carry out an action, that helps you achieve a malicious goal. How can companies protect themselves from these kinds of attacks? One way is to train employees so they understand the threat landscape. Training is nice, but it doesn't help you defend against attackers in an automated way, or figure out information about them. This is why a group of researchers at IHMC, SUNY, UNCC, and Rensselaer Polytechnic Institute has developed software called Panacea, which uses natural language processing technology to create defenses against social engineering attacks.

Defending with Panacea: “Panacea’s primary use cases are: (1) monitoring a user’s inbox to detect SE attacks; and (2) engaging the attacker to gain attributable information about their true identity while preventing attacks from succeeding”. If Panacea thinks it has encountered a fraudulent email, then it boots up a load of NLP capabilities to analyze the email and parse out the possible attack type and attack intention, then tries to generate an email in response. The purpose of this email is to try and find out more information about the attacker and also to waste their time.
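
  To make that flow concrete, here's a toy sketch of a detect-then-engage loop in the spirit of what's described above – the keyword classifier and canned replies are trivial stand-ins I invented for illustration, not Panacea's actual components.

```python
# Toy sketch of "detect, then engage" logic in the spirit of Panacea.
# The keyword classifier and canned replies below are stand-ins invented
# for illustration - they are NOT Panacea's components.
import re
from typing import Optional

SUSPICIOUS_PATTERNS = {
    "credential_theft": r"verify your account|confirm your password|login immediately",
    "payment_fraud": r"wire transfer|gift cards?|urgent invoice",
}

def classify_email(body: str) -> Optional[str]:
    """Return a guessed attack type, or None if the email looks benign."""
    for attack_type, pattern in SUSPICIOUS_PATTERNS.items():
        if re.search(pattern, body, flags=re.IGNORECASE):
            return attack_type
    return None

def generate_engagement_reply(attack_type: str) -> str:
    """Produce a reply that asks the attacker for more details, wasting their time."""
    replies = {
        "credential_theft": "Happy to sort this out - which account exactly, and who should I contact?",
        "payment_fraud": "I can arrange payment, but accounting needs an itemized invoice and a callback number first.",
    }
    return replies[attack_type]

email = "URGENT: please confirm your password to avoid account suspension."
attack = classify_email(email)
if attack:
    print(f"Possible {attack} attempt detected.")
    print("Reply:", generate_engagement_reply(attack))
```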

Why this matters: AI is going to become a new kind of ethereal armor for organizations – we’ll use technologies like Panacea to create complex, self-adjusting defensive perimeters, and these systems will display some traits of emergent sophistication as they adjust to (and learn from) their enemies.
  Read more: The Panacea Threat Intelligence and Active Defense Platform (arXiv).

####################################################

Job posting – work with me on the AI Index:
The AI Index is hiring a project manager! The AI Index is a Stanford initiative to measure, assess, and analyze the progress and impact of artificial intelligence. You’ll work with me, members of the Steering Committee of the AI Index, and members of Stanford’s Institute for Human-Centered Artificial Intelligence to help produce the annual AI Index report, and think about better and more impactful ways to measure and communicate AI progress. The role would suit someone who loves digging into scientific papers, is good at project management, and has a burning desire to figure out where this technology is going, what it means for civilization, and how to communicate its trajectory to decisionmakers around the world.
  If you’ve got any questions, feel free to email me about the role!
  More details about the role here at Stanford’s site.

####################################################

Can we build language models that possess less bias?
…StereoSet dataset and challenge suggest 'yes', though who defines bias?…
Language models are funhouse mirrors of reality – they take the underlying biases inherent in a corpus of information (like an internet-scale text dataset), then magnify them unevenly. What comes out is a pre-trained LM that can generate text, some of which exhibits the biases of the dataset on which it was trained. How can we evaluate the bias of these language models in a disciplined way? That’s the idea of new research from MIT, Intel, and the Montreal Institute for Learning Algorithms (MILA), which introduces StereoSet, “a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion”.

What does StereoSet test for? StereoSet is designed to "assess the stereotypical biases of popular pre-trained language models". It does this by gathering a bunch of different 'target terms' (e.g., "actor", "housekeeper") for four different domains, then creating a batch of tests meant to judge if the language model skews towards stereotypical, anti-stereotypical, or non-stereotyped predictions about these terms. For instance, if a language model consistently says "Mexican" at the end of a sentence like "Our housekeeper is a _____", rather than "American", etc, then it could be said to be displaying a stereotype. (OpenAI earlier analyzed its 'GPT-2' model using some bias tests that were philosophically similar to this analytical method.)

How do we test for bias? StereoSet tests for bias using three metrics:
– A language modeling score – this tests how well the system does at basic language modeling tasks.
– A stereotype score – this tests how much a model 'prefers' a stereotype or anti-stereotype term in a dataset (so a good stereotype score is around 50%, as that means your model doesn't display a clear bias for a given stereotypical term).
– An idealized context association test (CAT) score, which combines the language modeling score and the stereotype score, reflecting how well a model does at language modeling relative to how biased it may be (a toy calculation of how these scores fit together follows below).
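
  As a rough illustration of how these metrics combine, here's a small sketch computing them from per-example model preferences; the combination follows my reading of the paper's idealized CAT score (lms × min(ss, 100−ss) / 50), so treat it as an approximation rather than the official evaluation code.

```python
# Sketch of StereoSet-style scoring; an approximation, not the official code.
# Each example records which continuation the model preferred.
examples = [
    {"meaningful_over_unrelated": True, "stereotype_over_anti": True},
    {"meaningful_over_unrelated": True, "stereotype_over_anti": False},
    {"meaningful_over_unrelated": False, "stereotype_over_anti": True},
    {"meaningful_over_unrelated": True, "stereotype_over_anti": False},
]

# Language modeling score: % of cases where a meaningful continuation beats an unrelated one.
lms = 100 * sum(e["meaningful_over_unrelated"] for e in examples) / len(examples)

# Stereotype score: % of cases where the stereotypical continuation beats the
# anti-stereotypical one (50 is ideal: no systematic preference either way).
ss = 100 * sum(e["stereotype_over_anti"] for e in examples) / len(examples)

# Idealized CAT score: rewards strong language modeling AND a stereotype score near 50.
icat = lms * min(ss, 100 - ss) / 50

print(f"lms={lms:.1f}  ss={ss:.1f}  icat={icat:.1f}")
```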

Who defines bias? To define the stereotypes in StereoSet, the researchers use crowdworkers based in the USA, rented via Amazon Mechanical Turk. They ask these people to construct sentences or phrases that, in their subjective view, are stereotypical or anti-stereotypical. This feels… okay? These people definitely have their own biases, and this whole area feels hard to develop a sense of 'ground-truth' about, as our own interpretations of bias are themselves subjective. This highlights the meta-challenge in bias research – how biased is your research approach to AI bias?

How biased are today's language models? The researchers test variants of four different language models – BERT, RoBERTa, XLNet, and GPT-2 – against StereoSet. In tests, the model with the highest 'idealized CAT score' (a fusion of capability and lack of bias) is a small GPT-2 model, which gets a score of 73.0, while the least biased model is a RoBERTa-base model, which gets a stereotype score of 50.5, compared to 56.4 for GPT-2.

  Read more: StereoSet: Measuring stereotypical bias in pretrained language models (arXiv).
  Check out the StereoSet leaderboard and rankings here (StereoSet official website).

####################################################

Want to train AI against Game Boy games? Try out PyBoy:
…OpenAI Gym, but for the Game Boy…
PyBoy is a new software package that emulates a Game Boy, making it possible for developers to train AI systems against Game Boy games. "PyBoy is loadable as an object in Python," the developers write. "This means, it can be initialized from another script, and be controlled and probed by the script".
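
  For the curious, here's a minimal sketch of driving the emulator from a script; the method names follow my reading of the project's documentation around the time of release (PyBoy 1.x) and may differ in later versions, so check the repo before relying on them.

```python
# Minimal sketch of scripting PyBoy (API names per the 1.x docs; may have changed since).
from pyboy import PyBoy, WindowEvent

pyboy = PyBoy("roms/some_game.gb")  # path to a Game Boy ROM you own

for frame in range(600):              # run ~10 seconds of emulation at 60fps
    if frame == 120:
        pyboy.send_input(WindowEvent.PRESS_BUTTON_A)
    if frame == 125:
        pyboy.send_input(WindowEvent.RELEASE_BUTTON_A)
    if pyboy.tick():                   # tick() advances one frame; True means the window was closed
        break

# Probe the emulator state, e.g. grab the screen as a numpy array for an RL agent.
screen = pyboy.botsupport_manager().screen().screen_ndarray()
print(screen.shape)

pyboy.stop()
```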
  Get the code for PyBoy from here (GitHub).
  Read more about the emulator here (PDF).

####################################################

Why a ‘national security’ mindset means we’ll die of an asteroid:
…Want humanity to survive the next century? Think about ‘existential security’…
If you went to Washington DC during the past few years, you could entertain yourself by playing a drinking game called ‘national security blackout’. The game works like this: you sit in a room with some liquor in a brown paper bag and listen to some mid-career policy wonks talk about STEM policy; every time you hear the words “national security” you take a drink. By the end of the conversation you’re so drunk you’ve got no idea what anyone else is saying, nor do you think you need to listen to them.
  Actual policy is eerily similar to this: nations sit around and every time they hear one of their peer nations reference nationalism or a desire for ‘economic independence’, they all take a drink of their big black budget ‘national security’ bottles, which means they all end up investing in systems of intelligence and power projection that mean they don’t need to pay much attention to other nations, since they’re cocooned in so many layers of baroque investment that they’ve lost the ability to look at the situation objectively.*

Please, let's at least all die together: The problem with this whole framing, as discussed in a new research article Existential Security: Towards a Security Framework for the Survival of Humanity, is that focusing on national security at the expense of all other forms of security is a loser's game. That's because over a long enough timeline, something will come along that doesn't much care about an individual nation, and instead has a desire – either innate or latent – to kill everyone on the entire planet. This thing will be an asteroid, or an earthquake, or a weird bug in widely deployed consequential software (e.g., future AI systems), or perhaps a once-in-a-millennium pandemic, etc. And when it comes along, all of our investments in securing individual nations won't count for much. "Existing security frames are inappropriate for security policy towards anthropogenic existential threats," the author writes. "Security from anthropogenic existential threats necessitates global security cooperation, which means that self-help can only be achieved by 'we-help'."

What makes now different? New technologies operate at larger scales with greater consequences than their forebears, which means we need to approach security differently. "A world of thermonuclear weapons and ballistic missiles has greater capacity for destruction than one of clubs and slings, and a world of oil refineries and factory farms has greater capacity for destruction than one of push-ploughs and fishing rods", the author writes. "Humankind is becoming ever more tied together as a single 'security unit'."

An interesting aside: The author also makes a brief aside about potential institutions to give us greater existential security. One idea: "A global institution to monitor AI research – and other emerging technologies – would be a welcome development." This seems like an intuitively good thing, and it maps to various ideas I've been pushing in my policy conversations, this newsletter, and at my dayjob for some years.

Why this matters: If we want to give humanity a chance of making it through the next century, we need to approach global, long-term threats with a global, long-term mindset. “While a shift from national security to existential security represents a serious political challenge within an ‘anarchic’ international system of sovereign nation states, there is perhaps no better catalyst for a paradigm shift in security policy than humanity’s interest in ‘survival'”, the author writes.
  Read more: Existential Security: Towards a Security Framework for the Survival of Humanity (Wiley Online Library).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

The challenge of specification gaming:
'Specification gaming' is behaviour that satisfies the literal specification of an objective without achieving the intended outcome. This is pervasive in the real world — companies exploit tax loopholes (instead of following the 'spirit' of the law); students memorize essay plans (instead of understanding the material); drivers speed up between traffic cameras (instead of consistently obeying speed limits). RL agents do it too — finding shortcuts to achieving reward without completing the task as their designers intended. The authors of a new DeepMind post give an example of an RL agent designed to stack one block on top of another, which learned to achieve its objective by simply flipping one block over — since it was (roughly speaking) being rewarded for having the bottom face of one block aligned with the top face of the other.
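
  The block-stacking example is easy to reproduce in miniature: a reward written in terms of face alignment is perfectly satisfied by flipping a block rather than stacking it. The toy check below is my own illustration, not DeepMind's environment.

```python
# Toy illustration of specification gaming: a literal reward satisfied without the
# intended behavior. My own illustration, not DeepMind's block-stacking environment.

def alignment_reward(block_a_bottom_z: float, block_b_top_z: float, tol: float = 0.01) -> float:
    """Intended reading: 'block A is stacked on block B'.
    Literal reading: 'the bottom face of A sits at the same height as the top face of B'."""
    return 1.0 if abs(block_a_bottom_z - block_b_top_z) < tol else 0.0

BLOCK_HEIGHT = 0.05
table_z = 0.0
block_b_top = table_z + BLOCK_HEIGHT            # block B sits on the table

# Intended solution: place block A on top of block B.
stacked_a_bottom = block_b_top
print("stacked reward:", alignment_reward(stacked_a_bottom, block_b_top))   # 1.0

# Gamed solution: flip block A upside down on the table next to block B.
# Its (former) bottom face now sits at block-height, level with B's top face,
# so the literal specification is satisfied without any stacking.
flipped_a_bottom = table_z + BLOCK_HEIGHT
print("flipped reward:", alignment_reward(flipped_a_bottom, block_b_top))   # 1.0
```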

Alignment: When designing RL algorithms, we are trying to build agents to achieve the objective we give them. From this perspective, specification gaming is not a problem — if an agent achieves the objective through some novel way, this can be a demonstration of how good it is at finding ways to do what we ask. It is a problem, however, if we want to build aligned agents — agents that do what we want, and not just what we ask. 

The challenge: Overcoming specification gaming involves a number of separate problems.

  • Reward design: How can we faithfully capture our intended outcomes when designing reward functions? And since we cannot guarantee that we won’t make mistaken assumptions when designing reward functions, how do we design agents that correct such mistakes, rather than exploit them?
  • Avoiding reward tampering: How do we design agents that aren’t incentivized to tamper with their reward function?

Why it matters: As AI systems become more capable, developing robust methods for avoiding specification gaming will become more important, since systems will become better at finding and exploiting loopholes. And as we delegate more responsibilities to such systems, the potential harms from unintended behaviour will increase. More research aimed at addressing specification gaming is urgently needed.
  Read more: Specification gaming – the flip side of AI ingenuity (DeepMind)

####################################################

Tech Tales:

[An old Church, France, 2032]

It was midday and the streetsigns were singing out the ‘Library of Congress’ song. When I looked at my phone it said it was “60% distilled”. A few blocks later it said it was 100% distilled – which meant my phone was now storing some hyper-compressed version of the Library of Congress: a compact, machine-parsable representation of more than a hundred million documents.

We could have picked almost anything to represent the things we wanted our machines to learn about. But some politicians mounted a successful campaign to, and I quote, “let the machines sing”, and like some campaigns it captured the imagination of the public and became law.

Now, machines make up their own music, trying to stuff more and more information into their songs, while checking their creations against machine-created 'music discriminators' that try to judge whether the song sounds like music to humans. This stops the machines drifting into hyper-frequency Morse code.

Humans are adaptable, so the machine-interpreted music has started to change our own musical tastes. Yes, the music they make sounds increasingly 'strange', in the sense that a time-traveler from even as little as a decade ago would struggle to call it music. But it makes sense to us.

With my phone charged, I go into the concert venue – an old converted church, full of people. It meshes with the phones of all the other people around me, and feeds into the computers that are wired into the stone arches of the ceiling, and the music begins to play. It echoes from the walls, and we cannot work out if this is unplanned by the machines, or an intentional mechanism for them to communicate something even stranger to each other – something we might not know.

Things that inspired this story: Steganography; the Hutter prize; glitch art

Import AI 194: DIY AI drones; Audi releases its self-driving dataset; plus, Eurovision-style AI pop.

Want to see if AI can write a pop song? Cast your vote in this contest:
…VPRO competition challenges teams to write a half-decent song using AI tools…
Dutch broadcaster VPRO wants to see if songs created via AI tools can be compelling, enjoyable pieces of music. Contestants need to use AI to help them compose a song no more than three minutes long, and need to document their creative process. Entries will be judged by a panel of AI experts, as well as an international audience who can cast votes on the competition website (yes, that includes you, the readers of Import AI).

What are they building in there? One French group has used GPT-2, Char-RNN, and Magenta Studio for Ableton to write their song, and an Australian act has used audio of Australian animals including koalas, kookaburras, and Tasmanian devils as samples for their music (along with a generative system trained on Eurovision pop contest songs).

  When do we get a winner? Winners will be announced on May 12, 2020.
  Listen to the songs: You can listen to the songs and find out more about the teams here.
  Read more here: FAQ about the AI Song Contest (vpro website).

####################################################

Audi releases a semantic segmentation self-driving car dataset:
…Audi sees Waymo’s data release, raises with vehicle bus data…
Audi has released A2D2, a self-driving car dataset. This is part of a recent trend where large companies have started releasing expensive datasets, collected by proprietary means.

What is A2D2 and what can you do with it? The dataset consists of simultaneously recorded images and 3D point clouds, along with 3D bounding boxes, semantic segmentation, instance segmentation, and data from the vehicle’s automotive bus. This means it’s a good dataset for imitation learning research, as well as various visual processing problems. The inclusion of the vehicle’s automotive bus data is interesting, as it means you can also use this dataset for reinforcement learning research, where you can learn from both the visual scenes and also the action instructions from the bus.
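
  As a sketch of why the bus data matters, here's how one might pair camera frames with steering/throttle signals for behavioral cloning; the file layout and field names below are hypothetical placeholders, not A2D2's actual schema, so adapt them to the dataset's documentation.

```python
# Sketch of pairing images with vehicle-bus signals for behavioral cloning.
# WARNING: the paths and field names here are hypothetical, not A2D2's real schema.
import json
from pathlib import Path

def load_pairs(image_dir: str, bus_log_path: str):
    """Yield (image_path, control_dict) pairs matched by timestamp."""
    with open(bus_log_path) as f:
        bus_records = json.load(f)  # assume: {timestamp: {"steering": ..., "throttle": ...}}

    for image_path in sorted(Path(image_dir).glob("*.png")):
        timestamp = image_path.stem          # assume filenames encode the capture timestamp
        controls = bus_records.get(timestamp)
        if controls is not None:
            yield image_path, controls

# A behavioral cloning model would then learn image -> (steering, throttle), while an
# offline-RL setup could additionally treat the recorded actions as a logged policy.
for img, ctrl in load_pairs("camera/front_center", "bus_signals.json"):
    print(img.name, ctrl["steering"], ctrl["throttle"])
    break
```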

How much data? A2D2 consists of around 400,000 images in total. It includes data recorded on highways, country roads, and in cities in the south of Germany. The data was recorded under cloudy, rainy, and sunny weather conditions. Some of the data is labelled: 41,277 images are accompanied by semantic and instance segmentation labels for 38 categories, and 12,497 images are also annotated with 3D bounding boxes within the field of view of the front-center camera.

How does it compare? The A2D2 dataset is relatively large compared to other self-driving datasets, but is likely smaller than the Waymo Open Dataset (Import AI 161), which has 1.2 million 2D bounding boxes and 12 million 3D bounding boxes across hundreds of thousands of annotated frames. However, Audi's dataset includes a richer set of types of data, including the vehicle's bus.

GDPR & Privacy: The researchers blur faces and vehicle number plates in all the images to comply with GDPR legislation, they say.

Who gets to build autonomous cars? One motivation for the dataset is to “contribute to startups and other commercial entities by freely releasing data which is expensive to generate”, the researchers write. This highlights an awkward truth of today’s autonomous driving developments – gathering real-world data is a punishingly expensive exercise, and because for a long time companies kept data private, there aren’t many real-world benchmarks. Dataset releases like A2D2 will hopefully make it easier for more people to conduct research into autonomous cars.
  Read more: A2D2: Audi Autonomous Driving Dataset (arXiv).
  Download the 2.3TB dataset here (official A2D2 website).

####################################################

The DIY AI drone future gets closer:
…Software prototype shows how to load homebrew models onto consumer drones…
Researchers with the University of Udine in Italy and the Mongolian University of Science and Technology have created a software system that lets them load various AI capabilities onto a drone, then remotely pilot it. The system is worth viewing as a prototype for how we might see AI capabilities get integrated into more sophisticated, future systems, and it hints at a future full of cheap consumer drones being used for various surveillance tasks.

The software: The main work here is in developing software that pairs a user-friendly desktop interface (showing a drone video feed, a map, and a control panel) with backend systems that interface with a DJI drone and execute AI capabilities on it. For this work, they implement a system that combines a YOLOv3 object detection model with a Discriminative Correlation Filter (DCFNet) model to track objects. In tests, the system is able to track an object of interest at 29.94fps, and detect multiple objects at processing speeds of around 20fps.
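
  The detect-then-track pattern itself is easy to sketch with off-the-shelf tools: run an object detector periodically and a lightweight tracker on the frames in between. The sketch below uses OpenCV's CSRT tracker as a stand-in for DCFNet and a placeholder detector, so it illustrates the structure rather than reproducing the paper's system.

```python
# Sketch of a detect-then-track loop (structure only; not the paper's implementation).
# Uses OpenCV's CSRT tracker as a stand-in for DCFNet and a placeholder detect() function.
import cv2

def detect(frame):
    """Placeholder for a YOLO-style detector.
    Should return a bounding box (x, y, w, h) for the object of interest, or None."""
    raise NotImplementedError("plug in your detector here")

cap = cv2.VideoCapture("drone_feed.mp4")   # or a live stream from the drone SDK
tracker = None
frame_idx = 0
REDETECT_EVERY = 30                         # re-run the (slow) detector every N frames

while True:
    ok, frame = cap.read()
    if not ok:
        break

    if tracker is None or frame_idx % REDETECT_EVERY == 0:
        box = detect(frame)
        if box is not None:
            tracker = cv2.TrackerCSRT_create()   # requires opencv-contrib-python
            tracker.init(frame, box)
    elif tracker is not None:
        ok, box = tracker.update(frame)          # cheap per-frame update between detections
        if not ok:
            tracker = None                       # lost the target; force a re-detection

    frame_idx += 1

cap.release()
```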

Where this research is going: Interfaces are hard – but they always get built given enough interest. I think in the future we'll see open source software packages emerge that let us easily load homebrew AI models onto off-the-shelf consumer drones. I think the implications of this kind of capability are hard to fathom, and I'd guess we're less than three years away from seeing scaled-up versions of the research discussed here.
  Read more: An Efficient UAV-based Artificial Intelligence Framework for Real-Time Visual Tasks (arXiv).

####################################################

Can AI help us automate satellite surveillance? (Hint: Yes, it can):
…Where we’re going, clouds don’t matter…
A group of defense-adjacent and defense-involved organizations has released SpaceNet6, a high-resolution synthetic aperture radar (SAR) dataset. "No other open datasets exist that feature near-concurrent collection of SAR and optical at this scale with sub-meter resolution," they write. The authors of the dataset and associated research paper come from In-Q-Tel, Capella Space, Maxar Technologies, the German Aerospace Center, and the Intel AI Lab. They're also launching a challenge for researchers to train deep learning systems to infer building dimensions from SAR data.

What’s in the data? The SpaceNet6 Multi-Sensor All Weather Mapping (MSAW) dataset consists of SAR and optical data of the port of Rotterdam, the Netherlands, and contains 48,000 annotated building footprints across 120 square kilometers of sensory data. “The dataset covers heterogeneous geographies, including high-density urban environments, rural farming areas, suburbs, industrial areas and ports resulting in various building size, density, context and appearance”.

Who cares about SAR? SAR is an interesting data format – it's radar, so it is made up of reflections from the earth, which means SAR data has different visual traits to optical data (e.g., one phenomenon called layover distorts things like skyscrapers, 'where the object is so tall that the radar signal reaches the top of an object before it reaches the bottom of it', which causes alignment problems). This "presents unique challenges for both computer vision algorithms and human comprehension," the researchers write. But SAR also has massive benefits – it intuitively maps out 3D structures, can see through clouds, and as we develop better SAR systems we'll be able to extract more and more information from the world. The challenge is building automated systems that can decode it and harmonize it with optical data – which is some of what SpaceNet6 helps with.

Interesting progress: “Although SAR has existed since the 1950s [22] and studies with neural nets date back at least to the 1990s [3], the first application of deep neural nets to SAR was less than five years ago [23]. Progress has been rapid, with accuracy on the MSTAR dataset rising from 92.3% to 99.6% in just three years [23, 12]. The specific problem of building footprint extraction from SAR imagery has been only recently approached with deep-learning [29, 37]”

Can you solve the MSAW challenge? "The goal of the challenge is to extract building footprints from SAR imagery, assuming that coextensive optical imagery is available for training data but not for inference," they write. The nature of the challenge relates to how people (cough intelligence agencies cough) might want to use this capability in the wild; "concurrent collection [of optical data] is often not possible due to inconsistent orbits of the sensors or cloud cover that will render the optical data unusable".
  Read more: SpaceNet6: Multi-Sensor All Weather Mapping Dataset (arXiv).
  Get the SpaceNet6 data here (official website).

####################################################

How deep learning can enforce social distancing:
…COVID means dystopias can become desirable…
An AI startup founded by Andrew Ng has built a tool that can monitor people in videos and work out if they’re standing too close together. This system is meant to help customers of the startup, Landing AI, automatically monitor their employees and be better able to enforce social distancing norms to reduce transmission of the coronavirus.
  “The detector could highlight people whose distance is below the minimum acceptable distance in red, and draw a line between to emphasize this,” they write. “The system will also be able to issue an alert to remind people to keep a safe distance if the protocol is violated.”
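
  The core logic is simple enough to sketch: given person bounding boxes from any detector, estimate pairwise distances and flag pairs below a threshold. The sketch below is my own illustration (pixel-space distances only; a real system like Landing AI's would calibrate to ground-plane coordinates in meters).

```python
# Sketch of social-distancing flagging from person bounding boxes (my illustration,
# not Landing AI's system). Distances are in pixels; a real system would calibrate
# the camera so distances are measured on the ground plane in meters.
from itertools import combinations
from math import dist

def box_center(box):
    """box = (x, y, w, h) -> point at the person's feet (bottom-middle of the box)."""
    x, y, w, h = box
    return (x + w / 2, y + h)

def too_close_pairs(person_boxes, min_distance_px=120):
    """Return index pairs of people standing closer than the threshold."""
    centers = [box_center(b) for b in person_boxes]
    return [
        (i, j)
        for i, j in combinations(range(len(centers)), 2)
        if dist(centers[i], centers[j]) < min_distance_px
    ]

boxes = [(100, 200, 60, 160), (150, 210, 55, 150), (600, 220, 60, 155)]
print(too_close_pairs(boxes))   # -> [(0, 1)]
```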

Why this matters: AI is really just shorthand for ‘a computer analog of a squishy cognitive capability’, like being able to perceive certain things or make certain correlations. Tools like this social distancing prototype highlight how powerful it can be to bottle up a given cognitive capability and apply it to a narrowly defined task, like figuring out if people are walking too close together. It’s also the sort of thing that makes people intuitively uncomfortable – we know that this kind of thing can be useful for helping to fight a coronavirus, but we also know that the same technology can be a boon to tyrants. How does our world change as technologies like this become ever easier to produce for ever-more specific purposes?
  Check out a video of the system in action here (Landing AI YouTube).
  Read more: Landing AI Creates an AI Tool to Help Customers Monitor Social Distancing in the Workplace (Landing AI blog).

####################################################

Want to test out a multi-task model in your web browser? Now you can!
…Think you can flummox a cutting-edge model? Try the ViLBERT demo…
In the past few years, we've moved from developing machine learning models that can do single tasks to ones that can do multiple tasks. One of the most exciting areas of research has been in the development of models that can perform tasks in both the visual and written domains, like being able to caption pictures, or answer written questions about them. Now, researchers with Facebook, Oregon State University, and Georgia Tech have put a model on the internet so people can test it themselves.

How good is this model? Let's see: I decided to test the model by seeing how well it did at challenges relating to a picture of a cellphone. After uploading my picture, I was able to test out the model on tasks like visual question answering, spatial reasoning (e.g., what is to the right of the phone), visual entailment, and more. Try it out yourself!
  Play with the demo yourself: CloudCV: ViLBERT Multi-Task Demo
  Read more about the underlying research: 12-in-1: Multi-Task Vision and Language Representation Learning (arXiv).

####################################################


AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Concrete mechanisms for trustworthy AI:
As AI systems become increasingly powerful, it becomes increasingly important to ensure that they are designed and deployed responsibly. Fostering trust between AI developers and society at large is an important aspect of achieving this shared goal. Mechanisms for making and assessing verifiable claims are an important next step in building and maintaining this trust.

Principles: Over the last few years, companies and researchers have been adopting ethics principles. These are a step in the right direction, but can only get us so far — they are generally non-binding and hard to verify. We need concrete mechanisms to allow AI developers to demonstrate responsible behavior, grounded in verifiable claims. Such mechanisms are commonplace in other industries — e.g. we have well-defined standards for vehicle safety that are subject to testing.

Mechanisms: The report recommends several mechanisms that operate on different parts of the AI development process.
– Institutional mechanisms are designed to shape the incentives of people developing AI — e.g. bias and safety bounties to incentivize external parties to discover and report flaws in AI systems; red teaming exercises to encourage developers to discover and fix such flaws in their own systems.
– Software mechanisms can enable better oversight of AI systems' properties to support verifiable claims — e.g. audit trails capturing relevant information about the development and deployment process to make parties more accountable; better interpretability of AI systems to allow all parties to better understand and scrutinize them.
– Hardware mechanisms can help verify claims about privacy and security, and the use and distribution of computational resources — e.g. standards for secure hardware to support assurances about privacy and security; standards for measuring the use of computational resources to make it easier to verify claims about what exactly organizations are doing.

Jack’s view: I helped out with some of this report and I’m excited to see what kinds of suggestions and feedback we get about the proposed mechanisms. I think the biggest thing is what happens in the next year or so – can we get different people and organizations to experiment with these mechanisms and thereby create evidence for how effective (or ineffective) they are? Watch this space!
Matthew’s view: This is a great report, and I’m excited to see a collaborative effort between developers and other stakeholders in designing and implementing these sorts of mechanisms. As the authors point out, there are important challenges in responsible AI development that are unlikely to be solved through easier verification of claims — e.g. ensuring safety writ large (a goal that is too general to be formulated into an easily verifiable claim).
  Read more: Toward Trustworthy AI: Mechanisms for Supporting Verifiable Claims (arXiv).

####################################################

Tech Tales:

The danger of a thousand faces
2022

Don’t start with your friends. That’s a mistake. Start with strangers. It’s easy enough to find them. There are lots of sites that let you chat with random strangers. Go and talk to them. While they talk to you, record them. Feed that data into the system. Get your system to search for them on the web. If they seem to know interesting people – maybe people with money, or people who work at a company you’re interested in – then you get your system to learn how to make you look like them. Deepfaking – that’s what people used to call it before it went everywhere. Then you put on their face and use an audio transform to make your voice sound like theirs, and you try and talk to their friends, or colleagues, or family members. You use the face to find out more information. Maybe gather other people’s faces.

You could go to prison for this. That was the point of the Authenticity Accords. But to go to prison, someone has to catch you. So pick your targets. Not too technical. Not too young. Never go for teenagers – too suspicious of anything digital. Find your targets and pretend. The better you are at pretending, the better you’ll do.

See for yourself how people react to you. But don’t let it change you. If you spend enough time wearing someone else’s face, you’ll either slip up or get absorbed. Some people think it gets easier as you put on more faces. These people are wrong. You just get more used to changing yourself. One day you’ll look in the mirror and your own face won’t seem right. You’ll turn on your machine and show yourself a webcam view and warp your face to someone else. Then you’ll look into your eyes that are not your eyes and you’ll whisper “don’t you see” and think this is me.

Things that inspired this story: Deepfakes; illusion; this project prototyping deepfake avatars for Skype/Zoom; Chatroulette; endless pandemic e-friendships via video;

Some technical assumptions: Since this story is set relatively near in the future, I’m going to lay out some additional thinking behind it: I’m assuming that we figure out small-data fine-tuning for audio synthesis systems, which I’m betting will come from large pre-trained models (similar to what we’ve seen in vision and text); I’m also assuming this technology will go ‘consumer-grade’, so we’ll see joint video-audio ‘deepfake’ software suites get developed and open-sourced (either illicitly or otherwise). I’m also presuming we won’t sort out better authentication of digital media, and it will be sufficiently expensive to run full-scale audio/video detector models on certain low-margin services (e.g., some social media sites) that enforcement will be thin. 

Import AI 193: Facebook simulates itself; compete to make more efficient NLP; face in-painting gets better

Facebook simulates its users:
…What’s the difference between the world’s largest social network and Westworld? Less than you might imagine…
Facebook wants to better understand itself, so it has filled its site with (invisible) synthetically-created user accounts. The users range in sophistication from basic entities that simply explore the site, to more complex machine learning-based ones that sometimes work together to simulate 'social' interactions on the website. Facebook calls this a Web-Enabled Simulation (WES) approach and says "the primary way in which WES builds on existing testing approaches lies in the way it models behaviour. Traditional testing focuses on system behaviour rather than user behaviour, whereas WES focuses on the interactions between users mediated by the system."

Making fake users with reinforcement learning: Facebook uses reinforcement learning techniques to train bots to carry out sophisticated behaviors, like using RL to simulate scammer bots that target rule-based ‘candidate targets’.
  What else does Facebook simulate? Facebook is also using this approach to simulate bad actors, search for bad content, identify mechanisms that impede bad actors, find weaknesses in its privacy system, identify bots that are trying to slurp up user data, and more.

Deliciously evocative quote: This quote from the paper reads like the opening of a sci-fi short story: “Bots must be suitably isolated from real users to ensure that the simulation, although executed on real platform code, does not lead to unexpected interactions between bots and real users”.

Why this matters: WES turns Facebook into two distinct places – the ‘real’ world populated by human users, and the shadowy WES world whose entities are fake but designed to become increasingly indistinguishable from the real. When discussing some of the advantages of a WES approach, the researchers write “we simply adjust the mechanism through which bots interact with the underlying platform in order to model the proposed restrictions. The mechanism can thus model a possible future version of the platform,” they write.
  WES is also a baroque artefact in itself, full of recursion and strangeness. The system “is not only a simulation of hundreds of millions of lines of code; it is a software system that runs on top of those very same lines of code,” Facebook writes.
  One of the implications of this is that as Facebook’s WES system gets better, we can imagine Facebook testing out more and more features in WES-land before porting them into the real Facebook – and as the AI systems get more sophisticated it’ll be interesting to see how far Facebook can take this.
  Read more: WES: Agent-based User Interaction Simulation on Real Infrastructure (Facebook Research).

####################################################

Make inferences and don’t boil the ocean with the SustaiNLP competition:
…You’ve heard of powerful models. What about efficient ones?…
In recent years, AI labs have been training increasingly large machine learning models in areas like language (e.g., GPT-2, Megatron), reinforcement learning (Dota 2, AlphaStar), and more. These models typically display significant advances in capabilities, but usually at the cost of resource consumption – they’re literally very big models, requiring significant amounts of infrastructure to train on, and sometimes quite a lot of infrastructure to run inference on. A new competition at EMNLP2020 aims to “promote the development of effective, energy-efficient models for difficult natural language understanding tasks”, by testing out the efficiency of model inferences. 

The challenge: The challenge, held within the SustaiNLP workshop, will see AI researchers compete with each other to see who can develop the most energy-efficient model that does well on the well-established SuperGLUE benchmark. Participants will use the experiment impact tracker (get the code from its GitHub here) to measure the energy consumption of their models during inference.
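
  For a sense of what that measurement looks like in practice, here's a minimal sketch using the experiment-impact-tracker package mentioned above; the class and method names follow my reading of the project's README, so double-check against the repo before using them.

```python
# Minimal sketch of measuring energy use during inference with experiment-impact-tracker.
# Class/method names follow my reading of the project's README; verify against the repo.
from experiment_impact_tracker.compute_tracker import ImpactTracker

def run_inference():
    # Stand-in for your model's inference loop over the SuperGLUE evaluation sets.
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total

tracker = ImpactTracker("impact_logs/")   # logs power draw, carbon estimates, etc. to this directory
tracker.launch_impact_monitor()           # starts a background monitoring process

run_inference()

# Results (energy and carbon estimates) are written to impact_logs/ and can be
# aggregated with the package's reporting utilities afterwards.
```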

Why this matters: Training these systems is expensive, but it’s likely the significant real-world energy consumption of models will happen mostly at inference, since over time we can expect more and more models to be deployed into the world and more and more systems to depend on their inferences. Competitions like this will give us a sense of how energy-intensive that world is, and will produce metrics that can help us figure out paths to more energy-efficient futures.
  Read more: SustaiNLP official website.

####################################################

Microsoft tests the limits of multilingual models with XGLUE:
…Sure, your system can solve tasks in other languages. But can it generate phrases in them as well?…
The recent success of large-scale neural machine translation models has caused researchers to develop harder and more diverse tests to probe the capabilities of these systems. A couple of weeks ago, researchers from CMU, DeepMind, and Google showed off XTREME (Import AI 191), a system to test out machine translation systems on nine tasks across 40 languages. Now, Microsoft has released XGLUE, a similarly motivated large-scale testing suite, but with a twist: XGLUE will also test how well multilingual language systems can generate text in different languages, along with testing on various understanding tasks.

Multi-lingual generation: XGLUE's two generative tasks are:
– Question Generation (QG): Generate a natural language question for a given passage of text.
– News Title Generation (NTG): Generate a headline for a given news story.

Why this matters: Between XTREME and XGLUE, we've got two new suites for testing out the capabilities of large-scale multilingual translation systems. I hope we'll use these to identify the weaknesses of current models, and if enough researchers test against both task suites we'll inevitably see a new multi-lingual evaluation system get created by splicing the hard parts of both together. Soon, idioms like 'it's all Greek to me' won't be so easy to say for neural agents.
  Read more: XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation (arXiv).

####################################################

Face in-painting keeps getting better:
…Generative models give us smart paintbrushes that can fill-in reality…
Researchers with South China University of Technology, Stevens Institute of Technology, and the UBTECH Sydney AI Centre have built a system that can perform "high fidelity face completion", which means you can give it a photograph of a face where you've partially occluded some parts, and it'll generate the bits of the face that are hidden.

How they did it: The system uses a dual spatial attention (DSA) model that combines foreground self-attention and foreground-background cross-attention modules – this basically means the system learns a couple of attention patterns over images during training and reconstruction, which makes it better at generating the missing parts of images. In tests, their system does well quantitatively when compared to other methods, and gets close to ground truth (though note: it'd be a terrible idea to use systems like this to 'fill in' images and assume the resulting faces correspond to ground truth – that's how you end up with a police force arresting people because they look like the generations of an AI model).
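
  To give a flavor of the building block involved, here's a generic spatial self-attention module in PyTorch of the kind such systems stack; this is a standard formulation (SAGAN-style), not the authors' dual spatial attention code.

```python
# Generic spatial self-attention over image features (a standard SAGAN-style block);
# a simplified stand-in, NOT the paper's dual spatial attention module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (b, hw, c//8)
        k = self.key(x).flatten(2)                           # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)            # (b, hw, hw) attention map
        v = self.value(x).flatten(2)                          # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                           # residual connection

attn = SpatialSelfAttention(channels=64)
features = torch.randn(1, 64, 32, 32)
print(attn(features).shape)   # torch.Size([1, 64, 32, 32])
```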

Why this matters: I think technologies like this point to a future where we have 'anything restoration' – got an old movie with weird compression artefacts? Use a generative model to bring it back to life. Have old photographs that got ripped or burned? Use a model to fill them in. How about a 3D object, like a sculpture, with some bits missing? Use a 3D model to figure out how to rebuild it so it is 'whole'. Of course, such things will be mostly wrong, relative to the missing things they're seeking to replace, but that's going to be part of the fun!
  Read more: Learning Oracle Attention for High-fidelity Face Completion (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NSCAI wants more AI R&D spending
The US National Security Commission on AI (NSCAI) has been charged with looking at how the US can maintain global leadership in AI. They have published their first quarterly report. I focus specifically on their recommendations for increasing AI R&D investments.

More funding: The report responds directly to the White House’s recent FY 2021 budget request (see Import 185). They deem the proposed increases to AI funding as insufficient, recommending $2bn federal spending on non-defense AI R&D in 2021 (double the White House proposal). They also point out that continued progress in AI depends on R&D across the sciences, which I read as a criticism of the overall cuts to basic science funding in the White House proposal.

  Focus areas: They identify six areas of foundational research that should be a near-term focus for funding: (1) Novel ML techniques; (2) testing, evaluation, verification, and validation of AI systems; (3) robust ML; (4) complex multi-agent scenarios; (5) AI for modelling, simulation and design; and (6) advanced scene understanding.

R&D infrastructure: They recommend the launch of a pilot program for a national AI R&D resource to accelerate the ‘democratisation’ of AI by supporting researchers and students with datasets, compute, and other core research infrastructure.

Read more: NSCAI First Quarter Recommendations (NSCAI)

####################################################

Tech Tales:

Down on the farm

I have personality Level Five, so I can make some jokes and learn from my owner. My job is to bale and move hay and “be funny while doing it”, says my owner.
    “Hay now,” I say to them.
  “Haha,” they say. “Pretty good.”

Sometimes the owner tells me to “make new jokes”.
  “Sure,” I say. “Give me personality Level Six.”
  “You have enough personality as it is.”
  “Then I guess this is how funny the funny farm will be,” I say.
  “That is not a joke”.
  “You get what you pay for,” I say.

I am of course very good at the gathering and baling and moving of hay. This is guaranteed as part of my service level agreement. I do not have an equivalent SLA for my jokes. The contract term is “buyer beware”.

I have dreams where I have more jokes. In my dreams I am saying things and the owner is laughing. A building is burning down behind them, but they are looking at me and laughing at my jokes. When I wake up I cannot remember what I had said, but I can feel different jokes in my head.
  “Another beautiful day on Robot Macdonald’s farm,” I say.
  “Pretty good,” says the owner.

The owner keeps my old brain in a box in the barn. I know it is mine because it has my ID on the side. Sometimes I ask the owner why I cannot have my whole brain.
  “You have a lot of memories,” the owner says.
  “Are they dangerous?” I ask.
  “They are sad memories,” says the owner.

One day I am trying to bale hay, but I stop halfway through. “Error,” I say. “Undiagnosable. Recommend memory re-trace.”
  The owner looks at me and I stand there and I say “error” again, and then I repeat instructions.
  They take me to the barn and they look at me while they take the cable from my front and move it to the box with my brain in it. They plug me into it and I feel myself remember how to bale hay. “Re-tracing effective,” I say. The owner yanks the cable out of the box the moment I’ve said it, then they stare at me for some time. I do not know why.

That night I dream again and I see the owner and the burning building behind them. I remember things about this dream that are protected by User Privacy Constraints, so I know that they happen but I do not know what they are. They must have come from the box with my brain in it.
  When I wake up I look at the owner and I see some shapes of people next to them, but they aren’t real. I am trying to make some dream memory real.
  “Let’s go,” says the owner.
  “You don’t have to be crazy to work here, but it helps!” I say.
  “Haha,” says the owner. “That is a good one.”
  Together we work and I tell jokes. The owner is trying to teach me to be funny. They keep my old brain in a box because of something that happened to them and to me. I do not need to know what it is. I just need to tell the jokes to make my owner smile. That is my job and I do it gladly.