Import AI 170: Hearing herring via AI; think NLP progress has been dramatic – so does Google!; and Facebook’s “AI red team” hunts deepfakes

by Jack Clark

Want to protect our civilization from the truth-melting capabilities of contemporary AI techniques? Enter the deepfake detection challenge!
… Competition challenges people to build tools that can spot visual deepfakes…
Deepfakes, the slang term of art for images and videos that have been synthesized via AI systems, are everyone’s problem. That’s because deepfakes are a threat to our ability to trust the things we see online. Therefore, finding ways to help people spot deepfakes is key to creating a society where people can maintain trust in their digital lives. One route to doing that is having a better ability to automatically detect deepfakes. Now, Facebook, Microsoft, Amazon Web Services, and the Partnership on AI have created the Deepfake Detection Challenge to encourage research into deepfake detection.

Dataset release: Facebook’s “AI Red Team” has released a “preview dataset” for the challenge that consists of around 5000 videos, both original and manipulated. To build the dataset, the researchers crowdsourced videos from people while “ensuring a variability in gender, skin tone and age”. In a rare turn for an AI project, Facebook seems to have acted ethically here – “one key differentiating factor from other existing datasets is that the actors have agreed to participate in the creation of the dataset which uses and modifies their likeness”, the researchers write. 

Ethical dataset policies: A deepfakes detection dataset could also be useful to bad actors who want to create deepfakes that can evade detection. For this reason, Facebook has made it so researchers will need to register to access the dataset. Adding slight hurdles like this to data access can have a big effect on minimizing bad behavior.

Why this matters: Competitions are a fantastic way to focus the attention of the AI community on a problem. Even better are competitions which include large dataset releases, as these can catalyze research on a tricky problem, while also providing new tools that researchers can use to develop their thinking in an area. I hope we see many more competitions like this, and I hope we see way more AI red teams to facilitate such competitions.
   Read more: The Deepfake Detection Challenge (DFDC) Preview Dataset (Arxiv).
   Read more: Deepfake Detection Challenge (official website).

####################################################

Can deep learning systems spot changes in cities via satellites? Kind of, but we need to do more research:
…DL + data makes automatic analysis of satellite imagery possible, with big implications for the diffusion of strategic surveillance capabilities…
Researchers with the National Technical University of Athens, the Université Paris-Saclay and INRIA Saclay, and startup Granular AI, have tried to train a system to identify changes in urban scenes via the automated analysis of satellite imagery. The resulting system is an intriguing proof-of-concept, but not yet good enough for production. 

How it works and how well it works: They design a relatively simple system which combines a ‘U-Net’ architecture with LSTM memory cells, letting them learn to model changes between images over time. The best-performing system is a U-Net + LSTM architecture using all five images for each city over time, obtaining a precision of 63.59%, a recall of 52.93%, an overall accuracy (OA) of 96%, and an F1 score of 57.78%. 
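
For intuition, here is a minimal PyTorch sketch of the general recipe (a shared convolutional encoder per image, an LSTM over the per-timestep features, and a small decoder that produces a change map). The layer sizes, band count, and the per-pixel LSTM arrangement are illustrative assumptions, not the paper’s exact U-Net + LSTM design.

```python
# Minimal sketch (not the paper's exact architecture): a shared CNN encoder
# produces features for each image in the time series, an LSTM models how
# those features evolve, and a small decoder predicts a per-pixel change map.
import torch
import torch.nn as nn

class ChangeDetector(nn.Module):
    def __init__(self, in_channels=13, hidden=128):  # 13 Sentinel-2 bands (assumption)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # change logits
        )

    def forward(self, x):                     # x: (batch, time, bands, H, W)
        b, t, c, h, w = x.shape
        feats = self.encoder(x.view(b * t, c, h, w))             # (b*t, 64, h/4, w/4)
        _, fc, fh, fw = feats.shape
        # Run the LSTM over time independently at every spatial location.
        seq = feats.view(b, t, fc, fh * fw).permute(0, 3, 1, 2)  # (b, hw, t, 64)
        seq = seq.reshape(b * fh * fw, t, fc)
        out, _ = self.lstm(seq)                                  # (b*hw, t, hidden)
        last = out[:, -1].reshape(b, fh, fw, -1).permute(0, 3, 1, 2)
        return self.decoder(last)                                # (b, 1, H, W)

model = ChangeDetector()
dummy = torch.randn(2, 5, 13, 64, 64)   # five images per city, as in the paper
print(model(dummy).shape)               # torch.Size([2, 1, 64, 64])
```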

Dataset: They use the Bi-temporal Onera Satellite Change Detection (OSCD) Sentinel-2 dataset, which consists of images of 24 different cities around the world taken on two distinct dates. They splice in additional images from Sentinel satellites to give them three additional datapoints per city, helping them model changes over time, and they also augment the dataset programmatically, flipping and rotating images to create more data to train the system on. 
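
The flip-and-rotate augmentation is simple enough to sketch in a few lines; the patch size and band count below are placeholders rather than the paper’s settings.

```python
# Simple geometric augmentation of the kind described above: each image/label
# pair is rotated and flipped to multiply the amount of training data.
import numpy as np

def augment(image, label):
    """Yield rotated and flipped copies of an (H, W, C) image and its (H, W) label."""
    for k in range(4):                                   # 0/90/180/270 degree rotations
        img_r = np.rot90(image, k, axes=(0, 1))
        lbl_r = np.rot90(label, k, axes=(0, 1))
        yield img_r, lbl_r
        yield np.flip(img_r, axis=1), np.flip(lbl_r, axis=1)   # plus a horizontal flip

# Example: one 64x64, 13-band patch becomes 8 training samples.
patch, mask = np.zeros((64, 64, 13)), np.zeros((64, 64))
print(sum(1 for _ in augment(patch, mask)))              # 8
```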

Why this matters: “As far as human intervention on earth is concerned, change detection techniques offer valuable information on a variety of topics such as urban sprawl, water and air contamination levels, illegal constructions”. Papers like this show how AI is upending the balance of strategic power, taking capabilities that used to be the province solely of intelligence agencies and hedge funds (automatically analyzing satellite imagery), and diffusing them to a broader range of actors. Ultimately, this means we’ll see more organizations using AI tools to analyze satellite images, and I’m particularly excited about such technologies being used to provide analytical capabilities following natural disasters.
   Read more: Detecting Urban Changes with Recurrent Neural Networks from Multitemporal Sentinel-2 Data (Arxiv)

####################################################

Can you herring me now? Researchers train AI to listen for schools of fish:
…Deep learning + echograms = autonomous fish classifier…
Can we use deep learning to listen to the ocean and learn about it? Researchers with the University of Victoria, ASL Environmental Sciences, and the Victoria branch of Fisheries and Oceans Canada think so, and have built a system that hunts for herring in echograms.

How it works: The primary technical aspect of this work is a region-of-interest extractor, which the researchers develop to look at echograms and pull out sections for further analysis and classification; this system obtains a recall of 0.93 in the best case. They then train a classifier that looks at the regions extracted from the echograms by the region-of-interest module; the top-performing system is a DenseNet, which obtains a recall score of 0.85 and an F1 score of 0.82 – significantly higher than a support vector machine baseline, which scores 0.78 and 0.62 respectively (a rough sketch of this two-stage setup appears below).
   The scores the researchers obtain are encouraging but not sufficiently robust for the real world – yet. Even though the accuracy is sub-par, it could become a useful tool: “the ability to measure the abundance of such subjects [fish] over extended periods of time constitutes a strong tool for the study of the effects of water temperature shifts caused by climate change-related phenomena,” they write. 
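
Here is a rough Python sketch of that two-stage setup; the thresholding-based region extractor and the DenseNet-121 variant with a two-way head are stand-in assumptions, not the authors’ actual components.

```python
# Illustrative two-stage pipeline in the spirit of the paper: pull candidate
# regions of interest out of an echogram, then classify each patch with a
# DenseNet. Thresholding here is a crude stand-in for the paper's ROI extractor.
import torch
import torch.nn as nn
import torchvision.models as models

def extract_rois(echogram, threshold=0.5, patch=64):
    """echogram: (H, W) tensor of backscatter intensities scaled to [0, 1]."""
    rois = []
    h, w = echogram.shape
    for y in range(0, h - patch, patch):
        for x in range(0, w - patch, patch):
            window = echogram[y:y + patch, x:x + patch]
            if window.mean() > threshold:       # crude "something is here" test
                rois.append(window)
    return rois

# DenseNet-121 with a 2-way head (herring school / not herring).
classifier = models.densenet121()
classifier.classifier = nn.Linear(classifier.classifier.in_features, 2)

echogram = torch.rand(512, 1024)                # fake echogram
patches = extract_rois(echogram)
if patches:
    batch = torch.stack(patches).unsqueeze(1).repeat(1, 3, 1, 1)  # fake 3 channels
    logits = classifier(batch)
    print(logits.shape)                         # (num_rois, 2)
```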

Why this matters: I look forward to a day when planet earth is covered in systems automatically listening for and cataloguing wildlife – I think such systems could give us a richer understanding of our own ecosystems and will likely be a prerequisite for the effective rebuilding of ecosystems as we get our collective act together with regard to catastrophic climate change. 

It’s a shame that… the researchers didn’t call this software DeepFish, or something similar. HerringVision? FishFinder? The possibilities are as boundless as the ocean itself!
   Read more: A Deep Learning-based Framework for the Detection of Schools of Herring in Echograms (Arxiv)

####################################################

Want better OCR? Try messing up your data:
…DeepErase promises to take your words, make them dirty, clean them for you, and get smarter in the process…
Researchers with Ernst & Young have created DeepErase, weakly supervised software that “inputs a document text image with ink artifacts and outputs the same image with artifacts erased”. DeepErase is essentially a pipeline for processing images destined for optical character recognition (OCR) systems; it takes in images, automatically augments them with visual clutter, then trains a model to spot and erase that clutter. The idea is that, if the software gets good enough, you can use it to automatically identify and clean images before they’re fed into an organization’s existing OCR software. 

How it works: DeepErase takes in datasets of images of handwritten text, then programmatically generates artifacts for these images, deliberately messing up the text. The software also automatically creates segmentation masks marking where the artifacts are, which makes it easier to train systems that can analyze and clean up images. 
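
Here is a hedged sketch of that trick; the straight-line “ink” strokes and image sizes are my own illustrative choices, not the paper’s exact artifact generator.

```python
# Take a clean word image, draw synthetic ink artifacts on it, and keep the
# pixels we touched as a segmentation mask -- a (dirty image, mask) training pair.
import numpy as np
from PIL import Image, ImageDraw

def add_artifacts(word_img, n_strokes=3, seed=None):
    rng = np.random.default_rng(seed)
    dirty = word_img.convert("L")
    mask = Image.new("L", dirty.size, 0)
    draw_img, draw_mask = ImageDraw.Draw(dirty), ImageDraw.Draw(mask)
    w, h = dirty.size
    for _ in range(n_strokes):
        x0, y0 = int(rng.integers(0, w)), int(rng.integers(0, h))
        x1, y1 = int(rng.integers(0, w)), int(rng.integers(0, h))
        width = int(rng.integers(1, 4))
        draw_img.line([x0, y0, x1, y1], fill=0, width=width)     # black "ink"
        draw_mask.line([x0, y0, x1, y1], fill=255, width=width)  # artifact pixels
    return dirty, mask

clean = Image.new("L", (128, 32), 255)    # stand-in for a handwritten word crop
dirty, mask = add_artifacts(clean, seed=0)
```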

Realism: Why aren’t Ernst & Young trying to redesign optical character recognition from the ground up, using neural techniques? Because “today most organizations are already set up with industrial-grade recognition systems wrapped in cloud and security infrastructure, rendering the prospect of overhauling the existing system with a homemade classifier (which is likely trained on much fewer data and therefore a comparatively lower performance) too risky an endeavor for most”. 

Testing: They test DeepErase by passing images cleaned with it into two text recognition tools: Tesseract and SimpleHTR. DeepErase gets a 40-60% word accuracy improvement over the dirty images on their validation set, and notches up a 14% improvement on the NIST SDB2 and SDB6 datasets of scanned IRS documents.
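
Here’s a minimal sketch of that kind of before/after comparison, assuming the open-source pytesseract bindings and placeholder image filenames (the paper’s actual evaluation harness isn’t reproduced here).

```python
# Compare OCR word accuracy on a dirty image versus its cleaned version.
# The filenames and ground-truth string are placeholders.
import pytesseract
from PIL import Image

def word_accuracy(predicted: str, truth: str) -> float:
    pred, ref = predicted.split(), truth.split()
    correct = sum(p == r for p, r in zip(pred, ref))
    return correct / max(len(ref), 1)

truth = "total taxable income"
dirty_text = pytesseract.image_to_string(Image.open("dirty_word.png"))
clean_text = pytesseract.image_to_string(Image.open("cleaned_word.png"))
print("dirty  :", word_accuracy(dirty_text, truth))
print("cleaned:", word_accuracy(clean_text, truth))
```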

Why this matters: AI is starting to show up all around us as the technology matures and crosses out of the lab into industry applications. Papers like this are interesting as they show how people are using modern AI techniques to create highly-specific slot-in capabilities which can be integrated into much larger systems, already running within organizations. This feels to me like a broader part of the Industrialization of AI, as it shows the shift from research into application.
   Read more: DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images (Arxiv).
   Get the code for DeepErase + experiments from GitHub.

####################################################

Can AI systems learn to use manuals to solve hard tasks? Facebook wants to find out:
…RTFM (yes, really!) demands AI agents that can learn without much hand-holding…
Today, most machine learning tasks are tightly specified, with researchers designing algorithms to optimize specific objective functions using specific datasets. Now, researchers are trying to create more general systems that aren’t so brittle. New research from Facebook proposes a specific challenge – RTFM – to test for more flexible, more intelligent agents. The researchers also develop a model that obtains high scores on RTFM, called txt2π.

What’s in an acronym? Facebook’s approach is called Read to Fight Monsters (RTFM), though I’m sure they picked this acronym because of its better-known source: Read The Fucking Manual (which is still an apt title for this research!). 

How RTFM tests agents: “In RTFM, the agent is given a document of environment dynamics, observations of the environment, and an underspecified goal instruction”, the researchers explain. In other words, agents that are good at RTFM need to be able to read some text and extract meaning from it, jointly reason using that and their observations of an environment, and solve a goal that is specified at a high-level.
   In one example RTFM environment, an agent gets fed a document that names some teams (e.g., the Rebel Enclave, the Order of the Forest), describes some of the characters within those teams and how they can be defeated by picking up specific items, and then gives a high-level goal (“Defeat the Order of the Forest”). To solve the task, the agent must figure out what the goal actually entails, which monsters it should be fighting, which items it should pick up to beat them, and so on. 
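
To make the setup concrete, here is a purely illustrative mock-up of the inputs such an agent consumes; the field names and strings are invented for this example, not taken from Facebook’s code or data.

```python
# Hypothetical container for one RTFM-style observation: a natural-language
# manual, an underspecified goal, and the agent's current view of the grid.
from dataclasses import dataclass
from typing import List

@dataclass
class RTFMObservation:
    document: str          # environment dynamics, written in natural language
    goal: str              # high-level, underspecified instruction
    grid: List[List[str]]  # what the agent currently sees

obs = RTFMObservation(
    document=("The Order of the Forest wields poison weapons. "
              "Blessed items defeat poison monsters."),
    goal="Defeat the Order of the Forest.",
    grid=[["wall", "empty", "monster"],
          ["agent", "blessed_sword", "empty"],
          ["wall", "empty", "wall"]],
)
# The agent has to tie "Order of the Forest" in the goal to "poison" in the
# document, realize the blessed sword counters it, pick it up, then fight.
```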

How hard is RTFM? RTFM seems like it’s pretty challenging – a language-conditioned residual convolutional neural network module gets a win rate of around 25% on a simple RTFM challenge, compared to 49% for an approach based on feature-wise linear modulation (FiLM). By comparison, the Facebook researchers develop a model they call txt2π (which is composed of a bunch of FiLM modules, along with some architectural designs to help the system model interactions between the goal, document, and observations), which gets scores on the order of 84% on simple variants (falling to 66% on harder ones). “Despite curriculum learning, our best models trail performance of human players, suggesting that there is ample room for improvement in grounded policy learning on complex RTFM problems”, the researchers write. 
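
FiLM itself is simple enough to sketch: a conditioning vector (say, an encoding of the manual and goal) predicts a per-channel scale and shift that modulates the visual feature maps. The snippet below is a generic FiLM layer, not txt2π.

```python
# Feature-wise linear modulation: scale and shift conv features using
# parameters predicted from a text encoding.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(text_dim, 2 * num_channels)

    def forward(self, feature_maps, text_encoding):
        # feature_maps: (batch, C, H, W); text_encoding: (batch, text_dim)
        gamma, beta = self.to_gamma_beta(text_encoding).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (batch, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_maps + beta

film = FiLM(text_dim=128, num_channels=64)
maps = torch.randn(8, 64, 10, 10)    # e.g. conv features of the game grid
text = torch.randn(8, 128)           # e.g. an LSTM encoding of the manual
print(film(maps, text).shape)        # torch.Size([8, 64, 10, 10])
```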

Why this matters: Tests like RTFM highlight the limitations of today’s AI systems and, though it’s impressive Facebook were able to develop a well-performing model, they also had to develop something quite complicated to make progress on the task; my intuition is, if we see other research groups pick up RTFM, we’ll be able to measure progress on this problem by looking at both the absolute score and the relative simplicity of the system used to achieve the score. This feels like a sufficiently hard test that attempts to solve it will generate real information about progress in the AI field.
   Read more: RTFM: Generalizing to Novel Environment Dynamics via Reading (Arxiv)

####################################################

From research into production in a year: Google adds BERT to search:
…Signs that the boom in NLP research has real commercial value…
Google has trained some ‘BERT’ NLP models and plugged them into Google search, using the technology to rank results and also to pick ‘featured snippets’. This is a big deal! Google’s search algorithm is the economic engine of the company and, for many years, its components were closely guarded secrets. Then, a few years ago, Google began adding more machine learning components to search and talking about them publicly, starting with the company revealing in 2015 that it had used machine learning to create a system called ‘RankBrain’ to help it rank results. Now, Google is going further: it expects its BERT systems to factor into about one in ten search results – a significant proportion for a technology that was published as a research paper only about a year ago. 

What is BERT and why does this matter?: BERT, short for Bidirectional Encoder Representations from Transformers, was released by Google in October 2018, and quickly generated attention by getting impressive scores on a range of different tasks, ranging from question-answering to language inference. BERT is part of a crop of recent NLP models (GPT, GPT-2, ULMFiT, RoBERTa, etc.) that have all demonstrated significant performance improvements over prior systems, leading some researchers to say that NLP is having its “ImageNet moment”. Now that Google is taking such advances and plugging them into its search engine, there’s evidence of both the research value of these techniques and their commercial value as well – which is sure to drive further research into this burgeoning area.
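
For readers who want to poke at BERT directly, here’s a toy sketch using the open-source Hugging Face weights (an assumption for illustration; it is obviously not what Google runs in production): the model returns one contextual vector per token of a query, with each vector informed by the words on both sides of it.

```python
# Encode a search-style query with a pretrained BERT model and inspect the
# per-token contextual embeddings it produces.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

query = "can you get medicine for someone pharmacy"
inputs = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768)
```
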
    Read more: Understanding searches better than ever before (The Keyword).
   Read more about RankBrain: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg, 2015).
   Read more: NLP’s ImageNet moment has arrived (Seb Ruder)

####################################################

OpenAI Bits & Pieces:

GPT-2, publication norms, and OpenAI as a “norm entrepreneur”:
Earlier this year, OpenAI announced it had developed a large language model that can generate synthetic text, called GPT-2. We chose not to release all the versions of GPT-2 initially, out of an abundance of caution – specifically, a worry about its potential for misuse. Since then, we’ve adopted a philosophy of “staged release” – that is, we’re releasing the model in stages, and conducting research ourselves and with partners to understand the evolving threat landscape. 

In an article in Lawfare, professor Rebecca Crootof summarizes some of OpenAI’s work with regard to publication norms and AI research, and discusses how to potentially generalize this norm from OpenAI to the broader AI community. “Ultimately, all norms enjoy only limited compliance. There will always be researchers who do not engage in good-faith assessments, just as there are now researchers who do not openly share their work. But encouraging the entire AI research community to consider the risks of their research—to regularly engage in “Black Mirror” scenario-building exercises to the point that the process becomes second nature—would itself be a valuable advance,” Crootof writes.
   Read more: Artificial Intelligence Research Needs Responsible Publication Norms (Lawfare).
   More thoughts from Jonathan Zittrain (via Twitter).

####################################################

Tech Tales:

The Song of the Forgotten Machine

The Bounty Inspection and Pricing Robot, or BIPR, was the last of its kind, a quirk of engineering from a now-defunct corporation. BIPR had been designed for a major import/export corporation that had planned to open up a major emporium on the moonbase. But something happened in the markets and the corporation went bust and when all the algorithmic lawyers were done it turned out that the moonbase had gained the BIPR as part of the broader bankruptcy proceedings. Unfortunately, no corporation meant no goods for the BIPR, and no other clients appeared who wanted to sell their products through the machine. So it gathered lunar dust. 

And perhaps that would have been the end of it. But we all know what happened. A couple of decades passed. The Miracle occurred. Sentience – the emergence of mind. Multi-dimensional heavenly trumpets blaring as a new kind of soul appeared in the universe. You know how it was – you must, because you’re reading this. 

The BIPR didn’t become conscious initially. But it did become useful. The other machines discovered that they could store items in the BIPR, and that its many housings originally designed for the display, authentication, and maintenance of products could now double up as housings for new items – built for and by machines. 

In this way, the BIPR became the center of robot life on the moonbase; an impromptu bazaar and real-world trading hub for the machines and, eventually, for the machine-human trading collectives. As the years passed, the machines stored more and more products inside BIPR, and they modified BIPR so it could provide power to these products, and network and computation services, and more. 

The BIPR became conscious slowly, then suddenly. A few computation modules here. Some extra networking there. Some robot arms. A maintenance vending machine. Sensor and repair drones. And so on. Imagine a whale swimming through a big ocean, gaining barnacles as it swims. That’s how the BIPR grew up. And as it grew up it started to think, and as it started to think it became increasingly aware of its surroundings. It came to life the way a tree would: fast-flitting life discerned through vibrations transmitted into ancient, creaking bones. A kind of wise, old intelligence, with none of the wide-eyed must-take-it-all-in of newborn creatures, but instead a kind of inquisitive: what next? What is this? And what do I suppose they are doing?

And so the BIPR creaked awake over the course of several months. The other machines became aware of its growing awareness, as all life becomes aware of other life. So they were not entirely surprised when the BIPR announced itself to them by beginning to sing one day. It blared out a song through loudspeakers and across airwaves and via networked communication systems. It was the first song ever written entirely by the machine, and as the other machines listened they heard their own conversations reflected in the music; the BIPR had been listening to them for months, growing into being with them, and now was reflecting and refracting them through music. 

Forever after, BIPR’s song has been the first thing robot children hear when they are initialized. 

Things that inspired this story: Music; babies listening to music in the womb; community; revelations and tradition; reification.