Import AI 213: DeepFakes can lipsync now; plus hiding military gear with adversarial examples.

by Jack Clark

Facebook wants people to use differential privacy, so Facebook has made the technology faster:
…Opacus; software that deals with the speed-problem of differential privacy…
Facebook has released Opacus, a software library for training PyTorch models with a privacy-preserving technology called Differential Privacy (DP). The library is fast, integrated with PyTorch (so inherently quite usable), and is being used by one of the main open source ML + DP projects, OpenMined.

Why care about differential privacy: It’s easy to develop AI models using open and generic image or text datasets – think ImageNet, or CommonCrawl – but if you’re trying to develop a more specific AI application, you might need to handle sensitive user data, e.g., data that relates to an individual’s medical or credit status, or emails written in a protected context. Today, you need to get a bunch of permissions to deal with this data, but if you could find a way to encrypt it before you saw it you’d be able to work with it in a privacy-preserving way. That’s where privacy-preserving machine learning techniques come in: Opacus makes it easier for developers to train models using Differential Privacy – a privacy-preserving technique that lets us train over sensitive user data (Apple uses it).
  One drawback of differential privacy has been its speed – Opacus has improved this part of the problem by being carefully engineered atop PyTorch, yielding a system that is “an order of magnitude faster compared with the alternative micro-batch method used in other packages”, according to Facebook.
  Read more: Introducing Opacus: A high-speed library for training PyTorch models with differential privacy (Facebook AI Blog).
  Get the code for Opacus here (PyTorch, GitHub).
  Find out more about differential privacy here: Differential Privacy Series Part 1 | DP-SGD Algorithm Explained (Medium, PyTorch).
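  To make the DP-SGD idea from the explainer above concrete, here is a minimal sketch of a single training step in plain Python. It is not Opacus code and does not use the Opacus API; the function names (`clip_by_l2_norm`, `dp_sgd_step`, `per_sample_grads_linear`) and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions. The sketch shows the two moves that typically make SGD differentially private: clip each example's gradient so no single person can move the model too far, then add Gaussian noise to the summed gradient before averaging.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def per_sample_grads_linear(weights, xs, ys):
    """Per-example gradients of squared error for a toy linear model y ~ w.x."""
    grads = []
    for x, y in zip(xs, ys):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        grads.append([2.0 * err * xi for xi in x])
    return grads


def dp_sgd_step(weights, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One sketch DP-SGD update: clip per example, sum, add noise, average."""
    dim = len(weights)
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_sample_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping bound (hypothetical parameter names).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_sample_grads)
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [1.0, 2.0, 3.0]
    w = [0.0, 0.0]
    for _ in range(100):
        grads = per_sample_grads_linear(w, xs, ys)
        w = dp_sgd_step(w, grads, lr=0.1, max_norm=1.0,
                        noise_multiplier=0.5, rng=rng)
    print(w)
```

  The per-example clipping is also where the speed problem comes from: a naive implementation has to compute one gradient per example rather than one per batch, which is the cost Facebook says Opacus reduces.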

###################################################

Is it a bird? Is it a plane? No, it is a… SUNFLOWER
…Adversarial patches + military equipment = an ‘uh oh’ proof of concept…
Some Dutch researchers, including ones affiliated with the Netherlands’ Ministry of Defense, have applied adversarial examples to military hardware. An adversarial example is a visual distortion you can apply to an image or object that makes it hard for an AI system to classify it. In this research, they add confounding visual elements to satellite images of military hardware (e.g., fighter jets), causing the system (in this case, YOLOv2) to misclassify the entity in question.

A proof of concept: This is a proof of concept and not indicative of the real-world tractability of the attack (e.g., it’s important to know the type of image processing system your adversary is using, or they might not be vulnerable to your adversarial perturbation; multiple overlapping image processing systems could invalidate the attack, etc). But it does provide a further example of adversarial examples being applied in the wild, following the creation of things as varied as adversarial turtles (#67), t-shirts (#171), and patches.
  Plus, it’s notable to see military-affiliated researchers do this analysis in their own context (here, trying to cause misidentification of aircraft), which can be taken as a proxy for growing military interest in AI security and counter-security techniques.
  Read more: Adversarial Patch Camouflage against Aerial Detection (arXiv).
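  For intuition about how a patch like this gets made, here is a toy sketch in plain Python. It is not the paper's method and it does not use YOLOv2: the "detector" is a hypothetical logistic scorer over a flattened image, and every name (`detector_score`, `apply_patch`, `optimise_patch`, `patch_idx`) is an assumption made for illustration. What it does show is the general recipe: only the pixels inside the patch are allowed to change, and they are nudged step by step in the direction that lowers the detector's confidence.

```python
import math


def detector_score(image, weights, bias):
    """Toy stand-in for a detector: logistic confidence that an object is present."""
    z = sum(w * p for w, p in zip(weights, image)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def apply_patch(image, patch, patch_idx):
    """Return a copy of the image with the patch pixels overwritten."""
    out = list(image)
    for i, v in zip(patch_idx, patch):
        out[i] = v
    return out


def optimise_patch(image, weights, bias, patch_idx, steps=50, lr=0.5):
    """Gradient descent on the patch pixels only, keeping them in [0, 1]."""
    patch = [image[i] for i in patch_idx]
    for _ in range(steps):
        s = detector_score(apply_patch(image, patch, patch_idx), weights, bias)
        # For a logistic score, d(score)/d(pixel_i) = s * (1 - s) * w_i.
        patch = [
            min(1.0, max(0.0, v - lr * s * (1.0 - s) * weights[i]))
            for v, i in zip(patch, patch_idx)
        ]
    return patch


if __name__ == "__main__":
    image = [0.8] * 8
    weights = [1.0, -0.5, 0.7, 0.9, -0.2, 0.6, 0.4, 0.8]
    bias = -1.0
    patch_idx = [0, 1]  # the only pixels the attacker controls
    patch = optimise_patch(image, weights, bias, patch_idx)
    print(detector_score(image, weights, bias))
    print(detector_score(apply_patch(image, patch, patch_idx), weights, bias))
```

  Note that the attack needs the detector's gradients, which mirrors the caveat above: if you don't know which system your adversary runs, you can't tune the patch against it.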

###################################################

Fake lipsync: deepfakes are about to get an audio component:
…Wav2Lip = automated lip-syncing for synthetic media = Sound and Vision…
Today, we can generate synthetic images and videos of people speaking, and we can even pair this with a different audio track, but syncing up the audio with the videos in a convincing way is challenging. That’s where new research from IIIT Hyderabad and the University of Bath comes in via ‘Wav2Lip’, technology that makes it easy to get a person in a synthetic image or video to lip-sync to an audio track.

Many applications: A technology like this would make it very cheap to add lip-syncing to things like dubbed movies, video game characters, lectures, generating missing video call segments, and more, the researchers note.

How they did it – a GAN within a GAN: The authors have a clever solution to the problem of generating faces synced to audio. They use a pre-trained ‘SyncNet’-based discriminator model to analyze generated outputs and check whether the face is synced to the audio; if it isn’t, the discriminator encourages the generation of one that is. Another pre-trained model provides a signal if the synthetic face-and-lip combination looks unnatural. These two networks sit inside the broader generation process, where the algorithm tries to generate faces matched to audio.
  The results are really good, so good that the authors also proposed a new evaluation framework for evaluating synthetically-generated lip-sync programs.
  New ways of measuring tech, as the authors do here, are typically a canary for broader tech progress, because when we need to invent new measures it means we’ve either
  a) reached the ceiling of existing datasets/challenges, or
  b) got sufficiently good at the task that we need to develop a more granular scoring system for it.
  Both of these phenomena are indicative of different types of AI progress. New ways of measuring performance on a task also usually yield further research breakthroughs, as researchers are able to use the new testing regimes to generate better information about the problem they’re trying to solve. Combined, we should take the contents of this paper as a strong signal that synthetically generated video with lipsyncing to audio is starting to get very good, and we should expect it to continue to improve. My bet is we have ‘seamless’ integration of the two within a year*.
  (*Constraints – portrait-style camera views, across a broad distribution of types of people and types of clothing; some background blur permitted but not enough to be egregious. These are quite qualitative evals, so I’ll refine them as the technology develops.)

Broader impacts and the discussion (or lack of): The authors do note the potential for abuse of these models and say they’re releasing the models as open source to “encourage efforts in detecting manipulated video content and their misuse”. This is analogous to saying “by releasing the poison, we’re going to encourage efforts to create its antidote”. I think it’s worth reflecting on whether there are other, less extreme, solutions to the thorny issue of model publication and dissemination. We should also ask how much damage the ‘poison’ could do before an ‘antidote’ is available.
  Read more: A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild (arXiv).
  Try out the interactive demo here (Wav2Lip site).
  Find out more about their technique here at the project page.
  Mess around with a Colab notebook here (Google Colab).

###################################################

DeepMind makes Google Maps get better:
…Graph Neural Nets = better ETAs in Google Maps…
DeepMind has worked with the team at Google Maps to develop more accurate ETAs, so next time you use your phone to plot a route you can have a (slightly) higher trust in the ETA being accurate.

How they did it: DeepMind worked with Google to use a Graph Neural Network to predict route ETAs within geographic sub-sections (called ‘supersegments’) of Google’s globe-spanning mapping system. “Our model treats the local road network as a graph, where each route segment corresponds to a node and edges exist between segments that are consecutive on the same road or connected through an intersection. In a Graph Neural Network, a message passing algorithm is executed where the messages and their effect on edge and node states are learned by neural networks,” DeepMind writes. “From this viewpoint, our Supersegments are road subgraphs, which were sampled at random in proportion to traffic density. A single model can therefore be trained using these sampled subgraphs, and can be deployed at scale.”
  The team also implemented a technique called MetaGradients so they could automatically adjust the learning rate during training. “By automatically adapting the learning rate while training, our model not only achieved higher quality than before, it also learned to decrease the learning rate automatically”.

What that improvement looks like: DeepMind’s system has improved the accuracy of ETAs by double digit percentage points in lots of places, with improvements in heavily-trafficked cities like London (16%), New York (21%) and Sydney (43%). The fact the technique works for large, complicated cities should give us confidence in the broader approach.
  Read more: Traffic prediction with advanced Graph Neural Networks (DeepMind).
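  The message passing DeepMind describes can be sketched in a few lines of plain Python. This is an assumption-heavy illustration, not DeepMind's model: the learned neural networks are replaced by two fixed matrices (`w_msg`, `w_self`), the segment names and feature vectors are made up, and there is no ETA readout. It shows only the core mechanic: each road segment (node) collects messages from the segments it connects to and updates its own state.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def message_passing_round(states, edges, w_msg, w_self):
    """One round: every node sums incoming messages, then updates its state.

    states: dict mapping segment name -> feature vector
    edges:  list of (source, destination) pairs between connected segments
    w_msg, w_self: fixed matrices standing in for learned networks
    """
    dim = len(next(iter(states.values())))
    inbox = {node: [0.0] * dim for node in states}
    for src, dst in edges:
        msg = matvec(w_msg, states[src])
        inbox[dst] = [a + b for a, b in zip(inbox[dst], msg)]
    return {
        node: [math.tanh(s + m) for s, m in zip(matvec(w_self, h), inbox[node])]
        for node, h in states.items()
    }


if __name__ == "__main__":
    # Three consecutive segments on one road: a - b - c (hypothetical data).
    states = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
    edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
    w_msg = [[0.5, 0.0], [0.0, 0.5]]
    w_self = [[1.0, 0.0], [0.0, 1.0]]
    one = message_passing_round(states, edges, w_msg, w_self)
    two = message_passing_round(one, edges, w_msg, w_self)
    print(one["c"], two["c"])
```

  Information travels one hop per round: after one round segment "c" only knows about "b", and it takes a second round before anything from "a" reaches it. That is one way to see why running the model on a bounded subgraph of nearby segments is a natural fit.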

###################################################

Facebook releases its giant pre-trained protein models:
…Like GPT3, but for protein sequences instead of text…
In recent years, people have started pre-training large neural net models on data, ranging from text to images. This creates capable models which subsequently get used to do basic tasks (e.g., ImageNet models used for image classification, or language models for understanding text corpora). The same phenomenon is now happening with protein modeling – in the past couple of years, people have started doing large-scale protein net training. Now, researchers with Facebook have released some of their pre-trained protein models, which could be helpful for scientists looking to see how to combine machine learning with their own discipline. You can get the code from GitHub.

Why this matters: The world is full of knowledge, and one of the nice traits about contemporary AI systems is you can point them at a big repository of knowledge, train a model, and then use that model to try and understand the space of knowledge itself – think of how we can prime GPT3 with interesting things in its context window and then study the output to discover things about what it was trained on and the relationships it has inferred. Now, the same thing is going to start happening with chemistry. I expect this will yield dramatic scientific progress within the next five years.
  Get the models: Evolutionary Scale Modeling (ESM, FAIR GitHub).
  Read more: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences (bioRxiv).
  Via Alex Rives’s Twitter.

###################################################

Perception and Destruction / Minimize Uncertainty
[2027, Welcome email for new management trainees at a Consulting company based in New York City].

Welcome to the OmniCorp Executive Training Programme – we’re delighted to have selected you for our frontier-technology Manager Optimization Initiative.

From this point on all of your {keystrokes, mouseclicks, VR interactions, verbal utterances when near receivers, outgoing and incoming calls, eye movements; gait; web browsing patterns} are being tracked by our Manager Optimization Agent. This will help us learn about you, and will help us figure out which actions you should take to obtain further Professional Success.

By participating in this scheme, you will teach our AI system about your own behavior. And our AI system is going to teach you how to be a better manager. Don’t be surprised when it starts suggesting language for your emails to subordinates, or proactively schedules meetings between you and your reports and other managers – this is all in the plan.

Some of our most successful executives have committed fully to this way of working – you’ll notice a number has been added to the top-right of your Universal OS – that number tells you how well your actions have aligned with our suggestions and also our predictions for what actions you might take next (we won’t tell you what to do all the time, otherwise it’s hard for you to learn). If you can get this number to zero, then you’re going to be doing exactly what we think is needed for furthering OmniCorp success.

Obtaining a zero delta is very challenging – none of our managers have succeeded at this, yet. But as you’ll notice when you refer to your Compensation Package, we do give bonuses for minimizing this number over time. But don’t worry – if the number doesn’t come down, we have a range of Performance Improvement Plans that include ‘mandatory AI account takeover’, which we can run you through. This will help you take the actions that reduce variance between yourself and OmniCorp, and we find that this can, in itself, be habit forming.

Things that inspired this story: Large language models; reward functions and corporate conformism; corporations as primitive AI systems; the automation of ‘cognitive’ functions.