Import AI #97: Faking Obama and Putin with Deep Video Portraits, Berkeley releases a 100,000+ video self-driving car dataset, and what happens when you add the sensation of touch to robots.

by Jack Clark

Try a little tenderness: researchers add touch sensors to robots.
…It’s easier to manipulate objects if you can feel them…
Researchers with the University of California at Berkeley have added GelSight touch sensors to a standard 7-DoF Rethink Robotics ‘Sawyer’ robot with an attached Weiss WSG-50 parallel gripper to explore how touch inputs can improve performance at grasping objects – a crucial skill for robots to have if used in commercial settings.
  Technique: The researchers construct four sub-networks that operate over specific data inputs (a camera image, two GelSight images to model texture senses before and after contact, and an action network that processes 3D motion, in-plane rotation, and change in force). They link these networks together within a larger network and train the resulting model over a dataset of objects. The researchers pre-train the image components of the network with a model previously trained to classify objects on ImageNet. The approach yields a model that adapts to novel surfaces, learns interpretable policies, and can be taught to apply specific constraints when handling an object, like grasping it gently.
  Results: The researchers test their model and find that systems trained with vision and action inputs get 73.03% accuracy, compared to 79.34% for systems trained on tactile and action inputs, and 80.28% for systems trained on tactile, vision, and action inputs together.
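  A rough sketch of the fusion idea: the snippet below is not the researchers' code, just a minimal illustration of how four per-input encoders can feed one shared prediction head. The feature sizes, the one-layer encoders, the random untrained weights, and the reading of the output as a grasp-success probability are all assumptions made for illustration; the real system uses ImageNet-pretrained image networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes. The action vector has 5 entries:
# 3D motion (3) + in-plane rotation (1) + change in force (1).
CAMERA_DIM, GELSIGHT_DIM, ACTION_DIM, HIDDEN = 64, 32, 5, 16


def make_encoder(in_dim, out_dim):
    """Return a one-layer ReLU encoder with random (untrained) weights."""
    weights = rng.normal(scale=0.1, size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return lambda x: np.maximum(x @ weights + bias, 0.0)


# One sub-network per input stream.
encode_camera = make_encoder(CAMERA_DIM, HIDDEN)
encode_gelsight_a = make_encoder(GELSIGHT_DIM, HIDDEN)
encode_gelsight_b = make_encoder(GELSIGHT_DIM, HIDDEN)
encode_action = make_encoder(ACTION_DIM, HIDDEN)

# Shared head that operates on the concatenated features.
head_weights = rng.normal(scale=0.1, size=4 * HIDDEN)


def predict_grasp_success(camera, gelsight_a, gelsight_b, action):
    """Fuse the four encoded inputs and squash to a score in (0, 1)."""
    fused = np.concatenate(
        [
            encode_camera(camera),
            encode_gelsight_a(gelsight_a),
            encode_gelsight_b(gelsight_b),
            encode_action(action),
        ],
        axis=-1,
    )
    logit = fused @ head_weights
    return 1.0 / (1.0 + np.exp(-logit))
```

  Dropping one of the encoders from the concatenation gives the kind of ablation the results compare (vision plus action, tactile plus action, or all three).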
   Harder than you think: This task, like most that require applying deep learning components to real-world systems, contains a few quirks which might seem non-obvious from the outset, for example: “The robot only receives tactile input intermittently, when its fingers are in contact with the object and, since each re-grasp attempt can disturb the object position and pose, the scene changes with each interaction”.
  Read more: More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch (Arxiv).

Want 100,000 self-driving car videos? Berkeley has you covered!
…“The largest and most diverse open driving video dataset so far for computer vision research”, according to the researchers…
Researchers with the University of California at Berkeley and Nexar have published BDD100K, a self-driving car dataset which contains ~120,000,000 images spread across ~100,000 videos. “Our database covers different weather conditions, including sunny, overcast, and rainy, as well as different times of day including daytime and nighttime,” they say. The dataset is substantially larger than ones released by the University of Toronto (KITTI), Baidu (ApolloScape), Mapillary, and others, they say.
  DeepDrive: The dataset release is significant for where it comes from: DeepDrive, a Berkeley-led self-driving car research effort with a vast range of capable partners, including automotive companies such as Honda, Toyota, and Ford. DeepDrive was set up partially so its many sponsors could pool research efforts on self-driving cars, seeking to close an implicit gap with other players.
  Rich data: The videos are annotated with hundreds of thousands of labels for objects like cars, trucks, persons, bicycles, and so on, as well as richer annotations for road lines, drivable areas, and more; they also provide a subset of ~10,000 images with full-frame instance segmentation.
  Why it matters – the rise of the multi-modal dataset: The breadth of the dataset with its millions of labels and carefully refined aspects will likely empower researchers in other areas of AI, as well as its obvious self-driving car audience. I expect that in the future these multi-modal datasets will become increasingly attractive targets to use to evaluate transfer learning from other systems, for instance by training a self-driving car model in a rich simulated world then applying it to real-world data, such as BDD100K.
  Challenges: The researchers are hosting three challenges at computer vision conference CVPR relating to the dataset, and are asking groups to compete to develop systems for road object detection, drivable area prediction, and domain adaptation.
  Read more: BDD100K: A Large-scale Diverse Driving Video Database (Berkeley AI Research blog).

KPCB’s Mary Meeker breaks down AI’s rise and China’s possible advantage in annual presentation:
…Annual slide-a-thon shows rise of China, points to image and speech recognition scores as evidence for impact of AI…
Mary Meeker’s annual presentation of research serves as a useful refresher for what is front-of-mind for venture capitalists focused on understanding the dynamics that affect the technology ecosystem. This year, at Code Conference in California, Meeker’s slides were distinguished by large sections spent on China, combined with a few notable slides situating AI progress metrics (specifically in object recognition and speech recognition) in relation to the growth of new markets for business.
  Read more: Mary Meeker’s 2018 internet trends report: All the slides, plus analysis (Recode).


An incomplete timeline of dubious things that people have synthesized via AI
– Early 2017:
Montreal Startup Lyrebird launches with audio recording featuring synthesized voices of Donald Trump, Barack Obama, Hillary Clinton.
– Late 2017:
“DeepFakes” arrive on the internet via Reddit with a user posting pornographic movies with celebrity faces animated onto them. A consumer-oriented free editing application follows and DeepFakes rapidly proliferate across the internet, then consumer sites start to clamp down on them.
– 2018:
Belgian socialist party makes a video containing a synthesized Donald Trump giving a (fake) speech about climate change. Party says video designed to create debate and not trick viewers.
– Listen: Politicians discussing about Lyrebird (Lyrebird Soundcloud).
– Read more: DeepFakes Wikipedia entry.
– Read more: Belgian Socialist Party Circulates “Deep Fake” Donald Trump Video (Politico Europe).

Why all footage of all politicians is about to become suspect:
…Think fake news is bad now? ‘Deep Video Portraits’ will make it much, much worse…
A couple of years ago European researchers caused a stir with ‘face2face’, technology which they demonstrated by mapping their own facial expressions onto synthetically rendered footage of famous VIPs, like George Bush, Barack Obama, and so on. Now, new research from a group of American and European researchers has pushed this fake-anyone technology further, increasing the fidelity of the rendered footage, reducing the amount of data needed to construct such convincing fakes, and also dealing with visual bugs that would make it easier to identify the output as being synthesized.
  In their words: “We address the problem of synthesizing a photo-realistic video portrait of a target actor that mimics the actions of a source actor, where source and target can be different subjects,” they write. “Our approach enables a source actor to take full control of the rigid head pose, face expressions and eye motion of the target actor”. (Emphasis mine.)
  Technique: The technique involves a few stages: first, the researchers track the people within the source and target videos via a monocular face reconstruction approach, which allows them to extract information about the identity, head pose, expression, eye gaze, and scene lighting for each video frame. They also separately track the gaze of each subject. They then essentially transfer the synthetic renderings of the input actor onto the target actor and perform a couple of clever tricks to make the resulting output high fidelity and less prone to synthetic tells like visual smearing/blurring of the background behind the manipulated actor.
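  A toy version of the transfer step: the snippet below is an illustrative sketch, not the paper's pipeline. It assumes each video frame has already been reduced to the five parameter groups named above, and that the target keeps its own identity and scene lighting while head pose, expression, and eye gaze come from the source. The `FaceParams` class and `transfer` function are hypothetical names, and the rendering stage that turns parameters back into photo-realistic video is left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FaceParams:
    """Per-frame parameters recovered by face tracking (hypothetical layout)."""
    identity: Tuple[float, ...]
    head_pose: Tuple[float, ...]
    expression: Tuple[float, ...]
    eye_gaze: Tuple[float, ...]
    lighting: Tuple[float, ...]


def transfer(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity and lighting; take motion from the source."""
    return replace(
        target,
        head_pose=source.head_pose,
        expression=source.expression,
        eye_gaze=source.eye_gaze,
    )


# Made-up numbers for one frame of each video.
source_frame = FaceParams(
    identity=(0.1, 0.2),
    head_pose=(5.0, -2.0, 0.5),
    expression=(0.9, 0.1),
    eye_gaze=(0.3, -0.1),
    lighting=(1.0,),
)
target_frame = FaceParams(
    identity=(0.7, 0.4),
    head_pose=(0.0, 0.0, 0.0),
    expression=(0.0, 0.0),
    eye_gaze=(0.0, 0.0),
    lighting=(0.6,),
)

driven_frame = transfer(source_frame, target_frame)
```

  The resulting `driven_frame` still describes the target's face under the target's lighting, but with the source's head pose, expression, and gaze.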
  Why it matters: Techniques like this will have a bunch of benefits for people working in media and CGI, but they’ll also be used by nation states, fringe groups, and extremists, to attack and pollute information spaces and reduce overall trust in the digital infrastructure of societal discourse and information transmission. I worry that we’re woefully unprepared for the ramifications of the rapid proliferation of these techniques and applications. (And controlling the spread of such a technology is a) extremely difficult, b) of dubious practicality, and c) potentially harmful to broader beneficial scientific progress.)
  An astonishing absence of consideration: I find it remarkable that the researchers don’t take time in the paper to discuss the ramifications of this sort of technology, given that they’re demonstrating it by doing things like transferring President Obama’s expressions onto Putin’s, or Obama’s onto Reagan’s. They make no mention of the political dimension to this work in their ‘Applications’ section, which focuses on the technical details of the approach and how it can be used for applications like ‘visual dubbing’ (getting an actor’s mouth movements to map to an audio track).
  Read more: Deep Video Portraits (Arxiv).
  Watch video for details: Deep Video Portraits – SIGGRAPH 2018 (YouTube).

DARPA to host synthetic video/image competition:
…US defense-oriented research organization to try and push state-of-the-art in creation and detection of fakes…
Nation states have become aware of the tremendous potential for subterfuge that this technology poses and are reacting by dumping research money into both exploiting it for gain and defending against it. This summer, DARPA will hold a competition to see who can create the most convincing synthetic images, and also to see who can detect them.
  Read more: The US military is funding an effort to catch deepfakes and other AI trickery (MIT Technology Review).

Import AI Job Alert: I’m hiring an editor/sub-editor:
  I’m hiring someone to initially sub-edit and eventually help edit the OpenAI blog. The role will be a regularly compensated gig which should initially take about 1.5-2 hours every week, typically at around 9pm Pacific Time on Sunday Nights. If you’d be interested in this then please send me an email telling me why you’d be a good fit. The ideal candidate probably has familiarity with AI research papers, attention to detail, and experience fiddling with words in a deadline-oriented setting. I’ve asked around among some journalists and the fair wage seems to be about $25 per hour.
  Additional requirements: You’d need to be available via real-time communication such as WhatsApp or Slack during the pre-agreed upon editing window. Sometimes I may need to shift the time a bit if I’m traveling, but I’ll typically have advance warning.
  Send emails with subject line “Import AI editing job: [Your Name]” to

Predicting cyber attacks on utilities with variational auto-encoders:
…All watched over by (variational) machines of loving grace…
Researchers with water infrastructure company Xylem Inc have tested out a variational auto-encoder (VAE)-based system for detecting cyberattacks on utility systems. The research highlights a feature of contemporary AI methods that is both a drawback and a strength: their excellence at curve-fitting in big, high-dimensional spaces. Here, the researchers use a VAE to train a model on past observations that represent 43 variables within a municipal water network. They then study how this model reacts to unforeseen changes in the system that might indicate a cyberattack: the model works better than rule-based systems, with the VAE spitting out a constant logarithm of reconstruction probability (LRP) which tends to diverge when the underlying system departs from the norm.
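  A simplified sketch of the scoring idea: the snippet below is not the Xylem system and does not train a VAE. As a clearly labeled stand-in, it fits an independent Gaussian to each of 43 variables of synthetic "normal" data, scores each new reading by its log probability under that fit, and flags readings whose score falls below a threshold taken from the training scores. The synthetic data, the 0.1 percentile threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

NUM_VARIABLES = 43  # matches the number of variables mentioned in the text


def fit_normal_profile(train):
    """Estimate a per-variable mean and standard deviation from past data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-6  # avoid division by zero
    return mean, std


def log_probability(readings, mean, std):
    """Log density of each reading under independent per-variable Gaussians."""
    z = (readings - mean) / std
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * std ** 2), axis=-1)


# Synthetic stand-in for historical sensor reads (not real utility data).
rng = np.random.default_rng(0)
history = rng.normal(size=(5000, NUM_VARIABLES))

mean, std = fit_normal_profile(history)

# Flag anything scoring below the lowest 0.1% of training scores.
threshold = np.percentile(log_probability(history, mean, std), 0.1)


def is_anomalous(readings):
    """True where a reading's score drops below the training threshold."""
    return log_probability(readings, mean, std) < threshold


typical = np.zeros((1, NUM_VARIABLES))     # sits at the historical mean
shifted = np.full((1, NUM_VARIABLES), 6.0)  # every variable far from normal
```

  In a real VAE the score would come from the decoder's likelihood of the input given sampled latent codes, but the thresholding logic on top of that score has the same shape.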
  Strengths: “The model relies solely on sensor reads data in their raw form and requires no preprocessing, system knowledge, or domain expertise to function. It is generic and can be readily applied to a broad array of ICS’s in various industry sectors.”
  Weaknesses: “It is not perfect and has its own requirements (e.g., availability of vast amount of system observations data) and drawbacks (e.g., sensitivity to rare but planned operations such as activation of emergency booster pumps),” they write. This highlights one of the weaknesses of the great curve-fitting power of contemporary AI techniques (Judea Pearl has argued that curve-fitting is pretty much all these systems are capable of), which is that they’re naive as to changing circumstances and lack the common sense to distinguish malice from action.
  Why it matters: Techniques like this are pretty crude but they indicate that there’s a basic value in training basic machine learning systems on data to spot anomalies. This research to me is mostly interesting due to its context – its researchers are all linked to a traditional ‘non-tech’ organization and the technology is tested against real-world data. Part of the virtue of publishing a paper like this is probably to help with hiring, as the researchers will be able to point prospective candidates to this paper as an indication of why Xylem is an interesting place to work. It’s possible to imagine a future where basic predictive models are layered into the data streams of every town and utility, providing an additional signal to human overseers.
  Read more: Cyberattack Detection using Deep Generative Models with Variational Inference (Arxiv).


AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

Google will not renew contract for Project Maven, plans to release principles for how it approaches military and intelligence contracting:
…Big Tech continues to grapple with ethical challenges around military AI…
Google announced internally on Friday that it would not be renewing the contract it had with the US military for Project Maven, an AI-infused drone surveillance platform, according to Gizmodo. Google also said it is drafting ethical principles for how it approaches military and intelligence contracting.
  Why it matters: Military uses of AI remain one of the most contentious issues for the industry, and society, to grapple with. Tech giants will have a role in setting standards for the industry at large. (One wonders how much of a role Google can play here in the USA now, given that it will now be viewed as deeply partisan by the DoD – Jack) Given that AI is a particularly powerful ‘dual-use’ technology, direct military applications may end up being one of the easier ethical dilemmas the industry faces in the future.
  Read more: Google plans not to renew Project Maven contract (Gizmodo).
  Read more: How a Pentagon Contract Became an Identity Crisis for Google (NYT).

UK public opposed to AI decision-making in most parts of public life:
…The Brits don’t like the bots…
The RSA and DeepMind have initiated a project to create ‘meaningful public engagement on the real-world impacts of AI’. The project’s first report includes a survey of UK public attitudes towards automated decision-making systems.
  Lack of familiarity: With the exception of online advertising (48% familiar), respondents were overwhelmingly unfamiliar with the use of automated decision-making in key areas. Only 9% were aware of its use in the criminal justice system, 14% in immigration, and 15% in the workplace.
  Opposition to AI decision-making: Most respondents were opposed to the usage of these methods in most parts of society. The strongest opposition was in the usage of AI in the workplace (60% opposed vs 11% support) and criminal justice (60% opposed vs. 12% support).
  What the public want: While 29% said nothing would increase their support for automated decision-making, the poll pointed to a few potential remedies that people would support:
  36%: The right to demand an explanation for an automated decision.
  33%: Penalties for companies failing to monitor systems appropriately.
  24%: A set of common principles guiding the use of such systems.
  Why it matters: The report notes that a public backlash against these technologies cannot be ruled out if issues are not addressed. The RSA’s proposal for public engagement via deliberative processes and ‘citizens’ juries’, if successful, could provide a model for other countries.
   Read more: Artificial Intelligence – Real Public Engagement (RSA).

Open Philanthropy Project launches AI Fellows program:
…Over $1 million in funding for high-impact AI researchers…
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is providing $1.1m in PhD funding for seven AI/ML researchers focused on minimizing potential risks from advanced AI systems. “Increasing the probability of positive outcomes from transformative AI”, is one of Open Philanthropy’s priorities.
  Read more: Announcing the 2018 AI Fellows.
  Read more: AI Fellowship Program.

OpenAI Bits & Pieces:

OpenAI Fellows:
We designed this program for people who want to be AI researchers, but do not have a formal background in the field. Applications for Fellows starting in September are open now and will close on July 8th at 12AM PST.
  Read more: OpenAI Fellows (OpenAI Blog).

Tech Tales:

Walking Through Shadows That Feel Like Sand

Yes, people died. What else would you expect?

People walked off cliffs. People walked into the middle of the street. People left stoves on. People forgot to eat. People lost their minds.

Yes, we punished the people that caused these people to die. We punished these people with lawsuits or criminal sentences. Sometimes we punished them with both.

But the technology got better.

Fewer people died.

At some point the technology started saving more people than it killed.

People stopped short of ledges. People pulled back from traffic. People remembered to turn stoves off. People would eat just the right amount. People healed their minds.

Was it perfect? No. Nothing ever is.

Did we adopt it? Yes, as we always do.

Are we happy? Yes, most of us are. And the more of us that are happy, the more likely everyone is going to be happy.

Now, where are we? We are everywhere.

We wear these goggles and we get to choose what we see.
We wear these clothes that let us feel additional sensations to supplement or replace the world.
We have these chips in our eardrums that let us hear the world better than dogs, or hear nothing at all, or hear something else entirely.

We walk through city streets and get to feel the density of other people via vibrations superimposed onto our bodies by our clothes.
We watch our own pet computer programs climbing the synth-neon signs that hang off of real church steeples.
We see sunsets superimposed on one another and we can choose whenever to see them, even in the thick of night.

When we are sad we diffuse our sadness into the world around us and the world responds back with rising violins or crashing waves.
Sometimes when we are sad the sun and the moon cry with us.
Sometimes we feel cold tears on the backs of our necks from the stars.

We are many and always growing and learning. What we experience is our choice. But our world grows richer by the day and we feel the world receding, as though a masterpiece overlaid with other paints from other artists, growing richer by the moment.

We do not feel this world, this base layer, so much anymore.

We worry we do not understand the people that choose to stay within it.
We worry they do not understand why we choose to stay above it.

Things that inspired this story: Augmented Reality, Virtual Reality, group AR simulations such as Pokemon Go, touch-aware clothing, force feedback, cameras, social and class and technological diffusion dynamics of the 21st century, self-adjusting feedback engines, communal escapism, cults.