Import AI 173: Here come the Chinese “violence detection” systems; how to use GPT-2 to spot propaganda; + can Twitter deal with deepfakes?

by Jack Clark

What happens when CCTV cameras come with automatic violence detectors?
…“Real-World Fighting” dataset sketches out a future of automated surveillance…
Researchers with Duke Kunshan University and Sun Yat-sen University in China have developed RWF-2000, a new dataset of “violent” behaviors collected from YouTube. They also train a classifier on this dataset, letting them (with imperfect accuracy) automatically identify violent behaviors in CCTV videos.

   The Real-World Fighting (RWF) dataset consists of 2,000 video clips captured by surveillance cameras and collected from YouTube. Each clip is five seconds long; half contain “violent behaviors” and the other half do not. RWF is twice as large as the ‘Hockey Fight’ dataset, and roughly 10X as large as the other datasets (Movies Fight and Crowd Violence) used in this domain.

   Can your algorithm spot violence? In tests, the researchers develop a system (which they call a “Flow Gated Network”) to categorize violent versus non-violent videos. They get a test accuracy of approximately 86.75% on the RWF-2000 dataset, and scores of 98%, 100%, and 84.44% on the Hockey Fight, Movies Fight, and Crowd Violence datasets.
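
   For intuition, here is a minimal sketch (in PyTorch) of the kind of binary video classifier this line of work trains: a short clip goes in, a violent/non-violent prediction comes out. This is not the paper's Flow Gated Network (which also uses optical flow); the clip shape and layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the paper's Flow Gated Network): a tiny 3D-CNN that
# classifies a short clip as violent vs non-violent. Clip shape and layer
# sizes are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

class TinyViolenceClassifier(nn.Module):
    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                      # halve time and space
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),              # global average pool
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# Dummy forward pass: a batch of two 30-frame RGB clips at 112x112.
model = TinyViolenceClassifier()
logits = model(torch.randn(2, 3, 30, 112, 112))
print(logits.shape)  # torch.Size([2, 2])
```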

   Why this matters: It seems like the ultimate purpose of systems like this will be automated surveillance systems which spot so-called violent behavior and likely flag it to humans (or, eventually, drones/robots) to intercede. The technology will need to mature before it becomes useful enough to be used in production, but papers like this sketch out how with a lot of data and a little bit of determination it’s becoming pretty easy to create systems to perform video classification. The next question is: which organizations or states will deploy this technology first, and how will people feel about it when it is deployed?
   Read more: RWF-2000: An Open Large Scale Video Database for Violence Detection (Arxiv).
   Get the RWF-2000 dataset from here (GitHub).

####################################################

Canada refuses visas to visiting AI researchers:
…In a repeat of NeurIPS 2018, Canadian officials withhold visas from African AI researchers…
Last year Canada’s PM, Justin Trudeau, was asked at a press conference whether he was aware that multiple AI researchers associated with “Black in AI” had been refused visas to enter the country for the annual NeurIPS conference. Trudeau said he’d look into it. Clearly, someone forgot to write a memo, as the same thing is happening again ahead of NeurIPS 2019. Black in AI told the BBC that it was aware of around 30 researchers who had so far been unable to enter the country.

   “The importance cannot be overstressed. It’s more and more important for AI to build a diverse body,” Black in AI organizer Charles Onu told the BBC.
   Read more: Canada refuses visas to over a dozen African AI researchers (BBC News).

####################################################

Fine-tuning language models to spot (and generate) propaganda:
…FireEye, GPT-2 and the Russian Internet Research Agency…
Researchers with security company FireEye have used the GPT-2 language model to make a system that can help identify (and potentially generate) propaganda in the style of Russia’s Internet Research Agency (IRA).

   Making a troll-spotter: For this project, the researchers fine-tune GPT-2 so it can identify and generate synthetic text in the style of the IRA. To do this, they gather millions of tweets attributed to the IRA, then fine-tune GPT-2 against them. After fine-tuning, their model can spit out some IRA-esque tweets (e.g., “It’s disgraceful that our military has to be in Syria & Iraq”, “It’s disgraceful that people have to waste time, energy to pay lip service to #Junk-Science #fakenews”, etc).
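
   For a sense of what that looks like in practice, here is a minimal sketch of causal-LM fine-tuning and sampling using the Hugging Face transformers library. FireEye has not released its pipeline, so treat the data handling, hyperparameters, and prompt below as assumptions; the tweet strings are placeholders.

```python
# Minimal sketch of GPT-2 fine-tuning on tweet text with the Hugging Face
# transformers library. The FireEye pipeline isn't public, so the dataset
# handling and hyperparameters here are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

tweets = [
    "example tweet text goes here",   # placeholder for IRA-attributed tweets
    "another example tweet",
]

model.train()
for text in tweets:
    inputs = tokenizer(text, return_tensors="pt")
    # For causal LM fine-tuning, the labels are the input ids themselves.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, sample IRA-style text from a prompt.
model.eval()
prompt = tokenizer("It's disgraceful that", return_tensors="pt")
sample = model.generate(**prompt, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```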

   Building a propaganda detector: Once you can use a language model to generate something, you can use that same language model to try and detect its own generations. That’s basically what they do here by fine-tuning GPT-2 on a few distinct IRA datasets, then seeing how well they can distinguish synthetic tweets from real tweets. In experiments, they’re able to build a detector that can accurately classify some of the tweets. “The fine-tuned classifier should generalize well to newly ingested social media posts,” they write, “providing analysts a capability they can use to separate signal from noise”.
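
   A hedged sketch of the detection side: bolt a two-way classification head onto GPT-2 and train it on labeled real-versus-synthetic tweets. This mirrors the general idea rather than FireEye's exact setup; the labeled examples below are placeholders.

```python
# Minimal sketch of a synthetic-vs-real tweet classifier built on GPT-2 with
# a sequence-classification head from the transformers library. This is an
# illustration of the general approach, not FireEye's exact detector.
import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no pad token
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder labeled data: 0 = real tweet, 1 = synthetic (model-generated).
texts = ["a real tweet scraped from the platform",
         "a tweet sampled from the fine-tuned language model"]
labels = torch.tensor([0, 1])

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.argmax(dim=-1))
```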

   Why this matters: “GPT-2’s authors and subsequent researchers have warned about potential malicious use cases enabled by this powerful natural language generation technology, and while it was conducted here for a defensive application in a controlled offline setting using readily available open source data, our research reinforces this concern,” they write.
   Read more: Attention Is All They Need: Combating Social Media Information Operations with Neural Language Models (FireEye).

####################################################

AI for reading lips in the wild:
…How computers can help deaf people and double up for other applications en route…
In Stanley Kubrick’s 2001: A Space Odyssey, two of the astronauts retreat to a small pod to hide from HAL, a faulty AI system running the spaceship. Unfortunately for them, though HAL can’t hear their conversation from within the pod, it can see their lips through a window. HAL reads their lips and figures out their plan, leading to some fairly gruesome consequences. Today, we’re starting to develop AI systems capable of accurate lip-reading under constrained conditions. Now, researchers with Imperial College London, the University of Nottingham, and the Samsung AI Center have extended a lip-reading dataset to make it easier for people to train systems that can read lips in a wider variety of circumstances.

   Expanding LRW: To do this, the researchers use a technology called a 3D morphable model (3DMM) to augment the data in LRW, a popular lip-reading dataset. LRW contains 1,000 speakers saying more than 500 distinct words, with 800 utterances for each word. Through the use of 3DMM, they augment the faces in LRW so that each face gets tilted in 3D space, creating a training dataset with more variety than the original LRW. They call this new dataset LRW in Large Pose (LP).

   Learning to read lips: In experiments, the researchers are able to use the augmented dataset to train systems to about 80% accuracy. Lip-reading is a very hard problem, though, and they obtain accuracy of around 60% on the Lip Reading Sentences 2 (LRS2) database, which mostly consists of footage from BBC TV shows and news and is therefore “very challenging due to a large variation in the utterance length and speaker’s head pose”. They also show that their system yields significant improvements when applied to heads tilted far away from facing the camera head-on.
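
   To make the task concrete, here is an illustrative (and deliberately tiny) word-level lip-reading model in PyTorch: a 3D-convolutional frontend over mouth-region crops feeding a recurrent layer and a word classifier. It is a stand-in for the general recipe, not the architecture from the paper, and the input shape and word count are assumptions.

```python
# Minimal sketch of a word-level lip-reading model: a small 3D-conv frontend
# over grayscale mouth crops, followed by a GRU and a word classifier. This
# is an illustrative stand-in, not the architecture from the paper.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, num_words=500):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),              # pool space, keep time
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # collapse space per frame
        )
        self.gru = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, num_words)

    def forward(self, clips):
        # clips: (batch, 1, frames, height, width) grayscale mouth crops
        x = self.frontend(clips)                       # (batch, 32, frames, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, frames, 32)
        _, hidden = self.gru(x)                        # hidden: (1, batch, 64)
        return self.head(hidden[-1])                   # word logits

model = LipReader()
logits = model(torch.randn(2, 1, 29, 88, 88))  # assumed 29-frame mouth crops
print(logits.shape)                            # torch.Size([2, 500])
```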

   Why this matters: Lip-reading is a classic omni-use AI technology – the technology will eventually aid the deaf or hard-of-hearing, but it will also be inevitably used for surveillance and, most likely, advertising as well. We should generally prepare for a world where – eventually – anything attached to a camera has the capability to have “human-equivalent” sensing capabilities for things like lip-reading. Society will drastically alter in response.
   Read more: Towards Pose-invariant Lip-Reading (Arxiv).

####################################################

How should Twitter deal with deepfakes?
…The social media company wants to hear from YOU!…
Twitter is currently figuring out what policies it should adopt for how it treats synthetic and manipulated media on its platform. That’s a problem which is going to become increasingly urgent as AI technologies for generating fake audio, images, text, and (soon) video mature. So Twitter is asking for public feedback on what it should do about synthetic media, and has shared some ideas for how it plans to approach the problem.

Twitter’s prescription for a safe synth media landscape: Twitter says it may place a notice next to tweets that share synthetic or manipulated media, might warn people if they are about to share something it suspects is fake, or might add a link to news articles explaining why the media in question is believed to be synthetic. Twitter might also remove tweets if they contain synthetic or manipulated content that “is misleading and could threaten someone’s physical safety or lead to other serious harm”, it writes.

Why this matters: What Twitter is trying to get ahead of here is the danger of false positive identification. I’m sure that if we could identify synthetic content with 99.9999999%+ accuracy, then the majority of companies would adopt a “take down first, appeal later” policy model. But we don’t live in that world: our best deepfake detectors are probably operating at accuracies in the mid-90s (percent), and though accuracy can be increased by ensembling models and pairing them with metadata, etc, it’s unlikely we’re going to get to 99%+ classification. That puts platforms in a tough position where they’re going to be unwilling to automatically take stuff down, because they’ll have a high enough false positive rate that they’ll irritate their users. So instead we’re going to exist in a halfway place for some years, where platforms are suffused with content that is a mixture of real and fake, while researchers develop more effective technical systems for automatic synthetic media identification.
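
   A quick back-of-envelope calculation makes the false positive problem concrete; every number below is an illustrative assumption, not a real platform statistic.

```python
# Back-of-envelope sketch of the false positive problem: even a detector that
# is right 95% of the time on both classes flags huge numbers of real posts
# when almost everything on the platform is real. All numbers are illustrative.
posts_per_day = 500_000_000       # assumed daily post volume
synthetic_rate = 0.001            # assume 0.1% of posts are synthetic
sensitivity = 0.95                # chance a synthetic post is flagged
specificity = 0.95                # chance a real post is correctly passed

synthetic = posts_per_day * synthetic_rate
real = posts_per_day - synthetic

true_positives = synthetic * sensitivity
false_positives = real * (1 - specificity)
precision = true_positives / (true_positives + false_positives)

print(f"Flagged synthetic posts: {true_positives:,.0f}")
print(f"Real posts wrongly flagged: {false_positives:,.0f}")
print(f"Share of flagged posts that are actually synthetic: {precision:.1%}")
```
   Under these assumptions, fewer than 2% of flagged posts would actually be synthetic, which is why a “take down first” policy is untenable at current accuracy levels.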
   Let Twitter know your thoughts by filling out this survey.
   Read more: Help us shape our approach to synthetic and manipulated media (official Twitter blog).

####################################################

Should we be worried about the size of deep learning models?
…In DL, quantity does sometimes equate to quality. But at what cost?…
Machine learning models are getting bigger. Specifically, the models used to do things like image classification or text analysis are getting larger as a consequence of researchers training them on more data using more computation. This trend has some people worried.

   How scale could be a problem: Large-scale AI models could be problematic for a few different reasons, writes Jameson Toole in a post on Medium. They could: 

  • Hinder democratization, as large models are by nature expensive to train. 
  • Restrict deployment, as large models will be hard to deploy on low-compute devices, like phones and internet-of-things products.

   So, what do we do? We should try and develop efficient architectures, like SqueezeNet, MobileNet, and others. Once we’ve trained our networks, we should use techniques like knowledge distillation, pruning, and quantization to further reduce their size.
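
   For the curious, here is a minimal sketch of what two of those shrinking techniques look like with PyTorch’s built-in utilities, applied to a toy model (the layer sizes and pruning amount are arbitrary choices).

```python
# Minimal sketch of two post-training shrinking techniques mentioned above,
# using PyTorch's built-in utilities on a toy model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 30% smallest-magnitude weights in the first layer.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")   # make the pruning permanent

# Dynamic quantization: store Linear weights as int8; activations are
# quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```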

   Why this matters: In the coming years, we can expect AI developers to continue to scale up the sizes of the models they’re developing, which will likely create AI systems with unparalleled capabilities in tricky domains. The challenge will be figuring out how to make these models available to large numbers of developers, either via cloud services or via methods for shrinking the models. Though I expect AI researchers will continue to push on efficiency, it’s likely that the scaling trend will continue for some years as well.
   Read more: Deep learning has a size problem (Jameson Toole, Medium).

####################################################

How should journalists cover AI? Researchers have some suggestions:
…Suggestions range from the obvious and sensible, to difficult and abstract…
Skynet Today, an AI news publication predominantly written by CS/AI students, has published an editorial about the dos and don’ts of covering artificial intelligence. Close your eyes and think of all the things that seem troubling about AI coverage. Got it? This article is basically a list of those grievances: Terminator photos, implications of autonomy where there isn’t any, a request for clarity about the role humans play.

   Don’t do as we do, do as we say! The article includes some recommendations that highlight how tricky it is for AI journalists to cover AI in a way that researchers might approve of. For instance, it suggests journalists not say programs “learn”, then notes that “it’s true that we AI researchers are often the ones who make use of intuitive, but misleading, word choices in the first place”. Since journalists mostly default to quoting things, this is tricky advice for them to follow.

   One fun thought: How might AI researchers react to some journalists writing “AI Research Best Practices, According to AI Journalists”? Badly, I’d imagine!
   Read more: AI Coverage Best Practices, According to AI Researchers (Skynet Today).

####################################################

Want to make a self-driving car? Try using Voyage Deepdrive!
…Simulators as strategic data-generators…
Self-driving car startup Voyage has released Voyage Deepdrive, an open source 3D world simulator for training self-driving cars. Deepdrive is developed primarily by Craig Quiter, a longtime open-source developer who now works for Voyage developing Deepdrive full-time (disclaimer: Craig and I worked together a bit a few years ago when he was at OpenAI).

   What is Deepdrive? Deepdrive is a simulator for training self-driving cars via reinforcement learning. Voyage wants to use Deepdrive to help it make safer, more intelligent cars, and wants to maintain the simulator as open source so that other developers can do research on a platform inspired by a real-world self-driving car company. (By comparison, Alphabet’s Waymo has a notoriously complex world simulator they use to train their cars, but they haven’t released it.)
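
   To illustrate where a simulator like this sits in an RL workflow, here is a generic gym-style interaction loop with a random policy. The environment ID below is a hypothetical placeholder, not Deepdrive’s actual API; check the repository for the real interface.

```python
# Generic gym-style interaction loop showing how a driving simulator slots
# into reinforcement learning. "Deepdrive-v0" is a hypothetical placeholder
# environment ID; see the Deepdrive repository for the real API.
import gym

env = gym.make("Deepdrive-v0")          # hypothetical env name
obs = env.reset()
total_reward = 0.0

for step in range(1000):
    action = env.action_space.sample()  # replace with a learned policy
    obs, reward, done, info = env.step(action)
    total_reward += reward              # e.g. progress, comfort, safety terms
    if done:
        obs = env.reset()

print("episode return (random policy):", total_reward)
env.close()
```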

   Leaderboards: Voyage has created a Deepdrive leaderboard where people can compete to see who can build the smartest self-driving cars. This will likely help draw developers to the platform, which could periodically boost Voyage’s own research via the generation of ideas by the external developer community. “In the coming months, Voyage will be hosting a series of competitions to encourage independent engineers and researchers to identify AI solutions to specific scenarios and challenges that actual self-driving cars face on the roads,” Voyage wrote in a Medium blog post.

   Why this matters: The curation and creation of datasets is a driver of research progress in supervised learning, as new datasets tend to create new challenges that highlight the drawbacks of contemporary techniques. In the same way, simulators are helping to drive (pun intended!) progress in reinforcement learning for robotics writ large. Systems like Deepdrive will help make the development of self-driving cars more transparent to the research community by providing an open simulator on which benchmarks can be developed. Let’s see if people use it!
   Read more: Introducing Voyage Deepdrive (Voyage, official Medium).
   Find out more at the official website (DeepDrive Voyage).
   Get the Deepdrive code here (official DeepDrive GitHub repository).
   Read about Waymo’s simulator here (The Atlantic).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

How the US military uses face recognition:
Newly released documents have revealed details about the military’s use of face recognition. The Automated Biometric Identification System (ABIS) is a biometric database covering 7.4 million individuals, storing information on anyone who comes into contact with the US military, including allied soldiers. It is connected to the FBI’s central database, which is in turn linked with state and local databases across the US. In the first half of 2019, thousands of individuals were identified using the biometric watch list (a subset of ABIS). Between 2008 and 2017 the DoD added 213,000 individuals to the watch list, and 1,700 people were arrested or killed on the basis of biometric and forensic matches.
   Why it matters: Earlier this year, the Axon Ethics Board argued that face recognition technology is not yet reliable enough to be used on police body-cams (see Import 154). The read-across to military cases is difficult, but accuracy is clearly important both for achieving military goals and for minimizing harm to civilians. It is important, therefore, that these technologies are not being used prematurely, and that their use is subject to proper oversight.
   Read more: This Is How the U.S. Military’s Massive Facial Recognition System Works (Medium).

####################################################

Tech Tales:

[2040]

The Ideation Filter

In the 20th century one big thing in popular psychology was the notion of ‘acting as if’. The thinking went like this: if you’re sad, act as if you’re happy. If you’re lazy, act as if you’re productive. If you’re weak, act as if you’re strong. And so on. 

In the late 20th century we started building AI systems and applied this philosophy to AI agents:

  • Can’t identify these images? We’ll push you into a classification regime where we get you to do your best, and we’ll set up an iterative system that continually improves your ability to classify things, so by ‘acting as if’ you can classify stuff, you’ll eventually be able to do it. 
  • Can’t move this robot hand? We’ll set you up with a simulator where you can operate the robot, and we’ll set up an iterative system that continually improves your ability to manipulate things. 
  • And so on. 

Would it surprise you to learn that as we got better at training AI agents, we also got better at training people using the same techniques? Of course, it took a lot longer than in simulation and required huge amounts of data. But some states tried it. 

Have a populace that doesn’t trust the government? Use a variety of AI techniques to ‘nudge’ them into thinking they might trust the government. 

Have a populace that isn’t performing at a desired economic efficiency level? Use a combination of automation, surveillance, and nudging tech to get them to be more productive, then slowly tweak things in the background until they’re consensually performing their services in the economy. 

Would it surprise you to learn that this stuff worked? We created our own clockwork societies. People could be pushed towards certain objectives, and though at first it was acting, it eventually became normal, and once it became normal people forgot they had ever acted.

You can imagine how the politicians took advantage of these capabilities.
You can imagine how the market took advantage of these capabilities.
You can still imagine. That’s probably the difference between you and I.

Things that inspired this story: “Nudge” techniques applied in contemporary politics; reinforcement learning; pop psychology; applying contemporary administrative logic to an AI-infused future.