Import AI 198: TSMC+USA = Chiplomacy; open source Deepfakes; and environmental justice via ML tools

Facebook wants an AI that can spot… offensive memes?
…The Hateful Memes Challenge is more serious than it sounds…
Facebook wants researchers to build AI systems that can spot harmful or hateful memes. This is a challenging problem: “Consider a sentence like ‘love the way you smell today’ or ‘look how many people love you’. Unimodally, these sentences are harmless, but combine them with an equally harmless image of a skunk or a tumbleweed, and suddenly they become mean,” Facebook writes.

The Hateful Memes Challenge: Now, similar to its prior ‘Deepfake Detection Challenge’, Facebook wants help from the wider AI community in developing systems that can better identify hateful memes. To do this, it has partnered with Getty Images to generate a dataset of hateful memes that also shows sensitivity to those content-miners of the internet, meme creators.
  “One important issue with respect to dataset creation is having clarity around licensing of the underlying content. We’ve constructed our dataset specifically with this in mind. Instead of trying to release original memes with unknown creators, we use ‘in the wild’ memes to manually reconstruct new memes by placing, without loss of meaning, meme text over a new underlying stock image. These new underlying images were obtained in partnership with Getty Images under a license negotiated to allow redistribution for research purposes,” they write.

The key figure: AI systems can get around 65% accuracy, while humans get around 85% accuracy – that’s a big gap to close.

Why this is hard from a research perspective: This is an inherently multimodal challenge – successful hateful meme-spotting systems won’t be able to rely solely on the text or the image contents of a given meme, but will have to analyze both together and jointly reason about them. It makes sense, then, that some of the baseline systems developed by Facebook use pre-training: they train models on large datasets, then finetune them on the meme data. Therefore, progress on this competition might encourage progress on multimodal work as a whole.
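To see why neither modality alone is enough, here is a toy illustration (our own construction, not Facebook’s data or baselines). Assume each meme is boiled down to one binary text feature and one binary image feature, and that a meme is “mean” exactly when the two disagree, like a compliment paired with a skunk. Under that hypothetical XOR-style labeling, the best possible text-only or image-only classifier (predict the majority label for each feature value) sits at chance, while one that looks at both features jointly is perfect.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: (text_feature, image_feature, label) triples.
# Assumption: label 1 ("mean") exactly when the two binary features disagree (XOR),
# so each feature on its own carries no information about the label.
memes = [(t, i, t ^ i) for t, i in product([0, 1], repeat=2) for _ in range(25)]


def best_lookup_accuracy(key_fn, data):
    """Accuracy of the best classifier that only sees key_fn(meme).

    For each distinct key, the optimal guess is the majority label among memes
    sharing that key, so we count how many memes that guess gets right.
    """
    counts = defaultdict(Counter)
    for meme in data:
        counts[key_fn(meme)][meme[2]] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(data)


text_only = best_lookup_accuracy(lambda m: m[0], memes)
image_only = best_lookup_accuracy(lambda m: m[1], memes)
joint = best_lookup_accuracy(lambda m: (m[0], m[1]), memes)
print(text_only, image_only, joint)  # 0.5 0.5 1.0
```

Real memes are far messier than an XOR, but the same logic is why a model that fuses both modalities can beat two separate unimodal scorers.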
Enter the competition, get the data: You can sign up for the competition and access the dataset here: Hateful Memes Challenge and Data Set (Facebook).
  Read more: The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes (arXiv).

####################################################

Care about publication norms in machine learning? Join an online discussion next week!
The Montreal AI Ethics Institute and the Partnership on AI have teamed up to host an online workshop about “publication norms for responsible AI”. This is part of a project by PAI to better understand how the ML community can publish research responsibly, while accounting for the impacts of AI technology to minimize downsides and maximize upsides.
  Sign up for the free discussion here (Eventbrite).

####################################################

Covid = Social Distancing = Robots++
One side-effect of COVID may be a push towards more types of automation. The CEO of robot shopping company Simbe Robotics says: “It creates an opportunity where there is actually more social distancing in the environment because the tasks are being performed by a robot and not a person,” according to Bloomberg. In other words – robots might be a cleaner way of cleaning. Expect more of this.
  Check out the quick video here (Bloomberg QuickTake, Twitter).

####################################################

Deepfake systems are well-documented, open source commodity tech now. What happens next?
…DeepFaceLab paper lays out how to build a Deepfake system…
Deepfakes, the slang term given to AI technologies that let you take someone’s face and superimpose it on someone else in an image or video, are a problem for the AI sector. That’s because deepfakes are made out of basic, multi-purpose AI systems that are themselves typically open source. And while some of the uses of deepfakes could be socially useful, like being able to create new forms of art, many of their immediate applications skew towards the malicious end of the spectrum, namely: pornography (particularly revenge porn) and vehicles for spreading political disinformation.
  So what do we do when Deepfakes are not only well documented in terms of code, but also integrated into consumer-friendly software systems? That’s the conundrum raised by DeepFaceLab, open source software on GitHub for the creation of deepfakes. In a new research paper, the lead author of DeepFaceLab (Ivan Petrov) and his collaborators (mostly freelancers) outline the system they’ve built and released as open source.

Publication norms and AI research: The paper doesn’t contain much detailed discussion of the inherent ethics of publishing or not publishing this technology. Their justification for this paper is, recursively, a quote of a prior justification from a 2019 paper about FSGAN: Subject Agnostic Face Swapping and Reenactment: “Suppressing the publication of such methods would not stop their development, but rather make them only available to a limited number of experts and potentially blindside policy makers if it goes without any limits”. Based on this quote, the DeepFaceLab authors say they “found we are responsible to publish DeepFaceLab to the academia community formally”.

Why this matters: We’re in the uncanny valley of AI research these days: we can make systems that generate synthetic text, images, video, and more. The reigning norm in the research community tends towards fully open source code and research. I think it’s unclear whether this is the smartest long-term approach if you’re keen to minimize downsides (see: today, deepfakes are mostly used for porn, which doesn’t seem like an especially good use of societal resources, especially since it inherently damages the economic bargaining power of human pornstars). We live in interesting times…
  Read more: DeepFaceLab: A simple, flexible and extensible face swapping framework (arXiv).
  Check out the code for DeepFaceLab here (GitHub).

####################################################

Facebook makes an ultra-cheap voice generator:
…What generates a second of speech in half a second and sounds like a human?…
In recent years, people have started using neural network-based techniques to synthesize voices for AI-based text-to-speech programs. This is the sort of technology that gives voice to Apple’s Siri, Amazon’s Alexa, and Google’s whatever-it-is. When generating these synthetic voices, there’s typically a tradeoff between efficiency (how fast you can generate the voice on your computer) and quality (how good it sounds). Facebook has developed some new approaches that give it a 160X speedup over its internal baseline, which means it can generate voices “in real time using regular CPUs – without any specialized hardware”.
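The speedup can be checked with a little arithmetic using the figures Facebook reports for this system: 80 seconds of compute per second of audio before optimization, 0.5 seconds after, and a 24 kHz output waveform. The sketch below expresses them as a real-time factor (compute time divided by audio duration; the framing is ours, not Facebook’s), where anything below 1.0 is faster than real time.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate reported for the vocoder


def real_time_factor(compute_seconds, audio_seconds):
    """Compute time per second of generated audio; below 1.0 means faster than real time."""
    return compute_seconds / audio_seconds


baseline = real_time_factor(80.0, 1.0)   # unoptimized: 80 s of compute per 1 s of audio
optimized = real_time_factor(0.5, 1.0)   # optimized: 0.5 s of compute per 1 s of audio
speedup = baseline / optimized           # matches the reported 160X
samples_per_compute_second = SAMPLE_RATE_HZ / optimized  # waveform samples produced per wall-clock second

print(baseline, optimized, speedup, samples_per_compute_second)  # 80.0 0.5 160.0 48000.0
```

In other words, to run at 0.5 seconds per second of 24 kHz audio, the system has to emit about 48,000 samples for every second it spends computing.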

With this technology, Facebook hopes to make “new voice applications that sound more human and expressive and are more enjoyable to use”. The tech has already been deployed inside Facebook’s ‘Portal’ videocalling system, as well as in applications like reading assistance and virtual reality.

What it takes to make a computer talk: Facebook’s system has four elements that, added together, create an expressive voice:
– A front-end that converts text into linguistic features
– A prosody model that predicts the rhythm and melody to create natural-sounding speech
– An acoustic model which generates the spectral representation of the speech
– A neural vocoder that generates a 24 kHz speech waveform, conditioned on prosody and spectral features
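The four stages form a straight pipeline, and the minimal Python sketch below shows only how data flows between them. Every stage is a hypothetical placeholder: letters stand in for phones, a fixed 80 ms duration and falling pitch stand in for the prosody model, zero-valued frames stand in for spectra, and a sine wave stands in for the neural vocoder. The 10 ms frame size and 80 spectral bins are assumptions; only the 24 kHz sample rate comes from the post.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 24_000                  # from the post: 24 kHz output waveform
FRAME_MS = 10                            # assumption: one spectral frame every 10 ms
HOP = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 240 waveform samples per spectral frame
N_BINS = 80                              # assumption: 80 spectral bins per frame


@dataclass
class Phone:
    symbol: str
    duration_ms: int = 0    # filled in by the prosody stage
    pitch_hz: float = 0.0   # filled in by the prosody stage


def front_end(text: str) -> List[Phone]:
    # Placeholder for text -> linguistic features: one "phone" per letter.
    return [Phone(ch) for ch in text.lower() if ch.isalpha()]


def prosody_model(phones: List[Phone]) -> List[Phone]:
    # Placeholder for rhythm and melody: fixed 80 ms per phone, gently falling pitch.
    for k, p in enumerate(phones):
        p.duration_ms = 80
        p.pitch_hz = 200.0 - 2.0 * k
    return phones


def acoustic_model(phones: List[Phone]) -> List[List[float]]:
    # Placeholder for the spectral representation: one zero frame per FRAME_MS.
    n_frames = sum(p.duration_ms for p in phones) // FRAME_MS
    return [[0.0] * N_BINS for _ in range(n_frames)]


def vocoder(frames: List[List[float]], phones: List[Phone]) -> List[float]:
    # Placeholder vocoder conditioned on prosody only (a sine at each phone's pitch),
    # emitting HOP samples per spectral frame; a neural vocoder would also use frame contents.
    frame_pitch = [p.pitch_hz for p in phones for _ in range(p.duration_ms // FRAME_MS)]
    samples, phase = [], 0.0
    for pitch in frame_pitch[: len(frames)]:
        for _ in range(HOP):
            phase += 2 * math.pi * pitch / SAMPLE_RATE_HZ
            samples.append(0.1 * math.sin(phase))
    return samples


def text_to_speech(text: str) -> List[float]:
    phones = prosody_model(front_end(text))
    frames = acoustic_model(phones)
    return vocoder(frames, phones)


wave = text_to_speech("Hello")
print(len(wave), len(wave) / SAMPLE_RATE_HZ)  # 9600 0.4
```

Keeping the stages separate like this is what lets each one be optimized or swapped independently, which is where the efficiency work below comes in.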

Going from an expensive to a cheap system: Facebook’s unoptimized text-to-speech system could generate one second of audio in 80 seconds – with optimizations, it cut this to one second of audio in 0.5 seconds. To do this they made a number of optimizations, including model sparsification (basically reducing the number of parameters you need to activate during execution), blockwise sparsification, multicore support, and other tricks.
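Blockwise sparsification can be sketched as pruning whole tiles of a weight matrix rather than individual weights. The minimal Python example below is our illustration, not Facebook’s implementation: it assumes magnitude-based pruning (keep the tiles with the largest L2 norm), a 4×4 tile size, and a 25% keep rate, all hypothetical choices. The usual motivation for blocks is that surviving weights stay contiguous, which tends to suit CPU caches and vector instructions better than scattered individual zeros.

```python
import random


def block_sparsify(weights, block=4, keep_fraction=0.25):
    """Zero out whole block x block tiles, keeping the tiles with the largest L2 norm."""
    rows, cols = len(weights), len(weights[0])
    assert rows % block == 0 and cols % block == 0, "matrix must tile evenly"
    tiles = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            norm = sum(weights[i][j] ** 2
                       for i in range(r, r + block)
                       for j in range(c, c + block)) ** 0.5
            tiles.append((norm, r, c))
    tiles.sort(reverse=True)  # largest-norm tiles first
    n_keep = max(1, round(keep_fraction * len(tiles)))
    kept = [(r, c) for _, r, c in tiles[:n_keep]]
    pruned = [[0.0] * cols for _ in range(rows)]
    for r, c in kept:
        for i in range(r, r + block):
            for j in range(c, c + block):
                pruned[i][j] = weights[i][j]
    return pruned, kept


def sparse_matvec(pruned, kept, x, block=4):
    """Matrix-vector product that only visits the surviving tiles."""
    y = [0.0] * len(pruned)
    for r, c in kept:
        for i in range(r, r + block):
            y[i] += sum(pruned[i][j] * x[j] for j in range(c, c + block))
    return y


random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(16)]
x = [random.gauss(0.0, 1.0) for _ in range(16)]
pruned, kept = block_sparsify(W)
dense = [sum(row[j] * x[j] for j in range(16)) for row in pruned]
sparse = sparse_matvec(pruned, kept, x)
print(len(kept), "of 16 tiles kept")                            # 4 of 16 tiles kept
print(max(abs(a - b) for a, b in zip(dense, sparse)) < 1e-9)    # True
```

With 4 of 16 tiles kept, the sparse product does a quarter of the multiply-adds of the dense one; how much of that turns into wall-clock speedup depends on the kernel and the hardware.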

Why this matters: Facebook says its “long-term goal is to deliver high-quality, efficient voices to the billions of people in our community”. (Efficient voices – imagine that!). I think it’s likely within ~2 years we’ll see Facebook create a variety of different voice systems, including ones that people can tune themselves (imagine giving yourself a synthetic version of your own voice to automatically respond to certain queries – that’ll become technologically possible via finetuning, but whether anyone wants to do that is another question).
  Read more: A highly efficient, real-time text-to-speech system deployed on CPUs (Facebook AI blog).

####################################################

Recognizing industrial smoke emissions with AI as a route to environmental justice:
…Data for the people…
Picture this: it’s 2025 and you get a push notification on your phone that the local industrial plant is polluting again. You message your friends and head to the site, knowing that the pollution event has already been automatically logged, analyzed, and reported to the authorities.
  How do we get to that world? New research from Carnegie Mellon University and Pennsylvania State University shows how: they build a dataset of industrial smoke emissions by using cameras to monitor three petroleum coke plants over several months. They use the resulting data – 12,567 distinct video clips, representing 452,412 frames – to train a deep learning-based classifier to spot signs of pollution. This system gets about 80% accuracy today (which isn’t good enough for real-world use), but I expect future systems based on subsequently developed techniques will improve performance further.
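At roughly 80% accuracy, raising an alarm on every single positive clip would produce plenty of false alerts, so a deployment like the one imagined above would likely smooth predictions over time. The sketch below is a hypothetical alerting rule, not part of the RISE paper: it fires when at least 4 of the last 5 clip-level predictions say “smoke” (both thresholds are assumptions) and re-arms only once the window is clear. As a side note, the quoted dataset sizes work out to exactly 36 frames per clip.

```python
from collections import deque

FRAMES_PER_CLIP = 452_412 // 12_567  # 36 frames per clip, from the dataset sizes above


def smoke_alerts(clip_predictions, window=5, min_positive=4):
    """Yield the index of each clip at which a (hypothetical) alert fires.

    An alert fires when at least `min_positive` of the last `window` clip-level
    predictions are positive; the rule re-arms once the window has no positives.
    """
    recent = deque(maxlen=window)
    active = False
    for idx, is_smoke in enumerate(clip_predictions):
        recent.append(bool(is_smoke))
        hits = sum(recent)
        if not active and hits >= min_positive:
            active = True
            yield idx
        elif active and hits == 0:
            active = False


# Hypothetical classifier outputs (1 = smoke) for 20 consecutive clips.
preds = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(list(smoke_alerts(preds)))  # [7, 19]
```

The isolated positives at clips 0 and 3 never trigger an alert, while the two sustained runs do; tuning `window` and `min_positive` trades missed events against false alarms.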

Why this matters: To conduct this research, the team “collaborated with air quality grassroots communities in installing the cameras, which capture an image approximately every 10 seconds”. They also worked with local volunteers as well as workers on Amazon Mechanical Turk to label their data. These activities point towards a world where we can imagine AI practitioners teaming up with local people to build specific systems to deal with local needs, like spotting a serial polluter. I think ‘Environmental Justice via Deep Learning’ is an interesting tagline to aim for.
  Get the data and code here (GitHub).
  Read more: RISE Video Dataset: Recognizing Industrial Smoke Emissions (arXiv).

####################################################

Wondering how to write about the broader impacts of your research? The Future of Humanity Institute has put together a guide:
…Academic advice should help researchers write ‘broader impacts’ for NeurIPS submissions…
AI is influencing the world in increasingly varied ways, ranging from systems that alter the economics of certain types of computation, to tools that may exhibit biases, to software packages that enable things with malicious use potential (e.g., deepfake software). This year, major AI conference NeurIPS has introduced a requirement that paper submissions include a section about the broader impacts of the research. Researchers from industry and academia have written a guide to help researchers write these statements.

How to talk about the broader impacts of AI:
– Discuss the benefits and risks of research
– Highlight uncertainties
– Focus on tractable, neglected, and significant impacts
– Integrate with the introduction of the paper
– Think about impacts even for theoretical work
– Figure out where your research sits in the ‘stack’ (e.g., researcher-facing or user-facing)

Why this matters: If we want the world to develop AI responsibly, then encouraging researchers to think about their inherent moral and ethical agency with regard to their research seems like a good start. One critique I hear of things like mandating ‘broader impacts’ statements is that they can lead to fuzzy or mushy reasoning (compared to the more rigorous technical sections), and/or can lead to researchers making assumptions about fields in which they don’t do much work (e.g., social science). Both of these are valid criticisms. My response is that one of the best ways to create more rigorous thinking here is to get a larger proportion of the research community oriented around thinking about impacts, which is what things like the NeurIPS requirement do. There’ll be some very interesting meta-analysis papers to write about how different authors approach these sections.
  Read more: A Guide to Writing the NeurIPS Impact Statement (Centre for the Governance of AI, Medium).

####################################################

Chiplomacy++: US and TSMC agree to build US-based chip plant:
…Made in the USA: Gibson Guitars, Crayola Crayons, and… TSMC semiconductors?…
TSMC, the world’s largest contract chip manufacturer (customers include: Apple, Huawei, others), will build a semiconductor manufacturing facility in the USA. This announcement marks a significant moment in the reshoring of semiconductor manufacturing in America. The US government looms in the background of the deal, given mounting worries about the national security risks of technological supply chains.

Chiplomacy++: This deal is also a clear example of Chiplomacy, the phenomenon where politics drives decisions about the production and consumption of computational capacity.

Recent examples of Chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML from exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applies 25% import tariffs to (some) Chinese semiconductors.

Key details:
– Process node: 5-nanometer. (TSMC began producing small runs of 5nm chips in 2019, so the US facility might be a bit behind the industry’s cutting edge when it comes online).
– Cost: $12 billion.
– Projected construction completion year: 2024.
– Capacity: ~20,000 wafers a month versus hundreds of thousands at the main TSMC facilities overseas.
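A rough back-of-envelope using the reported figures: the sketch below converts the planned capacity into annual output and into capital cost per unit of monthly capacity. It assumes the plant runs at full capacity all year and ignores ramp-up and yield; the 200,000 wafers-a-month comparison point is a hypothetical round number at the low end of “hundreds of thousands”, not a TSMC figure.

```python
WAFERS_PER_MONTH = 20_000            # reported capacity of the planned US fab
COST_USD = 12_000_000_000            # reported cost
MAIN_FAB_WAFERS_PER_MONTH = 200_000  # hypothetical comparison point, not a TSMC figure

wafers_per_year = WAFERS_PER_MONTH * 12               # assumes full capacity all year
cost_per_monthly_wafer = COST_USD / WAFERS_PER_MONTH  # dollars per wafer/month of capacity
relative_size = WAFERS_PER_MONTH / MAIN_FAB_WAFERS_PER_MONTH

print(wafers_per_year, cost_per_monthly_wafer, relative_size)  # 240000 600000.0 0.1
```

Even under these generous assumptions, the US plant would be roughly a tenth the size of one of TSMC’s big overseas facilities.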

Why this matters: Many historians think that one of the key resources of the 20th century was oil – how companies used it, controlled it, and invested in systems to extract it, influenced much of the century. Could compute be an equivalently important input for countries in the 21st century? Deals like the US-TSMC one indicate so…
  Read more: Washington in talks with chipmakers about building U.S. factories (Reuters).
  Read more: TSMC Plans $12 Billion U.S. Chip Plant in Victory for Trump (Bloomberg).
  Past Import AIs: #181: Welcome to the era of Chiplomacy!; how computer vision AI techniques can improve robotics research; plus, Baidu’s adversarial AI software (Import AI).

####################################################

Tech Tales:

2028
Getting to know your Daggit (V4)

Sometimes you’ll find Daggit watching you. That’s okay! Daggit is trying to learn about what you like to do, so Daggit can be more helpful to you.

If you want Daggit to pay attention to something, say ‘Daggit, look over there’, or point. Try pointing and saying something else, like ‘Daggit, what is that?’ – you’ll be surprised at what Daggit can do.

Daggit is always learning – and so are we. We use anonymized data from all of our customers to make Daggit smarter, so don’t be surprised if you wake up and Daggit has a new skill. 

You can make your home more secure with Daggit – try asking Daggit to ‘patrol’ when you go to bed, and Daggit will monitor your house for you. (If Daggit spots intruders when in ‘patrol’ mode, it will automatically call the authorities.)

Daggit can’t get angry, but Daggit can get sad. If you’re not kind to your Daggit, don’t expect it to be happy when it sees you.

Things that inspired this story: Boston Dynamics’ Spot robot; imitation learning; continued progress in reinforcement learning and generalization; federated learning; customer service manuals and websites; on-device learning.