Import AI 179: Explore Arabic text with BERT-based AraNet; get ready for the teenage-made deepfakes; plus DeepMind AI makes doctors more effective

by Jack Clark

Explore Arabic-language text with AraNet:
…Making culture legible with pre-trained BERT models…
University of British Columbia researchers have developed AraNet, software to help people analyze Arabic-language text for identifiers like age, gender, dialect, emotion, irony and sentiment. Tools like AraNet help make cultural outputs (e.g., tweets) legible to large-scale machine learning systems and thereby help broaden cultural representation within the datasets and classifiers used in AI research.

What does AraNet contain? AraNet is essentially a set of pre-trained models, along with software for using AraNet via the command line or as a Python package. The models have typically been fine-tuned from Google’s “BERT-Base Multilingual Cased” model, which was pre-trained on 104 languages. AraNet includes the following models (a code sketch follows the list):

  • Age & Gender: Arab-Tweet, a dataset of tweets from different users of 17 Arabic countries, annotated with gender and age labels. UBC Twitter Gender dataset, an in-house dataset with gender labels applied to 1,989 users from 21 Arab countries.
  • Dialect identification: AraNet reuses a previously developed model built for the ‘MADAR’ Arabic Fine-Grained Dialect Identification task.
  • Emotion: LAMA-DINA dataset where each tweet is labelled with one of eight primary emotions, with a mixture of human- and machine-generated labels. 
  • Irony: A dataset drawn from the IDAT@FIRE2019 competition, which contains 5,000 tweets related to events taking place in the Middle East between 2011 and 2018, labeled according to whether the tweets are ironic or non-ironic. 
  • Sentiment: 15 datasets relating to sentiment analysis, which are edited and combined (with labels normalized to positive or negative, and ‘neutral’ or otherwise-labeled samples excluded).
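
To give a concrete sense of the recipe, here’s a minimal sketch (my illustration, not AraNet’s released code) of putting a classification head on Google’s multilingual BERT with the HuggingFace ‘transformers’ library. The checkpoint name is real, but the classification head is randomly initialized until you fine-tune it on a labeled dataset like those above:

    # Sketch only: a 2-way (e.g. positive/negative sentiment) classifier on
    # top of multilingual BERT. The head is untrained here; AraNet's actual
    # fine-tuned weights are not assumed.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)

    inputs = tokenizer("أنا سعيد جدا اليوم", return_tensors="pt")  # "I am very happy today"
    with torch.no_grad():
        logits = model(**inputs).logits
    print(["negative", "positive"][logits.argmax(-1).item()])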

Why this matters: AI tools let us navigate digitized cultures – once we have (vaguely reliable) models we can start to search over large bodies of cultural information for abstract things, like the presence of a specific emotion, or the use of irony. I think tools like AraNet are going to eventually give scholars with expert intuition (e.g., experts on, say, Arabic blogging during the Arab Spring) tools to extend their own research, generating new insights via AI. What are we going to learn about ourselves along the way, I wonder?
  Read more: AraNet: A Deep Learning Toolkit for Arabic Social Media (Arxiv).
   Get the code here (UBC-NLP GitHub) – note, when I wrote this section on Saturday the 4th the GitHub repo wasn’t yet online; I emailed the authors to let them know. 

####################################################

Deep learning isn’t all about terminators and drones – Chinese researchers make a butterfly detector!
…Take a break from all the crazy impacts of AI and think about this comparatively pleasant research…
I spend a lot of time in this newsletter writing about surveillance technology, drone/robot movement systems, and other symptoms of the geopolitical changes brought about by AI. So sometimes it’s nice to step back and relax with a paper about something quite nice: butterfly identification! Here, researchers with Beijing Jiaotong University publish a simple, short paper on using YOLOv3 for butterfly identification.

Make your own butterfly detector: The paper gives us a sense of how (relatively) easy it is to create high-performance object detectors for specific types of imagery. 

  1. Gather data: In this case, they label ~1,000 photos of butterflies, using data from the 3rd China Data Mining Competition butterfly recognition contest as well as images gathered by searching for specific types of butterflies on the Baidu search engine. 
  2. Train and run models: Train multiple YOLO v3 models with different image sizes as input data, then combine results from multiple models to make a prediction (see the sketch after this list). 
  3. Obtain a system that gets around 98% accuracy at locating butterflies in photos, with lower accuracies for fine-grained species identification. 
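
Here’s a rough sketch of what step 2’s ensembling could look like in PyTorch – ‘detect_fn’ is a hypothetical stand-in for a trained YOLOv3 forward pass, and the paper’s exact fusion rule may differ:

    # Run detectors at several input resolutions, pool their boxes, and
    # de-duplicate overlapping predictions with non-maximum suppression.
    import torch
    from torchvision.ops import nms

    def ensemble_detect(image, detect_fn, scales=(320, 416, 608), iou_thresh=0.5):
        boxes, scores = [], []
        for size in scales:
            b, conf = detect_fn(image, input_size=size)  # (N, 4) boxes, (N,) confidences
            boxes.append(b)
            scores.append(conf)
        boxes, scores = torch.cat(boxes), torch.cat(scores)
        keep = nms(boxes, scores, iou_thresh)  # suppress duplicate detections
        return boxes[keep], scores[keep]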

Why this matters: Deep learning technologies let us automate some (basic) human sensory capabilities, like certain vision or audio identification tasks. The 2020s will be the decade of personalized AI, in which we’ll see it become increasingly easy for people to gather small datasets and train their own classifiers. I can’t wait to see what people come up with!
   Read more: Butterfly detection and classification based on integrated YOLO algorithm (Arxiv).

####################################################

Prepare yourself for watching your teenage kid make deepfakes:
…First, deepfakes industrialized. Now, they’re being consumerized…
Tik Tok & Douyin: Bytedance, the Chinese company behind smash hit app TikTok, is making it easier for people to make synthetic videos of themselves. The company recently added code for a ‘Face Swap’ feature to the latest versions of its TikTok and Douyin Android apps, according to TechCrunch. This unreleased technology would, according to unpublished application notes, let a user take a detailed series of photos of their face, then they can easily morph their face to match a target video, like pasting themselves into scenes from the Titanic or reality TV.
   However, the feature may only come to the Chinese version of the app (Douyin): “After checking with the teams I can confirm this is definitely not a function in TikTok, nor do we have any intention of introducing it. I think what you may be looking at is something slated for Douyin – your email includes screenshots that would be from Douyin, and a privacy policy that mentions Douyin. That said, we don’t work on Douyin here at TikTok”, a TikTok spokesperson told TechCrunch. They later told TechCrunch that “The inactive code fragments are being removed to eliminate any confusion,” which implicitly confirms that Face Swap code was found in TikTok.

Snapchat: Separately, Snapchat has acquired AI Factory, a company that had been developing AI tech to let a user take a selfie, then paste and animate that selfie into another video, according to TechCrunch. This technology isn’t quite as amenable to making deepfakes out of the box as the potential TikTok & Douyin feature, but it gives us a sense of the direction Snap is headed in.

Why this matters: For the past half decade, AI technologies for generating synthetic images and video have been improving. So far, many of the abuses of the technology have either occurred abroad (see: misogynistic disinformation in India, alleged propaganda in Gabon), or in pornography. Politicians have become worried that they’ll be the next targets. No one is quite sure how to approach the threats posed by deepfakes, but people tend to think awareness might help – if people start to see loads of deepfakes around them on their social media websites, they might become a bit more skeptical of the ones they encounter in the wild. If face swap technology comes to TikTok or Douyin soon, then we’ll see how this alters awareness of the technology. If it doesn’t arrive in these apps soon, then we can assume it’ll show up somewhere else, as a less scrupulous developer rolls out the technology. (A year and a half ago I told a journalist I thought the arrival of deepfake-making meme kids could precede further malicious use of the technology.)
   Read more: ByteDance & TikTok have secretly built a deepfakes maker (TechCrunch).

####################################################

Play AI Dungeon on your… Alexa?
…GPT-2-based dungeon crawler gets a voice mode…
Have you ever wanted to yell commands at a smart speaker like “travel back in time”, “melt the cave”, and “steal the cave”? If so, your wishes have been fulfilled as enterprising developer Braydon Batungbacal has ported AI Dungeon so it works on Amazon’s voice-controlled Alexa system. AI Dungeon (Import AI #176) is a GPT-2-based dungeon crawler that generates infinite, absurdly mad adventures. Play it here, then get the Alexa app.
   Watch the voice-controlled AI Dungeon video here (Braydon Batungbacal, YouTube).
   Play AI Dungeon here (AIDungeon.io).

####################################################

Google’s morals subverted by money, alleges former executive:
…Pick one: A strong human rights commitment, or a significant business in China…
Ross LaJeunesse, a former Google executive turned Democratic candidate, says he left the company after commercial imperatives quashed its prior commitment to “Don’t Be Evil”. In particular, LaJeunesse alleges that Google prioritized growing its cloud business in China to the point that it wouldn’t adopt strong language around respecting human rights (the unsaid thing here is that China carries out a bunch of government-level activities that appear to violate various human rights principles). 

Why this matters: Nationalism isn’t compatible with Internet-scale multinational capitalism – fundamentally, the incentives of a government like the USA have become different from the incentives of a multinational like Google. As long as this continues, people working at these companies will find themselves put in the odd position of trying to make moral and ethical policy choices, while steering a proto-country that is inexorably drawn to making money instead of committing to anything. “No longer can massive tech companies like Google be permitted to operate relatively free from government oversight,” LaJeunesse writes. “I saw the same sidelining of human rights and erosion of ethics in my 10 years,” wrote Liz Fong-Jones, a former Google employee.
   Read more: I Was Google’s Head of International Relations. Here’s Why I Left (Medium).

####################################################

DeepMind makes human doctors more efficient with breast cancer-diagnosing assistant system:
…Better breast cancer screening via AI…
DeepMind has developed a breast cancer screening system that outperforms diagnoses made by individual human specialists. The system is an ensemble of three deep learning models, each of which operates at a different level of analysis (e.g., classifying individual lesions, versus whole breasts). The system was tested on both US and UK patient data, and was on par with human experts on the UK data and superior to human experts on the US data. (The reason for the discrepancy between US and UK results is that patient records are typically checked by two readers in the UK, versus one in the US.)

How do you deploy a medical AI system? Deploying medical AI systems is going to be tricky – humans have different levels of confidence in machine versus human insights, and it seems like it’d be irresponsible to simply swap an expert with an AI system. DeepMind has experimented with using the AI system as an assistant for human experts, where its judgements can inform the human. In simulated experiments, DeepMind says “an AI-aided double-reading system could achieve non-inferior performance to the UK system with only 12% of the current second reader workload.” 
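
In pseudocode, the simulated workflow is roughly the following – a hedged sketch of the protocol as described, not DeepMind’s code, and the reader functions are hypothetical stand-ins:

    # The AI acts as a second reader; a human second opinion is only
    # requested when the AI disagrees with the first human reader.
    def double_read(case, first_reader, ai_model, second_reader):
        human_opinion = first_reader(case)
        if ai_model(case) == human_opinion:
            return human_opinion, False   # consensus: no second human needed
        # Disagreement: fall back to the standard second (human) reader.
        return second_reader(case), True  # True -> second-reader workload incurred

Because the AI and the first human reader agree on most cases, the second human only gets consulted on a small fraction of them – which is how you get to something like 12% of the original second-reader workload.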

Why this matters: Life is a lot like land – no one is making any more of it. Therefore, people really value their ability to be alive. If AI systems can help people live longer through proactive diagnosis, then societal attitudes to AI will improve. For people to be comfortable with AI, we should find ways to heal and educate people, rather than just advertise to and surveil them; systems like this from DeepMind give us these motivating examples. Let’s make more of them.
   Read more: International evaluation of an AI system for breast cancer screening (DeepMind).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Offense-defense balance and advanced AI:
Adversarial situations can differ in terms of the ‘offense-defense balance’: the relative ease of carrying out, and defending against, an attack – e.g. the invention of barbed wire and machine guns shifted the balance towards defense in European ground warfare. New research published in the Journal of Strategic Studies tries to work out how the offense-defense balance shifts as investments in a conflict scale up.

AI and scaling: The effects of new technologies (e.g. machine guns) and new types of conflict (e.g. trench warfare) on offense-defense balance are well-studied, but the effect of scaling up existing technologies in familiar domains has received less attention. Scalability is a key feature of AI systems: the marginal cost of improving software is low and will fall as the cost of computing declines, and AI-supported automation will reduce the marginal cost of some services (e.g. cyber vulnerability discovery) to close to zero. So understanding how offense-defense (O-D) balance shifts as investments scale up is an important way of forecasting how adversarial domains like warfare and cybersecurity will behave as AI develops.

Offensive-then-defensive scaling: This paper develops a model that reveals the phenomenon of offensive-then-defensive scaling (‘O-D scaling’), whereby initial investments favour attackers, up until a saturation point, after which further investments always favour defenders. They show that O-D scaling is exhibited in land invasion and cybersecurity under certain assumptions, and suggest that there are general conditions where we should expect this dynamic – conflicts where there are multiple attack vectors, where these can be saturated by a defender, and where defense is locally superior (i.e. wins in evenly matched contests). They argue these are plausible in many real-world cases, and that O-D scaling is therefore a useful baseline assumption. 
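
Here’s a toy numerical illustration of that dynamic (my own construction, not the paper’s formal model): assume several attack vectors, each of which saturates (extra attacking force beyond some capacity is wasted), and locally superior defense. Both sides get the same budget:

    # Toy model: N attack vectors, each saturating at CAPACITY; defense is
    # locally superior (holds a vector with 1/LOCAL_EDGE of the attack there).
    N_VECTORS, CAPACITY, LOCAL_EDGE = 10, 100.0, 1.5

    def attacker_wins(budget):
        defense_per_vector = budget / N_VECTORS   # defender must cover every vector
        effective_attack = min(budget, CAPACITY)  # attacker concentrates on one vector
        return defense_per_vector * LOCAL_EDGE < effective_attack

    for budget in (50, 200, 400, 700, 1000):
        print(budget, "offense wins" if attacker_wins(budget) else "defense holds")
    # Low budgets favor the attacker; past the saturation point
    # (N_VECTORS * CAPACITY / LOCAL_EDGE ~= 667 here), defense always holds.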

Why it matters: Understanding the impact of AI on international security is important for ensuring things go well, but technology forecasting is difficult. The authors claim that one particular feature of AI that we can reliably foresee – its scalability – will influence conflicts in a predictable way. It seems like good news that, once we pass through the period of offense-dominance, we can expect defense to dominate in the long run, though the authors note there is still disagreement on whether defense-dominated scenarios are more stable. 
   Read more: How does the offense-defense balance scale? (Journal of Strategic Studies).
   Read more: Artificial Intelligence, Foresight, and the Offense-Defense Balance (War on the Rocks).

2019 AI safety literature review:
This is a thorough review of research on AI safety and existential risk over the past year. It provides an overview of all the organisations working in this small but fast-growing area, an assessment of their activities, and some reflections on how the field is developing. It is an invaluable resource for anyone considering donating to charities working in these areas, and for understanding the research landscape.
   Read more: 2019 AI Alignment Literature Review and Charity Comparison (LessWrong).

####################################################

Tech Tales:

Digital Campaign
[Westminster, London. 2025]

I don’t remember when I stopped caring about the words, but I do remember the day when I was staring at a mixture of numbers on a screen and I felt myself begin to cry. The numbers weren’t telling me a poem. They weren’t confessing something from a distant author that echoed in myself. But they were telling me about resonance. They were telling me that the cargo they controlled – the synthetic movie that would unfold once I fired up this mixture of parameters – would inspire an emotion that registered as “life-changing” on our Emotion Evaluation Understudy (EEU) metric.

Verified? I said to my colleagues in the control room.
Verified, said my assistant, John, who looked up from his console to wipe a solitary tear from his eye.
Do we have cargo space? I asked.
We’ve secured a tranche of evening bot-time, as well as segments of traditional media, John said.
And we’ve backtested it?
Simulated rollouts show state-of-the-art engagement.
Okay folks, I said. Let’s make some art.

It’s always anticlimactic, the moment where you turn it on. There’s a lag of anywhere between a sub-second and a full minute, depending on the size of the system. Then the dangerous part – it’s easy to get fixated on earlier versions of the output, easy to find yourself getting more emotional at the stuff you see early in training than the stuff that appears later. Easy to want to edit the computer. This is natural. This is a lot like being a parent, someone told you in a presentation on ‘workplace psychology for reliable science’. It’s natural to be proud of them when they’ve only just begun to walk. After that, everything seems easy.

We wait. Then the terminal prints “task completion”. We send our creation out onto the internet and the radio and the airwaves: full multi-spectrum broadcast. Everyone’s going to see it. We don’t watch the output ourselves – though we’ll review it in our stand-up meeting tomorrow.

Here, in the sealed bunker, I am briefly convinced I can hear cheering begin to come from the street outside. I am imagining people standing up, eyes welling with tears of laughter and pain, as they receive our broadcast. I am trying to imagine what a state-of-the-art Emotion Evaluation Understudy system means.

Things that inspired this story: AI+Creativity, taken to its logical conclusion; the ‘Two hands are a lot’ blog post from Dominic Cummings; BLEU scores and the general misleading nature of metrics; nudge campaigns; political messaging; synthetic text advances; likely advances in audio and video synthesis; a dream I had at the turn of 2019/2020 in which I found myself in a control room carefully dialing in the parameters of a language model, not paying attention to the words but knowing that each variable I tuned inspired a different feeling.