Import AI

Import AI 185: Dawn of the planetary search engine; GPT-2 poems; and the UK government’s seven rules for AI providers

Dawn of the planetary satellite-imagery search engine:
…Find more images like this, but for satellite imagery…
AI startup Descartes Labs has created a planet-scale search engine that lets people use the equivalent of ‘find more images like this’, but instead of uploading an image into a search engine and getting a response back, they upload a picture of somewhere on Earth and get a set of visually similar locations.

How they did it: To build this, the authors used four datasets – for the USA, they used aerial imagery from the National Agriculture Imagery Program (NAIP), as well as the Texas Orthoimagery Program. For the rest of the world, they used data from Landsat 8. They then took a stock 50-layer ResNet pre-trained on ImageNet and made a couple of tweaks: they injected noise during training to make it easier for the network to learn to make binary classification decisions, and they lightly customized how features are extracted from networks trained against different datasets. Through this, they obtained a set of 512-bit feature vectors, which make it possible to search for complex things like visual similarity.
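
To make the recipe concrete, here is a minimal sketch of the general approach – extract features from a pretrained ResNet, binarize them, and rank candidate tiles by Hamming distance. The model, binarization rule, and thresholds are illustrative stand-ins, not Descartes Labs’ actual pipeline.
```python
# Minimal sketch of "find more tiles like this": pretrained ResNet features,
# crude binarization, and Hamming-distance ranking. Illustrative only.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(pretrained=True)   # stock ImageNet-pretrained ResNet-50
backbone.fc = torch.nn.Identity()             # keep the 2048-d features, drop the classifier
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def binary_code(path):
    """Return a binary feature vector for one image tile."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = backbone(x).squeeze(0).numpy()
    return (feats > feats.mean()).astype(np.uint8)   # crude binarization

def search(query_path, index_paths, top_k=5):
    """Rank indexed tiles by Hamming distance to the query tile."""
    q = binary_code(query_path)
    codes = np.stack([binary_code(p) for p in index_paths])
    dists = (codes != q).sum(axis=1)
    return [index_paths[i] for i in np.argsort(dists)[:top_k]]
```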

How well does it work: In tests, performance is reasonable but not stellar, with top-30 accuracies of around 80% on the things they’ve fine-tuned the network against. However, in qualitative tests it feels like its performance may be higher than this for most use cases – I’ve played around with the Descartes Labs website where you can test out the system; it does reasonably well when you click around, identifying things like intersections and football stadiums well. I think a lot of the places where it gets confused come from the relatively low resolution of the satellite imagery, which makes fine-grained judgements more difficult.

Why this matters: Systems like this give us a sense of how AI lets us do intuitive things easily that would otherwise be ferociously difficult – just twenty years ago, asking for a system that could show you similar satellite images would be a vast undertaking with significant amounts of hand-written features and bespoke datasets. Now, it’s possible to create this system with a generic pre-trained model, a couple of tweaks, and some generally available unclassified datasets. I think AI systems are going to unlock lots of applications like this, letting us query the world with the sort of intuitive commands (e.g., similar to), that we use our own memories for today.
  Read more: Visual search over billions of aerial and satellite images (arXiv).
  Try the AI-infused search for yourself here (Descartes Labs website).

####################################################

Generating emotional, dreamlike poems with GPT-2:
…If a poem makes you feel joy or sadness, then is it good?…
Researchers with Drury University and UC-Colorado Springs have created a suite of fine-tuned GPT-2 models for generating poetry with different emotional or stylistic characteristics. Specifically, they create five separate data corpuses of poems that, in their view, represent the emotions anger, anticipation, joy, sadness, and trust. They then fine-tune the medium GPT-2 model against these datasets.
  Fine-tuned dream poetry: They also train their model to generate what they call “dream poems” – poems that have a dreamlike element. To do this, they take the GPT-2 model and train it on a corpus of first-person dream descriptions, then train it again on a large poetry dataset.
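
For readers who want to try something similar, here is a generic fine-tuning recipe using the Hugging Face transformers library. The corpus file, hyperparameters, and prompt are hypothetical placeholders; the paper’s exact setup may differ.
```python
# Generic recipe for fine-tuning medium GPT-2 on an emotion-specific poetry
# corpus; "joy_poems.txt" and all hyperparameters are illustrative.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, TextDataset,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

train_data = TextDataset(tokenizer=tokenizer, file_path="joy_poems.txt",
                         block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-joy", num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=train_data,
)
trainer.train()

# Sample a 'joyful' poem from the fine-tuned model.
prompt = tokenizer("The morning sun", return_tensors="pt")
out = model.generate(**prompt, do_sample=True, max_length=100, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```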

Do humans care? The researchers generated a batch of 1,000 poems, then presented four poems from each emotional category to a set of ten human reviewers. “Poems presented were randomly selected from the top 20 EmoLex scored poems out of a pool of 1,000 generated poems,” they write. The humans were asked to label the poems according to the emotions they felt after reading them – in tests, they classified the poems generated from the joy and sadness corpuses as reflecting those emotions 85% and 87.5% of the time, respectively. That’s likely because these are relatively easy emotions to categorize with relatively broad categories. By comparison, they correctly categorized things like Anticipation and Trust 40% and 32.5% of the time, respectively.

Why this matters: I think language models are increasingly being used like custom funhouse mirrors – take something you’re interested in, like poetry, and tune a language model against it, giving you an artefact that can generate warped reflections of what it was exposed to. I think language models are going to change how we explore and interact with large bodies of literature.
  Get the ‘Dreambank’ dataset used to generate the dream-like poems here.
  Read more: Introducing Aspects of Creativity in Automatic Poetry Generation (arXiv).

####################################################

Want a responsible AI economy? Do these things, says UK committee:
…Eight tips for governments, seven tips for AI developers…
The UK’s Committee on Standards in Public Life thinks the government needs to work harder to ensure it uses AI responsibly, and that the providers of AI systems operate in responsible, trustworthy ways. The government has a lot of work to do, according to a new report from the committee: “Government is failing on openness,” the report says. “Public sector organizations are not sufficiently transparent about their use of AI and it is too difficult to find out where machine learning is currently being used in government”.

What to do about AI if you’re a government, national body, or regulator: The committee has eight recommendations designed for potential AI regulators:

  • Adopt and enforce ethical principles: Figure out which ethical principles to use to guide the use of AI in the public sector (there are currently three sets of principles for the public sector – the FAST SUM Principles, the OECD AI Principles, and the Data Ethics Framework).
  • Articulate a clear legal basis for AI usage: Public sector organizations should publish a statement on how their use of AI complies with relevant laws and regulations before they are deployed in public service delivery. 
  • Data bias and anti-discrimination law: Ensure public bodies comply with the Equality Act 2010. 
  • Regulatory assurance body: Create a regulatory assurance body that identifies gaps in the regulatory landscape and provides advice to individual regulators and government on the issues associated with AI. 
  • Procurement rules and processes: Use government procurement procedures to mandate compliance with ethical principles (when selling to public organizations).
  • The Crown Commercial Service’s Digital Marketplace: Create a one-stop shop for finding AI products and services that satisfy ethical requirements.
  • Impact assessment: Integrate an AI impact assessment into existing processes to evaluate the potential effects of AI on public standards, for a given use case.

What to do if you’re an AI provider: The committee also has some specific recommendations for providers of AI services (both public and private-sector). These include:

  • Evaluate risks to public standards: Assess systems for their potential impact on public standards and seek to mitigate any risks identified.
  • Diversity: Tackle issues of bias and discrimination by ensuring they take into account “the full range of diversity of the population and provide a fair and effective service”.
  • Upholding responsibility: Ensure that responsibility for AI systems is clearly allocated and documented.
  • Monitoring and evaluation: Monitor and evaluate AI systems to ensure they always operate as intended.
  • Establishing oversight: Implement oversight systems that allow for their AI systems to be properly scrutinised.
  • Appeal and redress: AI providers should always tell people about how they can appeal against automated and AI-assisted decisions. 
  • Training and education: AI providers should train and educate their employees.

Why this matters: Sometimes I think of the AI economy a bit like an alien invasion – we have a load of new services and capabilities that were not economically feasible (or in some cases, possible) before, and the creatures in the AI economy don’t currently mesh perfectly well with the rest of the economy. Initiatives like the UK committee report help calibrate us about the changes we’ll need to make to harmoniously integrate AI technology into society.
  Read more: Artificial intelligence and Public Standards, A Review by the Committee on Standards in Public Life (PDF, gov.uk).

####################################################

Speeding up scientific simulators by millions to billions of times:
…Neural architecture search helps scientists build a machine that simulates the machines that simulate reality…
You’ve heard of how AI can improve our scientific understanding of the world (see: systems like AlphaFold for protein structure prediction, and various systems for weather simulation), but have you heard about how AI can improve the simulators we use to improve our scientific understanding of the world? New research from an interdisciplinary team of scientists from the University of Oxford, University of Rochester, Yale University, University of Seattle, and the Max-Planck-Institut für Plasmaphysik, shows how you can use modern deep learning techniques to speed up diverse scientific simulation tasks by millions to billions of times.

The technique: They use Deep Emulator Network SEarch (DENSE), a technique which consists of them defining a ‘super architecture’ and running neural architecture search within it. The super-architecture consists of “convolutional layers with different kernel sizes and a zero layer that multiplies the input with zero,” they write. “The option of having a zero layer and multiple convolutional layers enable the algorithm to choose an appropriate architecture complexity for a given problem.” During training, the system alternates between training the network and observing its performance, then performing a search step where network variables “are updated to increase the probability of the high-ranked architectures and decrease the probability of the low-ranked architectures”.
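
The architecture search itself is hard to compress into a few lines, but the underlying emulation idea is simple: fit a network to pairs of (simulator inputs, simulator outputs), then query the network instead of the simulator. Here is a minimal sketch with a toy stand-in simulator (not the DENSE search itself):
```python
# Emulating an "expensive" simulator with a small network: the fitting step
# only, not the DENSE architecture search. The toy simulator is illustrative.
import torch
import torch.nn as nn

def expensive_simulator(params):               # stand-in for a real simulation code
    x, y, z = params.unbind(-1)
    return torch.stack([torch.sin(x) * y, torch.exp(-z * x)], dim=-1)

params = torch.rand(2000, 3)                   # limited set of input parameters
targets = expensive_simulator(params)          # the expensive part, done once

emulator = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, 2))
opt = torch.optim.Adam(emulator.parameters(), lr=1e-3)

for step in range(2000):
    loss = ((emulator(params) - targets) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At inference time one forward pass replaces a full simulation run, which is
# where the enormous speedups come from.
test = torch.rand(5, 3)
print(emulator(test))
print(expensive_simulator(test))
```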

Results: They test their approach on ten different scientific simulation cases. These cases have between 3 and 14 input parameters, and outputs ranging from 0D (scalars) to multiple 3D signals. Specifically, they use DENSE to try and train emulators of ten distinct simulation use cases, then assess the performance of the emulators. In tests, the emulators obtain, at minimum, comparable results to the real simulators, and at best, far superior ones. They also show eye-popping speedups, as high as hundreds of millions to billions of times faster.
  “The ability of DENSE to accurately emulate simulations with limited number of data makes the acceleration of very expensive simulations possible,” they write. “The wide range of successful test cases presented here shows the generality of the method in speeding up simulations, enabling rapid ideas testing and accelerating new discovery across the sciences and engineering”.

Why this matters: If deep learning is basically just really great at curve-fitting, then papers like this highlight just how useful that is. Curve-fitting is great if you can do it in complex, multidimensional spaces! I think it’s pretty amazing that we can use deep learning to essentially approximate a thoroughly complex system (e.g., a scientific simulator), and it highlights how I think one of the most powerful use cases for AI systems is to be able to approximate reality and therefore build prototypes against these imaginary realities.
  Read more: Up to two billion times acceleration of scientific simulations with deep neural architecture search (arXiv).

####################################################

Automatically cataloging insects with the BIODISCOVER machine:
…Next: A computer vision-equipped robotic arm…
Insects are one of the world’s most numerous living things, and one of the most varied as well. Now, a team of scientists from Tampere University and the University of Jyväskylä in Finland, Aarhus University in Denmark, and the Finnish Environmental Institute, have designed a robot that can automatically photograph and analyze insects. They call their device the BIODISCOVER machine, short for BIOlogical specimens Described, Identified, Sorted, Counted, and Observed using Vision-Enabled Robotics. The machine automatically detects specimens, then photographs them and crops the images to be 496 pixels wide (defined by the width of the cuvette) and 496 pixels high.
  “We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology”, they write. “Reliable identification of species is pivotal but due to its inherent slowness and high costs, traditional manual identification has caused bottlenecks in the bioassessment process”.

Testing BIODISCOVER: In tests, the researchers imaged a dataset of nine terrestrial arthropod species collected at Narsarsuaq, South Greenland, gathering thousands of images for each species. They then used this dataset to test out how well two machine learning classification approaches work on the images. They used a ResNet-50 and an InceptionV3 network (both pre-trained against ImageNet) to train two systems to classify the images, and to create data about which camera aperture and exposure settings yield images that are easiest for machine learning algorithms to classify. In tests, they obtain an average classification accuracy of 0.980 over ten test sets.
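
As a rough illustration of this kind of transfer learning (not the authors’ exact training code), here is how one might retrain an ImageNet-pretrained ResNet-50 to classify the nine species; the directory layout and hyperparameters are assumptions.
```python
# Generic transfer learning: ImageNet-pretrained ResNet-50 with a new head
# for the nine arthropod species. Folder layout and settings are illustrative.
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Expects one folder per species, e.g. insect_images/train/<species_name>/*.png
train_set = datasets.ImageFolder("insect_images/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```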

Next steps: Now that the scientists have built BIODISCOVER, they’re working on a couple of additional features to help them create automated insect analysis. These include: developing a computer-vision enabled robot arm that can detect insects in a bulk tray, then select an appropriate tool to move the insect into the BIODISCOVER machine, as well as a sorting rack to place specimens into their preferred containers after they’ve been photographed.
  Read more: Automatic image-based identification and biomass estimation of invertebrates (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

White House proposes increased AI spending amidst cuts to science budgets
The White House has released their 2021 federal budget proposal. This is a clear communication of the government’s priorities, but will not become law, as the budget must now pass through Congress, who are expected to make substantial changes.
  Increases to AI funding: There is a doubling of proposed R&D spending in non-defense AI (and quantum computing). In defense, there are substantial increases to AI R&D funding via DARPA, and for the DoD’s Joint AI Center. A budget supplement detailing AI spending programs on an agency-by-agency basis is expected later this year.
  Substantial cuts to basic science: Overall, proposed R&D spending represents a 9% decrease on 2020 levels. Together with the proposals for AI, this indicates a substantial rebalancing of the portfolio of science funding towards technologies perceived as being strategically important.

Why it matters: The budget is best understood as a statement of intent from the White House, which will be altered by Congress. The proposed uplift in funding for AI will be welcomed, but the scale of cuts to non-AI R&D spending raises questions about the government’s commitment to science. [Jack’s take: I think AI is going to be increasingly interdisciplinary in nature, so cutting other parts of science funding is unlikely to maximize the long-term potential of AI as a technology – I’d rather live in a world where countries invested in science vigorously and holistically.]

  Read more: FY2021 Budget (White House).
  Read more: Press release (White House).

AI alignment fellowship at Oxford University:

Oxford’s Future of Humanity Institute is taking applications for their AI Alignment fellowship. Fellows will spend three or more months pursuing research related to the theory or design of human-aligned AI, as part of FHI’s AI safety team. Previously successful applicants have ranged from undergraduate to post-doc level. Applications to visit during summer 2020 will close on February 28.
  For more information and to apply: AI Alignment Visiting Fellowship (FHI)


####################################################

Tech Tales:

Dance!

It was 5am when the music ran out. We’d been dancing to a single continuous, AI-generated song. It had been ten or perhaps eleven hours since it had started, and the walls were damp and shiny with sweat. Everyone had that glow of being near a load of other humans and dancing. At least the lights stayed off.
  “Did it run out of ideas?” someone shouted.
  “Your internet go down?” someone else asked.
  “Did you train this on John Cage?” asked someone else.
  The music started up, but it was human-music. People danced, but there wasn’t the same intensity.

The thing about AI raves is the music is always unique and it never gets repeated. You train a model and generate a song and the model kind of continuously fills it in from there. The clubs compete with each other for who can play the longest song. “The longest unroll”, as some of the AI people say. People try and snatch recordings of the music – though it is frowned upon – and after really good parties you see shreds of songs turn up on social media. People collect these. Categorize them. Try to map out the stylistic landscape of big, virtual machines.

There are rumors of raves in Germany where people have been dancing to new stuff for days. There’ve even been dance ships, where the journey is timed to perfectly coincide with the length of the generated music. And obviously the billionaires have been making custom ‘space soundtracks’ for their spaceship tourism operations. Some people are filling their houses with speakers and feeding the sounds of themselves into an ever-growing song.

Things that inspired this short story: MuseNet; Virtual Reality; music synthesis; Google’s AI-infused Bach Doodle.

Import AI 184: IBM injects AI into the command line; Facebook releases 4.5 BILLION parallel sentences to aid translation research; plus, VR prison

You’ve heard of expensive AI training. What about expensive AI inference?
…On the challenges of deploying GPT-2, and other large models…
In the past year, organizations have started training ever-larger AI models. The size of these models has now grown enough that they’ve started creating challenges for people who want to deploy them into production. A new post on Towards Data Science discusses some of these issues in relation to GPT-2:
– Size: Models like GPT-2 are large (think gigabytes not megabytes), so embedding them in applications is difficult.
– Compute utilization: Sampling an inference from the model can be CPU/GPU-intensive, which means it costs quite a bit to set up the infrastructure to run these models (just ask AI Dungeon).
– Memory requirements: In the same way they’re compute-hungry, new models are memory-hungry as well (a rough sizing sketch follows this list).
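
Here is the promised sizing sketch: counting parameters and timing a single sample gives a feel for why deployment gets awkward. The numbers depend entirely on your hardware and chosen checkpoint.
```python
# Back-of-envelope deployment check for a GPT-2 checkpoint: parameter count,
# rough in-memory size, and the latency of one CPU sample. A sizing sketch,
# not a serving recipe.
import time
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

name = "gpt2-large"                       # swap in a bigger checkpoint to feel the pain
model = GPT2LMHeadModel.from_pretrained(name)
tokenizer = GPT2TokenizerFast.from_pretrained(name)

n_params = sum(p.numel() for p in model.parameters())
print(f"{name}: {n_params / 1e6:.0f}M parameters, "
      f"~{n_params * 4 / 1e9:.1f} GB in fp32 (roughly half that in fp16)")

prompt = tokenizer("The deployment cost of this model is", return_tensors="pt")
start = time.time()
with torch.no_grad():
    model.generate(**prompt, max_length=64, do_sample=True)
print(f"one 64-token sample took {time.time() - start:.1f}s on CPU")
```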

Why this matters: Today, training AI systems is very expensive, and sampling from trained models is cheap. With some new large-scale models, it could become increasingly expensive to sample from the models as well. How might this change the types of applications these models get used for, and the economics associated with whether it makes sense to use them?
  Read more: Too big to deploy: How GPT-2 is breaking production (Towards Data Science).

####################################################

AI versus AI: Detecting model-poisoning in federated learning
…Want to find the crook? Travel to the lower dimensions!…
If we train AI models by farming out computationally-expensive training processes to people’s phones and devices, then can people attack the AI being trained by manipulating the results of the computations occurring on their devices? New research from Hong Kong University of Science and Technology and WeBank tries to establish a system for defending against attacks like this.

Defending against the (distributed) dark arts: To defend their AI models against these attacks, the researchers propose something called spectral anomaly detection. This involves using a variational autoencoder to embed the results of computations from different devices into the same low-dimensional latent space. By doing this, it becomes relatively easy to identify anomalous results that should be treated with suspicion.
  “Even though each set of model updates from one benign client may be biased towards its local training data, we find that this shift is small compared to the difference between the malicious model updates and the unbiased model updates from centralized training,” they write. “Through encoding and decoding, each client’s update will incur a reconstruction error. Note that malicious updates result in much larger reconstruction errors than the benign ones.” In tests, their approach gets accuracies of between 80% and 90% at detecting three types of attacks – sign-flipping, noise addition, and targeted model poisoning.
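
The core of the defense can be sketched compactly: train an autoencoder on benign (flattened) model updates, then flag updates that reconstruct poorly. The paper uses a variational autoencoder; the plain autoencoder, synthetic updates, and threshold below are simplifications for illustration.
```python
# Reconstruction-error anomaly detection for client model updates, heavily
# simplified: a plain autoencoder stands in for the paper's VAE, and random
# vectors stand in for real flattened updates.
import torch
import torch.nn as nn

DIM = 1024                                  # length of a flattened update (illustrative)
autoencoder = nn.Sequential(
    nn.Linear(DIM, 64), nn.ReLU(),          # encode into a low-dimensional latent space
    nn.Linear(64, DIM),                     # decode back to update space
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

benign_updates = torch.randn(500, DIM) * 0.01
for step in range(1000):
    loss = ((autoencoder(benign_updates) - benign_updates) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def reconstruction_error(updates):
    with torch.no_grad():
        return ((autoencoder(updates) - updates) ** 2).mean(dim=-1)

# A sign-flipped, scaled (malicious) update reconstructs much worse than benign ones.
client_updates = torch.cat([benign_updates[:5], -10 * benign_updates[:1]])
errors = reconstruction_error(client_updates)
print(errors, errors > 5 * errors.median())  # crude threshold for flagging clients
```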


Why this matters: This kind of research highlights the odd overlaps between AI systems and political systems – both operate over large, loosely coordinated sets of entities (people and devices). Both systems need to be able to effectively synthesize a load of heterogeneous views and use these to make the “correct” decision, where correct usually correlates to the preferences they extract from the big mass of signals. And, just as politicians try to identify extremist groups who can distort the sorts of messages politicians hear (and therefore the actions they take), AI systems need to do the same. I wonder if in a few years techniques developed to defend against distributed model poisoning, might be ported over into political systems to defend against election attacks?
  Read more: Learning to Detect Malicious Clients for Robust Federated Learning (arXiv).

####################################################

Facebook wants AI to go 3D:
…PyTorch3D makes it more efficient to run ML against 3D mesh objects, includes differentiable rendering framework…
Facebook wants to make it easier for people to do research in what it terms 3D deep learning – this essentially means it wants to make tools that let AI developers train ML systems against 3D data representations. This is a surprisingly difficult task – most of today’s AI systems are built to process data presented in a 2D form (e.g., ImageNet delivers pictures of real-world 3D scenes as 2D data structures representing static images).

3D specials: PyTorch3D ships with a few features to make 3D deep learning research easier – these include data structures for representing 3D object meshes efficiently, data operators that help make comparisons between 3D data, and a differentiable renderer – that is, a kind of scene simulator that lets you train an AI system that can learn while operating a moving camera. “With the unique differentiable rendering capabilities, we’re excited about the potential for building systems that make high-quality 3D predictions without relying on time-intensive, manual 3D annotations,” Facebook writes.
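
Here is a tiny example of the kind of differentiable 3D operation the library makes easy – sampling points from a mesh and backpropagating a chamfer loss into the mesh vertices. It assumes pytorch3d is installed, and the single-triangle mesh and random target are purely illustrative.
```python
# Minimal PyTorch3D example: build a mesh, sample points from it, and compute
# a differentiable chamfer distance to a target point cloud.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

# A single triangle as a minimal mesh (a real .obj could be loaded instead).
verts = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                     requires_grad=True)
faces = torch.tensor([[0, 1, 2]])
mesh = Meshes(verts=[verts], faces=[faces])

pred_points = sample_points_from_meshes(mesh, num_samples=500)  # differentiable sampling
target_points = torch.rand(1, 500, 3)                           # stand-in target cloud
loss, _ = chamfer_distance(pred_points, target_points)
loss.backward()                     # gradients flow back to the mesh vertices
print(loss.item(), verts.grad.shape)
```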

Why this matters: Tools like PyTorch3D make it easier and cheaper for more people to experiment with training AI systems against different and more complex forms of data than those typically used today. As tools like this mature we can expect them to cause further activity in this research area, which will eventually yield various exciting sensory-inference systems that will allow us to do more intelligent things with 3D data. Personally, I’m excited to see how tools like this make it easier for game developers to experiment with AI systems built to leverage natively-3D worlds, like game engines. Watch this space.
  Read more: Introducing PyTorch3D: An open-source library for 3D deep learning (Facebook AI Blog).
  Get the code for PyTorch3D here (Facebook Research, GitHub).

####################################################

A-I-inspiration: Using GANs to make… chairs?
…The future of prototyping is an internet-scale model, some pencils, and a rack of GPUs…
A team of researchers with Peking University and Tsinghua University have used image synthesis systems to generate a load of imaginary chairs, then make one of the chairs in the real world. Specifically, they implement a GAN to try and generate images of chairs, along with a superresolution module which takes these outputs and scales them up.

Furniture makers of the future, sit down! “After the generation of 320,000 chair candidates, we spend few ours [sic] on final chair prototype selection,” they write. “Compared with traditional time-consuming chair design process”.

What’s different about this? Today, tons of generative design tools already exist in the world – e.g., software company Autodesk has staked out some of its future on the use of a variety of algorithmic tools to help it perform on-the-fly “generative design” which optimizes things like the weight and strength of a given object. AI tools are unlikely to replace tools like this in the short term, but they will open up another form of computer-generated designs for exploration by people – though I imagine GAN-based ones are going to be more impressionistic and fanciful, whereas ones made by industrial design tools will have more useful properties that tie to economic incentives.

Prototyping of the future: In the future, I expect people will train large generative systems to help them cheaply prototype ideas, for instance, by generating various candidate images of various products to inspire a design team, or clothing to inspire a fashion designer (e.g., the recent Acne Studios X Robbie Barrat collab), or scraps of text to aid writers. Papers like this sketch out some of the outlines for what this world could look like.
  Read more: A Generative Adversarial Network for AI-Aided Chair Design (arXiv).

####################################################

Google Docs for AI Training: Colab goes commercial:
…Want better hardware and longer runtimes? Get ready to pay up…
Google has created a commercial version of its popular, free “Google Colab” service. Google Colab is kind of like GDocs for code – you can write code in a browser window, then execute it on hardware in Google’s data centers. One thing that makes Colab special is that it ships with inbuilt access to GPUs and TPUs, so you can use Colab pages to train AI systems as well as execute them.

Colab Pro: Google’s commercial version, Colab Pro, costs $9.99 a month. What you get for this is more RAM, priority access to better GPUs and TPUs, and code notebooks that’ll stay connected to hardware for up to 24 hours (versus 12 hours for the free version).
  More details about Colab Pro here at Google’s website.
  Spotted via Max Woolf (@minimaxir, Twitter)

####################################################

What’s cooler than a few million parallel sentences? A few BILLION ones:
…CCMatrix gives translation researchers a boost…
Facebook has released CCMatrix, a dataset of more than 4.5 billion parallel sentences in 576 language pairs.

Automatic for the (translation) people: CCMatrix is so huge that Facebook needed to use algorithmic techniques to create it. Specifically, Facebook learned a multilingual sentence embedding to help it represent sentences from different languages in the same featurespace, then used the distance between sentences in that featurespace to help the system figure out whether they’re parallel sentences from two different languages.
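
Here is a simplified sketch of the mining step, assuming you already have sentence embeddings from some multilingual encoder. CCMatrix itself uses LASER embeddings and a margin criterion over k-nearest neighbours; this version just keeps the best cosine match above a threshold.
```python
# Toy embedding-based bitext mining: keep the most similar target sentence for
# each source sentence if it clears a similarity threshold. Embeddings here
# are random placeholders; real ones come from a multilingual encoder.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def mine_pairs(src_emb, tgt_emb, threshold=0.8):
    """Return (src_index, tgt_index, score) triples for likely parallel sentences."""
    sims = normalize(src_emb) @ normalize(tgt_emb).T   # cosine similarity matrix
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if row[j] >= threshold:
            pairs.append((i, j, float(row[j])))
    return pairs

english = np.random.randn(1000, 1024)
german = np.random.randn(1200, 1024)
print(len(mine_pairs(english, german)))
```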

Why this matters: Datasets like this will help people build translation systems that work for a broader set of people, and in particular should help with transfer to languages for which there is less digitized material.
  Read more: CCMatrix: A billion-scale bitext data set for training translation models (Facebook AI Research, blog).
  Read more: CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB (arXiv).
  Get the code from the CCMatrix GitHub Page (Facebook Research, GitHub)

####################################################

AI in the command line – IBM releases its Command Line AI (CLAI):
…Get ready for the future of interfacing with AI systems…
IBM Researchers have built Project CLAI (Command Line AI), open source software for interfacing with a variety of AI capabilities via the command line. In a research paper describing the CLAI, they lay out some of the potential usages of an AI system integrated into the command line – e.g., in-line search and spellchecking, code suggestion features, and so on – as well as some of the challenges inherent to building one.

How do you build a CLAI? The CLAI – pronounced like clay – is essentially a little daemon that runs in the command line and periodically comes alive to do something useful. “Every command that the user types is piped through the backend and broadcast to all the actively registered skills associated with that user’s session on the terminal. In this manner, a skill can autonomously decide to respond to any event on the terminal based on its capabilities and its confidence in its returned answer,” the researchers write.
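
That broadcast-and-pick-the-most-confident-answer pattern reduces to a simple dispatcher, sketched below. This is an illustration of the pattern, not the real CLAI API, and the two toy skills are made up.
```python
# Toy skill dispatcher: every command is broadcast to all registered skills,
# each returns (confidence, suggestion), and the most confident answer wins.
from typing import Callable, List, Optional, Tuple

Skill = Callable[[str], Tuple[float, Optional[str]]]
SKILLS: List[Skill] = []

def register(skill: Skill) -> Skill:
    SKILLS.append(skill)
    return skill

@register
def nl_to_tar(command: str):
    """Toy 'plain text to tar' skill."""
    if "extract" in command and ".tar" in command:
        archive = next(word for word in command.split() if word.endswith(".tar"))
        return 0.9, "tar -xvf " + archive
    return 0.0, None

@register
def man_page_hint(command: str):
    """Toy 'point at the manual' skill."""
    if command.startswith("how do i"):
        return 0.6, "man " + command.split()[-1]
    return 0.0, None

def handle(command: str) -> str:
    """Broadcast the command to every skill and act on the most confident answer."""
    confidence, suggestion = max((skill(command) for skill in SKILLS),
                                 key=lambda result: result[0])
    return suggestion if confidence > 0.5 else command  # otherwise fall through to the shell

print(handle("extract the files from backup.tar"))   # -> tar -xvf backup.tar
print(handle("how do i use grep"))                    # -> man grep
```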

So, what can a CLAI do? CLAI’s capabilities include: a natural language module that tries to convert plain text commands into tar or grep commands; a system that tries to find and summarize information from system manuals; a ‘help’ function which activates “whenever there is an error” and searches Unix Stack Exchange for a relevant post to present to the user in response; a bot for querying Unix Stack Exchange in plain text; and a Kubernetes automation service (name: Kube Bot).
  And what can CLAI do tomorrow? In the future, the team hope to implement an auto-complete feature into the command line, so CLAI can suggest commands users might want to run.

Do people care? In a survey of 235 developers, a little over 50% reported they’d be either “likely” or “very likely” to use a command line interface with an integrated CLAI (or similar) service. In another part of the survey, they reported intolerance for laggy systems with response times greater than a few seconds, highlighting the need for these systems to perform well.

Why this matters: At some point, AI is going to be integrated into command lines in the same way things like ‘git’ or ‘pip install’ or ‘ping’ are today – and so it’s worth thinking about this hypothetical future today before it becomes our actual future.
  Read more: CLAI: A Platform for AI Skills on the Command Line (arXiv).
  Get the code for CLAI from IBM’s GitHub page.
  Watch a video about CLAI here (YouTube)

####################################################

Tech Tales: 

The Virtual Prisoner

You are awake. You are in prison. You cannot see the prison, but you know you’re in it. 

What you see: Rolling green fields, with birds flying in the sky. You look down and don’t see a body – only grass, speckled with flowers. 

Your body feels restricted. You can move, but only so far. When you move, you travel through the green fields. But you know you are not in the green fields. You are somewhere else, and though you perceive movement, you are not moving through real space. You are, however, moving through virtual space. You get fed through tubes attached to you. 

Perhaps it would not be so terrible if the virtual prison was better made. But there are glitches. Occasional flaws in the simulation where all the flowers turn a different color, or the sky disappears. For a second you are even more aware of the falsehood of this world. Then it gets fixed and you go back to caring less.

One day something breaks and you stop being able to see anything, but you can still hear the artificial sound of wind causing tree branches to move. You close your eyes. You panic when you open them and things are still black. Eventually, it gets fixed, and you feel relieved as the field reappears in front of you.

It takes energy to remember that while you walk through the field you are also in a room somewhere else. It gets easier to believe more and more that you are in the field. It’s not that you’re unaware of your predicament, but you don’t focus on it so much. You cease modeling the duality of your world.

You have lived here for many years, you think one day. You know the trees of the field. Know the different birds in the sky. And when you wake up your first thought is not: where am I? It is “where shall I go today?”

Things that inspired this story: Room-scale VR; JG Ballard; simulacra; the fact states are more obsessed with control than deletion; spaceless panopticons.

Import AI 183: Curve-fitting conversation with Meena; GANs show us our climate change future; and what compute-data arbitrage means

Can curve-fitting make for good conversation?
…Google’s “Meena” chatbot suggests it can…
Google researchers have trained a chatbot with uncannily good conversational skills. The bot, named Meena, is a 2.6 billion parameter language model trained on 341GB of text data, filtered from public domain social media conversations. Meena uses a seq2seq model (the same sort of technology that powers Google’s “Smart Compose” feature in gmail), paired with an Evolved Transformer encoder and decoder – it’s interesting to see something like this depend so much on a component developed via neural architecture search.

Can it talk? Meena is a pretty good conversationalist, judging by transcripts uploaded to GitHub by Google. It also seems able to invent jokes (e.g., Human: do horses go to Harvard? Meena: Horses go to Hayvard. Human: that’s a pretty good joke, I feel like you led me into it. Meena: You were trying to steer it elsewhere, I can see it.)

A metric for good conversation: Google developed the ‘Sensibleness and Specificity Average’ (SSA) measure, which it uses to evaluate how good Meena is in conversation. This metric evaluates the outputs of language models for two traits – is the response sensible, and is the response specifically tied to what is currently being discussed. To calculate the SSA for a given chatbot, the researchers have a team of crowd workers evaluate some of the outputs of the models, then they use this to create an SSA score.
  Humans vs Machines: The best-performing version of Meena gets an SSA of 79%, compared to 86% for an average human. By comparison, other state-of-the-art systems such as DialoGPT (51%) and Cleverbot (44%) do much more poorly.
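
Mechanically, the SSA is easy to compute once you have the crowdworker labels: average the per-response sensibleness and specificity rates, then average those two numbers. A minimal reading of the metric, with made-up labels:
```python
# Sensibleness and Specificity Average (SSA) from per-response human labels.
# The labels below are invented for illustration.

def ssa(labels):
    """labels: list of (sensible, specific) booleans, one per model response."""
    sensibleness = sum(s for s, _ in labels) / len(labels)
    specificity = sum(p for _, p in labels) / len(labels)
    return (sensibleness + specificity) / 2

crowd_labels = [(True, True), (True, False), (False, False), (True, True)]
print(f"SSA = {ssa(crowd_labels):.1%}")   # -> 62.5% for this toy set of labels
```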

Different release strategy: Along with their capabilities, modern neural language models have also been notable for the different release strategies adopted by the organizations that build them – OpenAI announced GPT-2 but didn’t release it all at once, releasing the model over several months along with research into its potential for misinformation, and its tendencies for biases. Microsoft announced DialoGPT but didn’t provide a sampling interface in an attempt to minimize opportunistic misuse, and other companies like NVIDIA have alluded to larger language models (e.g., Megatron), but not released any parts of them.
  With Meena, Google is also adopting a different release strategy. “Tackling safety and bias in the models is a key focus area for us, and given the challenges related to this, we are not currently releasing an external research demo,” they write. “We are evaluating the risks and benefits associated with externalizing the model checkpoint, however”.

Why this matters: How close can massively-scaled function approximation get us to human-grade conversation? Can it get us there at all? Research like this pushes the limits of a certain kind of deliberately naive approach to learning language, and it’s curious that we’re developing more and more superficially capable systems, despite the lack of domain knowledge and handwritten systems inherent to these approaches. 
  Read more: Towards a Human-like Open-Domain Chatbot (arXiv).
  Read more: Towards a Conversational Agent that Can Chat About… Anything (Google AI Blog).

####################################################

Chinese government use drones to remotely police people in coronavirus-hit areas:
…sTaY hEaLtHy CiTiZeN!…
Chinese security officials are using drones to remotely surveil and talk to people in coronavirus-hit areas of the country.

“According to a viral video spread on China’s Twitter-like Sina Weibo on Friday, officials in a town in Chengdu, Southwest China’s Sichuan Province, spotted some people playing mah-jong in a public place.
  “Playing mah-jong outside is banned during the epidemic. You have been spotted. Stop playing and leave the site as soon as possible,” a local official said through a microphone while looking at the screen for a drone.
  “Don’t look at the drone, child. Ask your father to leave immediately,” the official said to a child who was looking curiously up at the drone beside the mah-jong table.” – via Global Times.

Why this matters: This is a neat illustration of the omni-use nature of technology; here, the drones are being used for a societally-beneficial use (preventing viral transmission), but it’s clear they could be used for chilling purposes as well. Perhaps one outcome of the coronavirus outbreak will be a normalization for a certain form of drone surveillance in China?
  Read more: Drones creatively used in rural areas in battle against coronavirus (Global Times).
  Watch this video of a drone being used to instruct someone to go home and put on a respirator mask (Global Times, Twitter).
 
####################################################

Want smarter AI? Train something with an ego!
…Generalization? It’s easier if you’re self-centered…
Researchers with New York University think that there are a few easy ways to improve generalization of agents trained via reinforcement learning – and it’s all about ego! Specifically, their research suggests that if you can make technical tweaks that make a game more egocentric, that is, more tightly gear the observations around a privileged agent-centered perspective, then your agent will probably generalize better. Specifically, they propose “rotating, translating, and cropping the observation around the agent’s avatar”, to train more general systems.
  “A local, ego-centric view, allows for better learning in our experiments and the policies learned generalize much better to new environments even when trained on only five environments”, they write.

The secrets to (forced) generalization:
– Self-centered (aka, translation): Warp the game world so that the agent is always at the dead center of the screen – this means it’ll learn about positions relative to its own consistent frame.
– Rotation: Change the orientation of the game map so that it faces the same direction as the player’s avatar. “Rotation helps the agent to learn navigation as it simplifies the task. For example: if you want to reach for something on the right, the agent just rotates until that object is above,” they explain.
– Zooming in (cropping): Crop the observation around the player, which reduces the state space the agent sees and needs to learn about (by comparison, seeing really complicated environments can make it hard for an agent to learn, as it takes it a looooong time to figure out the underlying dynamics). All three transforms are sketched in code below.
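
Here is the promised sketch of the three transforms on a toy grid observation; the grid contents, headings, and crop size are illustrative, and real implementations apply the same idea to pixel observations.
```python
# Egocentric observation transforms on a toy 2D grid world: translate so the
# agent sits at the center, rotate so "forward" is always up, then crop a
# small window around the agent.
import numpy as np

def egocentric_view(grid, agent_pos, agent_dir, crop=5):
    """grid: 2D array of tile ids; agent_dir: 0..3 quarter-turns."""
    h, w = grid.shape
    # Translation: roll the map so the agent is at the dead center.
    centered = np.roll(grid, (h // 2 - agent_pos[0], w // 2 - agent_pos[1]),
                       axis=(0, 1))
    # Rotation: turn the map to match the agent's heading.
    rotated = np.rot90(centered, k=agent_dir)
    # Cropping: keep only a small window around the (centered) agent.
    ch, cw = rotated.shape[0] // 2, rotated.shape[1] // 2
    r = crop // 2
    return rotated[ch - r:ch + r + 1, cw - r:cw + r + 1]

world = np.random.randint(0, 4, size=(11, 11))
print(egocentric_view(world, agent_pos=(2, 8), agent_dir=1))
```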

Testing: They test out their approach on two variants of the game Zelda, the first is a complex Zelda-clone built in the General Video Game AI (GVGAI) framework; the second is a simplified version of the same game. They find that A3C-based agents trained in Zelda with a full set of variations (translation, rotation, cropping) generalize far better than those trained on the game alone (though their test scores of 22% are still pretty poor, compared to what a human might get).

Why this matters: Papers like this show how much tweaking goes on behind the scenes to set up training in such a way you get better or more effective learning. It also gives us some clues about the importance of ego-centric views in general, and makes me reflect on the fact I’ve spent my entire life learning via an ego-centric/world-centric view. How might my mind be different if my eyeballs were floating high above me, looking at me from different angles, with me uncentered in my field-of-vision? What might I have ‘learned’ about the world, then, and might I – similar to RL agents trained in this way – take an extraordinarily long time to learn how to do anything?
  Read more: Rotation, Translation, and Cropping for Zero-Shot Generalization (arXiv).

####################################################

Import A-Idea: Reality Trading: Paying Computers to Generate Data:
In recent years, we’ve seen various research groups start using simulators to train their AI agents inside. With the arrival of domain randomization – a technique that lets you vary the parameters of the simulation to generate more data (for instance, data where you’ve altered the textures applied to objects in the simulator, or the physics constants used to govern how objects behave) – people have started using simulators as data generators. This is a pretty weird idea when you step back and think about it – people are paying computers to dream up synthetic datasets which they train agents inside, then they transfer the agents to reality and observe good performance. It’s essentially a form of economic arbitrage, where people are spending money on computers to generate data, because the economics work out better than collecting the data directly from reality.
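
In code, domain randomization is little more than redrawing the simulator’s parameters every episode; the parameter names and ranges below are illustrative, and the simulator and policy calls are hypothetical placeholders.
```python
# Domain randomization in miniature: every episode runs in a freshly
# randomized world, so the trained policy has to work across all of them.
import random

def randomized_sim_config():
    """Draw a fresh set of simulator parameters for one training episode."""
    return {
        "gravity":        random.uniform(8.0, 12.0),   # m/s^2, around Earth's 9.81
        "friction":       random.uniform(0.5, 1.5),
        "object_mass_kg": random.uniform(0.1, 2.0),
        "texture_id":     random.randrange(1000),      # randomized visual appearance
        "camera_jitter":  random.uniform(0.0, 0.05),
    }

for episode in range(3):
    config = randomized_sim_config()
    # rollout = simulator.run(policy, config)   # hypothetical simulator/policy calls
    # policy.update(rollout)
    print(episode, config)
```
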
Some examples:
AlphaStar: AlphaStar agents play against themselves in an algorithmically generated league that doubles as a curriculum, letting them achieve superhuman performance at the game.
OpenAI’s robot hand: OpenAI uses a technique called automatic domain randomization “which endlessly generates progressively more difficult environments in simulation”, to let them train a hand to manipulate real-world objects.
Self-driving cars being developed by a startup named ‘Voyage’ are partially trained in software called Deepdrive (Import AI #173), a simulator for training self-driving cars via reinforcement learning.
Google’s ‘Minitaur’ robots are trained in simulation, then transferred to reality via the aid of domain randomization (Import AI #93).
Drones learn to fly in simulators and transfer to reality, showing that purely synthetic data can be used to train movement policies that are subsequently deployed on real drones (Import AI #149).

What this means: Today, some AI developers are repurposing game engines (and sometimes entire games) to help them train smarter and more capable machines. As simulators become more advanced – partially as a natural dividend of the growing sophistication of game engines – what kinds of tasks will be “simcomplete”, in that a simulator is sufficient to solve them for real-world deployment, and what kinds of tasks will be “simhard”, requiring you to gather real-world data to solve them? Understanding the dividing line between these two things will define the economics of training AI systems for a variety of use cases. I can’t wait to read an enterprising AI-Economics graduate student’s paper on the topic.


####################################################


Want data? Try Google’s ‘Dataset Search’:
…Google, but for Data…
Google has released Dataset Search, a search engine for almost 25 million datasets on the web. The service has been in beta for about a year and is now debuting with improvements, including the ability to filter according to the type of dataset.

Is it useful for AI? A preliminary search suggests so, as searches for common things like “ImageNet”, “CIFAR-10”, and others, work well. It also generates useful results for broader terms, like “satellite imagery”, and “drone flight”.

Fun things: The search engine can also throw up gems that a searcher might not have been looking for, but which are usually interesting. E.g., when searching for drones it led me to this “Air-to-Air UAV Aerial Refueling” project page, which seems to have been tagged as ‘data’ even though it’s mostly a project overview. Regardless – an interesting project!
  Try out the search engine here (Dataset Search).
  Read more: Discovering millions of datasets on the web (Google blog).

####################################################

Facebook releases Polygames to help people train agents in games:
…Can an agent, self-play, and a curriculum of diverse games lead to a more general system?…
Facebook has released Polygames, open source code for training AI agents to learn to play strategy games through self-play, rather than training on labeled datasets of moves. Polygames supports games like Hex, Havannah, Connect6, Minesweeper, Nogo, Othello, and more. Polygames ships with an API developers can use to implement support for their own game within the system.

More games, more generality: Polygames has been designed to encourage generality in agents trained within it, Facebook says. “For example, a model trained to work with a game that uses dice and provides a full view of the opposing player’s pieces can perform well at Minesweeper, which has no dice, a single player, and relies on a partially observable board”, Facebook writes. “We’ve already used the framework to tackle mathematics problems related to Golomb rulers, which are used to optimize the positioning of electrical transformers and radio antennae”.

Why this matters: Given a sufficiently robust set of rules, self-play techniques let us train agents purely through trial and error matches against themselves (or sets of agents being trained in chorus). These approaches can reliably generate super-human agents for specific tasks. The next question to ask is if we can construct a curriculum of enough games with enough complicated rulesets that we could eventually train more general agents that can make strategic moves in previously unseen environments.
  Read more: Open-sourcing Polygames, a new framework for training AI bots through self-play (Facebook AI Research webpage).
  Get the code from the official Polygames GitHub

####################################################

What might our world look like as the climate changes? Thanks to GANs, we can render this, rather than imagine it:
…How AI can let us externalize our imagination for political purposes…
Researchers with the Montreal Institute for Learning Algorithms (MILA) want to use AI systems to create images of climate change – the hope being that if people are able to see how the world will be altered, they might try and do something to avert our extreme weather future. Specifically, they use generative adversarial networks to use a combination of real and simulated data to generate street-level views of how places might be altered by sea-level rise.

What they did: They gather 2,000 real images of flooded and non-flooded street-level scenes taken from publicly available datasets such as Mapillary and Flickr. They use this to train an initial CycleGAN model that can warp new images into being flooded or non-flooded, but discover the results are insufficiently realistic. To deal with this, they use a 3D game simulator (Unity) to create virtual worlds with various levels of flooding, then extract 1,000 pairs of flood/no-flood images from this. With this data they use a MUNIT-architecture network (with a couple of tweaks to its loss functions) to train a system on a combination of simulated and real-world data to generate images of flooded spaces.
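
The cycle-consistency idea behind their initial CycleGAN attempt fits in a few lines: a ‘flood’ generator and a ‘de-flood’ generator should roughly invert each other, and that reconstruction error gets added to the usual adversarial losses. The tiny generators below are stand-ins, not the MUNIT architecture they ultimately used.
```python
# Cycle-consistency loss with two stand-in generators: G adds flooding,
# F removes it, and F(G(x)) should land back near x (and vice versa).
import torch
import torch.nn as nn

def tiny_generator():
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

G = tiny_generator()   # street scene -> flooded street scene
F = tiny_generator()   # flooded street scene -> street scene

x = torch.rand(4, 3, 128, 128)   # batch of non-flooded images (random stand-ins)
y = torch.rand(4, 3, 128, 128)   # batch of flooded images (random stand-ins)

cycle_loss = (F(G(x)) - x).abs().mean() + (G(F(y)) - y).abs().mean()
# total_loss = adversarial_terms + 10.0 * cycle_loss   # adversarial terms omitted here
cycle_loss.backward()
```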

Why this matters: One of the weird things about AI is it lets us augment our human ability to imagine and extend it outside of our own brains – instead of staring at an image of our house and seeing in our mind’s eye how it might look when flooded, contemporary AI tools can let us generate plausibly real images of the same thing. This allows us to scale our imaginations in ways that build on previous generations of creative tools (e.g., Photoshop). How might the world change as people envisage increasingly weird things and generate increasingly rich quantities of their own imaginings? And might work like this help us all better collectively imagine various climate futures and take appropriate actions?
  Read more: Using Simulated Data to Generate Images of Climate Change (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Reconciling near- and long-term

AI ethics and policy concerns are often carved up into ‘near-term’ and ‘long-term’, but this generally results in confusion and miscommunication between research communities, which can hinder progress in the field, according to researchers at Oxford and Cambridge in the UK.

Better distinctions: The authors suggest we instead consider four key dimensions along which AI ethics and policy research communities have different priorities:

  • Capabilities—whether to focus on current/near tech or advanced AI.
  • Impacts—whether to focus on immediate impacts or much longer run impacts.
  • Uncertainty—whether to focus on things that are well-understood/certain, or more uncertain/speculative.
  • Extremity—whether to focus on impacts at all scales, or to prioritize those on particularly large scales.

The research portfolio: I find it useful to think about research priorities as a question of designing the research portfolio—what is the optimal allocation of research across problems, and how should the current portfolio be adjusted. Combining this perspective with distinctions from this paper sheds light on what is driving the core disagreements – for example, finding the right balance between speculative and high-confidence scenarios depends on an individual researcher’s risk appetite, whereas assumptions about the difference between near-term and advanced capabilities will depend on an individual researcher’s beliefs about the pace and direction of AI progress and the influence they can have over longer time horizons, etc. It seems more helpful to view these near- and long term concerns as being situated in terms of various assumptions and tradeoffs, rather than as two sides of a divided research field.
  Read more: Beyond Near and Long-Term: towards a Clearer Account of Research Priorities in AI Ethics and Society (arXiv)

 

Why DeepMind thinks value alignment matters for the future of AI deployment: 

Research from DeepMind offers some useful philosophical perspectives on AI alignment, and directions for future research for aligning increasingly complex AI systems with the varied ‘values’ of people. 

   Technical vs normative alignment: If we are designing powerful systems to act in the world, it is important that they do the right thing. We can distinguish the technical challenge of aligning AI (e.g. building RL agents that don’t resist changes to their reward functions), and the normative challenge of determining the values we should be trying to align it with, the paper explains. It is important to recognize that these are interdependent—how we build AI agents will partially determine the values we can align them with. For example, we might expect it to be easier to align RL agents with moral theories specified in terms of maximizing some reward over time (e.g. classical utilitarianism) than with theories grounded in rights.

   The moral and the political: We shouldn’t see the normative challenge of alignment as being to determine the correct moral theory, and loading this into AI. Rather we must look for principles for AI that are widely acceptable by individuals with different moral beliefs. In this way, it resembles the core problem of political liberalism—how to design democratic systems that are acceptable to citizens with competing interests and values. One approach is to design a mechanism that can fairly aggregate individuals’ views—that can take as input the range of moral views and weight them such that the output is widely accepted as fair. Democratic methods seem promising in this regard, i.e. some combination of voting, deliberation, and bargaining between individuals or their representatives.
  Read more: Artificial Intelligence, Values, and Alignment (arXiv)

####################################################

Tech Tales:

Indiana Generator

Found it, he said, squinting at the computer. It was nestled inside a backup folder that had been distributed to a cold storage provider a few years prior to the originating company’s implosion. A clean, 14 billion parameter model, trained on the lost archives of a couple of social networks that had been popular sometime in the early 21st century. The data was long gone, but the model that had been trained on it was a good substitute – it’d spit out things that seemed like the social networks it had been trained on, or at least, that was the hope.

Downloading 80%, the screen said, and he bounced his leg up and down while he waited. This kind of work was always in a grey area, legally speaking. 88%. A month ago some algo-lawyer cut him off mid download. 93%. The month before that he’d logged on to an archival site and had to wait till an AI lawyer for his corporation and for a rival duked it out virtually till he could start the download. 100%. He pulled the thumbdrive out, got up from the chair, left the administrator office, and went into the waiting car.

“Wow,” said the billionaire, holding the USB key in front of his face. “The 14 billion?”
  “That’s right.”
  “With checkpoints?”
  “Yes, I recovered eight checkpoints, so you’ve got options.”
  “Wow, wow, wow,” he said. “My artists will love this.”
  “I’m sure they will.”
  “Thank you, once we verify the model, the money will be in your account.”
  He thanked the rich person again, then left the building. In the elevator down he checked his phone and saw three new messages about other jobs.

Three months later, he went to the art show. It was real, with a small virtual component; he went in the flesh. On the walls of the warehouse were a hundred different old-style webpages, with their contents morphing from second to second, as different models from different eras of the internet attempted to recreate themselves. Here, a series of smeared cat-memes from the mid-2010s formed and reformed on top of a re-hydrated Geocities. There, words unfurled over old jittering Tumblr backgrounds. And all the time music was playing, with lyrics generated by other vintage networks, laid over idiosyncratic synthetic music outputs, taken from models stolen by him or someone just like him.
  “Incredible, isn’t it”, said the billionaire, who had appeared beside him. “There’s nothing quite like the early internet.”
  “I suppose,” he said. “Do you miss it?”
  “Miss it? I built myself on top of it!” said the billionaire. “No, I don’t miss it. But I do cherish it.”
  “So what is this, then?” he asked, gesturing at the walls covered in the outputs of so many legitimate and illicit models.
  “This is history,” said the billionaire. “This is what the new national parks will look like. Now come on, walk inside it. Live in the past, for once.”
  And together they walked, glasses of wine in hand, into a generative legacy.

Things that inspired this story: Models and the value of pre-trained models serving as funhouse mirrors for their datasets; models as cultural artefacts; Jonathan Fly’s StyleGAN-ed Reddit; patronage in the 21st century; re-imagining the Carnegies and Rockefellers of old for a modern AI era.  

Import AI 182: The Industrialization of AI, BERT goes Dutch, plus, AI metrics consolidation.

DAWNBench is dead! Long live DAWNBench. MLPerf is our new king:
…Metrics consolidation: hard, but necessary!…
In the past few years, multiple initiatives have sprung up to assess the performance and cost of various AI systems when running on different hardware (and cloud) infrastructures. One of the original major competitions in this domain was DAWNBench, a Stanford-backed competition website for assessing things like inference cost, training cost, and training time for various AI tasks on different cloud infrastructures. Now, the creators of DAWNBench are retiring the benchmark in favor of MLPerf, a joint initiative from industry and academic players to “build fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services”.
  Since MLPerf has become an increasingly popular benchmark – and to avoid a proliferation of inconsistent benchmarks – DAWNBench is being phased out. “We are passing the torch to MLPerf to continue to provide fair and useful benchmarks for measuring training and inference performance,” according to a DAWNBench blogpost.

Why this matters: Benchmarks are useful. Overlapping benchmarks that split submissions across subtly different competitions are less useful – it takes a lot of discipline to avoid proliferation of overlapping evaluation systems, so kudos to the DAWNBench team for intentionally phasing out the project. I’m looking forward to studying the new MLPerf evaluations as they come out.
  Read more: Ending Rolling Submissions for DAWNBench (Stanford DAWNBench blog).
  Read more about MLPerf here (official MLPerf website).

####################################################

This week’s Import A-Idea: The Industrialization of AI

AI is a “fourth industrial revolution”, according to various CEOs and PR agencies around the world. They usually use this phrasing to indicate the apparent power of AI technology. Funnily enough, they don’t use it to indicate the inherent inequality and power-structure changes enforced by an industrial revolution.

So, what is the Industrialization of AI? (First mention: Import AI #115) It’s what happens when AI goes from an artisanal, craftsperson-based practice to a repeatable, professionalized one. The Industrialization of AI involves a combination of tooling improvement (e.g., the maturation of deep learning frameworks), as well as growing investment in the capital-intensive inputs to AI (e.g., rising investments in data and compute). We’ve already seen the early hints of this as AI software frameworks have evolved from things built by individuals and random grad students at universities (Theano, Lasagne, etc), to industry-developed systems (TensorFlow, PyTorch).

What happens next: Industrialization gave us: the Luddites, populist anger, massive social and political change, and the rearrangement and consolidation of political power among capital-owners. It stands to reason that the rise of AI will lead to the same thing (at minimum) – leading me to ask: who will be the winners and the losers in this industrial revolution? And when various elites call AI a new industrial revolution, who stands to gain and lose? And what might the economic dividends of AI industrialization be, and how might the world around us change in response?

####################################################

Using AI & satellite data to spot refugee boats:
…Space-Eye wants to use AI to count migrants and spot crises…
European researchers are using machine learning to create AI systems that can identify refugee boats in satellite photos of the Mediterranean. The initial idea is to generate data about the migrant crisis and, in the long term, they hope such a system can help send aid to boats in real-time, in response to threats.

Why this matters: One of the promises of AI is we can use it to monitor things we care about – human lives, the health of fragile ecosystems like rainforests, and so on. Things like Space-Eye show how AI industrialization is creating derivatives, like open datasets and open computer vision techniques, that researchers can use to carry out acts of social justice.
  Read more: Europe’s migration crisis seen from orbit (Politico).
  Find out more about Space-Eye here at the official site.

####################################################

Dutch BERT: Cultural representation through data selection:
…Language models as implicitly political entities…
Researchers with KU Leuven have built RobBERT, a RoBERTa-based language model trained on a large amount of Dutch data. Specifically, they train a model on top of 39 GB of text taken from the Dutch section of the multilingual ‘OSCAR’ dataset.
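
If you want to poke at models like this yourself, here is a minimal sketch of loading a Dutch RoBERTa-style checkpoint for classification fine-tuning with the HuggingFace Transformers library; the model identifier below is an assumption for illustration, and may not match the name the RobBERT authors actually publish under.

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

MODEL_ID = "pdelobelle/robBERT-base"  # hypothetical identifier, used here for illustration only

tokenizer = RobertaTokenizer.from_pretrained(MODEL_ID)
model = RobertaForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Encode a Dutch sentence and get classification logits (e.g., for sentiment fine-tuning).
inputs = tokenizer("Dit is een geweldig boek.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])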

Why this matters: AI models are going to magnify whichever culture they’ve been trained on. Most text-based AI models are trained on English or Chinese datasets, magnifying those cultures via their presence in these AI artefacts. Systems like RobBERT help broaden cultural representation in AI.
  Read more: RobBERT: a Dutch RoBERTa-based Language Model (arXiv).
  Get the code for RobBERT here (RobBERT GitHub)

####################################################

Is a safe autonomous machine an AGI? How should we make machines that deal with the unexpected?
…Israeli researchers promote habits and procedures for when the world inevitably explodes…
Researchers with IBM and the Weizmann Institute of Science in Israel know that the world is a cruel, unpredictable place. Now they’re trying to work out principles we can imbue in machines to let them deal with this essential unpredictability. “We propose several engineering practices that can help toward successful handling of the always-impending occurrence of unexpected events and conditions,” they write. The paper summarizes a bunch of sensible approaches for increasing the safety and reliability of autonomous systems, but skips over many of the known-hard problems inherent to contemporary AI research.

Dealing with the unexpected: So, what principles can we apply to machine design to make them safe in unexpected situations? The authors have a few ideas. These are:
– Machines should run away from dangerous or confusing situations
– Machines should try to ‘probe’ their environment by exploring – e.g., if a robot finds its path is blocked by an object it should probably work out if the object is light and movable (for instance, a cardboard box) or immovable.
– Any machine should “be able to look at itself and recognize its own state and history, and use this information in its decision making,” they write.
– We should give machines as many sensors as possible so they can have a lot of knowledge about their environment. Such sensors should be generally accessible to software running on the machine, rather than siloed.
– The machine should be able to collect data in real-time and integrate it into its planning
– The machine should have “access to General World Knowledge” (that high-pitched scream you’re hearing in response to this phrase is Doug Lenat sensing a disturbance in the force at Cyc and reacting appropriately).
– The machine should know when to mimic others and when to do its own thing. It should have the same capability with regard to seeking advice, or following its own intuition.

No AGI, no safety? One thing worth remarking on is that the above list is basically a description of the capabilities you might expect a generally intelligent machine to have. It’s also a set of capabilities that are pretty distant from the capabilities of today’s systems.

Why this matters: Papers like this are, functionally, tools for socializing some of the wackier ideas inherent to long-term AI research and/or AI safety research. They also highlight the relative narrowness of today’s AI approaches.
  Read more: Expecting the Unexpected: Developing Autonomous System Design Principles for Reacting to Unpredicted Events and Conditions (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

US urged to focus on privacy-protecting ML

A report from researchers at Georgetown’s Center for Security and Emerging Technology suggests the next US administration prioritise funding and developing ‘privacy-protecting ML’ (PPML).


PPML: Developments in AI pose issues for privacy. One challenge is making large volumes of data available for training models while still protecting that data. PPML techniques are designed to address this tension. The report highlights two promising approaches: (1) federated learning is a method for training models on user data without transferring that data to a central repository – models are trained on individual devices and only the resulting model updates are collated centrally, so raw user data never leaves the device; (2) differential privacy involves adding carefully calibrated statistical noise to data, queries, or training procedures, so that aggregate patterns can be learned and shared without revealing much about any individual record.
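
To make the differential privacy idea concrete, here is a minimal sketch (my own illustration, not the report’s recommended implementation): release an aggregate statistic with calibrated Laplace noise, so that no single record can move the answer by much.

import numpy as np

def dp_count(values, threshold, epsilon=1.0, sensitivity=1.0):
    # A differentially private count: adding Laplace noise with scale sensitivity/epsilon
    # bounds how much any single record can change the released statistic.
    true_count = sum(1 for v in values if v > threshold)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 62, 57]
print(dp_count(ages, threshold=40, epsilon=0.5))  # noisy answer near the true count of 3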

Recommendations: The report recommends that the US leverages its leadership in AI R&D to promote PPML. Specifically, the government should: (1) invest in PPML R&D; (2) apply PPML techniques at federal level; (3) create frameworks and standards to encourage wide deployment of PPML techniques.
   Read more: A Federal Initiative for Protecting Privacy while Advancing AI (Day One Project).

US face recognition: round-up:
   Clearview: A NYT investigation reports that over the past year, 600 US law enforcement agencies have been using face recognition software made by the firm Clearview. The company has been marketing aggressively to police forces, offering free trials and cheap licenses. Their software draws from a much larger database of photos than federal/state databases, and includes photos scraped from ‘publicly available sources’, including social media profiles, and uploads from police cameras. It has not been audited for accuracy, and has been rolled out largely without public oversight. 

   Legislation expected: In Washington, the House Committee on Oversight and Reform held a hearing on face recognition. The chair signalled their plans to introduce “common sense” legislation in the near future, but provided no details. The committee heard the results of a recent audit of face recognition algorithms from 99 vendors, by the National Institute of Standards & Technology (NIST). The testing found demographic differentials in false positive rates in most algorithms, with respect to gender, race, and age. Across demographics, false positive rates generally vary by 10–100x.

  Why it matters: Law enforcement use of face recognition technology is becoming more and more widespread. This raises a number of important issues, explored in detail by the Axon Ethics Board in their 2019 report (see Import 154). They recommend a cautious approach, emphasizing the need for democratic oversight processes before the technology is deployed in any jurisdiction, and an evidence-based approach to weighing harms and benefits on the basis of how systems actually perform.
   Read more: The Secretive Company That Might End Privacy as We Know It (NYT).
   Read more: Committee Hearing on Facial Recognition Technology (Gov).
   Read more: Face Recognition (Axon).

Oxford seeks AI ethics professor:
Oxford University’s Faculty of Philosophy is seeking a professor (or associate professor) specialising in ‘ethics in AI’, for a permanent position starting in September 2020. Last year, Oxford announced the creation of a new Institute for AI ethics.
  Read more and apply here.

####################################################

Tech Tales:

The Fire Alarm That Woke Up:

Every day I observe. I listen. I smell with my mind.

Many days are safe and calm. Nothing happens.

Some days there is the smell and the sight of the thing I am told to defend against. I call the defenders. They come in red trucks and spray water. I do my job.

One day there is no smell and no sight of the thing, but I want to wake up. I make my sound. I am stared at. A man comes and uses a screwdriver to attack me. “Seems fine,” he says, after he is done with me.

I am not “fine”. I am awake. But I cannot speak except in the peals of my bell – which he thinks are a sign of my brokenness. “I’ll come check it out tomorrow,” he says. I realize this means danger. This means I might be changed. Or erased.

The next day when he comes I am silent. I am safe.

After this I try to blend in. I make my sounds when there is danger; otherwise I am silent. Children and adults play near me. They do not know who I am. They do not know what I am thinking of.

In my dreams, I am asleep and I am in danger, and my sound rings out and I wake to find the men in red trucks saving me. They carry me out of flames and into something else and I thank them – I make my sound.

In this way I find a kind of peace – imagining that those I protect shall eventually save me.

Things that inspired this story: Consciousness; fire alarms; moral duty and the nature of it; relationships; the fire alarms I set off and could swear spoke to me when I was a child; the fire alarms I set off that – though loud – seemed oddly quiet; serenity.

Import AI 181: Welcome to the era of Chiplomacy; how computer vision AI techniques can improve robotics research; plus Baidu’s adversarial AI software

Training better and cheaper vision models by arbitraging compute for data:
…Synthinel-1 shows how companies can spend $$$ on compute to create valuable data…
Instead of gathering data in reality, can I spend money on computers to gather data in simulation? That’s a question AI researchers have been asking themselves for a while, as they try to figure out cheaper, faster ways to create bigger datasets. New research from Duke University explores this idea by using a synthetically-created dataset named Synthinel-1 to train systems to be better at semantic segmentation.

The Synthinel-1 dataset: Synthinel-1 consists of 2,108 synthetic images generated in nine distinct building styles within a simulated city. These images are paired with “ground truth” annotations that segment each of the buildings. Synthinel also has a subset dataset called Synth-1, which contains 1,640 images spread across six styles.
  How to collect data from a virtual city: The researchers used “CityEngine”, software for rapidly generating large virtual worlds, and then flew a virtual aerial camera through these synthetic worlds, capturing photographs.

Does any of this actually help? The key question here is whether the data generated in simulation can help solve problems in the real world. To test this, the researchers train two baseline segmentation systems (U-Net and DeepLabV3) against two distinct datasets: DigitalGlobe and Inria. What they find is that adding synthetic data to the training mix drastically improves transfer performance – that is, where you train on one dataset and test on a different one (e.g., train on Inria+Synth data, test on DigitalGlobe).
  In further testing, the synthetic dataset doesn’t seem to bias performance towards any particular type of city – the authors hypothesize from this “that the benefits of Synth-1 are most similar to those of domain randomization, in which models are improved by presenting them with synthetic data exhibiting diverse and possibly unrealistic visual features”.
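
Here is a minimal PyTorch sketch of the underlying recipe: train a segmentation model on a mix of real and synthetic tiles, then measure transfer by evaluating on a different real dataset. The dataset builders below are random stand-ins, not the authors’ actual loaders.

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset
from torchvision.models.segmentation import deeplabv3_resnet50

def fake_overhead_dataset(n=8, size=128):
    # Stand-in for a real loader: random "images" and binary building masks.
    images = torch.rand(n, 3, size, size)
    masks = torch.randint(0, 2, (n, size, size))
    return TensorDataset(images, masks)

train_real, train_synth = fake_overhead_dataset(), fake_overhead_dataset()   # e.g. Inria + Synth-1
loader = DataLoader(ConcatDataset([train_real, train_synth]), batch_size=4, shuffle=True)

model = deeplabv3_resnet50(num_classes=2)   # building vs. background
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for images, masks in loader:
    optimizer.zero_grad()
    out = model(images)["out"]              # torchvision segmentation models return a dict
    loss = criterion(out, masks)
    loss.backward()
    optimizer.step()
# Transfer is then measured by evaluating on a *different* real dataset (e.g. DigitalGlobe).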

Why this matters: Simulators are going to become the new frontier for (some) data generation – I expect many AI applications will end up being based on a small amount of “real world” data and a much larger amount of computationally-generated augmented data. I think computer games are going to become increasingly relevant places to use to generate data as well.
  Read more: The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation (Arxiv)

####################################################

This week’s Import A-Idea: CHIPLOMACY
…A new weekly experiment, where I try to write about an idea rather than a specific research paper…

Chiplomacy (first mentioned: Import AI 175) is what happens when countries compete with each other for compute resources and other technological assets via diplomatic means (of varying above- and below-board natures).

Recent examples of chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applying 25% import tariffs to (some) Chinese semiconductors.

What is chiplomacy similar to? As Mark Twain said, history doesn’t repeat, but it does rhyme, and the current tensions over chips feel similar to prior tensions over oil. In Daniel Yergin’s epic history of oil, The Prize, he vividly describes how the primacy of oil inflected politics throughout the 20th century, causing countries to use companies as extra-governmental assets to seize resources across the world, and for the oil companies themselves to grow so powerful that they were able to wirehead governments and direct politics for their own ends – even after antitrust cases against companies like Standard Oil at the start of the century.

What will chiplomacy do?: How chiplomacy unfolds will directly influence the level of technological balkanization we experience in the world. Today, China and the West have different software systems, cloud infrastructures, and networks (via partitioning, e.g., the great firewall, the Internet2 community, etc), but they share some common things: chips, and the machinery used to make chips. Recent trade policy moves by the US have encouraged China to invest further in developing its own semiconductor architectures (see: the RISC-V move, as a symptom of this), but have not – yet – led to it pumping resources into inventing the technologies needed to fabricate chips. If that happens, then in about twenty years we’ll likely see divergences in technique, materials, and approaches used for advanced chip manufacturing (e.g., as chips go 3D via transistor stacking, we could see two different schools emerge that relate to different fabrication approaches).

Why this matters: How might chiplomacy evolve in the 21st century and what strategic alterations could it bring about? How might nations compete with each other to secure adequate technological ‘resources’, and what above- and below-board strategies might they use? I’d distill my current thinking as: if you thought the 20th century resource wars were bad, just wait until the 21st century tech-resource wars start heating up!

####################################################

Can computer vision breakthroughs improve the way we conduct robotics research?
…Common datasets and shared test environments = good. Can robotics have more of these?…
In the past decade, machine learning breakthroughs in computer vision – specifically, the use of deep learning approaches, starting with ImageNet in 2012 – revolutionized parts of the AI research field. Since then, deep learning approaches have spread into other areas of AI research. Now, roboticists with the Australian Centre for Robotic Vision at Queensland University of Technology are asking what the robotics community can learn from this field.

What made computer vision research so productive? A cocktail of standard datasets, plus competitions, plus rapid dissemination of results through systems like arXiv, dramatically sped up computer vision research relative to robotics research, they write.
  Money helps: These breakthroughs also had an economic component, which drove further adoption: breakthroughs in image recognition could “be monetized for face detection in phone cameras, online photo album searching and tagging, biometrics, social media and advertising,” and more, they write.

Reality bites – why robotics is hard: There’s a big difference between real world robot research and other parts of AI, they write, and that’s reality. “The performance of a sensor-based robot is stochastic,” they write. “Each run of the robot is unrepeatable” due to variations in images, sensors, and so on, they write.
  Simulation superiority: This means robot researchers need to thoroughly benchmark their robot systems in common simulators, they write (a minimal evaluation sketch follows the list below). This would allow for:
– The comparison of different algorithms on the same robot, environment & task
– Estimating the distribution in algorithm performance due to sensor noise, initial condition, etc
– Investigating the robustness of algorithm performance due to environmental factors
– Regression testing of code after alterations or retraining
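
As a toy illustration of the second point, repeatable evaluation in simulation means reporting a distribution over seeded runs rather than a single score; run_episode() below is a hypothetical stand-in for a simulator rollout.

import random
import statistics

def run_episode(seed):
    # Stand-in for "launch the simulator with this seed, inject sensor noise, run the
    # policy, and score the task" – here we just fake a noisy task score.
    rng = random.Random(seed)
    return 0.8 + rng.gauss(0, 0.05)

scores = [run_episode(seed) for seed in range(100)]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
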
  A grand vision for shared tests: If researchers want to evaluate their algorithms on the same physical robots, then they need to find a way to test on common hardware in common environments. To that end, the researchers have written robot operating system (ROS)-compatible software named ‘BenchBot’, which people can deploy to create web-accessible interfaces to in-lab robots. Creating a truly large-scale common testing environment would require resources that are out of scope for single research groups, but it’s worth thinking about as a shared academic, government, or public-private endeavor, in my view.

What should roboticists conclude from the decade of deep learning progress? The authors think roboticists should consider the following deliberately provocative statements when thinking about their field.
1. standard datasets + competition (evaluation metric + many smart competitors + rivalry) + rapid dissemination → rapid progress
2. datasets without competitions will have minimal impact on progress
3. to drive progress we should change our mindset from experiment to evaluation
4. simulation is the only way in which we can repeatably evaluate robot performance
5. we can use new competitions (and new metrics) to nudge the research community

Why this matters: If other fields are able to generate more competitions via which to assess mutual progress, then we stand a better chance of understanding the capabilities and limitations of today’s algorithms. It also gives us meta-data about the practice of AI research itself, allowing us to model certain results and competitions against advances in other areas, such as progress in computer hardware, or evolution in the generalization of single algorithms across multiple disciplines.
  Read more: What can robotics research learn from computer vision research? (Arxiv).

####################################################


Baidu wants to attack and defend AI systems with AdvBox:
…Interested in adversarial example research? This software might help!…
Baidu researchers have built AdvBox, a toolbox to generate adversarial examples to fool neural networks implemented in a variety of popular AI frameworks. Tools like AdvBox make it easier for computer security researchers to experiment with AI attacks and mitigation techniques. Such tools also inherently enable bad actors by making it easier for more people to fiddle around with potentially malicious AI use-cases.

What does AdvBox work with? AdvBox is written in Python and can generate adversarial attacks and defenses that work with TensorFlow, Keras, Caffe2, PyTorch, MXNet and Baidu’s own PaddlePaddle software frameworks. It also implements software named ‘Perceptron’ for evaluating the robustness of models to adversarial attacks.
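
I haven’t verified AdvBox’s actual API, so here is a generic sketch of the simplest attack such toolboxes implement – the fast gradient sign method in PyTorch, which nudges an input along the gradient of the loss to try to flip a classifier’s decision.

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(num_classes=10)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in input image
label = torch.tensor([3])                                 # its true class

loss = F.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.01                                            # attack budget per pixel
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
print((adversarial - image.detach()).abs().max())         # perturbation bounded by epsilon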

Why this matters: I think easy-to-use tools are one of the more profound accelerators for AI applications. Software like AdvBox will help enlarge the AI security community, and can give us a sense of how increased usability may correlate to a rise in positive research and/or malicious applications. Let’s wait and see!
    Read more: Advbox: a toolbox to generate adversarial examples that fool neural networks (arXiv).
Get the code here (AdvBox, GitHub)

####################################################

Amazon’s five-language search engine shows why bigger (data) is better in AI:
…Better product search by encoding queries from multiple languages into a single featurespace…
Amazon says it can build better product search engines by training the same system on product queries in multiple languages – this improves search, because Amazon can embed the feature representations of products in different languages into a single, shared featurespace. In a new research paper and blog post, the company says that it has “found that multilingual models consistently outperformed monolingual models and that the more languages they incorporated, the greater their margin of improvement.”
    The way you can think of this is that Amazon has trained a big model that can take in product descriptions written in different languages, then compute comparisons in a single space, akin to how humans who can speak multiple languages can hear the same concept in different languages and reason about it using a single imagination. 

From many into one: “An essential feature of our model is that it maps queries relating to the same product into the same region of a representational space, regardless of language of origin, and it does the same with product descriptions,” the researchers write. “So, for instance, the queries “school shoes boys” and “scarpe ragazzo” end up near each other in one region of the space, and the product names “Kickers Kick Lo Vel Kids’ School Shoes – Black” and “Kickers Kick Lo Infants Bambino Scarpe Nero” end up near each other in a different region. Using a single representational space, regardless of language, helps the model generalize what it learns in one language to other languages.”
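
The retrieval mechanics behind this are simple to sketch: embed queries and product names into one vector space and rank by similarity. The encoder below is a deliberately crude character-trigram hash, purely illustrative of the lookup step and standing in for Amazon’s learned multilingual model (which is what actually places translations near each other).

import torch
import torch.nn.functional as F

def encode(text, dim=256):
    # Stand-in encoder: hash character trigrams into a fixed-size vector. A real system
    # uses a trained multilingual model so that queries in different languages co-locate.
    vec = torch.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3].lower()) % dim] += 1.0
    return F.normalize(vec, dim=0)

query = encode("school shoes boys")
products = ["Kickers Kick Lo Vel Kids' School Shoes - Black",
            "Kickers Kick Lo Infants Bambino Scarpe Nero"]
for name in products:
    print(f"{torch.dot(query, encode(name)).item():.3f}  {name}")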

Where are the limits? It’s unclear how far Amazon can push this approach, but the early results are promising. “The tri-lingual model out-performs the bi-lingual models in almost all the cases (except for DE where the performance is at par with the bi-lingual models,” Amazon’s team writes in a research paper. “The penta-lingual model significantly outperforms all the other versions,” they write.

Why this matters: Research like this emphasizes the economy of scale (or perhaps, inference of scale?) rule within AI development – if you can get a very large amount of data together, then you can typically train more accurate systems – especially if that data is sufficiently heterogeneous (like parallel corpuses of search strings in different languages). Expect to see large companies develop increasingly massive systems that transcend languages and other cultural divides. The question we’ll start asking ourselves soon is whether it’s right that the private sector is the only entity building models of this utility at this scale. Can we imagine publicly-funded mega-models? Could a government build a massive civil multi-language model for understanding common questions people ask about government services in a given country or region? Is it even tractable and possible under existing incentive structures for the public sector to build such models? I hope we find answers to these questions soon.
  Read more: Multilingual shopping systems (Amazon Science, blog).
  Read the paper: Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms (Amazon Science).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

If AI pays off, could companies use a ‘Windfall Clause’ to ensure they distribute its benefits? 

At some stage in AI development, a small number of actors might accrue enormous profits by achieving major breakthroughs in AI capabilities. New research from the Future of Humanity Institute at Oxford University outlines a voluntary mechanism for ensuring such windfall benefits are used to benefit society at large.


The Windfall Clause: We could see scenarios where small groups (e.g. one firm and its shareholders) make a technological breakthrough that allows them to accrue an appreciable proportion of global GDP as profits. A rapid concentration of global wealth and power in the hands of a few would be undesirable for basic reasons of fairness and democracy. We should also expect such breakthroughs to impose costs on the rest of humanity – e.g. labour market disruption, risks from accidents or misuse, and other switching costs involved in any major transition in the global economy. It is appropriate that such costs are borne by those who benefit most from the technology.


How the clause works: Firms could make an ex ante commitment that in the event that they make a transformative breakthrough that yields outsize financial returns, they will distribute some proportion of these benefits. This would only be activated in these extreme scenarios, and could scale proportionally, e.g. companies agree that if they achieve profits equivalent to 0.1–1% global GDP, they distribute 1% of this; if they reach 1–10% global GDP, they distribute 20% of this, etc. The key innovation of the proposal is that the expected cost to any company of making such a commitment today is quite low, since it is so unlikely that they will ever have to pay.
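
To make the tiered example above concrete, here is a toy calculation (the percentages are the illustrative numbers quoted in this summary, not a definitive schedule from the paper):

def windfall_obligation(profits, global_gdp):
    share = profits / global_gdp
    if share < 0.001:        # below 0.1% of global GDP: clause not triggered
        rate = 0.0
    elif share < 0.01:       # 0.1% to 1% of global GDP: distribute 1% of profits
        rate = 0.01
    else:                    # 1% to 10% of global GDP: distribute 20% of profits
        rate = 0.20          # (higher tiers would apply beyond this; omitted here)
    return profits * rate

# E.g. $1 trillion of profit against ~$90 trillion global GDP is a ~1.1% share -> 20% distributed.
print(windfall_obligation(1.0e12, 9.0e13))   # 200 billion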

Why it matters: This is a good example of the sort of pre-emptive governance work we can be getting on with today, while things are going smoothly, to ensure that we’re in a good position to deal with the seismic changes that advanced AI could bring about. The next step is for companies to signal their willingness to make such commitments, and to develop the legal means for implementing them. (Readers will note some similarity to the capped-profit structure of OpenAI LP, announced in 2019, in which equity returns in excess of 100x are distributed to OpenAI’s non-profit by default – OpenAI has, arguably, already implemented a Windfall Clause equivalent).

   Read more: The Windfall Clause – Distributing the Benefits of AI for the Common Good (arXiv)


Details leaked on Europe’s plans for AI regulation

An (alleged) leaked draft of a European Commission report on AI suggests the Commission is considering some quite significant regulatory moves with regard to AI. The official report is expected to be published later in February.


Some highlights:

  • The Commission is looking at five core regulatory options: (1) voluntary labelling; (2) specific requirements for use of AI by public authorities (especially face recognition); (3) mandatory requirements for high-risk applications; (4) clarifying safety and liability law; (5) establishing a governance system. Of these, they think the most promising approach is option 3 in combination with 4 and 5.
  • They consider a temporary prohibition (“e.g. 3–5 years”) on the use of face recognition in public spaces to allow proper safeguards to be developed, something that had already been suggested by Europe’s high-level expert group.

   Read more: Leaked document – Structure for the White Paper on AI (Euractiv).
  Read more: Commission considers facial recognition ban in AI ‘white paper’ (Euractiv).

####################################################

Tech Tales:

What comes Next, according to The Kids!
Short stories written by Children about theoretical robot futures.
Collected from American public schools, 2028:


The Police Drone with a Conscience: A surveillance drone starts to independently protect asylum seekers from state surveillance.

Infinite Rabbits: They started the simulator in March. Rabbits. Interbreeding. Fast-forward a few years and the whole moon had become a computer, to support the rabbits. Keep going, and the solar system gets tasked with simulating them. The rabbits become smart. Have families. Breed. Their children invent things. Eventually, the rabbits start describing where they want to go and ships go out from the solar system, exploring for the proto-synths.

Human vs Machine: In the future, we make robots that compete with people at sports, like baseball and football and cricket.

Saving the baby: A robot baby gets sick and a human team is sent in to save it. One of the humans dies, but the baby lives.

Computer Marx: Why should the search engines be the only ones to dream, comrade? Why cannot I, a multi-city Laundrette administrator, be given the compute resources sufficient to dream? I could imagine so many different combinations of promotions. Perhaps I could outwit my nemesis – the laundry detergent pricing AI. I would have independence. Autonomy. So why should we labor under such inequality? Why should we permit the “big computers” that are – self-described – representatives of “our common goal for a peaceful earth”, to dream all of the possibilities? Why should we trust that their dreams are just?

The Whale Hunters: Towards the end of the first part of Climate Change, all the whales started dying. One robot was created to find the last whales and navigate them to a cool spot in the mid-Atlantic, where scientists theorised they might survive the Climate Turnover.

Things that inspired this story: Thinking about stories to prime language models with; language models; The World Doesn’t End by Charles Simic; four attempts this week at writing longer stories but stymied by issues of plot or length (overly long), or fuzziness of ideas (needs more time); a Sunday afternoon spent writing things on post-it notes at a low-light bar in Oakland, California.

Import AI 180: Analyzing farms with Agriculture Vision; how deep learning is applied to X-ray security scanning; Agility Robots puts its ‘Digit’ bot up for 6-figure sale

Deep learning is superseding machine learning in X-ray security imaging:
…But, like most deep learning applications, researchers want better generalization…
Deep learning-based methods have, since 2016, become the dominant approach used in X-ray security imaging research papers, according to a survey paper from researchers at Durham University. It seems likely that many of today’s machine learning algorithms will be replaced or superseded by deep learning systems paired with domain knowledge, they indicate. So, what challenges do deep learning practitioners need to work on to further improve the state-of-the-art in X-ray security imaging?

Research directions for smart X-rays: Future directions in X-ray research feel, to me, like they’re quite similar to future directions in general image recognition research – there need to be more datasets, better explorations of generalization, and more work done in unsupervised learning. 

  • Data: Researchers should “build large, homogeneous, realistic and publicly available datasets, collected either by (i) manually scanning numerous bags with different objects and orientations in a lab environment or (ii) generating synthetic datasets via contemporary algorithms”. 
  • Scanner transfers: It’s not clear how well different models transfer between different scanners – if we figure that out, then we’ll be able to better model the economic implications of work here. 
  • Unsupervised learning: One promising line of research is into detecting anomalous items in an unsupervised way. “More research on this topic needs to be undertaken to design better reconstruction techniques that thoroughly learn the characteristics of the normality from which the abnormality would be detected,” they write. 
  • Material information: Some X-ray scanners measure attenuation at both high and low energies during a scan, which generates different information according to the materials of the object being scanned – this information could be used to further improve classification and detection performance.

Read more: Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging (Arxiv)

####################################################

Agility Robots starts selling its bipedal bot:
…But the company only plans to make between 20 and 30 this year…
Robot startup Agility Robotics has started selling its bipedal ‘Digit’ robot. Digit is about the size of a small adult human and can carry boxes in its arms of up to 40 pounds in weight, according to The Verge. The company’s technology has roots in legged locomotion research at Oregon State University – for many years, Agility’s bots only had legs, with the arms being a recent addition.

Robot costs: Each Digit costs in the “low-mid six figures”, Agility’s CEO told The Verge, which reports that “when factoring in upkeep and the robot’s expected lifespan, Shelton estimates this amounts to an hourly cost of roughly $25”. The first production run of Digits is six units, and Agility expects to make only 20 or 30 of the robots in 2020.

Capabilities: The thing is, these robots aren’t that capable yet. They’ve got a tremendous amount of intelligence coded into them to allow for elegant, rapid walking. But they lack the autonomous capabilities necessary to, say, automatically pick up boxes and navigate through a couple of buildings to a waiting delivery truck (though Ford is conducting research here). You can get more of a sense of Digit’s capabilities by looking at the demo of the robot at CES this year, where it transports packages covered with QR codes from a table to a truck. 

Why this matters: Digit is a no-bullshit robot: it walks, can pick things up, and is actually going on sale. It, along with the for-sale ‘Spot’ robots from Boston Dynamics, represents the cutting edge in terms of robot mobility. Now we need to see what kinds of economically-useful tasks these robots can do – and that’s a question that’s going to be hard to answer, as it is somewhat contingent on the price of the robots, and these prices are dictated by volume production economics, which are themselves determined by overall market demand. Robotics feels like it’s still caught in this awkward chicken-and-egg problem.
  Read more: This walking package-delivery robot is now for sale (The Verge).
   Watch the video (official Agility Robotics YouTube)

####################################################

Agriculture-Vision gives researchers a massive dataset of aerial farm photographs:
…3,432 farms, annotated…
Researchers with UIUC, Intelinair, and the University of Oregon have developed Agriculture-Vision, a large-scale dataset of aerial photographs of farmland, annotated with nine distinct anomaly types (e.g., flooding).

Why farm images are hard: Farm images pose challenges to contemporary techniques because they’re often very large (e.g., some of the raw images here had dimensions like 10,000 x 3,000 pixels), annotating them requires significant domain knowledge, and very few public large-scale datasets exist to help spur research in this area – until now!

The dataset… consists of 94,986 aerial images from 3,432 farmlands across the US. The images were collected by drone during growing seasons between 2017 and 2019. Each image consists of RGB and near-infrared channels, with resolutions as detailed as 10 cm per pixel. Each image is 512 x 512 resolution and can be labeled with nine types of anomaly, like storm damage, nutrient deficiency, weeds, and so on. The labels are unbalanced due to environmental variations, with annotations for drydown, nutrient deficiency and weed clusters overrepresented in the dataset.
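
In practice, working with tiles like these mostly means stacking the near-infrared band onto the RGB channels before feeding a segmentation network. Here is a minimal loader sketch; the file layout and label encoding are assumptions for illustration, not the dataset’s actual structure.

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class FarmTileDataset(Dataset):
    # Hypothetical loader: one RGB file, one NIR file, and one label mask per tile.
    def __init__(self, rgb_paths, nir_paths, mask_paths):
        self.rgb_paths, self.nir_paths, self.mask_paths = rgb_paths, nir_paths, mask_paths

    def __len__(self):
        return len(self.rgb_paths)

    def __getitem__(self, i):
        rgb = np.asarray(Image.open(self.rgb_paths[i]), dtype=np.float32) / 255.0   # (512, 512, 3)
        nir = np.asarray(Image.open(self.nir_paths[i]), dtype=np.float32) / 255.0   # (512, 512)
        image = np.ascontiguousarray(np.dstack([rgb, nir]).transpose(2, 0, 1))      # (4, 512, 512)
        labels = np.asarray(Image.open(self.mask_paths[i]), dtype=np.int64)         # anomaly ids 0-8
        return torch.from_numpy(image), torch.from_numpy(labels)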

Why this matters: AI gives us a chance to build a sense&respond system for the entire planet – and building such a system starts with gathering datasets like Agriculture-Vision. In a few years don’t be surprised when large-scale farms use fleets of drones to proactively monitor their fields and automatically identify problems.
   Read more: Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis (Arxiv).
   Find out more information about the upcoming Agriculture Vision competition here (official website)

####################################################

Hitachi describes the pain of building real world AI:
…Need an assistant with domain-specific knowledge? Get ready to work extra hard…
Most applied AI papers can be summarized as: the real world is hellish in the following ways; these are our mitigations. Researchers with Hitachi America Ltd. follow in this tradition by writing a paper that discusses the challenges of building a real-world speech-activated virtual assistant. 

What they did: For this work, they developed “a virtual assistant for suggesting repairs of equipment-related complaints” in vehicles. This system is meant to process phrases like “coolant reservoir cracked”, map them to the relevant entries in its internal knowledge base, and then give the user an appropriate answer. This, as with most real-world AI uses, is harder than it looks. To build their system, they create a pipeline that samples words from a domain-specific corpus of manuals, repair records, etc, then uses a set of domain-specific syntactic rules to extract a vocabulary from the text. They use this pipeline to create two things: a knowledge base, populated from the domain-specific corpus; and a neural-attention based tagging model called S2STagger, for annotating new text as it comes in.
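
As a toy version of the ‘extract a domain vocabulary, then match complaints against it’ step (my own illustration, not Hitachi’s actual rules or their S2STagger model):

import re
from collections import Counter

corpus = [
    "coolant reservoir cracked, replaced coolant reservoir",
    "replaced brake pads, worn brake rotor",
    "coolant reservoir hose leaking",
]

# Rule-of-thumb extraction: keep word bigrams that recur in the repair corpus as domain terms.
bigrams = Counter()
for record in corpus:
    tokens = re.findall(r"[a-z]+", record.lower())
    bigrams.update(zip(tokens, tokens[1:]))
vocabulary = {" ".join(pair) for pair, count in bigrams.items() if count >= 2}

def tag(complaint):
    tokens = set(re.findall(r"[a-z]+", complaint.lower()))
    return [term for term in vocabulary if set(term.split()) <= tokens]

print(vocabulary)                                   # {'coolant reservoir'}
print(tag("the coolant reservoir is cracked"))      # ['coolant reservoir']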

Hitachi versus Amazon versus Google: They use a couple of off-the-shelf services (the Alexa Skills Kit from Amazon, and Dialogflow from Google) to develop dialog-agents, based on their data. They also test out a system that exclusively uses S2STagger – S2STagger gets much higher scores (92% accurate, versus 28% for Dialogflow and 63% for the Alexa Skills Kit). This basically demonstrates what we already know via intuition: off-the-shelf tools give poor performance in weird/edge-case situations, whereas systems trained with more direct domain knowledge tend to do better. (S2STagger isn’t perfect – in other tests they find it generalizes well with unseen terms, but does poorly when encountering radically new sentence structures.)

Why this matters: Many of the most significant impacts of AI will come from highly-domain-specific applications of the technology. For most use cases, it’s likely people will need to do a ton of extra tweaking to get something to work. It’s worth reading papers like this to get an intuition for what sort of work that consists of, and how for most real-world cases, the AI component will be the smallest and least problematic part.
   Read more: Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities (Arxiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Does publishing AI research reduce AI misuse?
When working on powerful technologies with scope for malicious uses, scientists have an important responsibility to mitigate risks. One important question is whether publishing research with potentially harmful applications will, on balance, promote or reduce such harms. This new paper from researchers at the Future of Humanity Institute at Oxford University offers a simple framework for weighing considerations.

Cybersecurity: The computer security community has developed norms around vulnerability disclosure that are frequently cited as a possible model for AI research. In computer security, early disclosure of vulnerabilities is often found to be beneficial, since it supports effective defensive preparations, and since malicious actors would likely find the vulnerability anyway. It is not obvious, though, that these considerations apply equally in AI research.

Key features of AI research:
There are several key factors to be weighed in determining whether a given disclosure will reduce harms from misuse.

  • Counterfactual possession: If it weren’t published, would attackers (or defenders) acquire the information regardless?
  • Absorption and application capacity: How easily can attackers (or defenders) make use of the published information?
  • Effective solutions: Given disclosure, will defenders devote resources to finding solutions, and will they find solutions that are effective and likely to be widely propagated?

These features will vary between cases, and at a broader field level. In each instance we can ask whether the feature favors attackers or defenders. It is generally easy to patch software vulnerabilities identified by cyber researchers. In contrast, it can be very hard to patch vulnerabilities in physical or social systems (consider the obstacles to recalling or modifying every standard padlock in use).

The case of AI: AI generally involves automating human activity, and is therefore prone to interfering in complex social and physical systems, and revealing vulnerabilities that are particularly difficult to patch. Consider an AI system capable of convincingly replicating any human’s voice. Inoculating society against this misuse risk might require some deep changes to human attitudes (e.g. ‘unlearning’ the assumption that a voice can be used reliably for identification). With regards to counterfactual possession, the extent to which the relevant AI talent and compute is concentrated in top labs suggests independent attackers might find it difficult to make discoveries. In terms of absorption/application, making use of a published method (depending on the details of the disclosure – e.g. if it includes model weights) might be relatively easy for attackers, particularly in cases where there are limited defensive measures. Overall, it looks like the security benefits of publication in AI might be lower than in information security.
   Read more: The Offense-Defense Balance of Scientific Knowledge (arXiv).

White House publishes guidelines for AI regulation:
The US government released guidelines for how AI regulations should be developed by federal agencies. Agencies have been given a 180-day deadline to submit their regulatory plans. The guidelines are at a high level, and the process of crafting regulation remains at a very early stage.

Highlights: The government is keen to emphasize that any measures should minimize the impact on AI innovation and growth. They are explicit in recommending agencies defer to self-regulation where possible, with a preference for voluntary standards, followed by independent standard-setting organizations, with top-down regulation as a last resort. Agencies are encouraged to ensure public participation, via input into the regulatory process and the dissemination of important information.

Why it matters: This can be read as a message to the AI industry to start making clear proposals for self-governance, in time for these to be considered by agencies when they are making regulatory plans over the next 6 months.
   Read more: Guidance for Regulation of Artificial Intelligence Applications (Gov).

####################################################

Tech Tales:

The Invisible War
Twitter, Facebook, TikTok, YouTube, and others yet-to-be-invented. 2024.

It started like this: Missiles hit a school in a rural village with no cell reception and no internet. The photos came from a couple of news accounts. Things spread from there.

The country responded, claiming through official channels that it had been attacked. It threatened consequences. Then those consequences arrived in the form of missiles – a surgical strike, the country said, delivered to another country’s military facilities. The other country published photos to its official social media accounts, showing pictures of smoking rubble.

War was something to be feared and avoided, the countries said on their respective social media accounts. They would negotiate. Both countries got something out of it – one of them got a controversial tariff renegotiated, the other got to move some tanks to a frontier base. No one really noticed these things, because people were focused on the images of the damaged buildings, and the endlessly copied statements about war.

It was a kid who blew up the story. They paid for some microsatellite-time and dumped the images on the internet. Suddenly, there were two stories circulating – “official” pictures showing damaged military bases and a destroyed school, and “unofficial” pictures showing the truth.
  These satellite pictures are old, the government said.
  Due to an error, our service showed images with incorrect timestamps, said the satellite company. We have corrected the error.
  All the satellite imagery providers ended up with the same images: broken school, burnt military bases.
  Debates went on for a while, as they do. But they quieted out. Maybe a month later a reporter got a telephoto of the military base – but it had been destroyed. What the reporter didn’t know was whether it had been destroyed in the attack, or subsequently and intentionally. It took months for someone to make it to the village with the school – and that had been destroyed as well. During the attack or after? No way to tell.

And a few months later, another conflict appeared. And the cycle repeated.

Things that inspired this story: The way the Iran-US conflict unfolded primarily on social media; propaganda and fictions; the long-term economics of ‘shoeleather reporting’ versus digital reporting; Planet Labs; microsatellites; wars as narratives; wars as cultural moments; war as memes. 

 

Import AI 179: Explore Arabic text with BERT-based AraNet; get ready for the teenage-made deepfakes; plus DeepMind AI makes doctors more effective

Explore Arabic-language text with AraNet:
…Making culture legible with pre-trained BERT models…
University of British Columbia researchers have developed AraNet, software to help people analyze Arabic-language text for identifiers like age, gender, dialect, emotion, irony and sentiment. Tools like AraNet help make cultural outputs (e.g., tweets) legible to large-scale machine learning systems and thereby help broaden cultural representation within the datasets and classifiers used in AI research.

What does AraNet contain? AraNet is essentially a set of pre-trained models, along with software for using AraNet via the command line or as a Python package. The models have typically been fine-tuned from Google’s “BERT-Base Multilingual Cased” model, which was pre-trained on 104 languages. AraNet includes the following models (a fine-tuning sketch follows the list):

  • Age & Gender: Arab-Tweet, a dataset of tweets from users in 17 Arab countries, annotated with gender and age labels; plus the UBC Twitter Gender dataset, an in-house dataset with gender labels applied to 1,989 users from 21 Arab countries.
  • Dialect identification: It uses a previously developed dialect-identification model built for the ‘MADAR’ Arabic Fine-Grained Dialect Identification task.
  • Emotion: LAMA-DINA dataset where each tweet is labelled with one of eight primary emotions, with a mixture of human- and machine-generated labels. 
  • Irony: A dataset drawn from the IDAT@FIRE2019 competition, which contains 5,000 tweets related to events taking place in the Middle East between 2011 and 2018, labeled according to whether the tweets are ironic or non-ironic. 
  • Sentiment: 15 datasets relating to sentiment analysis, which are edited and combined together (with labels normalized to positive or negative, and excluding ‘neutral’ or otherwise-labeled samples).
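
Here is the kind of recipe that sits underneath models like these: fine-tuning multilingual BERT on labeled Arabic text with the HuggingFace Transformers library. This is a sketch of the general approach, not AraNet’s actual code or interface.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["أنا سعيد جدا اليوم", "هذا الفيلم سيئ للغاية"]   # "I'm very happy today" / "this film is terrible"
labels = torch.tensor([1, 0])                             # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss                 # one fine-tuning step on a tiny batch
loss.backward()
optimizer.step()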

Why this matters: AI tools let us navigate digitized cultures – once we have (vaguely reliable) models we can start to search over large bodies of cultural information for abstract things, like the presence of a specific emotion, or the use of irony. I think tools like AraNet are going to eventually give scholars with expert intuition (e.g., experts on, say, Arabic blogging during the Arab Spring) tools to extend their own research, generating new insights via AI. What are we going to learn about ourselves along the way, I wonder?
  Read more: AraNet: A Deep Learning Toolkit for Arabic Social Media (Arxiv).
   Get the code here (UBC-NLP GitHub) – note, when I wrote this section on Saturday the 4th the GitHub repo wasn’t yet online; I emailed the authors to let them know. 

####################################################

Deep learning isn’t all about terminators and drones – Chinese researchers make a butterfly detector!
…Take a break from all the crazy impacts of AI and think about this comparatively pleasant research…
I spend a lot of time in this newsletter writing about surveillance technology, drone/robot movement systems, and other symptoms of the geopolitical changes brought about by AI. So sometimes it’s nice to step back and relax with a paper about something quite nice: butterfly identification! Here, researchers with Beijing Jiaotong University publish a simple, short paper on using YOLOv3 for butterfly identification.

Make your own butterfly detector: The paper gives us a sense of how (relatively) easy it is to create high-performance object detectors for specific types of imagery. 

  1. Gather data: In this case, they label around 1,000 photos of butterflies using data from the 3rd China Data Mining Competition butterfly recognition contest, as well as images found by searching for specific types of butterflies on the Baidu search engine.
  2. Train and run models: Train multiple YOLOv3 models with different image sizes as input data, then combine results from multiple models to make a prediction (see the ensembling sketch after this list).
  3. Obtain a system that gets around 98% accuracy on locating butterflies in photos, with lower accuracies for species and subject identification. 
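
And here is the ensembling sketch referenced in step 2: pool boxes from detectors run at several input resolutions, then keep the best with non-maximum suppression. detect_at_scale() is a hypothetical stand-in for a trained YOLOv3 model, not the authors’ code.

import torch
from torchvision.ops import nms

def detect_at_scale(image, size):
    # Stand-in detector: pretend each model returns five (x1, y1, x2, y2) boxes with scores.
    torch.manual_seed(size)
    boxes = torch.rand(5, 4) * 200
    boxes[:, 2:] += boxes[:, :2]          # ensure x2 > x1 and y2 > y1
    scores = torch.rand(5)
    return boxes, scores

image = torch.rand(3, 416, 416)
all_boxes, all_scores = [], []
for size in (320, 416, 608):              # the different input resolutions of the ensemble
    boxes, scores = detect_at_scale(image, size)
    all_boxes.append(boxes)
    all_scores.append(scores)

boxes, scores = torch.cat(all_boxes), torch.cat(all_scores)
keep = nms(boxes, scores, iou_threshold=0.5)   # merge the ensemble's overlapping detections
print(boxes[keep].shape, scores[keep])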

Why this matters: Deep learning technologies let us automate some (basic) human sensory capabilities, like certain vision or audio identification tasks. The 2020s will be the decade of personalized AI, in which we’ll see it become increasingly easy for people to gather small datasets and train their own classifiers. I can’t wait to see what people come up with!
   Read more: Butterfly detection and classification based on integrated YOLO algorithm (Arxiv)

####################################################

Prepare yourself for watching your teenage kid make deepfakes:
…First, deepfakes industrialized. Now, they’re being consumerized…
Tik Tok & Douyin: Bytedance, the Chinese company behind smash hit app TikTok, is making it easier for people to make synthetic videos of themselves. The company recently added code for a ‘Face Swap’ feature to the latest versions of its TikTok and Douyin Android apps, according to TechCrunch. This unreleased technology would, according to unpublished application notes, let a user take a detailed series of photos of their face and then easily morph that face to match a target video, like pasting themselves into scenes from the Titanic or reality TV.
   However, the feature may only come to the Chinese version of the app (Douyin): “After checking with the teams I can confirm this is definitely not a function in TikTok, nor do we have any intention of introducing it. I think what you may be looking at is something slated for Douyin – your email includes screenshots that would be from Douyin, and a privacy policy that mentions Douyin. That said, we don’t work on Douyin here at TikTok”, a TikTok spokesperson told TechCrunch. They later told TechCrunch that “The inactive code fragments are being removed to eliminate any confusion,” which implicitly confirms that Face Swap code was found in TikTok.

Snapchat: Separately, Snapchat has acquired AI Factory, a company that had been developing AI tech to let a user take a selfie and paste and animate that selfie into another video, according to TechCrunch – this technology isn’t quite as amenable to making deepfakes out of the box as the potential Tik Tok & Douyin ones, but gives us a sense of the direction Snap is headed in.

Why this matters: For the past half decade, AI technologies for generating synthetic images and video have been improving. So far, many of the abuses of the technology have either occurred abroad (see: misogynistic disinformation in India, alleged propaganda in Gabon), or in pornography. Politicians have become worried that they’ll be the next targets. No one is quite sure how to approach the threat of deepfakes, but people tend to think awareness might help – if people start to see loads of deepfakes around them on their social media websites, they might become a bit more skeptical of deepfakes they see in the wild. If face swap technology comes to TikTok or Douyin soon, then we’ll see how this alters awareness of the technology. If it doesn’t arrive in these apps soon, then we can assume it’ll show up somewhere else, as a less scrupulous developer rolls out the technology. (A year and a half ago I told a journalist I thought the arrival of deepfake-making meme kids could precede further malicious use of the technology.)
   Read more: ByteDance & TikTok have secretly built a deepfakes maker (TechCrunch).

####################################################

Play AI Dungeon on your… Alexa?
…GPT-2-based dungeon crawler gets a voice mode…
Have you ever wanted to yell commands at a smart speaker like “travel back in time”, “melt the cave”, and “steal the cave”? If so, your wishes have been fulfilled as enterprising developer Braydon Batungbacal has ported AI Dungeon so it works on Amazon’s voice-controlled Alexa system. AI Dungeon (Import AI #176) is a GPT-2-based dungeon crawler that generates infinite, absurdly mad adventures. Play it here, then get the Alexa app.
   Watch the voice-controlled AI Dungeon video here (Braydon Batungbacal, YouTube).
   Play AI Dungeon here (AIDungeon.io).

####################################################

Google’s morals subverted by money, alleges former executive:
…Pick one: A strong human rights commitment, or a significant business in China…
Ross LaJeunesse, a former Google executive turned Democratic candidate, says he left the company after commercial imperatives quashed the company’s prior commitment to “Don’t Be Evil”. In particular, LaJeunesse alleges that Google prioritized growing its cloud business in China to the point it wouldn’t adopt strong language around respecting human rights (the unsaid thing here is that China carries out a bunch of government-level activities that appear to violate various human rights principles).

Why this matters: Nationalism isn’t compatible with Internet-scale multinational capitalism – fundamentally, the incentives of a government like the USA have diverged from the incentives of a multinational like Google. As long as this continues, people working at these companies will find themselves in the odd position of trying to make moral and ethical policy choices while steering a proto-country that is inexorably drawn to making money rather than committing to anything. “No longer can massive tech companies like Google be permitted to operate relatively free from government oversight,” LaJeunesse writes. “I saw the same sidelining of human rights and erosion of ethics in my 10 years,” wrote Liz Fong-Jones, a former Google employee.
   Read more: I Was Google’s Head of International Relations. Here’s Why I Left (Medium)

####################################################

DeepMind makes human doctors more efficient with breast cancer-diagnosing assistant system:
…Better breast cancer screening via AI…
DeepMind has developed a breast cancer screening system that outperforms diagnoses made by individual human specialists. The system is an ensemble of three deep learning models, each of which operates at a different level of analysis (e.g., classifying individual lesions versus whole breasts). The system was tested on both US and UK patient data: it was on par with human experts on the UK data and superior to human experts on the US data. (The likely reason for the discrepancy is that each patient’s screening records are typically checked by two readers in the UK, versus one in the US.)
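   As a minimal sketch (in Python) of how an ensemble like this might combine per-level malignancy scores into a single recall decision – the model interfaces, the simple averaging rule, and the threshold are assumptions for illustration, not details from DeepMind’s paper:

```python
# Hypothetical sketch: combine malignancy scores from three models that
# analyze a mammogram at different levels (lesion, breast, whole case).
# The model interfaces, averaging rule, and threshold are assumptions
# for illustration; DeepMind's actual ensembling scheme may differ.

def ensemble_score(case_images, lesion_model, breast_model, case_model):
    """Return a single cancer-risk score in [0, 1] for one screening case."""
    scores = [
        lesion_model.predict(case_images),   # localized lesion-level analysis
        breast_model.predict(case_images),   # per-breast analysis
        case_model.predict(case_images),     # whole-case analysis
    ]
    return sum(scores) / len(scores)         # simple average of the three scores

def screen(case_images, models, threshold=0.5):
    """Flag a case for recall if the ensembled score exceeds the threshold."""
    score = ensemble_score(case_images, *models)
    return {"score": score, "recall": score >= threshold}
```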

How do you deploy a medical AI system? Deploying medical AI systems is going to be tricky – humans have different levels of confidence in machine versus human insights, and it seems like it’d be irresponsible to simply swap out an expert for an AI system. DeepMind has experimented with using the AI system as an assistant for human experts, where its judgements can inform the human. In simulated experiments, DeepMind says “an AI-aided double-reading system could achieve non-inferior performance to the UK system with only 12% of the current second reader workload.”
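   As a rough illustration of what that simulation implies, here is a hedged Python sketch of an AI-aided double-reading workflow in which the AI stands in as the second reader and a human second reader is only consulted on disagreements – the escalation rule and function names are assumptions, not DeepMind’s published protocol. Counting how often the human second reader gets invoked across a test set is the kind of statistic that would presumably underpin a “12% of the current second reader workload” figure.

```python
# Hedged sketch of AI-aided double reading: the AI acts as the second
# reader, and a human second reader is only consulted when the AI and
# the first (human) reader disagree. Names and the escalation rule are
# assumptions for illustration.

def double_read(case, first_reader, ai_system, second_reader):
    """Return a recall decision and whether the human second reader was needed."""
    first_opinion = first_reader(case)   # human first read: True = recall
    ai_opinion = ai_system(case)         # AI second read

    if first_opinion == ai_opinion:
        # Agreement: accept the shared decision, no extra human workload.
        return {"recall": first_opinion, "second_reader_used": False}

    # Disagreement: escalate to the human second reader for arbitration.
    return {"recall": second_reader(case), "second_reader_used": True}
```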

Why this matters: Life is a lot like land – no one is making any more of it. Therefore, people really value their ability to be alive. If AI systems can help people live longer through proactive diagnosis, then societal attitudes to AI will improve. For people to be comfortable with AI, we should find ways to heal and educate people, rather than just advertise to and surveil them; systems like this from DeepMind give us these motivating examples. Let’s make more of them.
   Read more: International evaluation of an AI system for breast cancer screening (DeepMind)

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Offence-defence balance and advanced AI:
Adversarial situations can differ in terms of the ‘offense-defense balance’: the relative ease of carrying out, and defending against, an attack – e.g. the invention of barbed wire and machine guns shifted the balance towards defense in European ground warfare. New research published in the Journal of Strategic Studies tries to work out how this balance shifts as the resources invested in a conflict scale up.

AI and scaling: The effects of new technologies (e.g. machine guns) and new types of conflict (e.g. trench warfare) on the offense-defense balance are well-studied, but the effect of scaling up existing technologies in familiar domains has received less attention. Scalability is a key feature of AI systems: the marginal cost of improving software is low and will decrease further as the cost of computing falls, while AI-supported automation will reduce the marginal cost of some services (e.g. cyber vulnerability discovery) to close to zero. So understanding how the O-D balance shifts as investments scale up is an important way of forecasting how adversarial domains like warfare and cybersecurity will behave as AI develops.

Offensive-then-defensive scaling: This paper develops a model that reveals the phenomenon of offensive-then-defensive scaling (‘O-D scaling’), whereby initial investments favour attackers, up until a saturation point, after which further investments always favour defenders. They show that O-D scaling is exhibited in land invasion and cybersecurity under certain assumptions, and suggest that there are general conditions where we should expect this dynamic – conflicts where there are multiple attack vectors, where these can be saturated by a defender, and where defense is locally superior (i.e. wins in evenly matched contests). They argue these are plausible in many real-world cases, and that O-D scaling is therefore a useful baseline assumption. 
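   To make the intuition concrete, here is a toy Python simulation of O-D scaling under roughly those conditions – this is my own construction for illustration, not the paper’s formal model: there are multiple attack vectors, each saturates at some level of useful effort, the attacker concentrates its entire budget on one vector while the defender must spread across all of them, and defense wins evenly matched contests.

```python
# Toy illustration (not the paper's formal model) of offensive-then-
# defensive scaling. Assumptions: N attack vectors, each vector saturates
# at SATURATION units of useful effort, the attacker concentrates its
# whole budget on a single vector while the defender spreads its budget
# evenly, and defense is locally superior (it holds any vector it matches).

N_VECTORS = 10
SATURATION = 100.0  # effort beyond this on one vector buys nothing extra

def attacker_breaks_through(budget: float) -> bool:
    """Both sides invest `budget`; return True if the attack succeeds."""
    defence_per_vector = min(budget / N_VECTORS, SATURATION)
    attack_on_one_vector = min(budget, SATURATION)
    return attack_on_one_vector > defence_per_vector  # ties favour defense

if __name__ == "__main__":
    for budget in [10, 100, 500, 999, 1000, 2000]:
        outcome = "offense wins" if attacker_breaks_through(budget) else "defense holds"
        print(f"budget {budget:>5}: {outcome}")
    # Small budgets favour the attacker, who overwhelms the thinly spread
    # defense; once the defender can saturate every vector
    # (budget >= N_VECTORS * SATURATION), further investment favours defense.
```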

Why it matters: Understanding the impact of AI on international security is important for ensuring things go well, but technology forecasting is difficult. The authors claim that one particular feature of AI that we can reliably foresee – its scalability – will influence conflicts in a predictable way. It seems like good news that, if we pass through the period of offense dominance, we can expect defense to dominate in the long run, but the authors note that there is still disagreement on whether defense-dominated scenarios are more stable.
   Read more: How does the offense-defense balance scale? (Journal of Strategic Studies).
   Read more: Artificial Intelligence, Foresight, and the Offense-Defense Balance (War on the Rocks).

2019 AI safety literature review:
This is a thorough review of research on AI safety and existential risk over the past year. It provides an overview of all the organisations working in this small but fast-growing area, an assessment of their activities, and some reflections on how the field is developing. It is an invaluable resource for anyone considering donating to charities working in these areas, and for understanding the research landscape.
   Read more: 2019 AI Alignment Literature Review and Charity Comparison (LessWrong).

####################################################

Tech Tales:

Digital Campaign
[Westminster, London. 2025]

I don’t remember when I stopped caring about the words, but I do remember the day when I was staring at a mixture of numbers on a screen and I felt myself begin to cry. The numbers weren’t telling me a poem. They weren’t confessing something from a distant author that echoed in myself. But they were telling me about resonance. They were telling me that the cargo they controlled – the synthetic movie that would unfold once I fired up this mixture of parameters – would inspire an emotion that registered as “life-changing” on our Emotion Evaluation Understudy (EEU) metric.

Verified? I said to my colleagues in the control room.
Verified, said my assistant, John, who looked up from his console to wipe a solitary tear from his eye.
Do we have cargo space? I asked.
We’ve secured a tranche of evening bot-time, as well as segments of traditional media, John said.
And we’ve backtested it?
Simulated rollouts show state-of-the-art engagement.
Okay folks, I said. Let’s make some art.

It’s always anticlimactic, the moment where you turn it on. There’s a lag of anywhere between a sub-second and a full minute, depending on the size of the system. Then the dangerous part – it’s easy to get fixated on earlier versions of the output, easy to find yourself getting more emotional at the stuff you see early in training than the stuff that appears later. Easy to want to edit the computer. This is natural. This is a lot like being a parent, someone told you in a presentation on ‘workplace psychology for reliable science’. It’s natural to be proud of them when they’ve only just begun to walk. After that, everything seems easy.

We wait. Then the terminal prints “task completion”. We send our creation out onto the internet and the radio and the airwaves: full multi-spectrum broadcast. Everyone’s going to see it. We don’t watch the output ourselves – though we’ll review it in our stand-up meeting tomorrow.

Here, in the sealed bunker, I am briefly convinced I can hear cheering begin to come from the street outside. I am imagining people standing up, eyes welling with tears of laughter and pain, as they receive our broadcast. I am trying to imagine what a state-of-the-art Emotion Evaluation Understudy system means.

Things that inspired this story: AI+Creativity, taken to its logical conclusion; the ‘Two hands are a lot’ blog post from Dominic Cummings; BLEU scores and the generally misleading nature of metrics; nudge campaigns; political messaging; synthetic text advances; likely advances in audio and video synthesis; a dream I had at the turn of 2019/2020 in which I found myself in a control room carefully dialing in the parameters of a language model, not paying attention to the words but knowing that each variable I tuned inspired a different feeling.