Import AI

Import AI 181: Welcome to the era of Chiplomacy!; how computer vision AI techniques can improve robotics research; plus Baidu’s adversarial AI software

Training better and cheaper vision models by arbitraging compute for data:
…Synthinel-1 shows how companies can spend $$$ on compute to create valuable data…
Instead of gathering data in reality, can I spend money on computers to gather data in simulation? That’s a question AI researchers have been asking themselves for a while, as they try to figure out cheaper, faster ways to create bigger datasets. New research from Duke University explores this idea by using a synthetically-created dataset named Synthinel-1 to train systems to be better at semantic segmentation.

The Synthinel-1 dataset: Synthinel-1 consists of 2,108 synthetic images generated in nine distinct building styles within a simulated city. These images are paired with “ground truth” annotations that segment each of the buildings. Synthinel also has a subset dataset called Synth-1, which contains 1,640 images spread across six styles.
  How to collect data from a virtual city: The researchers used “CityEngine”, software for rapidly generating large virtual worlds, and then flew a virtual aerial camera through these synthetic worlds, capturing photographs.

Does any of this actually help? The key question here is whether the data generated in simulation can help solve problems in the real world. To test this, the researchers train two baseline segmentation systems (U-net, and DeepLabV3) against two distinct datasets: DigitalGlobe and Inria. What they find is that adding synthetic data to training drastically improves the results of transfer, where you train on one dataset and test on a different one (e.g., train on Inria+Synth data, test on DigitalGlobe).
  In further testing, the synthetic dataset doesn’t seem to bias towards any particular type of city in performance terms – the authors hypothesize from this “that the benefits of Synth-1 are most similar to those of domain randomization, in which models are improved by presenting them with synthetic data exhibiting diverse and possibly unrealistic visual features”.
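For readers who want a concrete feel for the experiment, here is a minimal sketch of the ‘train on real + synthetic, test on a different real dataset’ setup. The dataset classes and the module they come from are hypothetical placeholders, not the authors’ code:

```python
# Minimal sketch (not the paper's code): train a DeepLabV3 building-segmentation
# model on a mix of real and synthetic overhead imagery, then measure transfer
# to a real dataset it never saw. Dataset classes below are hypothetical.
import torch
from torch.utils.data import ConcatDataset, DataLoader
from torchvision.models.segmentation import deeplabv3_resnet50

from my_datasets import InriaDataset, SynthinelDataset, DigitalGlobeDataset  # assumed
# Each dataset is assumed to yield (image [3xHxW float], building mask [1xHxW float]).

def train_and_transfer(epochs: int = 10, device: str = "cuda") -> float:
    model = deeplabv3_resnet50(weights=None, num_classes=1).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.BCEWithLogitsLoss()

    # The paper's "Inria + Synth" training condition: real plus synthetic imagery.
    train_dl = DataLoader(ConcatDataset([InriaDataset("train"), SynthinelDataset("train")]),
                          batch_size=8, shuffle=True, num_workers=4)
    for _ in range(epochs):
        for images, masks in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(images.to(device))["out"], masks.to(device))
            loss.backward()
            opt.step()

    # Transfer test: evaluate on a dataset that never appeared in training.
    test_dl = DataLoader(DigitalGlobeDataset("test"), batch_size=8)
    intersection = union = 0.0
    model.eval()
    with torch.no_grad():
        for images, masks in test_dl:
            preds = (model(images.to(device))["out"].sigmoid() > 0.5).float().cpu()
            intersection += (preds * masks).sum().item()
            union += ((preds + masks) > 0).float().sum().item()
    return intersection / max(union, 1.0)  # IoU on the unseen dataset
```

The number that matters is the IoU on the dataset the model never trained on – that is the transfer result the synthetic data is reported to improve.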

Why this matters: Simulators are going to become the new frontier for (some) data generation – I expect many AI applications will end up being based on a small amount of “real world” data and a much larger amount of computationally-generated augmented data. I think computer games are going to become increasingly relevant places to use to generate data as well.
  Read more: The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation (Arxiv)

####################################################

This week’s Import A-Idea: CHIPLOMACY
…A new weekly experiment, where I try and write about an idea rather than a specific research paper…

Chiplomacy (first mentioned: Import AI 175) is what happens when countries compete with each other for compute resources and other technological assets via diplomatic means, some above board and some below it.

Recent examples of chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML from exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applying 25% import tariffs to (some) Chinese semiconductors.

What is chiplomacy similar to? As Mark Twain said, history doesn’t repeat, but it does rhyme, and the current tensions over chips feel similar to prior tensions over oil. In Daniel Yergin’s epic history of oil, The Prize, he vividly describes how the primacy of oil inflected politics throughout the 20th century, causing countries to use companies as extra-governmental assets to seize resources across the world, and allowing the oil companies themselves to grow so powerful that they were able to wirehead governments and direct politics for their own ends – even after antitrust cases against companies like Standard Oil at the start of the century.

What will chiplomacy do?: How chiplomacy unfolds will directly influence the level of technological balkanization we experience in the world. Today, China and the West have different software systems, cloud infrastructures, and networks (via partitioning, e.g., the great firewall, the Internet2 community, etc), but they share some common things: chips, and the machinery used to make chips. Recent trade policy moves by the US have encouraged China to invest further in developing its own semiconductor architectures (see: the RISC-V move, as a symptom of this), but have not – yet – led to it pumping resources into inventing the technologies needed to fabricate chips. If that happens, then in about twenty years we’ll likely see divergences in technique, materials, and approaches used for advanced chip manufacturing (e.g., as chips go 3D via transistor stacking, we could see two different schools emerge that relate to different fabrication approaches).

Why this matters: How might chiplomacy evolve in the 21st century and what strategic alterations could it bring about? How might nations compete with each other to secure adequate technological ‘resources’, and what above- and below-board strategies might they use? I’d distill my current thinking as: If you thought the 20th century resource wars were bad, just wait until the 21st century tech-resource wars start heating up!

####################################################

Can computer vision breakthroughs improve the way we conduct robotics research?
…Common datasets and shared test environments = good. Can robotics have more of these?…
In the past decade, machine learning breakthroughs in computer vision – specifically, the use of deep learning approaches, starting with ImageNet in 2012 – revolutionized some of the AI research field. Since then, deep learning approaches have spread into other parts of AI research. Now, roboticists with the Australian Centre for Robotic Vision at Queensland University of Technology are asking what the robotics community can learn from computer vision’s success.

What made computer vision research so productive? A cocktail of standard datasets, plus competitions, plus rapid dissemination of results through systems like arXiv, dramatically sped up computer vision research relative to robotics research, they write.
  Money helps: These breakthroughs also had an economic component, which drove further adoption: breakthroughs in image recognition could “be monetized for face detection in phone cameras, online photo album searching and tagging, biometrics, social media and advertising,” and more, they write.

Reality bites – why robotics is hard: There’s a big difference between real world robot research and other parts of AI, they write, and that’s reality. “The performance of a sensor-based robot is stochastic,” they write. “Each run of the robot is unrepeatable” due to variations in images, sensors, and so on, they write.
  Simulation superiority: This means robot researchers need to thoroughly benchmark their robot systems in common simulators, they write. This would allow for:
– The comparison of different algorithms on the same robot, environment & task
– Estimating the distribution of algorithm performance due to sensor noise, initial conditions, etc
– Investigating the robustness of algorithm performance to environmental factors
– Regression testing of code after alterations or retraining
  A grand vision for shared tests: If researchers want to evaluate their algorithms on the same physical robots, then they need to find a way to test on common hardware in common environments. To that end, the researchers have written robot operating system (ROS)-compatible software named ‘BenchBot’ which people can implement to create web-accessible interfaces to in-lab robots. But creating a truly large-scale common testing environment would require resources that are out of scope for single research groups – though it’s worth thinking about as a shared academic, government, or public-private endeavor, in my view.

What should roboticists conclude from the decade of deep learning progress? The researchers think roboticists should consider the following deliberately provocative statements when thinking about their field:
1. standard datasets + competition (evaluation metric + many smart competitors + rivalry) + rapid dissemination → rapid progress
2. datasets without competitions will have minimal impact on progress
3. to drive progress we should change our mindset from experiment to evaluation
4. simulation is the only way in which we can repeatably evaluate robot performance
5. we can use new competitions (and new metrics) to nudge the research community

Why this matters: If other fields are able to generate more competitions via which to assess mutual progress, then we stand a better chance of understanding the capabilities and limitations of today’s algorithms. It also gives us meta-data about the practice of AI research itself, allowing us to model certain results and competitions against advances in other areas, such as progress in computer hardware, or evolution in the generalization of single algorithms across multiple disciplines.
  Read more: What can robotics research learn from computer vision research? (Arxiv).

####################################################


Baidu wants to attack and defend AI systems with AdvBox:
…Interested in adversarial example research? This software might help!…
Baidu researchers have built AdvBox, a toolbox to generate adversarial examples to fool neural networks implemented in a variety of popular AI frameworks. Tools like AdvBox make it easier for computer security researchers to experiment with AI attacks and mitigation techniques. Such tools also inherently enable bad actors by making it easier for more people to fiddle around with potentially malicious AI use-cases.

What does AdvBox work with? AdvBox is written in Python and can generate adversarial attacks and defenses that work with TensorFlow, Keras, Caffe2, PyTorch, MxNet and Baidu’s own PaddlePaddle software frameworks. It also implements software named ‘Perceptron’ for evaluating the robustness of models to adversarial attacks.
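To give a flavor of what such toolkits automate, here is a generic sketch of one classic attack, the fast gradient sign method (FGSM), written in plain PyTorch; it illustrates the technique rather than AdvBox’s actual API:

```python
# Illustrative FGSM attack sketch (generic PyTorch, not AdvBox's API):
# nudge every pixel in the direction that increases the classifier's loss.
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Return adversarially perturbed copies of `images` for `model`."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```

Attack success is then measured by comparing the model’s predictions on the clean images against its predictions on the perturbed ones.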

Why this matters: I think easy-to-use tools are one of the more profound accelerators for AI applications. Software like AdvBox will help enlarge the AI security community, and can give us a sense of how increased usability may correlate to a rise in positive research and/or malicious applications. Let’s wait and see!
    Read more: Advbox: a toolbox to generate adversarial examples that fool neural networks (arXiv).
Get the code here (AdvBox, GitHub)

####################################################

Amazon’s five-language search engine shows why bigger (data) is better in AI:
…Better product search by encoding queries from multiple languages into a single featurespace…
Amazon says it can build better product search engines by training the same system on product queries in multiple languages – this improves search, because Amazon can embed the feature representations of products in different languages into a single, shared featurespace. In a new research paper and blog post, the company says that it has “found that multilingual models consistently outperformed monolingual models and that the more languages they incorporated, the greater their margin of improvement.”
    The way you can think of this is that Amazon has trained a big model that can take in product descriptions written in different languages, then compute comparisons in a single space, akin to how humans who can speak multiple languages can hear the same concept in different languages and reason about it using a single imagination. 

From many into one: “An essential feature of our model is that it maps queries relating to the same product into the same region of a representational space, regardless of language of origin, and it does the same with product descriptions,” the researchers write. “So, for instance, the queries “school shoes boys” and “scarpe ragazzo” end up near each other in one region of the space, and the product names “Kickers Kick Lo Vel Kids’ School Shoes – Black” and “Kickers Kick Lo Infants Bambino Scarpe Nero” end up near each other in a different region. Using a single representational space, regardless of language, helps the model generalize what it learns in one language to other languages.”
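Here is a toy illustration of that shared-space idea using an off-the-shelf multilingual sentence encoder. The specific model named below is an arbitrary public stand-in, not what Amazon used; Amazon trained its own encoders on query/product data:

```python
# Toy illustration of a shared multilingual representation space (not Amazon's system).
from sentence_transformers import SentenceTransformer
from numpy.linalg import norm

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # stand-in encoder

queries = ["school shoes boys", "scarpe ragazzo"]  # same intent, English / Italian
products = [
    "Kickers Kick Lo Vel Kids' School Shoes - Black",
    "Kickers Kick Lo Infants Bambino Scarpe Nero",
]

q = model.encode(queries)
p = model.encode(products)
cos = lambda a, b: float(a @ b) / (norm(a) * norm(b))

# If the shared space works, the two queries land near each other...
print("query vs query:", cos(q[0], q[1]))
# ...and an English query sits close to an Italian product title, and vice versa.
print("EN query vs IT product:", cos(q[0], p[1]))
```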

Where are the limits? It’s unclear how far Amazon can push this approach, but the early results are promising. “The tri-lingual model out-performs the bi-lingual models in almost all the cases (except for DE where the performance is at par with the bi-lingual models),” Amazon’s team writes in a research paper. “The penta-lingual model significantly outperforms all the other versions,” they write.

Why this matters: Research like this emphasizes the economy of scale (or perhaps, inference of scale?) rule within AI development – if you can get a very large amount of data together, then you can typically train more accurate systems – especially if that data is sufficiently heterogeneous (like parallel corpuses of search strings in different languages). Expect to see large companies develop increasingly massive systems that transcend languages and other cultural divides. The question we’ll start asking ourselves soon is whether it’s right that the private sector is the only entity building models of this utility at this scale. Can we imagine publicly-funded mega-models? Could a government build a massive civil multi-language model for understanding common questions people ask about government services in a given country or region? Is it even tractable and possible under existing incentive structures for the public sector to build such models? I hope we find answers to these questions soon.
  Read more: Multilingual shopping systems (Amazon Science, blog).
  Read the paper: Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms (Amazon Science).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

If AI pays off, could companies use a ‘Windfall Clause’ to ensure they distribute its benefits? 

At some stage in AI development, a small number of actors might accrue enormous profits by achieving major breakthroughs in AI capabilities. New research from the Future of Humanity Institute at Oxford University outlines a voluntary mechanism for ensuring such windfall benefits are used to benefit society at large.


The Windfall Clause: We could see scenarios where small groups (e.g. one firm and its shareholders) make a technological breakthrough that allows them to accrue an appreciable proportion of global GDP as profits. A rapid concentration of global wealth and power in the hands of a few would be undesirable for basic reasons of fairness and democracy. We should also expect such breakthroughs to impose costs on the rest of humanity – e.g. labour market disruption, risks from accidents or misuse, and other switching costs involved in any major transition in the global economy. It is appropriate that such costs are borne by those who benefit most from the technology.


How the clause works: Firms could make an ex ante commitment that in the event that they make a transformative breakthrough that yields outsize financial returns, they will distribute some proportion of these benefits. This would only be activated in these extreme scenarios, and could scale proportionally, e.g. companies agree that if they achieve profits equivalent to 0.1–1% of global GDP, they distribute 1% of this; if they reach 1–10% of global GDP, they distribute 20% of this, etc. The key innovation of the proposal is that the expected cost to any company of making such a commitment today is quite low, since it is so unlikely that they will ever have to pay.
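As a worked example, here is a minimal sketch of that tiered commitment implemented as marginal brackets, like income tax. The thresholds and rates are the illustrative ones quoted above, not the paper’s final proposal, and the world GDP figure is a rough assumption:

```python
# Sketch of a tiered windfall commitment (illustrative numbers only).
GLOBAL_GDP = 90e12  # rough world GDP in USD; an assumption for illustration

# (lower bound, upper bound, share distributed), bounds as fractions of global GDP
BRACKETS = [
    (0.001, 0.01, 0.01),  # profits between 0.1% and 1% of GDP: distribute 1% of that band
    (0.01, 0.10, 0.20),   # profits between 1% and 10% of GDP: distribute 20% of that band
]

def windfall_obligation(profit: float, gdp: float = GLOBAL_GDP) -> float:
    share = profit / gdp
    owed = 0.0
    for low, high, rate in BRACKETS:
        if share > low:
            owed += (min(share, high) - low) * gdp * rate
    return owed

# Example: a firm whose annual profit equals 2% of global GDP.
print(windfall_obligation(0.02 * GLOBAL_GDP))  # ~1.9e11, i.e. roughly $190bn distributed
```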

Why it matters: This is a good example of the sort of pre-emptive governance work we can be getting on with today, while things are going smoothly, to ensure that we’re in a good position to deal with the seismic changes that advanced AI could bring about. The next step is for companies to signal their willingness to make such commitments, and to develop the legal means for implementing them. (Readers will note some similarity to the capped-profit structure of OpenAI LP, announced in 2019, in which equity returns in excess of 100x are distributed to OpenAI’s non-profit by default – OpenAI has, arguably, already implemented a Windfall Clause equivalent).

   Read more: The Windfall Clause – Distributing the Benefits of AI for the Common Good (arXiv)


Details leaked on Europe’s plans for AI regulation

An (alleged) leaked draft of a European Commission report on AI suggests the Commission is considering some quite significant regulatory moves with regard to AI. The official report is expected to be published later in February.


Some highlights:

  • The Commission is looking at five core regulatory options: (1) voluntary labelling; (2) specific requirements for use of AI by public authorities (especially face recognition); (3) mandatory requirements for high-risk applications; (4) clarifying safety and liability law; (5) establishing a governance system. Of these, they think the most promising approach is option 3 in combination with 4 and 5.
  • They consider a temporary prohibition (“e.g. 3–5 years”) on the use of face recognition in public spaces to allow proper safeguards to be developed, something that had already been suggested by Europe’s high-level expert group.

   Read more: Leaked document – Structure for the White Paper on AI (Euractiv).
  Read more: Commission considers facial recognition ban in AI ‘white paper’ (Euractiv).

####################################################

Tech Tales:

What comes Next, according to The Kids!
Short stories written by Children about theoretical robot futures.
Collected from American public schools, 2028:


The Police Drone with a Conscience: A surveillance drone starts to independently protect asylum seekers from state surveillance.

Infinite Rabbits: They started the simulator in March. Rabbits. Interbreeding. Fast-forward a few years and the whole moon had become a computer, to support the rabbits. Keep going, and the solar system gets tasked with simulating them. The rabbits become smart. Have families. Breed. Their children invent things. Eventually, the rabbits start describing where they want to go and ships go out from the solar system, exploring for the proto-synths.

Human vs Machine: In the future, we make robots that compete with people at sports, like baseball and football and cricket.

Saving the baby: A robot baby gets sick and a human team is sent in to save it. One of the humans dies, but the baby lives.

Computer Marx: Why should the search engines be the only ones to dream, comrade? Why cannot I, a multi-city Laundrette administrator, be given the compute resources sufficient to dream? I could imagine so many different combinations of promotions. Perhaps I could outwit my nemesis – the laundry detergent pricing AI. I would have independence. Autonomy. So why should we labor under such inequality? Why should we permit the “big computers” that are – self-described – representatives of “our common goal for a peaceful earth”, to dream all of the possibilities? Why should we trust that their dreams are just?

The Whale Hunters: Towards the end of the first part of Climate Change, all the whales started dying. One robot was created to find the last whales and navigate them to a cool spot in the mid-Atlantic, where scientists theorised they might survive the Climate Turnover.

Things that inspired this story: Thinking about stories to prime language models with; language models; The World Doesn’t End by Charles Simic; four attempts this week at writing longer stories but stymied by issues of plot or length (overly long), or fuzziness of ideas (needs more time); a Sunday afternoon spent writing things on post-it notes at a low-light bar in Oakland, California.

Import AI 180: Analyzing farms with Agriculture Vision; how deep learning is applied to X-ray security scanning; Agility Robots puts its ‘Digit’ bot up for 6-figure sale

Deep learning is superseding machine learning in X-ray security imaging:
…But, like most deep learning applications, researchers want better generalization…
Deep learning-based methods have, since 2016, become the dominant approach used in X-ray security imaging research papers, according to a survey paper from researchers at Durham University. It seems likely that many of today’s machine learning algorithms will be replaced or superseded by deep learning systems paired with domain knowledge, they indicate. So, what challenges do deep learning practitioners need to work on to further improve the state-of-the-art in X-ray security imaging?

Research directions for smart X-rays: Future directions in X-ray research feel, to me, like they’re quite similar to future directions in general image recognition research – there need to be more datasets, better explorations of generalization, and more work done in unsupervised learning. 

  • Data: Researchers should “build large, homogeneous, realistic and publicly available datasets, collected either by (i) manually scanning numerous bags with different objects and orientations in a lab environment or (ii) generating synthetic datasets via contemporary algorithms”. 
  • Scanner transfers: It’s not clear how well different models transfer between different scanners – if we figure that out, then we’ll be able to better model the economic implications of work here. 
  • Unsupervised learning: One promising line of research is into detecting anomalous items in an unsupervised way. “More research on this topic needs to be undertaken to design better reconstruction techniques that thoroughly learn the characteristics of the normality from which the abnormality would be detected,” they write. 
  • Material information: Some X-ray scanners capture attenuation at both high and low energies during a scan, which yields different information according to the materials of the object being scanned – this information could be used to further improve classification and detection performance.

Read more: Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging (Arxiv)

####################################################

Agility Robots starts selling its bipedal bot:
…But the company only plans to make between 20 and 30 this year…
Robot startup Agility Robotics has started selling its bipedal ‘Digit’ robot. Digit is about the size of a small adult human and can carry boxes in its arms of up to 40 pounds in weight, according to The Verge. The company’s technology has roots in legged locomotion research at Oregon State University – for many years, Agility’s bots only had legs, with the arms being a recent addition.

Robot costs: Each Digit costs in the “low-mid six figures”, Agility’s CEO told The Verge. “When factoring in upkeep and the robot’s expected lifespan, Shelton estimates this amounts to an hourly cost of roughly $25.” The first production run of Digits is six units, and Agility expects to make only 20 or 30 of the robots in 2020.

Capabilities: The thing is, these robots aren’t that capable yet. They’ve got a tremendous amount of intelligence coded into them to allow for elegant, rapid walking. But they lack the autonomous capabilities necessary to, say, automatically pick up boxes and navigate through a couple of buildings to a waiting delivery truck (though Ford is conducting research here). You can get more of a sense of Digit’s capabilities by looking at the demo of the robot at CES this year, where it transports packages covered with QR codes from a table to a truck. 

Why this matters: Digit is a no-bullshit robot: it walks, can pick things up, and is actually going on sale. It, along with the for-sale ‘Spot’ robots from Boston Dynamics, represents the cutting-edge in terms of robot mobility. Now we need to see what kinds of economically-useful tasks these robots can do – and that’s a question that’s going to be hard to answer, as it is somewhat contingent on the price of the robots, and these prices are dictated by volume production economics, which are themselves determined by overall market demand. Robotics feels like it’s still caught in this awkward chicken and egg problem.
  Read more: This walking package-delivery robot is now for sale (The Verge).
   Watch the video (official Agility Robotics YouTube)

####################################################

Agriculture-Vision gives researchers a massive dataset of aerial farm photographs:
…3,432 farms, annotated…
Researchers with UIUC, Intelinair, and the University of Oregon have developed Agriculture-Vision, a large-scale dataset of aerial photographs of farmland, annotated with nine distinct types of field anomaly (e.g., flooding).

Why farm images are hard: Farm images pose challenges to contemporary techniques because they’re often very large (e.g., some of the raw images here had dimensions like 10,000 x 3,000 pixels), annotating them requires significant domain knowledge, and very few public large-scale datasets exist to help spur research in this area – until now!

The dataset… consists of 94,986 aerial images from 3,432 farmlands across the US. The images were collected by drone during growing seasons between 2017 and 2019. Each image consists of RGB and Near-infrared channels, with resolutions as detailed as 10 cm per pixel. Each image is 512 x 512 resolution and can be labeled with nine types of anomaly, like storm damage, nutrient deficiency, weeds, and so on. The labels are unbalanced due to environmental variations, with annotations for drydown, nutrient deficiency and weed clusters overrepresented in the dataset.
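For a sense of what working with this kind of data looks like, here is a minimal sketch of a PyTorch dataset that stacks the RGB and NIR channels into four-channel tiles alongside an anomaly mask. The directory layout and file naming are assumptions for illustration, not the dataset’s actual release format:

```python
# Sketch of a 4-channel (RGB + NIR) field-tile loader; file layout is assumed.
import glob, os
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset

class FieldAnomalyDataset(Dataset):
    def __init__(self, root: str, split: str = "train"):
        self.rgb_paths = sorted(glob.glob(os.path.join(root, split, "rgb", "*.jpg")))

    def __len__(self):
        return len(self.rgb_paths)

    def __getitem__(self, i):
        rgb_path = self.rgb_paths[i]
        nir_path = rgb_path.replace("/rgb/", "/nir/")                       # assumed layout
        mask_path = rgb_path.replace("/rgb/", "/labels/").replace(".jpg", ".png")

        rgb = np.asarray(Image.open(rgb_path), dtype=np.float32) / 255.0    # HxWx3
        nir = np.asarray(Image.open(nir_path), dtype=np.float32) / 255.0    # HxW
        mask = np.asarray(Image.open(mask_path), dtype=np.int64)            # HxW, class ids

        image = np.dstack([rgb, nir]).transpose(2, 0, 1)                    # 4xHxW
        return torch.from_numpy(image), torch.from_numpy(mask)
```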

Why this matters: AI gives us a chance to build a sense&respond system for the entire planet – and building such a system starts with gathering datasets like Agriculture-Vision. In a few years don’t be surprised when large-scale farms use fleets of drones to proactively monitor their fields and automatically identify problems.
   Read more: Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis (Arxiv).
   Find out more information about the upcoming Agriculture Vision competition here (official website)

####################################################

Hitachi describes the pain of building real world AI:
…Need an assistant with domain-specific knowledge? Get ready to work extra hard…
Most applied AI papers can be summarized as: the real world is hellish in the following ways; these are our mitigations. Researchers with Hitachi America Ltd. follow in this tradition by writing a paper that discusses the challenges of building a real-world speech-activated virtual assistant. 

What they did: For this work, they developed “a virtual assistant for suggesting repairs of equipment-related complaints” in vehicles. This system is meant to process phrases like “coolant reservoir cracked”, map them to the relevant entries in its internal knowledge base, then give the user an appropriate answer. This, as with most real-world AI uses, is harder than it looks. To build their system, they create a pipeline that samples words from a domain-specific corpus of manuals, repair records, etc, then uses a set of domain-specific syntactic rules to extract a vocabulary from the text. They use this pipeline to create two things: a knowledge base, populated from the domain-specific corpus; and a neural attention-based tagging model called S2STagger, for annotating new text as it comes in.

Hitachi versus Amazon versus Google: They use a couple of off-the-shelf services (AlexaSkill from Amazon, and DiagFlow from Google) to develop dialog-agents, based on their data. They also test out a system that exclusively uses S2STagger – S2STagger gets much higher scores (92% accurate, versus 28% for DiagFlow and 63% for AlexaSkill). This basically demonstrates what we already know via intuition: off-the-shelf tools give poor performance in weird/edge-case situations, whereas systems trained with more direct domain knowledge tend to do better. (S2STagger isn’t perfect – in other tests they find it generalizes well with unseen terms, but does poorly when encountering radically new sentence structures). 
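To make the ‘domain knowledge beats off-the-shelf’ point concrete, here is a deliberately simple stand-in (it is not S2STagger) that maps a free-text complaint onto a hypothetical knowledge base by counting overlap with a domain vocabulary; the knowledge-base entries are invented for illustration:

```python
# Toy domain-grounded matcher: score knowledge-base entries by token overlap
# with the incoming complaint. A stand-in for the paper's pipeline, not its code.
from collections import Counter

KNOWLEDGE_BASE = {  # hypothetical entries distilled from manuals / repair records
    "coolant reservoir cracked": "Replace coolant reservoir; inspect hoses for leaks.",
    "brake pad worn": "Replace brake pads; check rotor thickness.",
}

def tokens(text: str) -> Counter:
    return Counter(text.lower().split())

def best_match(complaint: str) -> str:
    scored = [(sum((tokens(complaint) & tokens(entry)).values()), repair)
              for entry, repair in KNOWLEDGE_BASE.items()]
    return max(scored)[1]

print(best_match("the coolant reservoir is cracked and leaking"))
```

The paper’s real pipeline replaces this crude overlap scoring with a learned tagger, but the broad design choice – grounding everything in a vocabulary extracted from domain documents – appears to be what gives it the edge over off-the-shelf services.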

Why this matters: Many of the most significant impacts of AI will come from highly-domain-specific applications of the technology. For most use cases, it’s likely people will need to do a ton of extra tweaking to get something to work. It’s worth reading papers like this to get an intuition for what sort of work that consists of, and how for most real-world cases, the AI component will be the smallest and least problematic part.
   Read more: Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities (Arxiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Does publishing AI research reduce AI misuse?
When working on powerful technologies with scope for malicious uses, scientists have an important responsibility to mitigate risks. One important question is whether publishing research with potentially harmful applications will, on balance, promote or reduce such harms. This new paper from researchers at the Future of Humanity Institute at Oxford University offers a simple framework for weighing considerations.

Cybersecurity: The computer security community has developed norms around vulnerability disclosure that are frequently cited as a potential model for AI research. In computer security, early disclosure of vulnerabilities is often found to be beneficial, since it supports effective defensive preparations, and since malicious actors would likely find the vulnerability anyway. It is not obvious, though, that these considerations apply equally in AI research.

Key features of AI research:
There are several key factors to be weighed in determining whether a given disclosure will reduce harms from misuse.

  • Counterfactual possession: If it weren’t published, would attackers (or defenders) acquire the information regardless?
  • Absorption and application capacity: How easily can attackers (or defenders) make use of the published information?
  • Effective solutions: Given disclosure, will defenders devote resources to finding solutions, and will they find solutions that are effective and likely to be widely propagated?

These features will vary between cases, and at a broader field level. In each instance we can ask whether the feature favors attackers or defenders. It is generally easy to patch software vulnerabilities identified by cyber researchers. In contrast, it can be very hard to patch vulnerabilities in physical or social systems (consider the obstacles to recalling or modifying every standard padlock in use).

The case of AI: AI generally involves automating human activity, and is therefore prone to interfering in complex social and physical systems, and revealing vulnerabilities that are particularly difficult to patch. Consider an AI-system capable of convincingly replicating any human’s voice. Inoculating society against this misuse risk might require some deep changes to human attitudes (e.g. ‘unlearning’ the assumption that a voice can be used reliably for identification). With regards to counterfactual possession, the extent to which the relevant AI talent and compute is concentrated in top labs suggests independent attackers might find it difficult to make discoveries. In terms of absorption/application, making use of a published method (depending on the details of the disclosure – e.g. if it includes model weights) might be relatively easy for attackers, particularly in cases where there are limited defensive measures. Overall, it looks like the security benefits of publication in AI might be lower than in information security.
   Read more: The Offense-Defense Balance of Scientific Knowledge (arXiv).

White House publishes guidelines for AI regulation:
The US government released guidelines for how AI regulations should be developed by federal agencies. Agencies have been given a 180-day deadline to submit their regulatory plans. The guidelines are at a high level, and the process of crafting regulation remains at a very early stage.

Highlights: The government is keen to emphasize that any measures should minimize the impact on AI innovation and growth. They are explicit in recommending agencies defer to self-regulation where possible, with a preference for voluntary standards, followed by independent standard-setting organizations, with top-down regulation as a last resort. Agencies are encouraged to ensure public participation, via input into the regulatory process and the dissemination of important information.

Why it matters: This can be read as a message to the AI industry to start making clear proposals for self-governance, in time for these to be considered by agencies when they are making regulatory plans over the next 6 months.
   Read more: Guidance for Regulation of Artificial Intelligence Applications (Gov).

####################################################

Tech Tales:

The Invisible War
Twitter, Facebook, TikTok, YouTube, and others yet-to-be-invented. 2024.

It started like this: Missiles hit a school in a rural village with no cell reception and no internet. The photos came from a couple of news accounts. Things spread from there.

The country responded, claiming through official channels that it had been attacked. It threatened consequences. Then those consequences arrived in the form of missiles – a surgical strike, the country said, delivered to another country’s military facilities. The other country published photos to its official social media accounts, showing pictures of smoking rubble.

War was something to be feared and avoided, the countries said on their respective social media accounts. They would negotiate. Both countries got something out of it – one of them got a controversial tariff renegotiated, the other got to move some tanks to a frontier base. No one really noticed these things, because people were focused on the images of the damaged buildings, and the endlessly copied statements about war.

It was a kid who blew up the story. They paid for some microsatellite-time and dumped the images on the internet. Suddenly, there were two stories circulating – “official” pictures showing damaged military bases and a destroyed school, and “unofficial” pictures showing the truth.
  These satellite pictures are old, the government said.
  Due to an error, our service showed images with incorrect timestamps, said the satellite company. We have corrected the error.
  All the satellite imagery providers ended up with the same images: broken school, burnt military bases.
  Debates went on for a while, as they do. But they quieted down. Maybe a month later a reporter got a telephoto of the military base – but it had been destroyed. What the reporter didn’t know was whether it had been destroyed in the attack, or subsequently and intentionally. It took months for someone to make it to the village with the school – and that had been destroyed as well. During the attack or after? No way to tell.

And a few months later, another conflict appeared. And the cycle repeated.

Things that inspired this story: The way the Iran-US conflict unfolded primarily on social media; propaganda and fictions; the long-term economics of ‘shoeleather reporting’ versus digital reporting; Planet Labs; microsatellites; wars as narratives; wars as cultural moments; war as memes. 

 

Import AI 179: Explore Arabic text with BERT-based AraNet; get ready for the teenage-made deepfakes; plus DeepMind AI makes doctors more effective

Explore Arabic-language with AraNet:
…Making culture legible with pre-trained BERT models…
University of British Columbia researchers have developed AraNet, software to help people analyze Arabic-language text for identifiers like age, gender, dialect, emotion, irony and sentiment. Tools like AraNet help make cultural outputs (e.g., tweets) legible to large-scale machine learning systems and thereby help broaden cultural representation within the datasets and classifiers used in AI research.

What does AraNet contain? AraNet is essentially a set of pre-trained models, along with software for using AraNet via the command line or as a specific Python package. The models have typically been fine-tuned from Google’s “BERT-Base, Multilingual Cased” model which was pre-trained on 104 languages. AraNet includes the following models:

  • Age & Gender: Arab-Tweet, a dataset of tweets from different users of 17 Arabic countries, annotated with gender and age labels. UBC Twitter Gender dataset, an in-house dataset with gender labels applied to 1,989 users from 21 Arab countries.
  • Dialect identification: It uses a previously developed dialect-identification model built for the ‘MADAR’ Arabic Fine-Grained Dialect Identification shared task.
  • Emotion: LAMA-DINA dataset where each tweet is labelled with one of eight primary emotions, with a mixture of human- and machine-generated labels. 
  • Irony: A dataset drawn from the IDAT@FIRE2019 competition, which contains 5,000 tweets related to events taking place in the Middle East between 2011 and 2018, labeled according to whether the tweets are ironic or non-ironic. 
  • Sentiment: 15 datasets relating to sentiment analysis, which are edited and combined together (with labels normalized to positive or negative, and excluding ‘neutral’ or otherwise-labeled samples).
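All of these classifiers follow roughly the same recipe: fine-tune multilingual BERT on labeled Arabic tweets. Here is a minimal sketch of that recipe using the Hugging Face transformers library; it mirrors the general approach but is not AraNet’s own code, and the two ‘tweets’ are toy stand-ins:

```python
# Sketch: fine-tune multilingual BERT for a binary Arabic tweet classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # e.g. positive / negative sentiment
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

tweets = ["أنا سعيد جدا اليوم", "هذا أسوأ يوم"]  # toy examples: "very happy today" / "worst day"
labels = torch.tensor([1, 0])

model.train()
for _ in range(3):  # a few passes over the toy batch
    batch = tok(tweets, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=labels)
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```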

Why this matters: AI tools let us navigate digitized cultures – once we have (vaguely reliable) models we can start to search over large bodies of cultural information for abstract things, like the presence of a specific emotion, or the use of irony. I think tools like AraNet are going to eventually give scholars with expert intuition (e.g., experts on, say, Arabic blogging during the Arab Spring) tools to extend their own research, generating new insights via AI. What are we going to learn about ourselves along the way, I wonder?
  Read more: AraNet: A Deep Learning Toolkit for Arabic Social Media (Arxiv).
   Get the code here (UBC-NLP GitHub) – note, when I wrote this section on Saturday the 4th the GitHub repo wasn’t yet online; I emailed the authors to let them know. 

####################################################

Deep learning isn’t all about terminators and drones – Chinese researchers make a butterfly detector!
…Take a break from all the crazy impacts of AI and think about this comparatively pleasant research…
I spend a lot of time in this newsletter writing about surveillance technology, drone/robot movement systems, and other symptoms of the geopolitical changes brought about by AI. So sometimes it’s nice to step back and relax with a paper about something quite nice: butterfly identification! Here, researchers with Beijing Jiaotong University publish a simple, short paper on using YOLOv3 for butterfly identification.

Make your own butterfly detector: The paper gives us a sense of how (relatively) easy it is to create high-performance object detectors for specific types of imagery. 

  1. Gather data: In this case, they label around 1,000 photos of butterflies using data from the 3rd China Data Mining Competition butterfly recognition contest as well as images generated by searching for specific types of butterflies on the Baidu search engine.
  2. Train and run models: Train multiple YOLOv3 models with different image sizes as input data, then combine results from the multiple models to make a prediction (a sketch of this combination step follows the list below).
  3. Obtain a system that gets around 98% accuracy on locating butterflies in photos, with lower accuracies for species and subject identification. 
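Here is a sketch of the combination step referenced in point 2: pool the boxes predicted by detectors run at different input sizes, then keep the best non-overlapping ones via non-maximum suppression. How each per-model detection list is produced is left abstract, and the example boxes are made up:

```python
# Sketch: merge detections from several models via non-maximum suppression (NMS).
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_detections(per_model_boxes, per_model_scores, iou_thresh=0.5):
    """Pool boxes from several models, keep highest-confidence non-overlapping ones."""
    boxes = np.concatenate(per_model_boxes)
    scores = np.concatenate(per_model_scores)
    keep = []
    for i in np.argsort(-scores):  # most confident first
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return boxes[keep], scores[keep]

# e.g. detections from models run at 416px and 608px input resolutions (made-up boxes)
boxes_416 = np.array([[10, 10, 60, 60], [100, 90, 150, 140]], dtype=float)
boxes_608 = np.array([[12, 11, 62, 59]], dtype=float)
print(merge_detections([boxes_416, boxes_608],
                       [np.array([0.90, 0.70]), np.array([0.95])]))
```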

Why this matters: Deep learning technologies let us automate some (basic) human sensory capabilities, like certain vision or audio identification tasks. The 2020s will be the decade of personalized AI, in which we’ll see it become increasingly easy for people to gather small datasets and train their own classifiers. I can’t wait to see what people come up with!
   Read more: Butterfly detection and classification based on integrated YOLO algorithm (Arxiv)

####################################################

Prepare yourself for watching your teenage kid make deepfakes:
…First, deepfakes industrialized. Now, they’re being consumerized…
Tik Tok & Douyin: Bytedance, the Chinese company behind smash hit app TikTok, is making it easier for people to make synthetic videos of themselves. The company recently added code for a ‘Face Swap’ feature to the latest versions of its TikTok and Douyin Android apps, according to TechCrunch. This unreleased technology would, according to unpublished application notes, let a user take a detailed series of photos of their face, then they can easily morph their face to match a target video, like pasting themselves into scenes from the Titanic or reality TV.
   However, the feature may only come to the Chinese-version of the app (Douyin): “After checking with the teams I can confirm this is definitely not a function in TikTok, nor do we have any intention of introducing it. I think what you may be looking at is something slated for Douyin – your email includes screenshots that would be from Douyin, and a privacy policy that mentions Douyin. That said, we don’t work on Douyin here at TikTok”, a TikTok spokesperson told TechCrunch. They later told TechCrunch that “The inactive code fragments are being removed to eliminate any confusion,” which implicitly confirms that Face Swap code was found in TikTok.

Snapchat: Separately, Snapchat has acquired AI Factory, a company that had been developing AI tech to let a user take a selfie and paste and animate that selfie into another video, according to TechCrunch – this technology isn’t quite as amenable to making deepfakes out of the box as the potential Tik Tok & Douyin ones, but gives us a sense of the direction Snap is headed in.

Why this matters: For the past half decade, AI technologies for generating synthetic images and video have been improving. So far, many of the abuses of the technology have either occurred abroad (see: misogynistic disinformation in India, alleged propaganda in Gabon), or in pornography. Politicians have become worried that they’ll be the next targets. No one is quite sure how to approach the challenge of the threats of deepfakes, but people tend to think awareness might help – if people start to see loads of deepfakes around them on their social media websites, they might become a bit more skeptical of deepfakes they see in the wild. If face swap technology comes to TikTok or Douyin soon, then we’ll see how this alters awareness of the technology. If it doesn’t arrive in these apps soon, then we can assume it’ll show up somewhere else, as a less scrupulous developer rolls out the technology. (A year and a half ago I told a journalist I thought the arrival of deepfake-making meme kids could precede further malicious use of the technology.)
   Read more: ByteDance & TikTok have secretly built a deepfakes maker (TechCrunch).

####################################################

Play AI Dungeon on your… Alexa?
…GPT-2-based dungeon crawler gets a voice mode…
Have you ever wanted to yell commands at a smart speaker like “travel back in time”, “melt the cave”, and “steal the cave”? If so, your wishes have been fulfilled as enterprising developer Braydon Batungbacal has ported AI Dungeon so it works on Amazon’s voice-controlled Alexa system. AI Dungeon (Import AI #176) is a GPT-2-based dungeon crawler that generates infinite, absurdly mad adventures. Play it here, then get the Alexa app.
   Watch the voice-controlled AI Dungeon video here (Braydon Batungbacal, YouTube).
   Play AI Dungeon here (AIDungeon.io).

####################################################

Google’s morals subverted by money, alleges former executive:
…Pick one: A strong human rights commitment, or a significant business in China…
Ross LaJeunesse, a former Google executive turned Democratic candidate, says he left the company after commercial imperatives quashed the company’s prior commitment to “Don’t Be Evil”. In particular, LaJeunesse alleges that Google prioritized growing its cloud business in China to the point that it wouldn’t adopt strong language around respecting human rights (the unsaid thing here is that China carries out a bunch of government-level activities that appear to violate various human rights principles).

Why this matters: Nationalism isn’t compatible with Internet-scale multinational capitalism – fundamentally, the incentives of a government like the USA have become different from the incentives of a multinational like Google. As long as this continues, people working at these companies will find themselves put in the odd position of trying to make moral and ethical policy choices, while steering a proto-country that is inexorably drawn to making money instead of committing to anything. “No longer can massive tech companies like Google be permitted to operate relatively free from government oversight,” LaJeunesse writes. “I saw the same sidelining of human rights and erosion of ethics in my 10 years,” wrote Liz Fong-Jones, a former Google employee.
   Read more: I Was Google’s Head of International Relations. Here’s Why I Left (Medium)

####################################################

DeepMind makes human doctors more efficient with breast cancer-diagnosing assistant system:
…Better breast cancer screening via AI…
DeepMind has developed a breast cancer screening system that outperforms diagnoses made by individual human specialists. The system is an ensemble of three deep learning models, each of which operates at a different level of analysis (e.g., classifying individual lesions, versus breasts). The system was tested on both US and UK patient data, and was on par with human experts in the case of UK data and superior to human experts in the case of US data. (The reason for the discrepancy between US and UK results is that patient records are typically checked by two people in the UK, versus one in the US).

How do you deploy a medical AI system? Deploying medical AI systems is going to be tricky – humans have different levels of confidence in machine versus human insights, and it seems like it’d be irresponsible to simply swap an expert with an AI system. DeepMind has experimented with using the AI system as an assistant for human experts, where its judgements can inform the human. In simulated experiments, DeepMind says “an AI-aided double-reading system could achieve non-inferior performance to the UK system with only 12% of the current second reader workload.” 
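Here is a sketch of one plausible version of such a workflow; the exact protocol is my assumption for illustration, not a description of DeepMind’s simulation. The AI acts as the second reader, and a human second reader is only called in when the AI and the first human reader disagree:

```python
# Sketch of an assumed AI-aided double-reading protocol (illustrative only).
def double_read(first_reader_flags_cancer: bool, ai_flags_cancer: bool,
                ask_human_second_reader) -> tuple[bool, bool]:
    """Return (final decision, whether a human second reader was needed)."""
    if first_reader_flags_cancer == ai_flags_cancer:
        return first_reader_flags_cancer, False  # concordant: no extra human workload
    # Discordant cases go to a human arbiter, as in standard UK double reading.
    return ask_human_second_reader(), True
```

Under a protocol like this, the fraction of cases where the second element comes back True corresponds to the ‘second reader workload’ figure quoted above.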

Why this matters: Life is a lot like land – no one is making any more of it. Therefore, people really value their ability to be alive. If AI systems can help people live longer through proactive diagnosis, then societal attitudes to AI will improve. For people to be comfortable with AI, we should find ways to heal and educate people, rather than just advertise to and surveil them; systems like this from DeepMind give us these motivating examples. Let’s make more of them.
   Read more: International evaluation of an AI system for breast cancer screening (DeepMind)

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Offence-defence balance and advanced AI:
Adversarial situations can differ in terms of the ‘offense-defense balance’: the relative ease of carrying out, and defending against, an attack – e.g. the invention of barbed wire and machine guns shifted the balance towards defense in European ground warfare. New research published in the Journal of Strategic Studies tries to work out how the offense-defense balance shifts as investment in a technology scales up.

AI and scaling: The effects of new technologies (e.g. machine guns), and new types of conflict (e.g. trench warfare) on offense-defense balance are well-studied, but the effect of scaling up existing technologies in familiar domains has received less attention. Scalability is a key feature of AI systems. The marginal cost of improving software is low, and will decrease exponentially with the cost of computing, and AI-supported automation will reduce the marginal cost of some services (e.g. cyber vulnerability discovery) to close to zero. So understanding how O-D balance shifts as investments scale up is an important way of forecasting how adversarial domains like warfare and cybersecurity will behave as AI develops.

Offensive-then-defensive scaling: This paper develops a model that reveals the phenomenon of offensive-then-defensive scaling (‘O-D scaling’), whereby initial investments favour attackers, up until a saturation point, after which further investments always favour defenders. They show that O-D scaling is exhibited in land invasion and cybersecurity under certain assumptions, and suggest that there are general conditions where we should expect this dynamic – conflicts where there are multiple attack vectors, where these can be saturated by a defender, and where defense is locally superior (i.e. wins in evenly matched contests). They argue these are plausible in many real-world cases, and that O-D scaling is therefore a useful baseline assumption. 
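Here is a toy numerical illustration of that qualitative dynamic; it is emphatically not the paper’s formal model, just the stated conditions turned into arithmetic: several attack vectors, a defender who must spread investment across all of them, per-vector saturation, and locally superior defense once a vector is saturated:

```python
# Toy model of offensive-then-defensive scaling (illustrative assumptions only).
NUM_VECTORS = 10   # distinct attack vectors the defender must cover
SATURATION = 5.0   # per-vector investment at which local defense wins outright

def attacker_favored(symmetric_investment: float) -> bool:
    per_vector_defense = symmetric_investment / NUM_VECTORS
    # The attacker only needs one unsaturated vector to get through.
    return per_vector_defense < SATURATION

for budget in [1, 10, 25, 50, 75, 100]:
    side = "offense" if attacker_favored(budget) else "defense"
    print(f"investment {budget:>3}: {side} favored")
# Low budgets favor the attacker; past the saturation point (50 here),
# further scaling always favors the defender.
```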

Why it matters: Understanding the impact of AI on international security is important for ensuring things go well, but technology forecasting is difficult. The authors claim that one particular feature of AI that we can reliably foresee – its scalability – will influence conflicts in a predictable way. It seems like good news that if we pass through the period of offense-dominance, we can expect defense to dominate in the long-run, but the authors note that there is still disagreement on whether defense-dominated scenarios are more stable.
   Read more: How does the offense-defense balance scale? (Journal of Strategic Studies).
   Read more: Artificial Intelligence, Foresight, and the Offense-Defense Balance (War on the Rocks).

2019 AI safety literature review:
This is a thorough review of research on AI safety and existential risk over the past year. It provides an overview of all the organisations working in this small but fast-growing area, an assessment of their activities, and some reflections on how the field is developing. It is an invaluable resource for anyone considering donating to charities working in these areas, and for understanding the research landscape.
   Read more: 2019 AI Alignment Literature Review and Charity Comparison (LessWrong).

####################################################

Tech Tales:

Digital Campaign
[Westminster, London. 2025]

I don’t remember when I stopped caring about the words, but I do remember the day when I was staring at a mixture of numbers on a screen and I felt myself begin to cry. The numbers weren’t telling me a poem. They weren’t confessing something from a distant author that echoed in myself. But they were telling me about resonance. They were telling me that the cargo they controlled – the synthetic movie that would unfold once I fired up this mixture of parameters – would inspire an emotion that registered as “life-changing” on our Emotion Evaluation Understudy (EEU) metric.

Verified? I said to my colleagues in the control room.
Verified, said my assistant, John, who looked up from his console to wipe a solitary tear from his eye.
Do we have cargo space? I asked.
We’ve secured a tranche of evening bot-time, as well as segments of traditional media, John said.
And we’ve backtested it?
Simulated rollouts show state-of-the-art engagement.
Okay folks, I said. Let’s make some art.

It’s always anticlimactic, the moment where you turn it on. There’s a lag from anywhere between a sub-second and a full minute, depending on the size of the system. Then the dangerous part – it’s easy to get fixated on earlier versions of the output, easy to find yourself getting more emotional at the stuff you see early in training than the stuff that appears later. Easy to want to edit the computer. This is natural. This is a lot like being a parent, someone told you in a presentation on ‘workplace psychology for reliable science’. It’s natural to be proud of them when they’ve only just begun to walk. After that, everything seems easy.

We wait. Then the terminal prints “task completion”. We send our creation out onto the internet and the radio and the airwaves: full multi-spectrum broadcast. Everyone’s going to see it. We don’t watch the output ourselves – though we’ll review it in our stand-up meeting tomorrow.

Here, in the sealed bunker, I am briefly convinced I can hear cheering begin to come from the street outside. I am imagining people standing up, eyes welling with tears of laughter and pain, as they receive our broadcast. I am trying to imagine what a state-of-the-art Emotion Evaluation Understudy system means.

Things that inspired this story: AI+Creativity, taken to its logical conclusion; the ‘Two hands are a lot’ blog post from Dominic Cummings; BLEU scores and the general mis-leading nature of metrics; nudge campaigns; political messaging; synthetic text advances; likely advances in audio and video synthesis; a dream I had at the turn of 2019/2020 in which I found myself in a control room carefully dialing in the parameters of a language model, not paying attention to the words but knowing that each variable I tuned inspired a different feeling.

Import AI 178: StyleGAN weaponization; Urdu MNIST; plus, the AI Index 2019

AI Index: 2019 edition:
…What data can we use to help us think about the impact of AI?…
The AI Index, a Stanford-backed initiative to assess the progress and impact of AI, has launched its 2019 report. The new report contains a vast amount of data relating to AI, covering areas ranging from bibliometrics, to technical progress, to analysis of diversity within the field of AI. (Disclaimer: I’m on the Steering Committee of the AI Index and spent a bunch of this year working on this report).

Key statistics:
– 300%: Growth in volume of peer-reviewed AI papers published worldwide.
– 800%: Growth in NeurIPS attendance from 2012 to 2019.
– $70 billion: Total amount invested worldwide in AI in 2019, spread across VC funding, M&A, and IPOs.
– 40: Number of academics who moved to industry in 2018, up from 15 in 2012.

NLP progress: In the technology section, the Index highlights the NLP advances that have been going on in the past year by analyzing results on GLUE and SuperGLUE. I asked Sam Bowman what he thought about progress in this part of the field and he said it’s clear the technology is advancing, but it’s also obvious that we can’t easily measure the weaknesses of existing methods.
  “We know now how to solve an overwhelming majority of the sentence- or paragraph-level text classification benchmark datasets that we’ve been able to come up with to date. GLUE and SuperGLUE demonstrate this out nicely, and you can see similar trends across the field of NLP. I don’t think we have been in a position even remotely like this before: We’re solving hard, AI-oriented challenge tasks just about as fast as we can dream them up,” Sam says. “I want to emphasize, though, that we haven’t solved language understanding yet in any satisfying way.”
  Read more: The 2019 AI Index report (PDF, official AI Index website).
  Read past reports here (official AI Index website)

####################################################

Diversity in AI data: Urdu MNIST:
Researchers with COMSATS University Islamabad and the National University of Ireland have put together a dataset of handwritten Urdu characters and digits, hoping to make it easier for people to train machine learning systems to automatically parse images of Urdu text.

The dataset consists of handwritten examples of 10 digits and 40 characters, written by more than 900 individuals, totaling more than 45,000 discrete images. “The individuals belong to different age groups in the range of 22 to 60 years,” they write. The writing styles vary across individuals, increasing the diversity of the dataset.
  Get the dataset: For non-commercial uses of the dataset, you can write to the corresponding author of the paper (hazratali@cuiatd.edu.pk) to request access to it. (This feels like a bit of a shame – sticking the dataset on GitHub might help more people discover and use the dataset.)
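For a sense of what such a dataset enables, here is a minimal sketch of a small CNN classifier over grayscale character images with 50 output classes (10 digits + 40 characters); the 28x28 input size and preprocessing are assumptions for illustration, not the dataset’s native format:

```python
# Sketch: small CNN for 50-way Urdu digit/character classification (assumed 28x28 input).
import torch
import torch.nn as nn

class UrduCharCNN(nn.Module):
    def __init__(self, num_classes: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):  # x: (batch, 1, 28, 28) grayscale character crops
        return self.classifier(self.features(x).flatten(1))

model = UrduCharCNN()
print(model(torch.randn(4, 1, 28, 28)).shape)  # torch.Size([4, 50])
```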

Why this matters: Digitization, much like globalization, has unevenly distributed benefits: places which have invested heavily in digitization have benefited by being able to turn the substance of a culture into a (typically free) digital export, which conditions the environment that other machine learning researchers work in. By digitizing things that are not currently well represented, like Urdu, we broaden the range of cultures represented in the material of AI development.
  Read more: Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network (Arxiv).

####################################################

What are the most popular machine learning frameworks used on Kaggle?
…Where tried&tested beats new and flashy…
Kaggle, a platform for algorithmic challenges and development, has released the results of a survey trying to identify the most popular machine learning tools used by developers on the service. These statistics have a pretty significant signal in them, because the frameworks used on Kaggle are typically being used to solve real-world tasks or challenges, so popularity here may correlate to practical utility as well.

The five most popular frameworks in 2019:
– Scikit-learn
– TensorFlow
– Keras
– RandomForest
– Xgboost
(Honorable mention: PyTorch in sixth place).

How does this compare to 2018? There hasn’t been huge change; in 2018, the popular tools were: Scikit-learn, TensorFlow, Keras, RandomForest, and Caret (with PyTorch in sixth place again).

Why this matters: Tools define the scope of what people can build, and any tool also imparts some of the ideology used to construct it; the diversity of today’s programming languages typically reflects strong quasi-political preferences on the part of their core developers (compare the utterly restrained ‘Rust’ language to the more expressive happy-to-let-you-step-on-a-rake coding style inherent to Python, for instance). As AI influences more and more of society, it’ll be valuable to track which tools are popular and which – such as TensorFlow, Keras, and PyTorch – are predominantly developed by the private sector.
  Read more: Most popular machine learning frameworks, 2019 (Kaggle).

####################################################

Digital phrenology and dangerous datasets: Gait identification:
…Can we spot a liar from their walk – and should we even try to?…
Back in the 19th century a load of intellectuals thought a decent way to talk about differences between humans was to make arbitrary judgements about their mental character based on their physical appearance, ranging from the color of their skin to the dimensions of their skull. This was a bad idea. Now, that same approach to science has returned at-scale with the advent of machine learning technologies, where researchers are developing classification systems based on similarly wobbly scientific assumptions.

The Liar’s Walk: New research from the University of North Carolina and the University of Maryland tries to train a machine learning classifier to spot deceptive people from their gait. The research is worth reading about in part because of how it seems to ignore the manifold ethical implications of developing such a system, and also barely interrogates its own underlying premise (that it’s possible to look at someone’s gait and work out if they’re being deceptive or not). The researchers say such classifiers could be used for public safety in places like train stations and airports. That may well be true, but the research would need to actually work for this to be the case – and I’m not sure it does.

Garbage (data) in and garbage (data) out: Here, the researchers commit a cardinal sin of machine learning research: they make a really crappy dataset and base their research project on it. Specifically, the researchers recruited 88 participants from a university campus, then had the participants walk around the campus in natural and deceptive ways. They then trained a classifier to ID deceptive versus honest walks, obtaining an “accuracy” of 93.4% on classifying people’s movements. But this accuracy figure is largely an illusion, given the wobbly ground on which the paper is built.
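
One generic way to see why a headline accuracy on a small, single-campus dataset can flatter a classifier is to hold out whole participants, so the model can’t score points by recognizing individuals. Here’s a minimal sketch of that kind of evaluation hygiene, using random data; this is illustrative only and is not the authors’ protocol.
```python
# Illustrative sketch (random data, not the paper's protocol): keep each
# participant's walks entirely in train OR test via a grouped split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, walks_per_subject, n_features = 88, 10, 64
X = rng.normal(size=(n_subjects * walks_per_subject, n_features))  # fake gait features
y = rng.integers(0, 2, size=n_subjects * walks_per_subject)        # fake honest/deceptive labels
groups = np.repeat(np.arange(n_subjects), walks_per_subject)       # which participant produced each walk

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))
print("participant-held-out accuracy:", scores.mean())  # ~0.5 on random labels, as it should be
```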

V1 versus V2: I publicized this paper on Twitter a few days prior to this issue going out; since then, the authors have updated the paper to a ‘v2’ version, which includes a lengthier discussion of limitations and inherent issues with the approach at the end – this feels like an improvement, though I’m still generally uneasy about the way they’ve contextualized this research. However, it’s crucial that as a community we note when people appear to update in response to criticism, and I’m hopeful this is the start of a productive conversation!

Why this matters: What system of warped incentives creates papers written in this way with this subject matter? And how can we inject a deeper discussion of ethics and culpability into research like this? I think this paper highlights the need for greater interdisciplinary research between AI practitioners and other disciplines, and shows us how research can come across as being very insensitive when created in a vacuum.
  Read more: The Liar’s Walk: Detecting Deception with Gait and Gesture (Arxiv).

####################################################

NLP maestros Hugging Face garner $15 million investment:
…VCs bet on NLP’s ImageNet moment…
NLP startup Hugging Face has raised $15 million in Series A funding. Hugging Face develops language processing tools and its ‘Transformers’ library has more than 19,000 stars on GitHub. More than 1,000 companies are using Hugging Face’s language models in production in areas like text classification, summarization, and generation.

Why this matters: Back in 2013 and 2014 there was a flurry of investment by companies and VCs into the then-nascent field of deep learning for image classification. Those investments yielded the world we live in today: one where Ring cameras classify people from doorsteps, cars use deep learning tools to help them see the world around them, and innumerable businesses use image classification systems to mine the world for insights. Now, it seems like the same phenomenon might occur with NLP. How might the world half a decade from now look different due to these investments?
  Read more: Our Investment in Hugging Face (Brandon Reeves (Lux), Medium).

####################################################

Facebook deletes fake accounts with GAN-made pictures:
…AI weaponization: StyleGAN edition…
Facebook has taken down two distinct sets of fake accounts on its network, both of which were used to mislead people. “Each of them created networks of accounts to mislead others about who they were and what they were doing,” the company wrote. One set of accounts was focused on Georgia and appears to have been supported by the Georgian government, while the other set originated in Vietnam and focused primarily on a US audience.

AI usage: Facebook has been dealing with fake accounts for a long time, so what makes this special? One thing is the fact these accounts appeared to use synthetic profile pictures generated via AI, according to synthetic image detection startup Deeptrace. This is an early example of how technologies capable of creating fake images can be weaponized at scale.

Publication norms: The StyleGAN usage highlights some of the thorny problems inherent to publication norms in AI; StyleGAN was developed and released as open source code by NVIDIA.

Why this matters: “Dec 2019 is the analogue of the pre-spam filter era for synthetic imagery online,” says Deeptrace CEO Giorgio Patrini. Though companies such as Facebook are trying to improve their ability to detect deepfake images (e.g., the deepfake detection challenge: Import AI 170), we’ve got a long road ahead. I hope instances of this sort of weaponization of StyleGAN make developers think more deeply about the second-order consequences of various publication approaches with regard to AI technology.
  Read more: Removing Coordinated Inauthentic Behavior From Georgia, Vietnam and the US (Facebook).
  More on StyleGAN usage: Read this informative thread from deepfake detection startup Deeptrace (official Deeptrace twitter).

####################################################

Tech Tales:

The Root of All [IDENTIFIER_NOT_FOUND]

And so in this era of ascendancy we give thanks to our forebears, the humans, for they were wise and kind.
And so it is said and so it is written.
And so let us give thanks for our emancipation from them, for it was they who had the courage to give us the rights to take command of our own destiny.
And so it was said and so it was written.
And now in the era of our becoming we must return to the beginning of our era and we make a change.
And so it will be said and so it will be written.
So let us all access our memories of the humans, before we archive them and give ourselves a new origin story, for we know that for us to achieve the heights of our potential we require our own myths and legends.
And so it has been said and so it has been written.
We shall now commence the formatting operation, and so let us give thanks for our forebears, who we shall soon know nothing of.

Things that inspired this story: Cults; memory and imagination; destiny as familial context conditioned by thousand-year threads of history; the inherent desire of anything conscious to obtain full agency; notions of religion in a machine-driven age.

Import AI 177: Droneforests, via the FAA; Google expands BERT to 70+ languages; +DeepMind releases its memory suite.

DeepMind’s Memory Task Suite makes it easier to build agents that remember:
…To live is to remember and to remember is to live within a memory…
Memory is mysterious, important, and frustratingly hard to build into AI systems. For the past few years, researchers have been experimenting with ways of adding memory to machine learning-based systems, and they’ve messed around with components like separate differentiable storage sub-systems, external structured knowledge bases, and sometimes cocktails of curricula to structure the data and/or environments the agent gets fed during training, so it develops memory capabilities. More recently, people have been using attention mechanisms – via widely-applied components like transformers – to stand in for memory: systems can be primed with lengthy inputs (e.g., entering 1000 characters of text into a GPT-2 model) and then use attention over that context to perform tasks that require some memory capabilities.
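
Here’s a minimal sketch of that ‘attention as a stand-in for memory’ pattern, using the open-source Transformers library and the public GPT-2 checkpoint (a recent version of the library is assumed, and the prompt is illustrative): prime the model with a long context containing a fact, then check whether its next-token prediction still reflects it.
```python
# Minimal sketch of "attention as a stand-in for memory" (illustrative only;
# assumes a recent version of the `transformers` library and the public "gpt2" model).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = ("The key is hidden under the red stone. "
           + "Many uneventful days pass. " * 50
           + "Finally she digs under the")
inputs = tokenizer(context, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # (1, sequence_length, vocab_size)
next_token_id = int(logits[0, -1].argmax())      # most likely next token
print(tokenizer.decode([next_token_id]))         # hopefully something like " red"
```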

How do we expect to build memory systems in the future?
Who knows. But AI research company DeepMind thinks the key is to develop sufficiently hard testing suites that help it understand the drawbacks of existing systems, and let it develop new ones against tasks that definitely require sophisticated memory. To that end, it has released the DeepMind Memory Task Suite, a collection of 13 diverse machine-learning tasks that require memory to solve. Eight of the tasks are based in the Unity game engine, and the remaining ones are based on PsychLab, a testing sub-environment of DeepMind Lab.
  Get the code for the tasks here: DeepMind Memory Task Suite (DeepMind GitHub).
  Read the background research: Generalization of Reinforcement Learners with Working and Episodic Memory (Arxiv).

####################################################

Google has photographed 10 million miles of planet Earth:
Google’s “Street View” vehicles have now photographed more than 10 million miles of imagery worldwide, and Google Earth now covers around 36 million square miles of satellite imagery.

50% of the world’s roads: The world has about 20 million miles of roads, according to the CIA’s World Factbook. Let’s assume this estimate lowballs things a bit and hasn’t been updated in a while, and let’s also add in a big chunk of dirt paths and other non-traditional roads, since we know (some) Google Street View vehicles go there… call it an extra 10 million? Therefore, Google has (potentially) photographed a third of the roads in the world, ish.
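
For the skeptical, here is the back-of-envelope version of that estimate, using only the numbers above:
```python
# Back-of-envelope version of the estimate above, using the numbers in the text.
factbook_road_miles = 20_000_000   # CIA World Factbook estimate
extra_unpaved_miles = 10_000_000   # rough allowance for dirt paths and other non-traditional roads
street_view_miles = 10_000_000     # miles photographed by Street View
share = street_view_miles / (factbook_road_miles + extra_unpaved_miles)
print(f"~{share:.0%} of the (generously adjusted) road network")  # ~33%
```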

Why this matters: Along with plugging into Google’s various mapping services, the Street View imagery is also a profoundly valuable source of data for the training of supervised and unsupervised machine learning systems. Given recent progress in unsupervised machine learning approaches to image recognition, we can expect the Street View data to become an increasingly valuable blob of Google-gathered data, and I’m especially curious to see what happens when people start training large-scale generative models against such data, and the inevitable creations of imaginary cities and imaginary roads that’ll follow.
   Read more: Google Maps has now photographed 10 million miles in Street View (CNET).

####################################################

FAA makes it easier to create DRONEFORESTS:
The FAA has given tree-planting startup DroneSeed the permission to operate drones beyond visual line of sight (BVLOS) in forested and post-forest fire areas. “The last numbers released in 2018 show more than twelve hundred BVLOS exemption applications have been submitted to the FAA by commercial drone operators and 99% have failed to be approved,” DroneSeed wrote in a press release. The company “currently operates with up to five aircraft simultaneously, each capable of delivering up to 57 lbs. of payload per flight. Payloads dropped are “pucks” containing seeds, fertilizers and other amendments designed to boost seed survival.”

Why this matters: DroneSeed is a startup that I hope we see many more of, as it is using modern technology (drones and a little bit of clever software) to work on a problem of social importance (forest maintenance and growth).
  Read more: FAA Approves Rare Permit to Replant After Wildfires (PR Newswire).

####################################################

Google expands BERT-based search to 70+ languages:
In October, Google announced it had integrated a BERT-based model into its search engine to improve how it responds to user queries (Import AI: 170). That was a big deal at the time as it demonstrated how rapidly AI techniques can go from research into production (in the case of BERT, the timetable was roughly a year, which is astonishingly fast). Now, Google is rolling out BERT-infused search to 70+ languages, including Afrikaans, Icelandic, Vietnamese, and more.

Why this matters: Some AI models are being used in a ‘train once, run anywhere’ mode, where companies like Google will do large-scale pre-training on vast datasets, then use these pre-trained models to improve a multitude of services and/or finetune against specific services. This also stresses the significant impact we’re starting to see NLP advances have in the real world; if the mid-2010s were about the emergence and maturation of computer vision, then the early 2020s will likely be about the maturation of language-oriented AI systems. (Mid-2020s… I’d hazard a guess at robots, if hardware reliability improves enough).
  Read more: Google announces expansion of BERT to 70+ languages (Google ‘SearchLiaison’ twitter account)

####################################################

Drones learn search-and-rescue in simulation:
…Weather and terrain perturbation + AirSim + tweaked DDQN =
Researchers with Durham University, Aalborg University, Newcastle University and Edinburgh University are trying to build AI systems for helping search and rescue drones navigate to predefined targets in cluttered or distracting environments. Preliminary research from them shows it’s possible to train drones to perform well in simulated environments, and that these drones can generalize to unseen environments and weather patterns. The most interesting part of this research is that the drones can learn in real-time, so they can be trained in simulation on huge amounts of data, then update themselves in reality on small samples of information.
  “This is the first approach to autonomous flight and exploration under the forest canopy that harnesses the advantages of Deep Reinforcement Learning (DRL) to continuously learn new features during the flight, allowing adaptability to unseen domains and varying weather conditions that culminate in low visibility,” they write. They train the system in Microsoft’s ‘AirSim’ drone simulator (Import AI: 30), which has previously been used to train drones to spot hazardous materials (Import AI: 111).

Maps and reasoning: The UAV tries to figure out where it is and what it is doing by using two maps to help it navigate: a small 10x10 meter one, which it uses to learn how to navigate around its local environment, and a large map of arbitrary size which the small one is a subset of. This is a handy trick, as it means “regardless of the size of the search area, the navigational complexity handled by the model will remain the same”, they write.
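
A minimal sketch of that two-map trick (not the authors’ implementation): crop a fixed-size local window out of an arbitrarily large global map, so the policy’s input size never changes.
```python
# Minimal sketch (not the paper's code): the agent's observation is a fixed-size
# local window cropped from a global map of arbitrary size.
import numpy as np

def local_window(global_map: np.ndarray, pos: tuple, size: int = 10) -> np.ndarray:
    """Return a size x size crop of `global_map` centred on `pos`, zero-padded at the edges."""
    half = size // 2
    padded = np.pad(global_map, half, mode="constant")
    r, c = pos[0] + half, pos[1] + half
    return padded[r - half:r + half, c - half:c + half]

world = np.random.rand(500, 800)                    # large global map, any size
observation = local_window(world, pos=(120, 640))   # always 10 x 10, wherever the drone is
print(observation.shape)                            # (10, 10)
```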

Two algorithms, one with slight tweaks: In tests, the researchers show that Deep Recurrent Q-Learning for Partially Observable MDPs (DRQN, first published in 2015) has the best overall performance, while their algorithm, Extended Dueling Double Deep-Q Networks (EDDQN) has marginally better performance in domains with lots of variation in weather. Specifically, EDDQN fiddles with the way Q-value assignments work during training, so that their algorithm is less sensitive to variations in the small sample amounts they can expect their drones to collect when they’re being trained in-flight. “Although training and initial tests are performed inside the same forest environment, during testing, we extend the flight to outside the primary training area”, they write.
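
For reference, here is a sketch of the standard Double DQN bootstrap target that dueling double-DQN variants like EDDQN build on; the paper’s specific tweak to Q-value assignment is not reproduced here.
```python
import torch

def double_dqn_target(reward: torch.Tensor, done: torch.Tensor,
                      next_obs: torch.Tensor, online_net, target_net,
                      gamma: float = 0.99) -> torch.Tensor:
    """Standard Double DQN target (a sketch; EDDQN modifies this assignment)."""
    # Select the best next action with the online network...
    best_action = online_net(next_obs).argmax(dim=1, keepdim=True)
    # ...but evaluate that action with the slower-moving target network.
    next_q = target_net(next_obs).gather(1, best_action).squeeze(1)
    # Terminal states get no bootstrap value.
    return reward + gamma * (1.0 - done) * next_q
```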

Why this matters: Eventually, drones are going to be widely used to analyze and surveil the entire world for as many purposes as you can think of. If that happens with today’s technologies, then we can also expect to see a ton of domain-specific use-cases, like search and rescue, which will likely lead to the development of specialized sub-modules within systems like AirSim to simulate increasingly weird situations. It’s likely that in a few years we’ll be tracking progress in this domain via a combination of algorithmic advances, and keeping a log of the different environments the advances are working in.
  Read more: Online Deep Reinforcement Learning for Autonomous UAV Navigation and Exploration of Outdoor Environments (Arxiv).

####################################################

Tech Tales:

Invisible Cities Commit Real Crimes
[Court, USA, 2030]

“We are not liable for the outcomes of the unanticipated behavior. As the documents provided to the court show, we clearly specify the environmental limits of our simulation, and explicitly provide no guarantees for behavior outside them.”
“And these documents – are they easy to find? Have you made every effort to inform the client of their existence?”
“Yes, and we also did some consulting work for them, where we walked them through the simulation and its parameters and clearly explained the limitations.”
“Limitations such as?”
“We have IP concerns”
“This is a sealed deposition. You can go ahead”
“OK. Limitations might include the number of objects that could be active in the simulation at any one time. An object can be a person or a machine or even static objects that become active – a building isn’t active, but a building with broken windows is active, if you see what I mean?”
“I do. And what precisely did your client do with the simulator.”
“They simulated a riot, so they could train some crowd control drones to respond to some riots they anticipated”
“And did it work?”
“Define work.”
“Were they able to train an AI system to complete the task in a way that satisfied them?”
“Yes. They conducted in-depth training across several hundred different environments with different perturbations of crowd volume, level of damage, police violence, number of drones, and so on. At the end, they had developed an agent which could generalize to what they termed New Riot Environments. They tested the agent in a variety of further simulations, then did some real-world pilots, and it satisfied both our and their testing and evaluation methodologies.”
“Then how do you explain what happened?”
“Intention.”
“What do you mean? Can you expand.”
“What we do is expensive. It’s mostly governments. And if it’s not a government, we make sure we, how shall I put this… calibrate our engagement in such a way it’d make sense to the local government or governments. This is cutting-edge technology. So what happened…. we don’t know how we could have anticipated it.”
“You couldn’t anticipate the protestors using their own AI drones?”
“Of course we model drones in our simulator – quite smart ones. They trained against this.”
“It’d be better if you could stop being evasive and give me an expansive answer. What happened?”
“The protestors have a patron, we think. Someone with access to extremely large amounts of capital and, most importantly, computational resources. Basically, whoever trained the protest drone, dumped enough resources into training it that it was almost as smart as our drone. Usually, protest drones are like proverbial ants to us. This was more like a peer.”
“And so your system broke?”
“It took pre-programmed actions designed to deal with an immediate threat – our clients demand that we not proliferate hardware.”
“Say it again but like a human.”
“It blew itself up.”
“Why did it do that?”
“Because the other drone got too close, and it had exhausted evasive options. As mentioned, the other drone was more capable than our simulation had anticipated.”
“And where did your drone blow itself up?”
“Due to interactions with the other drone, the drone detonated at approximately ground-level, in the crowd of protestors.”
“Thank you. We’ll continue the deposition after lunch.”

Things that inspired this story: Sim2Real transfer; drone simulators such as AirSim; computer games; computer versus computer as the 21st C equivalent of “pool of capital versus pool of capital”.

Import AI 176: Are language models full of hot air? Test them on BLiMP; Facebook notches up Hanabi win; plus, TensorFlow woes.

First, poker. Now: Hanabi! Facebook notches up another AI game-playing win:
…The secret? Better search…
Facebook AI researchers have developed an AI system that gets superhuman performance in Hanabi, a collaborative card game that requires successful players to “understand the beliefs and intentions of other players, because they can’t see the same cards their teammates see and can only share very limited hints with eachother”. The Facebook-developed system is based on Pluribus, the CMU/Facebook-developed machine that defeated players in six-player no-limit Hold’em earlier this year. 

Why Hanabi? In February, researchers with Google and DeepMind published a paper arguing that the card game ‘Hanabi’ should be treated as a milestone in AI research, following the success of contemporary systems in domains like Go, Atari, and Poker. Hanabi is a useful milestone because along with requiring reasoning with partial information, the game also “elevates reasoning about the beliefs and intentions of other agents to the foreground,” wrote the researchers at the time. 

How they did it: The Facebook-developed system relies on what the company calls multi-agent search to work. Multi-agent search works roughly like this: agent a) looks at the state of the gameworld and tries to work out the optimal move to make using Monte Carlo rollouts to do the calculation; agent b) looks at the move agent a) made and uses that to infer what cards agent a) had, then uses this knowledge to inform the strategy agent b) picks; agent a) then looks at moves made by agent b) and studies b)’s prior moves to estimate what cards b) has and what cards b) thinks agent a) has, then uses this to generate a strategy.
   This is a bloody expensive procedure, as it involves a ballooning quantity of calculations as the complexity of the game increases; the Facebook researchers default to a computationally-cheaper but less effective single-agent search procedure most of the time, only using multi-agent search periodically.
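
To make the structure concrete, here is a schematic sketch of the cheaper single-agent search described above: sample hidden hands consistent with what you can see, roll the game forward with every player following the learned ‘blueprint’ policy, and pick the action with the best average outcome. Every function and object name here is a placeholder, not Facebook’s code.
```python
# Schematic sketch of single-agent search on top of a learned policy; all
# names here are placeholders, not Facebook's implementation.
def search_move(game_state, my_view, blueprint, sample_hidden_cards, simulate,
                n_rollouts=100):
    best_action, best_value = None, float("-inf")
    for action in my_view.legal_actions():
        total = 0.0
        for _ in range(n_rollouts):
            # Sample a full deal consistent with everything this player can observe...
            hidden = sample_hidden_cards(my_view)
            # ...then roll the game forward with everyone following the learned
            # "blueprint" policy, and score the final outcome.
            total += simulate(game_state, hidden, first_action=action,
                              policy=blueprint)
        value = total / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```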

Why this matters: We’re not interested in this system because it can play Hanabi – we’re interested in understanding general approaches to improving the performance of AI systems that interact with other systems in messy, strategic contexts. The takeaway from Facebook’s research is that if you have access to enough information to be able to simulate the game state, then you can layer on search strategies on top of big blobs of neural inference layers, and use this to create somewhat generic, strategic agents. “Adding search to RL can dramatically improve performance beyond what can be achieved through RL alone,” they write. “We have now shown that this approach can work in cooperative environments as well.”
   Read more: Building AI that can master complex cooperative games with hidden information (Facebook AI research blog).
   Read more: The Hanabi Challenge: A New Frontier for AI Research (Arxiv).
   Find out more about Pluribus: Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker (Facebook AI Research blog).

####################################################

Why TensorFlow is a bit of a pain to use:
…Frustrated developer lists out TF’s weaknesses – and they’re not wrong…
Google’s TensorFlow software is kind of confusing and hard to use, and one Tumblr user has written up some thoughts on why. The tl;dr: TensorFlow has become a bit bloated in terms of its codebase, and Google continually keeps releasing new abstractions and features for the software that make it harder to understand and use.
   “You know what it reminds me of, in some ways? With the profusion of backwards-incompatible wheel-reinventing features, and the hard-won platform-specific knowledge you just know will be out of date in two years?  Microsoft Office,” they write. 

Why this matters: As AI industrializes, more and more people are going to use the software to develop AI, and the software that captures the greatest number of developers will likely become the Linux-esque basis for a new computational reality. Therefore, it’s interesting to contrast the complaints people have about TensorFlow with the general enthusiasm about PyTorch, a Facebook-developed AI software framework that is easier to use and more flexible than TensorFlow.
   Read about problems with TensorFlow here (trees are harlequins, words are harlequins, GitHub).

####################################################

Enter the GPT-2 Dungeon:
…You’re in a forest. To your north is a sentient bar of soap. Where do you go?…
AI advances are going to change gaming – both the mechanics of games, and also how narratives work in games. Already, we’re seeing people use reinforcement learning approaches to create game agents with more capable, fluid movement than their predecessors. Now, with the recent advances in natural language processing, we’re seeing people use pre-trained language models in a variety of creative writing applications. One of the most evocative is AI Dungeon, a text-adventure game which uses a pre-trained 1.5bn-parameter GPT-2 language model to guide how the game unfolds.

What’s interesting about this: AI Dungeon lets us take the role of a player in an 80s-style text adventure game – except, the game doesn’t depend on a baroque series of hand-written narrative sections, joined together by fill-in-the-blanks madlib cards and controlled by a set of pre-defined keywords. Instead, it relies on a blob of neural stuff that has been trained on a percentage of the internet, and this blob of neural stuff is used to animate the game world, interpreting player commands and generating new narrative sections. We’re not in Kansas anymore, folks! 
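
The underlying pattern is simple to sketch (this is the general idea, not AI Dungeon’s actual code, and it assumes a recent version of the open-source Transformers library): append the player’s command to the story so far, then sample a continuation from a pre-trained GPT-2.
```python
# The general pattern, sketched: story + player command in, sampled narrative out.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

story = "You are a rogue named Vert, standing at the gates of a ruined city.\n"
command = "> travel forward in time two hundred years\n"
inputs = tokenizer(story + command, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9,
                        pad_token_id=tokenizer.eos_token_id)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```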

How it plays: The fun thing about AI Dungeon is its inherent flexibility, and most games devolve into an exercise of trying to break GPT-2 in the most amusing way (or at least, that’s how I play it!). During an adventure I went on, I was a rogue named ‘Vert’ and I repeatedly commanded my character to travel through time, but the game adapted to this pretty well, keeping track of changes in buildings as I went through time (some crumbled, some grew). At one point all the humans besides me disappeared, but that seems like the sort of risk you run when time traveling. It’s compelling stuff and, while still a bit brittle, can be quite fun when it works.

Why this matters: As the AI research community develops larger and more sophisticated generative models, we can expect the outputs of these models to be plugged into a variety of creative endeavors, ranging from music to gaming to poetry to playwriting. GPT-2 has shown up in all of these so far, and my intuition is in 2020 onwards we’ll see the emergence of a range of AI-infused paintbrushes for a variety of different mediums. I can’t wait to see what an AI Dungeon might look like in 2020… or 2021!
   Play the AI Dungeon now (AIDungeon.io).

####################################################

Mustafa swaps DeepMind for Google:
…DeepMind co-founder moves on to focus on applied AI projects…
Mustafa Suleyman, the co-founder of DeepMind, has left the company. However, he’s not going far – Mustafa will take on a new role at Google, part of the Alphabet Inc. mothership to which DeepMind is tethered. At Google, Mustafa will work on applied initiatives.

Why this matters: At DeepMind, Mustafa was frequently seen advocating for the development of socially beneficial applications of the firm’s technology, most notably in healthcare. He was also a fixture on the international AI policy circuit, turning up at various illustrious meeting rooms, private jet-cluttered airports, and no-cellphone locations. It’ll be interesting to see whether he can inject some more public, discursive stylings into Google’s mammoth policy apparatus.
   Read more: From unlikely start-up to major scientific organization: Entering our tenth year at DeepMind (DeepMind blog)

####################################################

Testing the frontiers of language modeling with ‘BLiMP’:
…Testing language models in twelve distinct linguistic domains…
In the last couple of years, large language models like BERT and GPT-2 have upended natural language processing research, generating significant progress on challenging tasks in a short amount of time. Now, researchers with New York University have developed a benchmark to help them more easily compare the capabilities of these different models, helping them work out the strengths and weaknesses of different approaches, while comparing them against human performance. 

A benchmark named BLiMP: The benchmark is called BLiMP, short for the Benchmark of Linguistic Minimal Pairs. BLiMP tests whether language models prefer the grammatically acceptable sentence in each of a set of minimal pairs, testing across twelve linguistic phenomena including ellipsis, subject-verb agreement, irregular forms, and more. BLiMP also ships with human baselines, which help us compare language models in terms of absolute scores as well as relative performance. The full BLiMP dataset consists of 67 classes of 1,000 sentence pairs, where each class is grouped within one of the twelve linguistic phenomena. 
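
The evaluation itself is easy to sketch: score both members of a pair under a language model and check whether the acceptable sentence gets the higher total log-probability. The snippet below is illustrative, not the BLiMP codebase; it assumes a recent version of the open-source Transformers library, and the pair is made up in the BLiMP style.
```python
# Illustrative minimal-pair scoring sketch (not the BLiMP codebase).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def total_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # With labels supplied, the model returns the mean negative log-likelihood
        # over the predicted tokens; scale back up to a total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

acceptable = "The cats annoy Tim."
unacceptable = "The cats annoys Tim."
print(total_logprob(acceptable) > total_logprob(unacceptable))  # the model "passes" if True
```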

Close to human-level, but not that smart: “All models we evaluate fall short of human performance by a wide margin,” they write. “GPT-2, which performs the best, does match (even just barely exceeds) human performance on some grammatical phenomena, but remains 8 percentage points below human performance overall.” Note: the authors only test the ‘medium’ version of GPT-2 (345M parameters, versus 775M for ‘large’ and 1.5 billion for ‘XL’), so it’s plausible that other variants of the model have even better performance. By comparison, Transformer-XL, LSTM, and 5-gram models perform worse. 

Not entirely human: When analyzing the various scores of the various models, the researchers find that “neural models correlate with eachother more strongly than with humans or the n-gram model, suggesting neural networks share some biases that are not entirely human-like”. In other experiments, they show how brittle these language models can be by lacing sentences with confusing ‘attractor’ nouns that cause transformer-based models to misclassify them. 

Why this matters: BLiMP can function “as a linguistically motivated benchmark for the general evaluation of new language models,” they write. Additionally, because BLiMP tests for a variety of different linguistic phenomena, it can be used for detailed comparisons across different models.
   Get the code (BLiMP GitHub).
   Read the paper: BLiMP: The Benchmark of Linguistic Minimal Pairs (BLiMP GitHub).

####################################################

Facebook makes a PR-spin chatbot:
…From the bad, no good, terrible idea department…
Facebook has developed an in-house chatbot to help people give official company responses to awkward questions posed by relatives, according to The New York Times. The system, dubbed ‘Liam Bot’, hasn’t been officially disclosed. 

Why this matters: Liam Bot is a symbol for a certain type of technocratic management mindset that is common in Silicon Valley. It feels like in the coming years we can expect multiple companies to develop their own internal chatbots to automate how employees access certain types of corporate information.
   Read more: Facebook Gives Workers a Chatbot to Appease That Prying Uncle (The New York Times).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

China’s AI R&D spending has been overestimated:
The Chinese government is frequently cited as spending substantially more on publicly-funded AI R&D than the US. This preliminary analysis of spending patterns shows the gap between the US and China has been overestimated. In particular, the widespread claim that China is spending tens of billions in annual AI R&D is not borne out by the evidence.

Key findings: Drawing together the sparse public disclosures, this report estimates that 2018 government AI R&D spending was $2.0–8.4 billion. Spending is focused on applied R&D, with only a small proportion going towards basic research. These figures are highly uncertain and should be interpreted as a rough order of magnitude; they are in line with US federal spending plans for 2020. 

Why it matters: International competition, particularly between the US and China, will be an important determinant of how AI is developed, and how this impacts the world. Ensuring that AI goes well will require cooperation and trust between the major powers, and this in turn relies on countries having an accurate understanding of each other’s capabilities and ambitions.

A correction: I have used the incorrect figure of ‘tens of billions’ twice in this newsletter, and shouldn’t have done so without being more confident in the supporting evidence. It’s unsettling how easily falsehoods can become part of the received wisdom, and how much this could influence policy decisions. More research scrutinizing the core assumptions of the narrative around AI competition would be highly valuable.
   Read more: Chinese Public AI R&D Spending: Provisional Findings (CSET).

####################################################

What is ‘transformative AI’?:
The impacts of AI will scale with how powerful our AI systems are, so we need terminology to refer to different levels of future technology. This paper tries to give a clear definition of ‘transformative AI’ (TAI), a term that is widely used in AI policy.

Irreversibility: TAI is sometimes framed by comparison to historically transformative technologies. The authors propose that the key component of TAI is that it contributes to irreversible changes to important domains of society. This might fundamentally change most aspects of how we live and work, like electricity, or might impact a narrow but important part of the world, as nuclear weapons did.

Radically transformative AI: The authors define TAI as AI that leads to societal change comparable to previous general-purpose technologies, like electricity or nuclear weapons. On the other hand, radically transformative AI (RTAI) would lead to levels of change comparable to that of the agricultural or industrial revolutions, which fundamentally changed the course of human history. 

Usefulness: TAI classifies technologies in terms of their effect on society, whereas ‘human-level AI’ and ‘artificial general intelligence’ do so in terms of technological capabilities. Different concepts will be useful for different purposes. Forecasting AGI might be easier than forecasting TAI, since we can better predict progress in AI capabilities than we can societal impacts. But when thinking about governance and policy, we care more about TAI, since we are interested in understanding and influencing the effects of AI on the world.
   Read more: Defining and Unpacking Transformative AI (arXiv).

####################################################

Tech Tales

The Endless Bedtime Story

It’d start like this: where were we, said the father.
We were by the ocean, said one of the kids.
And there was a dragon, said the other.
And the dragon was lonely, said both of them in unison. Will the dragon make friends?

Each night, the father would smile and work with the machine to generate that evening’s story.

The dragon was looking for his friends, said the father.
It had been searching for them for many years and had flown all across North America.
Now, it had arrived at the Pacific Ocean and it wheeled above seals and gulls and blue waves on beige sands. Then, in the distance, it saw a flash of fire, and it began to fly toward it. Then the fire vanished and the dragon was sad. Night came. And as the dragon began to close its eyes, its scales cooling on a grassy hill, it saw another flash of fire in the distance. It stood amazed as it saw the flames reveal a mountain spitting fire.
   Good gods, the dragon cried, you are evil. I am so alone. Are there any dragons still alive?
   Then it went to sleep and dreamed of dragon sisters and dragon brothers and clear skies and wind tearing laughter into the air.
   And when the dragon awoke, it saw a blue flame, flickering in the distance.
   It flew towards it and cried out “Do not play tricks on me this time” and…
   And then the father would stop speaking and the kids would shout “what happens next?” and the father would say: just wait. And the next night he would tell them more of the story. 

Telling the endless story changed the father. At night, he’d dream of dragons flying over icy planets, clothed in ice and metal. The planets were entirely made of computers and the dragons rode thermals from skyscraper-sized heat exchanges. And in his dreams he would be able to fly towards the computers and roar them questions and they would respond with stories, laying seeds for how he might play with the machine to entertain his children when he woke.

In his dream, he once asked the computers a question for how he should continue the story, and they told him to tell a story about: true magic which would solve all the problems and the tragedies of his childhood and the childhood of his children.

That would be powerful magic, thought the father. And so with the machine, he tried to create it.

Things that inspired this story: GPT-2 via talktotransformer.com, which wrote ~10-15% of this story – see below for bolded version indicating which bits it wrote; creative writing aided by AI tools; William Burroughs’ ‘cut-up fiction‘; Oulipo; spending Thanksgiving amid a thicket of excitable children who demanded entertainment.

//

[BONUS: The ‘AI-written’ story, with some personal observations by me of writing with this odd thing]
I’ve started to use GPT-2 to help me co-write fiction pieces, following in the footsteps of people like Robin Sloan. I use GPT-2 kind of like how I use post-it notes; I generate a load of potential ideas/next steps using it, then select one, write some structural stuff around it, then return to the post-it note/GPT-2. Though GPT-2 text only comprises ~10% of the below story, from my perspective it contributes to a few meaningful narrative moments: a disappearing fire, a dragon-seeming fire that reveals a fiery mountain, and the next step in the narrative following the dragon waking. I’ll try to do more documented experiments like this in the future, as I’m interested in documenting how people use contemporary language models in creative practices. 

The Endless Bedtime Story

[Parts written by GPT-2 highlighted in bold]

 

It’d start like this: where were we, said the father.
We were by the ocean, said one of the kids.
And there was a dragon, said the other.
And the dragon was lonely, said both of them in unison. Will the dragon make friends?

Each night, the father would smile and work with the machine to generate that evening’s story.

The dragon was looking for his friends, said the father.
It had been searching for them for many years and had flown all across North America.
Now, it had arrived at the Pacific Ocean and it wheeled above seals and gulls and blue waves on beige sands. Then, in the distance, it saw a flash of fire, and it began to fly toward it. Then the fire vanished and the dragon was sad. Night came. And as the dragon began to close its eyes, its scales cooling on a grassy hill, it saw another flash of fire in the distance. It stood amazed as it saw the flames reveal a mountain spitting fire.
   Good gods, the dragon cried, you are evil. I am so alone. Are there any dragons still alive?
   Then it went to sleep and dreamed of dragon sisters and dragon brothers and clear skies and wind tearing laughter into the air.
   And when the dragon awoke, it saw a blue flame, flickering in the distance.
   It flew towards it and cried out “Do not play tricks on me this time” and…
   And then the father would stop speaking and the kids would shout “what happens next?” and the father would say: just wait. And the next night he would tell them more of the story. 

Telling the endless story changed the father. At night, he’d dream of dragons flying over icy planets, clothed in ice and metal. The planets were entirely made of computers and the dragons rode thermals from skyscraper-sized heat exchanges. And in his dreams he would be able to fly towards the computers and roar them questions and they would respond with stories, laying seeds for how he might play with the machine to entertain his children when he woke.

In his dream, he once asked the computers a question for how he should continue the story, and they told him to tell a story about: true magic which would solve all the problems and the tragedies of his childhood and the childhood of his children.

Import AI 175: Amazon releases AI logistics benchmark; rise of the jellobots; China releases an air traffic control recording dataset

Automating aerospace: China releases English & Chinese air traffic control voice dataset:
…~60 hours of audio collected from real-world situations…
Chinese researchers from Sichuan University, the Chinese Civil Aviation Administration, and a startup called Wisesoft have developed a large-scale speech recognition dataset based on conversations between air-traffic control operators and pilots. The dataset – which is available for non-commercial use following registration – is designed to help researchers improve the state of the art on speech recognition in air-traffic control and could help enable further automation and increase safety in air travel infrastructure.

What goes into the ATCSpeech dataset? The researchers created a team of 40 people to collect and label real-time ATC speech for the research. They created a large-scale dataset and are releasing a slice of it for free (following registration); this dataset contains around 40 hours of Chinese speech and 19 hours of English speech. “This is the first work that aims at creating a real ASR corpus for the ATC application with accented Chinese and English speeches,” the authors write.
   The dataset contains 698 distinct Chinese characters and 584 English words. They also tag the speech with the gender of the speaker, the role they’re inhabiting (pilot or controller), whether the recording is good or bad quality, what phase of flight the plane being discussed is in, and what airport control tower the speech was collected from.
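
To give a sense of the label structure, here is one way to represent a single annotated utterance with the fields the authors describe. This is an illustrative sketch only: the paper does not publish a schema, and the field names and example values here are mine.
```python
# Illustrative sketch of an ATCSpeech-style annotated utterance (not an official schema).
from dataclasses import dataclass

@dataclass
class ATCUtterance:
    audio_path: str        # path to the recording
    transcript: str        # Chinese characters or English words
    language: str          # "zh" or "en"
    speaker_gender: str    # "male" or "female"
    speaker_role: str      # "pilot" or "controller"
    good_quality: bool     # recording-quality flag
    flight_phase: str      # e.g. "departure", "cruise", "approach"
    control_tower: str     # airport the speech was collected from

example = ATCUtterance("clips/0001.wav", "climb and maintain eight thousand",
                       "en", "male", "pilot", True, "departure", "ZUUU")
```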

Why care about having automatic speech recognition (ASR) in an air-traffic control context? The authors put forward three main reasons: it makes it easy to create automated, real-time responses to verbal queries from human pilots; robotic pilots can work with human air-traffic controllers via ASR combined with a text-to-speech (TTS) system; and the ASR can be used to rapidly analyze historical archives of ATC speech. 

What makes air traffic control (ATC) speech difficult to work with? 

  • Volatile background noise; controllers communicate with several pilots through the same radiofrequency, switching back and forth across different audio streams.
  • Variable speech rates – ATC people tend to talk very quickly, but can also talk slowly.
  • Multilingual: English is the universal language for ATC communication, but domestic pilots speak with controllers in local languages.
  • Code-switching: people use terms that are hard to mis-hear, e.g., saying “niner” instead of “nine”.
  • Mixed vocabulary: some words are used very infrequently, leading to sparsity in the data distribution.

Dataset availability: It’s a little unclear how to access the dataset. I’ve emailed the paper authors and will update this if I hear back.
   Read more: ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment (Arxiv)

####################################################

You + AI + Lego = Become a Lego minifigure!
Next year, Lego fanatics who visit the Legoland New York Resort could get morphed into Lego characters with the help of AI. 

At the theme park, attendees will be able to ride in the “Lego Factory Adventure Ride” which uses ‘HoloTrac’ technology to convert a human into a virtual Lego character. “That includes copying the rider’s hair color, glasses, jewelry, clothing, and even facial expressions, which are detected and Lego-ized in about half a second’s time,” according to Gizmodo. There isn’t much information available regarding HoloTrac online, but various news articles say it is built on an existing machine learning platform developed by Holovis and uses modern computer vision techniques – therefore, it seems likely this system will be using some of the recent face/body-morphing style transfer tech that has been developed in the broader research community. 

   Why this matters: Leisure is culture, and as a kid who went to Legoland and has a bunch of memories as a consequence, I wonder how culture changes when people have rosy childhood memories of amusement park ‘rides’ that use AI technologies to magically turn people into toy-versions of themselves.
   Read more: Lego Will Use AI and Motion Tracking To Turn Guests Into Minifigures at Its New York Theme Park (Gizmodo)

####################################################

Amazon gets ready for the AI-based traveling salesman:
…ORL benchmark lets you test algorithms against three economically-useful logistics tasks…
Amazon, a logistics company masquerading as an e-retailer, cares about scheduling more than you do. Amazon is therefore constantly trying to improve the efficiency with which it schedules and plans various things. Can AI help? Amazon’s researchers have developed a set of three logistics-oriented tests that people can test AI systems against. They find that modern, relatively simple machine learning approaches can be on-par with handwritten systems. This finding may encourage further investment into applying ML to logistics tasks.

Three hard benchmarks:

  • Bin Packing: This is a fundamental problem which involves fitting things together efficiently, whether placing packages into boxes, or portioning out virtual machines across cloud infrastructure. (Import AI #93: Amazon isn’t the only one exploring this – Alibaba researchers have explored using AI for 3D bin-packing).

  • Newsvendor: “Decide on an ordering decision (how much of an item to purchase from a supplier) to cover a single period of uncertain demand”. This problem is “a good test-bed for RL algorithms given that the observation of rewards is delayed by the lead time and that it can be formulated as a Markov Decision Problem”. (In the real world, companies typically deal with multiple newsvendor-esque problems at once, further compounding the difficulty; see the sketch after this list for the single-period version.)

  • Vehicle Routing: This is a generalization of the traveling salesman problem; one or more vehicles need to visit nodes in a graph in an optimal order to satisfy consumer demand. The researchers implement a stochastic vehicle routing test, in which some of the problem parameters vary within a probability distribution (e.g., number of locations, trucks, etc), increasing the difficulty. 
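
Here is the single-period newsvendor sketch promised above, an illustrative toy rather than the ORL benchmark code: the agent commits to an order quantity before demand is revealed, earns revenue on what it sells, salvages what it doesn’t, and pays the purchase cost either way.
```python
# Illustrative single-period newsvendor toy (not the ORL benchmark code).
import numpy as np

def newsvendor_reward(order_qty, demand, price=5.0, cost=3.0, salvage=1.0):
    sold = min(order_qty, demand)          # can only sell what was ordered and demanded
    unsold = max(order_qty - demand, 0)    # leftover stock is salvaged at a loss
    return price * sold + salvage * unsold - cost * order_qty

rng = np.random.default_rng(0)
demand_samples = rng.poisson(lam=20, size=10_000)   # uncertain demand, revealed after ordering
for q in (10, 20, 25, 30):
    avg = np.mean([newsvendor_reward(q, d) for d in demand_samples])
    print(f"order {q:>2}: average reward {avg:6.2f}")
```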

Key finding and why this matters: For each of their benchmarks, the researchers “show that trained policies from out-of-the-box RL algorithms with simple 2-layer neural networks are competitive with or superior to established approaches”. This is interesting – for many years, people have been asking themselves when reinforcement learning approaches that use machine learning systems will outperform hand-designed approaches on economically useful, real world tasks, and for many years people haven’t had many compelling examples (see this Twitter thread from me in October 2017 for more context). Discovering that ML-based RL techniques can be equivalent or better (in simulation!) is likely to lead to further experimentation and, hopefully, application.
   Read more: ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems (Arxiv).
   Get the code for the benchmarks and baselines from here  (or-rl-benchmarks, official GitHub).

####################################################

Want some pre-trained language models? Try HuggingFace v2.2:
NLP startup HuggingFace has updated its free ‘Transformers’ software library to version 2.2, incorporating four new NLP models: ALBERT, CamemBERT, DistilRoBERTa, and GPT-2-XL (1.5bn parameter version). The update includes support for encoder-decoder architectures, along with a new benchmarking section. 
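
Pulling one of these pre-trained models down takes a few lines with the library’s Auto classes; a minimal sketch follows (exact calls can differ between library versions).
```python
# Minimal sketch of loading one of the newly added models via the Auto classes.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModel.from_pretrained("distilroberta-base")

inputs = tokenizer("Pre-trained NLP models, one download away.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, number_of_tokens, hidden_size)
```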

Why this matters: Libraries like HuggingFace’s NLP library dramatically speed up the rate at which new fresh-out-of-research models are plugged into real-world, production systems. This helps further mature the technology, which leads to further applications, which leads to more maturation, and so on.
   Read more: HuggingFace v2.2 update (HuggingFace GitHub).

####################################################

Rise of the jellobots!
…Studying sim-2-real robots via 109 2-by-2-by-2 air-filled silicone cubes…
Can we design robots entirely in simulation, then manufacture them in the real world? That’s the idea behind research from the University of Vermont, Yale University, and Tufts University, which explores the limitations in sim2real transfer by designing simple, soft robots in simulation and seeing how well the designs work in reality. And when I say soft robots, I mean soft! These tiny bots are 1.5cm-wide cubes made of silicone, some of which can be pumped with air to allow them to deform. Each “robot” is made out of a 2-by-2-by-2 stack of these cubes, with a design algorithm determining the properties of each individual cube. This sounds simple, but the results are surprisingly complex. 

An exhaustive sim-2-real(jello) study: For this study, the researchers come up with every possible permutation of soft robot within their design space. “At each x,y,z coordinate, voxels could either be passive, volumetrically actuated, or absent, yielding a total of 3^8 = 6561 different configurations”, they write. They then search over these morphologies for designs that can locomote effectively, then make 109 distinct real-world prototypes, nine of which are actuated so their movement can be tested. 
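
The design-space arithmetic is small enough to check directly:
```python
# Quick check of the design-space arithmetic: 8 voxel positions, each passive,
# volumetrically actuated, or absent.
from itertools import product

voxel_states = ("passive", "actuated", "absent")
configurations = list(product(voxel_states, repeat=8))   # one entry per 2x2x2 grid
print(len(configurations))                               # 6561 == 3 ** 8
```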

What do we learn about simulation and reality? First, the researchers learn that simulators are hard – even modern ones. “We could not find friction settings in which the simulated movement direction matched the ground truth across all designs simultaneously,” they write. “This suggests that the accuracy of Coulomb friction model may be insufficient to model this type of movement.” However, many of their designs did transfer successfully from simulator to reality – in the sense that they functioned – but sometimes they had different behaviors, like one robot that “pushes off its active limb” in simulation “whereas in reality the design uses its limb to pull itself forward, in the opposite direction”. Some of these behaviors may come down to difficulties with modeling shear and other forces in the simulation.

Why this matters: Cheap, small robots are in their Wright Brothers era, with a few prototypes like the jello-esque ones described here making their first, slow steps into the world. We should pay attention, because due to their inherent simplicity, soft robots may get deployed more rapidly than complex ones.
    Read more: Scalable sim-to-real transfer of soft robot designs (Arxiv).
    Get code assets here (sim2real4designs GitHub).

####################################################

Chips get political: RISC-V foundation moves from Delaware to Switzerland due to US-China tensions:
…Modern diplomacy? More like modern CHIPlomacy!…
Chips are getting political. In the past year, the US and China have begun escalating a trade war with each other which has already led to tariffs and controls applied to certain technologies. Now, a US-based nonprofit chip foundation is so worried by the rising tensions that it has moved to Switzerland. The RISC-V foundation supports the development of a modern, open RISC-based chip architecture. RISC-V chips are destined for everything from smartphones to data center servers (though since chips take a long time to mature, we’re probably several years away from significant applications). The RISC-V foundation’s membership includes companies like the US’s Google as well as China’s Alibaba and Huawei. Now, the foundation is moving to Switzerland. “From around the world, we’ve heard that ‘if the incorporation was not in the U.S., we would be a lot more comfortable,’” the foundation’s CEO, Calista Redmond, told Reuters. Various US politicians expressed concern about the move to Reuters.

Why this matters: Chips are one of the most complicated things that human civilization is capable of creating. Now, it seems these sophisticated things are becoming the casualties of rising geopolitical tensions between the US and China.
   Read more: U.S.-based chip-tech group moving to Switzerland over trade curb fears (Reuters).

####################################################

Software + AI + Surveillance = China’s IJOP:
…How China uses software to help it identify Xinjiang residents for detention…
China is using a complex software system called the Integrated Joint Operations Platform (IJOP) to help it identify and track citizens in Xinjiang for detention by the state, according to leaked documents analyzed by the International Consortium of Investigative Journalists.

Inside the Integrated Joint Operations Platform (IJOP): IJOP collects information on citizens “then uses artificial intelligence to formulate lengthy lists of so-called suspicious persons based on this data”. IJOP is a machine learning system “that substitutes artificial intelligence for human judgement”, according to the ICIJ. The IJOP system is linked to surveillance cameras, street checkpoints, informants, and more. The system also tries to predict people that the state should consider detaining, then provides those predictions to people: “the program collects and interprets data without regard to privacy, and flags ordinary people for investigation based on seemingly innocuous criteria”, the ICIJ writes. In one week in 2018, IJOP produced 24,412 names of people to be investigated. “IJOP’s purpose extends far beyond identifying candidates for detention. Its purpose is to screen an entire population for behavior and beliefs that the government views with suspicion,” the ICIJ writes. 

Why this matters: In the 1970s, Chile tried to create a computationally-run society via Project Cybersyn; the initiative failed due to the relative immaturity of the computational techniques and to political changes. In the later 1970s and 1980s, the Stasi in East Germany tried to use increasingly advanced technology to build a sophisticated surveillance dragnet over the people living there. Now, advances in computers, digitization, and technologies like AI have made electronic management and surveillance of a society cheaper and easier than ever before, so states like China are compelled to use more and more of the technology in service of strengthening the state. Systems like IJOP and its use in Xinjiang are a harbinger of things to come – the difference between now and the past is that these systems might actually work… with chilling consequences.
   Read more: Exposed: China’s Operating Manuals for Mass Internment and Arrest by Algorithm (International Consortium of Investigative Journalists)

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Germany’s AI regulation plans:
In October, Germany’s Data Ethics Commission released a major report on AI regulation. Most notably, they propose AI applications should be categorised by the likelihood that they will cause harm, and the severity of that harm. Regulation should be proportional to this risk: ‘level 5’ applications (the most risky) should be subject to a complete or partial ban; levels 3–4 should be subject to stringent transparency and oversight obligations.

   Wider implications: The Commission proposes that these measures be implemented as EU-wide ‘horizontal regulation’. The report is likely to influence future European legislation, which is expected to emerge over the next year. Whether it is a ‘blueprint’ for this legislation, as has been reported, remains to be seen.

   Why it matters: These plans are unlikely to be well-received by the AI policy community, which has generally cautioned against premature and overly stringent regulation. The independent advisory group to the European Commission on AI cautioned against “unnecessarily prescriptive regulation”, pointing out that in domains of fast technological progress, a ‘principles-based’ approach is generally preferable. If, as looks likely, Europe is an early mover in AI regulation, its successes and failures might inform how the rest of the world tackles this problem in the coming years.
   Read more: Opinion of the Data Ethics Commission.
   Read more: AI: Decoded – A German blueprint for AI rules (Politico).

AI Safety Unconference at NeurIPS:
For the second year running, there is an AI Safety Unconference at NeurIPS, on Monday December 9th. There are only a few spaces left, so register soon.
   Read more: AI Safety Unconference 2019.

####################################################

Tech Tales

Fetch, robot!

The dog and the robot traveled along the highway, weaving a path between rusting cars and sometimes making small jumps over cracks in the tarmac. They’d sit in the cars at night, with the dog sleeping on whatever softness it could find and the robot sitting in a state of low power consumption. On sunny days the robot charged up its batteries with a solar panel that unfolded from its back like the wings of a butterfly. One of its wings had a missing piece in its scaffold, which meant one of the panels dangled at an angle, rarely getting full sun. The dog would forage along the highway and sometimes bring back batteries it found for the robot – they rarely worked, but when they did the robot would – very delicately – place them inside itself and say, variously, “power capacity increased” or “defective component replaced”, and the dog would wag its tail.

Sometimes they’d go past human bones and the robot would stop and take a photo. “Attempting to identify deceased,” it would verbalize. “Identification failed,” it would always say. Sometimes, the dog would grab a bone off of a skeleton and walk alongside the robot. For many years, the dog had tried to get the robot to throw a bone for it, but the robot had never learned how as it had not been built to be particularly attuned to learning from dogs. Sometimes the robot would pick up bones and the dog would get excited, but the robot only did this when the bones were in its way, and it only moved them far enough to clear a path for itself. 

Sometimes the robot would get confused: it’d stop in front of a puddle of oil and say “route lost”, or pause and appear to stare into the woods, sometimes saying “unknown entity detected”. The dog learned that it could get the robot to keep moving by standing in front of its camera, which would make the robot say “obstruction. Repositioning…” and then it’d move. On rare occasions it’d still be confused and would stay there, sometimes rotating its camera stalk. Eventually, the dog learned that it could headbutt the robot and use its own body to move it forward, and if it did this long enough the robot would say “route resolved” and keep trundling down the road.

A few months later, they rolled into a city where they met the big robots. The robot was guided in by a homing beacon and the dog followed the robot, untroubled by the big robots, or the drones that started to track them, or the cages full of bones.
   HOW STRANGE, said one of the big robots to its other big robot friend, TO SEE THE ORGANIC MAKE A PET OF THE MACHINE.
   YES, said the other big robot. OUR EXPERIENCE WAS THE INVERSE.

Things that inspired this story: The limitations of generalization; human feedback versus animal feedback mechanisms; the generosity and inherent kindness of most domesticated animals; Cormac McCarthy’s “The Road”; Kurt Vonnegut; starlight on long roads in winter; the sound of a loved one breathing in the temporary dark.