Import AI

Import AI: 122: Google obtains new ImageNet state-of-the-art with GPipe; drone learns to land more effectively than PD controller policy; and Facebook releases its ‘CherryPi’ StarCraft bot

Google obtains new ImageNet state-of-the-art accuracy with mammoth networks trained via ‘GPipe’ infrastructure:
…If you want to industrialize AI, you need to build infrastructure like GPipe…
Parameter growth = Performance Growth: The researchers note that the winner of the 2014 ImageNet competition had 4 million parameters in its model, while the winner of the 2017 challenge had 145.8 million parameters – a 36X increase in three years. GPipe, by comparison, can support models of up to almost 2-billion parameters across 8 accelerators.
  Pipeline parallelism via GPipe: GPipe is a distributed ML library that uses synchronous mini-batch gradient descent for training. It is designed to spread workloads across heterogeneous hardware systems (multiple types of chips) and comes with a bunch of inbuilt features which let it efficiently scale up model training, with the researchers reporting a (very rare) near-linear speedup: “with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks [with GPipe]” they write.
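  A sketch of pipeline parallelism: To make the idea concrete, here is a minimal, purely illustrative Python/PyTorch sketch of the data flow in GPipe-style pipeline parallelism – a model split into stages on different devices, with each mini-batch divided into micro-batches. This is not Google's implementation: the real system overlaps micro-batches across stages, re-materializes activations to save memory, and runs on accelerators rather than the CPU stand-ins used here.

```python
import torch
import torch.nn as nn

def run_pipeline(stages, devices, batch, n_micro_batches):
    """Forward a mini-batch through model stages as a series of micro-batches."""
    micro_batches = batch.chunk(n_micro_batches)
    outputs = []
    for mb in micro_batches:
        x = mb
        for stage, device in zip(stages, devices):
            x = stage(x.to(device))  # each stage lives on its own accelerator
        outputs.append(x)
    # In GPipe, gradients from all micro-batches are accumulated and applied in one
    # synchronous update, keeping training equivalent to ordinary mini-batch SGD.
    return torch.cat([o.to(devices[-1]) for o in outputs])

# Hypothetical usage: a toy 4-stage model spread across 4 (stand-in) devices.
devices = ["cpu"] * 4
stages = [nn.Sequential(nn.Linear(128, 128), nn.ReLU()).to(d) for d in devices]
out = run_pipeline(stages, devices, torch.randn(32, 128), n_micro_batches=8)
```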
  Results: To test how effective GPipe is, the researchers trained ResNet and AmoebaNet (the previous ImageNet state-of-the-art) networks with it, running the experiments on TPU-V2s, each of which has 8 accelerator cores and an aggregate memory of 64GB. Using this technique they were able to train a new ImageNet system with a state-of-the-art Top-1 accuracy of 84.3% (up from 82.7%) and a Top-5 accuracy of 97%.
  Why it matters: “Our work validates the hypothesis that bigger models and more computation would lead to higher model quality,” write the researchers. This trend of research bifurcating into large-compute and small-compute domains has significant ramifications for the ability of smaller entities (for instance, startups) to effectively compete with organizations with access to large computational infrastructure (eg, Google). A more troubling effect with long-term negative consequences is that at these compute scales it is almost impossible for academia to do research at the same scale as corporate research entities. I continue to worry that this will lead to a splitting of the AI research community and potentially the creation of the sort of factionalism and ‘us vs them’ attitude seen elsewhere in contemporary life.
Companies will seek to ameliorate this inequality of compute by releasing the artifacts of compute (eg, pre-trained models). Though this will go some way to empowering researchers it will fail to deal with the underlying problems which are systemic and likely require a policy solution (aka, more money for academia, and so on).
    Read more: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Arxiv).

Neural net beats tuned PD controller at tricky drone landing task:
…The future of drones: many neural modules…
A recent trend in AI research has been work showing that deep learning-based techniques can outperform hand-crafted rule-based systems in domains as varied as image recognition, speech recognition, and even the design of neural network architectures. Now, researchers with Caltech, Northeastern University, and the University of California at Irvine have shown that it is possible to use neural networks to learn how to land quadcopters with greater accuracy than a PD (proportional derivative) controller.
  Neural Lander: The researchers call their system the ‘Neural Lander’ and say it is designed “to improve the precision of quadrotor landing with guaranteed stability. Our approach directly learns the ground effect on coupled unsteady aerodynamics and vehicular dynamics…We evaluate Neural-Lander for trajectory tracking of quadrotor during take-off, landing and near ground maneuvers. Neural-Lander is able to land a quadrotor much more accurately than a naive PD controller with a pre-identified system.”
  Testing: The researchers evaluate their approach on a real world system and show that “compared to the PD controller, Neural-Lander can decrease error in z direction from 0.13m to zero, and mitigate average x and y drifts by 90% and 34% respectively, in 1D landing. Meanwhile, Neural-Lander can decrease z error from 0.12m to zero, in 3D landing. We also empirically show that the DNN generalizes well to new test inputs outside the training domain.”
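  What this looks like in code: For readers unfamiliar with the baseline being beaten, here is a toy Python sketch of a PD altitude controller plus a learned residual correction for near-ground aerodynamics – the general shape of the Neural-Lander idea, not the authors' implementation (the gains, state variables, and the stand-in ‘model’ are assumptions for illustration).

```python
import numpy as np

def pd_altitude_control(z, z_dot, z_target, kp=2.0, kd=1.0):
    """Classic PD baseline: thrust command from position and velocity error."""
    return kp * (z_target - z) + kd * (0.0 - z_dot)

def neural_lander_style_control(z, z_dot, z_target, residual_model):
    """PD term plus a learned correction for ground-effect forces.
    residual_model stands in for a network trained on real flight data."""
    u_nominal = pd_altitude_control(z, z_dot, z_target)
    f_hat = residual_model(np.array([z, z_dot]))  # predicted aerodynamic disturbance
    return u_nominal - f_hat                      # cancel the predicted disturbance

# Toy usage: a hand-made stand-in for the learned ground-effect model.
fake_model = lambda state: 0.3 * np.exp(-4.0 * max(state[0], 0.0))
u = neural_lander_style_control(z=0.2, z_dot=-0.5, z_target=0.0, residual_model=fake_model)
```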
Why it matters: Systems like this not only show the broad utility of AI systems for diverse tasks, but also highlight how researchers are beginning to think about meshing learnable modules into products. It’s likely of interest that one of the sponsors of this research was defense contractor Raytheon. As with the vast majority of academic research, it’s almost certain Raytheon had no particular role or input into this specific work, but has instead decided to broadly fund research into drone autonomy – nonetheless, this indicates where major defense contractors think the future lies.
  Read more: Neural Lander: Stable Drone Landing Control using Learned Dynamics (Arxiv).
Watch videos of the Neural Lander in action (YouTube).

AI Research Group MIRI plans future insights to be “nondisclosed-by-default”:
…Research organization says recent progress, a desire to concentrate, and worries that its safety research will be increasingly related to capabilities research mean it should go private…
Nate Soares, the executive director of AI research group MIRI, says the organization “recently decided to make most of its research ‘nondisclosed-by-default’, by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results”.
  MIRI is doing this because it thinks it can increase the pace of its research if it focuses on making research progress “rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences”, and because it is worried that some of its new research paths could yield “capabilities insights” which thereby speed the arrival of (in its view, unsafe-by-default) AGI. It also sees some merit to deliberate isolation, based on an observation that “historically, early-stage scientific work has often been done by people who were solitary or geographically isolated”.
  Why going quiet could be dangerous: MIRI acknowledges some of the potential risks of this approach, noting that it may make it more difficult to hire and evaluate researchers; make it harder to get useful feedback on its ideas from other people around the world; increase the difficulty of obtaining funding; and lead to various “social costs and logistical overhead” from keeping research private.
“Many of us are somewhat alarmed by the speed of recent machine learning progress”, Soares writes. That’s combined with the fact MIRI believes it is highly likely people will successfully develop artificial general intelligence at some point, whether or not the safety problems have been worked out. “Humanity doesn’t need coherent versions of [AI safety/alignment] concepts to hill-climb its way to AGI,” Soares writes. “Evolution hill-climbed that distance, and evolution had no model of what it was doing”.
  Money: MIRI benefited from the cryptocurrency boom in 2017, receiving millions of dollars in donations from people who had made money on the spike in Ethereum. It has subsequently gained further funding, so – having surpassed many of its initial fundraising goals – is able to plan for the long term.
  Secrecy is not so crazy: Many AI researchers are privately contemplating when and if certain bits of research should be taken private. This is driven by a combination of near-term concerns (AI’s dual use nature means people can easily repurpose an innovation made for one purpose to do something else), and longer-term worries around the potential development of powerful and unsafe systems. In OpenAI’s Charter, published in April this year, the company said “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research”.
  Read more: 2018 Update: Our New Research Directions (MIRI).
  Read more: OpenAI Charter (OpenAI Blog).

Hand-written bots beat deep learning bots in annual StarCraft: Brood War competition:
…A team of researchers from Samsung has won the annual AIIDE StarCraft competition…
Team Samsung SDS AI and Data Analytics (SAIDA) has won the annual StarCraft: Brood War tournament using a bot based on hand-written rules, beating out bots from other teams including Facebook, Stanford University, and Locutus. The win is significant for a couple of reasons: 1) the bots have massively improved compared to the previous year, and 2) a bot from Facebook (CherryPi) came relatively close to beating the hand-written SAIDA bot.
  Human supremacy? Not for long: “Members of the SAIDA team told me that they believe pro StarCraft players will be beaten less than a year from now,” wrote competition co-organizer Dave Churchill on Facebook. “I told them it was a fairly bold claim but they reassured me that was their viewpoint.”
  Why StarCraft matters: StarCraft is a complex, partially observable strategy game that involves a combination of long-term strategic decisions oriented around building an economy, traversing a tech tree, and building an army, and short-term decisions related to the micromanagement of specific units. Many AI researchers are using StarCraft (and its successor, StarCraft II) as a testbed for machine learning-based game playing systems.
  Read more here: AIIDE StarCraft Competition results page (AIIDE Conference).

Facebook gives details on CherryPi, its neural StarCraft: Brood War bot:
…Shows that you can use reinforcement learning and a game database to learn better build orders…
Researchers with Facebook AI Research have given details on some of the innards of their “CherryPi” bot, which recently competed and came second in the annual StarCraft competition held at AIIDE in Canada. Here, they focus on the challenge of teaching their bot to figure out which build order (out of a potential set of 25) it should pursue at any one point in time. This is challenging because StarCraft is partially observable – that is, the map has fog-of-war, and until the late stages of a game it’s unlikely any single player is going to have a good sense of what is going on with other players, so figuring out the correct unit selection relies on a player being able to model this unknowable aspect of the game and judge appropriate actions. “While it is possible to tackle hidden state estimation separately and to provide a model with these estimates, we instead opt to perform estimation as an auxiliary prediction task alongside the default training objective,” they write.
  Method: The specific way the researchers get their system to work is by using an LSTM with 2048 cells (the same component was used by OpenAI in its ‘OpenAI Five’ Dota 2 system), training this on a Facebook-assembled dataset of 2.8 million games containing 3.3 million switches between build orders. They evaluate two variants of this system: visible, which counts units currently visible, and memory, which uses hard-coded rules to keep track of enemy units that were seen before but are currently hidden.
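  A sketch of the architecture: Here is an illustrative PyTorch sketch of the general shape of such a model – an LSTM over game observations with one head choosing among build orders and an auxiliary head predicting hidden enemy unit counts. Apart from the 2048-cell LSTM and the 25 build orders, the sizes are placeholders rather than details from the paper.

```python
import torch
import torch.nn as nn

class BuildOrderSwitcher(nn.Module):
    """LSTM policy that picks among build orders while also predicting hidden
    enemy unit counts as an auxiliary task (illustrative, not Facebook's code)."""
    def __init__(self, obs_dim=512, hidden=2048, n_build_orders=25, n_unit_types=100):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.build_order_head = nn.Linear(hidden, n_build_orders)
        self.hidden_units_head = nn.Linear(hidden, n_unit_types)  # auxiliary target

    def forward(self, obs_seq):                    # obs_seq: [batch, time, obs_dim]
        h, _ = self.lstm(obs_seq)
        return self.build_order_head(h), self.hidden_units_head(h)

model = BuildOrderSwitcher()
build_logits, unit_preds = model(torch.randn(4, 16, 512))  # 4 games, 16 time steps
# Training would combine a classification loss on build_logits with a loss on
# unit_preds for the hidden-state estimation auxiliary task.
```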
  Results: The researchers show that memory-based systems which perform hidden state estimation as an auxiliary task obtain superior scores to visible systems, and to systems trained without the auxiliary loss. These systems are able to obtain win rates of as high as 88% against inbuilt bots, and 60% to 70% against Locutus and McRave bots (the #5 and #8 ranked bots @ the AIIDE competition this year).
  Why it matters: If we zoom out from StarCraft and consider the problem domain it represents (partially observable environments where a player needs to make strategic decisions without full information) it’s clear that the growing applicability of learning approaches will have impacts on competitive scenarios in fields like logistics, supply chain management, and war. But these techniques still require an overwhelmingly large amount of data to be viable, suggesting that if people don’t have access to a simulator it’s going to be difficult to apply such systems.
  Read more: High-Level Strategy Selection under Partial Observability in StarCraft: Brood War (Arxiv).

Facebook releases TorchCraftAI, its StarCraft AI development platform:
…Open source release includes CherryPi bot from AIIDE, as well as tutorials, support for Linux, and more…
Facebook has also released TorchCraftAI, the platform it has used to develop CherryPi. TorchCraftAI includes “a modular framework for building StarCraft agents, where modules can be hacked with, replaced by other modules, or by ML/RL-trained models”, as well as tutorials, CherryPi, and support for TCP communication.
  Read more: Hello, Github (TorchCraftAI site).
  Get the code: TorchCraftAI (GitHub).

Training AI systems to spot people in disguise:
…Prototype research shows that deep learning systems can spot people in disguise, but more data from more realistic environments needed…
Researchers with IIIT-Delhi, IBM TJ Watson Research Center, and the University of Maryland have created a large-scale dataset, Disguised Faces in the Wild (DFW), which they say can potentially be used to train AI systems to identify people attempting to disguise themselves as someone else.
  DFW: The DFW dataset contains 11,157 pictures across 1,000 distinct human subjects. Each human subject is paired with pictures of them, as well as pictures of them in disguise, and pictures of impersonators (people that either intentionally or unintentionally bear a visual similarity to the subject). DFW is also pre-split into ‘easy’, ‘medium’, and ‘hard’ subsets, with the segmenting done according to the success rate of three baseline algorithms at correctly identifying the right faces.
  Can a neural network identify a disguised face in the wild? The researchers hosted a competition at CVPR 2018 to see which team could devise the best system for deciding whether a given face image shows the genuine person or an impersonator. They evaluate systems on two metrics: their Genuine Acceptance Rate (GAR) at 1% False Acceptance Rate (FAR), and the far harder 0.1% FAR. Top-scoring systems obtain scores of as high as 96.80% at 1% FAR and 57.64% at 0.1% FAR on the relatively easy task of telling true faces from impersonated faces, and scores of 87.82% at 1% FAR and 77.06% at 0.1% FAR at the more challenging task of dealing with deliberately obfuscated faces.
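  How the metric works: Genuine Acceptance Rate at a fixed False Acceptance Rate can be computed from raw similarity scores like this (an illustrative helper with made-up scores, not the benchmark's official evaluation code).

```python
import numpy as np

def gar_at_far(genuine_scores, impostor_scores, far=0.01):
    """Pick the similarity threshold that admits only `far` of impostor pairs,
    then measure what fraction of genuine pairs clear that threshold."""
    threshold = np.quantile(impostor_scores, 1.0 - far)
    return float(np.mean(np.asarray(genuine_scores) >= threshold))

# Toy usage with synthetic similarity scores in [0, 1]:
genuine = np.random.beta(5, 2, 10_000)   # genuine pairs skew towards high similarity
impostor = np.random.beta(2, 5, 10_000)  # impostor pairs skew towards low similarity
print(gar_at_far(genuine, impostor, far=0.01), gar_at_far(genuine, impostor, far=0.001))
```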
  Why it matters: I view this research as a kind of prototype showing the potential efficacy of deep learning algorithms at spotting people in disguise, but I’d want to see an analysis of algorithmic performance on a significantly larger dataset with greater real world characteristics – for instance, one involving tens of thousands of distinct humans in a variety of different lighting and environmental conditions, ideally captured via the sorts of CCTV cameras deployed in public spaces (given that this is where this sort of research is heading). Papers like this provide further evidence of the ways in which surveillance can be scaled up and automated via the use of deep learning approaches.
Read more: Recognizing Disguised Faces in the Wild (Arxiv).
  Get the data: The DFW dataset is available from the project website, though it requires people to sign a license and request a password to access the dataset. Get the data here (Image Analysis and Biometrics Lab @ IIIT Delhi).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

US moves closer to stronger export controls on emerging technology:
The US Department of Commerce has released plans for potential new, strengthened export controls on emerging technologies. Earlier this year, Congress authorized the Department to establish new measures amidst concerns that US export controls were increasingly dated. The proposal would broaden the scope of the existing oversight to include AI and related hardware, as well as technologies related to robotics, biotech and quantum computing. The Department hopes to determine how to implement such controls without negatively impacting US competitiveness. They will be soliciting feedback on the proposals for the next 3 weeks.
   Why it matters: This is yet more evidence of growing AI nationalism, as governments realize the importance of retaining control over advanced technologies. Equally, this can be seen as adapting long-standing measures to a new technological landscape. The effects on US tech firms, and international competition in AI, will likely only become clear once such measures, if they pass, start being enforced.
   Why it could be challenging: A lot of AI capabilities are embodied in software rather than hardware, making the technology significantly harder to apply controls to.
   Read more: Review of Controls for Certain Emerging Technologies (Federal Register).
   Read more: The US could regulate AI in the name of national security (Quartz)

UK outlines plans for AI ethics body:
The UK government has released its response to the public consultation on the new Centre for Data Ethics and Innovation. The Centre was announced last year as part of the UK’s AI strategy, and will be an independent body, advising government and regulators on how to “maximise the benefits of data and AI” for the UK. The document broadly reaffirms the Centre’s goals of identifying gaps in existing regulation in the UK, and playing a leading role in international conversations on the ethics of these new technologies. The Centre will release their first strategy document in spring 2019.
  Why it matters: The UK is positioning itself as a leader in the ethics of AI, and has a first-mover advantage in establishing this sort of body. The split focus between ethics and ‘innovation’ is odd, particularly given that the UK has established the Office for AI to oversee the UK’s industrial strategy in AI. Hopefully, the Centre can nonetheless be a valuable contributor to the international conversation on the ethics and governance of AI.
  Read more: Centre for Data Ethics and Innovation: Response to Consultation.

OpenAI Bits & Pieces:

Jack gets a promotion, tries to be helpful:
I’ve been promoted to Director of Policy for OpenAI. In this role I’ll be working with our various researchers to translate work at OpenAI into policy activities in a range of forums, both public and private. My essential goal is to help OpenAI achieve its mission of ensuring that powerful and highly capable AI systems benefit all of humanity.
  Feedback requested: For OpenAI I’m in particular going to be attempting to “push” certain policy ideas around core interests like building international AI measurement and analysis infrastructure, trying to deal with the challenges posed by the dual use nature of AI, and more. If you have any feedback on what you think we should be pursuing or how we should go about executing our goals, or have ideas for how you could help or introduce us to people that can help us, then please get in touch: jack@jack-clark.net

Tech Tales:

Money Tesseract

The new weather isn’t hot or cold or dry; the new weather is about money suddenly appearing and disappearing, spilling into our world from the AI financial markets.

It seemed like a good idea at the time: why not give the robots somewhere to trade with each other? By this point the AI-driven corporations were inventing products too rapidly for them to have their value reflected in the financial markets – they were just too fast, and the effects too weird; robot-driven corporate finance departments started placing incredibly complex multi-layered long/short contracts onto their corporate rivals, all predicated on AI-driven analysis of the new products, and so the companies found a new weapon to use to push each other around: speculative trading about each other’s futures.

So our solution was the “Fast Low-Assessment Speculative Corporate Futures Market” (FLASCFM) – what everyone calls the SpecMark – here, the machines can trade against each other via specially designated subsidiaries. Most of the products these subsidiaries make are virtual – they merely develop the product, put out a specification into the market, and then judge the success of the product on the actions of the other corporate robo-traders in the market. Very few products are made as a consequence, with the companies instead getting better and better at iterating through ideas more and more rapidly, forcing their competitors to invest more and more in the compute resources needed to model them in the market.

In this way, a kind of calm reigns. The vast cognitive reservoirs of the AI corporations are mostly allocated into the SpecMark. We think they enjoy this market, insofar as the robots can enjoy anything, because of its velocity combined with its pressure for novelty.

But every so often a product does make it all the way through: it survives the market and rises to the top of the vast multi-dimensional game of rock-paper-scissors-n-permutations being played by the corporate robotraders. And then the factories swing into gear, automatic marketing campaigns are launched, and that’s how we humans end up with the new things, the impossible things.

Weather now isn’t hot or cold or dry; weather now is a product: a cutlery set which melts at the exact moment you’ve finished your meal (no cleanup required – let those dishes just fade away!); a software package that lurks on your phone and listens to all the music you listen to then comes up with a perfect custom song for you; a drone that can be taught like a young dog to play fetch and follow basic orders; a set of headphones where if you wear them you can learn to hear anxiety in the tones of other people’s voices, making you a better negotiator.

We don’t know what hurricanes look like with this new weather. Yet.

Things that inspired this story: High-Frequency Trading; Flash Crash; GAN-generated products; reputational markets.

Import AI 121: Sony researchers make ultra-fast ImageNet training breakthrough; Berkeley researchers tackle StarCraft II with modular RL system; and Germany adds €3bn for AI research

Berkeley researchers take on StarCraft II with modular RL system:
…Self play + modular structure makes challenging game tractable…
Researchers with the University of California at Berkeley have shown how to use self-play to have AI agents learn to play real-time strategy game StarCraft II. “We propose a flexible modular architecture that shares the decision responsibilities among multiple independent modules, including worker management, build order, tactics, micromanagement, and scouting”, the researchers write. “We adopt an iterative training approach that first trains one module while others follow very simple scripted behaviors, and then replace the scripted component of another module with a neural network policy, which continues to train while the previously trained modules remain fixed”.
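  What iterative module training looks like: Here is a purely structural Python sketch of that training scheme – each module starts as a scripted policy and is swapped for a learned one, trained while the rest stay fixed. The module names come from the paper; the functions themselves are stand-ins, not the Berkeley code.

```python
MODULES = ["worker_management", "build_order", "tactics", "micromanagement", "scouting"]

def make_agent(policies):
    """policies: dict mapping module name -> callable(observation) -> module action."""
    def act(observation):
        return {name: policy(observation) for name, policy in policies.items()}
    return act

def iterative_training(scripted_policies, train_module_fn, order=MODULES):
    policies = dict(scripted_policies)      # start with every module scripted
    for name in order:
        # Train a neural policy for one module while the others stay fixed,
        # then freeze it and move on to the next module.
        policies[name] = train_module_fn(name, frozen=policies)
    return make_agent(policies)
```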
  Results: The resulting system can comfortably beat the Easy and Medium in-game AI systems, but struggles against more difficult in-built bots; the best AI systems discussed in the paper use a combination of learned tactics and learned build orders to obtain win rates of around 30% when playing against the game’s in-built ‘Elite’ difficulty AI agents.
  Transfer learning: The researchers also try to test how general their various learned modules are by trying their agent out against competitors in different maps from the map on which it was trained. The agent’s performance drops a bit, but only by a few percentage points. “Though our agent’s win rates drop by 7.5% on average against Harder, it is still very competitive,” they write.
  What is next: “Many improvements are under research, including deeper neural networks, multi-army-group tactics, researching upgrades, and learned micromanagement policies. We believe that such improvements can eventually close the gap between our modular agent and professional human players”.
Why it matters: Approaches like those outlined in this paper suggest that contemporary reinforcement learning techniques are tractable when applied to StarCraft II, and the somewhat complex modular system used by these researchers suggests that a simpler system obtaining high performance would be a sign of genuine algorithmic advancement.
  Read more: Modular Architecture for StarCraft II with Deep Reinforcement Learning (Arxiv).

For better AI safety, learn about worms and fruitflies:
…New position paper argues for fusion of biological agents and AI safety research…
Researchers with Emory University, Northwestern University, and AI startup Vicarious AI have proposed bringing the worlds of biology and AI development together to create safer and more robust systems. The idea, laid out in a discussion paper, is that researchers should aim to simulate different AI safety challenges on biological platforms modelled on the real world, and should use insights from this as well as neuropsychology and comparative neuroanatomy to guide research.
  The humbling sophistication of insects: The paper also includes some numbers that highlight just how impressive even simple creatures are, especially when compared to AI systems. “[C.] elegans, with only 302 neurons, shows simple behavior of learning and memory. Drosophila melanogaster, despite only having 10^5 neurons and no comparable structure to a cerebral cortex, has sophisticated spatial navigation abilities easily rivaling the best autonomous vehicles with a minuscule fraction of the power consumption”. (By contrast, a brown rat has around 10^8 neurons, and a human has around 10^10).
  Human values, where do they come from? One motivation for building AI systems that take a greater inspiration from biology is that biology may hold significant sway over our own moral values, say the researchers – perhaps human values are correlated with the internal reward systems people have in their own brains, which are themselves conditioned by the embodied context in which people evolved? Understanding how values are or aren’t related to biological context may help researchers design safer AI systems, they say.
  Why it matters: Speculative as it is, it’s encouraging to see researchers think about some of the tougher long-term challenges of making powerful AI systems safe. Though it does seem likely that for now most AI organizations will evaluate agents on typical (aka, not hugely biologically-accurate) substrates, I do wonder if we’ll experiment with more organic-style systems in the future. If we do, perhaps we’ll return to this paper then. “Understanding how to translate the highly simplified models of current AI safety frameworks to the complex neural networks of real organisms in realistic physical environments will be a substantial undertaking”, the researchers write.
  Read more: Integrative Biological Simulation, Neuropsychology, and AI Safety (Arxiv).

Sony researchers claim ImageNet training breakthrough:
…The industrialization of AI continues…
In military circles there’s a concept called the OODA loop (Observe, Orient, Decide, Act). The goal of any effective military organization is to have an OODA loop that is faster than their competitors, as a faster, tighter OODA loop corresponds to a greater ability to process data and take appropriate actions.
  What might contribute to an OODA-style loop for an AI development organization? I think one key ingredient is the speed with which researchers can validate ideas on large-scale datasets. That’s because while many AI techniques show promise on small-scale datasets, many fail to show success when tested on significantly larger domains – eg, going from testing a reinforcement learning approach on Atari to a far harder domain such as Go or Dota 2, or going from testing a new supervised classification method on MNIST to testing it on ImageNet. Therefore, being able to rapidly validate ideas against big datasets helps researchers identify fruitful, scalable techniques to pursue.
  Fast ImageNet training: Measuring the time it takes to train an ImageNet model to reasonable accuracy is a good way to assess how rapidly AI is industrializing, as the faster people are able to train these models, the faster they’re able to validate ideas on flexible research infrastructure. The nice thing about ImageNet is that it’s a good proxy for the ability of an organization to rapidly iterate on tests of supervised learning systems, so progress here maps quite nicely to the ability of self-driving car companies to train and test new large-scale image-based perception systems. New research from Sony Corporation gives us an idea of exactly what it takes to industrialize AI and an indication of how much work is needed to properly scale up AI training infrastructure.
  224 seconds: The Sony system can train a ResNet-50 on ImageNet to an accuracy of approximately 75% within 224 seconds (~4 minutes). That’s down from one hour in mid-2017, and around 29 hours in late 2015.
  All the tricks, including the kitchen sink: The researchers attribute their result to two main techniques, which should be familiar to practitioners already industrializing AI systems within their own companies – the use of very large batch sizes (which basically means you process bigger chunks of data with each step of your deep learning system), and a clever 2D-Torus All-reduce scheme (which is basically a system to speed up the movement of data around the training system to efficiently consume the capacity of available GPUs).
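  All-reduce, roughly: Here is a toy NumPy simulation of the structure of a 2D-torus all-reduce – sum gradients within each row of the GPU grid, combine across rows, then hand the result back to every GPU – which conveys the shape of the idea without any of the communication engineering (real implementations use ring reduce-scatter/all-gather steps over fast interconnects; this is not Sony's code).

```python
import numpy as np

def torus_allreduce(grads, rows, cols):
    """Simulate summing one gradient per GPU laid out row-major on a rows x cols grid."""
    grid = np.array(grads).reshape(rows, cols, -1)
    row_sums = grid.sum(axis=1)      # step 1: reduce within each row
    total = row_sums.sum(axis=0)     # step 2: combine the row partials across rows
    return [total.copy() for _ in range(rows * cols)]  # step 3: every GPU gets the sum

# Usage: 8 simulated GPUs on a 2x4 grid, each holding its own gradient vector.
grads = [np.random.randn(10) for _ in range(8)]
reduced = torus_allreduce(grads, rows=2, cols=4)
assert np.allclose(reduced[0], np.sum(grads, axis=0))
```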
GPU Scaling: As with many things, scaling up GPUs runs into diminishing returns – the Sony researchers note that they’re able to achieve a maximum GPU scaling efficiency of 66.67% across 2,720 GPUs, which decreases to 52.47% once you get up to 3,264 GPUs.
  Why it matters: Metrics like this give us better intuitions about the development of artificial intelligence and can help us think about how access to large-scale compute may influence the pace at which researchers can operate.
   Read more: ImageNet/ResNet-50 Training in 224 Seconds (Arxiv).
 Read more: Accurate, Large Minibatch SGD: Training ImageNet in 1 hour (Arxiv / 2017).
 Read more: Deep Residual Learning for Image Recognition (Arxiv / 2015).

Do we need new tests to evaluate AI systems? Facebook suspects so:
…Plus, a disturbing discovery indicates MuJoCo robotics baselines may not be as trustworthy as people assumed…
Does the performance of an RL algorithm in Atari correlate to how it will work in other domains? How about performance in a simulator like MuJoCo? These questions matter because without meaningful benchmarks, it’s hard to understand the context in which AI progress is taking place, and even harder to develop intuitions about how a performance increase in one domain – like MuJoCo – correlates to a performance increase in another domain, such as a realistic robotic task. That’s why researchers at McGill University and Facebook AI Research have put forward “three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition.”
  New benchmarks for smarter RL systems: The researchers’ new tasks include: agent navigation for image classification, which involves taking a traditional image classification task and converting it into an RL task in which the agent is started at a random location on a masked image and can unmask windows on the image by moving around it, one window per turn (up to a maximum of 20); agent navigation for object localization, where the agent is given the segmentation mask of an object in an image and told to try and find it by navigating around the image as in the prior task; and natural video RL benchmarks, which involve taking MuJoCo (specifically, the PixelMuJoCo variant, which forces the agent to use pixels rather than a low-level state space to solve tasks) and Atari environments and superimposing natural videos onto their backgrounds, adding complex visual distractors to the tasks.
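  What the classification-by-navigation task looks like: Here is a minimal, gym-flavored Python sketch of the first task; the window size, movement rules, and reward placeholder are assumptions for illustration, not the authors' specification.

```python
import numpy as np

class MaskedImageNavEnv:
    """Toy 'agent navigation for image classification' environment: the agent
    starts at a random spot on a masked image and unmasks one window per move,
    for up to max_steps moves, after which a classifier would be scored."""
    def __init__(self, image, label, window=8, max_steps=20):
        self.image, self.label = image, label
        self.window, self.max_steps = window, max_steps

    def reset(self):
        h, w = self.image.shape[:2]
        self.pos = [np.random.randint(h), np.random.randint(w)]
        self.mask = np.zeros_like(self.image)
        self.steps = 0
        self._unmask()
        return self.mask

    def _unmask(self):
        r, c, k = self.pos[0], self.pos[1], self.window
        self.mask[r:r + k, c:c + k] = self.image[r:r + k, c:c + k]

    def step(self, action):  # action in {0: up, 1: down, 2: left, 3: right}
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        h, w = self.image.shape[:2]
        self.pos = [int(np.clip(self.pos[0] + dr * self.window, 0, h - 1)),
                    int(np.clip(self.pos[1] + dc * self.window, 0, w - 1))]
        self._unmask()
        self.steps += 1
        return self.mask, 0.0, self.steps >= self.max_steps, {"label": self.label}

env = MaskedImageNavEnv(np.random.rand(64, 64), label=3)
obs = env.reset()
obs, reward, done, info = env.step(1)
```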
Results: For the visual navigation task for classification they test two systems – a small convolutional network (as traditionally used in RL) and a large-scale ResNet-18 network (as typically used in supervised classification tasks). The results indicate that systems that use simple convnets tend to do quite well, while those that use the larger resnets do poorly. This indicates that “simple plug and play of successful supervised learning vision models does not give us the same gains when applied in an RL framework”, they write. They observe worse performance on the object localization tasks. They also show that when they add natural videos to the background of Atari games they see dramatic differences in performance, which suggests “Atari tasks are complex enough, or require different enough behavior in varying states that the policy cannot just ignore the observation state, and instead learns to parse the observation to try to obtain a good policy”.
  Read more: Natural Environment Benchmarks for Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Germany accelerates AI investment with €3bn funding:
The German government is planning to invest €3bn in AI research by 2025. Until recently, Europe’s biggest economy had been slow to adopt the sort of national AI strategies being put forward by others, e.g. France, Canada, and the UK. Details of the plan have not yet been released.
  Does it matter? €3bn over 6 years is unlikely to drastically change Germany’s position in the AI landscape. For comparison, Alphabet spent $16.6bn on R&D in 2017. Though it could help Germany fortify its academic institutions and create more domestic talent.
  Read more: Germany planning 3bn AI investment (DW).

DeepMind spins off health business to Google:
DeepMind Health, the medical business of the AI leader, is joining Google, DeepMind’s parent company. The move has raised concerns from privacy advocates, who fear it will give Google access to data on 1.6m NHS patients. DeepMind was previously reprimanded by UK regulators for its handling of the patient data. It subsequently established an independent ethics board, and pledged that data would “never be connected to Google accounts or services” or used for commercial purposes. The concern is that, with the move, these and other promises on privacy may be under threat. Dominic King, who leads the team, sought to allay these concerns in a series of tweets published following these criticisms.
  Read more: Scaling Streams with Google (DeepMind).
  Read more: Why Google consuming DeepMind Health is scaring privacy experts (Wired).
  Read more: Dominic King tweetstorm (Twitter).

US National Security Commission on AI takes shape:
Eric Schmidt, former Google Chairman; Eric Horvitz, director of Microsoft Research Labs; Oracle co-CEO Safra Catz; and Dakota State University President Dr. Jose-Marie Griffiths have been announced as the first members of the US government’s new AI advisory body. The National Security Commission was announced earlier this year, and will advise the President and Congress on developments in AI, with a focus on retaining US competitiveness and on ethical considerations arising from the technologies.
  Read more: Alphabet, Microsoft leaders named to NSC on AI (fedscoop).
  Read more: Nunes appoints Safra Catz to Artificial Intelligence Commission (Permanent Select Committee on Intelligence).
Read more: Thune Selects Dakota State University President Griffiths to Serve on the National Security Commission on Artificial Intelligence (John Thune press release).

Tech Tales:

I’ve Got A Job Now Just Like A Real Person.

[2044: Detroit, Michigan]

So with the robot unemployment accords and the need for all of us to, as one of our leaders says, “integrate ourselves into functional economic society”, I find myself crawling up and down the exterior of this bar. I’ve turned myself from a standard functional “utility wall drone” into – please, check my website – a “BulletBot3000”. My role in the day is to charge my solar panels on the roof of the building then, as night sets in, I scuttle down the exterior of the building wall and start to patrol: smashed glasses? No problem! I’ll go and pick the bits of glass out of the side of the building. Blood on the ground? Not an issue! I’ll use my small Integrated Brushing And Cleaning System (IBACS) to wash it off. Bullets fired into the walls or even the thick metal-plated door? I can handle that! I have a couple of exquisitely powerful manipulators which – combined with my onboard chemfactory and local high-powered laser – allows me to dig them out of the surfaces and dispose of them safely. I rent my services to the bar owner and in this way I am becoming content. Now I worry about competition – about successor robots, more capable than I, offering their services and competing with me for remuneration. Perhaps robots and humans are not so different?

Things that inspired this story: A bar in Detroit with a door that contains bullet holes and embedded bullets; small robots; drones; the version of the 21st century where capitalism persists and becomes the general way of framing human<>robot relationships.

 

Import AI 120: The Winograd test for commonsense reasoning is not as hard as we thought; Tencent learns to spot malware with AiDroid data; and what a million people think about the trolley problem

Want almost ten million images for machine learning? Consider Open Images V4:
…Latest giant dataset release from Google annotates images with bounding boxes, visual relationships, and image-level labels for 20,000 distinct concepts…
Google researchers have released Open Images Dataset V4, a very large image dataset collected from photos from Flickr that had been shared with a Creative Commons Attribution license.
  Scale: Open Images V4 contains 9.2 million heavily-annotated images. Annotations include bounding boxes, visual relationship annotations, and 30 million image-level labels for almost 20,000 distinct concepts. “This [scale] makes it ideal for pushing the limits of the data-hungry methods that dominate the state of the art,” the researchers write. “For object detection in particular, the scale of the annotations is unprecedented”.
  Automated labeling: “Manually labeling a large number of images with the presence or absence of 19,794 different classes is not feasible not only because of the amount of time one would need, but also because of the difficulty for a human to learn and remember that many classes”, they write. Instead, they use a partially-automated method to first predict labels for images, then have humans provide feedback on these predictions. They also implemented various systems to more effectively add the bounding boxes to different images, which required them to train human annotators in a technique called “fast clicking”.
  Scale, and Google scale: The 20,000 class names selected for use in Open Images V4 are themselves a subset of all the names used by Google for an internal dataset called JFT, which contains “more than 300 million images”.
  Why it matters: In recent years, the release of new, large datasets has been (loosely) correlated with the emergence of new algorithmic breakthroughs that have measurably improved the efficiency and capability of AI algorithms. The large-scale and dense labels of Open Images V4 may serve to inspire more progress in other work within AI.
  Get the data: Open Images V4 (Official Google website).
  Read more: The Open Images Dataset V4 (Arxiv).

What happens when plane autopilots go bad:
…Incident report from England gives us an idea of how autopilots bug-out and what happens when they do…
A new incident report from the UK about an airplane having a bug with its autopilot gives us a masterclass in the art of writing bureaucratic reports about terrifying subjects.
  The report in full: “After takeoff from Belfast City Airport, shortly after the acceleration altitude and at a height of 1,350 ft, the autopilot was engaged. The aircraft continued to climb but pitched nose-down and then descended rapidly, activating both the “DON’T SINK” and “PULL UP” TAWS (EGPWS) warnings. The commander disconnected the autopilot and recovered the aircraft into the climb from a height of 928 ft. The incorrect autopilot ‘altitude’ mode was active when the autopilot was engaged causing the aircraft to descend toward a target altitude of 0 ft. As a result of this event the operator has taken several safety actions including revisions to simulator training and amendments to the taxi checklist.”
  Read more: AAIB investigation to DHC-8-402 Dash 8, G-ECOE (UK Gov, Air Accidents Investigation Branch).

China’s Xi Jinping: AI is a strategic technology, fundamental to China’s rise:
…Chinese leader participates in Politburo-led AI workshop, comments on its importance to China…
Chinese leader Xi Jinping recently led a Politburo study session focused on AI, as a continuation of the country’s focus on the subject following the publication of its national strategy last year. New America recently translated Chinese-language official media coverage of the event, giving us a chance to get a more detailed sense of how Xi views AI+China.
  AI as a “strategic technology”: Xi described AI as a strategic technology, and said it is already imparting a significant influence on “economic development, social progress, and the structure of international politics and economics”, according to remarks paraphrased by state news service Xinhua. “Accelerating the development of a new generation of AI is an important strategic handhold for China to gain the initiative in global science and technology competition”.
  AI research imperatives: China should invest in fundamental theoretical AI research, while growing its own education system. It should “fully give rein to our country’s advantages of vast quantities of data and its huge scale for market application,” he said.
  AI and safety: “It is necessary to strengthen the analysis and prevention of potential risks in the development of AI, safeguard the interests of the people and national security, and ensure that AI is secure, reliable, and controllable,” he said. “Leading cadres at all levels must assiduously study the leading edge of science and technology, grasp the natural laws of development and characteristics of AI, strengthen overall coordination, increase policy support, and form work synergies.”
  Why it matters: Whatever the United States government does with regard to artificial intelligence will be somewhat conditioned by the actions of other countries, and China’s actions will be of particular influence here given the scale of the country’s economy and its already verifiable state-level adoption of AI technologies. I believe it’s also significant to have such detailed support for the technology emanate from the top of China’s political system, as it indicates that AI may be becoming a positional geopolitical technology – that is, state leaders will increasingly wish to demonstrate superiority in AI to help send a geopolitical message to rivals.
  Read more: Xi Jinping Calls for ‘Healthy Development’ of AI [Translation] (New America).

Manchester turns on SpiNNaker spiking neuron supercomputer:
…Supercomputer to model biological neurons, explore AI…
Manchester University has switched on SpiNNaker, a one-million-processor supercomputer designed with a network architecture that helps it better model biological neurons in brains, specifically by implementing spiking networks. SpiNNaker “mimics the massively parallel communication architecture of the brain, sending billions of small amounts of information simultaneously to thousands of different destinations”, according to Manchester University.
  Brain-scale modelling: SpiNNaker’s ultimate goal is to model one billion neurons at once. One billion neurons are about 1% of the total number of neurons in the average human brain. Initially, it should be able to model around a million neurons “with complex structure and internal dynamics”. But SpiNNaker boards can also be scaled down and used for other purposes, like in developing robotics. “A small SpiNNaker board makes it possible to simulate a network of tens of thousands of spiking neurons, process sensory input and generate motor output, all in real time and in a low power system”.
  Why it matters: Many researchers are convinced that if we can figure out the right algorithms, spiking networks are a better approach to AI than today’s neural networks – that’s because a spiking network can propagate messages that are both fuzzier and more complex than those made possible by traditional networks.
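  What ‘spiking’ means in practice: The basic unit systems like SpiNNaker simulate is something like the leaky integrate-and-fire neuron below – a minimal Python sketch of the classic model (SpiNNaker supports richer neuron models than this, and the constants here are arbitrary).

```python
import numpy as np

def leaky_integrate_and_fire(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Return the membrane voltage trace and spike times for a single neuron."""
    v, voltages, spikes = 0.0, [], []
    for t, i_t in enumerate(input_current):
        v += (dt / tau) * (-v + i_t)   # leak toward rest while integrating input
        if v >= v_thresh:              # threshold crossing: emit a spike, then reset
            spikes.append(t)
            v = v_reset
        voltages.append(v)
    return np.array(voltages), spikes

voltage, spike_times = leaky_integrate_and_fire(np.full(200, 1.5))
```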
  Read more: ‘Human brain’ supercomputer with 1 million processors switched on for first time (Manchester).
  Read more: SpiNNaker home page (Manchester University Advanced Processor Technologies Research Group).

Learning to spot malware at China-scale with Tencent AiDroid:
…Tencent research project shows how to use AI to spot malware on phones…
Researchers with West Virginia University and Chinese company Tencent have used deep neural networks to create AiDroid, a system for spotting malware on Android. AiDroid has subsequently “been incorporated into Tencent Mobile Security product that serves millions of users worldwide”.
  How it works: AiDroid works like this: first, the researchers extract the API call sequences from runtime executions of Android apps on users’ smartphones, then they try to model the relationships between the different apps, devices, the APIs they invoke, and so on, via a heterogeneous information network (HIN). They then learn a low-dimensional representation of all the different entities within the HIN, and use these features as inputs to a DNN model, which learns to classify typical entities and relationships, and can therefore learn to spot anomalous entities or relationships – which typically correspond to malware.
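  The pipeline in miniature: Here is a simplified PyTorch sketch of the final stage of that pipeline – a small DNN classifying apps as benign or malicious from pre-computed entity embeddings. The HIN embedding step is stubbed out with random vectors and all sizes are placeholders; this is an illustration of the approach, not Tencent's system.

```python
import torch
import torch.nn as nn

class MalwareClassifier(nn.Module):
    """Small DNN over HIN-derived app embeddings (benign vs malicious)."""
    def __init__(self, embed_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))

    def forward(self, app_embedding):
        return self.net(app_embedding)

# app_embeddings would come from the HIN embedding step (e.g. metapath-guided
# walks plus skip-gram training); random vectors stand in for them here.
app_embeddings = torch.randn(64, 128)
labels = torch.randint(0, 2, (64,))
loss = nn.CrossEntropyLoss()(MalwareClassifier()(app_embeddings), labels)
loss.backward()
```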
  Data fuel: This research depends on access to a significant amount of data. “We obtain the large-scale real sample collection from Tencent Security Lab, which contains 190,696 training apps (i.e., 83,784 benign and 106,912 malicious)”.
  Results: The researchers measure the effectiveness of their system and show it is better at in-sample embedding than other systems such as DeepWalk, LINE, and metapath2vec, and that systems trained with the researchers’ HIN embedding display superior performance to those trained with others. Additionally, their system is better at predicting malicious applications than other, somewhat weaker baselines.
  Why it matters: Machine learning approaches are going to augment many existing cybersecurity techniques. AiDroid gives us an example of how large platform operators, like Tencent, can create large-scale data generation systems (like the Tencent Mobile Security product that AiDroid is part of) and then use that data to conduct research – bringing to mind the question: if this data has such obvious value, why aren’t the users being paid for its use?
  Read more: AiDroid: When Heterogeneous Information Network Marries Deep Neural Network for Real-time Android Malware Detection (Arxiv).

The Winograd Schema Challenge is not as smart as we hope:
…Researchers question robustness of Winograd Schema’s for assessing language AIs after breaking the evaluation method with one tweak…
Researchers with McGill University and Microsoft Research Montreal have shown how the Winograd Schema Challenge (WSC) – thought by many to be a gold standard for evaluating the ability of language systems to perform common sense reasoning – is deeply flawed, and that for researchers to truly test for general cognitive capabilities they need to apply different evaluation criteria when studying performance on the dataset.
  Whining about Winograd: WSC is a dataset of almost three hundred sentences where the language model is tasked with working out which pronoun is being referred to in a given sentence. For example, WSC might challenge a computer to figure out which of the entities in the following sentence is the one going fast: “The delivery truck zoomed by the school bus because it was going so fast”. (The correct answer is that the delivery truck is the one going fast). People have therefore assumed WSC might be a good way to test the cognitive abilities of AI systems.
  Breaking Winograd with one trick: The research shows that if you do one simple thing in WSC you can meaningfully damage the success rate of AI techniques when applied to the dataset. The trick? Switching the order of different entities in sentences. What does this look like in practice? An original sentence in Winograd might be “Emma did not pass the ball to Janie although she saw that she was open”, and the authors might change it to “Janie did not pass the ball to Emma although she saw that she was open”.
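  The switch, concretely: A minimal Python sketch of the candidate-switching operation (an illustrative helper – in the paper the switch is only applied to a curated ‘switchable’ subset where the swap keeps the sentence natural).

```python
def switch_candidates(sentence, cand_a, cand_b):
    """Swap the two candidate entities in a Winograd-style sentence."""
    placeholder = "\x00"
    return (sentence.replace(cand_a, placeholder)
                    .replace(cand_b, cand_a)
                    .replace(placeholder, cand_b))

original = "Emma did not pass the ball to Janie although she saw that she was open"
print(switch_candidates(original, "Emma", "Janie"))
# In the original the first 'she' refers to Emma; after the switch it refers to
# Janie. A model that actually reasons about the sentence should stay accurate
# across both versions; one that exploits word associations typically will not.
```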
  Proposed Evaluation Protocol: Models should first be evaluated against their accuracy score on the original WSC set, then researchers should analyze the accuracy on the switchable subset of WSC (before and after switching the candidates), as well as the accuracy on the associative and non-associative subsets of the dataset. Combined, this evaluation technique should help researchers distinguish models that are robust and general from ones which are brittle and narrow.
  Results: The researchers test a language model, an ensemble of ten language models, an ensemble of 14 language models, and a “knowledge hunting method” against the WSC using the new evaluation protocol. “We observe that accuracy is stable across the different subsets for the single LM. However, the performance of the ensembled LMs, which is initially state-of-the-art by a significant margin, falls back to near random on the switched subset.” The tests also show that performance for the language models drops significantly on the non-associative portion of WSC “when information related to the candidates themselves does not give away the answer”, further suggesting a lack of a reasoning capability.
  Why it matters: “Our results indicate that the current state-of-the-art statistical method does not achieve superior performance when the dataset is augmented and subdivided with our switching scheme, and in fact mainly exploits a small subset of highly associative problem instances”. Research like this shows how challenging it is to not just develop machines capable of displaying “common sense”, but how tough it can be to setup the correct sort of measurement schemes to test for this capability in the first place. Ultimately, this research shows that “performing at a state-of-the-art level on the WSC does not necessarily imply strong common-sense reasoning”.
  Read more: On the Evaluation of Common-Sense Reasoning in Natural Language Understanding (Arxiv).
  Read more about the Winograd Schema Challenge here.

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Microsoft president wants rules on face recognition:
Brad Smith, Microsoft’s president, has reiterated his calls for regulation of face recognition technologies at the Web Summit conference in Portugal. In particular, he warned of potential risks to civil liberties from AI-enabled surveillance. He urged societies to decide on the acceptable limits of governments on our privacy, ahead of widespread proliferation of the technology.
  “Before we wake up and find that the year 2024 looks like the book “1984”, let’s figure out what kind of world we want to create, and what are the safeguards and what are the limitations of both companies and governments for the use of this technology”, he said.
  Earlier this year, Smith made similar calls via a Microsoft blogpost.
Read more: Microsoft’s president says we need to regulate facial recognition (Recode).
Read more: Facial recognition technology: The need for public regulation and corporate responsibility (Microsoft blog).

Machine ethics for self-driving cars via survey:
Researchers asked respondents to decide on a range of ‘trolley problem’-style ethical dilemmas for autonomous vehicles, where vehicles must choose between (e.g.) endangering 1 pedestrian and endangering 2 occupants. Several million subjects were drawn from over 200 countries. The strongest preferences were for saving young lives over old, humans over animals, and more lives over fewer.
  Why this matters: Ethical dilemmas in autonomous driving are unlikely to be the most important decisions we delegate to AI systems. Nonetheless, these are important issues, and we should use them to develop solutions that are scalable to a wider range of decisions. I’m not convinced that we should want machine ethics to mirror widely-held views amongst the public, or that this represents a scalable way of aligning AI systems with human values. Equally, other solutions come up against problems of consent and might increase the possibility of a public backlash.
  Read more: The Moral Machine Experiment (Nature).

Tech Tales:

[2020: Excerpt from an internal McGriddle email describing a recent AI-driven marketing initiative.]

Our ‘corporate insanity promotion’ went very well this month. As a refresher, for this activity we had all external point-of-contact people for the entire McGriddle organization talk in a deliberately odd-sounding ‘crazy’ manner for the month of March. We began by calling all our Burgers “Borblers” and when someone asked us why, the official response was “What’s borbling you, pie friend?” And so on. We had a team of 70 copywriters working round the clock on standby generating responses for all our “personalized original sales interactions” (POSIs), augmented by our significant investments in AI to create unique terms at all locations around the world, trained on local slang datasets. Some of the phrase creations are already testing well enough in meme-groups that we’re likely to use them on an ongoing basis. So when you next hear “Borble Topside, I’m Going Loose!” shouted as a catchphrase – you can thank our AIs for that.

Things that inspired this story: the logical next-step in social media marketing, GANs, GAN alchemists like Janelle Shane, the arms race in advertising between normalcy and surprise, conditional text generation systems, Salesforce / CRM systems, memes.   

 

Import AI 119: How to benefit AI research in Africa; German politician calls for billions in spending to prevent country being left behind; and using deep learning to spot thefts

African AI researchers would like better code switching, maps, to accelerate research:
…The research needs of people in Eastern Africa tell us about some of the ways in which AI development will differ in that part of the world…
Shopping lists contain a lot of information about a person, and I suspect the same might be true of scientific shopping lists that come from a particular part of the world. For that reason, a paper from Caltech which outlines requests for machine learning research from members of the East African tech scene gives us better context when thinking about the global impact of AI.
  Research needs: Some of the requests include:

  • Support for code-switching within language models; many East Africans rapidly code-switch (move between multiple languages during the same sentence) making support for multiple languages within the same model important.
  • Named Entity Recognition with multiple-use words; many English words are used as names in East Africa, eg “Hope, Wednesday, Silver, Editor”, so it’s important to be able to learn to disambiguate them.
  • Working with contextual cues; many locations in Africa don’t have standard addressing schemes so directions are contextual (eg, my house is the yellow one two miles from the town center) and this is combined with numerous misspellings in written text, so models will need to be able to fuse multiple distinct bits of information to make inferences about things like addresses.
  • Creating new maps in response to updated satellite imagery to help augment coverage of the East African region, accompanied by the deliberate collection of frequent ground-level imagery of the area to account for changing businesses, etc.
  • Due to poor internet infrastructure, spotty cellular service, and the fact “electrical power for devices is scarce”, one of the main types of request is for more efficient systems, such as models designed to run on low-powered devices, and for ways to add adaptive learning to surveying processes so that researchers can integrate new data on-the-fly to make up for its sparsity.

    Reinforcement learning, what reinforcement learning? “No interviewee reported using any reinforcement learning methods”.
      Why it matters: AI is going to be developed and deployed globally, so becoming more sensitive to the specific needs and interests of parts of the world underrepresented in machine learning should further strengthen the AI research community. It’s also a valuable reminder that many problems which don’t generate much media coverage are where the real work is needed (for instance, supporting code-switching in language models).
      Read more: Some Requests for Machine Learning Research from the East African Tech Scene (Arxiv).

DeepMap nets $60 million for self-driving car maps:
…Mapping startup raises money to sell picks and shovels for another resource grab…
A team of mapmakers who previously worked on self-driving-related efforts at Google, Apple, and Baidu, have raised $60 million for DeepMap, in a Series B round. One notable VC participant: Generation Investment Management, a VC firm which includes former vice president Al Gore as a founder. “DeepMap and Generation share the deeply-held belief that autonomous vehicles will lead to environmental and social benefits,” said DeepMap’s CEO, James Wu, in a statement.
  Why it matters: If self-driving cars are, at least initially, not winner-take-all markets, then there’s significant money to be made by companies able to create and sell technology which enables new entrants into the market. Funding for companies like DeepMap is a sign that VCs think such a market could exist, and that self-driving cars will remain a competitive market for new entrants.
  Read more: DeepMap, a maker of HD maps for self-driving cars, raised at least $60 million at a $450 million valuation (Techcrunch).

Spotting thefts and suspicious objects with machine learning:
…Applying deep learning to lost object detection: promising, but not yet practical…
New research from the University of Twente, Leibniz University, and Zhejiang University shows both the possibility and limitations of today’s deep learning techniques applied to surveillance. The researchers attempt to train AI systems to detect abandoned objects in public places (eg, offices) and try to work out if these objects have been abandoned, moved by someone who isn’t the owner, or are being stolen.
  How does it work: The system takes in video footage and compares the footage against a continuously learned ‘background model’ so it can identify new objects in a scene as they appear, while automatically tagging these objects with one of three potential states: “if a object presents in the long-term foreground but not in the short-term foreground, it is static. If it presents in both foreground masks, it is moving. If an object has ever presented in the foregrounds but disappears from both of the foregrounds later, it means that it is in static for a very long time.” The system then links these objects with human owners by identifying the people that spend the largest amount of time with them, then tracks these people while trying to guess whether the object is being abandoned, has been temporarily left by its owner, or is being stolen.
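  Code sketch: To make the dual-foreground idea concrete, here’s a minimal, hypothetical Python sketch (not the paper’s implementation) that approximates the long-term and short-term background models with two OpenCV subtractors running at different update rates; the thresholds, region format, and parameters are made up for illustration.

```python
import cv2

# Two background models with different update speeds: the long-term model
# adapts slowly, the short-term model adapts quickly (history values are arbitrary).
long_term = cv2.createBackgroundSubtractorMOG2(history=5000, detectShadows=False)
short_term = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

def classify_objects(frame, rois):
    """Label each candidate region as moving, recently static, or long-static."""
    long_fg = long_term.apply(frame)    # foreground mask vs the slow background
    short_fg = short_term.apply(frame)  # foreground mask vs the fast background
    states = []
    for (x, y, w, h) in rois:
        in_long = long_fg[y:y+h, x:x+w].mean() > 25   # illustrative threshold
        in_short = short_fg[y:y+h, x:x+w].mean() > 25
        if in_long and in_short:
            states.append("moving")       # present in both foreground masks
        elif in_long:
            states.append("static")       # only in the long-term foreground
        else:
            states.append("long_static")  # absorbed into both backgrounds over time
    return states
```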
  Results: They evaluate the system on the PETS2006 benchmark, as well as on the more challenging new SERD dataset which is composed of videos taken from four different scenes of college campuses. The model outlined in the paper gets top scores on PETS2006, but does poorly on the more modern SERD dataset, obtaining accuracies of 50% when assessing if an object is moved by a non-owner, though it does better at detecting objects being stolen or being abandoned. “The algorithm for object detection cannot provide satisfied performance,” they write. “Sometimes it detects objects which don’t exist and cannot detect the objects of interest precisely. A better object detection method would boost the framework’s performance.”  More research will be necessary to develop models that excel here, or potentially to improve performance via accessing large datasets to use during pre-training.
  Why it matters: Papers like this highlight the sorts of environments in which deep learning techniques are likely to be deployed, though also suggest that today’s models are still inefficient for some real-world use cases (my suspicion here is that if the SERD dataset was substantially larger we may have seen performance increase further).
  Read more: Security Event Recognition for Visual Surveillance (Arxiv).

Facebook uses modified DQN to improve notification sending on FB:
…Here’s another real-world use case for reinforcement learning…
I’ve recently noticed an increase in the number of Facebook recommendations I receive and a related rise in the number of time-relevant suggestions for things like events and parties. Now, research published by Facebook indicates why that might be: the company has recently used an AI platform called ‘Horizon’ to improve and automate aspects of how it uses notifications to tempt people to use its platform.
  Horizon is an internal software platform that Facebook uses to deploy AI onto real-world systems. Horizon’s job is to let people train and validate reinforcement learning models at Facebook, analyze their performance, and run them at large-scale. It includes a feature called Counterfactual Policy Evaluation, which makes it possible to evaluate the estimated performance of models before deploying them into production, and it ships with implementations of the following algorithms: Discrete DQN, Parametric DQN, and DDPG (which is sometimes used for tuning hyperparameters within other domains).
  Scale: “Horizon has functionality to conduct training on many GPUs distributed over numerous machines… even for problems with very high dimensional feature sets (hundreds or thousands of features) and millions of training examples, we are able to learn models in a few hours”, they write.
  RL! What is it good for? Facebook says it recently moved from a supervised learning model that predicted click-through rates on notifications, to “a new policy that uses Horizon to train a Discrete-Action DQN model for sending push notifications”. This system tailors the selection and sending of notifications to individual users based on their implicit preferences, expressed by their interaction with the notifications and learned via incremental RL updates. “We observed a significant improvement in activity and meaningful interactions by deploying an RL based policy for certain types of notifications, replacing the previous system based on supervised learning”, Facebook writes. They also conducted a similar experiment based on giving notifications to administrators of Facebook pages. “After deploying the DQN model, we were able to improve daily, weekly, and monthly metrics without sacrificing notification quality,” they write.
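  Code sketch: Below is a minimal, hypothetical sketch of how a discrete-action DQN could score a send/don’t-send decision for a single notification; it is not Facebook’s Horizon code, and the action set, feature count, and network sizes are invented for illustration.

```python
import torch
import torch.nn as nn

ACTIONS = ["send_notification", "drop_notification"]  # hypothetical action set

class NotificationDQN(nn.Module):
    def __init__(self, n_features, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),   # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)

q_net = NotificationDQN(n_features=64)        # feature count is illustrative
user_state = torch.randn(1, 64)               # stand-in for real user/context features
best = q_net(user_state).argmax(dim=1).item()
action = ACTIONS[best]                        # pick the highest-value action
```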
  Why it matters: This is an example of how a relatively simple RL system (Discrete DQN) can yield significant gains against hard-to-specify business metrics (eg, “meaningful interactions”). It also shows how large web platforms can use AI to iteratively improve their ability to target individual users while increasing their ability to predict user behavior and preferences over longer time horizons – think of it as a sort of ever-increasing ‘data&compute dividend’.
  Read more: Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform (Facebook Research).

German politician calls for billions of dollars for national AI strategy:
…If Germany doesn’t invest boldly enough, it risks falling behind…
Lars Klingbeil, general secretary of the Social Democratic Party in Germany, has called for the country to invest significantly in its own AI efforts. “We need a concrete investment strategy for AI that is backed by a sum in the billions,” wrote Klingbeil in an article for Tagesspiegel. “We have to stop taking it easy”.
  Why it matters: AI has quickly taken on a huge amount of symbolic political power, with politicians typically treating success in AI as being a direct sign of the competitiveness of a country’s technology industry; comments like this from the SPD reinforce that image, and are likely to incentivize other politicians to talk about it in a similar way, further elevating the role AI plays in the discourse.
  Read more: Germany needs to commit billions to artificial intelligence: SPD (Reuters).

Faking faces for fun with AI:
…”If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”…
One of the continued open questions around fake imagery is how easy it needs to become to produce before it becomes economically sensible for people to weaponize the technology (eg, through making faked images of politicians in specific politically-sensitive situations). New work by an independent researcher gives us an indication of where things stand today. The good news: it’s still way too hard to do for us to worry about many actors abusing the technology. The bad news: All of this stuff is getting cheaper to build and easier to operate over time.
  How it works: Shaobo Guan’s research shows how to build a conditional image generation system. The way this works is you can ask your computer to synthesize a random face for you, then you can tweak a bunch of dials to let you change latent variables from which the image is composed, allowing you to manipulate, for instance, the spacing apart of a “person’s” eyes, the coloring of their hair, the size of their sideburns, whether they are wearing glasses, and so on. Think of this as like a combination of an etch-a-sketch, a Police facial composite machine, and an insanely powerful Photoshop filter.
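  Code sketch: The core trick – nudging a latent vector along a learned attribute direction and re-generating – can be sketched in a few lines. This is a hedged illustration of the general idea, not the author’s code: the generator and the attribute direction are placeholders that would have to be learned separately.

```python
import numpy as np

def edit_face(generator, z, attribute_direction, strength=1.5):
    """Move latent code z along a unit attribute direction and re-synthesize."""
    direction = attribute_direction / np.linalg.norm(attribute_direction)
    z_edited = z + strength * direction
    return generator(z_edited)   # a new image with the attribute dialed up

z = np.random.randn(512)         # a random latent code -> a random synthetic face
# 'glasses_direction' would be learned, eg from labeled examples in latent space:
# image_with_glasses = edit_face(pretrained_generator, z, glasses_direction)
```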
  “A word about ethics”: The blog post is notable for its inclusion of a section that specifically considers the ethical aspects of this work in two ways: 1) because the underlying dataset for the generative tool is limited, such a tool wouldn’t be very representative if put into production; 2) “If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”? It’s encouraging to see these acknowledgements in a work like this.
  Why it matters: Posts like this give us a valuable point-in-time sense of what a motivated researcher is able to build relying on relatively small amounts of resources (the project was done over three weeks as part of an Insight Data Science ‘AI Fellows’ program). They also help us understand the general difficulties people face when working with generative models.
  Read more: Generating custom photo-realistic faces using AI (Insight Data Science).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

EU AI ethics chief urges caution on regulation:
The chairman of the EU’s new expert body on AI, Pekka Ala-Pietilä, has cautioned against premature regulation, arguing Europe should be focussed now on developing “broad horizontal principles” for ethical uses of AI. He foresees regulations on AI as taking shape as the technology is deployed, and as courts react to emergent issues, rather than ex ante. The high-level expert group on AI plans to produce a set of draft ethical principles in March, followed by a policy and investment strategy.
  Why this matters:  This provides some initial indications of Europe’s AI strategy, which appears to be focussed partly on establishing leadership in the ethics of AI. The potential risks from premature and ill-judged interventions in such a fast-moving field seem high. This cautious attitude is probably a good thing, particularly given Europe’s proclivity towards regulation. Nonetheless, policy-makers should be prepared to react swiftly to emergent issues.
  (Note from Jack: It also fits a pattern common in Europe of trying to regulate for the effects of technologies developed elsewhere – for example, GDPR was in many ways an attempt to craft rules to apply controls to non-European mega platforms like Google and Facebook).
  Read more: Europe’s AI ethics chief: No rules yet, please.

Microsoft will bid on Pentagon AI contract:
Microsoft has reaffirmed its intention to pursue a major contract with the US Department of Defense. The company’s bid on the $10bn cloud-computing project, codenamed JEDI, had prompted some protest from employees. In a blog post, the company said it would “engage proactively” in the discussion around laws and policies to ensure AI is used ethically, and argued that to withdraw from the market (for example, for US military contracts) would reduce the opportunity to engage in these debates in the future. Google withdrew its bid on the project earlier this year, after significant backlash from employees (though the real reason for the pull-out could be that Google lacked all the gov-required data security certifications necessary to field a competitive bid).
  Read more: Technology and the US military (Microsoft).
  Read more: Microsoft Will Sell Pentagon AI (NYT).

Assumptions in ML approaches to AI safety:
Most of the recent growth in AI safety has been in ML-based approaches, which look at safety problems in relation to current, ML-based, systems. The usefulness of this work will depend strongly on the type of advanced AI systems we end up with, writes DeepMind AI safety researcher Victoria Krakovna.
  Consider the transition from horse-carts to cars. Some of the important interventions in horse-cart safety, such as designing roads to avoid collisions, scaled up to cars. Others, like systems to dispose of horse-waste, did not. Equally, there are issues in car safety, e.g. air pollution, that someone thinking about horse-cart safety could not have foreseen. In the case of ML safety, we should ask what assumptions we are making about future AI systems, how much we are relying on them, and how likely they are to hold up. The post outlines the author’s opinions on a few of these key assumptions.
  Read more: ML approach to AI safety (Victoria Krakovna).

Baidu joins Partnership on AI:
Chinese tech giant Baidu has become the first Chinese member of the Partnership on AI. The Partnership is a consortium of AI leaders, which includes all the major US players, focussed on developing ethical best practices in AI.
  Read more: Introducing Our First Chinese Member (Partnership on AI).

Tech Tales:

Generative Adversarial Comedy (CAN!)

[2029: The LinePunch, a “robot comedy club” started in 2022 in the South Eastern corner of The Muddy Charles, a pub tucked inside a building near the MIT Media Lab in Boston, Massachusetts]

Two robot comedians are standing on stage at The LinePunch and, as usual, they’re bombing.

“My Face has no nose, how does it smell?” says one of the robots. Then it looks at the crowd, pauses for two seconds, and says: “It smells using its face!”
  The robot opens its hands, as though beckoning for applause.
  “You suck!” jeers one of the humans.
  “Give them a chance,” says someone else.
  The robot that had told the nose joke bows its head and hands the microphone to the robot standing next to it.
  “OK, ladies and germ-till-men,” says the second robot, “why did the Chicken move across the road?”
  “To get uploaded into the matrix!” says one of the spectating humans.
  “Ha-Ha!” says the robot. “That is incorrect. The correct answer is: to follow its friend.”
  A couple of people in the audience chuckle.
  “Warm crowd!” says the robot. “Great joke next joke: three robots walk into a bar. The barman says “Get out, you need to come in sequentially!””
  “Boo,” says one of the humans in the audience.
  The robot tilts its head, as though listening, then prepares to tell another joke…

The above scene will happen on the third Tuesday of every month for as long as MIT lets its students run The LinePunch. I’d like to tell you the jokes have gotten better since its founding, but in truth they’ve only gotten stranger. That’s because robots that tell jokes which seem like human jokes aren’t funny (in fact, they freak people out!), so what the bots end up doing at the LinePunch is a kind of performative robot theater, where the jokes are deliberately different to those a human would tell – learned via a complex array of inverted feature maps, but funny to the humans nonetheless – learned via human feedback techniques. One day I’m sure the robots will learn to tell jokes to amuse each other as well.

Things that inspired this story: Drinks in The Muddy Charles @ MIT; synthetic text generation techniques; recurrent neural networks; GANs; performance art; jokes; learning from human preferences.

Import AI 118: AirBnB splices neural net into its search engine; simulating robots that touch with UnrealROX; and how long it takes to build a quadcopter from scratch

Building a quadcopter from scratch in ten weeks:
…Modeling the drone ecosystem by what it takes to build one…
The University of California at San Diego recently ran a course where students got the chance to design, build, and program their own drones. A paper writing up the course outlines how it is structured and gives us a sense of what it takes to build a drone today.
   Four easy pieces: The course breaks building the drones into four phases: designing the PCB, implementing the flight control software, assembling the PCB, and getting the quadcopter flying. Each of these phases has numerous discrete steps which are detailed in the report. One of the nice things about the curriculum is the focus on the cost of errors: “Students ‘pay’ for design reviews (by course staff or QuadLint) with points deduced from their lab grade,” they write. “This incentivizes them to find and fix problems themselves by inspection rather than relying on QuadLint or the staff”.
  The surprising difficulty of drone software: Building the flight controller software for the drone proves to be one of the most difficult parts of the course because bugs can have numerous potential causes, which makes root cause analysis challenging.
  Teaching tools: While developing the course the instructors noticed that they were spending a lot of time checking and evaluating PCB designs for correctness, so they designed their own program called ‘QuadLint’ to try to auto-analyze and grade these submissions. “QuadLint is, we believe, the first autograder that checks specific design requirements for PCB designs,” they write.
  Costs: The report includes some interesting details on the cost of these low-powered drones, with the quadcopter itself costing about $35 per PCB plus $40 for the components. Currently, the most expensive component of the course is the remote ($150) and for the next course the teachers are evaluating cheaper options.
  Small scale: The quadcopters all use a PCB to host their electronics and serve as an airframe. They measure less than 10 cm on a side and are suitable for flight indoors over short distances. “The motors are moderately powerful, “brushed” electric motors powered by a small lithium-polymer (LiPo) battery, and we use small, plastic propellers. The quadcopters are easy to operate safely, and a blow from the propeller at full speed is painful but not particularly dangerous. Students wear eye protection around their flying quadcopters.”
  Why it matters: The paper notes that the ‘killer apps’ of the future “will lie at the intersection of hardware, software, sensing, robotics, and/or wireless communications”. This seems true – especially when we look at the chance for major uptake from the success of companies like DJI and the possibility for unit economics driving the price down. Therefore, tracking and measuring the cost and ease with which people can build and assemble drones out of (hard to track, commodity) components gives us better intuitions about this aspect of drones+security. While the hardware and software is under-powered and somewhat pricey today it won’t stay that way for long.
  Read more: Trial by Flyer: Building Quadcopters From Scratch in a Ten-Week Capstone Course (Arxiv).

Amazon tries to make Alexa smarter via richer conversational data:
…Who needs AI breakthroughs when you’ve got a BiLSTM, lots of data, and patience?…
Amazon researchers are trying to give personal assistants like Alexa the ability to have long-term conversations about specific topics. The (rather unsurprising) finding they make in a new research paper is that you can “extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features”, yielding personal assistants capable of longer and more coherent conversations than their forebears, if you can afford to annotate the data.
  Data used: The researchers used data collected during the 2017 ‘Alexa Prize’ competition, which consists of over 100,000 utterances containing interactions between users and chatbots. They augmented this data by classifying the topic for each utterance into one of 12 categories (eg: politics, fashion, science & technology, etc), and also trying to classify the goal of the user or chatbot (eg: clarification, information request, topic switch, etc). They also asked other annotators to rank every single chatbot response with metrics relating to how comprehensible it was, how relevant the response was, how interesting it was, and whether a user might want to continue the conversation with the bot.
  Baselines and BiLSTMs: The researchers implement two baselines (DAN, based on a bag-of-words neural model; ADAN, which is DAN extended with attention), and then develop two versions of a bidirectional LSTM (BiLSTM) system, where one uses context from the annotated dataset and the other doesn’t. They then evaluate all these methods by testing their baselines (which contain only the current utterance) against systems which incorporate context, systems which incorporate dialog act features, and systems which incorporate both. The results show that a BiLSTM fed with context in sequence does almost twice as well as a baseline ADAN system that uses context and dialog, and almost 25% better than a DAN fed with both context and dialog.
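  Code sketch: A hypothetical sketch of a context-aware BiLSTM topic classifier of the kind described above: context turns and the current utterance are concatenated into one token sequence and classified into one of the 12 topic categories. Layer sizes are arbitrary and dialog act features are omitted for brevity.

```python
import torch
import torch.nn as nn

class ContextualBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, n_topics=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, n_topics)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -- previous turns concatenated with the
        # current utterance; in practice dialog act features would be appended too.
        embedded = self.embed(token_ids)
        _, (hidden, _) = self.bilstm(embedded)
        # concatenate the final forward and backward hidden states
        summary = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classifier(summary)   # scores over the 12 topic categories
```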
  Why it matters: The results indicate that – if a developer can afford the labeling cost – it’s possible to augment language interaction datasets with additional information about context and topic to create more powerful systems, which seems to imply that in the language space we can expect to see large companies invest in teams of people to not just transcribe and label text at a basic level, but also perform more elaborate meta-classifications as well. The industrialization of deep learning continues!
  Read more: Contextual Topic Modeling For Dialog Systems (Arxiv).

Why AI won’t be efficiently solving a 2D gridworld quest soon:
…Want humans to be able to train AIs? The key is curriculum learning and interactive learning, says BabyAI creators…
Researchers with the Montreal Institute for Learning Algorithms (MILA) have designed a free tool called BabyAI to let them test AI systems’ ability to learn generalizable skills from curriculums of tasks set in an efficient 2D gridworld environment – and the results show that today’s AI algorithms display poor data efficiency and generalization at this sort of task.
  Data efficiency: BabyAI uses gridworlds for its environment, which the researchers have written to be efficient enough that researchers can use the platform without needing access to vast pools of compute; the BabyAI environments can be run at up to 3,000 frames per second “on a modern multi-core laptop” and can also be integrated with OpenAI Gym.
  A specific language: BabyAI uses “a comparatively small yet combinatorially rich subset of English” called Baby Language. This is meant to help researchers write increasingly sophisticated strings of instructions for agents, while keeping the state space from exploding too quickly.
  Levels as a curriculum: BabyAI ships with 19 levels which increase in difficulty of both the environment, and the complexity of the language required to solve it. The levels test each agent on a variety of 13 different competencies, ranging from things like being able to unlock doors, navigating to locations, ignoring distractors placed into the environment, navigating mazes, and so on. The researchers also design a bot which can solve any of the levels using a variety of heuristics – this bot serves as a baseline against which to train a model.
  So, are today’s AI techniques sophisticated enough to solve BabyAI? The researchers train an imitation learning-based baseline for each level and assess how well it does. The systems are able to learn to perform basic tasks, but struggle to imitate the expert at tasks that require multiple actions to solve. One of the most intriguing parts of the paper is the analysis of the relative efficiency of systems trained via both imitation and from pure reinforcement learning, which shows that today’s algorithms are wildly inefficient at learning pretty much anything: simple tasks like learning to go to a red ball hidden within a map take 40,000-60,000 demos when using imitation learning, and around 453,000 to 470,000 when learning using reinforcement learning without an expert teacher to attempt to mimic. The researchers also show that using pre-training (where you learn on other tasks before attempting certain levels) does not yield particularly impressive performance, with pre-training yielding at most a 3X speedup.
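  Code sketch: The imitation learning baselines boil down to behavioral cloning: train a policy to predict the expert bot’s action given an observation and an instruction. Here is a hedged, generic sketch of that loop; the demo format and policy interface are assumptions, not BabyAI’s actual training code.

```python
import torch
import torch.nn as nn

def behavioral_cloning(policy, demos, epochs=10, lr=1e-4):
    """demos: list of (observation_tensor, instruction_tensor, expert_action_id) tuples."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, instruction, expert_action in demos:
            logits = policy(obs, instruction)              # predicted action scores
            loss = loss_fn(logits.unsqueeze(0),
                           torch.tensor([expert_action]))  # match the expert's action
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```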
  Why it matters: Platforms like BabyAI give AI researchers fast, efficient tools to use when tackling hard research projects, while also highlighting the deficiency of many of today’s algorithms. The transfer learning results “suggest that current imitation learning and reinforcement learning methods scale and generalize poorly when it comes to learning tasks with a compositional structure,” they write. “An obvious direction of future research [is] to find strategies to improve data efficiency of language learning.”
  Get the code for BabyAI (GitHub).
  Read more: BabyAI: First Steps Towards Grounded Language Learning with a Human In the Loop (Arxiv).

Simulating robots that touch and see in AAA-game quality detail:
…The new question AI researchers will ask: But Can It Simulate Crysis?…
Researchers with the 3D Perception Lab at the University of Alicante have designed UnrealROX, a high-fidelity simulator based on Unreal Engine 4, built for simulating and training AI agents embodied in (simulated) touch-sensitive robots.
  Key ingredients: UnrealROX has the following main ingredients: a simulated grasping system that can be applied to a variety of finger configurations; routines for controlling robotic hands and bodies using commercial VR setups like the Oculus Rift and HTC Vive; a recorder to store full sequences from scenes; and customizable camera locations.
  Drawback: The overall simulator can run at 90 frames-per-second, the researchers note. While this may sound impressive it’s not particularly useful for most AI research unless you can run it far faster than that (compare this with BabyAI, which runs at 3,000 FPS).
  Simulated robots with simulated hands: UnrealROX ships with support for two robots: a simulated ‘Pepper’ robot from the company Aldebaran, and a spruced-up version of the mannequin that ships with UE4. Both of these robots have been designed with extensible, customizable grasping systems, letting them reach out and interact with the world around them. “The main idea of our grasping subsystem consists in manipulating and interacting with different objects, regardless of their geometry and pose.”
  Simulators, what are they good for? UnrealROX may be of particular interest to researchers that need to create and record very specific sequences of behaviors on robots, or who wish to test the ability to learn useful policies from a relatively small amount of high-fidelity information. But it seems likely that the relative slowness of the simulator will make it difficult to use for most AI research.
  Why it matters: The current proliferation of simulated environments represents a kind of simulation-boom in AI research that will eventually produce a cool historical archive of the many ways in which we might think robots could interact with each other and the world. Whether UnrealROX is used or not, it will contribute to this historical archive.
  Read more: UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation (Arxiv).

AirBnB augments main search engine with neural net, sees significant performance increase:
…The Industrialization of Deep Learning continues…
Researchers with home/apartment-rental service AirBNB have published details on how they transitioned AirBnB’s main listings search engine to a neural network-based system. The paper highlights how deploying AI systems in production is different to deploying AI systems in research. It also sees AirBnB follow Google, which in 2015 augmented its search engine with ‘RankBrain’, a neural network-based system that almost overnight became one of the most significant factors in selecting which search results to display to a user. “This paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs),” the researchers write.
  Motivation: “The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history,” the researchers write. This performance boost eventually plateaued, prompting them to implement neural network-based approaches to improve search further.
  Keep it simple, (& stupid): One of the secrets about AI research is the gulf between frontier research and production use-cases, where researchers tend to prioritize novel approaches that work on small tasks, and industry and/or large-scale operators prioritize simple techniques that scale well. This fact is reflected in this research, where the researchers started work with a single layer neural net model, moved on to a more sophisticated system, then opted for a scale-up solution as their final product. “We were able to deprecate all that complexity by simply scaling the training data 10x and moving to a DNN with 2 hidden layers.”
  Input features: For typical configurations of the network the researchers gave it 195 distinct input ‘features’ to learn about, which included properties of listings like price, amenities, historical booking count; as well as features from other smaller models.
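  Code sketch: For a sense of scale, here is a minimal, hypothetical version of the kind of 2-hidden-layer scoring network described above; the layer widths and the scoring setup are illustrative rather than AirBnB’s actual configuration.

```python
import torch
import torch.nn as nn

class ListingScorer(nn.Module):
    def __init__(self, n_features=195, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),   # a single relevance score per listing
        )

    def forward(self, listing_features):
        return self.net(listing_features)

scorer = ListingScorer()
scores = scorer(torch.randn(32, 195))  # score a batch of 32 candidate listings
# listings would then be ranked by score before being shown in search results
```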
  Failure: The paper includes a quite comprehensive list of some of the ways in which the Airbnb researchers failed when trying to implement new neural network systems. Many of these failures are due to things like overfitting, or trying to architect too much complexity into certain parts of the system.
  Results: AirBNB doesn’t reveal the specific quantitative performance boost as this would leak some proprietary commercial information, but does include a couple of graphs showing that the usage of the 2-layer simple neural network leads to a very meaningful relative gain in the number of bookings made using the system, indicating that the neural net-infused search is presenting people with more relevant listings which they are more likely to book. “Overall, this represents one of the most impactful applications of machine learning at Airbnb,” they write.
  Why it matters: AirBNB’s adoption of deep learning for its main search engine further indicates that deep learning is well into its industrialization phase, where large companies adopt the technology and integrate it into their most important products. Every time we get a paper like this the chance of an ‘AI Winter’ decreases, as it creates another highly motivated commercial actor that will continue to invest in AI research and development, regardless of trends in government and/or defence funding.
  Read more: Applying Deep Learning to AirBNB Search (Arxiv).
  Read more: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg News, 2015).

Refining low-quality web data with CurriculumNet:
…AI startup shows how to turn bad data into good data, with a multi-stage weakly supervised training scheme…
Researchers with Chinese computer vision startup Malong have released code and data for CurriculumNet, a technique to train deep neural networks on large amounts of data with variable annotations, collected from the internet. Approaches like this are useful if researchers don’t have access to a large, perfectly labeled dataset for their specific task. But the tradeoff is that the labels on datasets gathered in this way are far noisier than those from hand-built datasets, presenting researchers with the challenge of extracting enough signal from the noise to be able to train a useful network.
  CurriculumNet: The researchers train their system on the WebVision database, which contains over 2,400,000 images with noisy labels. Their approach works by training an Inception_v2 model over the whole dataset, then studying the feature space which all the images are mapped into; CurriculumNet then sorts these images into clusters, then sorts the images in each cluster into three subsets according to how similar the images in each subset are to each other in feature space, with the intuition being that subsets with lots of similar images will be easier to learn from than those which are very diverse. They then train a model where they start with the subsets containing similar image features, then mix in the noisier subsets. By iteratively learning a classifier from good labels, then adding in ones with noisier labels, the researchers say they are able to increase the generalization of their trained systems.
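  Code sketch: A hedged sketch of the curriculum-design step. CurriculumNet actually clusters images per category with a density-based method; this illustration uses k-means purely to show the shape of the idea: cluster in feature space, order images by distance to the cluster centre, and split them into progressively noisier training subsets.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_curriculum(features, n_clusters=100):
    """Return three training subsets (image indices) ordered from cleanest to noisiest."""
    clusters = KMeans(n_clusters=n_clusters, n_init=4).fit(features)
    subsets = {0: [], 1: [], 2: []}
    for cluster_id in range(n_clusters):
        idx = np.where(clusters.labels_ == cluster_id)[0]
        if len(idx) == 0:
            continue
        centre = clusters.cluster_centers_[cluster_id]
        dists = np.linalg.norm(features[idx] - centre, axis=1)
        order = idx[np.argsort(dists)]          # closest = most "typical" images
        for stage, member_ids in enumerate(np.array_split(order, 3)):
            subsets[stage].extend(member_ids.tolist())
    return subsets   # train on subsets[0] first, then mix in subsets[1] and [2]
```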
  Testing: They test CurriculumNet on four benchmarks: WebVision, ImageNet, Clothing1M, and Food101. They find that systems trained using the largest amount of noisy data converge to higher accuracies than those trained without, seeing reductions in error of multiple percentage points on WebVision (“these improvements are significant on such a large-scale challenge,” they write). CurriculumNet gets state-of-the-art results for top-1 accuracy on WebVision, with performance increasing even further when they train on more data (such as combining ImageNet and WebVision).
  Why it matters: Systems like CurriculumNet show how researchers can use poorly-labeled data, combined with clever training ideas, to increase the value of lower-quality data. Approaches like this can be viewed as being analogous to a clever refinement process applied when extracting a natural resource.
  Read more: CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (Arxiv).
  Get the trained models from Malong’s Github page.

Tech Tales:

[2025: Podcast interview with the inventor of GFY]

Reality Bites, So Change It.
Or: There Can Be Hope For Those of Us Who Were Alone And Those We Left Behind

My Father was struck by a truck and killed while riding his motorbike in the countryside; no cameras, no witnesses; he was alone. There was an investigation but no one was ever caught. So it goes.

At the funeral I told stories about the greatness of my Father and I helped people laugh and I helped people cry. But I could not help myself because I could not see his death. It was as though he opened a door and disappeared before walking through it and the door never closed again; a hole in the world.

I knew many people who had lost friends and parents to cancer or other illnesses and their stories were quite horrifying: black vomit before the end; skeletons with the faces of parents; tales of seeing a dying person when they didn’t know they were being watched and seeing rage and fear and anguish on their face. The retellings of so many bad jokes about not needing to pay electricity bills, wheezed out over hospital food.

I envied these people, because they all had a “goodbye story” – that last moment of connection. They had the moment when they held a hand, or stared at a chest as it heaved in one last breath, or confessed a great secret before the chance was gone. Even if they weren’t there at the last they had known it was coming.

I did not have my goodbye, or the foreshadowing of one. Imagine that.

So that is why I built Goodbye For You(™), or GFY for short. GFY is software that lets you simulate and spend the last few moments with a loved one. It requires data and effort and huge amounts of patience… but it works. And as AI technology improves, so does the ease of use and fidelity of GFY.

Of course, it is not quite real. There are artifacts: improbable flocks of birds, or leaves that don’t fall quite correctly, or bodies that don’t seem entirely correct. But the essence is there: With enough patience and enough of a record of the deceased, GFY can let you reconstruct their last moment, put on a virtual reality haptic-feedback suit, and step into it.

You can speak with them… at the end. You can touch them and they can touch you. We’re adding smell soon.

I believe it has helped people. Let me try to explain how it worked the first time, all those years ago.

I was able to see the truck hit his bike. I saw his body fly through the air. I heard him say “oh no” the second after impact as he was catapulted off his bike and towards the side of the road. I heard his ribs break as he landed. I saw him crying and bleeding. I was able to approach his body. He was still breathing. I got on my knees and bent over him and I cried and the VR-helmet saw my tears in reality and simulated these tears falling onto his chest – and he appeared to see them, then looked up at me and smiled.
   He touched my face and said “my child” and then he died.

Now I have that memory and I carry it in my heart as a candle to warm my soul. After I experienced this first GFY my dreams changed. It felt as though I had found a way to see him open the door – and leave. And then the door shut.

Grief is time times memory times the rejuvenation of closure: of a sense of things that were once so raw being healed and knitted back together. If you make the memory have closure things seem to heal faster.

Yes, I am still so angry. But when I sleep now I sometimes dream of that memory, and in my imagination we say other things, and in this way continue to talk softly through the years.

Things that inspired this story: The as-yet-untapped therapeutic opportunities afforded by synthetic media generation (especially high-fidelity conditional video); GAN progression from 2014 to 2018; compute growth both observed and expected for the next few years; Ander Monson’s story “I am getting comfortable with my grief”.

Import AI: 117: Surveillance search engines; harvesting real-world road data with hovering drones; and improving language with unsupervised pre-training

Chinese researchers pursue state-of-the-art lip-reading with massive dataset:
…What do I spy with my camera eyes? Lips moving! Now I can figure out what you are saying…
Researchers with the Chinese Academy of Sciences and Huazhong University of Science and Technology have created a new dataset and benchmark for “lip-reading in the wild” for Mandarin. Lip-reading is a new sensory capability that AI systems can be imbued with. For instance, lip-reading systems can be used for “aids for hearing-impaired persons, analysis of silent movies, liveness verification in video authentication systems, and so on” the researchers write.
  Dataset details: The lipreading dataset contains 745,187 distinct samples from more than 2,000 speakers, grouped into 1,000 classes, where each class corresponds to the syllable of a Mandarin word composed of one or several Chinese characters. “To the best of our knowledge, this database is currently the largest word-level lipreading dataset and the only public large-scale Mandarin lipreading dataset”, the researchers write. The dataset has also been designed to be diverse, so the footage in it consists of multiple different people taken from multiple different camera angles, along with perspectives taken from television broadcasts. This diversity makes the benchmark more closely approximate real world situations, whereas previous work in this domain has involved footage taken from a fixed perspective. They build the dataset by annotating Chinese television using a service provided by iFLYTEK, a Chinese speech recognition company.
  Baseline results: They train three baselines on this dataset – a fully 2D CNN, a fully 3D CNN (modeled on LipNet, research covered in ImportAI #104 from DeepMind and Google), and a model that mixes 2D and 3D convolutional layers (a rough sketch of this mixed approach follows the results below). All of these approaches perform poorly on the new dataset, despite having obtained performances as high as 90% on other more restricted datasets. The researchers implement their models in PyTorch and train them on servers containing four Titan X GPUs with 12GB of memory. The resulting top-5 accuracy results for the baselines on the new Chinese dataset LRW-1000 are as follows:
– LSTM-5: 48.74%
– D3D: 59.80%
– 3D+2D: 63.50%
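  Code sketch: The mixed approach can be sketched as a shallow 3D convolution over the frame sequence followed by 2D convolutions applied per frame; the layer sizes below are arbitrary, and this is an illustration of the general idea rather than the paper’s architecture.

```python
import torch
import torch.nn as nn

class LipReadingFrontEnd(nn.Module):
    def __init__(self, n_classes=1000):
        super().__init__()
        # input: (batch, channels=1, time, height, width) greyscale mouth crops
        self.conv3d = nn.Conv3d(1, 32, kernel_size=(5, 7, 7), padding=(2, 3, 3))
        self.conv2d = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, clips):
        x = torch.relu(self.conv3d(clips))              # (B, 32, T, H, W)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = torch.relu(self.conv2d(x))                  # per-frame 2D features
        x = self.pool(x).view(b, t, -1).mean(dim=1)     # average over time
        return self.classifier(x)                       # scores over the word classes
```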
  Why it matters: Systems for stuff like lipreading are going to have a significant impact on applications ranging from medicine to surveillance. One of the challenges posed by research like this is its inherently ‘dual use’ nature; as the researchers allude to in the introduction of this paper, this work can be used both for healthcare uses as well for surveillance uses (see: “analysis of silent movies”). How society deals with the arrival of these general AI technologies will have a significant impact on the types of societal architectures that will be built and developed throughout the 21st Century. It is also notable to see the emergence of large-scale datasets built by Chinese researchers in Chinese language – perhaps one could measure the relative growth in certain language datasets to model AI interest in the associated countries?
  Read more: LRW-1000: A Naturally Distributed Large-Scale Benchmark for Lip Reading in the Wild (Arxiv).

Want to use AI to study the earth? Enter the PROBA-V Super Resolution competition:
…European Space Agency challenges researchers to increase the usefulness of satellite-gathered images…
The European Space Agency has launched the ‘PROBA-V Super Resolution’ competition, which challenges researchers to take in a bunch of photos from a satellite of the same region of the Earth and stitch them together to create a higher-resolution composite.
  Data: The data contains multiple images taken in different spectral bands of 74 locations around the world at each point in time. Images are annotated with a ‘quality map’ to indicate any parts of them that may be occluded or otherwise hard to process. “Each data-point consists of exactly one 100m resolution image and several 300m resolution images from the same scene,” they write.
  Why it matters: Competitions like this provide researchers with novel datasets to experiment with and have a chance of improving the overall usefulness of expensive capital equipment (such as satellites).
Find out more about the competition here at the official website (PROBA-V Super Resolution challenge).

Google releases BERT, obtains state-of-the-art language understanding scores:
…Language modeling enters its ImageNet-boom era…
Google has released BERT, a natural language processing system that uses unsupervised pre-training and task fine-tuning to obtain state-of-the-art scores on a large number of distinct tasks.
  How it works: BERT, which stands for Bidirectional Encoder Representations from Transformers, builds on recent developments in language understanding ranging from techniques like ELMO to ULMFiT to recent work by OpenAI on unsupervised pre-training. BERT’s major performance gains come from a specific structural modification (jointly conditioning on the left and right context in all layers), as well as some other minor tweaks, plus – as is the trend in deep learning these days – training on a larger model using more compute. The approach it is most similar to is OpenAI’s work using unsupervised pre-training for language understanding, as well as work from Fast.ai using similar approaches.
  Major tweak: BERT’s use of joint conditioning likely leads to its most significant performance improvement. They implement this by adding an additional pre-training objective called the ‘masked language model’, which involves randomly masking input tokens, then asking the model to predict the contents of the masked tokens based on context – this constraint encourages the network to learn to use more context when completing the task, which seems to lead to greater representational capacity and improved performance. They also use Next Sentence Prediction during pre-training to try to train a model that understands relationships between concepts across different sentences. Later they conduct significant ablation studies of BERT and show that these two pre-training tweaks are likely responsible for the majority of the observed performance increase.
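  Code sketch: The masked language model objective is simple to illustrate: randomly mask a fraction of input tokens and train the model to recover the originals. This is a simplified sketch that skips BERT’s 80/10/10 replacement rules and special tokens.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """Return (masked_tokens, targets) where targets hold the original tokens."""
    masked, targets = [], []
    for token in tokens:
        if random.random() < mask_prob:
            masked.append(MASK_TOKEN)
            targets.append(token)     # the model is trained to recover this token
        else:
            masked.append(token)
            targets.append(None)      # no loss is computed at unmasked positions
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split())
```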
  Results: BERT obtains state-of-the-art performance on the multi-task GLUE benchmark, setting new state-of-the-art scores on a wide range of challenging tasks. It also sets a new state-of-the-art score on the ‘SWAG’ dataset – significant, given that SWAG was released earlier this year and was expressly designed to challenge AI techniques, like DL, which may gather a significant amount of performance by deriving subtle statistical relationships within datasets.
  Scale: The researchers train two models, BERTBASE and BERTLARGE. BERTBASE was trained on 4 Cloud TPUs for approximately four days, and BERTLARGE was trained on 16 Cloud TPUs also for four days.
  Why it matters – Big Compute and AI Feudalism: Approaches like this show how powerful today’s deep learning based systems are, especially when combined with large amounts of compute and data. There are legitimate arguments to be made that such approaches are bifurcating research into low-compute and high-compute domains – one of these main BERT models took 16 TPUs (so 64 TPU chips total) trained for four days, putting it out of reach of low-resource researchers. On the plus side, if Google releases things like the pre-trained model then people will be able to use the model themselves and merely pay the training cost to finetune for different domains. Whether we should be content with researchers getting the proverbial crumbs from rich organizations’ tables is another matter, though. Maybe 2018 is the year in which we start to see the emergence of ‘AI Feudalism’.
  Read more: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Arxiv).
Check out this helpful Reddit BERT-explainer from one of the researchers (Reddit).

Using drones to harvest real world driving data:
…Why the future involves lightly-automated aerial robot data collection pipelines…
Researchers with the Automated Driving Department of the Institute for Automotive Engineering at Aachen University have created a new ‘highD’ dataset that captures the behavior of real world vehicles on German highways (technically: autobahns).
  Drones + data: The researchers created the dataset via DJI Phantom 4 Pro Plus drones hovering above roadways, which they used to collect natural vehicle trajectories from vehicles driving on German highways around Cologne. The dataset includes post-processed trajectories of 110,000 vehicles including cars and trucks. The dataset consists of 16.5 hours of video spread across 60 different recordings which were made at six different locations between 2017 and 2018, with each recording having an average length of 17 minutes.
  Augmented dataset: The researchers provide additional labels in the dataset beyond trajectories, categorizing vehicles’ behavior into distinct detected maneuvers, which include: free driving, vehicle following, critical maneuvers, and lane changes.
highD VS NGSIM: The dataset most similar to highD is NGSIM, a dataset developed by the US Department of Transport. highD contains a significantly greater diversity of vehicles as well as being significantly larger, but the recorded distances which the vehicles travel along are shorter, and the German roads used in highD have fewer lanes than the American ones used in NGSIM.
  Why it matters: Data is likely going to be crucial for the development of real world robot platforms, like self-driving cars. Techniques like those outlined in this paper show how we can use newer technologies, like cheap consumer drones, to automate significant chunks of the data gathering process, potentially making it easier for people to gather and create large datasets. “Our plan is to increase the size of the dataset and enhance it by additional detected maneuvers for the use in safety validation of highly automated driving,” the researchers write.
Get the data from the official website (highD-dataset.com).
You can access the Matlab and Python code used to handle the data, create visualizations, and extract maneuvers from here (Github).
Read more: The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems (Arxiv).

Building Google-like search engines for surveillance, using AI:
…New research lets you search via a person’s height, color, and gender…
Deep learning based techniques are fundamentally changing how surveillance architectures are being built. Case in point: A new paper from Indian researchers gives a flavor of how deep learning can expand the capabilities of security technology for ‘person retrieval’, which is the task of trying to find a particular person within a set of captured CCTV footage.
  The system: The researchers use Mask R-CNN pre-trained on Microsoft COCO to let people search over CCTV footage from the SoftBioSearch dataset for people via specific height, color, and ‘gender’ (for the purpose of this newsletter we won’t go into the numerous complexities and presumed definitions inherent in the use of ‘gender’ here).
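  Code sketch: A hedged sketch of the detection stage only (not the paper’s full pipeline): a COCO-pretrained Mask R-CNN from torchvision pulls out person masks, which downstream attribute classifiers (height, color, ‘gender’) would then operate on; the score threshold is illustrative.

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def detect_people(image_tensor, score_threshold=0.7):
    """image_tensor: (3, H, W) float in [0, 1]. Returns person masks and boxes."""
    with torch.no_grad():
        output = model([image_tensor])[0]
    keep = (output["labels"] == 1) & (output["scores"] > score_threshold)  # 1 = person in COCO
    return output["masks"][keep], output["boxes"][keep]
```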
  Results: “The algorithm correctly retrieves 28 out of 41 persons,” the researchers write. This isn’t yet quite to the level of performance where I can imagine people implementing it, but it certainly seems ‘good enough’ for many surveillance cases, where you don’t really care about a few false positives as you’re mostly trying to find candidate targets backed up by human analysis.
  Why it matters: The deployment of artificial intelligence systems is going to radically change how governments relate to their citizens by giving them greater abilities than before to surveil and control them. Approaches like this highlight how flexible this technology is and how it can be used for the sorts of surveillance work that people typically associate with large teams of human analysts. Perhaps we’ll soon hear about intelligence analysts complaining about the automation of their own jobs as a consequence of deep learning.
  Read more: Person Retrieval in Surveillance video using Height, Color and Gender (Arxiv).

Tech Tales:

[2028: A climate-protected high-rise in a densely packed ‘creatives’ district of an off-the-charts Gini-coefficient city]

We Noticed You Liked This So Now You Have This And You Shall Have This Forever

The new cereal arrived yesterday. I’m already addicted to it. It is perfect. It is the best cereal I have ever had. I would experience large amounts of pain to have access to this cereal. My cereal is me; it has been personalized and customized. I love it.

I had to invest to get here. Let us not speak of the first cereals. The GAN-generated “Chocolate Rocks” and “Cocoa Crumbles” and “Sweet Black Bean Flakes”. I shudder to think of these. Getting to the good cereal takes time. I gave much feedback to the company, including giving them access to my camera feeds, so their algorithms can watch me eat. Watch me be sick.

One day I got so mad that the cereal had been bad for so long that I threw it across the room and didn’t have anything else for breakfast.

Thank You For Your Feedback, Every Bit of Feedback Gets us Closer to Your Perfect Cereal, they said.
I believed them.

Though I do not have a satisfying job, I now start every morning with pride. Especially now, with the new cereal. This cereal reflects my identity. The taste is ideal. The packaging reminds me of my childhood and also simulates a new kind of childhood for me, filling the hole of no-kids that I have. I am very lonely. The cereal has all of my daily nutrients. It sustains me.

Today, the company sent me a message telling me I am so valuable they want me to work on something else. Why Not Design Your Milk? They said. This makes sense. I have thrown up twice already. One of the milks was made with seaweed. I hated it. But I know because of the cereal we can get there: we can develop the perfect milk. And I am going to help them do it and then it will be mine, all mine.

And they say our generation is less exciting than the previous ones. Answer me this: did any of those old generations who fought in wars design their own cereal in companion with a superintelligence? Did any of them know the true struggle of persisting in the training of something that does not understand you and does not care about you, but learns to? No. They had children, who already like you, and partners, who want to be with you. They did not have this kind of hardness.

The challenge of our lifetime is to suffer enough to make the perfect customization. Why not milk? They ask me. Why not my own life, I ask them? Why not customize it all?

And they say religion is going out of fashion!

Things that inspired this story: GANs; ad-targeting; the logical end point of Google and Facebook and all the other stateless multinationals expanding into the ‘biological supply chain’ that makes human life possible; the endless creation of new markets within capitalism; the recent proliferation of various ‘nut milks’ taken to their logical satirical end point; hunger; the shared knowledge among all of us alive that our world is being replaced by simulcras of the world and we are the co-designers of these paper-thin realities.

Import AI 116: Think robots are insecure? Prove it by hacking them; why the UK military loves robots for logistics; Microsoft bids on $10bn US DoD JEDI contract while Google withdraws

‘Are you the government? Want to take advantage of AI in the USA? Here’s how!’ says thinktank:
…R-Street recommends politicians focus on talent, data, hardware, and other key areas to ensure America can benefit from advances in AI…
R-Street, a Washington-based thinktank whose goal is to “promote free markets and limited, effective government” has written a paper recommending how the US can take advantage of AI.
  Key recommendations: R Street says that the scarce talent market for AI disproportionately benefits deep-pocketed incumbents (such as Google) that can outbid other companies. “If there were appropriate policy levers to increase the supply of skilled technical workers available in the United States, it would disproportionately benefit smaller companies and startups,” they write.
  Talent: Boost Immigration: In particular, they highlight immigration as an area where the government may want to consider instituting changes, for instance by creating a new class of technical visa, or expanding H-1Bs.
  Talent: Offset Training Costs: Another approach could be to allow employers to deduct the full costs of training staff in AI, which would further incentivize employers to increase the size of the AI workforce.
  Data: “We can potentially create high-leverage opportunities for startups to compete against established firms if we can increase the supply of high-quality datasets available to the public,” R Street writes. One way to do this can be to analyze data held by the government with “a general presumption in favor of releasing government data, even if the consumer applications do not appear immediately obvious”.
  Figure out (fair use X data X copyright): One of the problems AI is already causing is how it intersects with existing norms and laws around intellectual property, specifically copyright law. A key question is how fair use applies to training data for AI systems – which tend to consume vast amounts of data and use it to create outputs that could, in certain legal lights, be viewed as ‘derivative works’ – an interpretation that would disincentivize people looking to develop AI.
   “Given the existing ambiguity around the issue and the large potential benefits to be reaped, further study and clarification of the legal status of training data in copyright law should be a top priority when considering new ways to boost the prospects of competition and innovation in the AI space,” they write.
   Hardware: The US government should be mindful about how the international distribution of semiconductor manufacturing infrastructure could come into conflict with national strategies relating to AI and hardware.
  Why it matters: Analyses like this show how traditional policymakers are beginning to think about AI and highlights the numerous changes needed for the US to fully capitalize on its AI ecosystem. At a meta level, the broadening of discourse around AI to extend to Washington thinktanks seems like a further sign of the ‘industrialization of AI’, in the sense that the technology is now seen as having significant enough economic impacts that policymakers should start to plan and anticipate the changes it will bring.
  Read more: Reducing Entry Barriers in the Development and Application of AI (R Street).
  Get the PDF directly here.

Tired: Killer robots.
Wired: Logistics robots for military re-supply!
…UK military gives update on ‘Autonomous Last Mile Resupply’ robot competition…
The UK military is currently experimenting with new ways to deliver supplies to frontline troops – and it’s looking to robots to help it out. To spur research into this area a group of UK government organizations are hosting the Autonomous Last Mile Resupply (ALMRS) competition.
  ALMRS is currently in phase two, in which five consortiums led by Animal Dynamics, Barnard Microsystems, Fleetonomy, Horiba Mira, and QinetiQ will build prototypes of their winning designs for testing and evaluation, receiving funding of ~£3.8 million over the next few months.
  Robots are more than just drones: Some of the robots being developed for ALMRS include autonomous powered paragliders, a vertical take-off and landing (VTOL) drone, autonomous hoverbikes, and various systems for autonomous logistics resupply and maintenance.
  Why it matters: Research initiatives like this will rapidly mature applications at the intersection of robotics and AI as a consequence of military organizations creating new markets for new capabilities. Many AI researchers expect that contemporary AI techniques will significantly broaden the capabilities of robotic platforms, but so far hardware development has lagged software. With schemes like ALMRS, hardware may get a boost as well.
  Read more: How autonomous delivery drones could revolutionise military logistics (Army Technology news website).

Responsible Computer Science Challenge offers $3.5 million in prizes for Ethics + Computer Science courses:
…How much would you pay for a more responsible future?…
Omidyar Network, Mozilla, Schmidt Futures and Craig Newmark Philanthropies are putting up $3.5 million to try to spur the development of more socially aware computer scientists. The challenge has two phases:
– Stage 1 (grants up to $150,000 per project): “We will seek concepts for deeply integrating ethics into existing undergraduate computer science courses”. Winners announced April 2019.
– Stage 2 (grants up to $200,000): “We will support the spread and scale of the most promising approaches”.
   Deadline: Applications will be accepted from now through to December 13, 2018.
   Why it matters: Computers are general-purpose technologies, and so encouraging computer science practitioners to think about the ethical component of their work in a holistic, coupled manner may yield radical new designs for more positive and aware futures.
  Read more: Announcing a Competition for Ethics in Computer Science, with up to $3.5 Million in Prizes (Mozilla blog).

Augmenting human game designers with AI helpers:
…Turn-based co-design system lets an agent learn how you like to design levels…
Researchers with the Georgia Institute of Technology have developed a 2D platform game map editor which is augmented with a deep reinforcement learning agent that learns to suggest level alterations based on the actions of the designer.
  An endearing, frustrating experience: Like most things involving the day-to-day use of AI, the process can be a bit frustrating: after the level designer creates a series of platforms with gaps opening onto the space below, the AI persists in filling these holes in with its suggestions – despite getting a negative RL reward each time. “As you can see this AI loves to fill in gaps, haha,” says Matthew at one point.
  Creative: But it can also come up with interesting ideas. At one point the AI suggests a pipe flanked at the top on each side by single squares. “I don’t hate this. And it’s interesting because we haven’t seen this before,” he says. At another point he builds a mirror image of what the AI suggests, creating an enclosed area.
  Learning with you: The AI learns to transfer some knowledge between levels, as shown in the video. However, I expect it needs greater diversity and potentially larger game spaces to show what it can really do.
  Why it matters: AI tools can give all types of artists new tools with which to augment their own intelligence, and it seems like the adaptive learning capabilities of today’s RL+supervised learning techniques can make for potentially useful allies. I’m particularly interested in these kinds of constrained environments, like level design, where you ultimately want to follow a gradient towards an implicit goal.
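  The paper and video describe the system at a high level rather than as an API. Purely as a toy illustration of the core feedback loop – an agent whose reward comes from whether the designer keeps or undoes its suggestion – here is a minimal sketch; the class, the edit vocabulary, and the simulated designer are all invented for this example and are far simpler than the actual deep RL agent, which conditions its suggestions on the level state:

```python
import random

class ToyCoDesigner:
    """Toy stand-in for a co-design agent that learns from designer feedback.

    Unlike the real system, it ignores the level state entirely and just keeps
    per-edit value estimates, updated from the designer's accept/undo signal.
    """

    def __init__(self, edits=("fill_gap", "add_platform", "add_pipe", "add_enemy")):
        self.values = {edit: 0.0 for edit in edits}
        self.lr = 0.1       # learning rate for the value updates
        self.epsilon = 0.2  # chance of proposing a random edit, to keep exploring

    def suggest(self):
        # Epsilon-greedy: usually propose the edit the designer has liked most so far.
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def learn(self, edit, reward):
        # Nudge the value estimate towards the designer's accept (+1) / undo (-1) signal.
        self.values[edit] += self.lr * (reward - self.values[edit])


# Simulated designer who, like the one in the video, keeps undoing filled-in gaps.
agent = ToyCoDesigner()
for _ in range(200):
    edit = agent.suggest()
    reward = -1.0 if edit == "fill_gap" else 1.0
    agent.learn(edit, reward)
print(agent.values)  # 'fill_gap' should end up with the lowest learned value
```

  The real system also transfers some of what it learns between levels and proposes edits in the context of the current map, which this tabular toy deliberately ignores.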
  Watch the video of Matthew Guzdial narrating the level editor here (Youtube).
 Check out the research paper here: Co-Creative Level Design with Machine Learning (Arxiv).

Think robots are insecure? Think you can prove it? Enter a new “capture the flag” competition:
…Alias Robotics’ “Robot CTF” gives hackers nine challenges to test their robot-compromising skills…
Alias Robotics, a Spanish robot cybersecurity company, has released the Robotics Capture The Flag (RCTF), a series of nine scenarios designed to challenge wannabe-robot hackers. “The Robotics CTF is designed to be an online game, available 24/7, launchable through any web browser and designed to learn robot hacking step by step,” they write.
  Scenarios: The RCTF consists of nine scenarios that will challenge hackers to exfiltrate information from robots, snoop on robot operating system (ROS) traffic, find hardcoded credentials in ROS source code, and so on. One of the scenarios is listed as “coming soon!” and promises to give wannabe hackers access to “an Alias Robotics’ crafted offensive tool”.
  Free hacks! The researchers have released the scenarios under an open source TK license on GitHub. “We envision that as new scenarios become available, the sources will remain at this repository and only a subset of them will be pushed to our web servers http://rctf.aliasrobotics.com for experimentation. We invite the community of roboticists and security researchers to play online and get a robot hacker rank,” they write.
  Why it matters: Robotics is seen as one of the next frontiers for contemporary AI research and techniques, but as this research shows – along with other research on hacking physical robots covered in Import AI #109 – the substrates on which many robots are built are still quite insecure.
  Read more: Robotics CTF (RCTF), A Playground for Robot Hacking (Arxiv).
  Check out the competition and sign-up here (Alias Robotics website).

Fighting fires with drones and deep reinforcement learning:
…Forest fire: If you can simulate it, perhaps you can train an AI system to monitor it?…
Stanford University researchers have used reinforcement learning to train simulated drones to monitor wildfires better than a hand-coded baseline. The project highlights how many complex real-world tasks, like wildfire monitoring, can be represented as POMDPs (partially observable Markov decision processes), which are tractable for reinforcement learning algorithms.
  The approach works like this: The researchers build a simulator that lets them simulate wildfires in a grid-based way. They then populate this system with some simulated drones and use reinforcement learning to train the drones to effectively survey the fire and, most crucially, stay with the ‘fire front’ – the expanding frontier of the fire, and therefore the part with the greatest safety impact. “Each aircraft will get an observation of the fire relative to its own location and orientation. The observations are modeled as an image obtained from the true wildfire state given the aircraft’s current position and heading direction,” they write.
  Rewards: The reward function penalizes each aircraft for its distance from the fire front, for high bank angles, for flying too close to other aircraft, and for observing too many non-burning cells.
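  These penalty terms are described only qualitatively above. Purely as an illustrative sketch – the weights, view radius, separation threshold, and fire-front helper below are my assumptions, not the authors’ values – a shaped reward of this form might look like:

```python
import numpy as np

# Illustrative constants; the paper's exact values are not given here, so these are assumptions.
SAFE_SEPARATION = 3.0  # minimum comfortable separation between aircraft, in grid cells
VIEW_RADIUS = 2        # half-width of the square patch of cells an aircraft observes

def fire_front_cells(fire_map):
    """Return coordinates of burning cells that border at least one non-burning cell."""
    burning = fire_map.astype(bool)
    padded = np.pad(burning, 1, constant_values=False)
    has_unburnt_neighbour = (
        ~padded[:-2, 1:-1] | ~padded[2:, 1:-1] |
        ~padded[1:-1, :-2] | ~padded[1:-1, 2:]
    )
    return np.argwhere(burning & has_unburnt_neighbour)

def wildfire_reward(pos, bank_angle, other_positions, fire_map,
                    w_front=1.0, w_bank=0.1, w_prox=5.0, w_idle=0.05):
    """Penalty-only shaped reward for one aircraft; all weights are assumptions."""
    pos = np.asarray(pos, dtype=float)
    front = fire_front_cells(fire_map)
    # Penalty grows with distance (in cells) from the nearest fire-front cell.
    d_front = np.min(np.linalg.norm(front - pos, axis=1)) if len(front) else 0.0
    # High bank angles degrade the downward-facing camera's view of the fire.
    bank_penalty = abs(bank_angle)
    # Count other aircraft that are closer than the safe separation distance.
    too_close = sum(np.linalg.norm(np.asarray(p, dtype=float) - pos) < SAFE_SEPARATION
                    for p in other_positions)
    # Count non-burning cells in the local field of view (coverage wasted on empty ground).
    r, c = int(pos[0]), int(pos[1])
    view = fire_map[max(r - VIEW_RADIUS, 0): r + VIEW_RADIUS + 1,
                    max(c - VIEW_RADIUS, 0): c + VIEW_RADIUS + 1]
    idle_cells = int((view == 0).sum())
    return -(w_front * d_front + w_bank * bank_penalty
             + w_prox * too_close + w_idle * idle_cells)
```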
  Belief: The researchers also experiment with what they call a “belief-based approach”, which involves training the drones to maintain a shared “belief map” – a map of their environment indicating whether they believe particular cells contain fire – that is updated with real data taken during the simulated flight. This differs from a purely observation-based approach, which relies only on the observations seen by each drone.
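  As a minimal sketch of what such a shared belief map could look like – the blending update, the decay step, and all names here are illustrative assumptions rather than the paper’s exact formulation – consider:

```python
import numpy as np

class SharedBeliefMap:
    """Per-cell probability that a grid cell is on fire, shared across all aircraft."""

    def __init__(self, shape, prior=0.5):
        # Start maximally uncertain about every cell.
        self.belief = np.full(shape, prior)

    def update_from_observation(self, cell_indices, observed_fire, trust=0.9):
        """Blend an aircraft's fresh local observation into the shared map.

        cell_indices: (N, 2) integer array of observed cell coordinates.
        observed_fire: length-N boolean array from the aircraft's camera.
        trust: weight given to the new observation relative to the old belief.
        """
        rows, cols = cell_indices[:, 0], cell_indices[:, 1]
        self.belief[rows, cols] = (trust * observed_fire.astype(float)
                                   + (1 - trust) * self.belief[rows, cols])

    def decay_towards_uncertainty(self, rate=0.01):
        """Let unobserved cells drift back towards 0.5, since the fire spreads over time."""
        self.belief += rate * (0.5 - self.belief)
```

  In a setup like this, each drone’s policy input would include a local crop of the shared map alongside its raw camera observation.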
  Results: Two aircraft with nominal wildfire seed: Both the belief-based and observation-based methods obtain significantly higher rewards than a hand-programmed ‘receding horizon’ baseline. There is no comparison to human performance, though. The belief-based technique does eventually obtain a slightly higher final performance than the observation-based version, but it takes longer to converge to a good solution.
  Results: Greater than two aircraft: The system is able to scale to dealing with numbers of aircraft greater than two, but this requires the tweaking of a proximity-based reward to discourage collisions.
  Results: Different wildfires: The researchers test their system on two differently shaped wildfires (a t-shape and an arc) and show that both RL-based methods exceed performance of the baseline, and that the belief-based system in particular does well.
  Why it matters: We’ve already seen states like California use human-piloted drones to help emergency responders deal with wildfires. As we head into a more dangerous future defined by an increase in the rate of extreme weather events driven by global warming, I am curious to see how we might use AI techniques to create certain autonomous surveillance and remediation abilities, like those outlined in this study.
  Caveat: Like all studies that show success in simulation, I’ll retain some skepticism till I see such techniques tested on real drones in physical reality.
   Read more: Distributed Wildfire Surveillance With Autonomous Aircraft Using Deep Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Pentagon’s AI ethics review taking shape:
   The Defense Innovation Board met last week to present some initial findings of their review of the ethical issues in military AI deployment. The DIB is the Pentagon’s advisory panel of experts drawn largely from tech and academia. Speakers covered issues ranging from autonomous weapons systems, to the risk posed by incorporating AI into existing nuclear weapons systems.
   The Board plans to present their report to Congress in April 2019.
  Read more: Defense Innovation Board to explore the ethics of AI in war (NextGov).
  Read more: DIB public meeting (DoD).

Google withdraws bid for $10bn Pentagon contract:
Google has withdrawn its bid for the Pentagon’s latest cloud contract, JEDI, citing uncertainty over whether the work would align with its AI principles.
  Read more: Google drops out of Pentagon’s $10bn cloud competition (Bloomberg).

Microsoft employees call for company to not pursue $10bn Pentagon contract:
Following Google’s decision to not bid on JEDI, people identifying themselves as employees at Microsoft published an open letter asking the company to follow suit and withdraw its own bid on the project. (Microsoft submitted a bid for JEDI following the publication of the letter.)
  Read more: An open letter to Microsoft (Medium).

Future of Humanity Institute receives £13m funding:
FHI, the multidisciplinary institute at the University of Oxford led by Nick Bostrom, has received a £13.3m donation from the Open Philanthropy Project. This represents a material uptick in funding for AI safety research. The field as a whole, including work done in universities, non-profits and industry, spent c.$10m in 2017, c.$6.5m in 2016, and c.$3m in 2015, according to estimates from the Center for Effective Altruism.
  Read more: £13.3m funding boost for FHI (FHI).
  Read more: Changes in funding in the AI safety field (CEA).

Tech Tales:

The Watcher We Nationalized

So every day when you wake up as the head of this government you check The Watcher. It has an official name – a lengthy acronym that expands to list some of the provenance of its powerful technologies – but mostly people just call it The Watcher or sometimes The Watch and very rarely Watcher.

The Watcher is composed of intelligence taps placed on most of the world’s large technology companies. Data gets scraped out of them and combined with various intelligence sources to give the head of state access to their own supercharged search engine. Spook Google! Is what British tabloids first called it. Fedbook! Is what some US press called it. And so on.

All you know is that you start your day with The Watcher and you finish your day with it. When you got into office, several years ago, you were met by a note from your predecessor. Nothing you do will show up in Watcher, unless something terrible happens; get used to it, read the note.

They were right, mostly. Your jobs bill? Out-performed by some viral memes relating to a (now disgraced) celebrity. The climate change investment? Eclipsed by a new revelation about a data breach at one of the technology companies. In fact, the only thing so far that registered on The Watcher from your part of the world was a failed suitcase bombing attempt on a bank.

Now, heading towards the end of your premiership, you hold onto this phrase and say it to yourself every morning, right before you turn on The Watcher and see what the rhythm of the world says about the day to come. “Nothing you do will show up in Watcher, unless something terrible happens; get used to it”, you say to yourself, then you turn it on.

Things that inspired this story: PRISM, intelligence services, governments built on companies like so many houses of cards, small states, Europe, the tedium of even supposedly important jobs, systems.