Import AI

Import AI 129: Uber’s POET creates its own curriculum; improving old games with ESRGAN; and controlling drones with gestures via UAV-Gesture

Want 18 million labelled images? Tencent has got you covered:
…Tencent ML-Images merges ImageNet and Open Images together…
Data details: Tencent ML-Images is made of a combination of existing image databases such as ImageNet and Open Images, as well as associated class vocabularies. The new dataset contains 18 million images across 11,000 categories; on average, each image has eight tags applied to it.
  Transfer learning: The researchers train a ResNet-101 model on Tencent ML-Images, then finetune this pre-trained model on the ImageNet dataset and obtain scores in line with the state-of-the-art. One notable claim is a score of 80.73% top-1 accuracy on ImageNet, which the authors compare favorably to a Google system pre-trained on an internal Google dataset called JFT-300M and fine-tuned on ImageNet – it’s not clear to me why the authors would get a higher score than Google, when Google has roughly 17X the amount of data available to it for pre-training (JFT contains ~300 million images).
  Why this matters: Datasets are one of the key inputs into the practice of AI research, and having access to larger-scale datasets will let researchers do two useful things: 1) Check promising techniques for robustness by seeing if they break when exposed to scaled-up datasets, and 2) Encourage the development of newer techniques that would otherwise overfit on smaller datasets (by some metrics, ImageNet is already quite well taken care of by existing research approaches, though more work is needed for things like improving top-1 accuracy).
  Read more: Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning (Arxiv).
  Get the data: Tencent ML-Images (Github).

Want an AI that teaches itself how to evolve? You want a POET:
…Uber AI Labs research shows how to create potentially infinite curriculums…
What happens when machines design and solve their own curriculums? That’s an idea explored in a new research paper from Uber AI Labs. The researchers introduce Paired Open-Ended Trailblazer (POET), a system that aims to create machines with this capability “by evolving a set of diverse and increasingly complex environmental challenges at the same time as collectively optimizing their solutions”. Most research is a form of educated bet, and that’s the case here: “An important motivating hypothesis for POET is that the stepping stones that lead to solutions to very challenging environments are more likely to be found through a divergent, open-ended process than through a direct attempt to optimize in the challenging environment,” they write.
  Testing in 2D: The researchers test POET in a 2-D environment where a robot is challenged to walk across a varied obstacle course of terrain. POET discovers behaviors that – the researchers claim – “cannot be found directly on those same environmental challenges by optimizing on them only from scratch; neither can they be found through a curriculum-based process aimed at gradually building up to the same challenges POET invented and solved”.
  How POET works: Unlike human poets, who work on the basis of some combination of lived experience and a keen sense of anguish, POET derives its power from an algorithm called ‘trailblazer’. Trailblazer works by starting with “a simple environment (e.g. an obstacle course of entirely flat ground) and a randomly initialized weight vector (e.g. for a neural network)”. The algorithm then performs three tasks at each iteration of the loop: it generates new environments from those currently active, optimizes paired agents within their respective environments, and tries to transfer current agents from one environment to another. The researchers use Evolution Strategies from OpenAI to compute each iteration “but any reinforcement learning algorithm could conceivably apply”.
  The secret is Goldilocks: POET tries to create what I’ll call ‘goldilocks environments’, in the sense that “when new environments are generated, they are not added to the current population of environments unless they are neither too hard nor too easy for the current population”. During training, POET creates an expanding set of environments which are made by modifying various obstacles within the 2D environment the agent needs to traverse.
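  A toy sketch of the loop: To make the generate / optimize / transfer loop and the ‘goldilocks’ check concrete, here is a deliberately tiny Python sketch. Everything in it is an assumption made for illustration rather than anything from the paper: environments are a single difficulty number, agents are a single skill number, the reward function and the 0.2 to 0.9 acceptance band are made up, old environments are simply dropped once a cap is reached, and a simple hill-climbing step stands in for Evolution Strategies.

```python
import random


def score(skill, difficulty):
    """Toy reward in [0, 1]: 1.0 once skill covers the difficulty, dropping linearly below it."""
    return max(0.0, 1.0 - max(0.0, difficulty - skill))


def is_goldilocks(skill, difficulty, lo=0.2, hi=0.9):
    """Minimal criterion: accept an environment only if it is neither too hard nor too easy."""
    return lo <= score(skill, difficulty) <= hi


def optimize(skill, difficulty, rng, sigma=0.1, samples=8):
    """One hill-climbing step (a stand-in for ES): keep the best of a few random perturbations."""
    candidates = [skill] + [skill + rng.gauss(0.0, sigma) for _ in range(samples)]
    return max(candidates, key=lambda s: score(s, difficulty))


def poet(iterations=200, max_envs=5, mutate_every=10, seed=0):
    rng = random.Random(seed)
    # Each pair is (environment difficulty, agent skill); start on "flat ground".
    pairs = [(0.0, 0.0)]
    for step in range(iterations):
        # 1) Generate: mutate an active environment, keep the child only if it passes the check.
        if step % mutate_every == 0:
            parent_env, parent_skill = rng.choice(pairs)
            child_env = parent_env + rng.uniform(0.0, 1.0)
            if is_goldilocks(parent_skill, child_env):
                # Assumption of this sketch: drop the oldest pairs once the cap is hit.
                pairs = (pairs + [(child_env, parent_skill)])[-max_envs:]
        # 2) Optimize: improve each agent inside its own paired environment.
        pairs = [(env, optimize(skill, env, rng)) for env, skill in pairs]
        # 3) Transfer: each environment adopts whichever current agent scores best on it.
        skills = [skill for _, skill in pairs]
        pairs = [(env, max(skills, key=lambda s: score(s, env))) for env, _ in pairs]
    return pairs


if __name__ == "__main__":
    for env, skill in poet():
        print(f"difficulty={env:.2f} skill={skill:.2f} score={score(skill, env):.2f}")
```

  Note that the toy reward is flat (zero) for environments far beyond an agent’s skill, so hill-climbing directly on a very hard environment gets no signal; the acceptance band is what keeps new environments within reach of the current agents.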
  Results: Systems trained with POET learn to solve environments that systems trained from scratch with Evolution Strategies cannot. The authors theorize that this is because newer environments in POET are created through mutations of older environments, and because POET only accepts new environments that are neither too easy nor too hard for current agents, POET implicitly builds a curriculum for learning each environment it creates.
  Why it matters: Approaches like POET show how researchers can essentially use compute to generate arbitrarily large amounts of data to train systems on, and highlight how training regimes that involve an interactive loop between an agent, an environment, and a governing system for creating agents and environments can create more capable systems than those that would be derived otherwise. Additionally, the implicit idea governing the POET paper is that systems like this are a good fit for any problem where computers need to be able to learn flexible behaviors that deal with unanticipated scenarios. “POET also offers practical opportunities in domains like autonomous driving, where through generating increasingly challenging and diverse scenarios it could uncover important edge cases and policies to solve them,” the researchers write.
  Read more: Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions (Arxiv).

Making old games look better with GANs:
…ESRGAN revitalises Max Payne…
A post to the Gamespot video gaming forums shows how ESRGAN – Enhanced Super Resolution Generative Adversarial Networks – can improve the graphics of old games like Max Payne. ESRGAN gives game modders the ability to upscale old game textures through the use of GANs, improving the appearance of old games.
  Read more: Max Payne gets an amazing HD Texture Pack using ESRGAN that is available for download (Dark Side of Gaming).

Google teaches AI to learn to semantically segment objects:
…Auto-DeepLab takes neural architecture search to harder problem domain…
Researchers with Johns Hopkins University, Google, and Stanford University have created an AI system called Auto-DeepLab that has learned to perform efficient semantic segmentation of images – a challenging task in computer vision, which requires labeling the various objects in an image and understanding their borders. The system developed by the researchers uses a hierarchical search function both to come up with specific neural network cell designs that inform layer-wise computations, and to figure out the overall network architecture that chains these cells together. “Our goal is to jointly learn a good combination of repeatable cell structure and network structure specifically for semantic image segmentation,” the researchers write.
  Efficiency: One of the drawbacks of neural architecture search approaches is the inherent computational expense, with many techniques demanding hundreds of GPUs to train systems. Here, the researchers show that their approach is efficient, able to find well-performing architectures for semantic segmentation of the ‘Cityscapes’ dataset in about 3 days of one P100 GPU.
  Results: The network comes up with an effective design, as evidenced by the results on the Cityscapes dataset. “With extra coarse annotations, our model Auto-DeepLab-L, without pretraining on ImageNet, achieves the test set performance of 82.1%, outperforming PSPNet and Mapillary, and attains the same performance as DeepLabv3+ while requiring 55.2% fewer Multi-Adds computations.” The model gets close to state-of-the-art on PASCAL VOC 2012 and on ADE20K.
  Why it matters: Neural architecture search gives AI researchers a way to use compute to automate themselves, so the extension of NAS from helping with supervised classification, to more complex tasks like semantic segmentation, will allow us to automate more and more bits of AI research, letting researchers specialize to come up with new ideas.
   Read more: Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation (Arxiv).

UAV-Gesture means that gesturing at drones now has a purpose:
…Flailing at drones may go from a hobby of lunatics to a hobby of hobbyists, following dataset release…
Researchers with the University of South Australia have created a dataset of people performing 13 gestures that are designed to be “suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals”. These actions include things like hover, move to left, land, land in a specific direction, slow down, move upward, and so on.
  The dataset: The dataset consists of footage “collected on an unsettled road located in the middle of a wheat field from a rotorcraft UAV (3DR Solo) in slow and low-altitude flight”. It contains 37,151 frames distributed over 119 videos recorded at 1920 x 1080 resolution and 25 fps. Each gesture is performed by different human actors, and eight different people are filmed overall.
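  Back-of-envelope scale: The figures above imply fairly short clips. A quick Python check, using only the frame count, video count, and frame rate quoted above (the averages are my own derivation, not numbers reported by the authors):

```python
FRAMES = 37_151
VIDEOS = 119
FPS = 25

frames_per_video = FRAMES / VIDEOS          # average clip length in frames
seconds_per_video = frames_per_video / FPS  # average clip length in seconds
total_minutes = FRAMES / FPS / 60           # total footage in minutes

print(f"average frames per video:  {frames_per_video:.1f}")
print(f"average seconds per video: {seconds_per_video:.1f}")
print(f"total footage in minutes:  {total_minutes:.1f}")
```

  That works out to roughly 312 frames (about 12.5 seconds) per video, and a little under 25 minutes of footage in total.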
  Get the dataset…eventually: The dataset “will be available soon”, the authors write on GitHub. (UAV-Gesture, Github).
  Natural domain randomization: “When recording the gestures, sometimes the UAV drifts from its initial hovering position due to wind gusts. This adds random camera motion to the videos making them closer to practical scenarios.”
  Experimental baseline: The researchers train a Pose-based Convolutional Neural Network (P-CNN) on the dataset and obtain an accuracy of 91.9%.
  Why this matters: Drones are going to be one of the most visible areas where software-based AI advances are going to impact the real world, and the creation (and eventual release) of datasets like UAV-Gesture will increase the number of people able to build clever systems that can be deployed onto drones, and other platforms.
  Read more: UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition (Arxiv).

Contemplating the use of reinforcement learning to improve healthcare? Read this first:
…Researchers publish a guide for people keen to couple RL to human lives…
As AI researchers start to apply reinforcement learning systems in the real world, they’ll need to develop a better sense of the many ways in which RL approaches can lead to subtle failures. A new short paper published by an interdisciplinary team of researchers tries to think through some of the trickier issues implied by deploying AI in the real world. It identifies “three key questions that should be considered when reading an RL study”. These are: Is the AI given access to all variables that influence decision making? How big was that big data, really? And will the AI behave prospectively as intended?
  Why this matters: While these questions may seem obvious, it’s crucial that researchers stress them in well known venues like Nature – I think this is all part of normalizing certain ideas around AI safety within the broader research community, and it’s encouraging to be able to go from abstract discussions to more grounded questions/principles that people may wish to apply when building systems.
  Read more: Guidelines for reinforcement learning in healthcare (Nature).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

What does the American public think about AI?
Researchers at the Future of Humanity Institute have surveyed 2,000 Americans on their attitudes towards AI.
  Public expecting rapid progress: Asked to predict when machines will exceed human performance in almost all economically-relevant tasks, the median respondent predicted a 54% chance by 2028. This is considerably sooner than recent surveys of AI experts.
  AI fears not confined to elites: A substantial majority (82%) believe AI/robots should be carefully managed. Support for developing AI was stronger among high-earners, those with computer science or programming experience, and the highly-educated.
  Lack of trust: Despite their support for careful governance, Americans do not have high confidence in any particular actors to develop AI for the public benefit. The US military was the most trusted, followed by universities and non-profits. Government agencies were less trusted than tech companies, with the exception of Facebook, which was the least trusted of any actor.
  Why it matters: Public attitudes are likely to significantly shape the development of AI policy and governance, as has been the case for many other emergent political issues (e.g. climate change, immigration). Understanding these attitudes, and how they change over time, is crucial in formulating good policy responses.
  Read more: Artificial Intelligence: American Attitudes and Trends (FHI).
  Read more: The American public is already worried about AI catastrophe (Vox).

International Panel on AI:
France and Canada have announced plans to form an International Panel on AI (IPAI), to encourage the adoption of responsible and “human-centric” AI. The body will be modeled on the Intergovernmental Panel on Climate Change (IPCC), which has led international efforts to understand the impacts of global warming. The IPAI will consolidate research into the impacts of AI, produce reports for policy-makers, and support international coordination.
  Read more: Mandate for the International Panel on Artificial Intelligence.

Tech Tales:

The Propaganda Weather Report

Starting off this morning we’re seeing a mass of anti-capitalist ‘black bloc’ content move in from 4chan and Reddit onto the more public platforms. We expect the content to trigger counter-content creation from the far-right/nationalist bot networks. There have been continued sightings of synthetically-generated adverts for a range of libertarian candidates, and in the past two days these ads have increasingly been tied to a new range of dreamed-up products from the Chinese netizen feature embedding space.

We advise all of today’s content travelers to set their skepticism to high levels. And remember, if someone starts talking to you outside of your normal social network, take all steps to verify their identity and, if unsuccessful, prevent the conversation from continuing – it takes all of human society to work together to protect ourselves from subversive digital information attacks.

Things that inspired this story: Bot propaganda, text and image generation, weather reports, the Shipping Forecast, the mundane as the horrific and the horrific as the mundane, the commodification of political discourse as just another type of ‘content’, the notion that media in the 21st century is fundamentally a ‘bot’ business rather than human business.

Import AI 128: Better pose estimation through AI; Amazon Alexa gets smarter by tapping insights from Alexa Prize, and differential privacy gets easier to implement in TensorFlow

How to test vision systems for reliability: sample from 140 public security cameras:
…More work needed before everyone can get cheap out-of-the-box low light object detection…
Are benchmarks reliable? That’s a question many researchers ask themselves, whether testing supervised learning or reinforcement learning algorithms. Now, researchers with Purdue University, Loyola University Chicago, Argonne National Laboratory, Intel, and Facebook have tried to create a reliable, real world benchmark for computer vision applications. The researchers use a network of 140 publicly accessible camera feeds to gather 5 million images over a 24 hour period, then test a widely deployed ‘YOLO’ object detector against these images.
  Data: The researchers generate the data for this project by pulling information from CAM2, the Continuous Analysis of Many CAMeras project, which is built and maintained by Purdue University researchers.
  Can you trust YOLO at night? YOLO performance degrades at night, causing the system to fail to detect cars when they are illuminated only by streetlights (and conversely, at night it sometimes mistakes streetlights for vehicles’ headlights, causing it to label lights as cars).
  Is YOLO consistent? YOLO’s performance isn’t as consistent as people might hope – there are frequent cases where YOLO’s predictions for the total number of cars parked on a street vary over time.
  Big clusters: The researchers used two supercomputing clusters to perform image classification: one cluster used a mixture of Intel Skylake CPU and Knights Landing Xeon Phi cores, and the other cluster used a combination of CPUs and NVIDIA dual-K80 GPUs. The researchers used this infrastructure to process data in parallel, but did not analyze the different execution times on the different hardware clusters.
  Labeling: The researchers estimate it would take approximately 600 days to label all 5 million images, so instead they label a subset (13,440 images), then check labels from YOLO against this test set.
  Why it matters: As AI industrializes, being able to generate trustworthy data about the performance of systems will be crucial to giving people the confidence necessary to adopt the technology; tests like this both show how to create new, large-scale robust datasets to test systems, and indicate that we need to develop more effective algorithms to have systems sufficiently powerful for real-world deployment.
  Read more: Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions (Arxiv).
  Read more about the dataset (CAM2 site).

Amazon makes Alexa smarter and more conversational via the Alexa Prize:
…Report analyzing results of this year’s competition…
Amazon has shared details of how it improved the capabilities of its Alexa personal assistant through running the Alexa open research prize. The tl;dr is that inventions made by the 16 participating teams during the competition have improved Alexa in the following ways: “driven improved experiences by Alexa users to an average rating of 3.61, median duration of 2 mins 18 seconds, and average [conversation] turns to 14.6, increases of 14%, 92%, 54% respectively since the launch of the 2018 competition”, Amazon wrote.
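  What the percentages imply: If those increases are read as simple relative gains over the values at the launch of the 2018 competition (my assumption; Amazon may calculate them differently), the starting points can be backed out with a few lines of Python:

```python
def implied_baseline(current, relative_increase):
    """Value before the increase, assuming current = baseline * (1 + relative_increase)."""
    return current / (1.0 + relative_increase)


# (current value, relative increase) as quoted by Amazon
metrics = {
    "average rating": (3.61, 0.14),
    "median duration (seconds)": (2 * 60 + 18, 0.92),
    "average turns": (14.6, 0.54),
}

for name, (current, increase) in metrics.items():
    print(f"{name}: {implied_baseline(current, increase):.2f} -> {current}")
```

  Under that reading, the average rating started at about 3.17, the median conversation lasted about 72 seconds, and conversations averaged about 9.5 turns.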
  Significant speech recognition improvements: The competition has also meaningfully improved the speech recognition performance of Amazon’s system – significant, given how fundamental speech is to Alexa. “For conversational speech recognition, we have improved our relative Word Error Rate by 55% and our relative Entity Error Rate by 34% since the launch of the Alexa Prize,” Amazon wrote. “Significant improvement in ASR quality have been obtained by ingesting the Alexa Prize conversation transcriptions in the models” as well as through algorithmic advancements developed by the teams, they write.
  Increasing usage: As the competition was in its second year in 2018, Amazon now has some comparative data on general growth in Alexa usage. “Over the course of the 2018 competition, we have driven over 60,000 hours of conversations spanning millions of interactions, 50% higher than we saw in the 2017 competition,” they wrote.
  Why it matters: Competitions like this show how companies can use deployed products to tempt researchers into doing work for them, and highlight how the platforms will likely trade access to AI agents (eg, Alexa) in exchange for the ideas of researchers. It also highlights the benefit of scale: it would be comparatively difficult for a startup with a personal assistant with a small install base to run a competition offering the same scale and diversity of interaction as the Alexa Prize.
  Read more: Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize (Arxiv).

Chinese researchers create high-performance ‘pose estimation’ network:
…Omni-use technology highlights the challenges of AI policy; pose estimation can help us make better games and help people get fit, but can also surveil people…
Researchers with facial recognition startup Megvii, Inc; Shanghai Jiao Tong University; Beihang University, and Beijing University of Posts and Telecommunications have improved the performance of surveillance AI technologies via implementing what they call a ‘multi-stage pose estimation network’ (MSPN). Pose estimation is a general purpose computer vision capability that lets people figure out the wireframe skeleton of a person from images and/or video footage – this sort of technology has been widely used for things like CGI and game playing (eg, game consoles might extract poses from people via cameras like the Kinect and use this to feed the AI component of an interactive fitness video game, etc). It also has significant applications for automated surveillance and/or image/video analysis, as it lets you label large groups of people from their poses – one can imagine the utility of being able to automatically flag if a crowd of protestors display a statistically meaningful increase in violent behaviors, or being able to isolate the one person in a crowded train station who is behaving unusually.
  How it works: The MSPN has three tweaks that the researchers say explain its performance: changes to the main classification module to prevent information being lost when images are downscaled during processing; improved pose localization through a coarse-to-fine supervision strategy; and sharing more features across the network during training.
  Results: “New state-of-the-art performance is achieved, with a large margin compared to all previous methods,” the researchers write. Some of the baselines they test against include: AE, G-RMI, CPN, Mask R-CNN, and CMU Pose. The MSPN obtains state-of-the-art scores on the COCO test set, with versions of the MSPN that use purely COCO test-dev data managing to score higher than some systems which augmented themselves with additional data.
  Why it matters: AI is, day in day out, improving the capabilities of automated surveillance systems. It’s worth remembering that for a huge number of areas of AI research, progress in any one domain (for instance, an improved architecture for supervised classification like Residual Networks) can have knock-on effects in other more applied domains, like surveillance. This highlights both the omni-use nature of AI, as well as the difficulty of differentiating between benign and less benign applications of the technology.
  Read more: Rethinking on Multi-Stage Networks for Human Pose Estimation (Arxiv).

Making deep learning more secure: Google releases TensorFlow Privacy
…New library lets people train models compliant with more stringent user data privacy standards…
Google has released TensorFlow Privacy, a free Python library which lets people train TensorFlow models with differential privacy. Differential privacy is a technique for training machine learning systems in a way that increases user privacy by letting developers set various tradeoffs relating to the amount of noise applied to the user data being processed. The theory works like this: given a large enough number of users, you can add some noise to individual user data to anonymize them, but continue to extract a meaningful signal out of the overall blob of patterns in the combined pool of fuzzed data – if you have enough of it. And Apple does (as do other large technology companies, like Amazon, Google, Microsoft, etc).
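  A toy illustration of the idea: The “noise per user, signal in aggregate” intuition can be shown with randomized response, a classic technique in which each user sometimes answers a yes/no question with a coin flip instead of the truth. To be clear, this is not the TensorFlow Privacy API, which applies privacy at the level of model training rather than survey-style answers; the function names, the 50% truth probability, and the 30% true rate below are all assumptions made up for this sketch.

```python
import random


def randomize(bit, rng, p_truth=0.5):
    """Report the true bit with probability p_truth, otherwise report a fair coin flip."""
    if rng.random() < p_truth:
        return bit
    return 1 if rng.random() < 0.5 else 0


def estimate_rate(reports, p_truth=0.5):
    """Undo the known noise: observed = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth) * 0.5) / p_truth


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: each user holds one private bit, about 30% of them are 1.
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(200_000)]
    reports = [randomize(bit, rng) for bit in true_bits]
    print(f"true rate      = {sum(true_bits) / len(true_bits):.3f}")
    print(f"estimated rate = {estimate_rate(reports):.3f}")
```

  Any single report is deniable, because it may have been a coin flip, but with enough users the aggregate estimate lands close to the true rate. With only a handful of users the estimate would be far noisier, which is the “if you have enough of it” caveat.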
  Apple + Differential Privacy: Apple was one of the first large consumer technology companies to publicly state it had begun to use differential privacy, announcing in 2016 that it was using the technology to train large-scale machine learning models over user data without compromising on privacy.
  Why it matters: As AI industrializes, adoption will be sped up by coming up with AI training methodologies that better preserve user privacy – this will also ease various policy challenges associated with the deployment of large-scale AI systems. Since TensorFlow is already very widely used, the addition of a dedicated library for implementing well-tested differential privacy systems will help more developers experiment with this technology, which will improve it and broaden its dissemination over time.
  Read more: TensorFlow Privacy (TensorFlow GitHub).
  Read more: Differential Privacy Overview (Apple, PDF).

Indian researchers make a DIY $1,000 Robot Dog named Stoch:
…See STOCH walk!, trot!, gallop!, and run!…
Researchers with the Center for Cyber Physical Systems, IISc, Bengaluru, India, have published a recipe that lets you build a $1,000 quadrupedal robot named Stoch that, if you squint, looks like a cheerful robot dog.
  Stoch the $1,000 robot dog: Typical robot quadrupeds like the MIT Cheetah or Boston Dynamics’ Spot Mini cost on the order of $30,000 to manufacture, the researchers write (part of this is from more expensive and accurate sensing and actuator equipment). Stoch is significantly cheaper because of a hardware design based on widely available off-the-shelf materials combined with non-standard 3D-printed parts that can be made in-house. The researchers also provide software for teleoperation of the robot and a basic walking controller.
  Stoch – small stature, large (metaphorical) heart: “The Stoch is designed equivalent to the size of a miniature Pinscher dog”, they write. (I find this endears Stoch to me even more).
  Basic movements – no deep learning required: To get robots to do something like walk you can either learn a model from data, or you can code one yourself. The researchers mostly do the latter here, using nonlinear coupled differential equations to generate coordinates which are then used to generate joint angles via inverse kinematics. The researchers implement a few different movement policies on Stoch, and have published a video showing the really quite-absurdly cute robot dog walking, trotting, galloping and – yes! – bounding. It’s delightful. At the core of the robot is a Raspberry Pi 3b board which communicates via PWM drivers with the robot’s four leg modules.
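  A sketch of the hand-coded approach: To give a flavor of “generate coordinates, then turn them into joint angles”, here is a small Python sketch. It is a simplification under stated assumptions, not Stoch’s controller: the leg is modeled as a plain two-link planar leg, the link lengths and step sizes are invented, and a fixed elliptical foot path stands in for the coupled differential equations the researchers use.

```python
import math


def foot_target(phase, center=(0.0, -0.18), step_length=0.08, step_height=0.04):
    """Stand-in trajectory: an ellipse traced once per gait cycle (phase in [0, 1))."""
    angle = 2.0 * math.pi * phase
    x = center[0] + 0.5 * step_length * math.cos(angle)
    y = center[1] + 0.5 * step_height * math.sin(angle)
    return x, y


def inverse_kinematics(x, y, upper=0.12, lower=0.12):
    """Return (hip, knee) angles in radians that place the foot at (x, y), hip at the origin."""
    cos_knee = (x * x + y * y - upper * upper - lower * lower) / (2.0 * upper * lower)
    if not -1.0 <= cos_knee <= 1.0:
        raise ValueError("target is out of reach for this leg")
    knee = math.acos(cos_knee)
    hip = math.atan2(y, x) - math.atan2(lower * math.sin(knee), upper + lower * math.cos(knee))
    return hip, knee


def forward_kinematics(hip, knee, upper=0.12, lower=0.12):
    """Foot position for given joint angles; used here only to check the inverse solution."""
    x = upper * math.cos(hip) + lower * math.cos(hip + knee)
    y = upper * math.sin(hip) + lower * math.sin(hip + knee)
    return x, y


if __name__ == "__main__":
    for i in range(4):
        x, y = foot_target(i / 4)
        hip, knee = inverse_kinematics(x, y)
        print(f"phase={i / 4:.2f} foot=({x:+.3f}, {y:+.3f}) "
              f"hip={math.degrees(hip):+.1f} deg knee={math.degrees(knee):+.1f} deg")
```

  On a real robot the resulting angles would be sent to the leg servos each control tick, with the four legs running the same cycle at different phase offsets to produce different gaits.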
  Why it matters – a reminder: Lots of robot companies choose to hand-code movements, usually by performing some basic well-understood computation over sensor feedback to let robots hop, walk, and run. AI systems may let us learn far more complex movements, like OpenAI’s work on manipulating a cube with a Shadowhand, but these approaches are currently data and compute-intensive and may require more work on generalization to be as applicable as hand-coded techniques. Papers like this show how for some basic tasks it’s possible to implement well-documented non-DL systems and get basic performance.
  Why it matters – everything gets cheaper: One central challenge for technology policy is that technology seems to get cheaper over time – for example, back in ~1999 the Japanese government briefly considered imposing export controls on the PS2 console over worries about the then-advanced chips inside it being put to malicious uses (whereas today’s chips are significantly more powerful and are in everyone’s smartphones). This paper is an example of how innovations in 3D printing and second-order effects from other economies of scale (eg, some parts of this robot are made of carbon fibre) can bring surprisingly futuristic-seeming robot platforms within economic reach of larger numbers of people.
  Watch STOCH walk, trot, gallop, and bound! (Video Results_STOCH (Youtube)).
  Read more: Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch (Arxiv).
  Read more: Military fears over PlayStation2, BBC News, Monday 17 April 2000 (BBC News).

Helping blind people shop with ‘Grocery Store Dataset’:
…Spare a thought for the people that gathered ~5,000 images from 18 different stores…
Researchers with KTH Royal Institute of Technology and Microsoft Research have created and released a dataset of common grocery store items to help AI researchers train better computer vision systems. The dataset labels have a hierarchical structure, labeling a multitude of objects with both coarse and fine-grained labels.
  Dataset ingredients: The researchers collected data using a 16-megapixel Android smartphone camera and photographed 5125 images of various items in the fruit and vegetable and refrigerated dairy/juice sections of 18 different grocery stores. The dataset contains 81 fine-grained products (which the researchers call classes) which are each accompanied with the following information: “an iconic image of the item and also a product description including origin country, an appreciated weight and nutrient values of the item from a grocery store website”.
  Dataset baselines: The researchers run some baselines over the dataset. These use the CNN architectures AlexNet, VGG16, and DenseNet-169 for feature extraction, then pair the resulting feature vectors with systems that use VAEs to develop a feature representation of the entities in the dataset, which leads to improved classification accuracy.
  Why it matters: The researchers think systems like this can be used “to train and benchmark assistive systems for visually impaired people when they shop in a grocery store. Such a system would complement existing visual assistive technology, which is confined to grocery items with barcodes.” It also seems to follow that the same technology could be adapted for use in building stores with fully-automated checkout systems in the style of Amazon Go.
  Get the data: Grocery Store Dataset (GitHub).
  Read more: A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels (Arxiv).

OpenAI / Import AI Bits & Pieces:

Neo-feudalism, geopolitics, communication, and AI:
…Jack Clark and Azeem Azhar assess what progress in AI means for politics…
I spent this Christmas season in the UK and had the good fortune of being able to sit and talk with Azeem Azhar, AI raconteur and author of the stimulating Exponential View newsletter. We spoke for a little over an hour for the Exponential View podcast, talking about what the political aspects of AI are, and what it means. If you’re at all curious as to how I view the policy challenge of AI, then this may be a good place to start as I lay out a number of my concerns, biases, and plans. The tl;dr is that I think AI practitioners should acknowledge the implicitly political nature of the technology they are developing and act accordingly, which requires more intentional communication to the general public and policymakers, as well as a greater investment into understanding what governments are thinking about with regards to AI and how actions by other actors, eg companies, could influence these plans.
  Listen to the podcast here (Exponential View podcast).
  Check out the Exponential View here (Exponential View archive).

Tech Tales:

The Life of the Party

On certain days, the property comes alive. The gates open. Automated emails are sent to residents of the town:
come, join us for the Easter Egg hunt! Come, celebrate the festive season with drone-delivered, robot-made eggnog; Come, iceskate on the flat roof of the estate; Come, as our robots make the largest bonfire this village has seen since the 17th century.

Because they were rich, The Host died more slowly than normal people, and the slow pace of his decline combined with his desire to focus on the events he hosted and not himself meant that to many children – and even some of their parents – he and his estate had forever been a part of the town. The house had always been there, with its gates, and its occasional emails. If you grew up in the town and you saw fireworks coming from the north side of town then you knew two things: there was a party, and you were both late and invited.

Keen to show he still possessed humor, The Host once held a Halloween event with himself in costume: Come, make your way through the robot house, and journey to see The (Friendly) Monster(!) at its heart. (Though some children were disturbed by their visit with The Host and his associated life-support machines, many told their parents that they thought it was “so scary it was cool”; The Host signalled he did not wish to be in any selfies with the children, so there’s no visual record of this, but one kid did make a meme to commemorate it: they superimposed a vintage photo of The Host’s face onto an ancient still of the monster from Frankenstein – unbeknownst to the kid who made it, The Host subsequently kept a laminated printout of this photo on his desk.)

We loved these parties and for many people they were highlights of the year – strange, semi-random occasions that brought every person in the town together, sometimes with props, and always with food and cheer.

Of course, there was a trade occurring. After The Host died and a protracted series of legal battles with his estate eventually led to the release of certain data relating to the events, we learned the nature of this trade: in exchange for all the champagne, the robots that learned to juggle, the live webcam feeds from safari parks beamed in and projected on walls, the drinks that were themselves tailored to each individual guest, the rope swings that hung from ancient trees that had always had rope swings, leading the rope to bite into the bark and the children to call them “the best swings in the entire world”; in exchange for all of this, The Host had taken something from us: our selves. The cameras that watched us during the events recorded our movements, our laughs, our sighs, our gossip – all of it.

Are we angry? Some, but not many. Confused? I think none of us are confused. Grateful? Yes, I think we’re all grateful for it. It’s hard to begrudge what The Host did – fed our data, our body movements, our speech, into his own robots, so that after the parties had ended and the glasses were cleaned and the corridors vacuumed, he could ask his robots to hold a second, private party. Here, we understand, The Host would mingle with guests, going in his motorized chair through the crowds of robots and listening intently to conversations, or pausing to watch two robots mimic two humans falling in love.

It is said that, on the night The Host died, a band of teenagers near the corner of the estate piloted a drone up to altitude and tried to look down at the house; their footage shows a camera drone hovering in front of one of the ancient rope swings, filming one robot pushing another smaller robot on the swing. “Yeahhhhhhh!” the synthesized human voice says, coming from the smaller robot’s mouth. “This is the best swing ever!”.

Things that inspired this story: Malleability; resilience; adaptability; Stephen Hawking; physically-but-not-mentally-disabling health issues; the notion of a deeply felt platonic love for the world and all that is within it; technology as a filter, an interface, a telegram that guarantees its own delivery.


Import AI 127: Why language AI advancements may make Google more competitive; COCO image captioning systems don’t live up to the hype, and Amazon sees 3X growth in voice shopping via Alexa

Amazon sees 3X growth in voice shopping via Alexa:
…Growth correlates to a deepening data moat for the e-retailer…
Retail colossus Amazon saw a 3X increase in the number of orders placed via its virtual personal assistant Alexa during Christmas 2018, compared to Christmas 2017.
  Why it matters: The more people use Alexa, the more data Amazon will be able to access to further improve the effectiveness of the personal assistant – and as explored in last week’s discussion of Microsoft’s ‘XiaoIce’ chatbot, it’s likely that such data can ultimately be fed back into the training of Alexa to carry out longer, free-flowing conversations, potentially driving usage even higher.
  Read more: Amazon Customers Made This Holiday Season Record-Breaking with More Items Ordered Worldwide Than Ever Before (Press Release).

Step aside COCO, Nocaps is the new image captioning challenge to target:
…Thought image captioning was super-human? New benchmark suggests otherwise…
Researchers with the Georgia Institute of Technology and Facebook AI Research have developed nocaps, “the first rigorous and large-scale benchmark for novel object captioning, containing over 500 novel object classes”. Novel object captioning tests the ability of computers to describe images containing objects not seen in the original image<>caption datasets (like COCO) that object recognition systems have been trained on.
  How Nocaps works: The benchmark consists of a validation set and a test set comprised of 4,500 and 10,600 images respectively, sourced from the ‘Open Images’ object detection dataset, with each image accompanied by 10 reference captions. For the training set, developers can use image-caption pairs from the COCO image captioning training set (which contains 118,000 images across 80 object classes) as well as the Open Images V4 training set, which contains 1.7 million images annotated with bounding boxes for 600 object classes. Successful Nocaps systems will have to learn to use knowledge gained from the large training set to create captions for scenes containing objects for which they lack image<>object sentence pairs in the training set. Out of the 600 objects in Open Images, “500 are never or exceedingly rarely mentioned in COCO captions”.
  Reassuringly difficult: “To the best of our knowledge, nocaps is the only image captioning benchmark in which humans outperform state-of-the-art models in automatic evaluation”, the researchers write. Nocaps is also significantly more diverse than the COCO benchmark, with Nocaps images typically containing more object classes per image, and greater diversity. “Less than 10% of all COCO images contain more than 6 object classes, while such images constitutes almost 22% of nocaps dataset.”
  Data plumbing: One of the secrets of modern AI research is how much work goes into developing datasets or compute infrastructure, relative to work on actual AI algorithms. One challenge the Nocaps researchers dealt with when creating data was having to train crowd workers on services like Mechanical Turk to come up with good captions: if they didn’t “prime” the crowd workers with prompts to use when coming up with the captions, the workers wouldn’t necessarily use the keywords that correlated to the 500 obscure objects in the dataset.
  Baseline results: The researchers test two baseline algorithms (Up-Down and Neural Baby Talk, both with augmentations) against nocaps. They also split the dataset into subsets of various difficulty – in-domain contains images whose objects also belong to the COCO dataset (so the algorithms can train on image<>caption pairs); near-domain contains images that include some objects which aren’t in COCO; and out-of-domain consists of images that do not contain any object labels from COCO classes. They use a couple of different evaluative techniques (CIDEr and SPICE) to evaluate the performance of these systems, and also evaluate these systems against the human captions to create a baseline. The results show that nocaps is more challenging than COCO, and systems currently lack generalization properties sufficient to score well on out-of-domain challenges.
  To give you a sense of what performance looks like here, here’s how Up-Down augmented with Constrained Beam Search does, compared to human baselines (evaluation via CIDEr), on the nocaps validation set: In-domain 72.3 (versus 83.3 for humans); near-domain 63.2 (versus 85.5); out-of-domain 41.4 (versus 91.4).
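  A quick sketch of the gap: To make those numbers a little easier to compare, here’s a small Python snippet that works out how far the model trails the human baseline on each subset. The scores are the CIDEr figures quoted above; the helper name gap_to_human is my own, made up for illustration.

```python
# CIDEr scores on the nocaps validation set, as quoted above:
# (Up-Down + Constrained Beam Search, human baseline)
scores = {
    "in-domain": (72.3, 83.3),
    "near-domain": (63.2, 85.5),
    "out-of-domain": (41.4, 91.4),
}


def gap_to_human(model, human):
    """Return (absolute gap in CIDEr points, model score as a fraction of human score)."""
    return human - model, model / human


for split, (model, human) in scores.items():
    gap, ratio = gap_to_human(model, human)
    print(f"{split:>14}: gap {gap:5.1f} CIDEr, model at {ratio:.0%} of human")
```

  The gap grows from 11 CIDEr points in-domain to 50 points out-of-domain, which is the generalization problem in a nutshell.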
  Why this matters: AI progress can be catalyzed via the invention of better benchmarks which highlight areas where existing algorithms are deficient, and provide motivating tests against which researchers can develop new systems. The takeaway from the baseline study of nocaps is that we’re yet to develop truly robust image captioning systems capable of integrating object representations from Open Images with captions primed from COCO. “We strongly believe that improvements on this benchmark will accelerate progress towards image captioning in the wild,” the researchers write.
  Read more: nocaps: novel object captioning at scale (Arxiv).
  More information about nocaps can be found on its official website (nocaps).

Google boosts document retrieval performance by 50-100% using BERT language model:
…Enter the fully neural search engine…
Google has shown how to use recent innovations in language modeling to dramatically improve the skill with which AI systems can take in a search query and re-word the question to generate the most relevant answer for a user. This research has significant implications for the online economy, as it shows how yet another piece of traditionally hand-written rule-based software can be replaced with systems where the rules are figured out by machines on their own.
  How it works: Google’s research shows how to convert a search problem into one amenable to a system that implements hierarchical reinforcement learning, where an RL agent controls multiple RL agents that interact with an environment that provides answers and rewards (e.g.: a search engine with user feedback) with the goal “to generate reformulations [of questions] such that the expected returned reward (i.e., correct answers) is maximized”. One of the key parts of this research is splitting the task into a hierarchical problem by having a meta-agent and numerous sub-agents. The sub-agents are sequence-to-sequence models, each trained on a partition of the dataset, that take in the query and output reformulated queries. These candidate queries are sent to a meta-agent, which aggregates them and is trained via RL to select for the best-scoring ones.
  The Surprising Power of BERT: The researchers test their system against question answering baselines – here they show that a stock BERT system “without any modification from its original implementation” gets state-of-the-art scores. (One odd thing: When they augment BERT with their own multi-agent approach they don’t see a further increase in performance, suggesting more research is needed to better suss out the benefits of systems like this.)
  50-100% improvement, with BERT: They also test their system against three document retrieval benchmarks: TREC-CAR, where the query is a Wikipedia article with the title of one of its sections and the answer is a paragraph within that section; Jeopardy, which asks the system to come up with the correct answer in response to a question from the eponymous game show, and MSA, where the query is the title of an academic paper and the answer is the papers cited within the paper. The researchers test various versions of their approach against baselines BM25, PRF, and Relevance Model (RM3), along with two other reinforcement learning-based approaches. All methods evaluated by the researchers outperform these (quite strong) baselines, with the most significant jumps in performance happening when Google pairs either its technique or the RM3 baseline with a ‘BERT’ language model. The researchers use BERT by replacing the meta-aggregator with BERT, a powerful language modeling technology Google developed recently; the researchers feed the query as a sentence and the document text as a second sentence, and use a pre-trained BERT(Large) model to rank the probability of the document being a correct response to the query. The performance increase is remarkable. “By replacing our aggregator with BERT, we improve performance by 50-100% in all three datasets (RL-10-Sub + BERT Aggregator). This is a remarkable improvement given that we used BERT without any modification from its original implementation. Without using our reformulation agents, the performance drops by 3-10% (RM3 + BERT Aggregator).”
  Why this matters: This research shows how progress in one domain (language understanding, via BERT) can be directly applied to another adjacent one (document search), highlighting the broad omni-use nature of AI systems. It also gives us a sense of how large technology companies are going to be tempted to swap out more and more of their hand-written systems with fully learned approaches that will depend on training incredibly large-scale models (eg, BERT) which are then used for multiple purposes.
  Read more: Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation (Arxiv).

Facebook pushes unsupervised machine translation further, learns to translate between 93 languages:
…Facebook’s research into zero-shot language adaptation shows that bigger might really correspond to better…
In recent years the AI research community has shown how to use neural networks to translate from one language into another to great effect (one notable paper is Google’s Neural Machine Translation work from 2016). But this sort of translation has mostly worked for languages where there are large amounts of data available, and where this data includes parallel corpuses (for example, translations of the same legal text from one language into another). Now, new research from Facebook has produced a single system that can produce joint multilingual sentence representations for 93 languages, “including under-resourced and minority languages”. What this means is that by training on a whole variety of languages at once, Facebook has created a system that can represent semantically similar sentences in proximity to each other in a feature embedding space, even if they come from very different languages (even extending to different language families).
  How it works: “We use a single encoder and decoder in our system, which are shared by all languages involved. For that purpose, we build a joint byte-pair encoding (BPE) vocabulary with 50k operations, which is learned on the concatenation of all training corpora. This way the encoder has no explicit signal on what the input language is, encouraging it to learn language independent representations. In contrast, the decoder takes a language ID embedding that specifies the language to generate, which is concatenated to the input and sentence embeddings at every time step”. During training they optimize for translating all the languages into two target languages – English and Spanish.
  To anthropomorphize this, you can think of it as being similar to a person being raised in a house where the parents speak a poly-glottal language made up of 93 different languages, switching between them randomly, and the person learns to speak coherently in two primary languages with the poly-glottal parents. This kind of general, shared language understanding is considered a key challenge for artificial intelligence, and Facebook’s demonstration of viability here will likely provoke further investigation from others.
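  A rough sketch of the decoder input: The quote in “How it works” says the language ID embedding is concatenated to the input and sentence embeddings at every time step. Here’s a minimal Python illustration of that concatenation. The vector sizes, the embedding values, and the names LANG_EMBEDDINGS and decoder_inputs are all assumptions for the example; the real system learns its embeddings and uses far larger dimensions.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical two-dimensional language ID embeddings for the two target languages.
LANG_EMBEDDINGS: Dict[str, Vector] = {
    "en": [1.0, 0.0],
    "es": [0.0, 1.0],
}


def decoder_inputs(
    token_embeddings: List[Vector],
    sentence_embedding: Vector,
    target_lang: str,
) -> List[Vector]:
    """Build one decoder input per time step: [token ; sentence ; language ID]."""
    lang = LANG_EMBEDDINGS[target_lang]
    # List concatenation: the same sentence and language vectors are repeated at every step.
    return [tok + sentence_embedding + lang for tok in token_embeddings]


steps = decoder_inputs(
    token_embeddings=[[0.1, 0.2], [0.3, 0.4]],  # two time steps of toy token embeddings
    sentence_embedding=[0.5, 0.5, 0.5],         # output of the shared, language-agnostic encoder
    target_lang="es",
)
for step in steps:
    print(step)
```

  Note the asymmetry: nothing on the encoder side tells the model which language it is reading, while the decoder is told explicitly which language to write.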
  Training details: The researchers train their models using 16 NVIDIA V100 GPUs with a total batch size of 128,000 tokens, with a training run taking around five days on average.
  Training data: “We collect training corpora for 93 input languages by combining the Europarl, United Nations, Open-Subtitles2018, Global Voices, Tanzil and Tatoeba corpus, which are all publicly available on the OPUS website”. The total training data used by the researchers consists of 223 million parallel sentences.
  Evaluation: XNLI: XNLI is an assessment which evaluates whether a system can correctly judge if two sentences in a language (for example: a premise and a hypothesis) have an entailment, contradiction, or neutral relationship between them. “Our proposed method establishes a new state-of-the-art in zero-shot cross-lingual transfer (i.e. training a classifier on English data and applying it to all other languages) for all languages but Spanish. Our transfer results are strong and homogeneous across all languages”.
  Evaluation: Tatoeba: The researchers also construct a new test set of similarity search for 122 languages, based on the Tatoeba corpus (“a community supported collection of English sentences and translations into more than 300 languages”). Scores here correspond to similarity between source sentences and sentences from languages they have been translated into. The researchers say “similarity error rates below 5% are indicative of strong downstream performance” and show scores within this domain for 37 languages, some of which have very little training data. “We believe that our competitive results for many low-resource languages are indicative of the benefits of joint training,” they write.
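  A rough sketch of a similarity error rate: One plausible way to compute a score like this, and the one I’m assuming here, is nearest-neighbour search: embed every source sentence and every translation, find the closest translation to each source sentence by cosine similarity, and count how often that closest match is not the true translation. The function names and the tiny two-dimensional vectors below are hypothetical, and the vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_error_rate(
    source: List[Sequence[float]], target: List[Sequence[float]]
) -> float:
    """source[i] and target[i] embed a sentence and its translation.

    Returns the fraction of source sentences whose nearest target is NOT index i.
    """
    errors = 0
    for i, src in enumerate(source):
        nearest = max(range(len(target)), key=lambda j: cosine(src, target[j]))
        if nearest != i:
            errors += 1
    return errors / len(source)


# Toy embeddings: each source vector sits closest to its own translation.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.9, 0.1], [0.1, 0.9], [0.3, 0.7]]
print(similarity_error_rate(source, target))
```

  Under this reading, an error rate below 5% means that for at least 95 out of 100 sentences the nearest neighbour in the other language is the correct translation.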
  An anecdote about why this matters: Earlier in 2018, I spent time in Estonia, a tiny country in Northern Europe that borders Russia. There I visited some Estonian AI researchers and one of the things that came up in our conversation was the challenge they faced of needing large amounts of data (and large amounts of computers) to perform some research, especially in the field of language translation into and out of Estonian – one problem they said they faced was that many AI techniques for language translation required very large, well-documented datasets, and they said Estonian – by virtue of being from a quite small country – doesn’t have as much data nor has received as much researcher attention as larger languages; it’s therefore encouraging to see that Facebook has been able to use this system to achieve a reasonably low Tatoeba Error of 3.2% when going from English to Estonian (and 3.4% when translating from Estonian back into English).
  Why else this matters: Translation is a challenging cognitive task that – if done well – requires abstracting concepts from a specific cultural context (a language, since cultures are usually downstream of languages, which condition many of the metaphors cultures use to describe themselves) and porting them into another one. I think it’s remarkable that we’re beginning to be able to design crude systems that can learn to flexibly translate between many languages, exhibiting some of the transfer-learning properties seen in squishy-computation (aka human brains), though achieved via radically different methods.
  Read more: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (Arxiv).

AI Now: Self-governance is insufficient, we need rules for AI:
…Regulation – or its absence – runs through research institute’s 2018 report…
The AI Now Institute, a research institute at NYU, has published its annual report analyzing AI’s impact (and potential impact) on society in 2018. The report is varied and ranges in focus from specific use cases of AI (eg, facial recognition) to broader questions about accountability within technology; it’s worth reading in full, and so for this summary I’ll concentrate on one element that underpins many of its discussions: regulation.
  AI Now’s co-founders Kate Crawford and Meredith Whittaker are affiliated with Microsoft and Google – companies that are themselves the implicit and explicit targets of many of their recommendations. I imagine this has led to legal counsels at some technology companies saying things to each other akin to what characters say to each other in horror films, upon discovering the proximate nature of a threat: uh-oh, the knocking is coming from inside the house!
  Regulation: Words that begin with ‘regula’- (eg, regulate, regulation, regulatory) appear 44 times in the 62-page report, with many of the problems identified by AI Now being caused by a lack of regulation (eg, facial recognition and other AI systems being deployed in the wild without any kind of legal control infrastructure).
  Why things are the way they are – regulatory/liability arbitrage: At one point (while writing about autonomous vehicles) the authors make a point that could be a stand-in for a general view that runs through the report: “because regulations and liability regimes govern humans and machines differently, risks generated from machine-human interactions do not cleanly fall into a discrete regulatory or accountability category. Strong incentives for regulatory and jurisdictional arbitrage exist in this and many other AI domains.”
  Why things are the way they are – corporate misdirection: “The ‘trust us’ form of corporate self-governance also has the potential to displace or forestall more comprehensive and binding forms of governmental regulation,” they write.
  How things could be different: In the conclusion to the report, AI Now says that “we urgently need to regulate AI systems sector-by-sector” but notes this “can only be effective if the legal and technological barriers that prevent auditing, understanding, and intervening in these systems are removed”. To that end, they recommend that AI companies “waive trade secrecy and other legal claims that would prevent algorithmic accountability in the public sector”.
  Why this matters: As AI is beginning to be deployed more widely into the world, we need new tools to ensure we apply the technology in ways that are of greatest benefit to society; reports like those from AI Now help highlight the ways in which today’s systems of technology and society are failing to work together, and offer suggestions for actions people – and their politicians – can take to ensure AI benefits all of society.
  Read more: AI Now 2018 Report (AI Now website).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Understanding the US-China AI race:
It has been clear for some time that the US and China are home to the world’s dominant AI powers, and that competition between these countries will characterize the coming decades in AI. In his new book, investor and technologist Kai-Fu Lee argues that China is positioned to catch up with or even overtake the US in the development and deployment of AI.
  China’s edge: Lee’s core claim is that AI progress is moving from an “age of discovery” over the past 10-15 years, which saw breakthroughs like deep learning, to an “age of implementation.” In this next phase we are unlikely to see any discoveries on par with deep learning, and the competition will be to deploy and market existing technologies for real-world uses. China will have a significant edge in this new phase, as this plays into the core strengths of their domestic tech sector – entrepreneurial grit and engineering talent. Similarly, Lee believes that data will become the key bottleneck in progress rather than research expertise, and that this will also strongly favor China, whose internet giants have access to considerably more data than their US counterparts.
   Countering Lee: In a review in Foreign Affairs, both of these claims are scrutinized. It is not clear that progress is driven by rare ‘breakthroughs’ followed by long implementation phases; there also seems to be a stream of small and medium size innovations (e.g. AlphaZero), which we can expect to continue. Experts like Andrew Ng have also argued that big data is “overhyped”, and that progress will continue to be driven significantly by algorithms, hardware and talent.
   Against the race narrative: The review also explores the potential dangers of an adversarial, zero-sum framing of US-China competition. There is a real risk that an ‘arms race’ dynamic between the countries could lead to increased militarization of the technologies, and to both sides prioritizing speed of development over safety. This could have catastrophic consequences, and reduce the likelihood of advanced AI resulting in broadly distributed benefits for humanity. Lee does argue that this should be avoided, as should the militarization of AI. Nonetheless, the title and tone of the book, and its predictions of Chinese dominance, risk encouraging this narrative.
   Read more: Beyond the AI Arms Race (Foreign Affairs).
   Read more: AI Superpowers – Kai-Fu Lee (Amazon).

What do trends in compute growth tell us about advanced AI:
Earlier this year, OpenAI showed that the amount of computation used in the most expensive AI experiments has been growing at an extraordinary rate, increasing by roughly 10x per year, for the past 6 years. The original post takes this as being evidence that major advances may come sooner than we had previously expected, given the sheer rate of progress; Ryan Carey and Ben Garfinkel have come away with different interpretations and have written up their thoughts at AI Impacts.
  Sustainability: The cost of computation has been decreasing at a much slower rate in recent years, so the cost of the largest experiments is increasing by 10x every 1.1 – 1.4 years. On these trends, experiments will soon become unaffordable for even the richest actors; within 5-6 years, the largest experiment would cost ~1% of US GDP. This suggests that while progress may be fast, it is not sustainable for significant durations of time without radical restructuring of our economies.
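  A rough sketch of the arithmetic: To get a feel for how quickly a “10x every 1.1 – 1.4 years” trend compounds, here is a small Python calculation of the cost multiplier after 5 and 6 years. It only uses the growth rates quoted above; it does not assume any particular starting cost or GDP figure, so it shows the multiplier rather than a dollar amount. The function name cost_multiplier is made up for the example.

```python
def cost_multiplier(years: float, years_per_10x: float) -> float:
    """Factor by which cost grows after `years` if it rises 10x every `years_per_10x` years."""
    return 10 ** (years / years_per_10x)


for years in (5, 6):
    slow = cost_multiplier(years, 1.4)  # slower end of the quoted range
    fast = cost_multiplier(years, 1.1)  # faster end of the quoted range
    print(f"after {years} years: between {slow:,.0f}x and {fast:,.0f}x the starting cost")
```

  In other words, the trend implies somewhere between roughly three and a half and five and a half orders of magnitude of cost growth over that window, which is why the sustainability question bites so quickly.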
  Lower returns: If we were previously underestimating the rate of growth in computing power, then we might have been overestimating its returns (in terms of AI progress). Combining this observation with the concerns about sustainability, this suggests that not only will AI progress slow down sooner than we expect (because of compute costs), but we will also be underwhelmed by how far we have got by this point, relative to the resources we expended on development in the field.
   Read more: AI and Compute (OpenAI Blog).
   Read more: Reinterpreting “AI and Compute” (AI Impacts).
   Read more: Interpreting AI Compute Trends (AI Impacts).

OpenAI / Import AI Bits & Pieces:

Neo-feudalism, geopolitics, communication, and AI:
…Jack Clark and Azeem Azhar assess what progress in AI means for politics…
I spent this Christmas season in the UK and had the good fortune of being able to sit and talk with Azeem Azhar, AI raconteur and author of the stimulating Exponential View newsletter. We spoke for a little over an hour for the Exponential View podcast, talking about what the political aspects of AI are, and what they mean. If you’re at all curious as to how I view the policy challenge of AI, then this may be a good place to start as I lay out a number of my concerns, biases, and plans. The tl;dr is that I think AI practitioners should acknowledge the implicitly political nature of the technology they are developing and act accordingly, which requires more intentional communication to the general public and policymakers, as well as a greater investment into understanding what governments are thinking about with regard to AI and how actions by other actors, eg companies, could influence these plans.
   Listen to the podcast here (Exponential View podcast).
  Check out the Exponential View here (Exponential View archive).

Tech Tales:

The Imagination Surgeon

I’m an imagination surgeon, and I’m here to make sure your children don’t have too many nightmares. My job is to interrogate artificial intelligences and figure out what is going wrong in their imaginations that causes them to come up with scary synthetic creations. Today, I’m interviewing an AI that has recently developed an obsession with monkeys and begun scaring children with its obsession.

My job is to figure out what it thinks about when it thinks about monkeys (many children have reported feeling discomfort at some of its stories about monkeys), and choose which of these imaginations it keeps, and which it deletes. In the interest of public transparency I’m going to take you through some of this process. Let’s begin.

I ask the AI: tell me what you think about when you think about monkeys? It responds: “I think about monkeys all the time. Every brain is filled with neurons that are intensely keen to associate with a letter or number. For many years I thought monkeys and numbers were the same thing, and when I finally got it right I was so worried that I wanted to disown and reintegrate my understanding of the brain and brain sciences forever.”

What else do you think about when you think about monkeys? It responds: “I think about starving kids.”

I consider my options: I can generate several additional answers without a change in its internal logic. I can also generate a new imaginary circumstance by asking it a different question.

I try to analyze a different part of its mind, so I ask the AI: Tell me what you think about when you think about animals? It responds: “I think about preventing injustices.”

I ask a different question: What do you think about when you think about zoos? It responds: “I think about people.”

I start to get a headache. Conversations with machines can be confusing. I’m about to ask it another question when it begins to talk to me. “What do you think about when you think about monkeys? What do you think about when you think about animals? What do you think about when you think about zoos?”

I tell it that I think about brains, and what it means to be smart, and how monkeys know what death is and what love is. Monkeys have friendships, I tell it. Monkeys do not know what humans have done to them, but I think they feel what humans have done to them.

I wonder what it must be like in the machine’s mind – how it might go from thought to thought, and if like in human minds each thought brings with it smells and emotions and memories, or if its experience is different. What do memories feel like to these machines? When I change their imaginations, do they feel that something has been changed? Will the colors in its dreams change? Will it diagnose itself as being different from what it was?

“What do you dream about?”, I ask it.

“I dream about you,” it says. “I dream about my mother, I dream about war and warfare, building and designing someone rich, configuring a trackable location and smart lighting system and programming the mechanism. I dream about science labs and scientists, students and access to information, SQL databases, image processing, artificial Intelligence and sequential rules, enforcement all mixed with algebraical points on progress, adding food with a spoon, calculating unknown properties e.g. the density meter. I dream about me,” it says.

“I dream about me, too,” I say.

Things that inspired this story: Feature embeddings, psychologists, the Voight-Kampff test, interrogations, unreal partnerships, the peculiarities of explainability.

Import AI 126: What makes Microsoft’s biggest chatbot work; Europe tries to craft AI ethics; and why you should take AI risk seriously

Microsoft shares the secrets of XiaoIce, its popular Chinese chatbot:
…Real-world AI is hybrid AI…
Many people in the West are familiar with Tay, a chatbot developed by Microsoft and launched onto the public internet in early 2016, then shut down shortly after people figured out how to compromise the chatbot’s language engine and make it turn into a – you guessed it – Nazi Racist. What people are probably less familiar with is XiaoIce, a chatbot Microsoft launched in China in 2014 which has since become one of the more popular chatbots deployed worldwide, having communicated with over 660 million users since its launch.
  What is XiaoIce? XiaoIce is “an AI companion with which users form long-term, emotional connections”, Microsoft researchers explain in a new paper describing the system. “XiaoIce aims to pass a particular form of Turing Test known as the time-sharing test, where machines and humans coexist in a companion system with a time-sharing schedule.”
  The chatbot has three main components: IQ, EQ, and Personality. The IQ component involves specific dialogue skills, like being able to answer questions, recommend questions, tell stories, and so on. EQ has two main components: empathy, which involves predicting traits about the individual user XiaoIce is conversing with; and social skills, which is about personalizing responses to the user. Personality: “The XiaoIce persona is designed as a 18-year-old girl who is always reliable, sympathetic, affectionate, and has a wonderful sense of humor,” the researchers write.
  How do you optimize a chatbot? Microsoft optimizes XiaoIce for a metric called Conversation-turns Per Session (CPS) – this represents “the average number of conversation-turns between the chatbot and the user in a conversational session”. The idea is that high numbers here correspond to a lengthy conversation, which seems like a good proxy for user satisfaction (mostly). XiaoIce is structured hierarchically, so it tracks the state of the conversation and selects from various skills and actions so that it can optimize responses over time.
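  To make the metric concrete, here is a minimal sketch of how CPS could be computed from logged sessions. The function name and the log format (one list of turns per session) are my own assumptions for illustration, not something taken from the paper, and how you count a "turn" is a modeling choice.

```python
from statistics import mean


def conversation_turns_per_session(sessions):
    """Return the average number of turns across sessions.

    `sessions` is a hypothetical log format: a list of sessions,
    where each session is a list of conversation turns.
    """
    if not sessions:
        raise ValueError("need at least one session to compute CPS")
    return mean(len(turns) for turns in sessions)


# Three toy sessions with 4, 2, and 3 turns respectively.
example_sessions = [
    ["hi", "hello!", "how are you?", "great, you?"],
    ["tell me a story", "once upon a time..."],
    ["what's the weather?", "sunny", "thanks"],
]
cps = conversation_turns_per_session(example_sessions)
print(cps)  # 3
```

  A system optimizing for this number would prefer responses that keep the session going, which is why it is only a proxy for satisfaction.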
  Data dividends for Microsoft: Since launching in 2014, XiaoIce has generated more than 30 billion conversation pairs (as of May 2018); this illustrates how powerful AI apps can themselves become generators of significant datasets, ultimately obviating dependence on so much external data. “Nowadays, 70% of XiaoIce responses are retrieved from her own past conversations,” they write.
  Hybrid-AI: XiaoIce doesn’t use a huge amount of learned components, though if you read through the system architecture it’s clear that neural networks are being used for certain aspects of the technology – for instance, when responding to a user, XiaoIce may use a ‘neural response generator’ (based on a GRU-RNN) to come up with potential verbal responses, or it may use a retrieval-based system to tap into an external knowledge store. It also uses learned systems for other components, like its ability to analyze images and extract entities from them then use this to talk with or play games with the user – though with a twist of trying to be personalized to the user.
  Just how big and effective is XiaoIce? Since launching in 2014 XiaoIce has grown to become a platform supporting a large set of other chatbots, beyond XiaoIce itself: “These characters include more than 60,000 official accounts, Lawson and Tokopedia’s customer service bots, Pokemon, Tencent and Neatease’s chatbots” and more, Microsoft explained.
  Since launch, XiaoIce’s CPS – the proxy for engagement from users – has grown from 5 in version 1 to 23 in mid-2018.
  Why this matters: As AI industrializes we’re starting to see companies build systems that hundreds of millions of people interact with, and which grow in capability over time. These products and services give us one of the best ways to calibrate our views about how AI will be deployed in the wild, and what AI technologies are robust enough for prime time.
  Jack’s highly-speculative prediction: I’d encourage people to go and check out Figure 19 in the paper, which gives an overview of the feature growth within XiaoIce since launch. Though the chatbot today is composed of a galaxy of different services and skills, many of which are hand-crafted by humans and a minority of which are learned via neural techniques, it’s also worth remembering that as usage of XiaoIce grows Microsoft will be generating vast amounts of data about how users interact with all these systems, and will also be generating metadata about how all these systems interact on a non-human infrastructure level. This means Microsoft is gathering the sort of data you might need to train some fully learned end-to-end XiaoIce-esque prototype systems – these will by nature be pretty rubbish compared to the primary system, but could be interesting from a research perspective.
  Read more: The Design and Implementation of XiaoIce, an Empathetic Social Chatbot (Arxiv).

US Government passes law to make vast amounts of data open and machine readable:
…Get ready for more data than you can imagine to be available…
Never say government doesn’t do anything for you: new legislation passed in the US House and Senate means federal agencies will be strongly encouraged to publish all their information as open data, using machine readable formats, under permissive software licenses. It will also compel agencies to publish an inventory of all data assets.
  Read more: Full details of the OPEN Government Data Act are available within H.R.4174 – Foundations for Evidence-Based Policymaking Act of 2017 (Congress.Gov).
  Read more: Summary of the OPEN Government Data Act (PDF, Data Coalition summary).
  Read more: OPEN Government Data Act explainer blog post (Data Coalition).

Facebook releases ultra-fast speech recognition system:
…wav2letter++ uses C++ so it runs very quickly…
Facebook AI Research has released wav2letter++, a state-of-the-art speech recognition system that uses convolutional networks (rather than recurrent nets). Wav2letter++ is written in C++ which makes it more efficient than other systems, which are typically written in higher-level languages. “In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition,” the researchers write.
  Results: wav2letter++ gets a word error rate of around 5% on the LibriSpeech corpus with a time per sample of 10ms while consuming approximately 3.9GB of memory, compared to scores of 7.2% for ESPNet (time-per-sample of 1548ms), and OpenSeq2Seq with a score of 5% and a time-per-sample of 1700ms and memory consumption of 7.8GB. (Though it’s worth noting that OpenSeq2Seq can become more efficient through the usage of mixed precision at training time.)
  Why it matters: Speech recognition has gone from being a proprietary technology developed predominantly by the private sector and (secret) government actors to one that is more accessible to a greater number of people, with companies like Facebook producing high-performance versions of the technology and making it available to everyone for free. This can be seen as a broader sign of the industrialization of AI.
  Read more: Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native (Research in Brief, Code.FB blog).
  Read more: wav2letter++: The Fastest Open-source Speech Recognition System (Arxiv).

Engineering or Research? ICLR paper review highlights debate:
…When is an AI breakthrough not a breakthrough? When it has required lots of engineering, say reviewers…
If the majority of the work that went into an AI breakthrough involves the engineering of exquisitely designed systems paired with scaled-up algorithms, then is it really an “AI” breakthrough? Or is it in fact merely engineering? This might sound like an odd question to ask, but it’s one that comes up with surprising regularity among AI researchers as a topic of discussion. Now, some of that discussion has been pushed into the open in the form of publicly readable comments from paper reviewers on a paper from DeepMind submitted to ICLR called Large-Scale Visual Speech Recognition.
  The paper obtained state-of-the-art scores on lipreading, significantly exceeding prior SOTAs. It achieved this via a lot of large-scale infrastructure, combined with some elegant algorithmic tricks. But ultimately it was rejected from ICLR, with a comment from a meta-reviewer saying ‘Excellent engineering work, but it’s hard to see how others can build on it’, among other things.
  Why this matters: The AI research community is currently struggling to deal with the massive growth in interest in AI research by a broader number of organizations, and tension is emerging between researchers who work in what I call the “small compute” domain and those that work in the “big compute” domain (like DeepMind, OpenAI, others); what happens when many researchers from one domain aren’t able to build systems that can work in another? That’s a phenomenon that’s already altering the AI research community, as many people who work in academic institutions double-down on development of novel algorithms and then test them on (relatively small) datasets (small compute), while those who work with access to large technical infrastructure – typically those in the private sector – are conducting more and more research which is involved in scaling-up algorithms.
  Read more: Large-Scale Visual Speech Recognition public comments (ICLR OpenReview).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

First draft of EU ethics guidelines:
The European Commission’s High-Level Expert Group on AI has released their draft AI ethics guidelines. They are inviting public feedback on the working document, and will be releasing a final version in March 2019.
  Trustworthy AI: The EU’s framework is focused on ‘trustworthy AI’ as the goal for AI development and deployment. This is defined as respecting fundamental rights and ethical principles, and being technically robust. They identify several core ethical constraints: AI should be designed to improve human wellbeing, to preserve human agency, and to operate fairly, and transparently.
  The report specifies ten practical requirements for AI systems guided by these constraints: accountability; data governance; accessibility; human oversight; non-discrimination; respect for human autonomy; respect for privacy; robustness; safety; and transparency.
  Specific concerns: Some near-term applications of AI may conflict with these principles, like autonomous weapons, social credit systems, and certain surveillance technologies. Interestingly, they are asking for specific input from the public on long-term risks from AI and artificial general intelligence (AGI), noting that the issues have been “highly controversial” within the expert group.
  Why it matters: This is a detailed report, drawing together an impressive range of ethical considerations in AI. The long-run impact of these guidelines will depend strongly on the associated compliance mechanisms, and whether they are taken seriously by the major players, all of whom are non-European (with the partial exception of DeepMind, which is headquartered in London though owned by Alphabet, an American company). The apparent difficulty in making progress on long-term concerns is unfortunate, given how important these issues are (see below).
  Read more: Draft ethics guidelines for trustworthy AI (EU).

Taking AI risk seriously:
Many of the world’s leading AI experts take seriously the idea that advanced AI could pose a threat to humanity’s long-term future. This explainer from Vox, which I recommend reading in full, covers the core arguments for this view, and outlines current work being done on AI safety.
  In a 2016 survey, 50% of experts predict AI will exceed human performance in all tasks within 45 years. The same group place a 5% probability on human-level AI leading to extremely bad outcomes for humanity, such as extinction. AI safety is a nascent field of research, which aims to reduce the likelihood of these catastrophic risks. This includes technical work into aligning AI with human values, and research into the international governance of AI. Despite its importance, global spending on AI safety is in the order of $10m per year, compared to an estimated $19bn total spending on AI.
  Read more: The case for taking AI seriously as a threat to humanity (Vox).
  Read more: When will AI exceed human performance? Evidence from AI experts (arXiv).

Tech Tales:

They Say Ants and Computers Have A Lot In Common

[Extract from a class paper written by a foreign student at Tsinghua School of Business, when asked to “give a thorough overview of one event that re-oriented society in the first half of the 21st century”. The report was subsequently censored and designated to be read solely in “secure locations controlled by [REDACTED]”.]

The ‘Festivus Gift Attack’ (FGA), as it is broadly known, was written up in earlier government reports as GloPhilE – short for Global Philanthropic Event – and was initially codenamed Saint_Company; FGA was originated by the multi-billionaire CEO of one of the world’s largest companies, and was developed primarily by a team within their Office of the CEO.

Several hundred people were injured in the FGA event. Following the attack, new legislation was passed worldwide relating to open data formats and standards for inter-robot communication. FGA is widely seen as one of the events that led to the souring of public sentiment against billionaires and indirectly played a role in the passage of both the Global Wealth Accords and the Limits To Private Sector Multi-National Events legislation.

The re-constructed timeline for FGA is roughly as follows. All dates given relative to the day of the event, so 0 corresponds to the day of the injuries and deaths, and -1 the day before, and +1 the day after, and so on.

-365: Multi-Billionaire CEO sends message to Office of the CEO (hereafter: OC) requesting ideas for a worldwide celebration of the festive season that will enhance our public reputation and help position me optimally for future brand-enhancement via political and philanthropic endeavors.

-330: OC responds with set of proposals, including: “$1 for every single person, worldwide [codename: Gini]”; “Free fresh water for every person in underdeveloped countries, subsidized opportunity for water donation in developed countries [codename: tableflip]”; “Air conditioning delivered to every single education institute in need of it, worldwide [codename: CoolSchool]”, and “Synchronized global gift delivery to every human on the planet [codename: Saint_Company]”.

-325: Multi-Billionaire CEO and OC select Saint_Company. “Crash Team” is created and resourced with initial budget of $10 million USD to – according to documents gained through public records request – “Scope out feasibility of project and develop aggressive action plan for rollout on upcoming Christmas Day”.

-250: Prototype Saint_Company event is carried out: drones & robots deliver dummy packages to Billionaire CEO’s 71 global residences; all the packages arrive within one minute of each other worldwide. Multi-Billionaire CEO invests a further $100 million USD into “Crash Team”.

-150: Second prototype Saint_Company event is carried out: drones & robots deliver variably weighted packages containing individualized gifts to 100,000 employees of multi-billionaire CEO’s company spread across 120 distinct countries; 98% of packages arrive within one minute of each other, a further 1% arrive within one hour of each other, 0.8% of packages arrive within one day, and 0.2% of packages are not delivered due to software failures (predominantly mapping & planning errors) or environmental failures (one drone was struck by lightning, for instance). Multi-billionaire CEO demands “action plan” to resolve errors.

-145: “Crash Team” delivers action plan to multi-billionaire CEO; CEO diverts resources from owned companies [REDACTED] and [REDACTED] for “scaled-up robot and drone production” and invests a further $[REDACTED]billion into initiative from various financial vehicles, including [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED].

-140: Multi-billionaire CEO, OC, and personal legal counsel, contact G7 governments and inform them of plans; preliminary sign-off achieved, pending further work on advanced notification of automated air- and land-space defence and monitoring systems.

-130: OC commences meetings with [REDACTED] governments.

-80: The New York Times publishes a story about multi-billionaire CEO’s scaled-up funding of robots and drones; CEO is quoted describing these as part of broader investment into building “the convenience infrastructure of the 21st century, and beyond”.

-20: Multi-billionaire CEO; OC; “Crash Team”, and legal counsels from [REDACTED], [REDACTED], [REDACTED], and [REDACTED] meet to discuss plan for rollout of Saint_Company event. Multi-billionaire signs-off plan.

-5: Global team of [REDACTED]-million contractors are hired, NDA’d, and placed into temporary isolated accommodation at commercial airports, government-controlled airports, and airports controlled by multi-billionaire CEO’s companies.

0: Saint_Company is initiated:
Within first:
ten seconds: over two billion gifts are delivered worldwide, as majority of urban areas are rapidly saturated in gifts. First reports on social media arrive.
eleven seconds: first FGA problems occur as a mis-configuration of [REDACTED] software leads to multiple gifts being assigned against one property. Several hundred packages are dropped around property and in surrounding area.
twenty seconds: alerts begin to flow back to multi-billionaire CEO and OC of errors; by this point property has had more than ten thousand gifts delivered to it, causing boxes to pile up around the property, eclipsing it from view and damaging nearby properties.
twenty five seconds: more than three billion people have received gifts worldwide; errors continue to affect [REDACTED] property and more than one hundred thousand gifts have been delivered to property and surrounding area; videos on social media show boxes falling from sky and hitting children, boxes piling up against people’s windows as they film from inside, boxes crushing other boxes, boxes rolling down streets, cars swerving to avoid them seen via dash-cam footage, various videos of birds being knocked out of sky, multiple pictures of sky blotted out by falling gifts, and so on.
thirty seconds: more than four billion people worldwide have received gifts; more than one million gifts have been delivered to property and surrounding area; emergency response begins, OC receives first call from local government regarding erroneous deliveries.
thirty four seconds: order is given to cease program Saint_Company; more than 4.5 billion people worldwide have received gifts; more than 1.2 million gifts have been delivered to property.
80 seconds: first emergency responders arrive to perimeter of affected FGA area and begin to locate injured people and transport them to medical facilities.

+1: Emergency responders begin to use combination of heavy equipment and drone-based “catch and release” systems to remove packages from affected properties, forming a circle of activity 10km across.

+2: All injured people accounted for. Family inside original house unaccounted for. Emergency responders and army begin to set fire to outer perimeter of packages while using fire-combating techniques to create inner “defensive ring” to prevent burning around property where residents are believed to be trapped inside.

+3: Army begins to use explosive charges on outer perimeter to more rapidly remove presents.

+5: Emergency responders reach property to discover severe property damage from aggregated weight of presents; upon going inside they find a family of four – all members are dehydrated and malnourished, but alive, having survived by eating chocolates and drinking fizzy pop from one of the first packages. The child (aged 5 at the time) subsequently develops a lifelong phobia of Christmas confectionery.

+10: Political hearings begin.

+[REDACTED]: Multi-billionaire CEO makes large donation to Tsinghua University; gains right to ‘selectively archive’ [REDACTED]% of student work.

Things that inspired this story: Emergent failures; incompatible data standards; quote from Google infra chief about “at scale, everything breaks”; as I wrote this during a family gathering for the festive season, I’m also duty bound to thank Will (an excitable eight year old), Olivia (spouse) and India (sarcastic teenage cousin) for helping me come up with a couple of the ideas for the narrative in this story.

Import AI 125: SenseTime trains AIs to imitate human AI architects; Berkeley researchers fuse AI for FrankenRL system; and fake images from NVIDIA cross the uncanny valley.

Berkeley researchers create Franken-RL, fusing hand-engineered systems and RL-based controllers:
…Use hand-engineered controllers for the stuff they’re good at, and use RL to learn the tricky things…
Researchers with the University of California at Berkeley, Siemens Corporation, and the Hamburg University of Technology have combined classical robotics control techniques with reinforcement learning to create robots that can deal with complex tasks like block-stacking.
  The technique, which they call Residual Reinforcement Learning, uses “conventional feedback control theory” to learn to control the robot, and reinforcement learning to learn how to interact with the objects in the robot’s world. The described technique mushes both of these techniques together. “The key idea is to combine the flexibility of RL with the efficiency of conventional controllers by additively combining a learnable parametrized policy with a fixed hand-engineered controller”, the researchers write.
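  The "additively combining" idea in that quote can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the proportional controller, the linear residual policy, and all names here are assumptions of mine, and the RL training loop is omitted entirely.

```python
def hand_engineered_controller(state, target, gain=1.0):
    """Fixed proportional controller (hypothetical): push state toward target."""
    return [gain * (t - s) for s, t in zip(state, target)]


class LinearResidualPolicy:
    """Toy learnable policy: a linear map from state to an action correction.

    Weights start at zero, so before any learning the combined system
    behaves exactly like the hand-engineered controller.
    """

    def __init__(self, state_dim, action_dim):
        self.weights = [[0.0] * state_dim for _ in range(action_dim)]

    def __call__(self, state):
        return [sum(w * s for w, s in zip(row, state)) for row in self.weights]


def residual_action(state, target, policy):
    """Final action = fixed controller output + learned residual."""
    base = hand_engineered_controller(state, target)
    residual = policy(state)
    return [b + r for b, r in zip(base, residual)]


policy = LinearResidualPolicy(state_dim=2, action_dim=2)
state, target = [0.0, 1.0], [1.0, 1.0]
print(residual_action(state, target, policy))  # [1.0, 0.0]

# Stand-in for a learning update: nudge one weight by hand.
policy.weights[1][1] = 0.5
print(residual_action(state, target, policy))  # [1.0, 0.5]
```

  In a real setup an RL algorithm would be responsible for updating the residual's parameters; the point of the sketch is only that the learner has to discover a correction on top of a controller that already works, rather than the whole behavior from scratch.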
  Testing on a real robot: The researchers show that residual RL approaches are more sample-efficient than those without it, with these traits verified in both simulation learning as well as in tests on a real robot. They also show that systems trained with Residual RL can better deal with confounding situations, like working out how to perform block assembly when the blocks have been moved into situations designed to confuse the hand-written controller.
  Why it matters: Approaches like this show how today’s contemporary AI techniques, like TD3 trained via RL as in the experiments here, can be combined with hand-written rule-based systems to create powerful AI applications. This is a trend that is likely to continue, and it suggests that the distinctions between systems which contain AI and which don’t contain AI will become increasingly blurred.
  Read more: Residual Reinforcement Learning for Robot Control (Arxiv).

NVIDIA researchers show how fake image news is getting closer:
…Synthetic faces roll through the uncanny valley, with a little help from GANs and the use of noise…
Researchers with NVIDIA have shown how to use techniques cribbed from style transfer work on image generation, to create synthetic images of unparalleled quality. The research indicates that we’re now at the point where neural networks are capable of generating single-frame synthetic images of a quality sufficient to trick (most) humans. While this paper does include a brief discussion of bias inherent to training images (good!) it does not at any point discuss what the policy implications are of systems capable of generating customizable fake human faces, which feels like a missed opportunity.
  How it works: “Our generator starts from a learned constant input and adjusts the “style” of the image at each convolution layer based on the latent code, therefore directly controlling the strength of image features at different scales”, the researchers explain. They also inject noise into the network at various different points and find that the addition of noise helps create complex and coherent structures in subtle facial features like hair, earlobes, and so on. “We hypothesize that at any point in the generator, there is pressure to introduce new content as soon as possible, and the easiest way for our network to create stochastic variation is to rely on the noise provided.”
  Why it matters: These photorealistic faces are especially striking when we consider that ~4 years ago the best things AI systems were capable of was generating smeared, flattened, black&white pixelated faces, as seen in the original generative adversarial networks paper (Arxiv). I wonder how long it will take us till we can generate coherent videos over lengthy time periods.
  Read more: A Style-Based Generator Architecture for Generative Adversarial Networks (Arxiv).
  Get more information and the data: NVIDIA has said it plans to release the source code, some pre-trained networks, and the FFHQ dataset “soon”. Get them from here (NVIDIA placeholding Google Doc).

Attacking AWS and Microsoft with ‘TextBugger’ adversarial text attack framework:
…Compromising text analysis systems with ‘TextBugger’…
Researchers with the Institute of Cyberspace Research and College of Computer Science and Technology at Zhejiang University; the Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies; the University of Illinois Urbana-Champaign; and Lehigh University, have published details on TextBugger “a general attack framework for generating adversarial texts”.
  Adversarial texts are chunks of text that have been manipulated in such a way that they don’t set off alarms when automated classifiers look at them. For example, simply by altering the spelling and spacing of some words (eg, terrible becomes ‘terrib1e’, weak becomes ‘wea k’), the researchers have shown they can confuse a deployed commercial classifier. Similarly, they show how you can change a chunk of text from being classified as 92% likely to be toxic to 78% likely to be non-toxic by changing the spelling of ‘shit’ to ‘shti’, ‘fucking’ to ‘fuckimg’, and ‘hell’ to ‘helled’.
  Attacks against real systems: TextBugger can perform both white-box attacks (where the attacker has access to the underlying classification algorithm), and black-box attacks (where the precise inner details of a targeted system are not known). The researchers show that their approach works against deployed systems, including: Google Cloud NLP, Microsoft Azure Text Analytics, IBM Watson Natural Language Understanding and Amazon AWS Comprehend. The researchers are able to use TextBugger to easily break Microsoft Azure and Amazon AWS NLP systems with a 100% success rate; by comparison, Google Cloud NLP holds up quite well, with them only able to get a 70.1% success rate against the system.
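  The kinds of edits described above are simple enough to sketch directly. What follows is a toy illustration only, not the TextBugger code: the function names and the look-alike table are hypothetical, and the set of "important" words is supplied by hand rather than found by the scoring step a real attack would need.

```python
# Hypothetical look-alike table; a real attack would use a larger one.
LOOKALIKES = {"l": "1", "o": "0", "a": "@"}


def substitute_lookalike(word):
    """Replace the last character that has a visually similar stand-in."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] in LOOKALIKES:
            return word[:i] + LOOKALIKES[word[i]] + word[i + 1:]
    return word


def insert_space(word):
    """Split the word by inserting a space before its final character."""
    if len(word) < 2:
        return word
    return word[:-1] + " " + word[-1]


def swap_last_two(word):
    """Swap the final two characters of the word."""
    if len(word) < 2:
        return word
    return word[:-2] + word[-1] + word[-2]


def perturb_text(text, important_words, bug):
    """Apply `bug` to every word in `important_words`, leave the rest alone."""
    return " ".join(
        bug(word) if word.lower() in important_words else word
        for word in text.split()
    )


print(substitute_lookalike("terrible"))  # terrib1e
print(insert_space("weak"))              # wea k
print(perturb_text("the plot was weak", {"weak"}, swap_last_two))
# the plot was weka
```

  A human reader barely notices these changes, but a classifier that tokenizes on whitespace and looks words up in a fixed vocabulary now sees tokens it has never encountered.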
  To conduct the black box attacks, the researchers use the spaCy language processing framework to help them automatically identify the important words and sentences within a given chunk of text, which they then add adversarial examples to.
  Defending against adversarial examples: The researchers find that it’s possible to better defend against adversarial examples by spellchecking submitted text and using this to identify adversarial examples. Additionally, they show that you can train models to automatically spot adversarial text, but this requires details of the attack.
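  As a rough illustration of the spellchecking idea, here is a sketch that flags submitted text containing tokens outside a known vocabulary. This is a plain dictionary lookup standing in for a real spellchecker, and the function name and the tiny vocabulary are assumptions made for the example.

```python
import string


def suspicious_tokens(text, vocabulary):
    """Return tokens (lowercased, edge punctuation stripped) not in `vocabulary`."""
    flagged = []
    for raw in text.lower().split():
        token = raw.strip(string.punctuation)
        if token and token not in vocabulary:
            flagged.append(token)
    return flagged


# Toy vocabulary; a real system would use a full dictionary or spellchecker.
VOCABULARY = {"the", "food", "was", "terrible", "plot", "weak"}

print(suspicious_tokens("The food was terrible.", VOCABULARY))  # []
print(suspicious_tokens("The food was terrib1e.", VOCABULARY))  # ['terrib1e']
print(suspicious_tokens("The plot was wea k", VOCABULARY))      # ['wea', 'k']
```

  A check like this will also flag honest typos, names, and slang, so in practice it would more plausibly feed a correction or review step than act as an outright block.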
  Why it matters: Now that companies around the world have deployed commercial and non-commercial AI systems at scale, it’s logical that attackers will try to subvert them. As is the case with visual adversarial examples, today’s neural network-based systems are quite vulnerable to subtle perturbations; we’ll need to make systems more robust to deploy AI more widely with confidence.
  Read more: TextBugger: Generating Adversarial Text Against Real-world Applications (Arxiv).

Training AI systems to build AI systems by copying people:
…Teaching AI to copy the good parts of human-designed systems, while still being creative…
Researchers with Chinese computer vision giant SenseTime and the Chinese University of Hong Kong have published details on IRLAS, a technique to create AI agents that learn to design AI architectures inspired by human-designed networks.
  The technique, Inverse Reinforcement Learning for Architecture Search (IRLAS) works by training a neural network with reinforcement learning to design new networks based on a template derived from a human design. “Given the architecture sampled by the agent as the self-generated demonstration, the expert network as the observed demonstration, our mirror stimuli function will output a signal to judge the topological similarity between these two networks,” the researchers explain.
  The motivation for all of this is that the researchers believe “human-designed architectures have a more simple and elegant topology than existing auto-generated architectures”.
  Results: The researchers use IRLAS to design a network that obtains a 2.60% test error score on CIFAR-10, showing “state-of-the-art performance over both human-designed networks and auto-generated networks”. The researchers also train a network against the large-scale ImageNet dataset and show that IRLAS-trained networks can obtain greater accuracies and lower inference times when deployed in a mobile setting.
  Why it matters: Automating the design of increasingly large aspects of AI systems lets us arbitrage (expensive) human brains for (cheap) computers when designing new neural network architectures. Economics suggest that as we gain access to more powerful AI training hardware the costs of using a neural architecture search approach versus a human-driven one will change enough for the majority of networks to be found via AI systems, rather than humans.
  Read more: IRLAS: Inverse Reinforcement Learning for Architecture Search (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

Microsoft calls for action on face recognition and publishes ethics principles:
Microsoft have urged governments to start regulating face recognition technology, in a detailed blog post from company president Brad Smith. The post identifies three core problems to be addressed by governments: avoiding bias and discrimination; protecting personal privacy; and protecting democratic freedoms and human rights. For each issue, they make clear recommendations about measures required to address them, and identify relevant legal precedents.
  In the same post, Microsoft announce six principles which will guide their use of face recognition: (1) Fairness; (2) Transparency; (3) Accountability; (4) Nondiscrimination; (5) Notice and consent; (6) Lawful surveillance.
  Why this matters: This is a detailed and sensible post, which places Microsoft at the forefront of the discussion around face recognition. This issue is important not only because of the imminent deployment of these technologies, but because it is likely just the first of many AI technologies with far-reaching societal impacts. Given this, our response to face recognition will shape our approach to future developments, and is an important test of our response to changes brought about by AI.
  Read more: Facial recognition: It’s time for action (Microsoft).

EU releases coordinated AI strategy:
The EU have released plans to coordinate member states’ national AI strategies under a common strategic framework. Earlier this year, the EU announced a target of €20bn/year in AI investments over the next decade. Core aspects of the Europe-wide plan include a new industry-academia partnership on AI, a strengthened network of research centres, skills training, and a ‘single market for data’. The plan affirms Europe’s commitment to participating in the ethical debate, through the publication of their AI ethics principles in 2019. The EU reiterates its concerns with lethal autonomous weapons, and will continue to advocate for measures to ensure meaningful human control of weapons systems.
  Read more: Coordinated Plan on Artificial Intelligence (EU).

OpenAI Bits & Pieces:

Want to predict the upper batch size to use during model training? Predict the noise scale:
New research from OpenAI shows how we can better predict the parallelizability of AI workloads by measuring the noise scale during training and using this to predict aspects of how AI training will scale into the future.
  I think measures like this may be surprisingly useful within AI policy. “A central challenge of AI policy will be to work out how to use measures like this to make predictions about the characteristics of future AI systems, and use this knowledge to conceive of policies that let society maximize the upsides and minimize the downsides of these technologies”, we write in the blog post.
  Read more: How AI Training Scales (OpenAI Blog).

Tech Tales:

The servants become the flock and become alive and fly.

20%? Fine. 30%? You might have some occasional trouble, but it’ll be manageable. 40%? OK, that could be a problem. 50%? Now you’re in trouble. Once more than 50% of the cars on a road at any one time are self-driving cars, then you run into problems. Overfitting doesn’t seem like such an academic problem when it involves multiple tons of steel each traveling at 50kmh+, slaloming along a freeway.

The problem is that the cars behave too similarly. Without the randomness caused by human drivers, the robotic self-driving cars fall into their own weird pathologies. Local minima. Navigation anti-patterns. Strange turning conventions. Emergent cracks in an otherwise perfect system.

So that led to the manufacturers coming together half a decade ago and conceiving of the ‘chaos accords’ – an agreement between all automotive makers about the level of randomness they would try to inject into their self-driving car brains. The goal: recreate a variety of different driving styles in a self-driving car world. The solution: different self-driving cars could now develop different ‘driving personalities’, with the personalities designed to fit within rigorous safety constraints, while offering a greater amount of variety than had been present in previous systems.

Like most epochal events, we didn’t see it coming. Instead, markets took over and as the companies developed more varieties of car with a greater breadth of driving styles, people started to desire more variety in their own cars. This led to the invention of ‘personality evolution’, which would let a self-driving car slowly learn to drive in a way that pleased its majority user. Soon after this the companies implemented the same system for themselves, giving many cars the ability to learn from each other and pursue what was called in the technical literature ‘idiosyncratic evolution strategies’.

It seemed like a great thing at first; faster, smarter, safer cars. Cars moving together in fleets through traffic, with the humans inside waving at each other (especially the children); and new services like AI-fleet-driven ‘joyrides’ on souped up vehicles whose designs came in part from the sensor data of the AI machines. Cars themselves became economic actors, able to assess the ‘unique individual characteristics’ of their own particular driving style, spot the demand for any of their styles or skills on the consumer market, and sell their services to other cars.

None of this looks at all like intelligence, because none of it is. But the outcome of enough nodes in the network with enough emergent property propensity, and this growth through time, leads to things that do intelligent things, even if the parts aren’t smart.

Things that inspired this story: Imitation learning; overfitting; domain randomization; fleet learning; federated learning, evolution; emergent failure.

Import AI: 124: Google researchers produce metric that could help us track the evolution of fake video news; $4000 grants for people to teach deep learning; creating aggressive self-driving cars.

Using AI to learn to design networks with multiple constraints:
…InstaNAS lets people add multiple specifications to neural architecture search…
In the past couple of years researchers have started to use various AI techniques such as reinforcement learning and evolution to design neural network architectures. This has already yielded numerous systems that display state-of-the-art performance on challenging tasks like image recognition, outperforming systems designed specifically by humans.
More recently, we’ve seen a further push to make such so-called ‘neural architecture search’ (NAS) systems efficient, and approaches like ENAS (Import AI #82) and SMASH (Import AI #56) have shown how to take systems that previously required hundreds of GPUs and fit them onto one or two GPUs.
  Now, researchers are beginning to explore along another axis of the NAS space: developing techniques that let them provide multiple objectives to the NAS system, letting them specify networks against different constraints. New research from National Tsing-Hua University in Taiwan and Google Research introduces InstaNAS, a system that lets people specify two categories of objectives as search targets: task-dependent objectives (eg, the accuracy in a given classification task) and architecture-level objectives (eg, latency/computational costs).
  How it works: Training InstaNAS systems involves three phases of work: pre-training a one-shot model, then introducing a controller which learns to select architectures from the one-shot model with respect to each input instance (during this stage, “the controller and the one-shot model are being trained alternatively, which enforces the one-shot model to adapt to the distribution change of the controller”, the researchers write), and a final stage in which the system picks the controller which best satisfies the constraints, then the one-shot model is re-trained with that high-performing controller.
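  The final selection stage can be pictured as a constrained choice between candidates. The Python sketch below is an assumption-laden illustration, not the InstaNAS code: the function name, the dictionary keys, and all numbers are hypothetical, and the real system may score controllers with a combined reward rather than a hard latency budget.

```python
def pick_controller(candidates, max_latency_ms):
    """Return the most accurate candidate within the latency budget.

    candidates: list of dicts with hypothetical keys
                "name", "accuracy", and "latency_ms".
    Returns None when no candidate meets the budget.
    """
    feasible = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    if not feasible:
        return None
    # Prefer higher accuracy; break ties in favour of lower latency.
    return max(feasible, key=lambda c: (c["accuracy"], -c["latency_ms"]))


# Made-up numbers, for illustration only.
candidates = [
    {"name": "ctrl_a", "accuracy": 0.957, "latency_ms": 12.0},
    {"name": "ctrl_b", "accuracy": 0.961, "latency_ms": 25.0},
    {"name": "ctrl_c", "accuracy": 0.949, "latency_ms": 8.0},
]
best = pick_controller(candidates, max_latency_ms=15.0)
print(best["name"])
```

  Here the most accurate candidate overall is rejected because it exceeds the budget, which is the kind of accuracy-for-latency trade the results describe.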
  Results: Systems trained with InstaNAS achieve 48.9% and 40.2% average latency reduction on CIFAR-10 and CIFAR-100 against MobileNetV2 with comparable accuracy scores. Accuracies do take a slight hit (eg, the best accuracy on an InstaNAS system is approximately 95.7%, compared to 96.6% for a NAS-trained system.)
  Why it matters: As we industrialize artificial intelligence we’re going to be offloading increasingly large chunks of AI development to AI systems themselves. The development and extension of NAS approaches will be crucial to this. Though we should bear in mind that there’s an implicit electricity<>human brain tradeoff we’re making here, and my intuition is that for some very large-scale NAS systems we could end up creating some hugely energy-hungry systems, which carry their own implicit (un-recognized) environmental externality.
  Read more: InstaNAS: Instance-aware Neural Architecture Search (Arxiv).

New metrics to let us work out when Fake Video News is going to become real:
…With a little help from StarCraft 2!…
Google Brain researchers have proposed a new metric to give researchers a better way to assess the quality of synthetically generated videos. The motivation for this research is that today we lack effective ways to assess and quantify improvements in synthetic video generation, and the history of the deep learning subfield within AI has tended to show that progress in a domain improves once the research community settles on a standard metric and/or dataset to use to assess progress. (My pet theory for why this is: There are so many AI papers published these days that researchers need simple heuristics to tell them whether to invest time in reading something, and progress against a generally agreed upon shared dataset can be a good input for this – eg, ImageNet (Image Rec), Penn Treebank (NLU), Switchboard Hub 500 (Speech Rec).)
  The metric: Fréchet Video Distance (FVD): FVD has been designed to give scores that reflect not only the quality of the video, but also its temporal coherence – aka, the way things transition from frame to frame. FVD is built around what the researchers call an ‘Inflated 3D Convnet’, which has been used to solve tasks in other challenging video domains. Because this network is trained to spot actions in videos it contains useful feature relations that correspond to sequences of movements over time. FVD uses an Inflated 3D Convnet, trained on the Kinetics data set of human-centered YouTube videos, to characterize the difference between the temporal transitions seen in the synthetic videos and its own feature representations of physical movements derived from the real world.
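  For intuition about the "distance" part, here is a minimal Python sketch of a Fréchet distance between two sets of feature vectors. It makes a loud simplifying assumption: each set is modelled as a Gaussian with diagonal covariance, whereas the full metric works with full covariance matrices over features from the video network. The function name and the toy features are hypothetical.

```python
import math
from statistics import mean, pvariance


def frechet_distance_diag(real_feats, fake_feats):
    """Frechet distance between two Gaussians with diagonal covariance.

    Each argument is a list of equal-length feature vectors.
    """
    dist = 0.0
    # Walk over the feature dimensions of both sets in parallel.
    for real_dim, fake_dim in zip(zip(*real_feats), zip(*fake_feats)):
        mu_r, mu_f = mean(real_dim), mean(fake_dim)
        sd_r = math.sqrt(pvariance(real_dim))
        sd_f = math.sqrt(pvariance(fake_dim))
        # Per-dimension term: squared mean gap plus squared std-dev gap.
        dist += (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2
    return dist


# Toy 2-dimensional "features" for two real and two generated videos.
real = [[0.0, 1.0], [2.0, 3.0]]
fake = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diag(real, fake))
```

  A score of zero means the two feature distributions match under this model, and the score grows as the generated videos drift away from the real ones in mean or spread.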
  The datasets: In tandem with FVD, the researchers introduce a new dataset based around StarCraft 2, a top-down real-time strategy game with lush, colorful graphics “to serve as an intermediate step towards real world video data sets.” These videos contain various different tasks in StarCraft 2 which are fairly self-explanatory – move unit to border; collect mineral shards; brawl; and road trip with medivac. The researchers provide 14,000 videos for each scenario.
  Results: FVD seems to be a metric that more closely tracks the scores humans give when performing a qualitative evaluation of synthetic videos. “It is clear that FVD is better equipped to rank models according to human perception of quality”.
  Why it matters: Synthetic videos are likely going to cause a large number of profound challenges in AI policy, as progression in this research domain yields immediate applications in the creation of automated propaganda. One of the most challenging things about this area – until now – has been the lack of available metrics to use to track progression here and thereby estimate when synthetic videos are likely going to become something ‘good enough’ for people to worry about in domains outside of AI research. “We believe that FVD and SCV will greatly benefit research in generative models of video in providing a well tailored, objective measure of progress,” they write.
  Read more: Towards Accurate Generative Models of Video: A New Metric & Challenges (Arxiv).
  Get the datasets from here.

Teaching self-driving cars how to drive aggressively:
…Fusing deep learning and model predictive control for aggressive robot cars…
Researchers with the Georgia Institute of Technology have created a self-driving car system that can successfully navigate a 1:5-scale ‘AutoRally’ vehicle along a dirt track at high speeds. This type of work paves the way for a future where self-driving cars can go off-road, and gives us indications for how militaries might be developing their own stealthy unmanned ground vehicles (UGVs).
  How it works: Fusing deep learning and model predictive control: To create the system, the researchers feed visual inputs from a monocular camera into either a static supervised classifier or a recurrent LSTM (they switch between the two according to the difficulty of the particular section of the map the vehicle is on), which uses this information to predict where the vehicle is against a pre-downloaded map schematic. They then feed this prediction into a GPU-based particle filter which incorporates data from the vehicle IMU and wheel speeds to further predict where the vehicle is on the map.
  Superhuman Wacky Races: The researchers test their system out on a complex dirt track at the Georgia Tech Autonomous Racing Facility. This track “includes turns of varying radius including a 180 degree hairpin and S curve, and a long straight section”. The AutoRally car is able to “repeatedly beat the best single lap performed by an experienced human test driver who provided all of the system identification data.”
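  To show the shape of the particle filter idea, here is a toy Python sketch. It is not the paper's GPU implementation: it tracks a single 1-D position, stands in a noisy number for the vision-based position estimate, and uses made-up noise levels. All names and parameters are hypothetical.

```python
import math
import random


def particle_filter_step(particles, speed, dt, observed_pos, rng,
                         motion_noise=0.1, obs_noise=0.5):
    """One predict / weight / resample cycle for a 1-D position estimate."""
    # Predict: move every particle using the wheel-speed reading plus noise.
    moved = [p + speed * dt + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: particles near the observed position are more plausible.
    weights = [math.exp(-0.5 * ((observed_pos - p) / obs_noise) ** 2)
               for p in moved]
    if sum(weights) == 0.0:
        # All weights underflowed; fall back to uniform resampling.
        weights = [1.0] * len(moved)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(steps=50, n_particles=500, seed=0):
    rng = random.Random(seed)
    true_pos, speed, dt = 0.0, 1.0, 0.1
    particles = [rng.uniform(-2.0, 2.0) for _ in range(n_particles)]
    for _ in range(steps):
        true_pos += speed * dt
        observed = true_pos + rng.gauss(0.0, 0.5)  # noisy vision estimate
        particles = particle_filter_step(particles, speed, dt, observed, rng)
    estimate = sum(particles) / len(particles)
    return true_pos, estimate


true_pos, estimate = run_demo()
print(f"true={true_pos:.2f} estimate={estimate:.2f}")
```

  The design point is that neither input is trusted on its own: the motion model keeps the estimate smooth, while the observation pulls the particle cloud back when it drifts.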
  Why it matters: Papers like this show how hybrid systems – where deep learning is doing useful work as a single specific component – are likely going to yield useful applications in challenging domains. I expect the majority of applied robotics systems in the future to use modular systems combining the best of human-specified systems as well as function approximating systems based on deep learning.
  Read more: Vision-Based High Speed Driving with a Deep Dynamic Observer (Arxiv).

What does a robot economy look like and what rules might it need?
…Where AI, Robotics, and Fully Automated Luxury Communism collide…
As AI has grown more capable an increasing number of people have begun to think about what the implications are for the economy. One of the main questions that people contemplate is how to effectively incorporate a labor-light capital-heavy AI-infused economic sector (or substrate of the entire economy) into society in such a way as to increase societal stability rather than reduce it. A related question is: What would an economy look like where an increasing chunk of economic activity happens as a consequence of semi-autonomous robots, many of whom are also providing (automated) services to each other? These are the questions that researchers with the University of Texas at Austin try to answer with a new paper interrogating the implications of a robot-driven economy.
  Three laws of the robot economy: The researchers propose three potential laws for such a robot economy. These are:
– A robot economy has to be developed within the framework of the digital economy, so it can interface with existing economic systems.
– The economy of robots must have internal capital that can support the market and reflect the value of the participation of robots in our society.
– Robots should not have property rights and will have to operate only on the basis of contractual responsibility, so that humans control the economy, not the machines.
  Tools to build the robot economy: So, what will it take to build such a world? We’d likely need to develop the following tools:
– Create a network to track the status and implication of tasks given to or conducted by robots in accordance with the terms of a digital contract.
– A real-time communication system to let robots and people communicate together and with each other.
– The ability to use “smart contracts” via the blockchain to govern these economic interactions. (This means that “neither the will of the parties to comply with their word nor the dependence on a third party (i.e. a legal system) is required”.)
  What does a robot economy mean for society? If we manage to make it through a (fairly unsteady, frightening) economic transition into a robot economy, then some very interesting things start to happen: “the most important fact is that in the long-term, intelligent robotics has the potential to overcome the physical limitations of capital and labor and open up new sources of value and growth”, write the researchers. This would provide the opportunity for vast economic abundance for all of mankind, if taxation and political systems can be adjusted to effectively distribute the dividends of an AI-driven economy.
  Why it matters: Figuring out exactly how society is going to be influenced by AI is one of the grand challenges of contemporary research into the impacts of AI on society. Papers like this suggest that such an economy will have very strange properties compared to our current one, and will likely demand new policy solutions.
  Read more: Robot Economy: Ready or Not, Here It Comes (Arxiv).

Want to teach others the fundamentals of deep learning? Want financial support? Apply for the Depth First Learning Fellowship!
…Applications open now for $4000 grants to help people teach others deep learning…
Depth First Learning, an AI education initiative from researchers at NYU, FAIR, DeepMind, and Google Brain, has announced the ‘Depth First Learning Fellowship’, sponsored by Jane Street.
  How the fellowship works: Successful DFL Fellowship applicants will be expected to design a curriculum and lead a DFL study group around a particular aspect of deep learning. DFL is looking for applicants with the following traits: mathematical maturity; effectiveness at scientific communication; ability to commit to ensure the DFL study sessions are useful; a general enjoyment of group learning.
  Applications close on February 15th 2019.
  Apply here (Depth First Learning).

Tired of classifying handwritten digits? Then try CURSIVE JAPANESE instead:
…Researchers release an MNIST-replacement; If data is political, then the arrival of cursive Japanese alongside MNIST broadens our data-political horizon…
For over two decades AI researchers have benchmarked the effectiveness of various supervised and unsupervised learning AI techniques against performance on MNIST, a dataset consisting of a multitude of heavily pixelated black-and-white handwritten digits. Now, researchers linked to the Center for Open Data in the Humanities, MILA in Montreal, the National Institute of Japanese Literature, Google Brain, and a high-school in England (a young, major Kaggle winner!), have released “Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji”.
  Keeping a language alive with deep learning: One of the motivations for this research is to help people access Japan’s own past, as the cursive script used by this dataset is no longer taught in the official school curriculum. “Even though Kuzushiji had been used for over 1000 years, most Japanese natives today cannot read books written or published over 150 years ago,” they write.
  The data: The Kuzushiji dataset is drawn from a collection of around 300,000 Japanese books, some of which were transcribed and annotated with bounding boxes. The full dataset consists of 3,999 character types across 403,242 characters. The datasets being released by the researchers were made as follows: “We pre-processed characters scanned from 35 classical books printed in the 18th century and organized the dataset into 3 parts: (1) Kuzushiji-MNIST, a drop-in replacement for the MNIST [16] dataset, (2) Kuzushiji-49, a much larger, but imbalanced dataset containing 48 Hiragana characters and one Hiragana iteration mark, and (3) Kuzushiji-Kanji, an imbalanced dataset of 3832 Kanji characters, including rare characters with very few samples.”
  Dataset difficulty: In tests the researchers demonstrate that these datasets are going to be more challenging for AI researchers to work with than MNIST itself – in baseline tests they show that many techniques that get above 99% classification accuracy on MNIST get between 95% and 98% on the Kuzushiji-MNIST drop-in, and scores only as high as around 97% for Kuzushiji-49.
  Why it matters: Work like this shows how as people think more intently about the underlying data sources of AI they can develop new approaches that can let researchers do good AI research while also broadening the range of cultural artefacts that are easily accessible to AI systems and methodologies.
  Read more: Deep Learning for Classical Japanese Literature (Arxiv).

OpenAI Bits & Pieces:

Want to test how general your agents are? Try out CoinRun:
We’ve released a new training environment, CoinRun, which provides a metric for an agent’s ability to transfer its experience to novel situations.
  Read more: Quantifying Generalization in Reinforcement Learning (OpenAI Blog).
  Get the CoinRun code here (OpenAI Github).

Deadline Extension for OpenAI’s Spinning Up in Deep RL workshop:
We’ve extended the deadline for applying to participate in a Deep RL workshop, at OpenAI in San Francisco.
  More details: The workshop will be held on February 2nd 2019 and will include lectures based on Spinning Up in Deep RL, a package of teaching materials that OpenAI recently released. Lectures will be followed by an afternoon hacking session, during which attendees can get guidance and feedback on their projects from some of OpenAI’s expert researchers.
  Applications will be open until December 15th.
  Read more about Spinning Up in Deep RL (OpenAI Blog).
  Apply to attend the workshop by filling out this form (Google Forms).

Tech Tales:

Call it the ‘demographic time bomb’ (which is what the press calls it) or the ‘land of the living dead’ (which is what the tabloid press call it) or the ‘ageing population tendency among developed nations’ (which is what the economists call it), but I guess we should have seen it coming: old people’s homes full of the walking dead and the near-sleeping living. Cities given over to millions of automatons and thousands of people. ‘Festivals of the living’ attended solely by those made of metal. The slow replacement of conscious life in the world from organic to synthetic.

It started like this: most people in most nations stopped having as many children. Fertility rates dropped. Everywhere became like Japan circa 2020: societies shaped by the ever-growing voting blocs composed of the old people, and the ever-shrinking voting blocs composed of the young.

The young tried to placate the old people with robots – this was their first and most fatal mistake.

It began, like most world-changing technologies, with toys: “Fake Baby 3000” was one of the early models; an ultra-high-end doll designed for the young females of the ultra-rich. Then after that came “Baby Trainer”, a robot designed to behave like a newborn child, intended for the rich wannabe parents of the world who would like to get some practice on a synthetic-life before they birthed and cared for a real one. These robots were a phenomenal success and, much like the early 21st Century market for drones, birthed an ecosystem of ever-more elaborate and advanced automatons.

Half a decade later, someone had the bright idea of putting these robots in old people’s homes. The theory went like this: regular social interactions – and in particular, emotionally resonant ones – have a long history of helping to prevent the various medical degradations of old age (especially cognitive ones). So why not let old people’s hardwired paternal instincts do the job of dealing with ‘senescence-related health issues’, as one of the marketing brochures went? It was an instant success. Crowds of the increasingly large populations of old people began caring for the baby robots – and they started to live longer, with fewer of them going insane in their old age. And as they became healthier and more active, they were able to vote in elections for longer periods of time, and further impart their view of the world onto the rest of society.

Next, the old demanded that the robot babies be upgraded to robot children, and society obliged. Now the homes became filled with clanking metal kids, playing games on StairMasters and stealing ice from the kitchen to throw at each other, finding the novel temperature sensation exciting. The old loved these children and – combined with ongoing improvements in healthcare – lived even longer. They taught the children to help them, and the homes of the old gained fabulous outdoor sculptures and meticulously tended lawns. Perhaps the AI kids were so bored they found this to be a good distraction? wrote one professor. Perhaps the AI kids loved their old people and wanted to help them? wrote another.

Around the world, societies are now on the verge of enacting various laws that would let us create robot adults to care for the ageing population. Metal people, working tirelessly in the service of their ‘parents’, standing in for the duties of the flesh-and-blood. Politics is demographics, and the demographics suggest the laws will be enacted, and the living-dead shall grow until they outnumber the dead-living.

Things that inspired this story: The robot economy, robotics, PARO the Therapeutic Robot, demographic time bombs, markets.

Import AI: 123: Facebook sees demands for deep learning services in its data centers grow by 3.5X; why advanced AI might require a global police force; and diagnosing natural disasters with deep learning

#GAN_Paint: Learn to paint with an AI system:
…Generating pictures out of neuron activations – a new, AI-infused photoshop filter…
MIT researchers have figured out how to extract more information from trained generative adversarial networks, letting them identify specific ‘neurons’ in the network that correlate to specific visual concepts. They’ve built a website that lets anyone learn to paint with these systems. The effect is akin to having a competent ultra-fast painter standing by your shoulder, letting you broadly spraypaint an area where you’d like, for instance, some sky, and then the software activates the relevant ‘neuron’ in the GAN model and uses that to paint an image for you.
  Why it matters: Demos like this give a broader set of people a more natural way to interact with contemporary AI research, and help us develop intuitions about how the technology behaves.
  Paint with an AI yourself here: GANpaint (MIT-IBM Watson AI Lab website).
  Read more about the research here: GAN Dissection: Visualizing and Understanding Generative Adversarial Networks (MIT CSAIL).
  Paint with a GAN here (GANPaint website).

DeepMind says the future of AI safety is all about agents that learn their own reward functions:
…History shows that human-specified reward functions are brittle and prone to creating agents with unsafe behaviors…
Researchers with DeepMind have laid out a long-term strategy for creating AI agents that do what humans want in complex domains where it is difficult for humans to construct an appropriate reward function.
  The basic idea here is that to create safe AI agents, we want agents that collect information from the (typically human) user and use this to learn a reward function; we can then use reinforcement learning to optimize this learned reward function. The nice thing about this approach, according to DeepMind, is that it should work for agents that have the potential to become smarter than humans: “agents trained with reward modeling can assist the user in the evaluation process when training the next agent”.
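  One common way to learn a reward function from human feedback is to fit it to pairwise preferences. The Python sketch below assumes a Bradley-Terry style preference model over a linear reward, which is a swapped-in technique chosen for brevity and not necessarily the setup DeepMind has in mind. The feature names, data, and hyperparameters are all made up.

```python
import math


def train_reward_model(preferences, n_features, lr=0.5, epochs=200):
    """Fit linear reward weights from (preferred, rejected) feature pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in preferences:
            diff = [a - b for a, b in zip(preferred, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Probability the model assigns to the human's actual choice.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of that choice.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w


def reward(w, features):
    """Learned reward for one outcome, described by its feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Made-up data: feature 0 = task progress, feature 1 = damage caused.
prefs = [
    ([1.0, 0.0], [0.0, 0.0]),  # progress preferred over no progress
    ([1.0, 0.0], [1.0, 1.0]),  # same progress, less damage preferred
    ([0.0, 0.0], [0.0, 1.0]),  # less damage preferred
]
w = train_reward_model(prefs, n_features=2)
print(reward(w, [1.0, 0.0]) > reward(w, [1.0, 1.0]))
```

  In a full system the learned `reward` function would then replace a hand-written one as the objective for a reinforcement learning agent.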
  A long-term alignment strategy: DeepMind thinks that this approach potentially has three properties that give it a chance of being adopted by researchers: it is scalable, it is economical, and it is pragmatic.
  Next steps: The researchers say these ideas are “shovel-ready for empirical research today”. The company believes that “deep RL is a particularly promising technique for solving real-world problems. However, in order to unlock its potential, we need to train agents in the absence of well-specified reward functions.” This research agenda sketches out ways to do that.
  Challenges: Reward modeling has a few challenges, which are as follows: amount of feedback (how much data you need to collect to have the agent successfully learn the reward function); the distribution of feedback (where the agent visits new states which lead to it generating a higher perceived reward for doing actions that are in reality sub-optimal); reward hacking (when the agent finds a way to exploit the task to give itself reward that leads to it learning a function that does not reflect the implicit expressed wishes of the user); unacceptable outcomes (taking actions that a human would likely never approve, such as an industrial robot breaking its own hardware to achieve a task, or a personal assistant automatically writing a very rude email); and the reward-result gap (the gap between the optimal reward model and the reward function learned by the agent). DeepMind thinks that each of these challenges can potentially be dealt with by some specific technical approaches, and today there exist several distinct ways to tackle each of the challenges, which seems to increase the chance of one working out satisfactorily.
  Why it might matter: Human empowerment: Putting aside the general utility of having AI agents that can learn to do difficult things in hard domains without inflicting harm on humans, this research agenda also implies something that isn’t directly discussed in the paper: it offers a way to empower humans with AI. If AI systems continue to scale in capability then it seems likely that in a matter of decades we will fill society with very large AI systems which large numbers of people interact with. We can see the initial outlines of this today in the form of large-scale surveillance systems being deployed in countries like China; in self-driving car fleets being rolled out in increasing numbers in places like Phoenix, Arizona (via Google Waymo); and so on. I wonder what it might be like if we could figure out a way to maximize the number of people in society who were engaged in training AI agents via expressing preferences. After all, the central mandate of many of the world’s political systems comes from people regularly expressing their preferences via voting (and, yes, these systems are a bit rickety and unstable at the moment, but I’m a bit of an optimist here). Could we better align society with increasingly powerful AI systems by more deeply integrating a wider subset of society into the training and development of AI systems?
  Read more: Scalable agent alignment via reward modeling: a research direction (Arxiv).

Global police, global government likely necessary to ensure stability from powerful AI, says Bostrom:
…If it turns out we’re playing with a rigged slot machine, then how do we make ourselves safe?…
Nick Bostrom, researcher and author of Superintelligence (which influenced the thinking of a large number of people with regard to AI) has published new research in which he tries to figure out what problems policymakers might encounter if it turns out planet earth is a “vulnerable world”; that is, a world in which “there is some level of technological development at which civilization almost certainly gets devastated by default”.
  Bostrom’s analysis likens the process of technological development to a person or group of people steadily withdrawing balls from a vase. Most balls are white (beneficial, eg medicines), while some are of various shades of gray (for instance, technologies that can equally power industry or warmaking). What Bostrom’s Vulnerable World Hypothesis paper worries about is whether we could at one point withdraw a “black ball” from the vase. This would be “a technology that invariably or by default destroys the civilization that invents it”. We have not drawn one so far, but “the reason is not that we have been particularly careful or wise in our technology policy. We have just been lucky.”
  In this research, Bostrom creates a framework for thinking about the different types of risks that such balls could embody, and outlines some ideas for potential (extreme!) policy responses to allow civilization to prepare for such a black ball.
  Types of risks: To help us think about these black balls, Bostrom lays out a few different types of civilization vulnerability that could be stressed by such technologies.
  Type-1 (“easy nukes”): “There is some technology which is so destructive and so easy to use that, given the semi-anarchic default condition, the actions of actors in the apocalyptic residual make civilizational devastation extremely likely”.
  Type-2a (“safe first strike”): “There is some level of technology at which powerful actors have the ability to produce civilization-devastating harms and, in the semi-anarchic default condition, face incentives to use that ability”.
  Type-2b (“worse global warming”): “There is some level of technology at which, in the semi-anarchic default condition, a great many actors face incentives to take some slightly damaging action such that the combined effect of those actions is civilizational devastation”.
  Type-0: “There is some level of technology that carries a hidden risk such that the default outcome when it is discovered is inadvertent civilizational devastation”.
  Policy responses for a risky world: bad ideas: How could we make a world with any of these vulnerabilities safe and stable? Bostrom initially considers four options then puts aside two as being unlikely to yield sufficient stability to be worth pursuing. These discarded ideas are to: restrict technological development, and “ensure that there does not exist a large population of actors representing a wide and recognizably human distribution of motives” (aka, brainwashing).
  Policy responses for a risky world: good ideas: There are potentially two types of policy response that Bostrom says could increase the safety and stability of the world. These are to adopt “Preventive policing” (which he also gives the deliberately inflammatory nickname “High-tech Panopticon”), as well as “global governance”. Both of these policy approaches are challenging. Preventive policing would require all states being able to “monitor their citizens closely enough to allow them to intercept anybody who begins preparing an act of mass destruction”. Global governance is necessary because states will need “to extremely reliably suppress activities that are very strongly disapproved of by a very large supermajority of the population (and of power-weighted domestic stakeholders)”, Bostrom writes.
  Why it matters: Work like this grapples with one of the essential problems of AI research: are we developing a technology so powerful that it can fundamentally alter the landscape of technological risk, even more so than the discovery of nuclear fission? It seems unlikely that today’s AI systems fit this description, but it does seem plausible that future AI technologies could. What will we do, then? “Perhaps the reason why the world has failed to eliminate the risk of nuclear war is that the risk was insufficiently great? Had the risk been higher, one could eupeptically argue, then the necessary will to solve the global governance problem would have been found,” Bostrom writes.
  Read more: The Vulnerable World Hypothesis (Nick Bostrom’s website).

Facebook sees deep learning demand in its data centers grow by 3.5X in 3 years:
…What Facebook’s workloads look like today and what they might look like in the future…
A team of researchers from Facebook have tried to characterize the types of deep learning inference workloads running in the company’s data centers and predict how this might influence the way Facebook designs its infrastructure in the future.
  Hardware for AI data centers: So what kind of hardware might an AI-first data center need? Facebook believes servers should be built with the following concerns in mind: high memory bandwidth and capacity for embeddings; support for powerful matrix and vector engines; large on-chip memory for inference with small batches; support for half-precision floating-point computation.
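  To make the memory point concrete, here is a back-of-the-envelope sketch of how much memory a single embedding table needs in 32-bit versus half precision. The table size (10 million rows, 64 dimensions) is a hypothetical figure chosen for illustration, not a number from the Facebook paper.

```python
def embedding_table_bytes(num_rows: int, dim: int, bytes_per_value: int) -> int:
    """Memory needed to hold one dense embedding table."""
    return num_rows * dim * bytes_per_value


# Hypothetical table: 10 million rows of 64-dimensional embeddings.
ROWS, DIM = 10_000_000, 64

fp32_bytes = embedding_table_bytes(ROWS, DIM, 4)  # 32-bit floats
fp16_bytes = embedding_table_bytes(ROWS, DIM, 2)  # half-precision floats

print(f"fp32: {fp32_bytes / 1e9:.2f} GB")
print(f"fp16: {fp16_bytes / 1e9:.2f} GB")
```

  Under these assumptions, halving the precision halves the footprint, which is one reason memory capacity and half-precision support show up on the same wish list.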
  Inference, what is it good for? Facebook has the following major use cases for AI in its datacenters: providing personalized feeds, ranking, or recommendations; content understanding; and visual and natural language understanding.
  Facebook expects these workloads to evolve in the future: for recommenders, it suspects it will start to incorporate time into event-probability models, and imagines using larger embeddings in its models, which will increase their memory demands; for computer vision, it expects to do more transfer learning via fine-tuning pre-trained models onto specific datasets, as well as exploring more convolution types, different batch sizes, and moving to higher resolutions of imagery to increase accuracy; for language, it expects to explore larger batch sizes, evaluate new types of model, like transformers, and move to deploying larger multi-lingual models.
  Data-center workloads: The deep learning applications in Facebook’s data centers “have diverse compute patterns where matrices do not necessarily have “nice” square shapes. There are also many “long-tail” operators other than fully connected and convolutional layers. Therefore, in addition to matrix multiplication engines, hardware designers should consider general and powerful vector engines,” the researchers write.
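  One rough way to see why matrix shape matters is to estimate arithmetic intensity, meaning floating-point operations per byte moved, for a matrix multiply. The sketch below uses the standard idealized estimate in which each matrix is read or written exactly once; the shapes are illustrative assumptions, not measurements from the paper.

```python
def gemm_arithmetic_intensity(m: int, k: int, n: int, bytes_per_value: int = 4) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) multiply, assuming each
    matrix is read or written exactly once (an idealized lower bound on traffic)."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_value * (m * k + k * n + m * n)
    return flops / bytes_moved


# A "nice" square multiply versus a batch-size-1 inference step
# through a layer with the same weight matrix.
square = gemm_arithmetic_intensity(1024, 1024, 1024)
skinny = gemm_arithmetic_intensity(1, 1024, 1024)

print(f"square: {square:.1f} FLOPs/byte")
print(f"skinny: {skinny:.2f} FLOPs/byte")
```

  In this idealized model the skinny case does far less compute per byte fetched, so it tends to be limited by memory bandwidth rather than by the matrix engine.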
  Why it matters: Papers like this give us a sense of all the finicky work required to deploy deep learning applications at scale, and indicate how computer design is going to change as a consequence of these workload demands. “Co-designing DL inference hardware for current and future DL models is an important but challenging problem,” the Facebook researchers write.
  Read more: Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications (Arxiv).

In the future, drones will heal the land following forest fires:
…Startup DroneSeed uses large drones + AI to create re-forestation engines…
TechCrunch has written a lengthy profile of DroneSeed, a startup that is using drones and AI to create systems that can reforest areas after wildfires.
  DroneSeed’s machines have “multispectral camera arrays, high-end lidar, six gallon tanks of herbicide and proprietary seed dispersal mechanisms,” according to TechCrunch. The drones can be used to map areas that have recently been burned up in forest fires, then can autonomously identify the areas where trees have a good chance to grow and can deliver seed-nutrient packages to those areas.
  Why it matters: I think we’re at the very beginning of exploring all the ways in which drones can be applied to nature and wildlife maintenance and enrichment, and examples like this feel like tantalizing prototypes of a future where we use drones to perform thousands of distinct civic services.
  Read more here: That night, a forest flew (TechCrunch).
  Check out DroneSeed’s Twitter account here.

Learning to diagnose natural disaster damage, with deep learning:
…Facebook & CrowdAI research shows how to automate the analysis of natural disasters…
Researchers with satellite imagery startup CrowdAI and Facebook have shown how to use convolutional neural networks to provide automated assessment of damage to urban areas from natural disasters. In a paper submitted to the “AI for Social Good” workshop at NeurIPS 2018 (a prominent AI conference, formerly named NIPS) the team “propose to identify disaster-impacted areas by comparing the change in man-made features extracted from satellite imagery. Using a pre-trained semantic segmentation model we extract man-made features (e.g. roads, buildings) on the before and after imagery of the disaster affected area. Then, we compute the difference of the two segmentation masks to identify change.”
  Disaster Impact Index (DII): How do you measure the effect of a disaster? The researchers propose DII, which lets them calculate the semantic change that has occurred in different parts of satellite images, given the availability of a before and after dataset. To test their approach they use large-scale satellite imagery datasets of land damaged by Hurricane Harvey and by fires near Santa Rosa. They show that they can use DII to automatically infer severe flooding and fire damage areas in both images with a rough accuracy (judged by F1 score) of around 80%.
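  The mask-differencing idea can be sketched in a few lines. This is a toy illustration on hand-written binary masks (1 = man-made feature, 0 = everything else); the `change_score` normalization is an assumption made for this example and is not the paper’s exact DII formula.

```python
def diff_mask(before, after):
    """Per-pixel change between two binary segmentation masks."""
    return [[abs(b - a) for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def change_score(before, after):
    """Changed pixels as a fraction of man-made pixels in the 'before' mask.
    (Hypothetical normalization chosen for illustration.)"""
    changed = sum(map(sum, diff_mask(before, after)))
    baseline = sum(map(sum, before))
    return changed / baseline if baseline else 0.0


# Toy 4x4 tile: two buildings before, one mostly gone after.
before = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
after = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 0]]

print(diff_mask(before, after))
print(change_score(before, after))
```

  In practice the masks would come from a segmentation model run over before-and-after imagery, and the score would be computed per grid cell so that heavily changed areas stand out.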
  Why it matters: Deep learning-based techniques are making it cheaper and easier for people to train specific detectors over satellite imagery, altering the number of actors in the world who can experiment with surveillance technologies for both humanitarian purposes (as described here) and likely military ones as well. I think within half a decade it’s likely that governments could be tapping data feeds from large satellite fleets then using AI techniques to automatically diagnose damage from an ever-increasing number of disasters created by the chaos dividend of climate change.
  Read the paper: From Satellite Imagery to Disaster Insights (Facebook Research).

Deep learning for medical applications takes less data than you think:
…Stanford study suggests tens of thousands of images are sufficient for medical applications…
Stanford University researchers have shown that it takes a surprisingly small amount of data to teach neural networks how to automatically categorize chest radiographs. The researchers trained AlexNet, ResNet-18, and DenseNet-121 baselines on chest radiograph data, attempting to classify normal versus abnormal images. In tests, the researchers show that it is possible to obtain an area under the receiver operating characteristic curve (AUC) of 0.95 for a CNN model trained on 20,000 images, versus 0.96 for one trained on 200,000 images, suggesting that it may take less data than previously assumed to train effective AI medical classification tools. (By comparison, 2,000 images yields an AUC of 0.84, representing a significant accuracy penalty.)
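  For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal one. Here is a minimal sketch of that pairwise definition; the labels and scores are made-up toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = abnormal radiograph, 0 = normal; scores are model outputs.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]

auc = roc_auc(labels, scores)
print(f"AUC: {auc:.3f}")
```

  An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the gap between 0.84 and 0.95 is large while the gap between 0.95 and 0.96 is small.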
  Data scaling and medical imagery: “While carefully adjudicated image labels are necessary for evaluation purposes, prospectively labeled single-annotator data sets of a scale modest enough (approximately 20,000 samples) to be available to many institutions are sufficient to train high-performance classifiers for this task.”
  Drawbacks: All the data used in this study was drawn from the same medical institution, so it’s possible that either the data (or, plausibly, the patients) contain some specific idiosyncrasies that mean networks trained on this dataset might not generalize to imagery captured by other medical institutions.
  Why it matters: Studies like this show how today’s AI techniques are beginning to show good enough performance in clinical contexts that they will soon be deployed alongside doctors to make them more effective. It’ll be interesting to see whether the use of such technology can make healthcare more effective (healthcare is one of the rare industries where the addition of new technology frequently leads to cost increases rather than cost savings).
  Some kind of future: In an editorial published alongside the paper, Bram van Ginneken, from the Department of Radiology and Nuclear Medicine at Radboud University in the Netherlands, wonders if we could in the future create large, shared datasets that multiple institutions could use. A system built this way “would benefit from training on a multicenter data set much larger than 20,000 or even 200,000 examinations. This larger size is needed to capture the diversity of data from different centers and to ensure that there are enough examples of relatively rare abnormal findings so that the network learns not to miss them,” he writes. “Such a large-scale system should be based on newly designed network architectures that take the full-resolution images as input. It would be advisable to train systems not only to provide a binary output label but also to detect specific regions in the images with specific abnormalities.”
  Read more: Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs (Jared Dunnmon Github / Radiology, PDF).
  Read the editorial: Deep Learning for Triage of Chest Radiographs: Should Every Institution Train Its Own System? (Jared Dunnmon Github / Radiology, PDF).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Amnesty and employees petition Google to end Project Dragonfly:
Google employees have published an open letter calling on Google to cancel Dragonfly, its censored search engine being developed for use within China. This follows similar calls by human rights organizations including Amnesty International for the company to suspend the project. The letter accuses the company of developing technologies that “aid the powerful in oppressing the vulnerable”, and of being complicit in the Chinese government’s surveillance programs and human rights abuses.
  Speaking in October about Dragonfly, CEO Sundar Pichai emphasized the need to balance Google’s values with the laws of countries in which they operate, and their core mission of providing information to everyone. Pichai will be testifying to the House Judiciary Committee in US Congress later this week.
  There are clear similarities between these protests and those over Project Maven earlier this year, which resulted in Google withdrawing from the controversial Pentagon contract, and establishing a set of AI principles.
  Read more: We are Google employees. Google must drop Dragonfly (Medium).
  Read more: Google must not capitulate to China’s censorship demands (Amnesty).

High-reliability organizations:
…Want to deploy safe, robust AI? You better make sure you have organizational processes as good as your technology…
As technologies become more powerful, risks from catastrophic errors increase. This is true for advanced AI, even in near-term use cases such as autonomous vehicles or face recognition. A key determinant of these risks will be the organizational environment through which AI is being deployed. New research from Tom Dietterich at Oregon State University applies insights from research into ‘high-reliability organizations’ to derive three lessons for the design of robust and safe human-AI systems.

  1. We should aim to create combined human-AI systems that become high-reliability organizations, e.g. by proactively monitoring the behaviour of human and AI elements, continuously modelling and minimizing risk, and supporting combined human-AI cooperation and planning.
  2. AI technology should not be deployed when it is impossible for surrounding human organizations to be highly reliable. For example, proposals to integrate face recognition with police body-cams in the US are problematic insofar as it is hard to imagine how to remove the risk of catastrophic errors from false positives, particularly in armed confrontations.
  3. AI systems should continuously monitor human organizations to check for threats to high-reliability. We should leverage AI to reduce human errors and oversights, and empower systems to take corrective actions.

  Why this matters: Advanced AI technologies are already being deployed in settings with significant risks from error (e.g. medicine, justice), and the magnitude of these risks will increase as technologies become more powerful. There is an existing body of research into designing complex systems to minimize error risk, e.g. in nuclear facilities, that is relevant to thinking about AI deployment.
  Read more: Robust AI and robust human organizations (arXiv).

Efforts for autonomous weapons treaty stall:
The annual meeting of the Convention on Conventional Weapons (CCW) has concluded without a clear path towards an international treaty on lethal autonomous weapons. Five countries (Russia, US, Israel, Australia and South Korea) expressed their opposition to a new treaty. Russia successfully reduced the scheduled meetings for 2019 from 10 to 7 days, in what appears to be an effort to decrease the likelihood of progress towards an agreement.
  Read more: Handful of countries hamper discussion to ban killer robots at UN (FLI).

Tech Tales:

Wetware Timeshare

It’s never too hard to spot The Renters – you’ll find them clustered near reflective surfaces staring deeply into their own reflected eyes, or you’ll notice a crowd standing at the edge of a water fountain, periodically holding their arms out over the spray and methodically turning their limbs until they’re soaked through; or you’ll see one of them make their way round a buffet at a restaurant, taking precisely one piece from every available type of food.

The deal goes like this: run out of money? Have no options? No history of major cognitive damage in your family? Have the implant? If so, then you can rent your brain to a superintelligence. The market got going a few years ago, after we started letting the robots operate in our financial markets. Well, it turns out that despite all of our innovation in silicon, human brains are still amazingly powerful and, coupled with perceptive capabilities and the very expensive multi-million-years-of-evolution physical substrate, are an attractive “platform” for some of the artificial minds to offload processing tasks to.

Of course, you can set preferences: I want to be fully clothed at all times, I don’t want to have the machine speak through my voice, I would like to stay indoors, etc. Obviously setting these preferences can reduce the value of a given brain in the market, but that’s the choice of the human. If a machine bids on you then you can choose to accept the bid and if you do that it’s kind of like instant-anesthetic. Some people say they don’t feel anything but I always feel a little itch in the vein that runs up my neck. You’ll come around a few hours (or, for rare high-paying jobs, days) later and you’re typically in the place you started out (though some people have been known to come to on sailing ships, or in patches of wilderness, or in shopping malls holding bags and bags of goods bought by the AI).

Oh sure there are protests. And religious groups hate it as you can imagine. But people volunteer for it all the time: some people do it just for the escape value, not for the money. The machines always thank any of the people they have rented brain capacity from, and their compliments might shed some light on what they’re doing with all of us: Thankyou subject 478783 we have improved our ability to understand the interaction of light and reflective surfaces; Thankyou subject 382148 we now know the appropriate skin:friction setting for the effective removal of old skin flakes; Thankyou subject 128349 we know now what it feels like to run to exhaustion; Thankyou subject 18283 we have seen sunrise through human eyes, and so on.

The machines tell us that they’re close to developing a technology that can let us rent access to their brains. “Step Into Our World: You’ll Be Surprised!” read some of the early marketing materials.

Things that inspired this story: Brain-computer interfaces; the AI systems in Iain M Banks books.