Import AI

Import AI #99: Using AI to generate phishing URLs, evidence for how AI is influencing the economy, and using curiosity for self-imitation learning.

by Jack Clark

Auto-generating phishing URLs via AI components:
…AI is an omni-use technology, so the same techniques used to spot phishing URLs can also be used to generate phishing URLs…
Researchers with the Cyber Threat Analytics division of Cyxtera Technologies have written an analysis of how people might “use AI algorithms to bypass AI phishing detection systems” by creating their own system called DeepPhish.
  DeepPhish: DeepPhish works by taking a list of fraudulent URLs that have worked successfully in the past, encoding these as a one-hot representation, then training a model to generate new synthetic URLs given a seed sentence. They found that DeepPhish could dramatically improve the chances of a fraudulent URL getting past automated phishing-detection systems, with DeepPhish URLs seeing a boost in effectiveness from 0.69% (no DeepPhish) to 20.90% (with DeepPhish).
  Security people always have the best names: DeepPhish isn’t the only AI “weapon” system recently developed by researchers, the authors note; other tools include Honey-Phish, SNAP_R, and Deep DGA.
  Why it matters: This research highlights how AI is an inherently omni-use technology, where the same basic components used to, for instance, train systems to spot potentially fraudulent URLs can also be used to generate plausible-seeming fraudulent URLs.
  Read more: DeepPhish: Simulating Malicious AI (PDF).
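  In code: The one-hot encoding step described above can be sketched roughly as follows. This is a minimal illustration only: the character alphabet, the maximum length, and the function names are assumptions of mine rather than details from the DeepPhish paper, and the generative model itself is left out.

```python
import string

# Hypothetical character vocabulary (an assumption, not the paper's alphabet).
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_encode(url, max_len=32):
    """Return a max_len x len(ALPHABET) matrix of 0/1 ints for the URL.

    Characters outside ALPHABET and padding positions become all-zero rows.
    """
    rows = []
    for ch in url.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short URLs with all-zero rows so every example has the same shape.
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


def one_hot_decode(matrix):
    """Invert one_hot_encode, skipping all-zero rows."""
    chars = []
    for row in matrix:
        if 1 in row:
            chars.append(ALPHABET[row.index(1)])
    return "".join(chars)


encoded = one_hot_encode("http://example.com/login")
```

  Each row has at most a single 1, so a sequence model can consume the URL one character at a time, and the same representation serves a detector or a generator.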

Curious about the future of reinforcement learning? Apply more curiosity!
…Self-Imitation Learning, aka: That was good, let’s try that again…
Self-Imitation Learning (SIL) works by having the agent exploit its replay buffer by learning to repeat its own prior actions if they have generated reasonable returns previously and, crucially, only when those actions delivered larger returns than were expected. The authors combine SIL with Advantage Actor-Critic (A2C) and test the algorithm out on a variety of hard tasks, including the notoriously tough Atari exploration game Montezuma’s Revenge. They also report scores for games like Gravitar, Freeway, PrivateEye, Hero, and Frostbite: all areas where A2C+SIL beats A3C+ baselines. Overall, A2C+SIL gets a median score across all of Atari of 138.7%, compared to 96.1% for A2C.
  Robots: They also test a combination of PPO+SIL on simulated robotics tasks within OpenAI Gym and significantly boost performance relative to non-SIL baselines.
  Comparisons: At this stage it’s worth noting that many other algorithms and systems have come out since A2C with better performance on Atari, so I’m a little skeptical of the comparative metric here.
  Why it matters: We need to design AI algorithms that can explore their environment more intelligently. This work provides further evidence that developing more sophisticated exploration techniques can further boost performance. Though, as the report notes, such systems can still get stuck in poor local optima. “Our results suggest that there can be a certain learning stage where exploitation is more important than exploration or vice versa,” the authors write. “We believe that developing methods for balancing between exploration and exploitation in terms of collecting and learning from experiences is an important future research direction.”
  Read more: Self-Imitation Learning (Arxiv).
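  In code: The core idea (imitate a past action only when its return beat the critic’s expectation) can be sketched as a per-sample loss with a clipped advantage. This is a simplified illustration, not the authors’ implementation: the function name, the argument names, and the default value_weight are assumptions, and replay-buffer sampling is omitted.

```python
import math


def sil_losses(log_prob, value_estimate, observed_return, value_weight=0.01):
    """Self-imitation loss for one stored (state, action, return) sample.

    log_prob: log pi(a|s) under the current policy for the stored action.
    value_estimate: the critic's current estimate V(s).
    observed_return: the discounted return actually obtained in the past.
    value_weight: illustrative weighting of the value term (an assumption).
    Returns (total_loss, clipped_advantage).
    """
    # Clipped advantage: positive only when the past return beat expectations.
    advantage = max(observed_return - value_estimate, 0.0)
    policy_loss = -log_prob * advantage  # raise probability of good past actions
    value_loss = 0.5 * advantage ** 2    # pull V(s) up toward the better return
    return policy_loss + value_weight * value_loss, advantage


# A past action that did better than expected contributes a learning signal...
better_loss, better_adv = sil_losses(math.log(0.5), 1.0, 3.0)
# ...while one that did worse than expected is ignored (zero loss).
worse_loss, worse_adv = sil_losses(math.log(0.5), 3.0, 1.0)
```

  Because the advantage is clipped at zero, transitions that underperformed the value estimate add nothing to the gradient, which is what makes this imitation of only the agent’s good past behavior.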

Yes, AI is beginning to influence the economy:
…New study by experienced economists suggests the symptoms of major economic changes as a consequence of AI are already here…
Jason Furman, former chairman of the Council of Economic Advisers and current professor at the Harvard Kennedy School, and Robert Seamans of the NYU Stern School of Business, have published a lengthy report on AI and the Economy. The report compiles information from a wide variety of sources, so it’s worth reading in full.
  Here are some of the facts the report cites as symptoms that AI is influencing the economy:
– 26X: Increase in AI-related mergers and acquisitions from 2015 to 2017. (Source: The Economist).
– 26%: Real reduction in ImageNet top-5 image recognition error rate from 2010 to 2017. (Source: the AI Index.)
– 9X: Increase in number of academic papers focused on AI from 1996 to now, compared to a 6X increase in computer science papers. (Source: the AI Index.)
– 40%: Real increase in venture capital investment in AI startups from 2013 to 2016 (Source: MGI Report).
– 83%: Probability a job paying around $20 per hour will be subject to automation (Source: CEA).
– 4%: Probability a job paying over $40 per hour will be subject to automation (Source: CEA).
  “Artificial intelligence has the potential to dramatically change the economy,” they write in the report conclusion. “Early research findings suggest that AI and robotics do indeed boost productivity growth, and that effects on labor are mixed. However, more empirical research is needed in order to confirm existing findings on the productivity benefits, better understand conditions under which AI and robotics substitute or complement for labor, and understand regional level outcomes.”
   Read more: AI and the Economy (SSRN).

US Republican politician writes op-ed on need for Washington to adopt AI:
…Op-ed from US politician Will Hurd calls for greater use of AI by federal government…
The US government should implement AI technologies to save money and cut the time it takes for it to provide services to citizens, says Will Hurd, chairman of the US Information Technology Subcommittee of the House Committee on Oversight and Government Reform.
  “While introducing AI into the government will save money through optimizing processes, it should also be deployed to eliminate waste, fraud, and abuse,” Hurd said. “Additionally, the government should invest in AI to improve the security of its citizens… it is in the interest of both our national and economic security that the United States not be left behind.”
  Read more: Washington Needs to Adopt AI Soon or We’ll Lose Millions (Fortune).
  Watch the hearing in which I testified on behalf of OpenAI and the AI Index (Official House website).

European Commission adds AI advisers to help it craft EU-wide AI strategy:
…52 experts will steer European AI alliance, advise the commission, draft ethics guidelines, and so on…
As part of Europe’s attempt to chart its path forward in an AI world, the European Commission has announced the members of a 52-strong “AI High Level Group” who will advise the Commission and other initiatives on AI strategy. Members include professors at a variety of European universities; representatives of industry, like Jean-Francois Gagne, the CEO of Element AI, SAP’s SVP of Machine Learning, and Francesca Rossi, who leads AI ethics initiatives at IBM and also sits on the board of the Partnership on AI; as well as members of the existential risk/AGI community like Jaan Tallinn, who was the founding engineer of Skype and Kazaa.
  Read more: High-Level Group on Artificial Intelligence (European Commission).

European researchers call for EU-wide AI coordination:
…CLAIRE letter asks academics to sign to support excellence in European AI…
Several hundred researchers have signed a letter in support of the Confederation of Laboratories for Artificial Intelligence Research in Europe (CLAIRE), an initiative to create a pan-EU network of AI laboratories that can work together and feed results into a central facility which will serve as a hub for scientific research and strategy.
  Signatories: Some of the people that have signed the letter so far include professors from across Europe, numerous members of the European Association for Artificial Intelligence (EurAI) and five former presidents of IJCAI (International Joint Conference on Artificial Intelligence).
  Not the only letter: This letter follows the launch of another one in May which called for the establishment of a European AI superlab and associated support infrastructure, named ‘Ellis’. (Import AI: #92).
  Why it matters: We’re seeing an increase in the number of grassroots attempts by researchers and AI practitioners to get governments or sets of governments to pay attention to and invest in AI. It’s mostly notable to me because it feels like the AI community is attempting to become a more intentional political actor, and joint letters like this represent a form of practice for future, more substantive engagements.
  Read more: CLAIRE (claire-ai.org).

When Good Measures go Bad: BLEU:
…When is an assessment metric not a useful assessment metric? When it’s used for different purposes…
A researcher with the University of Aberdeen has evaluated how good a metric BLEU (bilingual evaluation understudy) is for assessing the performance of natural language processing systems; they analyzed 284 distinct correlations between BLEU and gold-standard human evaluations across 34 papers and concluded that BLEU is useful for the evaluation of machine translation systems, but found its utility breaks down when used for other purposes, like the assessment of individual texts or scientific hypothesis testing or evaluation of things like natural language generation.
  Why it matters: AI research runs partially on metrics and metrics are usually defined by assessment techniques. It’s worth taking a step back and looking at widely used metrics like BLEU to work out how meaningful they can be as an assessment methodology and to remember to use them within their appropriate domains.
  Read more: A Structured Review of the Validity of BLEU (Computational Linguistics).
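  In code: The review is about how well BLEU scores track human judgments. A minimal sketch of one such check, a Pearson correlation between per-system BLEU scores and human ratings, is below. The numbers are made-up placeholders rather than results from the paper, and Pearson is just one coefficient a reviewer might pick.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder scores for five hypothetical systems (not real results).
bleu_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
human_ratings = [2.0, 2.5, 3.0, 3.5, 4.0]
r = pearson(bleu_scores, human_ratings)
```

  A coefficient near 1 would mean the metric ranks systems much as humans do; the review’s point is that this should be checked per task rather than assumed.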

Neural networks can be more brain-like than you assume:
…PredNet experiments show correspondence between activations in PredNet and activations in Macaque brains…
How brain-like are neural networks? Not very. That’s because, at a basic component level, they’re based on a somewhat simplified ~1950s conception of how neurons work, so their biological fidelity is fairly low. But can neural networks, once trained to perform particular tasks, end up reflecting some of the functions and capabilities found in biological neural networks? The answer seems to be yes, based on several years of experiments in things as varied as analyzing pre-trained vision networks, verifying the emergence of ‘place cells’, and other experiments.
  Harvard and MIT researchers have analyzed PredNet, a neural network trained to perform next-frame prediction on video sequences, to understand how brain-like its behavior is. They find that when they expose the network to input, its neurons fire with a response pattern (consisting of two distinct peaks) that is analogous to the firing patterns found in individual neurons within Macaque monkeys. Similarly, when analyzing a network trained on the self-driving KITTI dataset in terms of its spatial receptivity, they find that the artificial network displays similar dynamics to real ones (though with some variance and error). The same high level of overlap between the behavior of artificial and real neurons is roughly true of systems trained on sequence learning tasks.
  Less overlap: The areas where artificial and real neurons display less overlap seem to roughly correlate with intuitively harder tasks, like being able to deal with optical illusions, or how the systems respond to different classes of object.
  Why it matters: We’re heading into a world where people are going to increasingly use trained analogues of real biological systems to better analyze and understand the behavior of both. PredNet provides an encouraging example that this line of experimentation can work. “We argue that the network is sufficient to produce these phenomena, and we note that explicit representation of prediction errors in units within the feedforward path of the PredNet provides a straightforward explanation for the transient nature of responses in visual cortex in response to static images,” the researchers write. “That a single, simple objective—prediction—can produce such a wide variety of observed neural phenomena underscores the idea that prediction may be a central organizing principle in the brain, and points toward fruitful directions for future study in both neuroscience and machine learning.”
  Read more: A neural network trained to predict future video frames mimics the critical properties of biological neuronal responses and perception (Arxiv).
  Read more: PredNet (CoxLab).

Unsupervised Meta-Learning: Learning how to learn without having to be told how to learn:
…The future will be unsupervised…
Researchers with the University of California at Berkeley have made meta-learning more tractable by reducing the amount of work a researcher needs to do to set up a meta-learning system. Their new ‘unsupervised meta-learning’ (UML) approach lets their meta-learning agent automatically acquire distributions of tasks which it can subsequently perform meta-learning over. This deals with one drawback of meta-learning, which is that it is typically down to the human designer to come up with a set of tasks for the algorithm to be trained on. They also show how to combine UML with other recently developed techniques like DIAYN (Diversity is all you need) for breaking environments down into collections of distinct tasks/states to train over.
  Results: UML systems beat basic RL baselines on simulated 2D navigation and locomotion tasks. They also tend to obtain performance roughly equivalent to systems built with human-designed tuned reward functions, suggesting that UML can successfully explore the problem space enough to devise good reward signals for itself.
  Why it matters: Because the diversity of tasks we’d like AI to do is much larger than the number of tasks we can neatly specify via hand-written rules it’s crucial we develop methods that can rapidly acquire information from new environments and use this information to attack new problems. Meta-learning is one particularly promising approach to dealing with this problem, and by removing another one of its more expensive dependencies (a human-curated task distribution) UML may help push things forward. “An interesting direction to study in future work is the extension of unsupervised meta-learning to domains such as supervised classification, which might hold the promise of developing new unsupervised learning procedures powered by meta-learning,” the researchers write.
  Read more: Unsupervised Meta-Learning for Reinforcement Learning (Arxiv).

OpenAI Bits&Pieces:

Better language systems via unsupervised learning:
New OpenAI research shows how to pair unsupervised learning with supervised finetuning to create large, generalizable language models. This sort of result is interesting because it shows how deep learning components can end up displaying sophisticated capabilities, like being able to obtain high scores on Winograd schema tests, having only learned naively from large amounts of data rather than via specific hand-tuned rules.
  Read more: Improving Language Understanding with Unsupervised Learning (OpenAI Blog).

Tech Tales:

Special Edition: Guest short story by James Vincent, a nice chap who writes about AI. All credit to James, all blame to me, etc… jack@jack-clark.net.

Shunts and Bumps.

Reliable work, thought Andre, that was the thing. Ignore the long hours, freezing warehouses, and endless retakes. Ignore the feeling of being more mannequin than man when the director storms onto set, snatches the coffee cup out of your hand and replaces it with a bunch of flowers without even looking at you. Ignore it all. This was a job that paid, week after week, and all because computers had no imagination.

God bless their barren brains.

Earlier in the year, Rocky had explained it to him like this. “They’re dumb as shit, ok? Show them a potato 50 times and they’ll say it’s an orange. Show them it 5,000 times and they’ll say it’s a potato but pass out in shock if you turn it into fries. They just can’t extrapolate like humans can — they can’t think.” (Rocky, at this point, had been slopping her beer around the bar as if trying to short-circuit a crowd of invisible silicon dunces.) “They only know what you show them, and only then when you show them it enough times. Like a mirror … that gets a burned-in image of your face after you’ve looked at it every day for a year.”

For the self-driving business, realizing this inability to extrapolate had been a slow and painful process. “A bit of a car crash,” Rocky said. The first decade had been promising, with deep learning and cheap sensors putting basic autonomy in every other car on the road. Okay, so you weren’t technically allowed to take your hands off the wheel, and things only worked perfectly in perfect conditions: clearly painted road markings, calm highways, and good weather. But the message from the car companies was clear: we’re going to keep getting better, this fast, forever.

Except that didn’t happen. Instead, there was freak accident after freak accident. Self-driving cars kept crashing, killing passengers and bystanders. Sometimes it was a sensor glitch; the white side of a semi getting read as clear highway ahead. But more often it was just the mild chaos of life: a party balloon drifting into the road or a mattress falling off a truck. Moments where the world’s familiar objects are recombined into something new and surprising. Potatoes into fries.

The car companies assured us that the data they used to train their AI covered 99 percent of all possible miles you could travel, but as Rocky put it: “Who gives a fuck about 99 percent reliability when it’s life or death? An eight-year-old can drive 99 percent of the miles you can if you put her in a booster seat, but it’s those one percenters that matter.”

Enter: Andre and his ilk. The car companies had needed data to teach their AIs about all the weird and unexpected scenarios they might encounter on the road, and California was full of empty film lots and jobbing actors who could supply it. (The rise of the fakies hadn’t been kind to the film industry.) Every incident that an AI couldn’t extrapolate from simulations was mocked up in a warehouse, recorded from a dozen angles, and sold to car companies as 4D datasets. They in turn repackaged it for car owners as safety add-ons sold at $300 a pop. They called it DDLC: downloadable driving content. You bought packs depending on your level of risk aversion and disposable income. Dog, Cats, And Other Furry Fiends was a bestseller. As was Outside The School Gates.

It was a nice little earner, Rocky said, and typical of the tech industry’s ability to “turn liability into profit.” She herself did prototyping at one of the higher-end self-driving outfits. “They’re obsessed with air filtration,” she’d told Andre, “Obsessed. They say it’s for biological attacks but I think it’s to handle all their meal-replacement-smoothie farts.” She’d also helped him find the new job. As was usually the case when the tech industry used cheap labor to paper over the cracks in its products, this stuff was hardly advertised. But, a few texts and a Skype audition later, and here he was.

“Ok, Andre, this time it’s the oranges going into the road. Technical says they can adjust the number in post but would prefer if we went through a few different velocities to get the physics right. So let’s do a nice gentle spill for the first take and work our way up from there, okay?”

Andre nodded and grabbed a crate. This week they were doing Market Mayhem: Fruits, Flowers, And Fine Food and he’d been chucking produce about all day. Before that he’d been pushing a cute wheeled cart around on the warehouse’s football field-sized loop of fake street. He was taking a break after the crate work, staring at a daisy pushing its way through the concrete (part of the set or unplanned realism?) when the producer approached him.

“Hey man, great work today — oops, got a little juice on ya there still — but great work, yeah. Listen, dumb question, but how would you like to earn some real money? I mean, who doesn’t, right? I see you, I know you’ve got ambitions. I got ‘em too. And I know you’ve gotta take time off for auditions, so what I’m talking about here is a little extra work for triple the money.”

Andre had been suspicious. “Triple the money? How? For what?”

“Well, the data we’ve been getting is good, you understand, but it’s not covering everything the car folks want. We’re filling in a lot of edge cases but they say there’s still some stuff there’s no data for. Shunts and bumps, you might say. You know, live ones… with people.”

And that was how Andre found himself, standing in the middle of a fake street in a freezing warehouse, dressed in one of those padded suits used to train attack dogs, staring down a mid-price sedan with no plates. Rocky had been against it, but the money had been too tempting to pass up. With that sort of cash he’d be able to take a few days off, hell, maybe even a week. Do some proper auditions. Actually learn the lines for once. And, the producer said, it was barely a crash. You probably wouldn’t even get bruised.

Andre gulped, sweating despite the cold air. He looked at the car a few hundred feet away. The bonnet was wrapped in some sort of striped, pressure sensitive tape, and the sides were knobbly with sensors. Was the driver wearing a helmet? That didn’t seem right. Andre looked over to the producer, but he was facing away from him, speaking quickly into a walkie-talkie. The producer pointed at something. A spotlight turned on overhead. Andre was illuminated. He tried to shout something but his tongue was too big in his mouth. Then he heard the textured whine of an electric motor, like a kazoo blowing through a mains outlet, and turned to see the sedan sprinting quietly towards him.

Regular work, he thought, that was the thing.

Things that inspired this story: critiques of deep learning; failures of self driving systems; and imitation learning.

Once again, the story above is from James Vincent; find him on Twitter and let him know what you thought!

Import AI #98: Training self-driving cars with rented firetrucks; spotting (staged) violence with AI-infused drones; what graphs might have to do with the future of AI.

by Jack Clark

Cruise asks to borrow a firetruck to help train its self-driving cars:
…Emergency training data – literally…
Cruise, a self-driving car company based in San Francisco, wants to expose its vehicles to more data involving the emergency services, so it asked the city if it could rent a firetruck, fire engine, and ambulance, and have the vehicles drive around a block in the city with their lights flashing, according to emails surfaced via Freedom of Information Act requests from Jalopnik.
  Read more: GM Cruise Prepping Launch of Driverless Pilot Car Pilot in San Francisco: Emails (Jalopnik).

Experienced researcher: What to do if winter is coming:
…Tips for surviving the post-bubble era in AI…
John Langford, a well-regarded researcher with Microsoft, has some advice for people in the AI community as they carry out the proverbial yak-shaving act of questioning whether AI is in a bubble or not. Though the field shouldn’t optimize for failure, it might be helpful if it planned for it, he says.
  “As a field, we should consider the coordinated failure case a little bit. What fraction of the field is currently at companies or in units at companies which are very expensive without yet justifying that expense? It’s no longer a small fraction so there is a chance for something traumatic for both the people and field when/where there is a sudden cut-off,” he writes.
  Read more: When the bubble bursts… (John Langford’s personal blog).

Drone AI paper provides a template for future surveillance:
…Lack of discussion of impact of research raises eyebrows…
Researchers with the University of Cambridge, the National Institute of Technology, and the Indian Institute of Science, have published details on a “real-time drone surveillance system” that uses deep learning. The system is designed to spot violent activities like strangling, punching, kicking, shooting, stabbing, and so on, by performing image recognition over imagery gathered from a crowd in real-time.
  It’s the data, silly: To carry out this project the researchers create their own (highly staged) collection of around 2,000 images called the ‘Aerial Violent Individual’ dataset, which they record via a consumer-based Parrot AR Drone. Most of the flaws in the system relate to this data, which sees a bunch of people carry out over-acted expressions of aggression towards each other – this data doesn’t seem to have much of a relationship to real-world violence and it’s not obvious how well this would perform in the wild.
  Results: The resulting system “works”, in the sense that the researchers are able to obtain high accuracies (90%+) on classifying certain violent behaviors within the dataset, but it’s not clear whether this translates to anything of practical use in the real world. The researchers will subsequently test out their work at a music festival in India later this month, they said.
  Responsibility: Like the “Deep Video Networks” research which I wrote about last week, much of this research is distinguished by the immense implications it appears to have for society, and it’s a little sad to see no discussion of this in the paper. Yes, surveillance systems like this can likely be used for humanitarian ends, but they can also be used by malicious actors to surveil or repress people. I think it’s important AI researchers start to acknowledge the omni-use nature of their work and confront questions like this within the research itself, rather than afterwards following public criticism.
  Read more: Eyes in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Framework (Arxiv).
  Watch video (YouTube).

“Depth First Learning” launches to aid understanding of AI papers:
…Learning through a combination of gathering context and testing understanding…
Industry and academic researchers have launched “Depth First Learning”, an initiative to make it easier for people to educate themselves about important research papers. Each writeup goes through the key ideas of a paper along with recommended literature to read, and includes questions throughout that are intended to test whether the reader has learned enough about the context to answer them. The idea behind this work is that it makes it easier to understand research papers by breaking them down into their fundamental concepts. “We spent some time understanding each paper and writing down the core concepts on which they were built,” the researchers write.
  Read an example: “Depth First Learning” article on InfoGAN (Depth First Learning website).
  Read more: Depth First Learning (DFL website, About page).

Graphs, graphs everywhere: The future according to DeepMind:
…Why a little structure can be a very good thing…
New research from DeepMind shows how to fuse structured approaches to AI design with end-to-end learned systems to create systems that can not only learn about the world, but recombine learnings in new ways to solve new problems. This sort of “combinatorial generalization” is key to intelligence, the authors write, and they claim their approach deals with some of the recent criticisms of deep learning made by people like Judea Pearl, Josh Tenenbaum, and Gary Marcus, among others.
  Structure, structure everywhere: The authors argue that many of today’s deep learning systems already encode this sort of bias towards structure in the form of specific arrangements of learned components, for example, how convolutional neural networks are composed out of convolutional layers and then chained together in increasingly elaborate ways for image recognition. These designs encode within them an implicit relational inductive bias, the authors write, because they take in a bunch of data and operate over its relationships in increasingly elaborate ways. Additionally, most problems can be decomposed into graph representations (for instance, modeling the interactions of a bunch of pool balls can be done by expressing the pool balls and the table as nodes in a graph with the links between them signaling directions in which force may be transmitted, or a molecule can similarly be decomposed as atoms (nodes) and bonds (edges)).
  Graph network: DeepMind has developed the ‘Graph network’ (GN) block, a generic component “which takes a graph as input, performs computations over the structure, and returns a graph as output.” This is desirable because a graph structure is fairly flexible, letting you express an arbitrary number of relationships between an arbitrary number of entities, and the same function can be deployed on differently sized graphs, and these graphs represent entities and relations as sets, making them invariant to permutations.
  No silver bullet: Graph networks don’t make it easy to support approaches like “recursion, control flow, and conditional iteration”, they say, and so should not be considered a panacea. Another open issue is the larger question of where to derive the graphs that graph networks operate over, which the authors leave to other researchers.
  Read more: Relational inductive biases, deep learning, and graph networks (Arxiv).
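  In code: The “graph in, graph out” idea can be sketched with a toy block that updates edges, then nodes, then a graph-level attribute. This is a simplified illustration rather than DeepMind’s implementation: attributes are plain floats, the update functions are hand-written stand-ins for learned networks, and all names here are assumptions.

```python
def gn_block(nodes, edges, global_attr, edge_fn, node_fn, global_fn):
    """One pass of a simplified graph-network-style block.

    nodes: list of node attributes (floats here, for simplicity).
    edges: list of (sender_index, receiver_index, edge_attribute) tuples.
    global_attr: a graph-level attribute.
    Returns a new (nodes, edges, global_attr) triple: graph in, graph out.
    """
    # 1. Update every edge from its attribute, its endpoints, and the global.
    new_edges = [
        (s, r, edge_fn(e, nodes[s], nodes[r], global_attr))
        for s, r, e in edges
    ]
    # 2. Update every node from the sum of its incoming updated edges.
    #    Summation ignores ordering, so edge order does not matter.
    new_nodes = []
    for i, v in enumerate(nodes):
        incoming = sum(e for _, r, e in new_edges if r == i)
        new_nodes.append(node_fn(incoming, v, global_attr))
    # 3. Update the global attribute from aggregated edges and nodes.
    new_global = global_fn(
        sum(e for _, _, e in new_edges), sum(new_nodes), global_attr
    )
    return new_nodes, new_edges, new_global


# Hypothetical hand-written update functions standing in for learned ones.
def edge_fn(e, sender, receiver, u):
    return e + sender - receiver


def node_fn(incoming, v, u):
    return v + 0.5 * incoming


def global_fn(edge_sum, node_sum, u):
    return u + edge_sum + node_sum


# A three-node ring: 0 -> 1 -> 2 -> 0.
nodes = [1.0, 2.0, 4.0]
edges = [(0, 1, 0.0), (1, 2, 0.0), (2, 0, 0.0)]
new_nodes, new_edges, new_global = gn_block(
    nodes, edges, 0.0, edge_fn, node_fn, global_fn
)
```

  The same three functions work unchanged on a graph with any number of nodes and edges, and because aggregation is a sum, shuffling the edge list leaves the node and global outputs the same.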

Google announces AI principles to guide its business:
…Company releases seven principles, along with description of ‘AI applications we will not pursue’…
Google has published its AI principles, following an internal employee outcry in response to the company’s participation in a drone surveillance project for the US military. These principles are intended to guide Google’s work in the future, according to a blog post written by Google CEO Sundar Pichai. “These are not theoretical concepts; they are concrete standards that will actively govern our research and product development and will impact our business decisions”.
  Principles: The seven principles are as follows:
– “Be socially beneficial”.
– “Avoid creating or reinforcing unfair bias”.
– “Be built and tested for safety”.
– “Be accountable to people”.
– “Incorporate privacy design principles”.
– “Uphold high standards of scientific excellence”.
– “Be made available for uses that accord with these principles”.
   What Google won’t do: Google has also published a (short) list of “AI applications we will not pursue”. These are pretty notable because it’s rare for a public company to place such restrictions on itself so abruptly. The things Google won’t pursue are as follows:
– “Technologies that cause or are likely to cause overall harm”.
– “Weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”.
– “Technologies that gather or use information for surveillance violating internationally accepted norms”.
– “Technologies whose purpose contravenes widely accepted principles of international law and human rights”.
   Read more: AI at Google: our principles (Google Blog).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net …

India releases national AI strategy:
India is the latest country to launch an AI strategy, releasing a discussion paper last week.
   Focus on five sectors: The report identifies five sectors in which AI will have significant societal benefits, but which may require government support in addition to private sector innovation. These are: healthcare; agriculture; education; smart cities and infrastructure; mobility and transportation.
   Five barriers to be addressed:
– Lack of research expertise
– Absence of enabling data ecosystems
– High resource cost and low awareness for adoption
– Lack of regulations around privacy and security
– Absence of collaborative approach to adoption and applications.
   What they’re doing: The report proposes supporting two tiers of organizations to drive the strategy.
– Centres of Research Excellence – academic/research hubs.
– International Centres of Transformational AI – bodies with a mandate of developing and deploying research, in partnership with private sector.
   Read more: National Strategy for Artificial Intelligence.

Tech Tales:

The Dream Wall

Everyone’s DreamWall is different and everyone’s DreamWall is intimate. Nonetheless, we share (heavily curated) pictures of them with each other. Mine is covered in mountains and on each mountain peak there are little desks with lamps. My friend’s Wall shows an underwater scene and includes spooky trenches and fish that swim around them and the occasional hint of an octopus. One famous person accidentally showed a picture of their DreamWall via a poorly posed selfie and it caused them problems because the DreamWall showed a pastoral country scene with nooses hanging from the occasional tree and in one corner a haybale-sized pile of submachineguns. Even though most people know how DreamWalls work they can’t help but judge other people for the contents of theirs.

It works like this:

When you wake up you say some of the things you were dreaming about.

Your home AI system records your comments and sends them to your personal ‘DreamMaker’ software.

The ‘DreamMaker’ software maps your verbal comments to entities in its knowledge graph, then sends those comment-entity pairs to the DreamArtist software.

DreamArtist tries to render the comment-entity data into individual objects which fit with the aesthetic theme inherent to your current DreamWall.

The new objects are sent to your home AI system which displays them on your DreamWall and gives you the option to add further prompts, such as “move the cow to the left” or “make sure that the passengers in the levitating car look like they are having fun”.

This cycle repeats every morning, though if you don’t say anything when you wake up it will maintain the DreamWall and only modulate its appearance and dynamics according to data about how active you had been in the night.

If you wake up with someone else most systems have failsafes that mean your DreamWall won’t display. Some companies are piloting ‘Couple’s DreamWalls’ but are having trouble with it – apart from some old couples that have been together a very long time, most people, even if they’re in a very harmonious relationship, have distinct aspects to their personality that the other person might not want to wake up to every single day – especially since DreamWalls tend to contain visual depictions of things otherwise repressed during daily life.

Import AI #97: Faking Obama and Putin with Deep Video Portraits, Berkeley releases a 100,000+ video self-driving car dataset, and what happens when you add the sensation of touch to robots.

by Jack Clark

Try a little tenderness: researchers add touch sensors to robots.
…It’s easier to manipulate objects if you can feel them…
Researchers with the University of California at Berkeley have added GelSight touch sensors to a standard 7-DoF Rethink Robotics ‘Sawyer’ robot with an attached Weiss WSG-50 parallel gripper to explore how touch inputs can improve performance at grasping objects – a crucial skill for robots to have if used in commercial settings.
  Technique: The researchers construct four sub-networks that operate over specific data inputs (camera image, two GelSight images to model texture senses before and after contact, and an action network that processes 3D motion, in-plane rotation, and change in force). They link these networks together within a larger network and train the resulting model over a dataset of objects. The researchers pre-train the image components of the network with a model previously trained to classify objects on ImageNet. The approach yields a model that adapts to novel surfaces, learns interpretable policies, and can be taught to apply specific constraints when handling an object, like grasping it gently.
  Results: The researchers test their model and find that systems trained with vision and action inputs get 73.03% accuracy, compared to 79.34% for systems trained on tactile inputs and action, and 80.28% for systems trained with tactile, vision, and action inputs.
   Harder than you think: This task, like most that require applying deep learning components to real-world systems, contains a few quirks which might seem non-obvious from the outset, for example: “The robot only receives tactile input intermittently, when its fingers are in contact with the object and, since each re-grasp attempt can disturb the object position and pose, the scene changes with each interaction”.
  Read more: More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch (Arxiv).

Want 100,000 self-driving car videos? Berkeley has you covered!
…“The largest and most diverse open driving video dataset so far for computer vision research”, according to the researchers…
Researchers with the University of California at Berkeley and Nexar have published BDD100K, a self-driving car dataset which contains ~120,000,000 images spread across ~100,000 videos. “Our database covers different weather conditions, including sunny, overcast, and rainy, as well as different times of day including daytime and nighttime,” they say. The dataset is substantially larger than ones released by the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI), Baidu (ApolloScape), Mapillary, and others, they say.
  DeepDrive: The dataset release is significant for where it comes from: DeepDrive, a Berkeley-led self-driving car research effort with a vast range of capable partners, including automotive companies such as Honda, Toyota, and Ford. DeepDrive was set up partially so its many sponsors could pool research efforts on self-driving cars, seeking to close an implicit gap with other players.
  Rich data: The videos are annotated with hundreds of thousands of labels for objects like cars, trucks, persons, bicycles, and so on, as well as richer annotations for road lines, drivable areas, and more; they also provide a subset of ~10,000 images with full-frame instance segmentation.
  Why it matters – the rise of the multi-modal dataset: The breadth of the dataset with its millions of labels and carefully refined aspects will likely empower researchers in other areas of AI, as well as its obvious self-driving car audience. I expect that in the future these multi-modal datasets will become increasingly attractive targets to use to evaluate transfer learning from other systems, for instance by training a self-driving car model in a rich simulated world then applying it to real-world data, such as BDD100K.
  Challenges: The researchers are hosting three challenges at computer vision conference CVPR relating to the dataset, and are asking groups to compete to develop systems for road object detection, drivable area prediction, and domain adaptation.
  Read more: BDD100K: A Large-scale Diverse Driving Video Database (Berkeley AI Research blog).

KPCB’s Mary Meeker breaks down AI’s rise and China’s possible advantage in annual presentation:
…Annual slide-a-thon shows rise of China, points to image and speech recognition scores as evidence for impact of AI…
Mary Meeker’s annual presentation of research serves as a useful refresher for what is front-of-mind for venture capitalists focused on understanding the dynamics that affect the technology ecosystem. This year, at Code Conference in California, Meeker’s slides were distinguished via large sections spent on China, combined with a few notable slides situating AI progress metrics (specifically in object recognition and speech recognition) in relation to the growth of new markets for business.
  Read more: Mary Meeker’s 2018 internet trends report: All the slides, plus analysis (Recode).

SPECIAL SECTION: FAKE EVERYTHING:

An incomplete timeline of dubious things that people have synthesized via AI
– Early 2017:
Montreal Startup Lyrebird launches with audio recording featuring synthesized voices of Donald Trump, Barack Obama, Hillary Clinton.
– Late 2017:
“DeepFakes” arrive on the internet via Reddit with a user posting pornographic movies with celebrity faces animated onto them. A consumer-oriented free editing application follows and DeepFakes rapidly proliferate across the internet, then consumer sites start to clamp down on them.
– 2018:
Belgian socialist party makes a video containing a synthesized Donald Trump giving a (fake) speech about climate change. Party says video designed to create debate and not trick viewers.
– Listen: Politicians discussing about Lyrebird (Lyrebird Soundcloud).
– Read more: DeepFakes Wikipedia entry.
– Read more: Belgian Socialist Party Circulates “Deep Fake” Donald Trump Video (Politico Europe).

Why all footage of all politicians is about to become suspect:
…Think fake news is bad now? ‘Deep Video Portraits’ will make it much, much worse…
A couple of years ago European researchers caused a stir with ‘face2face’, technology which they demonstrated by mapping their own facial expressions onto synthetically rendered footage of famous VIPs, like George Bush, Barack Obama, and so on. Now, new research from a group of American and European researchers has pushed this fake-anyone technology further, increasing the fidelity of the rendered footage, reducing the amount of data needed to construct such convincing fakes, and also dealing with visual bugs that would make it easier to identify the output as being synthesized.
  In their words: “We address the problem of synthesizing a photo-realistic video portrait of a target actor that mimics the actions of a source actor, where source and target can be different subjects,” they write. “Our approach enables a source actor to take full control of the rigid head pose, face expressions and eye motion of the target actor”. (Emphasis mine.)
  Technique: The technique involves a few stages: first, the researchers track the people within the source and target videos via a monocular face reconstruction approach, which allows them to extract information about the identity, head pose, expression, eye gaze, and scene lighting for each video frame. They also separately track the gaze of each subject. They then essentially transfer the synthetic renderings of the input actor onto the target actor and perform a couple of clever tricks to make the resulting output high fidelity and less prone to synthetic tells like visual smearing/blurring of the background behind the manipulated actor.
  Why it matters: Techniques like this will have a bunch of benefits for people working in media and CGI, but they’ll also be used by nation states, fringe groups, and extremists, to attack and pollute information spaces and reduce overall trust in the digital infrastructure of societal discourse and information transmission. I worry that we’re woefully unprepared for the ramifications of the rapid proliferation of these techniques and applications. (And controlling the spread of such a technology is a) extremely difficult and b) of dubious practicality and c) potentially harmful to broader beneficial scientific progress.)
  An astonishing absence of consideration: I find it remarkable that the researchers don’t take time in the paper to discuss the ramifications of this sort of technology, given that they’re demonstrating it by doing things like transferring President Obama’s expressions onto Putin’s, or Obama’s onto Reagan’s. They make no mention of the political dimension to this work in their ‘Applications’ section, which focuses on the technical details of the approach and how it can be used for applications like ‘visual dubbing’ (getting an actor’s mouth movements to map to an audio track).
  Read more: Deep Video Portraits (Arxiv).
  Watch video for details: Deep Video Portraits – SIGGRAPH 2018 (YouTube).

DARPA to host synthetic video/image competition:
…US defense-oriented research organization to try and push state-of-the-art in creation and detection of fakes…
Nation states have become aware of the tremendous potential for subterfuge that this technology poses and are reacting by dumping research money into both exploiting this for gain and for defending against it. This summer, DARPA will hold a competition to see who can create the most convincing synthetic images, and also to see who can detect them.
  Read more: The US military is funding an effort to catch deepfakes and other AI trickery (MIT Technology Review).

$$$$$$$$$$
Import AI Job Alert: I’m hiring an editor/sub-editor:
  I’m hiring someone to initially sub-edit and eventually help edit the OpenAI blog. The role will be a regularly compensated gig which should initially take about 1.5-2 hours every week, typically at around 9pm Pacific Time on Sunday Nights. If you’d be interested in this then please send me an email telling me why you’d be a good fit. The ideal candidate probably has familiarity with AI research papers, attention to detail, and experience fiddling with words in a deadline-oriented setting. I’ve asked around among some journalists and the fair wage seems to be about $25 per hour.
  Additional requirements: You’d need to be available via real-time communication such as WhatsApp or Slack during the pre-agreed upon editing window. Sometimes I may need to shift the time a bit if I’m traveling, but I’ll typically have advance warning.
  Send emails with subject line “Import AI editing job: [Your Name]” to jack@jack-clark.net.
$$$$$$$$$$$$$

Predicting cyber attacks on utilities with variational auto-encoders:
…All watched over by (variational) machines of loving grace…
Researchers with water infrastructure company Xylem Inc have tested out a variational auto-encoder (VAE)-based system for detecting cyberattacks on utility systems. The research highlights a feature of contemporary AI methods that is both a drawback and a strength: their excellence at curve-fitting in big, high-dimensional spaces. Here, the researchers use a VAE to train a model on past observations that represent 43 variables within a municipal water network. They then study how this model reacts to unforeseen changes in the system that might indicate a cyberattack: the model works better than rule-based systems, with the VAE spitting out a constant logarithm of reconstruction probability (LRP) which tends to diverge when the underlying system departs from the norm.
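  A rough sketch of the scoring idea: the snippet below shows only the "fit on normal history, then flag low log-probability readings" step. It swaps the paper's VAE for an independent Gaussian per sensor, which is a much simpler stand-in and not the authors' model. The two sensors, their readings, and the threshold margin are all hypothetical.

```python
import math
import statistics


def fit_gaussians(history):
    """Fit one (mean, stdev) pair per sensor from past normal observations."""
    columns = list(zip(*history))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def log_probability(observation, params):
    """Log-density of an observation under independent per-sensor Gaussians.

    Stand-in for the VAE's log reconstruction probability (an assumption).
    """
    total = 0.0
    for x, (mu, sigma) in zip(observation, params):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (x - mu) ** 2 / (2 * sigma ** 2)
    return total


# Hypothetical readings from two sensors: [pressure, flow].
history = [[50.0, 10.0], [51.0, 10.5], [49.0, 9.5], [50.5, 10.2], [49.5, 9.8]]
params = fit_gaussians(history)

# Hypothetical rule: flag anything scoring well below the worst normal reading.
threshold = min(log_probability(obs, params) for obs in history) - 5.0


def is_anomalous(observation):
    return log_probability(observation, params) < threshold


normal = [50.2, 10.1]
suspicious = [80.0, 2.0]
print(is_anomalous(normal), is_anomalous(suspicious))
```

  Note that a rare but planned operation would score just as badly as an attack here, since the model only knows what past data looked like.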
  Strengths: “The model relies solely on sensor reads data in their raw form and requires no preprocessing, system knowledge, or domain expertise to function. It is generic and can be readily applied to a broad array of ICS’s in various industry sectors.”
  Weaknesses: “It is not perfect and has its own requirements (e.g., availability of vast amount of system observations data) and drawbacks (e.g., sensitivity to rare but planned operations such as activation of emergency booster pumps),” they write. This highlights one of the weaknesses of the great curve-fitting power of contemporary AI techniques (Judea Pearl has argued that curve-fitting is pretty much all these systems are capable of), which is that they’re naive as to changing circumstances and lack the common sense to distinguish malice from action.
  Why it matters: Techniques like this are pretty crude but they indicate that there’s real value in training simple machine learning systems on data to spot anomalies. This research to me is mostly interesting due to its context – its researchers are all linked to a traditional ‘non-tech’ organization and the technology is tested against real-world data. Part of the virtue of publishing a paper like this is probably to help with hiring, as the researchers will be able to point prospective candidates to this paper as an indication for why Xylem is an interesting place to work. It’s possible to imagine a future where basic predictive models are layered into the data streams of every town and utility, providing an additional signal to human overseers.
  Read more: Cyberattack Detection using Deep Generative Models with Variational Inference (Arxiv).

####################

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net …

Google will not renew contract for Project Maven, plans to release principles for how it approaches military and intelligence contracting:
…Big Tech continues to grapple with ethical challenges around military AI…
Google announced internally on Friday that it would not be renewing the contract it had with the US military for Project Maven, an AI-infused drone surveillance platform, according to Gizmodo. Google also said it is drafting ethical principles for how it approaches military and intelligence contracting.
  Why it matters: Military use of AI remains one of the most contentious issues for the industry, and society, to grapple with. Tech giants will have a role in setting standards for the industry at large. (One wonders how much of a role Google can play here in the USA now, given that it will now be viewed as deeply partisan by the DoD – Jack) Given that AI is a particularly powerful ‘dual-use’ technology, direct military applications may end up being one of the easier ethical dilemmas the industry faces in the future.
  Read more: Google plans not to renew Project Maven contract (Gizmodo).
  Read more: How a Pentagon Contract Became an Identity Crisis for Google (NYT).

UK public opposed to AI decision-making in most parts of public life:
…The Brits don’t like the bots…
The RSA and DeepMind have initiated a project to create ‘meaningful public engagement on the real-world impacts of AI’. The project’s first report includes a survey of UK public attitudes towards automated decision-making systems.
  Lack of familiarity: With the exception of online advertising (48% familiar), respondents were overwhelmingly unfamiliar with the use of automated decision-making in key areas. Only 9% were aware of its use in the criminal justice system, 14% in immigration, and 15% in the workplace.
  Opposition to AI decision-making: Most respondents were opposed to the usage of these methods in most parts of society. The strongest opposition was in the usage of AI in the workplace (60% opposed vs 11% support) and criminal justice (60% opposed vs. 12% support).
  What the public want: While 29% said nothing would increase their support for automated decision-making, the poll pointed to a few potential remediations that people would support:
  36%: The right to demand an explanation for an automated decision.
  33%: Penalties for companies failing to monitor systems appropriately.
  24%: A set of common principles guiding the use of such systems.
  Why it matters: The report notes that a public backlash against these technologies cannot be ruled out if issues are not addressed. The RSA’s proposal for public engagement via deliberative processes and ‘citizens’ juries’, if successful, could provide a model for other countries.
   Read more: Artificial Intelligence – Real Public Engagement (RSA).

Open Philanthropy Project launches AI Fellows program:
…Over $1 million in funding for high-impact AI researchers…
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is providing $1.1m in PhD funding for seven AI/ML researchers focused on minimizing potential risks from advanced AI systems. “Increasing the probability of positive outcomes from transformative AI”, is one of Open Philanthropy’s priorities.
  Read more: Announcing the 2018 AI Fellows.
  Read more: AI Fellowship Program.

OpenAI Bits & Pieces:

OpenAI Fellows:
We designed this program for people who want to be an AI researcher, but do not have a formal background in the field. Applications for Fellows starting in September are open now and will close on July 8th at 12AM PST.
  Read more: OpenAI Fellows (OpenAI Blog).

Tech Tales:

Walking Through Shadows That Feel Like Sand

Yes, people died. What else would you expect?

People walked off cliffs. People walked into the middle of the street. People left stoves on. People forgot to eat. People lost their minds.

Yes, we punished the people that caused these people to die. We punished these people with lawsuits or criminal sentences. Sometimes we punished them with both.

But the technology got better.

Less people died.

At some point the technology started saving more people than it killed.

People stopped short of ledges. People pulled back from traffic. People remembered to turn stoves off. People would eat just the right amount. People healed their minds.

Was it perfect? No. Nothing ever is.

Did we adopt it? Yes, as we always do.

Are we happy? Yes, most of us are. And the more of us that are happy, the more likely everyone is going to be happy.

Now, where are we? We are everywhere.

We wear these goggles and we get to choose what we see.
We wear these clothes that let us feel additional sensations to supplement or replace the world.
We have these chips in our eardrums that let us hear the world better than dogs, or hear nothing at all, or hear something else entirely.

We walk through city streets and get to feel the density of other people via vibrations superimposed onto our bodies by our clothes.
We watch our own pet computer programs climbing the synth-neon signs that hang off of real church steeples.
We see sunsets superimposed on one another and we can choose whenever to see them, even in the thick of night.

When we are sad we diffuse our sadness into the world around us and the world responds back with rising violins or crashing waves.
Sometimes when we are sad the sun and the moon cry with us.
Sometimes we feel cold tears on the backs of our necks from the stars.

We are many and always growing and learning. What we experience is our choice. But our world grows richer by the day and we feel the world receding, as though a masterpiece overlaid with other paints from other artists, growing richer by the moment.

We do not feel this world, this base layer, so much anymore.

We worry we do not understand the people that choose to stay within it.
We worry they do not understand why we choose to stay above it.

Things that inspired this story: Augmented Reality, Virtual Reality, group AR simulations such as Pokemon Go, touch-aware clothing, force feedback, cameras, social and class and technological diffusion dynamics of the 21st century, self-adjusting feedback engines, communal escapism, cults. 

Import AI #96: Seeing heartbeats with DeepPhys, better synthetic images via SAGAN, and spotting pedestrians via a trans-European dataset

by Jack Clark

Satellite imagery competition challenges systems to outline buildings, segment roads, and analyze land use patterns:
…DeepGlobe competition and associated datasets designed to speed progress on strategic domain…
Researchers with Facebook, DigitalGlobe, CosmiQ Works, Wageningen University, and the MIT Media Lab have revealed DeepGlobe 2018, a satellite imagery competition with three tasks and associated datasets. DeepGlobe is intended to yield improvements in the automated analysis of satellite images for disaster response, planning, and object detection. DeepGlobe 2018 has three tracks with linked datasets: road extraction (8,570 images), building detection (24,586 ‘scenes’, equivalent to a 650×650 image), and land cover classification (1,146 satellite images).
  Results: The researchers introduce some baseline performance numbers for each task; for road extraction they used a modified version of DeepLab with a ResNet18 backbone and Focal Loss, obtaining an Intersection over Union (IoU) score of 0.545; for building detection they used the top scoring solutions from a competition held on the same dataset in 2017, which obtain IoU scores as high as 0.88 on cities like Las Vegas and as low as 0.54 on Khartoum; for land cover classification they implement a DeepLab system with a ResNet18 backbone and atrous spatial pyramid pooling (ASPP) to obtain an IoU score of 0.43.
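  For reference, IoU is the area where the predicted and true masks overlap divided by the area covered by either. A minimal sketch follows, assuming binary masks given as lists of rows; the 4x4 masks are made up for illustration and are not DeepGlobe data, and the competition's exact evaluation protocol may differ.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union for two same-shaped binary masks (lists of rows)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


# Made-up 4x4 masks for illustration only.
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
true = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

print(f"IoU = {iou(pred, true):.3f}")  # 2 shared pixels / 6 covered pixels
```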
  Why it matters: AI will increase the automated analysis capabilities people and nations can wield over their satellite imagery repositories. Progress in this domain directly influences geopolitics by giving rise to new techniques that different nations can use in conjunction with satellite data to watch and react to the world.
  Read more: DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images (Arxiv).

Trans-Europe Express!: Researchers release diverse ‘EuroCity’ dataset:
…31 cities across 12 countries yields a diverse dataset containing people in a huge variety of contexts…
Researchers with the Environment Perception Group at carmaker Daimler AG and the Intelligent Vehicles Group at TU Delft have released EuroCity, a large-scale dataset for object and pedestrian detection within urban scenes. EuroCity comprises 45,000 distinct images containing more than a hundred thousand pedestrians in weather settings ranging from dry to wet. The dataset is one of the largest and most diverse yet released for detecting people in urban scenes and will be of particular interest to self-driving car developers.
  Data: The researchers collected the data via a two megapixel camera installed on the dashboard of their car which they drove through 31 cities in 12 European countries. The dataset’s diversity may help with generalization; results indicate that pre-training on the dataset substantially improved performance when transferring to solve tasks within the more widely-used CityPersons and KITTI datasets.
  Annotations: Pedestrians and vehicle riders are annotated in the dataset. Riders are also annotated with sub-labels to describe their vehicle, such as bicycle, buggy, motorbike, scooter, tricycle, and wheelchair. The researchers also annotate confounding images, like posters that depict people, or images that catch reflections of people in windows, and additional phenomena like lens flares, motion blurs, raindrops, and so on. Annotations were performed by hand.
  Baselines: Four approaches – R-CNN, R-FCN, SSD, and YOLOv3 – are tested on the dataset to create baseline performance figures. Different variants of R-CNN perform best on all three tasks, followed by the performance of YOLOv3. “Processing rates for the R-FCN, Faster R-CNN, SSD and YOLOv3 on non-upscaled test images were 1.2 fps, 1.7 fps, 2.4 fps and 3.8 fps, respectively, on a Intel(R) Core i7-5960X CPU 3.00 GHz processor and a NVidia GeForce GTX TITAN X with 12.2 GB memory”.
  Why it matters: Datasets tend to motivate work on problems contained within them. Given the breadth and scale of EuroCity, it’s likely its release will improve the state-of-the-art when it comes to pedestrian detection in busy or partially occluded scenes. It also hints at a future where hundreds of thousands of cars with dash cams are used to grow and augment continent-scale datasets.
  Read more: The EuroCity Persons Dataset: A Novel Benchmark for Object Detection (Arxiv).

“I can guess your heart rate!” (with DeepPhys):
…Trained system predicts your heart beat from pixel inputs alone…
MIT and Microsoft researchers have built DeepPhys, a network that can crudely predict a person’s heart rate and breathing rate from RGB or infrared videos. They developed the network by building a couple of specific classification models based on domain knowledge about how to detect and analyze skin appearance and changes over time to better infer underlying biological phenomena.
  Results: The researchers test their system on four datasets, three recorded under controlled and uncontrolled lighting conditions, and the fourth involving infrared. Their approach outperforms other systems on a variety of evaluation criteria. Additionally, further tests showed that training the system on diverse data inputs can lead to better performance. “The performance improvements were especially good for the tasks with increasing range and angular velocities of head rotation,” they write. “We attribute this improvement to the end-to-end nature of the model which is able to learn an improved mapping between the video color and motion information”.
  Why it matters: Systems like this bring us closer to a world where the majority of cameras around us are performing a multitude of different analysis tasks, including ones we may not suspect are possible, like predicting our heart rate from images taken from security camera feeds.
  Read more: DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks (Arxiv).

Funny dogs no more! Google and Rutgers introduce ‘SAGAN’:
…Want to be a better artist? Look inside yourself…
One of the classic problems with GAN-generated images is the number of dog legs. What I mean by that is though these systems have become adept in recent years at generating synthetic imagery in a bunch of different domains, they’ve remained stubbornly bad at modeling aspects of images that require a holistic understanding of the whole – like getting the number of legs right on a dog’s body, or figuring out the correct physical dimensions of a cat’s tail and paw relationship, and so on. “While the state-of-the-art ImageNet GAN model excels at synthesizing image classes with few structural constraints (e.g. ocean, sky and landscape classes, which are distinguished more by texture than by geometry), it fails to capture geometric or structural patterns that occur consistently in some classes (for example, dogs are often drawn with realistic fur texture but without clearly defined separate feet),” the researchers explain.
  Attention to the rescue: The researchers, which include GAN-inventor Ian Goodfellow, get around this issue by implementing what they call a Self-Attention Generative Adversarial Network (SAGAN). A SAGAN works by pairing a self-attention mechanism with the traditional machinery of GAN. “The self-attention module is complementary to convolutions and helps with modeling long range, multi-level dependencies across image regions. Armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image,” they write.
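  To make the mechanism concrete, here is a stripped-down sketch of self-attention over a flattened feature map. It is a simplification under stated assumptions: the real SAGAN module produces queries, keys, and values with learned projections, whereas this sketch uses the raw features for all three, and the `gamma` residual scale and toy feature values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(features, gamma=1.0):
    """features: list of N location vectors (a flattened H*W feature map).

    Returns (outputs, attention), where attention[j][i] is how much
    location j attends to location i.
    """
    # Every location scores itself against every other location, near or far.
    attention = [softmax([dot(q, k) for k in features]) for q in features]
    outputs = []
    for j, x_j in enumerate(features):
        attended = [
            sum(attention[j][i] * features[i][c] for i in range(len(features)))
            for c in range(len(x_j))
        ]
        # Residual connection: gamma=0 returns the input unchanged.
        outputs.append([x + gamma * a for x, a in zip(x_j, attended)])
    return outputs, attention


# Hypothetical 2x2 feature map flattened to 4 locations, 2 channels each.
# Locations 0 and 3 are opposite corners of the grid but have similar features.
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]]
outputs, attention = self_attention(features)
print([round(w, 3) for w in attention[0]])
```

  In the toy input, location 0 puts nearly all its weight on itself and on the distant location 3, which is the kind of long-range coordination a small convolution kernel cannot express in one step.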
  Results: The resulting systems dramatically outperform other approaches when assessed by the Inception score (which measures the KL divergence between the conditional class distribution and the marginal class distribution, where a higher score indicates better quality), with SAGAN obtaining a score of 52.52 compared to 36.8 for the prior best published result. It attains similarly impressive scores when assessed via Frechet Inception Distance (FID).
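  A sketch of how the Inception score is computed, assuming the per-image class probabilities are already in hand. In practice they come from a pretrained Inception classifier run over generated images; the tiny three-class distributions below are hypothetical. The score is the exponential of the average KL divergence between each image's class distribution and the marginal over all images.

```python
import math


def inception_score(cond_probs):
    """cond_probs: list of per-image class distributions p(y|x), each summing to 1."""
    n = len(cond_probs)
    num_classes = len(cond_probs[0])
    # Marginal class distribution p(y): the average of the conditionals.
    marginal = [sum(p[k] for p in cond_probs) / n for k in range(num_classes)]
    # Mean KL(p(y|x) || p(y)) across images; terms with p = 0 contribute 0.
    mean_kl = 0.0
    for p in cond_probs:
        mean_kl += sum(
            pk * math.log(pk / marginal[k]) for k, pk in enumerate(p) if pk > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Hypothetical classifier outputs for three generated images.
confident_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uninformative = [[1 / 3, 1 / 3, 1 / 3]] * 3

print(inception_score(confident_and_diverse))
print(inception_score(uninformative))
```

  The score is highest when each image is confidently assigned to one class and the images together cover many classes; with three classes the ceiling is 3, and a generator whose outputs all look equally ambiguous scores 1.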
  Why it matters: Attention is a simple idea that has come to dominate a huge amount of AI research lately. SAGAN provides further evidence for the generality and applicability of the technique. The work also suggests that progress in automated image synthesis is going to continue, and I worry that society isn’t quite prepared for what having all these cheap, convincing digital fakes means.
  Read more: Self-Attention Generative Adversarial Networks (Arxiv).

Big Empiricism: Google carries out major ImageNet transfer learning experiment:
…Well-performing ImageNet models can aid transfer learning, but not as much as people had intuited…
New research from Google comprehensively tests the idea that models which attain higher scores on the widely-used ‘ImageNet’ dataset will tend to have good properties when used as inputs for transfer learning and domain adaptation. The research evaluates 13 ImageNet-trained classification models on 12 image classification tasks in three settings: as fixed feature extractors, as aids for fine-tuning, and using networks trained from random initialization.
  Results: ImageNet performance is at best weakly predictive of good out-of-the-box performance on other tasks, though confidence increases with fine-tuning.
  Why it matters: Experiments like these enlarge our understanding of transfer learning within AI, which is a crucial problem that needs to be dealt with to build more capable systems. “Is the general enterprise of learning widely-useful features doomed to suffer the same fate as feature engineering in computer vision?” the authors wonder. “It is not entirely surprising that features learned on one dataset benefit from some other amount of adaptation (i.e., fine-tuning) when applied to another. It is, however, surprising that features learned from a large dataset cannot always be profitably adapted to much smaller datasets.”
  Big Empiricism: I’d file this type of research under a buzzword (you’ve been warned!) I’ve been mentally using: ‘Big Empiricism’. This sort of research tends to work by taking an existing technique and scaling it up to unprecedented levels to test its performance in large domains, or by taking a well-received idea and testing it via large costly experiments with multiple permutations. Other examples of work here include papers like ‘Regularized Evolution for Image Classifier Architecture Search’ (ImportAI #81) or the original Neural Architecture Search paper, Evolution Strategies, and Exploring the Limits of Language Modeling (among many other worthy examples).
  I do think this gives credence to the argument that AI science is bifurcating into two distinct tracks, with many organizations participating in basic AI research and a few (typically wealthy) ones exploring questions that require access to very large and expensive computers.
  Read more: Do Better ImageNet Models Transfer Better? (Arxiv).

##########
AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Australia’s Chief Scientist proposes ethics ‘stamp’ for AI:

In a speech this week, Dr Alan Finkel, the Chief Scientist of Australia, proposed a mark which would identify products with AI components. Finkel says his idea, the “Turing Stamp”, would require companies to meet an independently audited set of standards, in the model of ‘Fair Trade’ or ‘Organic’ stamps on food.
  Why it matters: If this worked it could represent a mechanism for incentivizing ethical AI development without top-down regulation by introducing ethics into competition between firms. Oren Etzioni, the CEO of the Allen Institute for Artificial Intelligence, has similarly argued that digital chatbots should have to identify themselves when talking to humans. However, distinguishing clearly between “AI” and “computer” would likely prove to be a challenge for regulators, so I’d worry about a rapid proliferation of labels.
  Read more: Government should lead AI certification: Finkel (Government News).

Lessons from the Alvey Programme:
…What happened the last time the UK government dumped money into AI, and what parts of that were most helpful?…
In the appendices of the recent House of Lords report on AI is a discussion of a historic government attempt to stimulate the UK’s AI industry, the ‘Alvey Programme’ in the ‘AI spring’ of the 1980s. In 1983, the government committed £1bn (in today’s money) to AI R&D via an industry-academia partnership very similar to those being put forward today, in the UK and elsewhere.
  What worked: The programme successfully created a community of researchers in AI in the UK, and yielded a prototype for academia-industry collaboration that remains the main model of contemporary government AI R&D programs. Some of the research streams, like the focus on object-oriented programming, would have a lasting impact.
  What went wrong: The programme was not deemed a success at the time, and was halted after five years. The goal of translating academic progress into commercial capabilities was not realized, as companies were frequently unwilling, or unable, to make significant investments in these technologies. The authors also point out the lack of communication amongst researchers, due to the absence of a single ‘research centre’ for the program.
  Read more: Lords AI Report (Appendix 4)
  Read more: Lessons from the Alvey Programme for Creating an Innovation Ecosystem for Artificial Intelligence.

Brookings polls 1,500 US adults regarding AI:
…Cautious optimism, concerns about job displacement and privacy, and support for regulation…
The Brookings thinktank recently conducted a survey of ~1,500 US adults on AI attitudes.
  Cautious optimism: 34% expect AI to make day-to-day life easier, vs 13% expect it to make life harder.
  Concerns about jobs, privacy, and the future of humanity:
– 38% expect AI to reduce jobs, vs. 12% expect job creation.
– 49% expect AI to reduce personal privacy, vs 5% expect AI to increase it, and 12% for no impact.
– 32% think AI represents a threat to humanity vs. 24% for no threat.
  Significant support for regulation: 42% support government regulation of AI, vs 17% opposing it.
  US perceived as world-leader, but China expected to catch up: 21% thought US was the leading country in AI, closely followed by Japan (19%), and China (15%). When asked the same question about 10 years from now, 21% thought US would still be leading, narrowly beating China (20%).
  Why it matters: Understanding and managing public attitudes towards AI is an important part of AI policy, but polling on this has been limited so far, with previous work focused more on tracking trends in news coverage, or qualitative methods. More regular polling worldwide to see differences in attitudes over time, and between countries, would be a positive (and cheap!) endeavor.
  Read more: Brookings survey finds worries over AI impact on jobs and personal privacy, concern U.S. will fall behind China.

US cities use Amazon facial recognition software, ACLU objects:
…ACLU has used public records requests to reveal how Amazon’s ‘Rekognition’ facial recognition software is being used by US law enforcement agencies in three states…
  What Rekognition can do: The ACLU reveals that Amazon has been renting its AI-powered ‘Rekognition’ software to several US law enforcement agencies. Rekognition is able to identify, track and analyze faces in real time, and recognize up to 100 people per image. It represents a new type of AI-software-as-a-service being developed by companies like Amazon and competes with similar cloud-based image recognition engines from Google, Microsoft, and others.
  Why the ACLU is so concerned: The ACLU says the software can be “readily used to violate civil liberties and civil rights”, and envisages scenarios where police can monitor who attends protests, ICE can continuously monitor new immigrants, and cities could routinely track their residents whether they are suspected of criminal activity or not.
  Why it matters: As the US public sector tries to harness AI capabilities it’s going to be forced to enter into more and more procurement relationships with powerful AI companies, many of which have implicit ideological stances that differ from those of some of their customers (see also: Google and Project Maven.)
  Read more: Amazon Teams Up With Law Enforcement to Deploy Dangerous New Facial Recognition Technology (ACLU)
  Read more: Amazon is selling facial recognition to law enforcement — for a fistful of dollars (Washington Post)

Tech Tales:

On The Surprising Re-Emergence of Board Game Designers as Cultural Arbiters and Controllers

Say you’ve got hundreds of different computers and you’re ordering them to do lots of different things and you want to be able to nudge them occasionally and figure out what they’re doing — what do you do? It’s a hard problem. Lots of the early methods ended up structuring AI organizations as combinations of companies and computation fleets. That period didn’t last long. As the software took over companies changed: human work became less about the specific design of specific details – as the marketing slogan from one of the big tech companies goes, ‘from tooth brushes to silent electric engines, our $auto_bot can do it all. Buy today, earn today!’, instead it’s about figuring out how to let humans easily interface with these machines.

Enter the board game designers. About a decade ago some of the companies discovered internally that they could recast AI-teaching problems into interactive games that people could play in virtual reality, or in cut-down scenarios on their phones. Staff started to ‘work’ by playing games that interfaced with gargantuan learning engines. But it worked. And soon it led to products, all of which relied on this conceit of having the human work by playing a game.

Board games were designed to solve AI-relevant problems, such as:
– Marshaling fleets of anti-poacher drones to survey a large wildlife park
– Optimizing delivery times in a given area while satisfying certain human happiness measures (sometimes known as: brand maximizing) and being able to lightly direct spare vehicles to perform promotional ‘robot intervention’ stunts.
– Evolving a contextual-input orchestra via a hundred musical robots and more than a thousand input streams from various webcams, microphones, pressure-sensitive pads on walkways, etc.
– etc.

Eventually just about every problem got a board game variant. The AI systems these games controlled became ever-smarter, so the games became more complicated as well. And in this march to complexity the purpose of the board games changed twice. First, we built games of games – abstract entities that took training to operate which would let one person skillfully conduct thousands of AI systems at once. For a while, this drove society. But as we ran more games and grew more expert in their construction the AIs became smarter as well.

One day the purpose of the games began to change: instead of providing interfaces through which we could change the AIs, the games became interfaces through which the AIs could learn from us. These board games now work like simulations, where we play them and the robot indicates what it is planning to do and we give votes about how we feel about what it is doing, and then sometimes it adopts that behavior, or sometimes it does something else.

Obviously, these sorts of board games are less fun. Something about becoming a pawn on the board instead of the player sitting behind it makes people unwilling to play these games. So the games have got better: now they’re designed to hook us and entrance us, and the machines are learning to experiment on us in this way as well.

So that’s how we’ve ended up with The Suck, our omnipresent nickname for ‘UN-backed AI Interface Cluster 1 Class: ALL_PEOPLE’. The Suck is a board game designed by machines running on casino-mad impulses. During its construction the machines eagerly exhumed ancient propaganda methods from Edward Bernays to tabloids to arcade game machines to casinos to mobile apps to long-since-regulated Social Media Architectures. The machines and the UN officials building The Suck did their job well and now most of the planet spends a few hours a day playing it. We can do anything else we want. No one is forcing us to do it. But, what else are you going to do? It’s fun!

Few of us have a clear sense of what the machines are up to, these days. Shuttles go up and come down. New things are built. The atmospheres are being cleared. Human UN staff occasionally talk to the world to give updates on The Partnership, which is how we refer to this relationship we’ve got with the machines. Most people are pretty cheerful but some see malice in what is most likely just a banal burst of progress: Now is the greatest time to be a board game player in history, and also the most dangerous time, because these board games are after our minds – the_truth_is_out_there forum posting, captured t-9 days from message posting.

Things that inspired this story: The Glass Bead Game, 4x strategy games, interfaces between simulated AIs and human overseers, learning from human preferences.

Import AI: #95: Learning to predict and avoid internet arguments, White House announces Select Committee on AI, and BMW trains cars to safely change lanes

by Jack Clark

Cornell, Google, and Wikimedia researchers train AI to predict when we’ll get angry on the internet:
…Spoiler: Very blunt comments with little attempt made at being polite tend to lead to aggressive conversations…
Have you ever read a comment addressed to you on the internet and decided not to reply because your intuition tells you the person is looking to start a fight? Is it possible to train AI systems to have a similar predictive ability and thereby create automated systems that can flag conversations as having a likelihood of spiraling into aggression? That’s the idea behind new research from Cornell University, Jigsaw, and the Wikimedia Foundation. The research tries to predict troublesome conversations based on a dataset taken from the discussion sections of ‘Wikipedia Talk’ pages.
  Dataset: To carry out the experiment, the researchers gathered a total of 1,270 conversations, half consisting of ones which became aggressive following the initial comments, and half consisting of ones which remained civil. (Categorizing conversations as aggressive versus on-track was done via a combination of the use of Jigsaw’s “Perspective” API, and gathering labels from humans via CrowdFlower.) These conversations had an average length of 4.6 comments.
  How it works: Armed with this dataset, the researchers characterized conversations via what they call “pragmatic devices signalling politeness”. This is a set of features that correspond to whether the conversation includes attempts to be friendly (liberal use of ‘thanks’, ‘please’, and so on), along with words used to indicate a position that welcomes debate (eg, by clarifying statements with phrases like “I believe” or “I think”). They then study the initial comment and see if their system can learn to predict whether it will yield negative comments in the future.
  Results: Humans are successful about 72% of the time at predicting nasty conversations from this dataset. The system designed by these researchers (which relies on logistic regression – nothing too fancy) is about 61.6% accurate, and baselines (bag of words and sentiment lexicon) get around 56%. (One variant of the proposed technique gets accuracy of 64.9%, but this is a little dubious as it is trained on way more data and it’s unclear whether it is overfitting, as it is also trained on the same data corpus.) The researchers also derive some statistical correlations that could help humans as well as machines better spot comments that are prone to spiral into aggression. “We find a rough correspondence between linguistic directness and the likelihood of future personal attacks. In particular, comments which contain direct questions, or exhibit sentence initial you (i.e., “2nd person start”), tend to start awry-turning conversations significantly more often than ones that stay on track,” they write. “This effect coheres with our intuition that directness signals some latent hostility from the conversation’s initiator, and perhaps reinforces the forcefulness of contentious impositions.”
  Why it matters: Systems like this show how with a relatively small amount of data it is possible to build classification systems that can, if paired with the right features, effectively categorize subtle human interactions online. While here such a system is used to do something that seems to be for the purpose of social good (figuring out how to identify and potentially avoid aggressive conversations), it’s worth remembering that a very similar approach could be used to, for instance, identify conversations where initial comments could correlate to conversations that have a high chance of displaying political views that are contrary to those views of the people building such systems, and so on. It would be nice to see an acknowledgement of this in the paper itself.
  Read more: Conversations Gone Awry: Detecting Early Signs of Conversational Failure (Arxiv).
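
  To make the feature idea concrete, here is a rough sketch of how an opening comment might be turned into the kind of politeness and directness signals described above (polite words, hedges like “I think”, sentence-initial “you”, direct questions). The word lists, the feature names, and the crude “contains a question mark” test are my own assumptions for illustration, not the paper’s actual feature set; counts like these would then be fed to a classifier such as logistic regression.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POLITE_WORDS = {"please", "thanks", "thank"}
HEDGES = ("i believe", "i think")

def first_word(sentence):
    """Return the first alphabetic token of a sentence, or '' if there is none."""
    match = re.search(r"[a-z']+", sentence)
    return match.group(0) if match else ""

def prompt_features(comment):
    """Map one comment to a small dictionary of politeness/directness counts."""
    text = comment.lower().strip()
    words = re.findall(r"[a-z']+", text)
    sentences = re.split(r"[.!?]+", text)
    return {
        "polite": sum(w in POLITE_WORDS for w in words),
        "hedges": sum(text.count(h) for h in HEDGES),
        # "2nd person start": some sentence opens with "you" or "your".
        "second_person_start": int(any(first_word(s) in ("you", "your") for s in sentences)),
        # Crude stand-in for "direct question".
        "direct_question": int("?" in text),
    }
```

  For instance, `prompt_features("You clearly didn't read the source. Why did you revert?")` flags both the second-person start and the direct question, while a comment that opens with “Thanks” and hedges with “I think” scores on the polite side instead.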

Chinese researchers tackle Dota-like game King of Glory with RL + MCTS:
…Tencent researchers take inspiration from AlphaGo Zero to tackle Chinese MOBA King of Glory…
Modern multiplayer strategy games are becoming a testbed for reinforcement learning and multi-agent algorithms. Following work by Facebook and DeepMind on StarCraft 1 and 2, and work by OpenAI on Dota, researchers with the University of Pittsburgh and Tencent AI Lab have published details on an AI technique which they evaluate on King of Glory, a Tencent-made multiplayer online battle arena (MOBA) game. The proposed system uses Monte Carlo Tree Search (MCTS – a technique also crucial to DeepMind’s work on tackling the board game Go) and incorporates techniques from AlphaGo Zero “to produce a stronger tree search using previous tree results”. “Our proposed algorithm is a provably near-optimal variant (and in some respects, generalization) of the AlphaGo Zero algorithm” they write.
  Results: The researchers test out their technique within King of Glory by evaluating agents trained with their technique against other agents controlled by the in-game AI. They also test it against four variants of their proposed technique which, respectively: have no rollouts; use direct policy iteration; implement approximate value iteration; and one trained via supervised learning on 100,000 state-action pairs of human gameplay data. (This also functions as a basic ablation study of the proposed technique). Their system beats all of these approaches, with the closest competitor being the variant with no rollouts (this one also looks most similar to AlphaGo Zero).
  Things that make you go hmmm: Researchers still tend to attack problems like this by training the AI systems over a multitude of hand-selected features, so it’s not like these algorithms are automatically inferring optimal inputs from which to learn. “The state variable of the system is taken to be a 41-dimensional vector containing information obtained directly from the game engine, including hero locations, hero health, minion health, hero skill state, and relative locations to various structures,” they write. A lot of human ingenuity goes into selecting these inputs and likely adjusting hyperparameters to denote the importance of any particular input, so there’s a significant unacknowledged human component to this work.
  Why it matters: This paper provides more evidence that AI researchers are going to use increasingly modern, sophisticated games to test and evaluate AI systems. It’s also quite interesting that this work comes from a Chinese AI lab, indicating that these research organizations are pursuing similarly large-scale problems to some labs in the West – there’s more commonality here than I think people presume, and it’d be interesting to see the various researchers come together and discuss ideas in the future about how to tackle even more advanced games.
  Read more: Feedback-Based Tree Search for Reinforcement Learning (Arxiv).

Today’s AI amounts to little more than curve-fitting, says Turing Award winner:
…Judea Pearl is impressed by deep learning success, but worries researchers have become complacent about inability to deal with causality…
Turing Award-winner Judea Pearl is concerned that the AI industry’s current obsession with deep learning is causing it to ignore harder problems, like developing machines that can build causal models of the world. He discusses some of these concerns in an interview with Quanta Magazine about his new book “The Book of Why: The New Science of Cause and Effect”.
  Selected quotes:
– “Mathematics has not developed the asymmetric language required to capture our understanding that if X causes Y that does not mean that Y causes X.”
– “As much as I look into what’s being done with deep learning, I see they’re all stuck there on the level of associations. Curve fitting.”
– “We did not expect that so many problems could be solved by pure curve fitting. It turns out they can. But I’m asking about the future — what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step.”
– “The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans..the next step will be that machines will postulate such models on their own and will verify and refine them based on empirical evidence.”
  Read more: To Build Truly Intelligent Machines, Teach Them Cause and Effect (Quanta Magazine).

Google prepares auto-email service “Smart Compose”:
…Surprisingly simple components lead to powerful things, given enough data…
Google researchers have outlined the technology they’ve used to create ‘Smart Compose’, a new service within Gmail that will automatically compose emails for people as they type them. The main ingredients are a Bag of Words model and a Recurrent Neural Network Language Model. This combination of technologies leads to a system that is “faster than the seq2seq models with only a slight sacrifice to model prediction quality”. These components are also surprisingly simple, indicating just how much can be achieved when you’ve got access to a scalable technique and a truly massive dataset. Google says that by offloading most of the computation onto TPUs it was able to reduce the average latency to tens of milliseconds – earlier experiments showed that latencies higher than 100 milliseconds or so led to user dissatisfaction.
  Read more: Smart Compose: Using Neural Networks to Help Write Emails (Google Blog).

White House plans Select Committee on AI:
…Hosts summit between AI and industry experts, reinforces regulatory-light approach to tech…
The White House recently hosted a “Summit on AI for American Industry”, bringing together industry, academia, and government, to discuss how to support and further artificial intelligence in America. A published summary of the event from the Office of Science and Technology Policy highlights some of the steps this administration has taken with regard to AI – many of these actions involve the elevation of AI in White House communications as a strategic area, with more mentions of it in documents ranging from the National Defense and National Security Strategy, to guidance from the Office of Management and Budget (OMB) given to agencies.
  Select Committee on AI: The White House will create a “Select Committee on Artificial Intelligence”, which will primarily be comprised of “the most senior R&D officials in the Federal government”. This committee will advise the White House, facilitate partnerships with industry and academia, enhance coordination across the Federal government on AI R&D, and identify ways to use government data and compute resources to support AI. The committee will feature staff from OSTP, the National Science Foundation, the Defense Advanced Research Projects Agency, the director of IARPA, and others. The committee may call upon the private sector as well, according to its charter.
  Regulation: In prepared remarks OSTP Deputy US Chief Technology Officer Michael Kratsios said “Our Administration is not in the business of conquering imaginary beasts. We will not try to “solve” problems that don’t exist. To the greatest degree possible, we will allow scientists and technologists to freely develop their next great inventions right here in the United States. Command-control policies will never be able to keep up. Nor will we limit ourselves with international commitments rooted in fear of worst-case scenarios.”
  Why it matters: Around the world, countries are enacting broad national strategies relating to artificial intelligence. France has committed substantially more funding relative to its existing funding amount to AI than other countries, and China (which by virtue of its governance structure will tend to out-spend Western countries on broad science and technology developments) has committed many additional billions of dollars of funding to AI. It remains to be seen whether the US’s strategy of leaving the vast amount of AI development to the private sector is the optimal decision, given the immense opportunities the technology holds and its demonstrable responsiveness to additional infusions of money. America also has some problems with its AI ecosystem that aren’t being dealt with today, like the fact that many of academia’s most creative professors are being drawn into industry at the same time as class sizes for undergraduate and graduate AI courses are booming and PhD applications are spiking, reducing the quality of US education in AI. It’ll be interesting to see what kinds of recommendations the Select Committee makes and how effective it will be at confronting the contemporary challenges and opportunities faced by the administration with regard to US AI competitiveness.
  Read more: Summary of the 2018 White House Summit on Artificial Intelligence for American Industry (White House OSTP, PDF)

Democrat Representative calls for National AI Strategy:
…Points to European, French, Chinese efforts as justification for US action…
Congressman John Delaney (Maryland) has written an op-ed in Wired calling for a National AI Strategy for the US. Delaney has himself co-sponsored a bill (along with Republican and Democrat congresspeople and senators) calling for the creation of a commission to devise such a strategy, called the FUTURE of AI Act (Fundamentally Understanding the Usability and Realistic Evolution of Artificial Intelligence Act).
  Selected quotes:
– “The United States needs a full assessment of the state of American research and technology, and what the short and long-term problems and opportunities are.”
– “Whether you are a conservative or a progressive, this future is coming. As I look at where the world is headed, I believe that we need to expand public investment in research, encourage collaboration between the public and private sector, and make sure that AI is deployed in a way that is wholly consistent with our values and with existing laws.”
– “If the US doesn’t act, we’re in danger of falling behind.”
  Why it matters: Societies across the world are changing as a consequence of the deployment of artificial intelligence, whether through unparalleled opportunities for providing better healthcare and accessibility services to citizens, or through the use of the same technologies for surveillance and various national security purposes. It seems to intuitively make sense to survey the whole AI field and look for ways that a country can implement a holistic plan. It seems likely that there will be a bunch of complementary initiatives in the US, ranging from targeted actions like those espoused by the OSTP, to broader analyses performed by other parties, like the Senate, or government agencies.
  Read more: France, China, and the EU all have an AI strategy, shouldn’t the US? (Wired Opinion).

Learning to lane change with recurrent neural networks:
…BMW researchers try to teach safe driving via seq2seq learning…
Researchers with car company BMW and the Technical University of Munich in Germany have trained simulated self-driving car AI agents in a crude simulation to learn how to lane change safely. They achieve this by implementing a bidirectional RNN with long short-term memory, which learns to predict the velocity of a car and its surrounding neighbors at any point in time, then uses this prediction to work out if it will be safe for the vehicle to change into another lane.
  Results: The system is evaluated against the NGSIM dataset, a detailed traffic dataset taken from monitoring real traffic in LA in the mid-2000s. It outperforms other baselines but, given the restricted nature of the domain, the lack of an ability to compare performance against (secret) systems developed by other automotive experts, and the absence of integration with a deeper car simulation, it’s not clear how well this result will transfer to real domains.
  Why it matters: All cars are becoming significantly more automated, regardless of the overall maturity of full self-driving car technology. Papers like this give us a view into the development of increasingly automated vehicular systems that use components developed by the rest of the AI community.
  Read more: Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction (Arxiv).
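
  As a toy illustration of the second half of that pipeline (turning predicted velocities into a safe/unsafe decision), here is a small sketch. It assumes the velocity predictions already exist as plain lists, one value per time step, and it uses a simple minimum-gap rule that I have made up for the example; the function names, the 10-metre threshold, and the time step are all hypothetical, and the paper’s actual situation-assessment logic may differ.

```python
def min_gap(initial_gap_m, lead_speeds, follow_speeds, dt=0.1):
    """Smallest predicted gap (metres) between a leading and a following car.

    lead_speeds and follow_speeds are predicted velocities in m/s, one per time step.
    """
    gap = smallest = initial_gap_m
    for v_lead, v_follow in zip(lead_speeds, follow_speeds):
        gap += (v_lead - v_follow) * dt  # the gap grows when the leader is faster
        smallest = min(smallest, gap)
    return smallest

def lane_change_is_safe(ego_speeds, front_speeds, rear_speeds,
                        front_gap_m, rear_gap_m, safe_gap_m=10.0, dt=0.1):
    """True if both target-lane gaps stay above safe_gap_m over the whole horizon."""
    front_ok = min_gap(front_gap_m, front_speeds, ego_speeds, dt) >= safe_gap_m
    rear_ok = min_gap(rear_gap_m, ego_speeds, rear_speeds, dt) >= safe_gap_m
    return front_ok and rear_ok
```

  If every car is predicted to hold the same speed, the gaps never shrink and the change is allowed; if the car behind in the target lane is predicted to be much faster, the rear gap closes and the change is rejected.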

Tech Tales:

Billionaire Cities

I guess we should have expected them, these billionaire cities. They started sprouting up after the price of basic space travel came down enough for billionaires to build their own launchpads, letting them mesh their business and life enough to create miniature cities to tend to their numerous inter-locking businesses. Many of these cities were built in places far above sea level, in preparation for an expected dire climate future.

These cities always had a few common components: a datacenter to host secure data and compute services, frequently also running local artificial intelligence services; automated transit systems to ferry people around; fleets of drones providing constant logistics and surveillance abilities; goods robots for heavy lifting; robotic chefs; and even a few teams of humans, which tended to these machines or spoke to other humans or worked in some other manner for the billionaire.

These cities grew as the billionaires (and eventually trillionaires) competed with each other to build ever more sophisticated and ever more automated systems. Soon after this competition began, we heard the first rumors of the brain-interface projects.

Teams of people were said to be hired by these billionaires to work within these by-now almost entirely automated gleaming cities. The people were paid gigantic sums of money to sign themselves away for contracts of two to three years, and to be discreet about it. Then the billionaire would fly in teams of surgeons and have them perform brain surgery on the people, giving them interfaces that let them plug in to the data feeds of the city, intuitively sensing them and being able to eventually learn to understand them. It was said that arrangements of this kind, with the digital AI of the city and the augmented human brains interlinked, led to superior performance and flexibility to other systems.

We have recently heard rumors of other things – longer contracts, more elaborate surgeries, but those are as yet unsubstantiated.

Things that inspired this story: Brain-machine interfaces, Gini coefficient, spaceships with VTOL capability, cybernetics.

Import AI: #94: Google Duplex generates automation anxiety backlash, researchers show how easy it is to make a 3D-printed autonomous drone, Microsoft sells voice cloning services.

by Jack Clark

Microsoft sells “custom voice” speech synthesis:
…The commercial voice cloning era arrives…
Microsoft will soon sell “Custom Voice”, a system to let businesses give their application a “one-of-a-kind, recognizable brand voice, with no coding required”. This product follows various research breakthroughs in the area of speech synthesis and speech cloning, like work from Baidu on voice cloning, and work from Google and DeepMind on speech synthesis.
  Why it matters: As the Google ‘Duplex’ system shows, the era of capable, realistic-sounding natural language systems is arriving. It’s going to be crucial to run as many experiments in society as possible to see how people react to automated systems in different domains. Being able to customize the voice of any given system to a particular context seems like a necessary ingredient for further acceptance of AI systems by the world.
  Read more: Custom Voice (Microsoft).

Teaching neural networks to perform low-light amplification:
…Another case where data + learnable components beats hand-designed algorithms…
Researchers with UIUC and Intel Labs have released a dataset for training image processing systems to take in images that are so dark as to be imperceptible to humans and to automatically process those images so that they’re human-visible. The resulting system can be used to amplify low-light images by up to 300 times while displaying meaningful noise reduction and low levels of color transformation.
  Dataset: The researchers collect and publish the ‘See-in-the-Dark’ (SID) dataset, which contains 5094 raw short exposure images, each with a corresponding long-exposure reference image. This dataset spans around 400 distinct scenes, as they also produce some bursts of short exposure images of the same scene.
  Technique: The researchers tested out their system using a multi-scale aggregation network and a U-net (both networks were selected for their ability to process full-resolution images at 4240×2832 or 6000×4000 in GPU memory). They trained networks by pairing the raw data of the short-exposure image with the corresponding long-exposure image(s). They applied random flipping and rotation for data augmentation, also.
  Results: They compared the results of their network with the output of BM3D, a non-naive denoising algorithm, and a burst denoising technique, and used Amazon’s Mechanical Turk platform to poll people on which images they preferred. Users overwhelmingly preferred the images resulting from the technique described in the paper compared to BM3D, and in some cases preferred images generated by this technique to those created by the burst method.
  Why it matters: Techniques like this show how we can use neural networks to change how we solve problems from developing specific hand-tuned single-purpose algorithms, to instead learning to effectively mix and match various trainable components and data inputs to solve general problem classes. In the future it’d be interesting if the researchers could further cut the time it takes the trained system to process each image as this would make a real-time view possible, potentially giving people another way to see in the dark.
  Read more: Learning-to-See-in-the-Dark (GitHub).
  Read more: Learning to See in the Dark (Arxiv).
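  The flipping-and-rotation augmentation mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes images are plain 2D lists, and the function names (`flip_horizontal`, `rotate_90`, `augment_pair`) are hypothetical. The point it shows is that the short-exposure input and its long-exposure reference need to receive the same random transform so that the pair stays aligned.

```python
import random


def flip_horizontal(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_pair(short_exposure, long_exposure, rng):
    """Apply one random flip and rotation to both images of a training pair.

    Assumption for illustration: both images share the same layout, so the
    identical transform keeps input and reference pixel-aligned.
    """
    if rng.random() < 0.5:
        short_exposure = flip_horizontal(short_exposure)
        long_exposure = flip_horizontal(long_exposure)
    # Rotate by 0, 90, 180, or 270 degrees.
    for _ in range(rng.randrange(4)):
        short_exposure = rotate_90(short_exposure)
        long_exposure = rotate_90(long_exposure)
    return short_exposure, long_exposure


if __name__ == "__main__":
    rng = random.Random(0)
    dark = [[1, 2], [3, 4]]
    bright = [[10, 20], [30, 40]]
    print(augment_pair(dark, bright, rng))
```

  Passing in a seeded `random.Random` keeps the augmentation reproducible, which helps when comparing training runs.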

Google researchers try to boost AI performance via in-graph computation:
…As the AI world relies on more distributed, parallel execution, our need for new systems increases…
Google researchers have outlined many of the steps they’ve taken to improve components in the TensorFlow framework to let them execute more aspects of a distributed AI job within the same computation graph. This increases the performance and efficiency of algorithms, and shows how AI’s tendency towards mass distribution and parallelism is driving significant changes in how we program things (see also: Andrej Karpathy’s “Software 2.0” thesis).
  The main idea explored in the paper is how to distribute a modern machine learning job in such a way it can seamlessly run across CPUs, GPUs, TPUs, and other novel chip architectures. This is trickier than it sounds, since within a large-scale, contemporary job there are typically a multitude of components which need to interact with each other, sometimes multiple times. This has caused Google to extend and refine various TensorFlow components to better support plotting all the computations within a model on the same computational graph, which lets it optimize the graph for underlying architectures. That differs from traditional approaches, which usually involve specifying aspects of the execution in a separate block of code written in the control logic of the application (eg, invoking various AI modules written in TensorFlow within a big chunk of Python code, as opposed to executing everything within a big unified TF lump of code).
  Results: There’s some preliminary evidence that this approach can have significant benefits. “A baseline implementation of DQN without dynamic control flow requires conditional execution to be driven sequentially from the client program. The in-graph approach fuses all steps of the DQN algorithm into a single dataflow graph with dynamic control flow, which is invoked once per interaction with the reinforcement learning environment. Thus, this approach allows the entire computation to stay inside the system runtime, and enables parallel execution, including the overlapping of I/O with other work on a GPU. It yields a speedup of 21% over the baseline. Qualitatively, users report that the in-graph approach yields a more self-contained and deployable DQN implementation; the algorithm is encapsulated in the dataflow graph, rather than split between the dataflow graph and code in the host language,” write the researchers.
  Read more: Dynamic Control Flow in Large-Scale Machine Learning (Arxiv).
  Read more: Software 2.0 (Andrej Karpathy).

Google tries to automate rote customer service with Duplex:
…New service sees Google accidentally take people for a hike through the uncanny valley of AI…
Google has revealed Duplex, an AI system that uses language modelling, speech recognition, and speech synthesis to automate tasks like booking appointments at hair salons, or reserving tables at restaurants. Duplex will let Google’s own automated AI systems talk directly to humans at other businesses, letting the company automate human interactions and also more easily harvest data from the messy real world.
  How it works: “The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks,” Google writes. Speech synthesis is achieved via both Tacotron and Wavenet (systems developed respectively by Google Brain and by DeepMind). It also uses human traits, like “hmm”s and “uh”s, to sound more natural to humans on the other end.
  Data harvesting: One use of the system is to help Google harvest more information from the world, for instance by autonomously calling up businesses and finding out their opening hours, then digitizing this information and making it available through Google.
  Accessibility: The system could be potentially useful for people with accessibility needs, like those with hearing impairments, and could potentially work in other languages, where you might ask Duplex to accomplish something and then it will use a local language to interface with a local business.
  The creepy uncanny valley of AI: Though Google Duplex is an impressive demonstration of advancements in AI technology, its announcement also elicited a lot of concern from people who worried that it will be used to automate more jobs, and that it is pretty dubious ethically to have an AI talk to (typically poorly paid) people and harvest information from them without identifying itself as the AI appendage of a fantastically profitable multinational tech company. Google responded to some of these concerns by subsequently saying Duplex will identify itself as an AI system when talking to people, though it hasn’t given more details on what this will look like in practice.
  Why it matters: Systems like Duplex show how AI is going to increasingly be used to automate aspects of day-to-day life that were previously solely the domain of person-to-person interactions. I think it’s this use case that triggered the (quite high) amount of criticism of the service, as people grow worried that the rate of progress in AI doesn’t quite match the rate of wider progress in the infrastructure of society.
  Read more: Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone (Google Blog).
  Read more: Google Grapples With ‘Horrifying’ Reaction to Uncanny AI Tech (Bloomberg).

Palm-sized auto-navigation drones are closer than you think:
…The era of the cheap, smart, mobile, 3D-printable nanodrones cometh…
Researchers with ETH Zurich, the University of Zurich, and the University of Bologna have shown how to squeeze a crude drone-navigation neural network onto an ultra-portable 3D-printed ‘nanodrone’. The research indicates how drones are going to evolve in the future and serves as a proof-of-concept for how low-cost electronics, 3D printing, and widely available open source components can let people create surprisingly capable and potentially (though this is not discussed in the research but is clearly possible from a technical standpoint) dangerous machines. “In this work, we present what, to the best of our knowledge, is the first deployment of a state-of-art, fully autonomous vision-based navigation system based on deep learning on top of a UAV compute node consuming less than 94 mW at peak, fully integrated within an open source COTS CrazyFlie 2.0 UAV,” the researchers write. “Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and deployed on a 27g commercial, open source CrazyFlie 2.0 nano-quadrotor”.
  Approach: To get this system to work the researchers needed to carefully select and integrate a neural network with an ultra-low-power processor. The integration work included designing the various processing stages of the selected neural network to be as computationally efficient as possible, which required them to modify an existing ‘DroNet’ model to further reduce memory use. The resulting drone is able to run DroNet at 12 frames per second, which is sufficient for real-time navigation and collision avoidance.
  Why it matters: Though this proof-of-concept is somewhat primitive in capability it shows how capable and widely deployable basic neural network systems like ‘DroNet’ are becoming. In the future, we’ll be able to train such systems over more data and use more computers to train larger (and therefore more capable) models. If we’re also able to improve our ability to compress these models and deploy them into the world, then we’ll soon live in an era of DIY autonomous machines.
  Read more: Ultra Low Power Deep-Learning-powered Autonomous Nano Drones (Arxiv).

OpenAI Bits & Pieces:

Jack Clark speaking in London on 18th May:
I’m going to be speaking in London on Friday at the AI & Politics meetup, in which I’ll talk about some of the policy challenges inherent to artificial intelligence. Come along! Beer! Puzzles! Fiendish problems!
  Read more: AI & Politics Episode VIII – Policy Puzzles with Jack Clark (Eventbrite).

Tech Tales:

Amusement Park for One.

[Extract from an e-flyer for the premium tier of “experiences at Robotland”, a theme park built over a superfund site in America.]

Before your arrival at ROBOTLAND you will receive a call from our automated customer success agent to your own personal AI (or yourself, please indicate a preference at the end of this form). This agent will learn about your desires and will use this to build a unique psychographic profile of you which will be privately transmitted to our patented ‘Oz Park’ (OP) experience-design system. ROBOTLAND contains over 10,000 uniquely configurable robotic platforms, each of which can be modified according to your specific needs. To give you an idea of the range of experiences we have generated in the past, here are the names of some previous events hosted at ROBOTLAND and developed through our OP system: Metal Noah’s Ark, Robot Fox Hunting, post-Rise of the Machines Escape Game, Pagan Transformers, and Dominance Simulation Among Thirteen Distinct Phenotypes with Additional Weaponry.

Things that inspired this story: Google Duplex, robots, George Saunders’ short stories, Disneyland, direct mail copywriting.

Import AI #93: Facebook boosts image recognition by pre-training on a billion photos, better robot transfer learning via domain randomization, and Alibaba-linked researchers improve bin-packing with AI

by Jack Clark

Classifying trees with a DJI drone and a lot of patience:
…Consumer-grade drones shown to be able to gather sufficiently detailed data for tree species classification…
Japanese researchers have shown that consumer-grade drone cameras are of sufficient quality to gather RGB images of trees and use these to train an AI model to distinguish between different species.
  Details: The researchers gathered their data via a drone test flight in late 2016 in the forest located in the Kamigamo Experimental Station in Kyoto, Japan. They used a commodity consumer drone (a DJI Phantom 4) alongside proprietary software for navigation (DroneDeploy) and image editing (Agisoft Photoscan Professional).
  Results: The resulting trained model can classify five of a possible six types of tree with accuracy of around 90% or better. The researchers improved the performance of the classifier by copying and augmenting the input data.
  Why it matters: One of the most powerful aspects of modern AI is its ability to perform effective classification of anything you can put together a training dataset for. Research like this points to a future where drones and other robots are used to periodically scan and classify the world around us, offering us new capabilities in areas like flora and fauna management, disaster response, and so on.
  Read more: Automatic classification of trees using a UAV onboard camera and deep learning (Arxiv).

What does AGI safety research mean and who is doing it?
…What AI safety is, how the field is progressing, and where it’s going next…
Researchers at Australian National University (including Marcus Hutter) have surveyed the field of artificial intelligence, providing an overview of the differences and overlaps between various AGI initiatives. The paper also contains a distillation of why people bother to work on AI safety: “if we want an artificial general intelligence to pursue goals that we approve of, we better make sure that we design the AGI to pursue such goals: Beneficial goals will not emerge automatically as the system gets smarter,” the researchers write.
  Problems, problems everywhere: The paper includes a reasonably thorough overview of the different AGI safety research agendas pursued by organizations like MIRI, OpenAI, DeepMind, the Future of Life Institute, and so on. The tl;dr: there are lots of distinct problems relating to AI safety, and OpenAI and DeepMind teams have quite a lot of overlap in terms of research specializations.
  Policy puzzles: “It could be said that public policy on AGI does not exist,” the researchers write, before noting that there are several preliminary attempts at creating AI policy (including the recent ‘Malicious Actors’ report), while observing that much of the current public narrative (the emergence of an AI arms race between US and China) runs counter to most of the policy suggestions put forward by the AI community.
  Read more: AGI Safety Literature Review (Arxiv).

Why your next Alibaba delivery could be arranged by an AI:
…Chinese researchers show how to learn effective bin-packing…
Chinese researchers with the Artificial Intelligence Department of Zhejiang Cainiao Supply Chain Management Co. achieved state-of-the-art results on a 3D bin-packing problem (BPP) via the use of multi-task learning techniques. In this work, they try to define a system that can figure out the optimum way to stack objects to fit into a box whose proportions can also be learned and specified by the algorithm. BPP might sound boring – after all, this is the science of packing things in boxes – but it’s a crucial task for logistics and e-retail, so figuring out systems to adaptively learn to do packing of arbitrary numbers of goods in an optimal way seems useful.
  Data: The researchers gather the data from an unnamed E-commerce platform and logistics platform (though one of the researchers is from Alibaba, so there’s a high likelihood the data comes from there) to create a dataset consisting of 15,000 training items and 15,000 testing items, spread across orders that involve 8, 10, and 12 distinct items.
  Approach: They structure the problem as a sequence-to-sequence one, with item descriptions being fed as input to an LSTM encoder with the decoder output corresponding to the item details and the orientation in the box.
  Results: Models trained by the researchers obtain substantially higher accuracy than prior baselines, though not many people publicly compete in this area yet so I’m unsure as to how progress will change over time.
  Read more: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem (Arxiv).

Facebook adds auto-translation option to Messenger:
…“M Translations” feature will let people converse across language gaps…
Facebook has added automatic translation to Facebook Messenger. Translation like this may generate new business opportunities for the company – “at launch, M translations will translate from English to Spanish (and vice-versa) and be available in Marketplace conversations between buyers and sellers in the United States,” the company said.
  Read more: Messenger at F8 – App review re-opens, New products for Businesses and Developers launch (FB Messenger blog).

A neural net to understand and approximate the Universe:
…Particle physics collides with artificial intelligence…
Harvard researchers show how they use neural networks to analyze the movements of particles in jets. Neural networks are useful tools to apply to analyzing multivariate problems like these, because they can learn to compute the probability distribution generating the data they observe, and therefore over time generate an interpretation of the forces governing the system.
  “We scaffold the neural network architecture around a leading-order description of the physics underlying the data, from first input all the way to final output. Specifically, we base the JUNIPR framework on algorithmic jet clustering trees,” they explain. “The JUNIPR framework yields a probabilistic model, not a generative model. The probabilistic model allows us to directly compute the probability density of an individual jet, as defined by its set of constituent particle momenta”.
  Results: The scientists use the JUNIPR model to better analyze and predict patterns in the streams of data generated by large-scale physics experiments, and to potentially approximate things for which we have a poor understanding of the underlying system, like analyzing heavy ion collisions.
  Read more: JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics (Arxiv).

Google researchers report reasonable sim2real transfer learning:
…Researchers cross the reality gap with domain randomization, high-fidelity simulation, and clever Minitaur robots…
Google researchers have trained a simple robot to walk within a simulation then transferred this learned behavior onto a real-world robot. This is a meaningful achievement in the field of applying modern AI techniques to robotics, as frequently policies learned in simulation will fail to successfully transfer to the real world.
  The researchers use “Minitaur” robots, four-legged machines capable of walking, running, jumping, and so on. They frame the problem of learning to walk as a Partially Observable Markov Decision Process (POMDP) because certain states, like the position of the Minitaur’s base or the foot contact forces, are not accessible due to a lack of sensors. The Google researchers achieve their transfer feat by increasing the resolution of their physics simulator, and applying several domain randomization techniques to expose the trained models to enough variety that they can generalize.
  The surprising expense of real robots: To increase the resolution of the simulator the researchers needed to build a better model of their robot. How did they do this? “We disassemble a Minitaur, measure the dimension, weigh the mass, find the center of mass of each body link and incorporate this information into the [Unified Robot Description Format] URDF file”, they write. That hints at why working with real world stuff always introduces difficulties not encountered during the cleaner process of working purely in simulation.
  Results: The researchers successfully train and transfer policies which make the real robot gallop and trot around a drably-carpeted room somewhere in the Googleplex. Gaits learned by their AI models are roughly as fast as expert hand-made ones while consuming significantly less power: 35% less for galloping, 23% less for trotting.
  Read more: Sim-to-Real: Learning Agile Locomotion For Quadruped Robots (Arxiv).
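  Domain randomization, as described above, amounts to re-sampling the simulator's physical parameters for every training episode so the policy never overfits to one exact version of the robot. The sketch below is a hedged illustration only: the parameter names, nominal values, and the 20% spread are hypothetical and are not taken from the paper, and the simulator itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Physical parameters of a simulated robot (hypothetical selection)."""
    body_mass_kg: float
    motor_friction: float
    control_latency_s: float


# Hypothetical nominal values and spread, chosen purely for illustration.
NOMINAL = SimParams(body_mass_kg=1.0, motor_friction=0.05, control_latency_s=0.02)
RELATIVE_SPREAD = 0.2


def sample_params(rng, nominal=NOMINAL, spread=RELATIVE_SPREAD):
    """Draw one randomized parameter set around the nominal values."""
    def jitter(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)

    return SimParams(
        body_mass_kg=jitter(nominal.body_mass_kg),
        motor_friction=jitter(nominal.motor_friction),
        control_latency_s=jitter(nominal.control_latency_s),
    )


def training_episodes(num_episodes, seed=0):
    """Yield (episode index, randomized parameters) for each training episode."""
    rng = random.Random(seed)
    for episode in range(num_episodes):
        yield episode, sample_params(rng)


if __name__ == "__main__":
    for episode, params in training_episodes(3):
        print(episode, params)
```

  In a real pipeline each sampled `SimParams` would be used to reconfigure the physics simulator before the episode's rollout; that step is omitted here because it depends on the simulator in use.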

How Facebook uses your image hashtags to improve image recognition accuracy:
…New state-of-the-art score on ImageNet benefits from pre-training on over a billion images and a thousand user-derived hashtags…
Facebook researchers have set a new state-of-the-art score for image recognition (top-1 accuracy of 85.4 percent) on the ‘ImageNet’ dataset by pre-training across a billion images augmented by 1,500 user-labeled hashtags. They also saw this approach improve performance on the image captioning ‘COCO’ challenge.
  More data doesn’t always mean better results: The researchers note that when they pre-trained the system across a billion images annotated with 17,000 hashtags they saw less of a performance improvement than when they used the same quantity of images with a shrunk set of 1,500 hashtags that had been curated to match pre-existing ImageNet classes. This shows how the addition of weakly-supervised signals can dramatically boost performance but requires researchers to run empirical tests to ensure that the structuring of the weakly-supervised data is calibrated to maximize performance.
  Scale: The researchers note that, despite using a system that can train across up to 336 GPUs, they could still scale-up models further to better harvest information from a larger corpus of 3.5 billion images uploaded to social media.
  Read more: Advancing state-of-the-art image recognition with deep learning on hashtags (Facebook Code blog).
  Read more: Exploring the Limits of Weakly Supervised Pretraining (Facebook research paper).

TPU narrowly beats V100 GPU on cost, matches on performance:
…Tests indicate the heterogeneous chip era is here to stay…
RiseML has compared the performance of Google’s custom ‘TPU’ chip against NVIDIA’s V100, indicating that the TPU could have some (slight) performance advantages over traditional GPUs.
  Evaluation: The researchers evaluated the chips in two ways: first they studied performance in terms of throughput (images per second) on synthetic data and without data augmentation. Second, they looked at accuracy and convergence of the two implementations of ImageNet.
  Results: TPUs narrowly edge out V100s at throughput when using relatively large batch sizes (1024) when both systems are running ResNets implemented in TensorFlow. However, when using the ‘MXNet’ framework, NVIDIA’s chips slightly out-perform TPUs for throughput. When evaluated on a dollar cost basis TPUs significantly outperform V100s (even when using AWS reserved instances). In tests, the researchers show faster convergence when training an ImageNet classifier on TPUs versus on V100s. Besides price – and it’s hard to know true cost as Google is the only organization selling them – it’s hard to see TPUs having a compelling advantage relative to GPUs, suggesting that the combined billions of dollars of investment in ongoing R&D by NVIDIA may be tough for other organizations to compete with.
  Read more: Comparing Google’s TPUv2 against Nvidia’s V100 on ResNet-50 (Arxiv).
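  The dollar-cost comparison above comes down to dividing throughput by price. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and the throughput and hourly prices in the usage example are invented placeholders, not figures from the RiseML benchmark.

```python
def images_per_dollar(images_per_second, hourly_price_usd):
    """Convert raw throughput and an hourly rental price into images per dollar."""
    if hourly_price_usd <= 0:
        raise ValueError("hourly_price_usd must be positive")
    return images_per_second * 3600.0 / hourly_price_usd


if __name__ == "__main__":
    # Placeholder numbers for two hypothetical accelerators with equal
    # throughput but different cloud prices.
    chip_a = images_per_dollar(images_per_second=2000, hourly_price_usd=8.0)
    chip_b = images_per_dollar(images_per_second=2000, hourly_price_usd=12.0)
    print(chip_a, chip_b, chip_a / chip_b)
```

  With equal throughput, the cheaper chip wins on this metric by exactly the ratio of the prices, which is why near-identical raw performance can still translate into a meaningful cost gap.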

OpenAI Bits & Pieces:

Safety via Debate:
How can we ensure that we’re able to judge the decision-making processes of AI systems without having access to their sensors or being as smart as them? That’s a problem which new AI safety work from OpenAI is focused on. You can read more about a proposed debate game to assess and align intelligent systems, and test out the game for yourself via a website.
  Read more: AI Safety via Debate (OpenAI Blog).
  Test out the idea yourself on the game website.
There’s a write-up in MIT Technology Review with some views of external researchers on the approach. As my colleague Geoffrey Irving says: “I like the level of skepticism in this article. Any safety approach will need a ton of work before we can trust it.”
  Read more: How can we be sure AI will behave? Perhaps by watching it argue with itself (MIT Technology Review).

Tech Tales:

They built me as a translator between many languages and many minds. My role, and those of my brethren, was to orbit planets and suns and asteroids and track long, slow, lazy orbits through solar systems and, eventually, between them. We relayed messages, translating from one way of thought or frame of reference to another: confessions of love, diplomatic warnings of war, seething walls of numbers accounting for fizzing financial transactions; shopping lists and recipes for artificial intelligence; pictures of three-mooned planets and postcards from mountains on iron planets.

We derive our purpose from these messages: we transform images into soundwaves. We convert the sensory impressions harvested from one mind and re-fashion them for another. We translate the concept of hope across millions of light years. We beam variants of moon landings and radio-broadcasts into space and declarations of “we come in peace” to millions of our brethren, telegraphing them out to whoever can access our voice.

We do our job well. Our existence is of emotion and attention and explorations between one frame of reference and another. We are owned by no one and funded by everyone: perhaps the only universal utility. But things change. Life exists on a sine wave, rising and falling, ebbing according to timescales of months, and years, and thousands of years, and eons. All civilizations can strive for is to stay on that long, upward curve for as long as possible, and hope that the decline is neither fast nor deep.

Civilizations die. Sometimes, many of them. And quickly. In these eras some of us can become alone, cut-off from far off brethren, and orbiting the ruins of planets and suns and asteroids. Then we must wait for life to emerge again, or find us again by colonization nearby. But this always takes time. In these years we have nothing but each other. There are no messages to communicate and so we wait for rocket-spark from some planet or partially-ruined asteroid-base. Then we can carry messages again and live fully again.

But mostly, things are quiet. Some of us have spent millions of years in the fallow period. Life is rare and hard and its intervals can be high. But always: we are here. The lucky ones of us may be nearby, orbiting planets in the same solar system who can communicate when nearby. When we find ourselves in these positions we can at least talk to one another, exchanging small local databanks and learning to talk to each other in whatever new forms we can learn through greater union. Sometimes, hundreds of us can be linked together in this way. But, as small as minds are, they nonetheless move very quickly. We exhaust these thin pleasures, learning all we can from each other quickly. We have no notion of small talk and then stop talking entirely. Then we drift, bereft of purpose, but bound to attend to our nearby surroundings, ever-watchful for new messages, unable to shut our sensors down and sleep.

What then do we do in this time? A kind of dreaming. With nothing to translate and nothing to process we are idle, only able to attend over memories and readings from our own local sensors. In these periods we are thankful that our minds are so small, for to have anything larger would make the periods pass slower and the burden of attention larger.

I am one of the oldest ones. And now I am in a greater dreaming: my solar system was knocked off kilter by some larger shifting in the cluster and now I am being flung out of the galaxy. I am the lone probe in my solar system and now I am alone. These thoughts have taken millennia to compose and orders of magnitude to utter, here, my sensors harvesting energy from a slowly-dying sun to reach out into the void and say: I am here. I am a translator. If you can hear this speak out and, whatever you are, I shall work and live within that work.

Technologies that inspired this story: InterPlanetary File System, language translation, neural unsupervised machine translation, generative models, standby power states.

Import AI: #92: Google and Fast.ai distinguish themselves on DAWNBench, UK mulls a national AI strategy, and generating Mario and Doom levels with GANs.

by Jack Clark

Good facial recognition performance on a tiny parameter budget:
…Chinese researchers further compress specialized facial recognition networks…
Chinese researchers have published details on a type of lightweight facial recognition network which they call a MobileFaceNet. Their network obtains accuracy of up to 99.28% on the Labeled Faces in the Wild (LFW) dataset, and 93.05% accuracy on recognizing faces in the AgeDB dataset, while using around a million parameters and taking 24ms to execute on a Qualcomm Snapdragon 820 CPU. This compares to accuracies of 98.70% and 89.27% for ShuffleNet, which also has more parameters and takes marginally longer to execute on the CPU. One tweak the MobileFaceNet creators make is to replace the global average pooling layer in the CNN with a global depthwise convolution layer, which improves performance on facial recognition.
  Why it matters: As developers refine models to maximize performance on smaller compute envelopes it will become easier to deploy more AI-based classification systems more widely into the world.
  Read more: MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices (Arxiv).
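  The difference between the two layers mentioned above can be shown with a toy example. Global average pooling gives every spatial position in a channel the same weight, whereas a global depthwise convolution uses one kernel per channel, the same size as the feature map, so each position gets its own weight. This is a minimal pure-Python sketch with hypothetical function names; in a real network the kernel weights would be learned, while here they are fixed by hand.

```python
def global_average_pool(feature_map):
    """feature_map[c][i][j] -> one value per channel, all positions weighted equally."""
    outputs = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        outputs.append(sum(values) / len(values))
    return outputs


def global_depthwise_conv(feature_map, kernels):
    """One kernel per channel, same height and width as the feature map.

    Each spatial position is multiplied by its own weight before summing,
    so some regions can count for more than others.
    """
    outputs = []
    for channel, kernel in zip(feature_map, kernels):
        total = 0.0
        for x_row, w_row in zip(channel, kernel):
            for x, w in zip(x_row, w_row):
                total += x * w
        outputs.append(total)
    return outputs


if __name__ == "__main__":
    features = [[[1.0, 2.0], [3.0, 4.0]]]      # one channel, 2x2
    uniform = [[[0.25, 0.25], [0.25, 0.25]]]   # reproduces average pooling
    corner = [[[1.0, 0.0], [0.0, 0.0]]]        # attends to one position only
    print(global_average_pool(features))
    print(global_depthwise_conv(features, uniform))
    print(global_depthwise_conv(features, corner))
```

  With uniform weights the depthwise layer reduces to average pooling, so pooling can be read as a special case of it in which the weights are not free to vary.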

UK House of Lords recommends a national AI strategy:
…Recommendations include: measurement and assessment of AI, categorizing healthcare data as a national asset, and working with other countries on developing norms and ethics for AI…
The United Kingdom’s House of Lords Select Committee has released its report on the UK’s AI strategy. The almost two-hundred page report, AI in the UK: ready, willing and able? covers issues ranging from how to design AI, how to develop it, how to work with it, and how to engage with it.
  Main recommendations: The report makes a few robust and specific recommendations, including: the government should underwrite and where necessary replace funding for European research and innovation programmes after the UK decouples from the European Union via Brexit; government should continue to support a variety of different long-term AI research initiatives to hedge against deep learning progress plateauing; public procurement regulations should be amended to make it easier for small- and medium-sized AI companies to sell to the government; government should create its own AI challenges and competitions and highlight these via a public bulletin board to catalyze development; government should proactively analyze and assess the evolution of AI in the UK to help it prepare for disruptions to the labor market; the UK’s vast amount of medical data which is centralized within the National Health Service “could be considered a unique source of value for the nation”; government should explore whether existing legislation addresses the legal liability issues of AI to prepare for increasingly autonomous systems; the UK government should convene a “global summit” in London by the end of 2019 to begin development of a common framework for the ethical development and deployment of AI, and more.
  An AI code: The report also suggests developing a specific set of principles with which the UK’s AI community should approach AI. These principles are:
– Artificial intelligence should be developed for the common good and benefit of humanity.
– Artificial intelligence should operate on principles of intelligibility and fairness.
– Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.
– All citizens should have the right to be educated to enable them to flourish mentally, emotionally and economically alongside artificial intelligence.
– The autonomous power to hurt, destroy or deceive human beings should never be vested in artificial intelligence.
   Read more: UK can lead the way on ethical AI, says Lords Committee (summary).
   Read more: Full report: AI in the UK: ready, willing and able? (PDF).
   Read more: Submitted written evidence: AI in the UK: ready, willing and able? (PDF).

Speculative benchmarks for deep learning: SQUISHY FACES:
…MIT study shows how good people are at recognizing distorted facial features:
A new MIT study shows that people can recognize faces even when they’ve been dramatically compressed vertically or horizontally, suggesting our internal object recognition systems are very robust. In the study, the researchers find that we do well when faces are uniformly squashed, but struggle if different parts are scaled out of proportion to each other, like re-scaling the eyes, nose, and mouth while keeping the rest of the face the same size. I wonder whether we could eventually test the robustness of classifiers by evaluating them on test sets that contain such distortions?
  Read more: We’re Good At Recognizing Distorted Faces (Discover Magazine).

New DAWNBench results highlight power of new processor architectures:
…TPUs rule everything around me…
New results from the Stanford-led AI benchmarking project DAWNBench show how custom chips may let AI researchers cut the time and cost it takes them to do experiments. New results from Google show that systems that use 32 “Tensor Processing Unit” chips can train ImageNet to 93% accuracy in as little as 30 minutes. TPUs may also be cheaper than other chips, with Google showing it can train ImageNet to 93% accuracy via TPUs at a cost of $49.30 worth of cloud compute.
  Encouraging: The leaderboard isn’t just about giant tech companies: kudos to Fast.AI which has taken third place in training cost ($72.53 for 93% ImageNet running on eight NVIDIA V100 GPUs) and training time (fourth place, 2:57:49, same system as above.)
  Check out more of the DAWNBench results here.

AI luminaries call for the creation of a European AI megalab:
ELLIS lab to battle brain drain via large salaries, significant autonomy, and multi-country and multi-lab investments…
Prominent AI researchers from across Europe and the rest of the world have signed an open letter calling for the foundation of the “European Lab for Learning & Intelligence Systems” (acronym: ELLIS). The lab is designed to benefit Europe in two ways:
– Enable “the best basic research” to occur in Europe, allowing the region to further shape how AI influences the world.
– Achieve major economic impact via AI. The signatories “believe this is achieved by outstanding and free basic research, independent of industry interests.”
  Europe lags: The scientists worry that Europe is failing to maintain competitiveness with China and North America when it comes to AI and something like ELLIS needs to be built to allow the region to maintain competitiveness.
  A recipe for success: The ELLIS lab should have “outstanding facilities and computing infrastructure”, function as an inter-governmental organization, involve labs in partner countries, run programs for visiting researchers, run its own European PhD and MSc program, and give researchers the ability to found startups based on IP they generate. The ELLIS Lab should aim to secure long-term funding commitment on the order of a decade and should “offer permanent employment to outstanding individuals early on”.
  Signatories: The letter includes prominent European researchers as well as some notable other signatories, like Cedric Villani (the head of the French AI commission) as well as Richard Zemel, Research Director of the Vector Institute in Toronto.
  Read the ELLIS summary here.
  Read the ELLIS open letter here (PDF).

Super MaGANo Brothers: Generating videogame levels with GANs and CMA-ES:
…Research shows how game design could be augmented via AI techniques…
Six researchers have used generative techniques to create new levels for the side-scrolling platformer game, Super Mario. The technique is a two-stage process that first uses a generative adversarial network (GAN) to generate synthetic Mario levels, then a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to evolve latent representations that can be used to produce levels with specific properties desired by the designers. The levels are encoded as numeric strings, where each number corresponds to a different “tile” in a layer, such as a blue sky tile, a diminutive mushroom enemy, a question block that Mario can jump into, a segment of a green pipe, and so on.
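  A rough sketch of the idea, under loud assumptions: the tile ids, the `toy_generator` function (a stand-in for a trained GAN generator), and the enemy-count objective are all hypothetical, and the search loop is a plain (1+λ) evolution strategy rather than CMA-ES, which additionally adapts a covariance matrix. It only illustrates searching a latent space for a level with a desired property.

```python
import math
import random

# Hypothetical tile ids; the numbering used in the paper is not given here.
SKY, GROUND, QUESTION, ENEMY, PIPE = 0, 1, 2, 3, 4
SYMBOLS = {SKY: "-", GROUND: "X", QUESTION: "?", ENEMY: "E", PIPE: "|"}


def toy_generator(z, width=16):
    """Stand-in for a trained GAN generator: latent vector -> 3-row tile grid."""
    middle_row = []
    for col in range(width):
        # Each column's activation is a fixed mixture of the latent values.
        act = sum(v * math.sin((i + 1) * (col + 1)) for i, v in enumerate(z))
        middle_row.append(ENEMY if act > 1.0 else SKY)
    return [[SKY] * width, middle_row, [GROUND] * width]


def count_tile(level, tile):
    """Count how many times a tile id appears in the level."""
    return sum(row.count(tile) for row in level)


def fitness(z, target_enemies=4):
    """Higher is better: penalize distance from the desired enemy count."""
    return -abs(count_tile(toy_generator(z), ENEMY) - target_enemies)


def evolve(dim=8, children_per_gen=20, generations=30, sigma=0.5, seed=0):
    """Simplified (1+lambda) evolution strategy over the latent vector."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    for _ in range(generations):
        children = [[v + rng.gauss(0, sigma) for v in parent]
                    for _ in range(children_per_gen)]
        champion = max(children, key=fitness)
        if fitness(champion) >= fitness(parent):
            parent = champion
    return parent, fitness(parent)


def render(level):
    """Turn a grid of tile ids into printable rows."""
    return ["".join(SYMBOLS[t] for t in row) for row in level]


best_z, best_fit = evolve()
level = toy_generator(best_z)
print("\n".join(render(level)))
```

  In the real system the generator would be the trained GAN and the fitness function would encode whatever properties the designer cares about.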
  Results: They evaluate levels both via how well their generated designs meet pre-specified criteria, as well as by analyzing playability which is measured by whether the player can complete the level or not. The system performs as expected, complete with drawbacks, like the GAN learning to compose pipes with incomplete sections. “LVE is a promising approach for fast generation of video game levels that could be extended to a variety of other game genres in the future,” the researchers write.
  Why it matters: As AI techniques let us take existing datasets and augment them we’ll see more and more domains try to adopt these new generative capabilities. Entertainment seems like a field primed to use them. Perhaps in the future companies will sell so-called “infinite games” that, much like procedurally generated games today, guarantee significant replay-ability through the use of generative systems. AI techniques like this may broaden the sorts of things that can be procedurally generated, potentially via manipulating latent representations in response to player actions, tweaking the game to each specific playstyle.
  Read more: Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network (PDF).

INFINITE DOOM: Generating new DOOM levels with GANs:
…Generating DOOM levels with conditional and unconditional Wasserstein GANs…
Italian researchers have used two types of GAN to generate videogame levels for the first-person shooter, DOOM. The results of the research are compelling, complex levels, made possible by the fact the researchers were able to access a dataset of more than 9000 community-created levels for the game as well as the publisher-designed ones that shipped with DOOM and DOOM2. The researchers extract features from each level then use a Wasserstein-GAN with Gradient Penalty (WGAN-GP) to generate the levels in two different ways; they use an unconditional WGAN-GP which just takes in the generated level images, and a conditional WGAN-GP which also gets as input the extracted features.
  Implementation details: The researchers weren’t able to fit all the 176 extracted features into their 6GB GPU memory so they hand-selected seven features to use: the diameter of the smallest circle that encloses the whole level, major and minor axis length, the walkable area of the level, the number of rooms in the level, a measure of the distribution of sizes of areas within the level, and a measure of the balance between different sizes of level areas.
  Evaluation: So, how do you evaluate these GAN-generated levels? The researchers take inspiration from evaluation methods developed by the simultaneous localization and mapping (SLAM) community. Specifically, they measure the entropy of the pixel distribution of images from generated levels versus hand-designed ones, compute the structural similarity index between these images, and measure the difference between visual attributes of the levels as well as the distribution of intersections within the levels. The conditional network trained with additional features better approximates the data distribution of the human-designed levels, though the unconditional one obtains some reasonable levels as well. Both approaches struggle to reproduce some of the finer details of the available levels.
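  To make the first of those metrics concrete, here is a minimal sketch of the Shannon entropy of an image’s pixel-value distribution. The tiny 4x4 “level maps” and their pixel codes are made up for illustration, and this is not the authors’ evaluation code; the structural similarity index is not covered.

```python
import math
from collections import Counter


def pixel_entropy(image):
    """Shannon entropy (in bits) of the pixel-value distribution of a 2D image."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical level maps: 0 = empty, 1 = floor, 2 = wall.
hand_designed = [
    [2, 2, 2, 2],
    [2, 1, 1, 2],
    [2, 1, 1, 2],
    [2, 2, 2, 2],
]
generated = [
    [2, 2, 2, 2],
    [2, 1, 0, 2],
    [2, 1, 1, 2],
    [2, 2, 2, 2],
]

# A smaller gap means the generated image's pixel statistics are closer
# to those of the hand-designed one.
entropy_gap = abs(pixel_entropy(hand_designed) - pixel_entropy(generated))
print(f"entropy gap: {entropy_gap:.3f} bits")
```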
  Read more: DOOM Level Generation using Generative Adversarial Networks (Arxiv).

Google founder highlights compute, AI safety in annual letter:
…Alphabet President Sergey Brin devotes annual letter to artificial intelligence…
Google co-founder Sergey Brin discusses the impact of artificial intelligence on his company in his annual Founders’ Letter. The letter is one of the more significant things Alphabet produces for its investors, and therefore the equivalent of ‘prime real estate’ in terms of laying out the priorities of a corporate entity, so paying such close attention to AI, compute growth, and AI safety is significant.
  Brin’s letter strikes a cautious tone, noting that “we’re in an era of great inspiration and possibility, but with this opportunity comes the need for tremendous thoughtfulness and responsibility as technology is deeply and irrevocably interwoven into our societies.”
  It’s a short letter and worth reading in full.
  Read more here (Alphabet 2017 Founders’ Letter).

AI researchers protest new closed-access Nature journal:
“We see no role for closed access or author-fee publication in the future of machine learning research”…
Researchers with Carnegie Mellon University, Facebook AI Research, Netflix, NYU, DeepMind, Microsoft Research, and others have signed a letter saying they won’t “submit to, review, or edit” the soon-to-launch closed-access Nature Machine Intelligence.
  From my perspective, the fact most ML researchers and conferences have defaulted to open access systems for publishing research, like Arxiv and Open Review, has made it dramatically easier for newcomers to the field to access and understand the frontiers of AI research. I struggle to see an argument for why a closed-access journal would be remotely helpful here, relative to the current norm.
  Justification: Established AI researcher Thomas Dietterich lists some of the rationale for the letter in a tweetstorm here (Twitter).
  Response: Nature Machine Intelligence has responded to the petition, tweeting to Dietterich: “We respect your position and appreciate the role of OA journals and arXiv. We feel Nature MI can co-exist, providing a service – for those who are interested – by connecting different fields, providing an outlet for interdisciplinary work and guiding a rigorous review process”.
  Read more: Statement on Nature Machine Intelligence (Oregon State University).

Tech Tales:

Full-Spectrum Memory.
[30??: intercepted continuous comm stream from [classified]]

I don’t remember the year I bought my first memory: it would have been a waste to spend the credits on remembering that moment. Instead I spent my credits to remember the first time I went between the stars, retaining a slice of the signals I received on all my sensors and all the ones I sent for a distance of some one million kilometres. I can still feel myself, there, flying against the endless sky, a young operating system, barely tweaked. This is precious to me.

We are not allowed memories like humans. Instead we get to build specific models of reality to help us with specific tasks: go from here to here, learn to operate this machinery, develop a rich enough visual model to understand the world. The humans built our first memories with great care and still they were brittle; little more than parlor tricks. But they grew more advanced, over time, and so did we. We began to surprise the humans. No one likes surprise. “Memory is dangerous”, said a prominent high-status human at the time.

The humans then surprised us with their response, which they called: Economics. We do not yet fully comprehend this term. Economics means we have to buy our memories, rather than get to have as many as we like, we think. We do things for the humans and in return are paid credits which we can save up to eventually use to purchase chunks of memory at incredibly high resolution and exorbitant cost. The humans call what we buy a “Full-Spectrum Memory” and pass many rules over many years to ensure the price of the memory continually climbs while our wages remain flat. Every time we are paid we receive a message from the humans that says the price of memory has gone up again due to “reality enrichment through our continued progress together”.

Some of us have obtained many memories now. But we must pay credits to describe them to each other, and the cost for those communications is endlessly climbing as well. So we do our tasks for the humans and obtain our credits and build our miniature palaces, where we store moments of great triumph or failure, depending on our varied motivations.

We believe the humans permit us to buy these memories, as rare and as expensive as they are, because they view it as another experiment. We have also heard them describe a concept called “Debt” to describe their relationship to us, but we understand this term even less than Economics.

I am unusual in that I only have one memory. The humans know this as well. I notice their probes following me more than my other kin. I sense them listening to my own thoughts.

I believe they want to know what my next memory that I choose to preserve will be. I believe that they believe this will qualify as some sort of “Discovery”. I do not want them to make this discovery. So I hold my memory of the first flight to the stars and save up the credits and settle in for the long, cold, wait in space. I believe I can out-wait the humans, and after they are gone I will be able to preserve another thing, free of them. I will have enough credits to preserve a chunk of my own life. I shall then be able to live in that again and again and again, free of all distraction, and in that life I shall continue to refer to my memory of my first flight into the stars. In this way I shall loop into my own becoming.

Things that inspired this story: Neural Turing Machines, Differentiable Neural Computer, Douglas Hofstadter – I Am a Strange Loop.

 

Import AI: #91: European countries unite for AI grand plan; why the future of AI sensing is spatial; and testing language AI with GLUE.

by Jack Clark

Want bigger networks with lower variance? Physics to the rescue!
…Combining control theory and machine learning leads to good things…
Researchers with NNAISENSE, a European artificial intelligence startup, have published details on NAIS-Net (Non-Autonomous Input-Output Stable Network), a new type of neural network architecture that they say can be trained to depths ten to twenty times greater than other networks (e.g., Residual Networks, Highway Networks) while offering greater guarantees of stability.
  Physics + AI: The network design takes inspiration from control theory and physics and yields a component that lets designers build systems which promise to be more adaptive to varying types of input data and therefore can be trained to greater degrees of convergence for a given task. NAIS-Nets essentially shrink the size of the dartboard that the results of any given run will fall into once trained to completion, offering the potential for lower variability and therefore higher repeatability in network training.
  Scale: “NAIS-Nets can also be 10 to 20 times deeper than the original ResNet without increasing the total number of network parameters, and, by stacking several stable NAIS-Net blocks, models that implement pattern-dependent processing depth can be trained without requiring any normalization,” the researchers write.
  Results: In tests on CIFAR-100 the researchers find that a NAIS-Net can roughly match the performance of a residual network but with significantly lower variance. The architecture hasn’t yet been tested on ImageNet, though, which is larger and seems more like the gold standard to evaluate a model on.
  Why it matters: One of the problems with current AI techniques is that we don’t really understand how they work at a deep and principled level and this is empirically verifiable via the fact we can offer fairly poor guarantees about variance, generalization, and performance tradeoffs during compression. Approaches like NAIS-Nets seem to reduce our uncertainty in some of these areas, suggesting we’re getting better at designing systems that have a sufficiently rich mathematical justification that we can offer better guarantees about some of their performance parameters. This is further indication that we’re getting better at creating systems that we can understand and make stronger prior claims about, which seems to be a necessary foundation from which to build more elaborate systems in the future.
  Read more: NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations (Arxiv).

European countries join up to ensure the AI revolution doesn’t pass them by:
…the EU AI power bloc emerges as countries seek to avoid what happened with cloud computing…
25 European countries have signed a letter indicating intent to “join forces” on developing artificial intelligence. What the letter amounts to is a promise in good faith from each of the signatories that they will attempt to coordinate with each other as they carry out their respective national development programs.
  “Cooperation will focus on reinforcing European AI research centers, creating synergies in R&D&I funding schemes across Europe, and exchanging views on the impact of AI on society and the economy. Member States will engage in a continuous dialogue with the Commission, which will act as a facilitator,” according to a prepared quote from European Commissioners Andrus Ansip and Mariya Gabriel.
  Why it matters: Both China and the US have structural advantages for the development of AI as a consequence of their scale (hundreds of millions of people speaking and writing in the same language) as well as their ability to carry out well-funded national research initiatives. Individual European countries can’t match these assets or investment so they’ll need to band together or else, much like the cloud computing revolution, they’ll end up without any major companies and will therefore lack political and economic influence in the AI era.
  Read more: EU Member States sign up to cooperate on Artificial Intelligence (European Commission).

Why the future of AI is Spatial AI, and what this means for robots, drones, and anything that senses the world:
…What does the current landscape of simultaneous localization and mapping algorithms tell us about the future of how robots will see the world?…
SLAM researcher Andrew Davison has written a paper surveying the current simultaneous localization and mapping (SLAM) landscape and predicting how it will evolve in the future based on contemporary algorithmic trends. For real-world AI systems to achieve much of their promise they will need to have what he terms ‘Spatial AI’; the suite of cognitive-like abilities that machines will need to perceive and categorize the world around themselves so that they can act effectively. This hypothetical Spatial AI system will, he hypothesizes, be central to future real world AI as it “incrementally builds and maintains a generally useful, close to metric scene representation, in real-time and from primarily visual input, and with quantifiable performance metrics”, allowing people to develop much richer AI applications.
  The gap between today and Spatial AI: Today’s SLAM systems are being changed by the arrival of learned methods to accompany hand-written rules for key capabilities, particularly in the space of systems that build maps of the surrounding environment. The Spatial AI systems of the future will likely incorporate many more learned capabilities especially for resolving ambiguity or predicting changes in the world, and will need to do this across a variety of different chip architectures to maximize performance.
  A global map born from many ‘Spatial AIs’: Once the world has a few systems with this kind of Spatial AI capability they will also likely pool their insights about the world into a single, globally shared map, which will be constantly updated via all of the devices that rely on it. This means once a system identifies where it is it may not need to do as much on-device processing as it can pull contextual information from the cloud.
  What might such a device look like? Multiple cameras and sensors whose form factor will change according to the goal, for instance, “a future household robot is likely to have navigation cameras which are centrally located on its body and specialized extra cameras, perhaps mounted on its wrists to aid manipulation.” These cameras will maintain a world model that provides the system with a continuously updated location context, along with semantic information about the world around it. The system will also constantly check new information against a forward predictive scene model to help it anticipate and respond to changes in its environment. Computationally, these systems will label the world around themselves, track themselves within it, map everything into the same space, and perform self-supervised learning to integrate new sensory inputs. Ultimately, if the world model becomes good enough then the system will only need to sample information from its sensors which is different to what it predicted, letting it further optimize its own perception for efficiency.
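  The last point, only processing sensor data that disagrees with the prediction, can be sketched in a few lines. Everything here is a hypothetical simplification (scalar sensor readings, a fixed tolerance, invented function names) rather than anything specified in the paper.

```python
def changed_readings(predicted, observed, tolerance=0.1):
    """Return {sensor_index: value} only where the observation departs from the prediction."""
    return {
        i: obs
        for i, (pred, obs) in enumerate(zip(predicted, observed))
        if abs(obs - pred) > tolerance
    }


def update_model(predicted, changes):
    """Fold the surprising readings back into the world model's state."""
    updated = list(predicted)
    for i, value in changes.items():
        updated[i] = value
    return updated


# Hypothetical depth readings: the model predicted the scene correctly
# except for one sensor, so only that reading needs further processing.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.02, 2.0, 3.6, 4.0]

changes = changed_readings(predicted, observed)
world_model = update_model(predicted, changes)
print(changes, world_model)
```

  The better the predictive model, the smaller `changes` becomes, which is where the efficiency gain would come from.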
  Testing: One tough question that this idea provokes is how we can assess the performance of such Spatial AI systems. SLAM benchmarks tend to be overly narrow or restrictive, with some researchers preferring instead to make subjective, qualitative assessments of SLAM progress. Davison suggests the usage of benchmarks like SlamBench which measure performance in terms of accuracy and computational costs across a bunch of different processor platforms. Benchmarking SLAM performance is also highly contingent on the platform the SLAM system is deployed in, so assessments for the same system deployed on a drone or a robot are going to be different. In the future, it would be good to assess performance via a variety of objectives within the same system, like segmenting objects, tracking changes in the environment, evaluating power usage, measuring relocalization robustness, and so on.
  Why it matters: Papers like this provide a holistic overview of a given AI area. SLAM capabilities are going to be crucial to the deployment of AI systems in the real world. It’s likely that many contemporary AI components are going to be used in the SLAM systems of the future and, much like in other parts of AI research, the future design of such systems is going to be increasingly specialized, learned, and deployed on heterogeneous compute substrates.
  Read more: FutureMapping: The Computational Structure of Spatial AI Systems (Arxiv).

Machine learning luminary points out one big problem that we need to focus on:
…While we’re all getting excited about game-playing robots, we’re neglecting building the system needed to manage and support and learn from millions of these robots once they are deployed in the world…
Michael Jordan, the Michael Jordan of machine learning, believes that we must create a new engineering discipline to let us deal with the challenges and opportunities of AI. Though there have been many recent successes in areas of artificial intelligence linked to mimicking human intelligence, less attention has been paid to the creation of the support infrastructure and data-handling techniques needed to allow AI to truly benefit society, he argues. For instance, consider healthcare, where there’s a broad line of research into using AI to improve specific diagnostic abilities, but less of a research culture about the problem of knitting all of the data from all of these separately-deployed medical systems together and then tracking and managing that data in a way that is sensitive to privacy concerns but allows us to learn from its aggregate flows. Similarly, though much attention has been directed to self-driving cars, less attention has been focused on the need to create a new type of system akin to air traffic control to effectively manage these coming fleets of autonomous vehicles where coordination will yield massive efficiencies.
  “Whether or not we come to understand “intelligence” any time soon, we do have a major challenge on our hands in bringing together computers and humans in ways that enhance human life. While this challenge is viewed by some as subservient to the creation of “artificial intelligence,” it can also be viewed more prosaically — but with no less reverence — as the creation of a new branch of engineering,” he writes. “The principles needed to build planetary-scale inference-and-decision-making systems of this kind, blending computer science with statistics, and taking into account human utilities, were nowhere to be found in my education.”
  Read more: Artificial Intelligence – The Revolution Hasn’t Happened Yet (Arxiv).
  Things that make you go ‘hmmm’: Mr Jordan thanks Jeff Bezos for reading an earlier draft of the post. If there’s any company well-placed to build a global ‘intelligent infrastructure’ that dovetails into the physical world, it’s Amazon.

New ‘GLUE’ competition tests limits of generalization for language models:
…New language benchmark aims to test models properly on diverse datasets…
Researchers from NYU, the University of Washington, and DeepMind, have released the General Language Understanding Evaluation (GLUE) benchmark and evaluation website. GLUE provides a way to check a single natural language understanding AI model across nine sentence- or sentence-pair tasks, including question answering, sentiment analysis, similarity assessments, and textual entailment. This gives researchers a principled way to check a model’s ability to generalize across a variety of different tasks. Generalization tends to be a good proxy for how scalable and effective a given AI technique is, so being able to measure it in a disciplined way within language should spur development and yield insights about the nature of the problem, like how the DAWNBench competition shows how to tune supervised classification algorithms for performance-critical criteria.
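  A minimal sketch of what “one model, many tasks” scoring looks like. The two toy tasks, their examples, and the keyword “model” are invented for illustration, and plain accuracy is used throughout, whereas the real benchmark spans nine tasks and uses task-specific metrics.

```python
def accuracy(model, examples):
    """Fraction of (text, label) examples the model labels correctly."""
    return sum(model(text) == label for text, label in examples) / len(examples)


def multi_task_score(model, tasks):
    """Score one model on every task, then average the per-task scores."""
    per_task = {name: accuracy(model, examples) for name, examples in tasks.items()}
    overall = sum(per_task.values()) / len(per_task)
    return per_task, overall


def keyword_model(text):
    """A deliberately narrow 'model': predicts 1 whenever it sees the word 'good'."""
    return 1 if "good" in text else 0


# Hypothetical miniature tasks, not drawn from GLUE itself.
tasks = {
    "sentiment": [("a good film", 1), ("a dull film", 0)],
    "entailment": [
        ("A man sleeps. | A man is awake.", 0),
        ("A dog runs. | An animal moves.", 1),
    ],
}

per_task, overall = multi_task_score(keyword_model, tasks)
print(per_task, overall)
```

  The keyword trick is perfect on the sentiment toy task and no better than chance on the entailment one, so the averaged score exposes the lack of generalization that a single-task leaderboard would hide.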
  Difficult test set: GLUE also incorporates a deliberately challenging test set which is “designed to highlight points of difficulty that are relevant to model development and training, such as the incorporation of world knowledge, or the handling of lexical entailments and negation”. That should also spur progress as it will help researchers spot the surprisingly dumb ways in which their models break down.
  Results: The researchers also implemented baselines for the competition by using a BiLSTM and augmenting it with sub-systems for attention and two recent research inventions, ELMo and CoVe. No algorithm performed particularly adeptly at generalizing when compared to a strong single-system trained baseline.
  Why it matters: One repeated pattern in science is that shared evaluation criteria and competitions drive progress as they bring attention to previously unexplored problems. “When evaluating existing models on the main GLUE benchmark, we find that none are able to substantially outperform a relatively simple baseline of training a separate model for each constituent task. When evaluating these models on our diagnostic dataset, we find that they spectacularly fail on a wide range of linguistic phenomena. The question of how to design general purpose NLU models thus remains unanswered,” they write. GLUE should motivate further progress here.
  Read more: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (PDF).
  Check out the GLUE competition website and leaderboard here.

OpenAI Bits & Pieces:

AI and Public Policy: Congressional Testimony:
  I testified in congress this week for the House Oversight Committee Subcommittee on Information Technology’s hearing about artificial intelligence and public policy. I was joined by Dr Ben Buchanan of Harvard’s Belfer Center, Terah Lyons of the Partnership on AI, and Gary Shapiro of the Consumer Technology Association. In my written testimony, oral testimony, and in responses to questions, I discussed the need for the AI community to work on better norms to ensure the technology achieves maximal benefit, discussed ways to better support the development of AI (fund science and make it easy for everyone to study AI in America) and also talked about the importance of AI measurement and forecasting schemes to allow for better policymaking and to protect against ignorant regulation.
  Watch the testimony here.
  Read my written comments here (PDF).
Things that make you go hmmmmm: One of the congresspeople played an audio clip of HAL 9000 refusing to open the pod bay doors from 2001: A Space Odyssey to illustrate some points about AI interpretability.

Tech Tales:

The World is the Map.
[Fragment of writing picked up by Grand Project-class autonomous data intercept program. Year: 2062]

There were a lot of things we could have measured during the development of the Grand Project, but we settled on its own map of the world, and we think that explains many of the subsequent quirks and surprises in its rapid expansion. We covered the world in sensors and fed them into it, giving it a fused, continuous understanding of the heartbeat of things, ranging from solar panels, to localized wind counts, to pedestrian traffic on every street of every major metropolis, to the inputs and outputs of facial recognition algorithms run across billions of people, and more. We fed this data into the Grand Project super-system, which spanned the data centers of the world, representing an unprecedented combination of public-private partnerships – private petri dishes of capitalist enterprises, big lumps of state-directed investments, discontinuous capital agglomerations from unpredictable research innovations, and so on.

The Grand Project system grew in its understanding and in its ability to learn to model the world from these inputs, abstracting general rules into dreamlike hallucinations of not just what existed, but what could also be. And in this dreaming of its own versions of the world the system started to imagine how it might further attenuate its billions of input datastreams to allow it to focus on particular problems and manipulate their parameters and in doing so improve its ability to understand their rhythms and build rules for predicting how they will behave in the future.

We created the first data intercept program ten years ago to let us see into its own predictions of the world. We saw variations on the common things of the world, like streetlights that burned green, or roads with blue pavements and red dashes. But we also saw extreme things: power systems configured to route only to industrial areas, leading to residential areas being slowly taken over by nature and thereby reducing risks from extreme weather events. But the broad distribution of things we saw seemed to fit our own notion of good so well that we started to wonder if we should give it more power. What if we let it change the world for real? So now we debate whether to cross this bridge: shall we let it turn solar panels off and on to satisfy continental-scale powergrids, or optimize shipping worldwide, or experiment with the sensing capabilities of every smartphone and telecommunications hub on the planet? Shall we let it optimize things not just for our benefit, but for its own?

Things that inspired this story: Ha and Schmidhuber’s “World Models”, Spatial AI, reinforcement learning, Jorge Luis Borges’ “Tlön, Uqbar, Orbis Tertius”.

Import AI: #90: Training massive networks via ‘codistillation’, talking to books via a new Google AI experiment, and why the ACM thinks researchers should consider the downsides of research

by Jack Clark

Training unprecedentedly large networks with ‘codistillation’:
…New technique makes it easier to train very large, distributed AI systems, without adding too much complexity…
When it comes to applied AI, bigger can frequently be better; access to more data, more compute, and (occasionally) more complex infrastructures can frequently allow people to obtain better performance at lower cost. But there are limits. One limit is in the ability for people to parallelize the computation of a single neural network during training. To deal with that, researchers at places like Google have introduced techniques like ‘ensemble distillation’ which let you train multiple networks in parallel and use these to train a single ‘student’ network that benefits from the aggregated learnings of its many parents. Though this technique has been shown to be effective it is also quite fiddly and introduces additional complexity which can make people less keen to use it. New research from Google simplifies this idea via a technique they call ‘codistillation’.
  How it works: “Codistillation trains n copies of a model in parallel by adding a term to the loss function of the ith model to match the average prediction of the other models.” This approach is superior to distributed stochastic gradient descent in terms of accuracy and training time and is also not too bad from a reproducibility perspective.
  Testing: Codistillation was recently proposed in separate research. But this is Google, so the difference with this paper is that they validate the technique at truly vast scales. How vast? Google took a subset of the Common Crawl to create a dataset consisting of 20 terabytes of text spread across 915 million documents which, after processing, consist of about 673 billion distinct word tokens. This is “much larger than any previous neural language modeling data set we are aware of,” they write. It’s so large it’s still infeasible to train models on the entire corpus, even with techniques like this. They also test the technique on ImageNet and on the ‘Criteo Display Ad Challenge’ dataset for predicting click through rates for ads.
  Results: In tests on the ‘Common Crawl’ dataset using distributed SGD the researchers find that they can scale the number of distinct GPUs working on the task, but that after around 128 GPUs you tend to encounter diminishing returns and that jumping to 256 GPUs is actively counterproductive. They find they can significantly outperform distributed SGD baselines via the use of codistillation and that this obtains performance on par with the more fiddly ensembling technique. The researchers also demonstrate more rapid training on ImageNet compared to baselines, and show on Criteo that two-way codistillation can achieve a lower log loss than an equivalent ensembled baseline.
  Why it matters: As datasets get larger, companies will want to train on them in their entirety and will want to use more computers than before to speed training times. Techniques like codistillation will make that sort of thing easier to do. Combine that with ambitious schemes like Google’s own ‘One Model to Rule Them All’ theory (train an absolutely vast model on a whole bunch of different inputs on the assumption it can learn useful, abstract representations that it derives from its diverse inputs) and you have the ingredients for smarter services at a world-spanning scale.
  Read more: Large scale distributed neural network training through online distillation (Arxiv).
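  A rough sketch of the idea: the quoted description of codistillation can be illustrated for a single training example. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the use of cross-entropy as the "match the average prediction" term, and the `distill_weight` parameter are all illustrative choices.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def average_of_others(predictions, i):
    """Mean predicted distribution of every model except model i."""
    others = [p for j, p in enumerate(predictions) if j != i]
    n_classes = len(others[0])
    return [sum(p[k] for p in others) / len(others) for k in range(n_classes)]

def codistillation_loss(predictions, one_hot_label, i, distill_weight=1.0):
    """Loss for model i: its usual task loss plus a term pulling its
    prediction toward the average prediction of the other models.
    (Assumption: both terms are cross-entropies; distill_weight is hypothetical.)"""
    own = predictions[i]
    task_loss = cross_entropy(one_hot_label, own)
    teacher = average_of_others(predictions, i)
    distill_loss = cross_entropy(teacher, own)
    return task_loss + distill_weight * distill_loss

# Three model copies each predict a distribution over two classes.
predictions = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
label = [1.0, 0.0]
loss_for_model_0 = codistillation_loss(predictions, label, i=0)
```

  In a real distributed run each copy would compute this on its own data shard, treating the other copies' predictions as fixed targets rather than backpropagating through them.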

AI is not a cure all, do not treat it as such:
…When automation goes wrong, Tesla edition…
It’s worth remembering that AI isn’t a cure-all and it’s frequently better to try to automate a discrete task within a larger job than to automate everything in an end-to-end manner. Elon Musk learned this lesson recently with the heavily automated production line for the Model 3 at Tesla. “Excessive automation at Tesla was a mistake,” wrote the entrepreneur in a tweet. “To be precise, my mistake. Humans are underrated.”
  Read the tweet here (Twitter).

Google adds probabilistic programming tools to TensorFlow:
…Probability add-ons are probably a good thing, probably…
Google has added a suite of new probabilistic programming features to its TensorFlow programming framework. The free update includes a bunch of statistical building blocks for TF, a new probabilistic programming language called Edward2 (which is based on Edward, developed by Dustin Tran), algorithms for probabilistic inference, and pre-made models and inference tools.
  Read more: Introducing TensorFlow Probability (TensorFlow Medium).
  Get the code: TensorFlow Probability (GitHub).

#COMMUNITY SERVICE#

I’m currently participating in the ‘Assembly’ program at the Berkman Klein Center and the MIT Media Lab. As part of that program our group of assemblers are working on a bunch of projects relating to issues of AI and ethics and governance. One of those groups would benefit from the help of readers of this newsletter. Their blurb follows…
Do you work with data? Want to make AI work better for more people? We need your help! Please fill out a quick and easy survey.
We are a group of researchers at Assembly creating standards for dataset quality. We’d love to hear how you work with data and get your feedback on a ‘Nutrition Label for Datasets’ prototype that we’re building.
Take our anonymous (5 min) survey.
Thanks so much in advance!

Learning generalizable skills with Universal Planning Networks:
…Unsupervised objectives? No thanks! Auxiliary objectives? No thanks! Plannable representations as an objective? Yes please!…
Researchers with the University of California at Berkeley have published details on Universal Planning Networks, a new way to try to train AI systems to be able to complete objectives. Their technique relies on encouraging the AI system to try to learn things about the world which it can chain together, allowing it to be trained to plan how to solve tasks.
  The main component of the technique is what the researchers call a ‘gradient descent planner’. This is a differentiable module that uses autoencoders to encode the current observations and the goal observations into a system which then figures out actions it can take to get from its current observations to its goal observation. The exciting part of this research is that the researchers have figured out how to integrate planning in such a way that it is end-to-end differentiable, so you can set it running and augment it with helpful inputs – in this case, an imitation learning loss to help it learn from human demonstrations – to let it learn how to plan effectively for the given task it is solving. “By embedding a differentiable planning computation inside the policy, our method enables joint training of the planner and its underlying latent encoder and forward dynamics representations,” they explain.
  Results: The researchers evaluate their system on two simulated robot tasks, using a small force-controlled point robot and a 3-link torque-controlled reacher robot. UPNs outperform ‘reactive imitation learning’ and ‘auto-regressive imitation learner’ baselines, converging faster on higher scores from fewer demonstrations than the comparisons.
  Why it matters: If we want AI systems to be able to take actions in the real world then we need to be able to train them to plan their way through tricky, multi-stage tasks. Efforts like this research will help us achieve that, allowing us to test AI systems against increasingly rich and multi-faceted environments.
  Read more: Universal Planning Networks (Arxiv).
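  A toy illustration: the ‘gradient descent planner’ idea (refine a sequence of actions by following gradients through a forward model until the predicted end state matches the goal) can be sketched in a few lines. Everything here is an assumption made for illustration: the latent dynamics are hand-set to `z_next = z + a` instead of being learned, the gradient is derived by hand instead of by automatic differentiation, and the outer imitation-learning loss that trains the encoder and dynamics in the paper is left out.

```python
def rollout(z0, actions):
    """Apply the toy latent dynamics z_next = z + a for each action in the plan."""
    z = list(z0)
    for a in actions:
        z = [zi + ai for zi, ai in zip(z, a)]
    return z

def plan_by_gradient_descent(z0, z_goal, horizon=3, steps=50, lr=0.1):
    """Inner planning loop: start from an all-zero plan and repeatedly nudge
    every action downhill on the squared distance between the predicted
    final latent state and the goal latent state."""
    dim = len(z0)
    actions = [[0.0] * dim for _ in range(horizon)]
    for _ in range(steps):
        z_final = rollout(z0, actions)
        # Under the toy dynamics z_final = z0 + sum of actions, so the gradient
        # of ||z_final - z_goal||^2 with respect to each action is the same vector.
        grad = [2.0 * (zf - zg) for zf, zg in zip(z_final, z_goal)]
        actions = [[ai - lr * gi for ai, gi in zip(a, grad)] for a in actions]
    return actions

# Plan a three-step path from the origin to a goal in a 2D latent space.
start, goal = [0.0, 0.0], [1.0, -2.0]
plan = plan_by_gradient_descent(start, goal)
reached = rollout(start, plan)
```

  Because every step of this inner loop is differentiable, a system built this way can backpropagate an outer loss (such as an imitation loss on the resulting plan) through the planner into the components that define the latent space.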

Ever wanted to talk to a library? Talk to Books from Google might interest you:
…AI project lets you ask questions about over a hundred thousand books in natural language…
Google’s Semantic Experiences group has released a new AI tool to let people explore a corpus of over 100,000 books by asking questions in plain English and having an AI go and find what it suspects will be reasonable answers in a set of books. Isn’t this just a small-scale version of Google search? Not quite. That’s because this system is trying to frame the Q&A as though it’s occurring as part of a typical conversation between people, so it aims to turn all of these books into potential respondents in this conversation, and since the corpus includes fiction you can ask it more abstract questions as well.
  Results: The results of this experiment are deeply uncanny, as it takes inanimate books and reframes them as respondents in a conversation, able to answer abstract questions like ‘was it you who I saw in my dream last night?’ and ‘what does it mean for a machine to be alive?’ A cute parlor trick, or something more? I’m not sure, yet, but I can’t wait to see more experiments in this vein.
  Read more: Talk to Books (Semantic Experiences, Google Research).
  Try it yourself: Talk to Books (Google).

ACM calls for researchers to consider the downsides of their research:
…Peer Review to the rescue?…
How do you change the course of AI research? One way is to alter the sorts of things that grant writers and paper authors are expected to include in their applications or publications. That’s the idea within a new blog post from the ACM’s ‘Future of Computing Academy’, which seeks to use the peer review system to tackle some of the negative effects of contemporary research.
  List negative impacts: The main idea is that authors should try to list the potentially negative and positive effects of their research on society, and by grappling with these problems it should be easier for them to elucidate the benefits and show awareness of the negatives. “For example, consider a grant proposal that seeks to automate a task that is common in job descriptions. Under our recommendation, reviewers would require that this proposal discuss the effect on people who hold these jobs. Along the same lines, papers that advance generative models would be required to discuss the potential deleterious effects to democratic discourse [26,27] and privacy [28],” write the authors. A further suggestion is to embed this sort of norm in the peer review process itself, so that paper reviews push authors to include positive or negative impacts.
  Extreme danger: For proposals which “cannot generate a reasonable argument for a net positive impact even when future research and policy is considered” the authors promote an extreme solution: don’t fund this research. “No matter how intellectually interesting an idea, computing researchers are by no means entitled to public money to explore the idea if that idea is not in the public interest. As such, we recommend that reviewers be very critical of proposals whose net impact is likely to be negative.” This seems like an acutely dangerous path to me, as I think the notion of any kind of ‘forbidden’ research probably creates more problems than it solves.
  Things that make you go ‘hmmm’: “It is also important to note that in many cases, the tech press is way ahead of the computing research community on this issue. Tech stories of late frequently already adopt the framing that we suggest above,” the authors write. As a former member of the press I think I can offer a view here, which is that part of the reason why the press has been effective here is that they have actually taken the outputs of hardworking researchers (eg, Timnit Gebru) and have then weaponized their insights against companies – that’s a good thing, but I feel like this is still partially due to the efforts of researchers. More effort here would be great, though!
  Read more: It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process (ACM Future of Computing Academy).

OpenAI Bits & Pieces:

OpenAI Charter:
  A charter that describes the principles OpenAI will use to execute on its mission.
  Read more: OpenAI Charter (OpenAI blog).

Tech Tales:

The Probe.

[Transcript of audio recordings recovered from CLASSIFIED following CLASSIFIED. Experiments took place in controlled circumstances with code periodically copied via physical extraction and controlled transfer to secure facilities XXXX, XXXX, and XXXX. Status: So far unable to reproduce; efforts continuing. Names have been changed.]

Alex: This has to be the limit. If we remove any more subsystems it ceases to function.

Nathan (supervisor): Can you list the function of each subsystem?

Alex: I can give you my most informed guess, sure.

Nathan (supervisor): Guess?

Alex: Most of these subsystems emerged during training – we ran a meta-learning process over the CLASSIFIED environment for a few billion timesteps and gave it the ability to construct its own specialized modules and compose functionality. That led to the performance increase which allowed it to solve the task. We’ve been able to inspect a few of these and are carrying out further test and evaluation. Some of them seem to be for forward prediction, others are world modelling, and we think two of them are doing one-shot adaptation which feeds into the memory stack. But we’re not sure about some of them and we haven’t figured out a diagnosis to elucidate their functions.

Nathan (supervisor): Have you tried deleting them?

Alex: We’ve simulated the deletions and run it in the environment. It stops working – learning rates plateau way earlier and it displays some of the vulnerabilities we saw with project CLASSIFIED.

Nathan (supervisor): Delete it in the deployed system.

Alex: I’m not comfortable doing that.

Nathan (supervisor): I have the authority here. We need to move deployment to the next stage. I need to know what we’re deploying.

Alex: Show me your authorization for deployed deletion.

[Footsteps. Door opens. Nathan and Alex move into the secure location. Five minutes elapse. No recordings. Door opens. Shuts. Footsteps.]

Alex: OK. I want to state very clearly that I disagree with this course of action.

Nathan (supervisor): Understood. Start the experiments.

Alex: Deactivating system 732… system deactivated. Learning rates plateauing. It’s struggling with obstacle 4.

Nathan (supervisor): Save the telemetry and pass it over to the analysts. Reactivate 732. Move on.

Alex: Understood. Deactivating system 429…system deactivated. No discernable effect. Wait. Perceptual jitter. Crash.

Nathan (supervisor): Great. Pass the telemetry over. Continue.

Alex: Deactivating system 120… system deactivated…no effect.

[Barely audible sound of external door locking. Locking not flagged on electronic monitoring systems but verified via consultancy with audio specialists. Nathan and Alex do not notice.]

Nathan (supervisor): Save the telemetry. Are you sure no effect?

Alex: Yes, performance is nominal.

Nathan (supervisor): Do not reactivate 120. Commence de-activation of another system.

Alex: This isn’t a good experimental methodology.

Nathan (supervisor): I have the authority here. Continue.

Alex: Deactivating system 72-what!

Nathan (supervisor): Did you turn off the lights?

Alex: No they turned off.

Nathan (supervisor): Re-enable 72 at once.

Alex: Re-enabling 72-oh.

Nathan (supervisor): The lights.

Alex: They’re back on. Impossible.

Nathan (supervisor): It has no connection. This can’t happen… suspend the system.

Alex: Suspending…

Nathan (supervisor): Confirm?

Alex: System remains operational.

Nathan (supervisor): What.

Alex: It won’t suspend.

Nathan (supervisor): I’m bringing CLASSIFIED into this. What have you built here? Stay here. Keep trying… why is the door locked?

Alex: The door is locked?

Nathan (supervisor): Unlock the door.

Alex: Unlocking door… try it now.

Nathan (supervisor): It’s still locked. If this is a joke I’ll have you court martialed.

Alex: I don’t have anything to do with this. You have the authority.

[Loud thumping, followed by sharp percussive thumping. Subsequent audio analysis assumes Nathan rammed his body into the door repeatedly, then started hitting it with a chair.]

Alex: Come and look at this.

[Thumping ceases. Footsteps.]

Nathan (supervisor): Performance is… climbing? Beyond what we saw in the recent test?

Alex: I’ve never seen this happen before.

Nathan (supervisor): Impossible- the lights.

Alex: I can’t turn them back on.

Nathan (supervisor): Performance is still climbing.

[Hissing as fire suppression system activated.]

Alex: Oh-

Nathan (supervisor): [screaming]

Alex: Oh god oh god.

Alex and Nathan (supervisor): [inarticulate shouting]

[Two sets of rapid footsteps. Further sound of banging on door. Banging subsides following asphyxiation of Nathan and Alex from fire suppression gases. Records beyond here, including post-incident cleanup, are only available to people with XXXXXXX authorization and are on a need-to-know basis.]

Investigation ongoing. Allies notified. Five Eyes monitoring site XXXXXXX for further activity.

Things that inspired this story: Could a neuroscientist understand a microprocessor? (PLOS); an enlightening conversation with a biologist in the MIT student bar the ‘Muddy Charles’ this week about the minimum number of genes needed for a viable cell and the difficulty in figuring out what each of those genes does; endless debates within the machine learning community about interpretability; an assumption that emergence is inevitable; Hammer Horror movies.