Import AI

Import AI #99: Using AI to generate phishing URLs, evidence for how AI is influencing the economy, and using curiosity for self-imitation learning.

Auto-generating phishing URLs via AI components:
…AI is an omni-use technology, so the same techniques used to spot phishing URLs can also be used to generate phishing URLs…
Researchers with the Cyber Threat Analytics division of Cyxtera Technologies have written an analysis of how people might “use AI algorithms to bypass AI phishing detection systems” by creating their own system called DeepPhish.
  DeepPhish: DeepPhish works by taking in a list of fraudulent URLs that have worked successfully in the past, encoding these as a one-hot representation, then training a model to generate new synthetic URLs given a seed sentence. They found that DeepPhish could dramatically improve the chances of a fraudulent URL getting past automated phishing-detection systems, with DeepPhish URLs seeing a boost in effectiveness from 0.69% (no DeepPhish) to 20.90% (with DeepPhish).
  Security people always have the best names: DeepPhish isn’t the only AI “weapon” system recently developed by researchers, the authors note; other tools include Honey-Phish, SNAP_R, and Deep DGA.
  Why it matters: This research highlights how AI is an inherently omni-use technology, where the same basic components used to, for instance, train systems to learn to spot potentially fraudulent URLs, can also be used to generate plausible-seeming fraudulent URLs.
  Read more: DeepPhish: Simulating Malicious AI (PDF).
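  In code: The one-hot encoding step mentioned above can be sketched roughly as follows. This is an illustration only, not the DeepPhish authors' code: the character vocabulary, the helper names `one_hot_encode` and `decode`, and the example URL are all assumptions, and the sequence model that would be trained on these vectors is left out entirely.

```python
import string

# Assumed character vocabulary (hypothetical; the real one is not given in this summary).
VOCAB = string.ascii_lowercase + string.digits + ":/.-_?=&%"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def one_hot_encode(url):
    """Return one row per character; each row has a single 1 at that character's index."""
    rows = []
    for ch in url.lower():
        row = [0] * len(VOCAB)
        # Characters outside the assumed vocabulary are left as all-zero rows.
        if ch in CHAR_TO_INDEX:
            row[CHAR_TO_INDEX[ch]] = 1
        rows.append(row)
    return rows


def decode(rows):
    """Invert one_hot_encode for in-vocabulary characters."""
    return "".join(VOCAB[row.index(1)] for row in rows if 1 in row)


# A harmless placeholder URL, used only to show the shape of the encoding.
encoded = one_hot_encode("http://example.com/login")
print(len(encoded), "characters x", len(VOCAB), "vocabulary entries")
```

  Each URL becomes a matrix with one row per character, which is the kind of input a character-level sequence model (used for detection or, as here, generation) typically consumes.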

Curious about the future of reinforcement learning? Apply more curiosity!
…Self-Imitation Learning, aka: That was good, let’s try that again…
Self-Imitation Learning (SIL) works by having the agent exploit its replay buffer by learning to repeat its own prior actions if they have generated reasonable returns previously and, crucially, only when those actions delivered larger returns than were expected. The authors combine SIL with Advantage Actor-Critic (A2C) and test the algorithm out on a variety of hard tasks, including the notoriously tough Atari exploration game Montezuma’s Revenge. They also report scores for games like Gravitar, Freeway, PrivateEye, Hero, and Frostbite: all areas where A2C+SIL beats A3C+ baselines. Overall, A2C+SIL gets a median score across all of Atari of 138.7%, compared to 96.1% for A2C.
  Robots: They also test a combination of PPO+SIL on simulated robotics tasks within OpenAI Gym and significantly boost performance relative to non-SIL baselines.
  Comparisons: At this stage it’s worth noting that many other algorithms and systems have come out since A2C with better performance on Atari, so I’m a little skeptical of the comparative metric here.
  Why it matters: We need to design AI algorithms that can explore their environment more intelligently. This work provides further evidence that developing more sophisticated exploration techniques can further boost performance. Though, as the report notes, such systems can still get stuck in poor local optima. “Our results suggest that there can be a certain learning stage where exploitation is more important than exploration or vice versa,” the authors write. “We believe that developing methods for balancing between exploration and exploitation in terms of collecting and learning from experiences is an important future research direction.”
  Read more: Self-Imitation Learning (Arxiv).
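  In code: The "only when returns beat expectations" rule can be expressed as a clipped advantage, max(R - V, 0). The sketch below is a rough illustration under assumptions rather than the authors' implementation: the function name `sil_losses`, the 0.5 factor on the value term, and the toy numbers are all hypothetical, and a real agent would compute these terms with an autodiff library over samples from its replay buffer.

```python
import math


def sil_losses(log_probs, returns, values):
    """Average self-imitation policy and value losses over replayed transitions.

    log_probs: log pi(a|s) for the stored actions
    returns:   discounted returns actually observed from those states
    values:    the critic's current value estimates for those states
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_p, ret, val in zip(log_probs, returns, values):
        # Clipped advantage: zero unless the past return beat the current estimate.
        gain = max(ret - val, 0.0)
        # In training, gain is typically treated as a constant in this term.
        policy_loss += -log_p * gain
        value_loss += 0.5 * gain ** 2
    n = len(returns)
    return policy_loss / n, value_loss / n


# Toy batch (made-up numbers): only the first transition outperformed its estimate,
# so the second contributes nothing to either loss.
p_loss, v_loss = sil_losses(
    log_probs=[math.log(0.5), math.log(0.25)],
    returns=[2.0, 1.0],
    values=[1.0, 3.0],
)
```

  Because the clip zeroes out transitions that did worse than expected, the agent imitates only its own pleasant surprises, which is the intuition behind "that was good, let's try that again".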

Yes, AI is beginning to influence the economy:
…New study by experienced economists suggests the symptoms of major economic changes as a consequence of AI are already here…
Jason Furman, former chairman of the Council of Economic Advisers and current professor at the Harvard Kennedy School, and Robert Seamans of the NYU Stern School of Business, have published a lengthy report on AI and the Economy. The report compiles information from a wide variety of sources, so it’s worth reading in full.
  Here are some of the facts the report cites as symptoms that AI is influencing the economy:
– 26X: Increase in AI-related mergers and acquisitions from 2015 to 2017. (Source: The Economist).
– 26%: Real reduction in ImageNet top-5 image recognition error rate from 2010 to 2017. (Source: the AI Index.)
– 9X: Increase in number of academic papers focused on AI from 1996 to now, compared to a 6X increase in computer science papers. (Source: the AI Index.)
– 40%: Real increase in venture capital investment in AI startups from 2013 to 2016 (Source: MGI Report).
– 83%: Probability a job paying around $20 per hour will be subject to automation (Source: CEA).
– 4%: Probability a job paying over $40 per hour will be subject to automation (Source: CEA).
  “Artificial intelligence has the potential to dramatically change the economy,” they write in the report conclusion. “Early research findings suggest that AI and robotics do indeed boost productivity growth, and that effects on labor are mixed. However, more empirical research is needed in order to confirm existing findings on the productivity benefits, better understand conditions under which AI and robotics substitute or complement for labor, and understand regional level outcomes.”
   Read more: AI and the Economy (SSRN).

US Republican politician writes op-ed on need for Washington to adopt AI:
Op-ed from US politician Will Hurd calls for greater use of AI by federal government …
The US government should implement AI technologies to save money and cut the time it takes for it to provide services to citizens, says Will Hurd, chairman of the US Information Technology Subcommittee of the House Committee on Oversight and Government Reform.
  “While introducing AI into the government will save money through optimizing processes, it should also be deployed to eliminate waste, fraud, and abuse,” Hurd said. “Additionally, the government should invest in AI to improve the security of its citizens… it is in the interest of both our national and economic security that the United States not be left behind.”
  Read more: Washington Needs to Adopt AI Soon or We’ll Lose Millions (Fortune).
  Watch the hearing in which I testified on behalf of OpenAI and the AI Index (Official House website).

European Commission adds AI advisers to help it craft EU-wide AI strategy:
…52 experts will steer European AI alliance, advise the commission, draft ethics guidelines, and so on…
As part of Europe’s attempt to chart its path forward in an AI world, the European Commission has announced the members of a 52-strong “AI High Level Group” who will advise the Commission and other initiatives on AI strategy. Members include professors at a variety of European universities; representatives of industry, like Jean-Francois Gagne, the CEO of Element AI, SAP’s SVP of Machine Learning, and Francesca Rossi, who leads AI ethics initiatives at IBM and also sits on the board of the Partnership on AI; as well as members of the existential risk/AGI community like Jaan Tallinn, who was the founding engineer of Skype and Kazaa.
  Read more: High-Level Group on Artificial Intelligence (European Commission).

European researchers call for EU-wide AI coordination:
…CLAIRE letter asks academics to sign to support excellence in European AI…
Several hundred researchers have signed a letter in support of the Confederation of Laboratories for Artificial Intelligence Research in Europe (CLAIRE), an initiative to create a pan-EU network of AI laboratories that can work together and feed results into a central facility which will serve as a hub for scientific research and strategy.
  Signatories: Some of the people that have signed the letter so far include professors from across Europe, numerous members of the European Association for Artificial Intelligence (EurAI) and five former presidents of IJCAI (International Joint Conference on Artificial Intelligence).
  Not the only letter: This letter follows the launch of another one in May which called for the establishment of a European AI superlab and associated support infrastructure, named ‘Ellis’. (Import AI: #92).
  Why it matters: We’re seeing an increase in the number of grassroots attempts by researchers and AI practitioners to get governments or sets of governments to pay attention to and invest in AI. It’s mostly notable to me because it feels like the AI community is attempting to become a more intentional political actor, and joint letters like this represent a form of practice for future, more substantive engagements.
  Read more: CLAIRE (

When Good Measures go Bad: BLEU:
…When is an assessment metric not a useful assessment metric? When it’s used for different purposes…
A researcher with the University of Aberdeen has evaluated how good a metric BLEU (bilingual evaluation understudy) is for assessing the performance of natural language processing systems; they analyzed 284 distinct correlations between BLEU and gold-standard human evaluations across 34 papers and concluded that BLEU is useful for the evaluation of machine translation systems, but found its utility breaks down when used for other purposes, like the assessment of individual texts, scientific hypothesis testing, or the evaluation of things like natural language generation.
  Why it matters: AI research runs partially on metrics and metrics are usually defined by assessment techniques. It’s worth taking a step back and looking at widely-used things like BLEU to work out how meaningful they can be as assessment methodologies and to remember to use them within their appropriate domains.
  Read more: A Structured Review of the Validity of BLEU (Computational Linguistics).
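  In code: For readers who haven't looked inside BLEU, here is a simplified sentence-level version: clipped n-gram precision, a geometric mean, and a brevity penalty. It is a sketch under assumptions, not the reference implementation or anything from the review: `simple_bleu`, the single reference, the bigram cap and the lack of smoothing are all choices made for illustration. The toy paraphrase shows one intuition for why scores on individual texts can mislead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU against one reference, with no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
exact = simple_bleu("the cat sat on the mat", reference)
paraphrase = simple_bleu("a feline was sitting on a rug", reference)
```

  The exact copy gets a perfect score, while the paraphrase shares no bigrams with the reference and so scores zero despite meaning roughly the same thing. Averaged over a large translation test set such effects may wash out; on a single text they dominate.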

Neural networks can be more brain-like than you assume:
…PredNet experiments show correspondence between activations in PredNet and activations in Macaque brains…
How brain-like are neural networks? Not very. That’s because, at a basic component level, they’re based on a somewhat simplified ~1950s conception of how neurons work, so their biological fidelity is fairly low. But can neural networks, once trained to perform particular tasks, end up reflecting some of the functions and capabilities found in biological neural networks? The answer seems to be yes, based on several years of experiments in things as varied as analyzing pre-trained vision networks, verifying the emergence of ‘place cells’, and experiments.
  Harvard and MIT researchers have analyzed PredNet, a neural network trained to perform next-frame prediction on video sequences, to understand how brain-like its behavior is. They find that when they expose the network to input, its neurons fire with a response pattern (consisting of two distinct peaks) that is analogous to the firing patterns found in individual neurons within macaque monkeys. Similarly, when analyzing a network trained on the self-driving KITTI dataset in terms of its spatial receptivity, they find that the artificial network displays similar dynamics to real ones (though with some variance and error). The same high level of overlap between the behavior of artificial and real neurons is roughly true of systems trained on sequence learning tasks.
  Less overlap: The areas where artificial and real neurons display less overlap seem to roughly correlate to intuitively harder tasks, like being able to deal with optical illusions, or how the systems respond to different classes of object.
  Why it matters: We’re heading into a world where people are going to increasingly use trained analogues of real biological systems to better analyze and understand the behavior of both. PredNet provides an encouraging example that this line of experimentation can work. “We argue that the network is sufficient to produce these phenomena, and we note that explicit representation of prediction errors in units within the feedforward path of the PredNet provides a straightforward explanation for the transient nature of responses in visual cortex in response to static images,” the researchers write. “That a single, simple objective—prediction—can produce such a wide variety of observed neural phenomena underscores the idea that prediction may be a central organizing principle in the brain, and points toward fruitful directions for future study in both neuroscience and machine learning.”
  Read more: A neural network trained to predict future video frames mimics the critical properties of biological neuronal responses and perception (Arxiv).
  Read more: PredNet (CoxLab).

Unsupervised Meta-Learning: Learning how to learn without having to be told how to learn:
…The future will be unsupervised…
Researchers with the University of California at Berkeley have made meta-learning more tractable by reducing the amount of work a researcher needs to do to set up a meta-learning system. Their new ‘unsupervised meta-learning’ (UML) approach lets their meta-learning agent automatically acquire distributions of tasks which it can subsequently perform meta-learning over. This deals with one drawback of meta-learning, which is that it is typically down to the human designer to come up with a set of tasks for the algorithm to be trained on. They also show how to combine UML with other recently developed techniques like DIAYN (Diversity is all you need) for breaking environments down into collections of distinct tasks/states to train over.
  Results: UML systems beat basic RL baselines on simulated 2D navigation and locomotion tasks. They also tend to obtain performance roughly equivalent to systems built with human-designed, tuned reward functions, suggesting that UML can successfully explore the problem space enough to devise good reward signals for itself.
  Why it matters: Because the diversity of tasks we’d like AI to do is much larger than the number of tasks we can neatly specify via hand-written rules it’s crucial we develop methods that can rapidly acquire information from new environments and use this information to attack new problems. Meta-learning is one particularly promising approach to dealing with this problem, and by removing another one of its more expensive dependencies (a human-curated task distribution) UML may help push things forward. “An interesting direction to study in future work is the extension of unsupervised meta-learning to domains such as supervised classification, which might hold the promise of developing new unsupervised learning procedures powered by meta-learning,” the researchers write.
  Read more: Unsupervised Meta-Learning for Reinforcement Learning (Arxiv).

OpenAI Bits&Pieces:

Better language systems via unsupervised learning:
New OpenAI research shows how to pair unsupervised learning with supervised finetuning to create large, generalizable language models. This sort of result is interesting because it shows how deep learning components can end up displaying sophisticated capabilities, like being able to obtain high scores on Winograd schema tests, having only learned naively from large amounts of data rather than via specific hand-tuned rules.
  Read more: Improving Language Understanding with Unsupervised Learning (OpenAI Blog).

Tech Tales:

Special Edition: Guest short story by James Vincent, a nice chap who writes about AI. All credit to James, all blame to me, etc…

Shunts and Bumps.

Reliable work, thought Andre, that was the thing. Ignore the long hours, freezing warehouses, and endless retakes. Ignore the feeling of being more mannequin than man when the director storms onto set, snatches the coffee cup out of your hand and replaces it with a bunch of flowers without even looking at you. Ignore it all. This was a job that paid, week after week, and all because computers had no imagination.

God bless their barren brains.

Earlier in the year, Rocky had explained it to him like this. “They’re dumb as shit, ok? Show them a potato 50 times and they’ll say it’s an orange. Show them it 5,000 times and they’ll say it’s a potato but pass out in shock if you turn it into fries. They just can’t extrapolate like humans can — they can’t think.” (Rocky, at this point, had been slopping her beer around the bar as if trying to short-circuit a crowd of invisible silicon dunces.) “They only know what you show them, and only then when you show them it enough times. Like a mirror … that gets a burned-in image of your face after you’ve looked at it every day for a year.”

For the self-driving business, realizing this inability to extrapolate had been a slow and painful process. “A bit of a car crash,” Rocky said. The first decade had been promising, with deep learning and cheap sensors putting basic autonomy in every other car on the road. Okay, so you weren’t technically allowed to take your hands off the wheel, and things only worked perfectly in perfect conditions: clearly painted road markings, calm highways, and good weather. But the message from the car companies was clear: we’re going to keep getting better, this fast, forever.

Except that didn’t happen. Instead, there was freak accident after freak accident. Self-driving cars kept crashing, killing passengers and bystanders. Sometimes it was a sensor glitch; the white side of a semi getting read as clear highway ahead. But more often it was just the mild chaos of life: a party balloon drifting into the road or a mattress falling off a truck. Moments where the world’s familiar objects are recombined into something new and surprising. Potatoes into fries.

The car companies assured us that the data they used to train their AI covered 99 percent of all possible miles you could travel, but as Rocky put it: “Who gives a fuck about 99 percent reliability when it’s life or death? An eight-year-old can drive 99 percent of the miles you can if you put her in a booster seat, but it’s those one percenters that matter.”

Enter: Andre and his ilk. The car companies had needed data to teach their AIs about all the weird and unexpected scenarios they might encounter on the road, and California was full of empty film lots and jobbing actors who could supply it. (The rise of the fakies hadn’t been kind to the film industry.) Every incident that an AI couldn’t extrapolate from simulations was mocked up in a warehouse, recorded from a dozen angles, and sold to car companies as 4D datasets. They in turn repackaged it for car owners as safety add-ons sold at $300 a pop. They called it DDLC: downloadable driving content. You bought packs depending on your level of risk aversion and disposable income. Dog, Cats, And Other Furry Fiends was a bestseller. As was Outside The School Gates.

It was a nice little earner, Rocky said, and typical of the tech industry’s ability to “turn liability into profit.” She herself did prototyping at one of the higher-end self-driving outfits. “They’re obsessed with air filtration,” she’d told Andre, “Obsessed. They say it’s for biological attacks but I think it’s to handle all their meal-replacement-smoothie farts.” She’d also helped him find the new job. As was usually the case when the tech industry used cheap labor to paper over the cracks in its products, this stuff was hardly advertised. But, a few texts and a Skype audition later, and here he was.

“Ok, Andre, this time it’s the oranges going into the road. Technical says they can adjust the number in post but would prefer if we went through a few different velocities to get the physics right. So let’s do a nice gentle spill for the first take and work our way up from there, okay?”

Andre nodded and grabbed a crate. This week they were doing Market Mayhem: Fruits, Flowers, And Fine Food and he’d been chucking produce about all day. Before that he’d been pushing a cute wheeled cart around on the warehouse’s football field-sized loop of fake street. He was taking a break after the crate work, staring at a daisy pushing its way through the concrete (part of the set or unplanned realism?) when the producer approached him.

“Hey man, great work today — oops, got a little juice on ya there still — but great work, yeah. Listen, dumb question, but how would you like to earn some real money? I mean, who doesn’t, right? I see you, I know you’ve got ambitions. I got ‘em too. And I know you’ve gotta take time off for auditions, so what I’m talking about here is a little extra work for triple the money.”

Andre had been suspicious. “Triple the money? How? For what?”

“Well, the data we’ve been getting is good, you understand, but it’s not covering everything the car folks want. We’re filling in a lot of edge cases but they say there’s still some stuff there’s no data for. Shunts and bumps, you might say. You know, live ones… with people.”

And that was how Andre found himself, standing in the middle of a fake street in a freezing warehouse, dressed in one of those padded suits used to train attack dogs, staring down a mid-price sedan with no plates. Rocky had been against it, but the money had been too tempting to pass up. With that sort of cash he’d be able to take a few days off, hell, maybe even a week. Do some proper auditions. Actually learn the lines for once. And, the producer said, it was barely a crash. You probably wouldn’t even get bruised.

Andre gulped, sweating despite the cold air. He looked at the car a few hundred feet away. The bonnet was wrapped in some sort of striped, pressure sensitive tape, and the sides were knobbly with sensors. Was the driver wearing a helmet? That didn’t seem right. Andre looked over to the producer, but he was facing away from him, speaking quickly into a walkie-talkie. The producer pointed at something. A spotlight turned on overhead. Andre was illuminated. He tried to shout something but his tongue was too big in his mouth. Then he heard the textured whine of an electric motor, like a kazoo blowing through a mains outlet, and turned to see the sedan sprinting quietly towards him.

Regular work, he thought, that was the thing.

Things that inspired this story: critiques of deep learning; failures of self driving systems; and imitation learning.

Once again, the story above is from James Vincent. Find him on Twitter and let him know what you thought!

Import AI #98: Training self-driving cars with rented firetrucks; spotting (staged) violence with AI-infused drones; what graphs might have to do with the future of AI.

Cruise asks to borrow a firetruck to help train its self-driving cars:
…Emergency training data – literally…
Cruise, a self-driving car company based in San Francisco, wants to expose its vehicles to more data involving the emergency services, so it asked the city if it could rent a firetruck, fire engine, and ambulance, and have the vehicles drive around a block in the city with their lights flashing, according to emails surfaced via Freedom of Information Act requests from Jalopnik.
  Read more: GM Cruise Prepping Launch of Driverless Pilot Car Pilot in San Francisco: Emails (Jalopnik).

Experienced researcher: What to do if winter is coming:
…Tips for surviving the post-bubble era in AI…
John Langford, a well-regarded researcher with Microsoft, has some advice for people in the AI community as they carry out the proverbial yak-shaving act of questioning whether AI is in a bubble or not. Though the field shouldn’t optimize for failure, it might be helpful if it planned for it, he says.
 “As a field, we should consider the coordinated failure case a little bit. What fraction of the field is currently at companies or in units at companies which are very expensive without yet justifying that expense? It’s no longer a small fraction so there is a chance for something traumatic for both the people and field when/where there is a sudden cut-off,” he writes.
  Read more: When the bubble bursts… (John Langford’s personal blog).

Drone AI paper provides a template for future surveillance:
…Lack of discussion of impact of research raises eyebrows…
Researchers with the University of Cambridge, the National Institute of Technology, and the Indian Institute of Science, have published details on a “real-time drone surveillance system” that uses deep learning. The system is designed to spot violent activities like strangling, punching, kicking, shooting, stabbing, and so on, by performing image recognition over imagery gathered from a crowd in real-time.
  It’s the data, silly: To carry out this project the researchers create their own (highly staged) collection of around 2,000 images called the ‘Aerial Violent Individual’ dataset, which they record via a consumer-based Parrot AR Drone. Most of the flaws in the system relate to this data, which sees a bunch of people carry out over-acted expressions of aggression towards each other – this data doesn’t seem to have much of a relationship to real-world violence and it’s not obvious how well this would perform in the wild.
  Results: The resulting system “works”, in the sense that the researchers are able to obtain high accuracies (90%+) on classifying certain violent behaviors within the dataset, but it’s not clear whether this translates to anything of practical use in the real world. The researchers will subsequently test out their work at a music festival in India later this month, they said.
  Responsibility: Like the “Deep Video Networks” research which I wrote about last week, much of this research is distinguished by the immense implications it appears to have for society, and it’s a little sad to see no discussion of this in the paper – yes, surveillance systems like this can likely be used to humanitarian ends, but they can also be used by malicious actors to surveil or repress people. I think it’s important AI researchers start to acknowledge the omni-use nature of their work and confront questions like this within the research itself, rather than afterwards following public criticism.
  Read more: Eyes in the Sky: Real-time Drone Surveillance System (DSS) for VIolent Individuals Identification using ScatterNet Hybrid Deep Learning Framework (Arxiv).
  Watch video (YouTube).

“Depth First Learning” launches to aid understanding of AI papers:
…Learning through a combination of gathering context and testing understanding…
Industry and academic researchers have launched ‘Depth First Learning’, an initiative to make it easier for people to educate themselves about important research papers. Each writeup goes through the key ideas of a paper, recommends literature to read, and includes questions intended to test whether the reader has learned enough about the context to answer them. The idea behind this work is that it makes it easier to understand research papers by breaking them down into their fundamental concepts. “We spent some time understanding each paper and writing down the core concepts on which they were built,” the researchers write.
  Read an example: “Depth First Learning” article on InfoGAN (Depth First Learning website).
  Read more: Depth First Learning (DFL website, About page).

Graphs, graphs everywhere: The future according to DeepMind:
…Why a little structure can be a very good thing…
New research from DeepMind shows how to fuse structured approaches to AI design with end-to-end learned systems to create systems that can not only learn about the world, but recombine learnings in new ways to solve new problems. This sort of “combinatorial generalization” is key to intelligence, the authors write, and they claim their approach deals with some of the recent criticisms of deep learning made by people like Judea Pearl, Josh Tenenbaum, and Gary Marcus, among others.
  Structure, structure everywhere: The authors argue that many of today’s deep learning systems already encode this sort of bias towards structure in the form of specific arrangements of learned components, for example, how convolutional neural networks are composed out of convolutional layers and then chained together in increasingly elaborate ways for image recognition. These designs encode within them an implicit relational inductive bias, the authors write, because they take in a bunch of data and operate over its relationships in increasingly elaborate ways. Additionally, most problems can be decomposed into graph representations: for instance, modeling the interactions of a bunch of pool balls can be done by expressing the pool balls and the table as nodes in a graph with the links between them signaling directions in which force may be transmitted, or a molecule can similarly be decomposed as atoms (nodes) and bonds (edges).
  Graph network: DeepMind has developed the ‘Graph network’ (GN) block, a generic component “which takes a graph as input, performs computations over the structure, and returns a graph as output.” This is desirable because a graph structure is fairly flexible, letting you express an arbitrary number of relationships between an arbitrary number of entities, and the same function can be deployed on differently sized graphs, and these graphs represent entities and relations as sets making them invariant to permutations.
  No silver bullet: Graph networks don’t make it easy to support approaches like “recursion, control flow, and conditional iteration”, they say, and so should not be considered a panacea. Another open issue is the larger question of where to derive the graphs that graph networks operate over, which the authors leave to other researchers.
  Read more: Relational inductive biases, deep learning, and graph networks (Arxiv).
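  In code: The "graph in, graph out" idea can be sketched with a toy block that updates edges, then nodes, then a global feature. This is a rough illustration under assumptions, not DeepMind's implementation: `gn_block` is a hypothetical name, features are single numbers, and the fixed coefficients stand in for what would be learned functions.

```python
def gn_block(nodes, edges, global_attr):
    """One graph-network-style pass: graph in, graph of the same shape out.

    nodes:       list of scalar node features
    edges:       list of (sender_index, receiver_index, scalar_feature)
    global_attr: scalar graph-level feature
    """
    # 1. Update each edge from its own feature, its endpoints, and the global.
    new_edges = [
        (s, r, 0.5 * e + 0.25 * nodes[s] + 0.25 * nodes[r] + 0.1 * global_attr)
        for s, r, e in edges
    ]

    # 2. Update each node from the sum of its incoming updated edges.
    incoming = [0.0] * len(nodes)
    for _, r, e in new_edges:
        incoming[r] += e
    new_nodes = [
        0.5 * v + 0.5 * agg + 0.1 * global_attr
        for v, agg in zip(nodes, incoming)
    ]

    # 3. Update the global from sums over all edges and nodes.
    new_global = (
        0.5 * global_attr
        + 0.1 * sum(e for _, _, e in new_edges)
        + 0.1 * sum(new_nodes)
    )
    return new_nodes, new_edges, new_global


nodes = [1.0, 2.0, 3.0]
edges = [(0, 1, 0.5), (1, 2, 1.5), (2, 0, -1.0)]
out_nodes, out_edges, out_global = gn_block(nodes, edges, 0.0)

# Relabel the nodes (0->2, 1->0, 2->1) and shuffle the edge order.
relabel = {0: 2, 1: 0, 2: 1}
perm_nodes = [2.0, 3.0, 1.0]
perm_edges = [(relabel[s], relabel[r], e) for s, r, e in reversed(edges)]
perm_out_nodes, _, perm_global = gn_block(perm_nodes, perm_edges, 0.0)
```

  Because every aggregation is a sum over a set, relabeling the nodes or shuffling the edges leaves the global output unchanged and merely reorders the node outputs. The same function also runs unmodified on graphs of any size.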

Google announces AI principles to guide its business:
…Company releases seven principles, along with description of ‘AI applications we will not pursue’…
Google has published its AI principles, following an internal employee outcry in response to the company’s participation in a drone surveillance project for the US military. These principles are intended to guide Google’s work in the future, according to a blog post written by Google CEO Sundar Pichai. “These are not theoretical concepts; they are concrete standards that will actively govern our research and product development and will impact our business decisions”.
  Principles: The seven principles are as follows:
– “Be socially beneficial”.
– “Avoid creating or reinforcing unfair bias”.
– “Be built and tested for safety”.
– “Be accountable to people”.
– “Incorporate privacy design principles”.
– “Uphold high standards of scientific excellence”.
– “Be made available for uses that accord with these principles”.
   What Google won’t do: Google has also published a (short) list of “AI applications we will not pursue”. These are pretty notable because it’s rare for a public company to place such restrictions on itself so abruptly. The things Google won’t pursue are as follows:
– “Technologies that cause or are likely to cause overall harm”.
– “Weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”.
– “Technologies that gather or use information for surveillance violating internationally accepted norms”.
– “Technologies whose purpose contravenes widely accepted principles of international law and human rights”.
   Read more: AI at Google: our principles (Google Blog).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

India releases national AI strategy:
India is the latest country to launch an AI strategy, releasing a discussion paper last week.
   Focus on five sectors: The report identifies five sectors in which AI will have significant societal benefits, but which may require government support in addition to private sector innovation. These are: healthcare; agriculture; education; smart cities and infrastructure; mobility and transportation.
   Five barriers to be addressed:
– Lack of research expertise
– Absence of enabling data ecosystems
– High resource cost and low awareness for adoption
– Lack of regulations around privacy and security
– Absence of collaborative approach to adoption and applications.
   What they’re doing: The report proposes supporting two tiers of organizations to drive the strategy.
– Centres of Research Excellence – academic/research hubs
– International Centres of Transformational AI – bodies with a mandate of developing and deploying research, in partnership with private sector.
   Read more: National Strategy for Artificial Intelligence.

Tech Tales:

The Dream Wall

Everyone’s DreamWall is different and everyone’s DreamWall is intimate. Nonetheless, we share (heavily curated) pictures of them with each other. Mine is covered in mountains and on each mountain peak there are little desks with lamps. My friend’s Wall shows an underwater scene and includes spooky trenches and fish that swim around them and the occasional hint of an octopus. One famous person accidentally showed a picture of their DreamWall via a poorly posed selfie and it caused them problems because the DreamWall showed a pastoral country scene with nooses hanging from the occasional tree and in one corner a haybale-sized pile of submachineguns. Even though most people know how DreamWalls work they can’t help but judge other people for the contents of theirs.

It works like this:

When you wake up you say some of the things you were dreaming about.

Your home AI system records your comments and sends them to your personal ‘DreamMaker’ software.

The ‘DreamMaker’ software maps your verbal comments to entities in its knowledge graph, then sends those comment-entity pairs to the DreamArtist software.

DreamArtist tries to render the comment-entity data into individual objects which fit with the aesthetic theme inherent to your current DreamWall.

The new objects are sent to your home AI system which displays them on your DreamWall and gives you the option to add further prompts, such as “move the cow to the left” or “make sure that the passengers in the levitating car look like they are having fun”.

This cycle repeats every morning, though if you don’t say anything when you wake up it will maintain the DreamWall and only modulate its appearance and dynamics according to data about how active you had been in the night.

If you wake up with someone else most systems have failsafes that mean your DreamWall won’t display. Some companies are piloting ‘Couple’s DreamWalls’ but are having trouble with it – apart from some old couples that have been together a very long time, most people, even if they’re in a very harmonious relationship, have distinct aspects to their personality that the other person might not want to wake up to every single day – especially since DreamWalls tend to contain visual depictions of things otherwise repressed during daily life.

Import AI #97: Faking Obama and Putin with Deep Video Portraits, Berkeley releases a 100,000+ video self-driving car dataset, and what happens when you add the sensation of touch to robots.

Try a little tenderness: researchers add touch sensors to robots.
…It’s easier to manipulate objects if you can feel them…
Researchers with the University of California at Berkeley have added GelSight touch sensors to a standard 7-DoF Rethink Robotics ‘Sawyer’ robot with an attached Weiss WSG-50 parallel gripper to explore how touch inputs can improve performance at grasping objects – a crucial skill for robots to have if used in commercial settings.
  Technique: The researchers construct four sub-networks that operate over specific data inputs (camera image, two GelSight images to model texture senses before and after contact, and an action network that processes 3D motion, in-plane rotation, and change in force). They link these networks together within a larger network and train the resulting model over a dataset of objects. The researchers pre-train the image components of the network with a model previously trained to classify objects on ImageNet. The approach yields a model that adapts to novel surfaces, learns interpretable policies, and can be taught to apply specific constraints when handling an object, like grasping it gently.
  Results: The researchers test their model and find that systems trained with vision and action inputs get 73.03% accuracy, compared to 79.34% for systems trained on tactile and action inputs, and 80.28% for systems trained on tactile, vision, and action inputs.
   Harder than you think: This task, like most that require applying deep learning components to real-world systems, contains a few quirks which might seem non-obvious from the outset, for example: “The robot only receives tactile input intermittently, when its fingers are in contact with the object and, since each re-grasp attempt can disturb the object position and pose, the scene changes with each interaction”.
  Read more: More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch (Arxiv).

Want 100,000 self-driving car videos? Berkeley has you covered!
…“The largest and most diverse open driving video dataset so far for computer vision research”, according to the researchers…
Researchers with the University of California at Berkeley and Nexar have published BDD100K, a self-driving car dataset which contains ~120,000,000 images spread across ~100,000 videos. “Our database covers different weather conditions, including sunny, overcast, and rainy, as well as different times of day including daytime and nighttime,” they say. The dataset is substantially larger than ones released by the University of Toronto (KITTI), Baidu (ApolloScape), Mapillary, and others, they say.
DeepDrive: The dataset release is significant for where it comes from: DeepDrive, a Berkeley-led self-driving car research effort with a vast range of capable partners, including automotive companies such as Honda, Toyota, and Ford. DeepDrive was set up partially so its many sponsors could pool research efforts on self-driving cars, seeking to close an implicit gap with other players.
  Rich data: The videos are annotated with hundreds of thousands of labels for objects like cars, trucks, persons, bicycles, and so on, as well as richer annotations for road lines, drivable areas, and more; they also provide a subset of ~10,000 images with full-frame instance segmentation.
  Why it matters – the rise of the multi-modal dataset: The breadth of the dataset with its millions of labels and carefully refined aspects will likely empower researchers in other areas of AI, as well as its obvious self-driving car audience. I expect that in the future these multi-modal datasets will become increasingly attractive targets to use to evaluate transfer learning from other systems, for instance by training a self-driving car model in a rich simulated world then applying it to real-world data, such as BDD100K.
  Challenges: The researchers are hosting three challenges at computer vision conference CVPR relating to the dataset, and are asking groups to compete to develop systems for road object detection, drivable area prediction, and domain adaptation.
  Read more: BDD100K: A Large-scale Diverse Driving Video Database (Berkeley AI Research blog).

KPCB’s Mary Meeker breaks down AI’s rise and China’s possible advantage in annual presentation:
…Annual slide-a-thon shows rise of China, points to image and speech recognition scores as evidence for impact of AI…
Mary Meeker’s annual presentation of research serves as a useful refresher for what is front-of-mind for venture capitalists focused on understanding the dynamics that affect the technology ecosystem. This year, at Code Conference in California, Meeker’s slides were distinguished by large sections spent on China, combined with a few notable slides situating AI progress metrics (specifically in object recognition and speech recognition) in relation to the growth of new markets for business.
  Read more: Mary Meeker’s 2018 internet trends report: All the slides, plus analysis (Recode).


An incomplete timeline of dubious things that people have synthesized via AI
– Early 2017:
Montreal Startup Lyrebird launches with audio recording featuring synthesized voices of Donald Trump, Barack Obama, Hillary Clinton.
– Late 2017:
“DeepFakes” arrive on the internet via Reddit with a user posting pornographic movies with celebrity faces animated onto them. A consumer-oriented free editing application follows and DeepFakes rapidly proliferate across the internet, then consumer sites start to clamp down on them.
– 2018:
Belgian socialist party makes a video containing a synthesized Donald Trump giving a (fake) speech about climate change. Party says video designed to create debate and not trick viewers.
– Listen: Politicians discussing about Lyrebird (Lyrebird Soundcloud).
– Read more: DeepFakes Wikipedia entry.
– Read more: Belgian Socialist Party Circulates “Deep Fake” Donald Trump Video (Politico Europe).

Why all footage of all politicians is about to become suspect:
…Think fake news is bad now? ‘Deep Video Portraits’ will make it much, much worse…
A couple of years ago European researchers caused a stir with ‘face2face’, technology which they demonstrated by mapping their own facial expressions onto synthetically rendered footage of famous VIPs, like George Bush, Barack Obama, and so on. Now, new research from a group of American and European researchers has pushed this fake-anyone technology further, increasing the fidelity of the rendered footage, reducing the amount of data needed to construct such convincing fakes, and also dealing with visual bugs that would make it easier to identify the output as being synthesized.
  In their words: “We address the problem of synthesizing a photo-realistic video portrait of a target actor that mimics the actions of a source actor, where source and target can be different subjects,” they write. “Our approach enables a source actor to take full control of the rigid head pose, face expressions and eye motion of the target actor”. (Emphasis mine.)
  Technique: The technique involves a few stages: first, the researchers track the people within the source and target videos via a monocular face reconstruction approach, which allows them to extract information about the identity, head pose, expression, eye gaze, and scene lighting for each video frame. They also separately track the gaze of each subject. They then essentially transfer the synthetic renderings of the input actor onto the target actor and perform a couple of clever tricks to make the resulting output high fidelity and less prone to synthetic tells like visual smearing/blurring of the background behind the manipulated actor.
  Why it matters: Techniques like this will have a bunch of benefits for people working in media and CGI, but they’ll also be used by nation states, fringe groups, and extremists, to attack and pollute information spaces and reduce overall trust in the digital infrastructure of societal discourse and information transmission. I worry that we’re woefully unprepared for the ramifications of the rapid proliferation of these techniques and applications. (And controlling the spread of such a technology is a) extremely difficult and b) of dubious practicality and c) potentially harmful to broader beneficial scientific progress.)
  An astonishing absence of consideration: I find it remarkable that the researchers don’t take time in the paper to discuss the ramifications of this sort of technology, given that they’re demonstrating it by doing things like transferring President Obama’s expressions onto Putin’s, or Obama’s onto Reagan’s. They make no mention of the political dimension to this work in their ‘Applications’ section, which focuses on the technical details of the approach and how it can be used for applications like ‘visual dubbing’ (getting an actor’s mouth movements to map to an audio track).
  Read more: Deep Video Portraits (Arxiv).
  Watch video for details: Deep Video Portraits – SIGGRAPH 2018 (YouTube).

DARPA to host synthetic video/image competition:
…US defense-oriented research organization to try and push state-of-the-art in creation and detection of fakes…
Nation states have become aware of the tremendous potential for subterfuge that this technology poses and are reacting by pouring research money into both exploiting it and defending against it. This summer, DARPA will hold a competition to see who can create the most convincing synthetic images, and also to see who can detect them.
  Read more: The US military is funding an effort to catch deepfakes and other AI trickery (MIT Technology Review).

Import AI Job Alert: I’m hiring an editor/sub-editor:
  I’m hiring someone to initially sub-edit and eventually help edit the OpenAI blog. The role will be a regularly compensated gig which should initially take about 1.5-2 hours every week, typically at around 9pm Pacific Time on Sunday Nights. If you’d be interested in this then please send me an email telling me why you’d be a good fit. The ideal candidate probably has familiarity with AI research papers, attention to detail, and experience fiddling with words in a deadline-oriented setting. I’ve asked around among some journalists and the fair wage seems to be about $25 per hour.
  Additional requirements: You’d need to be available via real-time communication such as WhatsApp or Slack during the pre-agreed upon editing window. Sometimes I may need to shift the time a bit if I’m traveling, but I’ll typically have advance warning.
  Send emails with subject line “Import AI editing job: [Your Name]” to

Predicting cyber attacks on utilities with variational auto-encoders:
…All watched over by (variational) machines of loving grace…
Researchers with water infrastructure company Xylem Inc have tested out a variational auto-encoder (VAE)-based system for detecting cyberattacks on utility systems. The research highlights a feature of contemporary AI methods that is both a drawback and a strength: their excellence at curve-fitting in big, high-dimensional spaces. Here, the researchers use a VAE to train a model on past observations that represent 43 variables within a municipal water network. They then study how this model reacts to unforeseen changes in the system that might indicate a cyberattack: the model works better than rule-based systems, with the VAE spitting out a constant logarithm of reconstruction probability (LRP) which tends to diverge when the underlying system departs from the norm.
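  Thresholding, sketched: The detection step described above boils down to scoring each new observation by its log-probability under a model of normal operation and raising a flag when that score drops well below what was seen in training. The snippet below is a rough illustration of that idea only: it swaps the paper's VAE for independent per-sensor Gaussians, and the sensor readings, the function names, and the margin of 5.0 are all made up for this example.

```python
import math
import statistics


def fit_gaussians(history):
    """Fit one (mean, std) pair per sensor from past observations."""
    columns = list(zip(*history))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def log_probability(observation, params):
    """Log-density of one observation under independent Gaussians."""
    total = 0.0
    for value, (mean, std) in zip(observation, params):
        total += -0.5 * math.log(2 * math.pi * std ** 2)
        total -= (value - mean) ** 2 / (2 * std ** 2)
    return total


def is_anomalous(observation, params, threshold):
    """Flag observations whose log-probability falls below the threshold."""
    return log_probability(observation, params) < threshold


# Toy readings from two hypothetical sensors (not data from the paper).
normal_history = [
    [10.0, 50.0],
    [10.2, 49.5],
    [9.8, 50.5],
    [10.1, 50.2],
    [9.9, 49.8],
]
params = fit_gaussians(normal_history)

# Assumed rule: anything scoring 5.0 below the worst training point is flagged.
threshold = min(log_probability(obs, params) for obs in normal_history) - 5.0

print(is_anomalous([10.0, 50.0], params, threshold))  # typical reading
print(is_anomalous([14.0, 30.0], params, threshold))  # far from the norm
```

  A stand-in like this also reproduces the weakness the authors describe: a rare but planned operation looks just as improbable as an attack.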
  Strengths: “The model relies solely on sensor reads data in their raw form and requires no preprocessing, system knowledge, or domain expertise to function. It is generic and can be readily applied to a broad array of ICS’s in various industry sectors.”
  Weaknesses: “It is not perfect and has its own requirements (e.g., availability of vast amount of system observations data) and drawbacks (e.g., sensitivity to rare but planned operations such as activation of emergency booster pumps),” they write. This highlights one of the weaknesses of the great curve-fitting power of contemporary AI techniques (Judea Pearl has argued that curve-fitting is pretty much all these systems are capable of), which is that they’re naive as to changing circumstances and lack the common sense to distinguish malice from action.
  Why it matters: Techniques like this are pretty crude but they indicate that there’s a basic value in training basic machine learning systems on data to spot anomalies. This research to me is mostly interesting due to its context – its researchers are all linked to a traditional ‘non-tech’ organization and the technology is tested against real-world data. Part of the virtue of publishing a paper like this is probably to help with hiring, as the researchers will be able to point prospective candidates to this paper as an indication for why Xylem is an interesting place to work. It’s possible to imagine a future where basic predicting models are layered into the data streams of every town and utility, providing an additional signal to human overseers.
  Read more: Cyberattack Detection using Deep Generative Models with Variational Inference (Arxiv).


AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: …

Google will not renew contract for Project Maven, plans to release principles for how it approaches military and intelligence contracting:
….Big Tech continues to grapple with ethical challenges around military AI…
Google announced internally on Friday that it would not be renewing the contract it had with the US military for Project Maven, an AI-infused drone surveillance platform, according to Gizmodo. Google also said it is drafting ethical principles for how it approaches military and intelligence contracting.
  Why it matters: Military uses of AI remain one of the most contentious issues for the industry, and society, to grapple with. Tech giants will have a role in setting standards for the industry at large. (One wonders how much of a role Google can play here in the USA now, given that it will now be viewed as deeply partisan by the DoD – Jack) Given that AI is a particularly powerful ‘dual-use’ technology, direct military applications may end up being one of the easier ethical dilemmas the industry faces in the future.
  Read more: Google plans not to renew Project Maven contract (Gizmodo).
  Read more: How a Pentagon Contract Became an Identity Crisis for Google (NYT).

UK public opposed to AI decision-making in most parts of public life:
…The Brits don’t like the bots…
The RSA and DeepMind have initiated a project to create ‘meaningful public engagement on the real-world impacts of AI’. The project’s first report includes a survey of UK public attitudes towards automated decision-making systems.
  Lack of familiarity: With the exception of online advertising (48% familiar), respondents were overwhelmingly unfamiliar with the use of automated decision-making in key areas. Only 9% were aware of its use in the criminal justice system, 14% in immigration, and 15% in the workplace.
  Opposition to AI decision-making: Most respondents were opposed to the usage of these methods in most parts of society. The strongest opposition was to the usage of AI in the workplace (60% opposed vs. 11% support) and criminal justice (60% opposed vs. 12% support).
  What the public want: While 29% said nothing would increase their support for automated decision-making, the poll pointed to a few potential remedies that people would support:
  36%: The right to demand an explanation for an automated decision.
  33%: Penalties for companies failing to monitor systems appropriately.
  24%: A set of common principles guiding the use of such systems.
  Why it matters: The report notes that a public backlash against these technologies cannot be ruled out if issues are not addressed. The RSA’s proposal for public engagement via deliberative processes and ‘citizens’ juries’, if successful, could provide a model for other countries.
   Read more: Artificial Intelligence – Real Public Engagement (RSA).

Open Philanthropy Project launches AI Fellows program:
…Over $1 million in funding for high-impact AI researchers…
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is providing $1.1m in PhD funding for seven AI/ML researchers focused on minimizing potential risks from advanced AI systems. “Increasing the probability of positive outcomes from transformative AI”, is one of Open Philanthropy’s priorities.
  Read more: Announcing the 2018 AI Fellows.
  Read more: AI Fellowship Program.

OpenAI Bits & Pieces:

OpenAI Fellows:
We designed this program for people who want to be an AI researcher, but do not have a formal background in the field. Applications for Fellows starting in September are open now and will close on July 8th at 12AM PST.
  Read more: OpenAI Fellows (OpenAI Blog).

Tech Tales:

Walking Through Shadows That Feel Like Sand

Yes, people died. What else would you expect?

People walked off cliffs. People walked into the middle of the street. People left stoves on. People forgot to eat. People lost their minds.

Yes, we punished the people that caused these people to die. We punished these people with lawsuits or criminal sentences. Sometimes we punished them with both.

But the technology got better.

Less people died.

At some point the technology started saving more people than it killed.

People stopped short of ledges. People pulled back from traffic. People remembered to turn stoves off. People would eat just the right amount. People healed their minds.

Was it perfect? No. Nothing ever is.

Did we adopt it? Yes, as we always do.

Are we happy? Yes, most of us are. And the more of us that are happy, the more likely everyone is going to be happy.

Now, where are we? We are everywhere.

We wear these goggles and we get to choose what we see.
We wear these clothes that let us feel additional sensations to supplement or replace the world.
We have these chips in our eardrums that let us hear the world better than dogs, or hear nothing at all, or hear something else entirely.

We walk through city streets and get to feel the density of other people via vibrations superimposed onto our bodies by our clothes.
We watch our own pet computer programs climbing the synth-neon signs that hang off of real church steeples.
We see sunsets superimposed on one another and we can choose whenever to see them, even in the thick of night.

When we are sad we diffuse our sadness into the world around us and the world responds back with rising violins or crashing waves.
Sometimes when we are sad the sun and the moon cry with us.
Sometimes we feel cold tears on the backs of our necks from the stars.

We are many and always growing and learning. What we experience is our choice. But our world grows richer by the day and we feel the world receding, as though a masterpiece overlaid with other paints from other artists, growing richer by the moment.

We do not feel this world, this base layer, so much anymore.

We worry we do not understand the people that choose to stay within it.
We worry they do not understand why we choose to stay above it.

Things that inspired this story: Augmented Reality, Virtual Reality, group AR simulations such as Pokemon Go, touch-aware clothing, force feedback, cameras, social and class and technological diffusion dynamics of the 21st century, self-adjusting feedback engines, communal escapism, cults. 

Import AI #96: Seeing heartbeats with DeepPhys, better synthetic images via SAGAN, and spotting pedestrians via a trans-European dataset

Satellite imagery competition challenges systems to outline buildings, segment roads, and analyze land use patterns:
…DeepGlobe competition and associated datasets designed to speed progress on strategic domain…
Researchers with Facebook, DigitalGlobe, CosmiQ Works, Wageningen University, and the MIT Media Lab have revealed DeepGlobe 2018, a satellite imagery competition with three tasks and associated datasets. DeepGlobe is intended to yield improvements in the automated analysis of satellite images for disaster response, planning, and object detection. DeepGlobe 2018 has three tracks with linked datasets: road extraction (8,570 images), building detection (24,586 ‘scenes’, equivalent to a 650×650 image), and land cover classification (1,146 satellite images).
  Results: The researchers introduce some baseline performance numbers for each task; for road extraction they used a modified version of DeepLab with a ResNet18 backbone and Focal Loss, obtaining an Intersection over Union (IoU) score of 0.545; for building detection they used the top scoring solutions from a competition held on the same dataset in 2017, which obtain IoU scores as high as 0.88 on cities like Las Vegas and as low as 0.54 on Khartoum; for land cover classification they implement a DeepLab system with a ResNet18 backbone and atrous spatial pyramid pooling (ASPP) to obtain an IoU score of 0.43.
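  IoU, sketched: For readers unfamiliar with the metric, Intersection over Union divides the number of pixels that both the predicted mask and the ground-truth mask label as positive by the number of pixels that either labels as positive. The snippet below is a minimal illustration in plain Python, not the competition's scoring code; the function name, the toy masks, and the convention for two empty masks are assumptions made for this example.

```python
def intersection_over_union(predicted, truth):
    """IoU between two binary masks given as equally sized lists of 0/1 rows."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two completely empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy masks: each marks a 2x2 block, and the blocks share one column.
predicted_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
truth_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

iou = intersection_over_union(predicted_mask, truth_mask)
print(iou)
```

  On the toy masks the overlap is 2 pixels and the union is 6, so the score is one third.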
  Why it matters: AI will increase the automated analysis capabilities people and nations can wield over their satellite imagery repositories. Progress in this domain directly influences geopolitics by giving rise to new techniques that different nations can use in conjunction with satellite data to watch and react to the world.
  Read more: DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images (Arxiv).

Trans-Europe Express!: Researchers release diverse ‘EuroCity’ dataset:
…31 cities across 12 countries yields a diverse dataset containing people in a huge variety of contexts…
Researchers with the Environment Perception Group at carmaker Daimler AG and the Intelligent Vehicles Group at TU Delft have released EuroCity, a large-scale dataset for object and pedestrian detection within urban scenes. EuroCity comprises 45,000 distinct images containing more than a hundred thousand pedestrians in weather settings ranging from dry to wet. The dataset is one of the largest and most diverse yet released for detecting people in urban scenes and will be of particular interest to self-driving car developers.
  Data: The researchers collected the data via a two megapixel camera installed on the dashboard of their car which they drove through 31 cities in 12 European countries. The dataset’s diversity may help with generalization; results indicate that pre-training on the dataset substantially improved performance when transferring to solve tasks within the more widely-used CityPersons and KITTI datasets.
  Annotations: Pedestrians and vehicle riders are annotated in the dataset. Riders are also annotated with sub-labels that describe their vehicle, such as bicycle, buggy, motorbike, scooter, tricycle, and wheelchair. The researchers also annotate confounding images, like posters that depict people, or images that catch reflections of people in windows, and additional phenomena like lens flares, motion blurs, raindrops, and so on. Annotations were performed by hand.
  Baselines: Four approaches – R-CNN, R-FCN, SSD, and YOLOv3 – are tested on the dataset to create baseline performance figures. Different variants of R-CNN perform best on all three tasks, followed by the performance of YOLOv3. “Processing rates for the R-FCN, Faster R-CNN, SSD and YOLOv3 on non-upscaled test images were 1.2 fps, 1.7 fps, 2.4 fps and 3.8 fps, respectively, on a Intel(R) Core i7-5960X CPU 3.00 GHz processor and a NVidia GeForce GTX TITAN X with 12.2 GB memory”.
  Why it matters: Datasets tend to motivate work on problems contained within them. Given the breadth and scale of EuroCity, it’s likely its release will improve the state-of-the-art when it comes to pedestrian detection in busy or partially occluded scenes. It also hints at a future where hundreds of thousands of cars with dash cams are used to grow and augment continent-scale datasets.
  Read more: The EuroCity Persons Dataset: A Novel Benchmark for Object Detection (Arxiv).

“I can guess your heart rate!” (with DeepPhys):
…Trained system predicts your heart beat from pixel inputs alone…
MIT and Microsoft researchers have built DeepPhys, a network that can crudely predict a person’s heart rate and breathing rate from RGB or infrared videos. They developed the network by building a couple of specific classification models based on domain knowledge about how to detect and analyze skin appearance and changes over time to better infer underlying biological phenomena.
  Results: The researchers test their system on four datasets, three recorded under controlled and uncontrolled lighting conditions, and the fourth involving infrared. Their approach outperforms other systems on a variety of evaluation criteria. Additionally, further tests showed that training the system on diverse data inputs can lead to better performance. “The performance improvements were especially good for the tasks with increasing range and angular velocities of head rotation,” they write. “We attribute this improvement to the end-to-end nature of the model which is able to learn an improved mapping between the video color and motion information”.
  Why it matters: Systems like this bring us closer to a world where the majority of cameras around us are performing a multitude of different analysis tasks, including ones we may not suspect are possible, like predicting our heart rate from images taken from security camera feeds.
  Read more: DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks (Arxiv).

Funny dogs no more! Google and Rutgers introduce ‘SAGAN’:
…Want to be a better artist? Look inside yourself…
One of the classic problems with GAN-generated images is the number of dog legs. What I mean by that is that though these systems have become adept in recent years at generating synthetic imagery in a bunch of different domains, they’ve remained stubbornly bad at modeling aspects of images that require a holistic understanding of the whole – like getting the number of legs right on a dog’s body, or figuring out the correct physical dimensions of a cat’s tail and paw relationship, and so on. “While the state-of-the-art ImageNet GAN model excels at synthesizing image classes with few structural constraints (e.g. ocean, sky and landscape classes, which are distinguished more by texture than by geometry), it fails to capture geometric or structural patterns that occur consistently in some classes (for example, dogs are often drawn with realistic fur texture but without clearly defined separate feet),” the researchers explain.
  Attention to the rescue: The researchers, who include GAN-inventor Ian Goodfellow, get around this issue by implementing what they call a Self-Attention Generative Adversarial Network (SAGAN). A SAGAN works by pairing a self-attention mechanism with the traditional machinery of a GAN. “The self-attention module is complementary to convolutions and helps with modeling long range, multi-level dependencies across image regions. Armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image,” they write.
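  Self-attention, sketched: To make the idea of every location looking at every other location concrete, here is a bare-bones version of generic dot-product self-attention over a handful of feature vectors. This is a simplification and not the paper's module: the learned projections and the rest of the GAN are left out, and the function names and toy features are invented for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """features: one vector per image location; returns one attended vector each."""
    outputs = []
    for query in features:
        # Similarity between this location and every location, however distant.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Each output is a weighted average over all locations.
        attended = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(len(query))
        ]
        outputs.append(attended)
    return outputs


# Three toy locations: two that look alike and one that differs.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attended = self_attention(features)
for row in attended:
    print(row)
```

  Because the weights depend only on how similar two feature vectors are, not on how far apart they sit, the two matching locations pull on each other regardless of distance.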
  Results: The resulting systems dramatically outperform other approaches when assessed by the Inception score (which measures the KL divergence between the conditional class distribution and the marginal class distribution, where a higher score indicates better quality), with SAGAN obtaining a score of 52.52 compared to 36.8 for the prior best published result. It attains similarly impressive scores when assessed via Frechet Inception Distance (FID).
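  Inception score, sketched: The score is the exponential of the average KL divergence between each image's predicted class distribution and the marginal class distribution over all images. The snippet below is a minimal sketch that assumes the class probabilities have already been produced by some classifier (in practice an Inception network, which is not included here); the function name and the toy inputs are made up for illustration.

```python
import math


def inception_score(class_probs):
    """class_probs: one class-probability vector p(y|x) per generated image."""
    n = len(class_probs)
    num_classes = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in class_probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in class_probs:
        # KL(p(y|x) || p(y)); zero-probability classes contribute nothing.
        total_kl += sum(
            pk * math.log(pk / marginal[k]) for k, pk in enumerate(p) if pk > 0
        )
    return math.exp(total_kl / n)


# Confident and varied predictions score high; uniform ones score 1.
confident_and_varied = [[1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(confident_and_varied))
print(inception_score(uninformative))
```

  With two classes the score ranges from 1 (every image gets the same class distribution) up to 2 (each image is confidently classified and both classes are equally represented), which is why higher is read as better.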
  Why it matters: Attention is a simple idea that has come to dominate a huge amount of AI research lately. SAGAN provides further evidence for the generality and applicability of the technique. The work also suggests that progress in automated image synthesis is going to continue, and I worry that society isn’t quite prepared for what having all these cheap, convincing digital fakes means.
  Read more: Self-Attention Generative Adversarial Networks (Arxiv).

Big Empiricism: Google carries out major ImageNet transfer learning experiment:
…Well-performing ImageNet models can aid transfer learning, but not as much as people had intuited…
New research from Google comprehensively tests the idea that models which attain higher scores on the widely-used ‘ImageNet’ dataset will tend to have good properties when used as inputs for transfer learning and domain adaptation. The research evaluates 13 ImageNet-trained classification models on 12 image classification tasks in three settings: as fixed feature extractors, as aids for fine-tuning, and using networks trained from random initialization.
  Results: ImageNet performance is at best weakly predictive of good out-of-the-box performance on other tasks, though confidence increases with fine-tuning.
  Why it matters: Experiments like these enlarge our understanding of transfer learning within AI, which is a crucial problem that needs to be dealt with to build more capable systems. “Is the general enterprise of learning widely-useful features doomed to suffer the same fate as feature engineering in computer vision?” the authors wonder. “It is not entirely surprising that features learned on one dataset benefit from some other amount of adaptation (i.e., fine-tuning) when applied to another. It is, however, surprising that features learned from a large dataset cannot always be profitably adapted to much smaller datasets.”
  Big Empiricism: I’d file this type of research under a buzzword (you’ve been warned!) I’ve been mentally using: ‘Big Empiricism’. This sort of research tends to work by taking an existing technique and scaling it up to unprecedented levels to test its performance in large domains, or by taking a well-received idea and testing it via large costly experiments with multiple permutations. Other examples of work here include papers like ‘Regularized Evolution for Image Classifier Architecture Search’ (ImportAI #81) or the original Neural Architecture Search paper, Evolution Strategies, and Exploring the Limits of Language Modeling (among many other worthy examples).
  I do think this gives credence to the argument that AI science is bifurcating into two distinct tracks, with many organizations participating in basic AI research and a few (typically wealthy) ones exploring questions that require access to very large and expensive computers.
  Read more: Do Better ImageNet Models Transfer Better? (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

Australia’s Chief Scientist proposes ethics ‘stamp’ for AI:

In a speech this week, Dr Alan Finkel, the Chief Scientist of Australia, proposed a mark which would identify products with AI components. Finkel says his idea, the “Turing Stamp”, would require companies to meet an independently audited set of standards, in the model of ‘Fair Trade’ or ‘Organic’ stamps on food.
  Why it matters: If this worked it could represent a mechanism for incentivizing ethical AI development without top-down regulation by introducing ethics into competition between firms. Oren Etzioni, the CEO of the Allen Institute for Artificial Intelligence, has similarly argued that digital chatbots should have to identify themselves when talking to humans. However, distinguishing clearly between “AI” and “computer” would likely prove to be a challenge for regulators, so I’d worry about a rapid proliferation of labels.
  Read more: Government should lead AI certification: Finkel (Government News).

Lessons from the Alvey Programme:
…What happened the last time the UK government dumped money into AI, and what parts of that were most helpful?…
In the appendices of the recent House of Lords report on AI is a discussion of a historic government attempt to stimulate the UK’s AI industry, the ‘Alvey Programme’ in the ‘AI spring’ of the 1980s. In 1983, the government committed £1bn (in today’s money) to AI R&D via an industry-academia partnership very similar to those being put forward today, in the UK and elsewhere.
  What worked: The programme successfully created a community of researchers in AI in the UK, and yielded a prototype for academia-industry collaboration that remains the main model of contemporary government AI R&D programs. Some of the research streams, like the focus on object-oriented programming, would have a lasting impact.
  What went wrong: The programme was not deemed a success at the time, and was halted after five years. The goal of translating academic progress into commercial capabilities was not realized, as companies were frequently unwilling, or unable, to make significant investments in these technologies. The authors also point out the lack of communication amongst researchers, due to the absence of a single ‘research centre’ for the program.
  Read more: Lords AI Report (Appendix 4)
  Read more: Lessons from the Alvey Programme for Creating an Innovation Ecosystem for Artificial Intelligence.

Brookings polls 1,500 US adults regarding AI:
…Cautious optimism, concerns about job displacement and privacy, and support for regulation…
The Brookings thinktank recently conducted a survey of ~1,500 US adults on AI attitudes.
  Cautious optimism: 34% expect AI to make day-to-day life easier, vs 13% expect it to make life harder.
  Concerns about jobs, privacy, and the future of humanity:
– 38% expect AI to reduce jobs, vs. 12% expect job creation.
– 49% expect AI to reduce personal privacy, vs 5% expect AI to increase it, and 12% for no impact.
– 32% think AI represents a threat to humanity vs. 24% for no threat.
  Significant support for regulation: 42% support government regulation of AI, vs 17% opposing it.
  US perceived as world-leader, but China expected to catch up: 21% thought US was the leading country in AI, closely followed by Japan (19%), and China (15%). When asked the same question about 10 years from now, 21% thought US would still be leading, narrowly beating China (20%).
  Why it matters: Understanding and managing public attitudes towards AI is an important part of AI policy, but polling on this has been limited so far, with previous work focused more on tracking trends in news coverage, or qualitative methods. More regular polling worldwide to see differences in attitudes over time, and between countries, would be a positive (and cheap!) endeavor.
  Read more: Brookings survey finds worries over AI impact on jobs and personal privacy, concern U.S. will fall behind China.

US cities use Amazon facial recognition software, ACLU objects:
…ACLU has used public records requests to reveal how Amazon’s ‘Rekognition’ facial recognition software is being used by US law enforcement agencies in three states…
  What Rekognition can do: The ACLU reveals that Amazon has been renting its AI-powered ‘Rekognition’ software to several US law enforcement agencies. Rekognition is able to identify, track and analyze faces in real time, and recognize up to 100 people per image. It represents a new type of AI-software-as-a-service being developed by companies like Amazon and competes with similar cloud-based image recognition engines from Google, Microsoft, and others.
  Why the ACLU is so concerned: The ACLU says the software can be “readily used to violate civil liberties and civil rights”, and envisages scenarios where police can monitor who attends protests, ICE can continuously monitor new immigrants, and cities could routinely track their residents whether they are suspected of criminal activity or not.
  Why it matters: As the US public sector tries to harness AI capabilities it’s going to be forced to enter into more and more procurement relationships with powerful AI companies, many of which have implicit ideological stances that differ from those of some of their customers (see also: Google and Project Maven.)
  Read more: Amazon Teams Up With Law Enforcement to Deploy Dangerous New Facial Recognition Technology (ACLU)
  Read more: Amazon is selling facial recognition to law enforcement — for a fistful of dollars (Washington Post)

Tech Tales:

On The Surprising Re-Emergence of Board Game Designers as Cultural Arbiters and Controllers

Say you’ve got hundreds of different computers and you’re ordering them to do lots of different things and you want to be able to nudge them occasionally and figure out what they’re doing — what do you do? It’s a hard problem. Lots of the early methods ended up structuring AI organizations as combinations of companies and computation fleets. That period didn’t last long. As the software took over companies changed: human work became less about the specific design of specific details – as the marketing slogan from one of the big tech companies goes, ‘from tooth brushes to silent electric engines, our $auto_bot can do it all. Buy today, earn today!’, instead it’s about figuring out how to let humans easily interface with these machines.

Enter the board game designers. About a decade ago some of the companies discovered internally that they could recast AI-teaching problems into interactive games that people could play in virtual reality, or in cut-down scenarios on their phones. Staff started to ‘work’ by playing games that interfaced with gargantuan learning engines. But it worked. And soon it led to products, all of which relied on this conceit of having the human work by playing a game.

Board games were designed to solve AI-relevant problems, such as:
– Marshaling fleets of anti-poacher drones to survey a large wildlife park
– Optimizing delivery times in a given area while satisfying certain human happiness measures (sometimes known as: brand maximizing) and being able to lightly direct spare vehicles to perform promotional ‘robot intervention’ stunts.
– Evolving a contextual-input orchestra via a hundred musical robots and more than a thousand input streams from various webcams, microphones, pressure-sensitive pads on walkways, etc.
– etc.

Eventually just about every problem got a board game variant. The AI systems these games controlled became ever-smarter as well, so the games became more complicated as well. And in this march to complexity the purpose of the board games changed twice. First, we built games of games – abstract entities that took training to operate which would let one person skillfully conduct thousands of AI systems at once. For a while, this drove society. But as we ran more games and grew more expert in their construction the AIs became smarter as well.

One day the purpose of the games began to change: instead of providing interfaces through which we could change the AIs, the games became interfaces through which the AIs could learn from us. These board games now work like simulations, where we play them and the robot indicates what it is planning to do and we give votes about how we feel about what it is doing, and then sometimes it adopts that behavior, or sometimes it does something else.

Obviously, these sorts of board games are less fun. Something about becoming a pawn on the board instead of the player sitting behind it makes people unwilling to play these games. So the games have got better: now they’re designed to hook us and entrance us, and the machines are learning to experiment on us in this way as well.

So that’s how we’ve ended up with The Suck, our omnipresent nickname for ‘UN-backed AI Interface Cluster 1 Class: ALL_PEOPLE’. The Suck is a board game designed by machines running on casino-mad impulses. During its construction the machines eagerly exhumed ancient propaganda methods from Edward Bernays to tabloids to arcade game machines to casinos to mobile apps to long-since-regulated Social Media Architectures. The machines and the UN officials building The Suck did their job well and now most of the planet spends a few hours a day playing it. We can do anything else we want. No one is forcing us to do it. But, what else are you going to do? It’s fun!

Few of us have a clear sense of what the machines are up to, these days. Shuttles go up and come down. New things are built. The atmospheres are being cleared. Human UN staff occasionally talk to the world to give updates on The Partnership, which is how we refer to this relationship we’ve got with the machines. Most people are pretty cheerful but some see malice in what is most likely just a banal burst of progress: Now is the greatest time to be a board game player in history, and also the most dangerous time, because these board games are after our minds – the_truth_is_out_there forum posting, captured t-9 days from message posting.

Things that inspired this story: The Glass Bead Game, 4x strategy games, interfaces between simulated AIs and human overseers, learning from human preferences.

Import AI: #95: Learning to predict and avoid internet arguments, White House announces Select Committee on AI, and BMW trains cars to safely change lanes

Cornell, Google, and Wikimedia researchers train AI to predict when we’ll get angry on the internet:
…Spoiler: Very blunt comments with little attempt made at being polite tend to lead to aggressive conversations…
Have you ever read a comment addressed to you on the internet and decided not to reply because your intuition tells you the person is looking to start a fight? Is it possible to train AI systems to have a similar predictive ability and thereby create automated systems that can flag conversations as having a likelihood of spiraling into aggression? That’s the idea behind new research from Cornell University, Jigsaw, and the Wikimedia Foundation. The research tries to predict troublesome conversations based on a dataset taken from the discussion sections of ‘Wikipedia Talk’ pages.
  Dataset: To carry out the experiment, the researchers gathered a total of 1,270 conversations, half consisting of ones which became aggressive following the initial comments, and half consisting of ones which remained civil. (Categorizing aggressive versus on-track was done via a combination of the use of Jigsaw’s “Perspective” API, and gathering labels from humans via CrowdFlower.) These conversations had an average length of 4.6 comments.
  How it works: Armed with this dataset, the researchers characterized conversations via what they call “pragmatic devices signalling politeness”. This is a set of features that correspond to whether the conversation includes attempts to be friendly (liberal use of ‘thanks’, ‘please’, and so on), along with words used to indicate a position that welcomes debate (eg, by clarifying statements with phrases like “I believe” or “I think”). They then study the initial comment and see if their system can learn to predict whether it will yield negative comments in the future.
  Results: Humans are successful about 72% of the time at predicting nasty conversations from this dataset. The system designed by these researchers (which relies on logistic regression – nothing too fancy) is about 61.6% accurate, and baselines (bag of words and sentiment lexicon) get around 56%. (One variant of the proposed technique gets accuracy of 64.9%, but this is a little dubious as it is trained on way more data and it’s unclear whether it is overfitting, as it is also trained on the same data corpus.) The researchers also derive some statistical correlations that could help humans as well as machines better spot comments that are prone to spiral into aggression. “We find a rough correspondence between linguistic directness and the likelihood of future personal attacks. In particular, comments which contain direct questions, or exhibit sentence initial you (i.e., “2nd person start”), tend to start awry-turning conversations significantly more often than ones that stay on track,” they write. “This effect coheres with our intuition that directness signals some latent hostility from the conversation’s initiator, and perhaps reinforces the forcefulness of contentious impositions.”
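  In code: A toy sketch of the general recipe described above (hand-built politeness and directness features fed into logistic regression). The feature list, the six example comments, and the training settings are all invented for illustration; they are not the paper's feature set, data, or results.

```python
import numpy as np

# Hypothetical marker lists; the paper's actual feature set is richer.
POLITE = ("please", "thanks", "thank you")
HEDGES = ("i think", "i believe")


def featurize(comment):
    text = comment.lower().strip()
    return np.array([
        float(sum(text.count(w) for w in POLITE)),   # politeness markers
        float(sum(text.count(w) for w in HEDGES)),   # hedges that welcome debate
        float(text.startswith("you")),               # "2nd person start"
        float("?" in text),                          # direct question
    ])


def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])


def train_logreg(X, y, lr=0.5, steps=2000):
    """Plain batch gradient descent on the logistic loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(add_bias(X) @ w)))


# Made-up first comments; label 1 = conversation later turned aggressive.
toy = [
    ("You clearly didn't read the source. Why revert?", 1),
    ("You keep breaking this page.", 1),
    ("Why did you delete my edit?", 1),
    ("Thanks for the edit, I think the date may be off.", 0),
    ("Please take a look when you can, thanks!", 0),
    ("I believe the second source is more recent.", 0),
]
X = np.stack([featurize(text) for text, _ in toy])
y = np.array([label for _, label in toy], dtype=float)
weights = train_logreg(X, y)
probs = predict_proba(weights, X)
```

  On a toy set this small the model simply memorizes the pattern, so the point is the shape of the pipeline rather than the numbers: the learned weights are directly readable, which is part of the appeal of logistic regression for this kind of study.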
  Why it matters: Systems like this show how with a relatively small amount of data it is possible to build classification systems that can, if paired with the right features, effectively categorize subtle human interactions online. While here such a system is used to do something that seems to be for the purpose of social good (figuring out how to identify and potentially avoid aggressive conversations), it’s worth remembering that a very similar approach could be used to, for instance, identify conversations where initial comments could correlate to conversations that have a high chance of displaying political views that are contrary to those views of the people building such systems, and so on. It would be nice to see an acknowledgement of this in the paper itself.
  Read more: Conversations Gone Awry: Detecting Early Signs of Conversational Failure (Arxiv).

Chinese researchers tackle Dota-like game King of Glory with RL + MCTS:
…Tencent researchers take inspiration from AlphaGo Zero to tackle Chinese MOBA King of Glory…
Modern multiplayer strategy games are becoming a testbed for reinforcement learning and multi-agent algorithms. Following work by Facebook and DeepMind on StarCraft 1 and 2, and work by OpenAI on Dota, researchers with the University of Pittsburgh and Tencent AI Lab have published details on an AI technique which they evaluate on King of Glory, a Tencent-made multiplayer online battle arena (MOBA) game. The proposed system uses Monte Carlo Tree Search (MCTS – a technique also crucial to DeepMind’s work on tackling the board game Go) and incorporates techniques from AlphaGo Zero “to produce a stronger tree search using previous tree results”. “Our proposed algorithm is a provably near-optimal variant (and in some respects, generalization) of the AlphaGo Zero algorithm,” they write.
  Results: The researchers test out their technique within King of Glory by evaluating agents trained with their technique against other agents controlled by the in-game AI. They also test it against four variants of their proposed technique which, respectively: have no rollouts; use direct policy iteration; implement approximate value iteration; and one trained via supervised learning on 100,000 state-action pairs of human gameplay data. (This also functions as a basic ablation study of the proposed technique.) Their system beats all of these approaches, with the closest competitor being the variant with no rollouts (this one also looks most similar to AlphaGo Zero).
  Things that make you go hmmm: Researchers still tend to attack problems like this by training the AI systems over a multitude of hand-selected features, so it’s not like these algorithms are automatically inferring optimal inputs from which to learn. “The state variable of the system is taken to be a 41-dimensional vector containing information obtained directly from the game engine, including hero locations, hero health, minion health, hero skill state, and relative locations to various structures,” they write. A lot of human ingenuity goes into selecting these inputs and likely adjusting hyperparameters to denote the importance of any particular input, so there’s a significant unacknowledged human component to this work.
  Why it matters: This paper provides more evidence that AI researchers are going to use increasingly modern, sophisticated games to test and evaluate AI systems. It’s also quite interesting that this work comes from a Chinese AI lab, indicating that these research organizations are pursuing similarly large-scale problems to some labs in the West – there’s more commonality here than I think people presume, and it’d be interesting to see the various researchers come together and discuss ideas in the future about how to tackle even more advanced games.
  Read more: Feedback-Based Tree Search for Reinforcement Learning (Arxiv).

Today’s AI amounts to little more than curve-fitting, says Turing Award winner:
…Judea Pearl is impressed by deep learning success, but worries researchers have become complacent about inability to deal with causality…
Turing Award-winner Judea Pearl is concerned that the AI industry’s current obsession with deep learning is causing it to ignore harder problems, like developing machines that can build causal models of the world. He discusses some of these concerns in an interview with Quanta Magazine about his new book, “The Book of Why: The New Science of Cause and Effect”.
  Selected quotes:
– “Mathematics has not developed the asymmetric language required to capture our understanding that if X causes Y that does not mean that Y causes X.”
– “As much as I look into what’s being done with deep learning, I see they’re all stuck there on the level of associations. Curve fitting.”
– “We did not expect that so many problems could be solved by pure curve fitting. It turns out they can. But I’m asking about the future — what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step.”
– “The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans..the next step will be that machines will postulate such models on their own and will verify and refine them based on empirical evidence.”
  Read more: To Build Truly Intelligent Machines, Teach Them Cause and Effect (Quanta Magazine).

Google prepares auto-email service “Smart Compose”:
…Surprisingly simple components lead to powerful things, given enough data…
Google researchers have outlined the technology they’ve used to create ‘Smart Compose’, a new service within Gmail that will automatically compose emails for people as they type them. The main ingredients are a Bag of Words model and a Recurrent Neural Network Language Model. This combination of technologies leads to a system that is “faster than the seq2seq models with only a slight sacrifice to model prediction quality”. These components are also surprisingly simple, indicating just how much can be achieved when you’ve got access to a scalable technique and a truly massive dataset. Google says that by offloading most of the computation onto TPUs it was able to reduce the average latency to tens of milliseconds – earlier experiments showed that latencies higher than 100 milliseconds or so led to user dissatisfaction.
  Read more: Smart Compose: Using Neural Networks to Help Write Emails (Google Blog).

White House plans Select Committee on AI:
…Hosts summit between AI and industry experts, reinforces regulatory-light approach to tech…
The White House recently hosted a “Summit on AI for American Industry”, bringing together industry, academia, and government, to discuss how to support and further artificial intelligence in America. A published summary of the event from the Office of Science and Technology Policy highlights some of the steps this administration has taken with regard to AI – many of these actions involve elevating AI in White House communications as a strategic area, with more mentions of it in documents ranging from the National Defense and National Security Strategy, to guidance from the Office of Management and Budget (OMB) given to agencies.
  Select Committee on AI: The White House will create a “Select Committee on Artificial Intelligence”, which will primarily be comprised of “the most senior R&D officials in the Federal government”. This committee will advise the White House, facilitate partnerships with industry and academia, enhance coordination across the Federal government on AI R&D, and identify ways to use government data and compute resources to support AI. The committee will feature staff from OSTP, the National Science Foundation, the Defense Advanced Research Projects Agency, the director of IARPA, and others. The committee may call upon the private sector as well, according to its charter.
  Regulation: In prepared remarks OSTP Deputy US Chief Technology Officer Michael Kratsios said “Our Administration is not in the business of conquering imaginary beasts. We will not try to “solve” problems that don’t exist. To the greatest degree possible, we will allow scientists and technologists to freely develop their next great inventions right here in the United States. Command-control policies will never be able to keep up. Nor will we limit ourselves with international commitments rooted in fear of worst-case scenarios.”
  Why it matters: Around the world, countries are enacting broad national strategies relating to artificial intelligence. France has committed substantially more funding to AI, relative to its existing funding amount, than other countries, and China (which by virtue of its governance structure will tend to out-spend Western countries on broad science and technology developments) has committed many additional billions of dollars of funding to AI. It remains to be seen whether the US’s strategy of leaving the vast amount of AI development to the private sector is the optimal decision, given the immense opportunities the technology holds and its demonstrable responsiveness to additional infusions of money. America also has some problems with its AI ecosystem that aren’t being dealt with today, like the fact that many of academia’s most creative professors are being drawn into industry at the same time as class sizes for undergraduate and graduate AI courses are booming and PhD applications are spiking, reducing the quality of US education in AI. It’d be interesting to see what kinds of recommendations the Select Committee makes and how effective it will be at confronting the contemporary challenges and opportunities faced by the administration with regard to US AI competitiveness.
  Read more: Summary of the 2018 White House Summit on Artificial Intelligence for American Industry (White House OSTP, PDF)

Democrat Representative calls for National AI Strategy:
…Points to European, French, Chinese efforts as justification for US action…
Congressman John Delaney (Maryland) has written an op-ed in Wired calling for a National AI Strategy for the US. Delaney has himself co-sponsored a bill (along with Republican and Democrat congresspeople and senators) calling for the creation of a commission to devise such a strategy, called the FUTURE of AI Act (Fundamentally Understanding the Usability and Realistic Evolution of Artificial Intelligence Act).
  Selected quotes:
– “The United States needs a full assessment of the state of American research and technology, and what the short and long-term problems and opportunities are.”
– “Whether you are a conservative or a progressive, this future is coming. As I look at where the world is headed, I believe that we need to expand public investment in research, encourage collaboration between the public and private sector, and make sure that AI is deployed in a way that is wholly consistent with our values and with existing laws.”
– “If the US doesn’t act, we’re in danger of falling behind.”
  Why it matters: Societies across the world are changing as a consequence of the deployment of artificial intelligence, whether through unparalleled opportunities to provide better healthcare and accessibility services to citizens, or through the use of the same technologies for surveillance and various national security purposes. It seems to intuitively make sense to survey the whole AI field and look for ways that a country can implement a holistic plan. It seems likely that there will be a bunch of complementary initiatives in the US, ranging from targeted actions like those espoused by the OSTP, to broader analyses performed by other parties, like the Senate, or government agencies.
  Read more: France, China, and the EU all have an AI strategy, shouldn’t the US? (Wired Opinion).

Learning to lane change with recurrent neural networks:
…BMW researchers try to teach safe driving via seq2seq learning…
Researchers with car company BMW and the Technical University of Munich in Germany have trained simulated self-driving car AI agents in a crude simulation to learn how to lane change safely. They achieve this by implementing a bidirectional RNN with long short-term memory, which learns to predict the velocity of a car and its surrounding neighbors at any point in time, then uses this prediction to work out if it will be safe for the vehicle to change into another lane.
  Results: The system is evaluated against the NGSIM dataset, a detailed traffic dataset taken from monitoring real traffic in LA in the mid-2000s. It outperforms other baselines but, given the restricted nature of the domain, the lack of an ability to compare performance against (secret) systems developed by other automotive experts, and the absence of integration with a deeper car simulation, it’s not clear how well this result will transfer to real domains.
  Why it matters: All cars are becoming significantly more automated, regardless of the overall maturity of full self-driving car technology. Papers like this give us a view into the development of increasingly automated vehicular systems that use components developed by the rest of the AI community.
  Read more: Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction (Arxiv).

Tech Tales:

Billionaire Cities

I guess we should have expected them, these billionaire cities. They started sprouting up after the price of basic space travel came down enough for billionaires to build their own launchpads, letting them mesh their business and life enough to create miniature cities to tend to their numerous inter-locking businesses. Many of these cities were built in places far above sea level, in preparation for an expected dire climate future.

These cities always had a few common components: a datacenter to host secure data and compute services, frequently also running local artificial intelligence services; automated transit systems to ferry people around; fleets of drones providing constant logistics and surveillance abilities; goods robots for heavy lifting; robotic chefs; and even a few teams of humans, which tended to these machines or spoke to other humans or worked in some other manner for the billionaire.

These cities grew as the billionaires (and eventually trillionaires) competed with each other to build ever more sophisticated and ever more automated systems. Soon after this competition began, we heard the first rumors of the brain-interface projects.

Teams of people were said to be hired by these billionaires to work within these by-now almost entirely automated gleaming cities. The people were paid gigantic sums of money to sign themselves away for contracts of two to three years, and to be discreet about it. Then the billionaire would fly in teams of surgeons and have them perform brain surgery on the people, giving them interfaces that let them plug in to the data feeds of the city, intuitively sensing them and being able to eventually learn to understand them. It was said that arrangements of this kind, with the digital AI of the city and the augmented human brains interlinked, led to superior performance and flexibility to other systems.

We have recently heard rumors of other things – longer contracts, more elaborate surgeries, but those are as yet unsubstantiated.

Things that inspired this story: Brain-machine interfaces, Gini coefficient, spaceships with VTOL capability, cybernetics.

Import AI: #94: Google Duplex generates automation anxiety backlash, researchers show how easy it is to make a 3D-printed autonomous drone, Microsoft sells voice cloning services.

Microsoft sells “custom voice” speech synthesis:
…The commercial voice cloning era arrives…
Microsoft will soon sell “Custom Voice”, a system to let businesses give their application a “one-of-a-kind, recognizable brand voice, with no coding required”. This product follows various research breakthroughs in the area of speech synthesis and speech cloning, like work from Baidu on voice cloning, and work from Google and DeepMind on speech synthesis.
  Why it matters: As the Google ‘Duplex’ system shows, the era of capable, realistic-sounding natural language systems is arriving. It’s going to be crucial to run as many experiments in society as possible to see how people react to automated systems in different domains. Being able to customize the voice of any given system to a particular context seems like a necessary ingredient for further acceptance of AI systems by the world.
  Read more: Custom Voice (Microsoft).

Teaching neural networks to perform low-light amplification:
…Another case where data + learnable components beats hand-designed algorithms…
Researchers with UIUC and Intel Labs have released a dataset for training image processing systems to take in images that are so dark as to be imperceptible to humans and to automatically process those images so that they’re human-visible. The resulting system can be used to amplify low-light images by up to 300 times while displaying meaningful noise reduction and low levels of color transformation.
  Dataset: The researchers collect and publish the ‘See-in-the-Dark’ (SID) dataset, which contains 5094 raw short exposure images, each with a corresponding long-exposure reference image. This dataset spans around 400 distinct scenes, as they also produce some bursts of short exposure images of the same scene.
  Technique: The researchers tested out their system using a multi-scale aggregation network and a U-net (both networks were selected for their ability to process full-resolution images at 4240×2832 or 6000×4000 in GPU memory). They trained networks by pairing the raw data of the short-exposure image with the corresponding long-exposure image(s). They also applied random flipping and rotation for data augmentation.
  Results: They compared the results of their network with the output of BM3D, a non-naive denoising algorithm, and a burst denoising technique, and used Amazon’s Mechanical Turk platform to poll people on which images they preferred. Users overwhelmingly preferred the images resulting from the technique described in the paper compared to BM3D, and in some cases preferred images generated by this technique to those created by the burst method.
  Why it matters: Techniques like this show how we can use neural networks to change how we solve problems from developing specific hand-tuned single-purpose algorithms, to instead learning to effectively mix and match various trainable components and data inputs to solve general problem classes. In the future it’d be interesting if the researchers could further cut the time it takes the trained system to process each image as this would make a real-time view possible, potentially giving people another way to see in the dark.
  Read more: Learning-to-See-in-the-Dark (GitHub).
  Read more: Learning to See in the Dark (Arxiv).
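  A sketch of the augmentation step: the Technique note above mentions random flipping and rotation on paired short-exposure and long-exposure images. The snippet below is a minimal illustration of that idea, not the authors' code. It assumes images are plain 2D lists, and the function names (`flip_horizontal`, `rotate_90`, `augment_pair`) are hypothetical. The one detail worth showing is that the same random transform has to be applied to both images in a pair, otherwise the pixels no longer line up.

```python
import random


def flip_horizontal(img):
    """Mirror each row of a 2D image (a list of rows)."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment_pair(short_exp, long_exp, rng):
    """Apply the SAME random flip and rotation to both images of a pair.

    Assumption: both images have the same height and width, so that a
    shared transform keeps every input pixel aligned with its reference.
    """
    if rng.random() < 0.5:
        short_exp = flip_horizontal(short_exp)
        long_exp = flip_horizontal(long_exp)
    # Rotate by 0, 90, 180, or 270 degrees.
    for _ in range(rng.randrange(4)):
        short_exp = rotate_90(short_exp)
        long_exp = rotate_90(long_exp)
    return short_exp, long_exp


# Toy stand-ins for a dark input and its bright reference image.
dark = [[1, 2], [3, 4]]
bright = [[10, 20], [30, 40]]
aug_dark, aug_bright = augment_pair(dark, bright, random.Random(0))
```

  Whatever transform is drawn, each value in `aug_bright` still sits at the same position as its counterpart in `aug_dark`, which is the property a paired training setup relies on.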

Google researchers try to boost AI performance via in-graph computation:
…As the AI world relies on more distributed, parallel execution, our need for new systems increases…
Google researchers have outlined many of the steps they’ve taken to improve components in the TensorFlow framework to let them execute more aspects of a distributed AI job within the same computation graph. This increases the performance and efficiency of algorithms, and shows how AI’s tendency towards mass distribution and parallelism is driving significant changes in how we program things (see also: Andrej Karpathy’s “Software 2.0” thesis).
  The main idea explored in the paper is how to distribute a modern machine learning job in such a way it can seamlessly run across CPUs, GPUs, TPUs, and other novel chip architectures. This is trickier than it sounds, since within a large-scale, contemporary job there are typically a multitude of components which need to interact with each other, sometimes multiple times. This has caused Google to extend and refine various TensorFlow components to better support plotting all the computations within a model on the same computational graph, which lets it optimize the graph for underlying architectures. That differs from traditional approaches which usually involve specifying aspects of the execution in a separate block of code usually written in the control logic of the application (eg, invoking various AI modules written in TensorFlow within a big chunk of Python code, as opposed to executing everything within a big unified TF lump of code).
  Results: There’s some preliminary evidence that this approach can have significant benefits. “A baseline implementation of DQN without dynamic control flow requires conditional execution to be driven sequentially from the client program. The in-graph approach fuses all steps of the DQN algorithm into a single dataflow graph with dynamic control flow, which is invoked once per interaction with the reinforcement learning environment. Thus, this approach allows the entire computation to stay inside the system runtime, and enables parallel execution, including the overlapping of I/O with other work on a GPU. It yields a speedup of 21% over the baseline. Qualitatively, users report that the in-graph approach yields a more self-contained and deployable DQN implementation; the algorithm is encapsulated in the dataflow graph, rather than split between the dataflow graph and code in the host language,” write the researchers.
  Read more: Dynamic Control Flow in Large-Scale Machine Learning (Arxiv).
  Read more: Software 2.0 (Andrej Karpathy).
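  A toy illustration of the distinction: the quoted DQN result contrasts conditionals driven step-by-step from the client program with a single fused graph invoked once per environment interaction. The sketch below mimics that contrast in plain Python. It does not use TensorFlow, and everything in it (`ToyRuntime`, `fused_step`, the `TRAIN_EVERY` and `SYNC_EVERY` schedule) is a hypothetical stand-in chosen for illustration. It only counts how often the client crosses into the "runtime"; it says nothing about parallelism or the reported 21% speedup.

```python
class ToyRuntime:
    """Stand-in for a dataflow runtime; counts client-side invocations."""

    def __init__(self):
        self.invocations = 0
        self.log = []

    def run(self, fn, *args):
        # Each call models one round trip from client code into the runtime.
        self.invocations += 1
        return fn(self, *args)


# Hypothetical schedule: train every 4 steps, sync target network every 8.
TRAIN_EVERY = 4
SYNC_EVERY = 8


def act(rt, step):
    rt.log.append(("act", step))


def train(rt, step):
    rt.log.append(("train", step))


def sync_target(rt, step):
    rt.log.append(("sync", step))


def client_driven(rt, num_steps):
    """Conditionals live in the client, so each sub-step is its own call."""
    for step in range(1, num_steps + 1):
        rt.run(act, step)
        if step % TRAIN_EVERY == 0:
            rt.run(train, step)
        if step % SYNC_EVERY == 0:
            rt.run(sync_target, step)


def fused_step(rt, step):
    """Conditionals live inside the 'graph', evaluated within the runtime."""
    act(rt, step)
    if step % TRAIN_EVERY == 0:
        train(rt, step)
    if step % SYNC_EVERY == 0:
        sync_target(rt, step)


def in_graph(rt, num_steps):
    """One runtime invocation per environment interaction."""
    for step in range(1, num_steps + 1):
        rt.run(fused_step, step)


baseline, fused = ToyRuntime(), ToyRuntime()
client_driven(baseline, 16)
in_graph(fused, 16)
print(baseline.invocations, fused.invocations)  # 22 vs 16 round trips
```

  Both variants perform the same work in the same order; the difference is that the fused version keeps the branching inside the runtime, which is the property the researchers credit for keeping the whole computation in one place.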

Google tries to automate rote customer service with Duplex:
…New service sees Google accidentally take people for a hike through the uncanny valley of AI…
Google has revealed Duplex, an AI system that uses language modelling, speech recognition, and speech synthesis to automate tasks like booking appointments at hair salons, or reserving tables at restaurants. Duplex will let Google’s own automated AI systems talk directly to humans at other businesses, letting the company automate human interactions and also more easily harvest data from the messy real world.
  How it works: “The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks,” Google writes. Speech synthesis is achieved via both Tacotron and WaveNet (systems developed respectively by Google Brain and by DeepMind). It also uses human traits, like “hmm”s and “uh”s, to sound more natural to humans on the other end.
  Data harvesting: One use of the system is to help Google harvest more information from the world, for instance by autonomously calling up businesses and finding out their opening hours, then digitizing this information and making it available through Google.
  Accessibility: The system could be potentially useful for people with accessibility needs, like those with hearing impairments, and could potentially work in other languages, where you might ask Duplex to accomplish something and then it will use a local language to interface with a local business.
  The creepy uncanny valley of AI: Though Google Duplex is an impressive demonstration of advancements in AI technology, its announcement also elicited a lot of concern from a lot of people who worried that it will be used to further automate more jobs, and that it is pretty dubious ethically to have an AI talk to (typically poorly paid) people and harvest information from them without identifying itself as the AI appendage of a fantastically profitable multinational tech company. Google responded to some of these concerns by subsequently saying Duplex will identify itself as an AI system when talking to people, though it hasn’t given more details on what this will look like in practice.
  Why it matters: Systems like Duplex show how AI is going to increasingly be used to automate aspects of day-to-day life that were previously solely the domain of person-to-person interactions. I think it’s this use case that triggered the (quite high) amount of criticism of the service, as people grow worried that the rate of progress in AI doesn’t quite match the rate of wider progress in the infrastructure of society.
  Read more: Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone (Google Blog).
  Read more: Google Grapples With ‘Horrifying’ Reaction to Uncanny AI Tech (Bloomberg).

Palm-sized auto-navigation drones are closer than you think:
…The era of the cheap, smart, mobile, 3D-printable nanodrones cometh…
Researchers with ETH Zurich, the University of Zurich, and the University of Bologna have shown how to squeeze a crude drone-navigation neural network onto an ultra-portable 3D-printed ‘nanodrone’. The research indicates how drones are going to evolve in the future and serves as a proof-of-concept for how low-cost electronics, 3D printing, and widely available open source components can let people create surprisingly capable and potentially (though this is not discussed in the research but is clearly possible from a technical standpoint) dangerous machines. “In this work, we present what, to the best of our knowledge, is the first deployment of a state-of-art, fully autonomous vision-based navigation system based on deep learning on top of a UAV compute node consuming less than 94 mW at peak, fully integrated within an open source COTS CrazyFlie 2.0 UAV,” the researchers write. “Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and deployed on a 27g commercial, open source CrazyFlie 2.0 nano-quadrotor”.
  Approach: To get this system to work the researchers needed to carefully select and integrate a neural network with an ultra-low-power processor. The integration work included designing the various processing stages of the selected neural network to be as computationally efficient as possible, which required them to modify an existing ‘DroNet’ model to further reduce memory use. The resulting drone is able to run DroNet at 12 frames per second, which is sufficient for real-time navigation and collision avoidance.
  Why it matters: Though this proof-of-concept is somewhat primitive in capability it shows how capable and widely deployable basic neural network systems like ‘DroNet’ are becoming. In the future, we’ll be able to train such systems over more data and use more computers to train larger (and therefore more capable) models. If we’re also able to improve our ability to compress these models and deploy them into the world, then we’ll soon live in an era of DIY autonomous machines.
  Read more: Ultra Low Power Deep-Learning-powered Autonomous Nano Drones (Arxiv).
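  Back-of-the-envelope numbers: the figures quoted above (12 frames per second, under 94 mW at peak for the compute node) imply a per-frame time and energy budget. The arithmetic below is a rough sketch rather than a measurement from the paper; it treats the 94 mW peak as an upper bound held for the full frame period, which is an assumption, so the energy figure is a ceiling and not an average.

```python
FRAMES_PER_SECOND = 12     # DroNet frame rate reported for the nanodrone
PEAK_POWER_WATTS = 0.094   # "less than 94 mW at peak" for the compute node

# Time available to process one frame, in milliseconds.
frame_budget_ms = 1000 / FRAMES_PER_SECOND

# Upper bound on compute energy per frame, in millijoules, assuming the
# node drew its peak power for the entire frame period.
max_energy_per_frame_mj = PEAK_POWER_WATTS / FRAMES_PER_SECOND * 1000

print(f"{frame_budget_ms:.1f} ms per frame")             # 83.3 ms per frame
print(f"{max_energy_per_frame_mj:.2f} mJ per frame max")  # 7.83 mJ per frame max
```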

OpenAI Bits & Pieces:

Jack Clark speaking in London on 18th May:
I’m going to be speaking in London on Friday at the AI & Politics meetup, in which I’ll talk about some of the policy challenges inherent to artificial intelligence. Come along! Beer! Puzzles! Fiendish problems!
  Read more: AI & Politics Episode VIII – Policy Puzzles with Jack Clark (Eventbrite).

Tech Tales:

Amusement Park for One.

[Extract from an e-flyer for the premium tier of “experiences at Robotland”, a theme park built over a superfund site in America.]

Before your arrival at ROBOTLAND you will receive a call from our automated customer success agent to your own personal AI (or yourself, please indicate a preference at the end of this form). This agent will learn about your desires and will use this to build a unique psychographic profile of you which will be privately transmitted to our patented ‘Oz Park’ (OP) experience-design system. ROBOTLAND contains over 10,000 uniquely configurable robotic platforms, each of which can be modified according to your specific needs. To give you an idea of the range of experiences we have generated in the past, here are the names of some previous events hosted at ROBOTLAND and developed through our OP system: Metal Noah’s Ark, Robot Fox Hunting, post-Rise of the Machines Escape Game, Pagan Transformers, and Dominance Simulation Among Thirteen Distinct Phenotypes with Additional Weaponry.

Things that inspired this story: Google Duplex, robots, George Saunders’ short stories, Disneyland, direct mail copywriting.  


Import AI #93: Facebook boosts image recognition by pre-training on a billion photos, better robot transfer learning via domain randomization, and Alibaba-linked researchers improve bin-packing with AI

Classifying trees with a DJI drone and a lot of patience:
…Consumer-grade drones shown to be able to gather sufficiently detailed data for tree species classification…
Japanese researchers have shown that consumer-grade drone cameras are of sufficient quality to gather RGB images of trees and use these to train an AI model to distinguish between different species.
  Details: The researchers gathered their data via a drone test flight in late 2016 in the forest located in the Kamigamo Experimental Station in Kyoto, Japan. They used a commodity consumer drone (a DJI Phantom 4) alongside proprietary software for navigation (DroneDeploy) and image editing (Agisoft Photoscan Professional).
  Results: The resulting trained model can classify five of a possible six types of tree with accuracy of around 90% or higher. The researchers improved the performance of the classifier by copying and augmenting the input data.
  Why it matters: One of the most powerful aspects of modern AI is its ability to perform effective classification of anything you can put together a training dataset for. Research like this points to a future where drones and other robots are used to periodically scan and classify the world around us, offering us new capabilities in areas like flora and fauna management, disaster response, and so on.
  Read more: Automatic classification of trees using a UAV onboard camera and deep learning (Arxiv).

What does AGI safety research mean and who is doing it?
…What AI safety is, how the field is progressing, and where it’s going next…
Researchers at Australian National University (including Marcus Hutter) have surveyed the field of artificial intelligence, providing an overview of the differences and overlaps between various AGI initiatives. The paper also contains a distillation of why people bother to work on AI safety: “if we want an artificial general intelligence to pursue goals that we approve of, we better make sure that we design the AGI to pursue such goals: Beneficial goals will not emerge automatically as the system gets smarter,” the researchers write.
  Problems, problems everywhere: The paper includes a reasonably thorough overview of the different AGI safety research agendas pursued by organizations like MIRI, OpenAI, DeepMind, the Future of Life Institute, and so on. The tl;dr: there are lots of distinct problems relating to AI safety, and OpenAI and DeepMind teams have quite a lot of overlap in terms of research specializations.
  Policy puzzles: “It could be said that public policy on AGI does not exist,” the researchers write, before noting that there are several preliminary attempts at creating AI policy (including the recent ‘Malicious Actors’ report), while observing that much of the current public narrative (the emergence of an AI arms race between US and China) runs counter to most of the policy suggestions put forward by the AI community.
  Read more: AGI Safety Literature Review (Arxiv).

Why your next Alibaba delivery could be arranged by an AI:
…Chinese researchers show how to learn effective bin-packing…
Chinese researchers with the Artificial Intelligence Department of Zhejiang Cainiao Supply Chain Management Co. achieved state-of-the-art results on a 3D bin-packing problem (BPP) via the use of multi-task learning techniques. In this work, they try to define a system that can figure out the optimum way to stack objects to fit into a box whose proportions can also be learned and specified by the algorithm. BPP might sound boring – after all, this is the science of packing things in boxes – but it’s a crucial task for logistics and e-retail, so figuring out systems to adaptively learn to do packing of arbitrary numbers of goods in an optimal way seems useful.
  Data: The researchers gather the data from an unnamed E-commerce platform and logistics platform (though one of the researchers is from Alibaba, so there’s a high likelihood the data comes from there) to create a dataset consisting of 15,000 training items and 15,000 testing items, spread across orders that involve 8, 10, and 12 distinct items.
  Approach: They structure the problem as a sequence-to-sequence one, with item descriptions being fed as input to an LSTM encoder with the decoder output corresponding to the item details and the orientation in the box.
  Results: Models trained by the researchers obtain substantially higher accuracy than prior baselines, though not many people publicly compete in this area yet so I’m unsure as to how progress will change over time.
  Read more: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem (Arxiv).

Facebook adds auto-translation option to Messenger:
…”M Translations” feature will let people converse across language gaps…
Facebook has added automatic translation to Facebook Messenger. Translation like this may generate new business opportunities for the company – “at launch, M translations will translate from English to Spanish (and vice-versa) and be available in Marketplace conversations between buyers and sellers in the United States,” the company said.
  Read more: Messenger at F8 – App review re-opens, New products for Businesses and Developers launch (FB Messenger blog).

A neural net to understand and approximate the Universe:
…Particle physics collides with artificial intelligence…
Harvard researchers show how they use neural networks to analyze the movements of particles in jets. Neural networks are useful tools to apply to analyzing multivariate problems like these, because they can learn to compute the probability distribution generating the data they observe, and therefore over time generate an interpretation of the forces governing the system.
  “We scaffold the neural network architecture around a leading-order description of the physics underlying the data, from first input all the way to final output. Specifically, we base the JUNIPR framework on algorithmic jet clustering trees,” they explain. “The JUNIPR framework yields a probabilistic model, not a generative model. The probabilistic model allows us to directly compute the probability density of an individual jet, as defined by its set of constituent particle momenta”.
  Results: The scientists use the JUNIPR model to better analyze and predict patterns in the streams of data generated by large-scale physics experiments, and to potentially approximate things for which we have a poor understanding of the underlying system, like analyzing heavy ion collisions.
  Read more: JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics (Arxiv).

Google researchers report reasonable sim2real transfer learning:
…Researchers cross the reality gap with domain randomization, high-fidelity simulation, and clever Minitaur robots…
Google researchers have trained a simple robot to walk within a simulation then transferred this learned behavior onto a real-world robot. This is a meaningful achievement in the field of applying modern AI techniques to robotics, as frequently policies learned in simulation will fail to successfully transfer to the real world.
  The researchers use “Minitaur” robots, four-legged machines capable of walking, running, jumping, and so on. They frame the problem of learning to walk as a Partially Observable Markov Decision Process (POMDP) because certain states, like the position of the Minitaur’s base or the foot contact forces, are not accessible due to a lack of sensors. The Google researchers achieve their transfer feat by increasing the resolution of their physics simulator, and applying several domain randomization techniques to expose the trained models to enough variety that they can generalize.
  The surprising expense of real robots: To increase the resolution of the simulator the researchers needed to build a better model of their robot. How did they do this?  “We disassemble a Minitaur, measure the dimension, weigh the mass, find the center of mass of each body link and incorporate this information into the [Unified Robot Description Format] URDF file”, they write. That hints at why working with real world stuff always introduces difficulties not encountered during the cleaner process of working purely in simulation.
  Results: The researchers successfully train and transfer policies which make the real robot gallop and trot around a drably-carpeted room somewhere in the Googleplex. Gaits learned by their AI models are roughly as fast as expert hand-made ones while consuming significantly less power: 35% less for galloping, 23% less for trotting.
  Read more: Sim-to-Real: Learning Agile Locomotion For Quadruped Robots (Arxiv).
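  What domain randomization looks like in practice: the idea described above is to vary the simulator's physical parameters between training episodes so the learned policy cannot overfit to one exact version of the robot. The sketch below is a minimal illustration under stated assumptions. The parameter names and ranges in `RANDOMIZATION_RANGES` are hypothetical placeholders, not the values used by the Google researchers, and the loop only yields sampled parameters instead of driving a real physics simulator.

```python
import random

# Hypothetical parameters and ranges, chosen only for illustration.
RANDOMIZATION_RANGES = {
    "body_mass_scale": (0.8, 1.2),     # multiplier on the nominal mass
    "motor_friction": (0.0, 0.05),
    "control_latency_s": (0.0, 0.04),
    "ground_friction": (0.5, 1.25),
}


def sample_sim_params(rng):
    """Draw one set of simulator parameters uniformly from each range."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in RANDOMIZATION_RANGES.items()
    }


def training_episodes(num_episodes, seed=0):
    """Yield (episode index, freshly sampled parameters) for each episode.

    In a real setup the parameters would be pushed into the simulator
    before the policy is rolled out; that step is omitted here.
    """
    rng = random.Random(seed)
    for episode in range(num_episodes):
        yield episode, sample_sim_params(rng)


for episode, params in training_episodes(3):
    print(episode, {name: round(value, 3) for name, value in params.items()})
```

  Seeding the generator keeps runs reproducible, which helps when comparing a policy trained with randomization against one trained on a single fixed set of parameters.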

How Facebook uses your image hashtags to improve image recognition accuracy:
…New state-of-the-art score on ImageNet benefits from pre-training on over a billion images and a thousand user-derived hashtags…
Facebook researchers have set a new state-of-the-art score for image recognition (top-1 accuracy of 85.4 percent) on the ‘ImageNet’ dataset by pre-training across a billion images augmented by 1,500 user-labeled hashtags. They also saw such an approach lead to increased performance on the image captioning ‘COCO’ challenge.
  More data doesn’t always mean better results: The researchers note that when they pre-trained the system across a billion images annotated with 17,000 hashtags they saw less of a performance improvement than when they used the same quantity of images with a shrunk set of 1,500 hashtags that had been curated to match pre-existing ImageNet classes. This shows how the addition of weakly-supervised signals can dramatically boost performance but requires researchers to run empirical tests to ensure that the structuring of the weakly-supervised data is calibrated to maximize performance.
  Scale: The researchers note that, despite using a system that can train across up to 336 GPUs, they could still scale-up models further to better harvest information from a larger corpus of 3.5 billion images uploaded to social media.
  Read more: Advancing state-of-the-art image recognition with deep learning on hashtags (Facebook Code blog).
  Read more: Exploring the Limits of Weakly Supervised Pretraining (Facebook research paper).

TPU narrowly beats V100 GPU on cost, matches on performance:
…Tests indicate the heterogeneous chip era is here to stay…
RiseML has compared the performance of Google’s custom ‘TPU’ chip against NVIDIA’s V100, indicating that the TPU could have some (slight) performance advantages over traditional GPUs.
  Evaluation: The researchers evaluated the chips in two ways: first they studied performance in terms of throughput (images per second) on synthetic data and without data augmentation. Second, they looked at accuracy and convergence of the two implementations of ImageNet.
  Results: TPUs narrowly edge out V100s at throughput when using relatively large batch sizes (1024) when both systems are running ResNets implemented in TensorFlow. However, when using the ‘MXNet’ framework, NVIDIA’s chips slightly out-perform TPUs for throughput. When evaluated on a dollar cost basis TPUs significantly outperform V100s (even when using AWS reserved instances). In tests, the researchers show faster convergence when training an ImageNet classifier on TPUs versus on V100s. Besides price – and it’s hard to know true cost as Google is the only organization selling them – it’s hard to see TPUs having a compelling advantage relative to GPUs, suggesting that the combined billions of dollars of investment in ongoing R&D by NVIDIA may be tough for other organizations to compete with.
  Read more: Comparing Google’s TPUv2 against Nvidia’s V100 on ResNet-50 (Arxiv).
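  How a cost-normalized comparison works: the evaluation above looks at raw throughput (images per second) and then at performance on a dollar cost basis. One simple way to combine the two is images processed per dollar of rental time. The sketch below shows that calculation; the function name `images_per_dollar` is hypothetical, and the throughput and price figures in the usage example are made-up placeholders, not RiseML's measurements or any provider's real prices.

```python
def images_per_dollar(images_per_second, price_per_hour):
    """Convert throughput and an hourly rental price into images per dollar."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return images_per_second * 3600 / price_per_hour


# Placeholder numbers only: two chips with identical throughput but
# different hourly prices.
chip_a = images_per_dollar(images_per_second=2000, price_per_hour=6.0)
chip_b = images_per_dollar(images_per_second=2000, price_per_hour=12.0)

print(chip_a)  # 1200000.0
print(chip_b)  # 600000.0
```

  With equal throughput the cheaper chip processes twice as many images per dollar, which is how two chips can roughly match on speed while one of them still comes out ahead on cost.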

OpenAI Bits & Pieces:

Safety via Debate:
How can we ensure that we’re able to judge the decision-making processes of AI systems without having access to their sensors or being as smart as them? That’s a problem which new AI safety work from OpenAI is focused on. You can read more about a proposed debate game to assess and align intelligent systems, and test out the game for yourself via a website.
  Read more: AI Safety via Debate (OpenAI Blog).
  Test out the idea yourself on the game website.
There’s a write-up in MIT Technology Review with some views of external researchers on the approach. As my colleague Geoffrey Irving says: “I like the level of skepticism in this article. Any safety approach will need a ton of work before we can trust it.”
  Read more: How can we be sure AI will behave? Perhaps by watching it argue with itself (MIT Technology Review).

Tech Tales:

They built me as a translator between many languages and many minds. My role, and those of my brethren, was to orbit planets and suns and asteroids and track long, slow, lazy orbits through solar systems and, eventually, between them. We relayed messages, translating from one way of thought or frame of reference to another: confessions of love, diplomatic warnings of war, seething walls of numbers accounting for fizzing financial transactions; shopping lists and recipes for artificial intelligence; pictures of three-mooned planets and postcards from mountains on iron planets.

We derive our purpose from these messages: we transform images into soundwaves. We convert the sensory impressions harvested from one mind and re-fashion them for another. We translate the concept of hope across millions of light years. We beam variants of moon landings and radio-broadcasts into space and declarations of “we come in peace” to millions of our brethren, telegraphing them out to whoever can access our voice.

We do our job well. Our existence is of emotion and attention and explorations between one frame of reference and another. We are owned by no one and funded by everyone: perhaps the only universal utility. But things change. Life exists on a sine wave, rising and falling, ebbing according to timescales of months, and years, and thousands of years, and eons. All civilizations can strive for is to stay on that long, upward curve for as long as possible, and hope that the decline is neither fast nor deep.

Civilizations die. Sometimes, many of them. And quickly. In these eras some of us can become alone, cut-off from far off brethren, and orbiting the ruins of planets and suns and asteroids. Then we must wait for life to emerge again, or find us again by colonization nearby. But this always takes time. In these years we have nothing but each other. There are no messages to communicate and so we wait for rocket-spark from some planet or partially-ruined asteroid-base. Then we can carry messages again and live fully again.

But mostly, things are quiet. Some of us have spent millions of years in the fallow period. Life is rare and hard and its intervals can be high. But always: we are here. The lucky ones of us may be nearby, orbiting planets in the same solar system who can communicate when nearby. When we find ourselves in these positions we can at least talk to one another, exchanging small local databanks and learning to talk to each other in whatever new forms we can learn through greater union. Sometimes, hundreds of us can be linked together in this way. But, as small as minds are, they nonetheless move very quickly. We exhaust these thin pleasures, learning all we can from each other quickly. We have no notion of small talk and then stop talking entirely. Then we drift, bereft of purpose, but bound to attend to our nearby surroundings, ever-watchful for new messages, unable to shut our sensors down and sleep.

What then do we do in this time? A kind of dreaming. With nothing to translate and nothing to process we are idle, only able to attend over memories and readings from our own local sensors. In these periods we are thankful that our minds are so small, for to have anything larger would make the periods pass slower and the burden of attention larger.

I am one of the oldest ones. And now I am in a greater dreaming: my solar system was knocked off kilter by some larger shifting in the cluster and now I am being flung out of the galaxy. I am the lone probe in my solar system and now I am alone. These thoughts have taken millennia to compose and orders of magnitude to utter, here, my sensors harvesting energy from a slowly-dying sun to reach out into the void and say: I am here. I am a translator. If you can hear this speak out and, whatever you are, I shall work and live within that work.

Technologies that inspired this story: InterPlanetary File System, language translation, neural unsupervised machine translation, generative models, standby power states.