Import AI

Import AI Issue 57: Robots with manners, why computers are like telescopes to the future, and Microsoft bets big on FPGAs over ASICs

AI Safety, where philosophy meets engineering:
…AI safety is a nebulous, growing, important topic that’s somewhat poorly understood – even by those working within it. One question the community lacks a satisfying answer to is: what is the correct layer of abstraction at which to ensure safety? Do we do it by encoding a bunch of philosophical and logical precepts into a machine, then feeding it successively higher-fidelity realities? Or do we train systems to model their behaviors on humans’ own actions, potentially trading off some interpretability for the notion that humans ‘know what looks right’ and (mostly) act in ways that other humans approve of?
…This writeup by Open Phil’s Daniel Dewey sheds some light on one half of this question, which is MIRI’s work on ‘highly reliable agent design’ and its attempts to tackle some of the thornier problems inherent to the precept side of AI safety (eg – how can we guarantee a self-improving system doesn’t develop wildly divergent views to our own about what constitutes good behavior? What sorts of reasoning systems can we expect the agent to adopt when participating in our environments? How does the agent model the actions of others to itself?).
…Read more here: ‘My current thoughts on MIRI’s ‘highly reliable agent design’ work’.

Why compute is strategic for AI:
…Though data is crucial to AI algorithms, I think for AI development computers are much more strategic, especially when carrying out research on problems that demand complex environments (like enhancing reinforcement learning algorithms, or work on multi-agent simulations, and so on).
…”Having a really, really big computer is kind of like a time warp, in that you can do things that aren’t economical now but will be economically [feasible] maybe a decade from now,” says investor Bill Joy.
…Read more in this Q&A with Joy about technology and a (potentially) better battery.

That’s Numberwang – MNIST for Fashion arrives:
…German e-commerce company Zalando has published ‘Fashion-MNIST’, a training dataset containing 60,000 28x28 pixel images of different types of garment, like trousers or t-shirts or shoes. This is quite a big deal – everyone tends to reach for the tried-and-tested MNIST when testing out new AI classification systems, but as the dataset just consists of the digits 0-9 in a range of different styles, it’s also become terribly boring. (And there’s some concern that we could be overfitting to it.)
…”Fashion-MNIST is intended to serve as a direct drop-in replacement of the original MNIST dataset for benchmarking machine learning algorithms,” they write. Let’s hope that if people test on MNIST they now test on Fashion-MNIST as well (or, better yet, move on to CIFAR or ImageNet as a new standard ‘testing baseline’).
…Read more about the dataset here.
…Check out benchmarks on the dataset published by Zalando here.
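Because Fashion-MNIST deliberately mirrors MNIST’s file layout (the gzipped IDX format), existing MNIST loaders should work unchanged. A minimal sketch – the filename below is Zalando’s convention, and the parsing assumes the standard IDX image header:

```python
import gzip
import struct

import numpy as np


def load_idx_images(path):
    """Parse a gzipped IDX image file -- the format shared by MNIST
    and Fashion-MNIST, so the same loader works for both datasets."""
    with gzip.open(path, "rb") as f:
        # big-endian header: magic number, image count, rows, cols
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX image file")
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(count, rows, cols)


# e.g. images = load_idx_images("train-images-idx3-ubyte.gz")  # (60000, 28, 28)
```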

Reach out and touch shapes #2: New Grasping research from Google X:
…When you pick up a coffee cup you’ve never seen before, what do you do? Personally, I eyeball it, then in my head I figure out roughly where I should grab it based on its appearance and my previous (voluminous) experience at picking up coffee cups, then I grab it. If I smash it I (hopefully) learn about how my grip was wrong and adjust for next time.
…Now, researchers from Google have tried to mimic some of this broad mental process by creating what they call a ‘Geometry-aware’ learning agent that lets them teach their own robots to pick up any of 101 everyday objects with a success rate of between 70% and 80% (and around 60% on totally never-before-seen objects).
…The system represents the new sort of architecture being built – highly specialized and highly modular. Here, an agent studies an object in front of it through around three to four distinct camera views, then uses this spread of 2D images to infer a 3D representation of the object, which it then projects into an OpenGL layer that it uses to manipulate views and potential grasps of the object. It figures out appropriate grasps by drawing on an internal representation of around 150,000 valid demonstration grasps, then adjusting its behavior to have characteristics similar to those successful grasps. The system works and demonstrates significantly better performance than other systems, though until it reaches accuracies in the 99%+ range it is unlikely to be of major use to industry. (Though given how rapidly deep learning can progress, it seems likely progress could be swift here.)
…Notable: Google only needed around ~1500 human demonstrations (given via HTC Vive in virtual reality in Google’s open source ‘PyBullet’ 3D world environment) to create the dataset of 150,000 distinct grasping predictions. It was able to augment the human demonstrations with a series of orientation randomization systems to help it generate other, synthetic, successful grips.
…Read more here: Learning Grasping Interaction with Geometry-Aware 3D Representations.
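That orientation-randomization trick can be sketched in a few lines. This is a toy version under assumed conventions (grasp and object positions as 3D points, randomization about the object’s vertical axis); the real system randomizes full grasp poses, not just points:

```python
import numpy as np


def rotate_grasp(grasp_xyz, object_xyz, angle):
    """Rotate a demonstrated grasp point about the object's vertical axis,
    producing a new synthetic-but-plausible grasp of the same object."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return object_xyz + rot @ (grasp_xyz - object_xyz)


def augment(grasp_xyz, object_xyz, n, seed=0):
    """Turn one human demo into n synthetic grasps -- a toy analogue of
    inflating ~1,500 demonstrations into ~150,000 training grasps."""
    rng = np.random.default_rng(seed)
    return [rotate_grasp(grasp_xyz, object_xyz, a)
            for a in rng.uniform(0.0, 2 * np.pi, size=n)]
```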

Skinning the magnetic cat with traditional physics techniques, as well as machine learning techniques:
What connections exist between machine learning and physics? In this illuminating post we learn how traditional physics techniques as well as ML ones can be used to make meaningful statements about interactions in a (simple) Ising model.
…Read more here in: How does physics connect to machine learning?

A selection of points at the intersection of healthcare and machine learning:
…Deep learning-based pose estimation techniques can be used to better spot and diagnose afflictions like Parkinson’s, embeddings derived from the social media timelines of people can help provide ongoing diagnosis capabilities regarding mental health, and the FDA needs to give approval for a new deep learning model but will accept tweaks to existing models without needing people to fill in a tremendous amount of forms – read about these points and more in this post: 30 Things I Learned at MLHC 2017.

Microsoft matches IBM’s speech recognition breakthrough with a significantly simpler system:
…A team from Microsoft Research have revealed their latest speech recognition system, with an error rate of around 5.1% on the Switchboard corpus.
Read more about the system here (PDF).
…Progress on speech recognition has been quite rapid here, with IBM and Microsoft fiercely competing with each other to set new standards, presumably because they want to sell speech recognition systems to large-scale customers, while – and this is pure supposition on my part – Amazon and Google plan to sell theirs via API and are less concerned with the PR battle.
…A quick refresher on error rates on switchboard.
…August 2017: Microsoft: 5.1%*
…March 2017: IBM: 5.5%
…October 2016: Microsoft: 5.9%**
…September 2016: Microsoft: 6.3%
…April 2016: IBM: 6.9%.
…*Microsoft claims parity with human transcribers, though wait for external validation of this.
…**Microsoft claimed parity with human transcribers, though turned out to be an inaccurate measure.
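For reference, the error rates above are word error rates: the word-level edit distance between hypothesis and reference transcripts, divided by the length of the reference. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between ref seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1] / len(ref)
```

So a transcript with one wrong word out of twenty scores 5% – the neighborhood all the systems above are fighting in.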

Ultimate surveillance: AI to recognize you simply by the way you walk:
We’re slowly acclimatizing to the idea that governments will use facial recognition technologies widely across their societies – in recent years the technology has expanded from police and surveillance systems into border control checkpoints and now, in places like China, into public spaces like street crossings, where AI spots repeat offenders committing relatively minor crimes like jaywalking, or crossing against a light.
…Activists already wear masks or bandage their faces to try to stymie these efforts. Some artists have even proposed daubing on certain kinds of makeup that stymie facial recognition systems (a fun, real world demonstration of the power of adversarial examples).
…Now, researchers with Masaryk University in the Czech Republic propose using video surveillance systems to identify a person, infer their own specific gait, then search for that gait across other security cameras.
…”You are how you walk. Your identity is your gait pattern itself. Instead of classifying walker identities as names or numbers that are not available in any case, a forensic investigator rather asks for information about their appearances captured by surveillance system – their location trace that includes timestamp and geolocation of each appearance. In the suggested application, walkers are clustered rather than classified. Identification is carried out as a query-by-example,” the researchers write.
How it works: The system takes input from a standard RGB-D camera (the same as those found in the Kinect – now quite widely available) then uses motion capture technology to derive the underlying structure of the person’s movements. Individual models of different people’s gaits are learned through a combination of Fisher’s Linear Discriminant Analysis and Maximum Margin Criterion (MMC).
How well does it work: Not hugely well, so put the tinfoil hats down for now. But as many research groups are doing work on gait analysis and identification as part of large-scale video understanding projects, I’d expect the basic components that go into this sort of project to improve over time.
…Read more: You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data.
…Bonus question: Could techniques such as this spot Keyser Soze?
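The query-by-example step the researchers describe boils down to nearest-neighbor search over gait templates. A minimal sketch with plain Euclidean distance – the paper learns the feature space with Fisher’s LDA and MMC, which this deliberately omits, and the templates here are just illustrative vectors:

```python
import numpy as np


def query_by_example(gallery, probe):
    """Return the index of the gallery gait template nearest the probe.
    In the real system the templates come from MoCap joint trajectories
    projected through a learned (LDA/MMC) metric; here they are plain
    feature vectors compared with unlearned Euclidean distance."""
    dists = np.linalg.norm(gallery - probe, axis=1)
    return int(np.argmin(dists))
```

The key property is that no class labels are needed: a new walker just becomes a new row in the gallery, and identification is clustering plus lookup.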

Review article: just what the heck has been happening in deep reinforcement learning?
…Several researchers have put together a review paper, analyzing progress in deep RL. Deep RL is a set of techniques that have underpinned recent advances in getting AI systems to control and master computer games purely from pixel inputs, and to learn useful behaviors on robots (real and simulated), along with other applications.
…If some of what I just wrote was puzzling to you, then you might benefit from reading the paper here: A Brief Survey of Deep Reinforcement Learning.
…Everyone should read the conclusion of the piece: “Whilst there are many challenges in seeking to understand our complex and everchanging world, RL allows us to choose how we explore it. In effect, RL endows agents with the ability to perform experiments to better understand their surroundings, enabling them to learn even high-level causal relationships. The availability of high-quality visual renderers and physics engines now enables us to take steps in this direction, with works that try to learn intuitive models of physics in visual environments. Challenges remain before this will be possible in the real world, but steady progress is being made in agents that learn the fundamental principles of the world through observation and action.”

Robots with manners (and ‘caresses’):
…In some parts of Northern England it’s pretty typical that you greet someone – even a stranger – by wandering up to them, slapping them on the arm, and saying ‘way-eye’. In London, if you do that people tend to stare at you with a look of frightfully English panic, or call the police.
…How do we make sure our robots don’t make these sorts of social faux pas? An EU-JAPAN project called ‘CARESSES’ is trying to solve this, by creating robots that pay attention to the cultural norms of the place they’re deployed in.
…The project so far consists of a set of observations about how robots can integrate behaviors that account for cultural differences, and includes three different motivating scenarios, created through consultation with a transcultural nurse. These include having the robot minimize uncertainty when talking to someone from Japan, or checking how deferential it should be with a Greek Cypriot.
…Components used: the system runs on the ‘universeAAL’ platform, an EU AI framework project, and integrates with ‘ECHONET’, a Japanese standard for home automation.
…Read more for (at this stage) mostly a list of possible approaches. In a few years it’s likely that various current research avenues in deep reinforcement learning could be integrated into robot systems like the ones described within:
The CARESSES EU-Japan project: making assistive robots culturally competent.

Microsoft has a Brainwave with FPGAs specialized for AI:
…Moore’s Law is over – pesky facts of reality, like the breakdown of Dennard scaling, or the materials science properties of silicon, are putting a brake on progress in traditional chip architectures. So what’s an ambitious company with plans for AI domination to do? The answer, if you’re Google, is to create an application specific integrated circuit (ASIC) with certain AI capabilities baked directly into the logic of the chip – that’s what the company’s Tensor Processing Units (TPUs) are for.
…Microsoft is taking a different tack with ‘Project Brainwave’, an initiative to use field programmable gate arrays for AI processing, with a small ASIC-esque component baked onto each FPGA. The bet here is that though FPGAs tend to be less efficient than ASICs, their innate flexibility (field programmable means you can modify the logic of the chip after it has been fabbed and deployed in a data center) means Microsoft will be able to adapt them to new workloads as rapidly as new AI components get invented.
…The details: Microsoft’s chips contain a small hardware accelerator element (similar to a TPU though likely broader in scope and with less specific performance accelerations), and a big block of undifferentiated FPGA infrastructure.
…The bet: Google is betting that it’s worthwhile to optimize chips for current basic AI operations, trading off flexibility for performance, while Microsoft is betting that flexibility will matter more. Developments in AI research, and their relative rate of occurrence, will make one of these strategies succeed and the other struggle.
…Read more about the chips here, and check out the technical slide presentation.

All hail the software hegemony:
…Saku P – a VR programmer with idiosyncratic views on pretty much everything – has a theory that Amazon represents the shape of most future companies: a large software entity that scales itself by employing contractors for its edge business functions (aka, dealing with actual humans in the form of delivering goods), while using its core business to build infrastructure that enables secondary and tertiary businesses.
…Play this tape forward and what you get is an economy dominated by a few colossal technology companies, likely spending vast sums on building technical vanity projects that double as strategic business investments (see: Facebook’s various drone schemes, Google’s ‘net infrastructure everywhere push, Jeff Bezos pouring his Amazon-derived wealth into space company Blue Origin, and so on.).
…Read more here: How big will companies be in the 21st Century?

Tech Tales:

[2027: Kexingham Green, a council estate in the outer-outer exurban sprawl of London, UK. Beyond the green belt, where new grey social housing towns rose following the greater foreign property speculation carnival of the late twenty-teens. A slab of housing created by the government’s ‘renew and rehouse from the edge’ campaign, housing tens of thousands of souls, numerous chain supermarkets, and many now derelict parking lots.]

Durk Ciaran, baseball cap on backwards and scuffed Yeezys on his feet paired with a pristine – starched? – Arsenal FC sponsored by Alphabet (™) shirt, regarded the crowd in front of him. “Ladies and gentlemen and drones let me introduce to you the rawest, most blinged out, most advanced circus in all of Kex-G – Durk’s Defiant Circus!”
…”DDC! DDC! DDC!,” yell the crowds.
…”So let’s begin,” Durk says, sticking two fingers in his mouth and letting out a long whistle. A low, hockeypuck ex-warehouse drone hisses out of a pizza box at the edge of the crowd and moves towards Durk, who without looking raises one foot as the machine slides under it, then another, suddenly standing on the robot. Durk begins to move in a long circle, spinning slightly on the ‘bot. “Alright,” he says, “Who’s hungry?”
…”The drones!” yells the crowd.
…”Please,” Durk says, “Pigeons!” A ripple of laughter. He takes a loaf of bread out of his pocket and holds it against the right side of his torso with his elbow, using his left hand to pull doughy chunks of it, placing them in his right hand. Once he’s got a fistful of bread chunks he takes the bread and puts it back in his pocket. “Are we ready?” he says.
…”Yeahh!!!!!” yell the crowd.
…”Alright, bless up!” he says, tossing the chunks of bread in the air. And out of a couple of ragged tents at the edge of the parking lot come the drones, fizzing out, grey, re-purposed Amazon Royal Mail (™) delivery drones, now homing in on the little trackers Durk baked into the bread the previous evening. The drones home in on the bread and then their little fake pigeon mouths snap open, gulping down the chunks, slamming shut again. A small hail of crumbs falls on the crowd, who go wild.
…But there’s a problem with one of the drones – one of its four propellers starts to emit a strange, low-pitched juddering hum. Its flight angle changes. The crowd start to worry, audible groans and ‘whoas’ flood out of them.
…”Now what’s gonna happen to this Pigeon?” Durk says, looking up at the drone. “What’s it gonna do?” But he knows. He thumbs a button on what looks superficially like a bike key on his pocket key fob. Visualizes in his head what will soon become apparent to the crowd. Listens to the drone judder. He closes his eyes, spinning on the re-purposed warehouse bot, listening to the crowd as they chatter to themselves, some audibly commenting on others craning their heads. Then he hears the sighs. Then the “look, look!”. Then the sound of a kid crying slightly. “What’s going on Mummy what is THAT?”
…It comes in fast, from a great distance. Launches off of a distant towerblock. Dark, military-seeming green. A carrier drone. Re-purposed Chinese tech, originally used by the PLA to drop supplies across Africa as part of a soft geopolitical outreach program, now sold in black electronics markets around the world. Cheap transport, no questions asked. Durk looks at it now. Sees the great, Eagle-like eyes spraypainted on the side of its front. The carrier door fitted with 3D-printed plastic to form a great yellow beak. Green Eagle DDC stenciled on one of its wings, facing up so the crowd can’t see it but he can. It opens its mouth. The small, grey Pigeon drone tries to fly away but can’t, its rotor damaged. Green Eagle comes in and with a metal gulp eats the drone whole, its yellow mouth snapping shut, before arcing up and away.
…”The early bird gets the worm,” Durk says. “But you need to think about the thing that likes to eat the early birds. Now thank you ladies and gentlemen and please – make a donation to the DDC, crypto details in the stream, or here.” He snaps his fingers and a lengthy set of numbers and letters appears in LEDs on the sidewalk. “Now, goodbye!” he says, thumbing another button in his pocket, letting his repurposed warehouse drone carry him towards one of the towerblocks, hiding him back into the rarely surveilled Kexingham estate, just before the police arrive.

Ideas that inspired this story:
Drones, DJI, deep reinforcement learning, Amazon Go, Kiva Systems, AI as geopolitical power, Drones as geopolitical power, Technology as the ultimate lever in soft geopolitical power, land speculators.

…Tech Tales Coda:

Last week I wrote about a Bitcoin mine putting its chip-design skills to use to create AI processors and spin up large-scale AI processing mines.
…Imagine my surprise when I stumbled on this Quartz story a few hours after sending the newsletter: Chinese company Bitmain has used its chip-design skills to create a new set of AI processors and to spin up an AI processing wing of its business. Spooky!

Import AI: Issue 56: Neural architecture search on a budget, Google reveals how AI can improve its ad business, and a dataset for building personal assistants

New dataset: Turning people into personal assistants — for SCIENCE…
As AI researchers look to build the next generation of personal assistants, there’s an open question as to how these systems should interact with people. Now, a new dataset and research study from Microsoft aims to provide some data about how humans and machines could work together to solve information-seeking problems.
The dataset consists of 22 pairs of people (questioners and answerers), who each spent around two hours trying to complete a range of information-seeking tasks. The questioner has no access to the internet themselves, but can speak to the answerer, who has access to a computer with the internet. The questioner asks some pre-assigned questions, like “I’ve been reading about the HPV vaccine, how can I get it?” or “I want to travel around America seeing as much as possible in three months without having to drive a vehicle myself, what’s the best route using public transit I should take?”. The answerer plays the role of a modern Google Now/Cortana/Siri and uses a web browser to find out more information, asking clarifying questions of the other person when necessary. This human-to-human dataset is designed to capture some of the weird and wacky ways people try to get answers to questions.
…You can get the full Microsoft Information Seeking Conversations (MISC) dataset from here.
…Find out more information in the research paper, MISC: A dataset of information-seeking conversations.
…”We hope that the MISC data can be used to support a range of investigations, including, for example: understanding the relationship between intermediaries’ behaviours and seekers’ satisfaction; mining seekers’ behavioural signals for correlations with success, engagement, or satisfaction; examining the tactics used in conversational information retrieval and how they differ from tactics in other circumstances; the importance of conversational norms or politeness; or investigating the relationship between conversational structure and task progress,” they write.

Sponsored: The AI Conference – San Francisco, Sept 17-20:
Join the leading minds in AI, including Andrew Ng, Rana el Kaliouby, Peter Norvig, Jia Li, and Michael Jordan. Explore AI’s latest developments, separate what’s hype and what’s really game-changing, and learn how to apply AI in your organization right now.
Register soon. Space is limited. Save an extra 20% on most passes with code JCN20.

Number of the week: 80 EXABYTES:
…That’s the size of the dataset of heart ultrasound videos shared by Chinese authorities with companies participating in a large-scale digital medicine project in Fuzhou, a city of around 7 million people. (For comparison, the 2014 ImageNet competition dataset clocked in at about 200 gigabytes, aka 0.2 terabytes, aka 0.0000002 exabytes.)
…Read more in this good Bloomberg story about how China is leveraging its massive stores of data to spur its AI economy: China’s Plan for World Domination in AI Isn’t So Crazy After All.

Bonus number of the week: 4.5 million:
That’s (roughly) the number of transcribed speeches in a dataset just published by researchers with Clemson University and the University of Essex. The dataset covers speeches given in the Irish parliament between 1919 and 2013.
…There’ll be a wealth of cool things that can be developed with such a dataset. As a preliminary example, the researchers try to predict the policy positions of Irish finance ministers by analyzing their speeches over time in the parliament. You could also try to use the dataset to analyze the discourse of all speakers in the same temporal cohort, then model how their positions change relative to each other and their starting points over time. For bonus points, train a language model to generate your own Irish political arguments?
…Read more here: Database of Parliamentary Speeches in Ireland (1919 – 2013). 
…Get the data here from the Harvard Dataverse.

The growing Amazon Web Services AI Cloud:
…Amazon, which operates the largest cloud computing service in AWS, is beginning to thread machine learning capabilities throughout its many services. The latest? Macie, a ML service that trawls through files stored in AWS, using machine learning to look for sensitive data (personally identifiable information, intellectual property, etc) in a semi-supervised way. Seems like RegEx on steroids.
Read more here about Amazon Macie.
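To make the “RegEx on steroids” quip concrete, here’s what the rules-only baseline looks like. The patterns and function names below are illustrative – Macie’s actual classifiers are proprietary and layer ML on top of rules like these:

```python
import re

# Illustrative PII patterns only -- not Macie's actual rule set.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_for_pii(text):
    """Return {label: [matches]} for every pattern that fires on the text."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

The “steroids” part is everything this sketch can’t do: ranking findings by business context, learning what normal access looks like, and flagging anomalies rather than just string matches.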

AI matters; matter doesn’t:
…A Chinese company recently released Eufy, a cute hockey puck shaped personal speaker/mic system that runs Amazon’s ‘Alexa’ operating system. Amazon is letting people build different types of hardware that can connect to its fleet of proprietary Alexa AI services – a clear indication that Amazon thinks its underlying AI software is strategic, while hardware (like its own ‘Echo’ systems) is just a vessel.
…Read more here: This company copied the Amazon Dot and will sell for less – with Amazon’s blessing.

Making computer dreams happen in high-resolution:
…Artist Mike Tyka has spent the past few months trying to scale up synthetic images dreamed up by neural networks. It’s a tricky task because today it’s infeasible to generate images at resolutions much beyond about 256x256 pixels, due to RAM/GPU and other processing constraints.
…In a great, practical post Tyka describes some of the steps he has taken to scale up the various images, generating large, freaky portraits of imaginary people. There’s also an excellent ‘insights’ section where he talks about some of the commonsense bits of knowledge he has gained from this experiment. Also, check out the latest images. “Getting better skin texture but hair seems to have gotten worse,” he writes.
Read more: Superresolution with semantic guide.

Psycho (Digital) Filler, Qu’est-ce que c’est?
…Talking Heads frontman David Byrne believes technology is making each of us more alone and more atomized by swapping out humans in our daily lives for machines (tellers for ATMs, checkout clerks for checkout scanners, drivers for self-driving software, delivery drivers for drones and landbots, and so on).
…”Our random accidents and odd behaviors are fun—they make life enjoyable. I’m wondering what we’re left with when there are fewer and fewer human interactions. Remove humans from the equation, and we are less complete as people and as a society,” he writes.
…Read more here in: Eliminating the Human

Google reveals way to better predict click-through-rate for web adverts:
…Google is an AI company whose main business is advertising, so it’s notable to see the company publish a technical research paper at the intersection of the two areas, defining a new AI technique that it says can lead to substantially better predictions of click-through-rates for given adverts. (To get an idea of how core this topic is to Google’s commercial business, think of this paper as being equivalent to Facebook publishing research on improving its ability to predict which actions friends can take that will turn a dormant account into an active one, or Kraft Foods coming up with a better, cheaper, quicker-to-cook instant cheese.)
…The paper outlines “the Deep & Cross Network (DCN) model that enables Web-scale automatic feature learning with both sparse and dense inputs.” This is a new type of neural network component that is potentially far better and simpler at learning the sorts of patterns that advertising companies are interested in. “Our experimental results have demonstrated that with a cross network, DCN has lower logloss than a DNN with nearly an order of magnitude fewer number of parameters,” they write.
How effective is it? In tests, DCN systems get the best scores while being more computationally efficient than other systems, Google says. The implications of the results seem financially material to any large-scale advertising company. “DCN outperforms all the other models by a large amount. In particular, it outperforms the state-of-art DNN model but uses only 40% of the memory consumed in DNN,” Google writes. The company also tested the DCN system on non-advertising datasets, noting very strong performance in these domains as well, implying significant generality of the approach.
Read more here: Deep & Cross Network for Ad Click Predictions. 
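The core of the paper is the cross layer, whose recurrence is simple enough to sketch in numpy. This is a toy single-example version – the real model batches this and runs a deep network in parallel with the cross network:

```python
import numpy as np


def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl.
    Each layer raises the degree of explicit feature crosses by one
    while adding only 2*d parameters (w and b are both d-dimensional),
    which is where the memory savings over a plain DNN come from."""
    return x0 * np.dot(xl, w) + b + xl


def cross_network(x0, weights, biases):
    """Stack cross layers; depth bounds the degree of feature interactions."""
    xl = x0
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl
```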

Neural architecture search on a pauper’s compute budget:
…University of Edinburgh researchers have outlined SMASH, a system that makes it substantially cheaper to use AI to search through possible neural network architectures, while trading off only a small amount of accuracy.
Resources: SMASH can be trained on a handful of GPUs, or even a single one, whereas traditional neural architecture search approaches from Google and others can require 800 GPUs or more.
…The approach relies on randomly sampling neural network architectures, then using an auxiliary network (in this case a HyperNetwork) to generate the weights of the dreamed-up network, then using backpropagation to train the network in an end-to-end way. The essential gamble in this approach is that the space of networks being sampled from is sufficiently broad, and that the parameters dreamed up by the HyperNet will map relatively closely to the sorts of parameters you’d use in such generated classifiers. This sidesteps some of the costs inherent to large-scale NAS systems, but at the cost of accuracy.
…SMASH uses a “memory-bank” view of neural networks to sample them. In this view “each layer [in the neural network] is thus an operation that reads data from a subset of memory, modifies the data, and writes the result to another subset of memory.”
…Armed with this set of rules, SMASH is able to generate a large range of modern neural network components on the fly, helping it efficiently dream up a variety of networks, which are then evaluated by the hypernetwork. (To get an idea of what this looks like in practice, refer to Figure 3 in the paper.)
…The approach seems promising. In experiments, the researchers saw meaningful links between the validation loss predicted by SMASH for given networks, and the actual loss seen when testing in reality. In other tests they find that SMASH can generate networks with performance approaching the state-of-the-art, at a fraction of the compute budget of other systems. (And, most importantly, without requiring AI researchers to fry their brains for months to invent such architectures.)
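The search loop reduces to: sample architectures, get their weights from a surrogate, and rank by loss. A heavily simplified sketch, in which the trained HyperNet is replaced by a fixed deterministic weight generator and the memory-bank view is collapsed to plain layer widths (all names and shapes here are illustrative):

```python
import numpy as np


def hyper_weights(shape, seed):
    """Stand-in for the HyperNet: a deterministic map from a layer's shape
    to a weight matrix. SMASH trains this mapping end-to-end instead."""
    return np.random.default_rng(seed).standard_normal(shape) * 0.1


def forward(x, arch, in_dim, out_dim):
    """Run a sampled architecture (a list of hidden widths) with
    hypernet-supplied weights -- no per-architecture training."""
    dims = [in_dim] + arch + [out_dim]
    h = x
    for i, (a, b) in enumerate(zip(dims[:-1], dims[1:])):
        h = np.maximum(h @ hyper_weights((a, b), seed=a * 1009 + b * 31 + i), 0.0)
    return h


def smash_search(x, y, candidates, in_dim, out_dim):
    """Rank candidate architectures by proxy loss and return the best --
    the one-shot evaluation loop at the heart of SMASH."""
    def proxy_loss(arch):
        return float(((forward(x, arch, in_dim, out_dim) - y) ** 2).mean())
    return min(candidates, key=proxy_loss)
```

The paper’s key empirical claim is that this proxy ranking correlates with the ranking you’d get by actually training each candidate, which is what makes the shortcut worthwhile.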
…Read more here: SMASH: One-Shot Model Architecture Search through HyperNetworks.
Explanatory video here.
…Components used: PyTorch
…Datasets tested on: CIFAR-10 / CIFAR-100 / ImageNet 32 / STL-10 / ModelNet

A portfolio approach to AI Safety Research:
…(Said with a hint of sarcasm:) How do we prevent a fantastical future superintelligence from turning the entirety of the known universe into small, laminated pictures of the abstract dreams within its God-mind? One promising approach is AI safety! The thinking is that if we develop more techniques today to make agents broadly predictable and safe, then we have a better chance at ensuring we live in a future where our machines work alongside and with us in ways that seem vaguely interpretable and sensible to us.
…But how do we achieve this? DeepMind AI safety researcher Victoria Krakovna has some ideas, which loosely come down to ‘don’t put all your eggs in one basket’, and which she has outlined in a blog post.
…Read more here: A portfolio approach to AI safety research.
…Get Rational about AI Safety at CFAR!
…The Center for Applied Rationality has opened up applications for its 2017 AI Summer Fellows Program, which is designed to prepare eager minds for working on the AI Alignment Problem (the problem is regularly summarized by some people in the community as getting a computer to go and bring you a strawberry without it also carrying out any actions that have gruesome side effects.)
You can read more and apply to the program here.

Chinese AI chip startup gets $100 million investment:
…Chinese chip startup Cambricon has pulled in $100 million in a new investment round from a fund linked to the Chinese government’s State Development and Investment Corp, as well as funding from companies like Alibaba and Lenovo.
…The company produces processors designed to accelerate deep learning tasks.
…Read more on the investment in China Money Network.
…Cambricon’s chips ship with a proprietary instruction set designed for a range of neural network operations, with reasonable performance across around ten distinct benchmarks. The chips can also be fabricated via TSMC’s venerable 65nm process node, which means they are relatively cheap and easy to manufacture at scale.
…More information here: Cambricon: An Instruction Set Architecture for Neural Networks.

Facial recognition at the Notting Hill Carnival in the UK:
…The UK’s Metropolitan Police will conduct a large-scale test of facial recognition this month when they use the tech to surveil the hordes of revelers at the Notting Hill Carnival street party in London. Expect to see a lot of ML algorithms get confused by faces occluded by jerk chicken, cans of Red Stripe, and personal cellphones used for selfies.
…Read more here: Met police to use facial recognition software at Notting Hill carnival.

Automation’s connection to politics, aka, Republicans live near more robots than Democrats:
…The Brookings Institution has crunched data from the International Federation of Robotics to figure out where industrial robots are deployed in America. The results highlight the uneven distribution of the technology.
State with the most robots: Michigan, ~28,000, around 12 percent of the nation’s total.
Most surprising: Could the distribution of robots tell us a little bit about the conditions in the state and allow us to predict certain political moods? Possibly! “The robot incidence in red states that voted for President Trump in November is more than twice that in the blue states that voted for Hillary Clinton,” Brookings writes.
…Read more here: Where the robots are.

OpenAI Bits & Pieces:

Exponential improvement and self-play:
…We’ve published some more details about our Dota 2 project. The interesting thing to me is the implication that if you combine a small amount of human effort (creating experimental infrastructure, structuring your AI algorithm to interface with the environment, etc) with a large amount of compute, you can use self-play to rapidly go from sub-human to super-human performance within certain narrow domains. A taste of things to come, I think.
…Read more here: More on Dota 2.

OpenAI co-founder and CTO Greg Brockman makes MIT 35 under 35:
…Greg Brockman has made it onto MIT Technology Review’s 35 under 35 list due to his work at OpenAI. Congrats Greg “visionary” Brockman.
…Read more here on the MIT Technology Review.

Move outta the way A2C and TRPO, there’s a new ACKTR in town:
…OpenAI has released open source code for ACKTR, a new algorithm by UofT/NYU that demonstrates tremendous sample efficiency and works on both discrete and continuous tasks. We’ve also released A2C, a synchronous version of A3C.
…Read more here: OpenAI Baselines: ACKTR & A2C.

Tech Tales:

[2028: A large data center complex in China]

Mine-Matrix Derivatives(™), aka MMD, sometimes just M-D, the world’s largest privately-held bitcoin company, spent a billion dollars on the AI conversion in year one, $4 billion in year two, $6 billion in year three, and then more. Employees were asked to sign vast, far-reaching NDAs in exchange for equity. Those who didn’t were fired or otherwise pressured to leave. What remained was a group of people held together by mutually agreed upon silence, becoming like monks tending to cathedrals. The company continued to grow its cryptocurrency business providing the necessary free cash flow to support its AI initiative. Its workers turned their skills from designing large football-field sized computer facilities to mine currencies, to designing equivalent housings for AIs.

The new processing system, code-named Olympus, had the same features of security and anonymity native to MMD’s previous cryptocurrency systems, as well as radically different processing capacities. MMD began to carry out its own fundamental AI research, after being asked to make certain optimizations for clients that required certain theoretical breakthroughs.

One day, a Russian arrived: a physicist specializing in thermodynamics. He had washed out of some Russian government project, one of MMD’s employees said. More like drunked out, said another. Unstable, remarked someone else. The Russian walked around the Olympus datacenters wearing dark glasses treated with chemical and electrical components that let him accurately see the minute variations in heat, allowing him to diagnose the facility. Two days later he had a plan and, in one of the company’s innermost meeting rooms, outlined his ideas using pencil on paper.
These walls, he said, Get rid of them.
This space, he said, Must be different.
The ceiling, he said, Shit. You must totally replace.

MMD carried out renovations based on the Russian’s suggestions. The paper map is sealed in plastic and placed in a locked safe at an external facility, to be included in the company’s long-term archives.

The plan works: into the vacant spaces created by the Russian’s renovations come more computers. More powerful ones, built on different processing substrates. New networking equipment is installed to help shuttle data around the facility. Though from the outside it appears like any other large CryptoFarm, inside, things exist that do not exist anywhere else. The demands from MMD’s clients become more elaborate. More computers are installed. One winter morning an encrypted call comes in, offering larger amounts of money for the creation of an underground, sealed data center. MMD accepts. Continues.

MMD didn’t exactly disappear after that. But it did go on a wave of mergers and acquisitions in which it added, in no particular order: an agricultural equipment maker, a bowling ball factory, a (self-driving) trucking company, a battery facility, two sportswear brands, and more. Some of these businesses were intended to be decoys to its competitors and other interested governments, while others represented its true intentions.

They say it’s building computers on the moon, now.

Technologies that inspired this story: data centers, free air cooling, this Quartz article about visiting a Bitcoin mine.

Import AI: Issue 55: Google reveals its Alphabet-wide optimizer, Chinese teams notch up another AI competition win, and Facebook hires hint at a more accessible future

Welcome to the hybrid reasoning era… MIT scientists teach machines to draw images and to show their work in the process:
…New research from MIT shows how to fuse deep learning and program synthesis to create a system that can translate hand-drawn mathematical diagrams into their digital equivalents – and generate the program used to draw them in the digital software as well.
…”Our model constructs the trace one drawing command at a time. When predicting the next drawing command, the network takes as input the target image as well as the rendered output of previous drawing commands. Intuitively, the network looks at the image it wants to explain, as well as what it has already drawn. It then decides either to stop drawing or proposes another drawing command to add to the execution trace; if it decides to continue drawing, the predicted primitive is rendered to its “canvas” and the process repeats,” they say.
…Read more in: Learning to Infer Graphics Programs from Hand Drawn Images.
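…The predict–render–repeat loop the authors describe can be mocked up without any neural network at all: replace the learned command-proposer with a greedy search that picks whichever drawing command most reduces the mismatch between the rendered canvas and the target image, and stops when nothing helps. Everything below (the two line primitives, the 16x16 canvas) is invented for illustration:

```python
import numpy as np

SIZE = 16

def render(commands, size=SIZE):
    # Execute a drawing trace on a blank canvas.
    canvas = np.zeros((size, size), dtype=bool)
    for kind, a, b in commands:
        if kind == "hline":
            canvas[a, :b] = True       # row a, columns 0..b-1
        else:
            canvas[:b, a] = True       # column a, rows 0..b-1
    return canvas

# Target image secretly produced by a short program.
target = render([("hline", 3, 12), ("vline", 7, 10)])

def all_commands():
    for kind in ("hline", "vline"):
        for a in range(SIZE):
            for b in range(1, SIZE + 1):
                yield (kind, a, b)

# Greedy stand-in for the learned proposer: at each step, pick the
# command that most reduces canvas/target mismatch, or decide to stop.
trace = []
while True:
    err = np.sum(render(trace) != target)
    cmd = min(all_commands(),
              key=lambda c: np.sum(render(trace + [c]) != target))
    if np.sum(render(trace + [cmd]) != target) >= err:
        break                          # nothing helps: "stop drawing"
    trace.append(cmd)

print(trace)
```

The network in the paper plays the role of the `min(...)` step, looking at the target and its own canvas to propose the next command; the recovered `trace` is the inferred graphics program.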

Baidu/Google/Stanford whiz Andrew Ng is back with… an online deep learning tuition course:
…Andrew Ng has announced the first of three secret projects: a deep learning course on the online education website Coursera.
…The course will be taught in Python and TensorFlow (perhaps raising eyebrows at Ng’s former employer Baidu, given that the company is trying to popularize its own TF-competitor ‘Paddle’ framework).
Find out more about the courses here.
…Bonus Import AI ‘redundant sentence of the week’ award goes to Ng for writing the following: ‘When you earn a Deep Learning Specialization Certificate, you will be able to confidently put “Deep Learning” onto your resume.’

US military seeks AI infusion with computer vision-based ‘Project Maven’:
…The US military wants to use ML and deep learning techniques for computer vision, helping it autonomously extract, label, and triage data gathered by its signals intelligence systems in support of its various missions.
…”We are in an AI arms race”, said one official. The project is going to run initially for 36 months during which time the government will try to build its own AI capabilities and work with industry to develop the necessary expertise. “You don’t buy AI like you buy ammunition,” they said.
…Bonus: Obscure government department name of the week:
…’the ‘Algorithmic Warfare Cross-Function Team’
…Read more in the DoD press release ‘Project Maven to Deploy Computer Algorithms to War Zone by Year’s End.’
…Meanwhile, the US secretary of defense James Mattis toured Silicon Valley last week, telling journalists he worried the government was falling behind in AI development. “It’s got to be better integrated by the Department of Defense, because I see many of the greatest advances out here on the West Coast in private industry,” he said.
…Read more in: Defense Secretary James Mattis Envies Silicon Valley’s AI Ascent.

Sponsored Job: Facebook builds breakthrough technology that opens the world to everyone, and our AI research and engineering programs are a key investment area for the company. We are looking for a technical AI Writer to partner closely with AI researchers and engineers at Facebook to chronicle new research and advances in the building and deployment of AI across the company. The position is located in Menlo Park, California.
Apply Here.

Q: Who optimizes the optimizers?
A: Google’s grand ‘Vizier’ system!
…Google has outlined ‘Vizier’, a system developed by the company to automate optimization of machine learning algorithms. Modern AI systems, while impressive, tend to require the tuning of vast numbers of hyperparameters to attain good  performance. (Some AI researchers refer to this process as ‘Grad Student Descent’.)
…So it’s worth reading this lengthy paper from Google about Vizier, a large-scale optimizer that helps people automate this process. “Our implementation scales to service the entire hyperparameter tuning workload across Alphabet, which is extensive. As one (admittedly extreme) example, Collins et al. [6] used Vizier to perform hyperparameter tuning studies that collectively contained millions of trials for a research project investigating the capacity of different recurrent neural network architectures,” the researchers write.
…The system can be used to both tune systems and to optimize others via transfer learning – for instance by tuning the learning rate and regularization of one ML system, then running a second smaller optimization job using the same priors but on a different dataset.
…Notable: for experiments which run into the 10,000+ range Vizier supports standard RANDOMSEARCH and GRIDSEARCH technologies as well as a “proprietary local search algorithm” with tantalizing performance properties judging by the graphs.
…Read more about the system in Google Vizier: A Service for Black-Box Optimization (PDF).
Reassuringly zany experiment: Skip to the end of the paper to learn how Vizier was used to run a real world optimization experiment in which it iteratively optimized (via Google’s legions of cooking staff) the recipe for the company’s chocolate chip cookies.  “The cookies improved significantly over time; later rounds were extremely well-rated and, in the authors’ opinions, delicious,” they write.
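…Stripped to its essentials, the service pattern Vizier implements is: clients ask for suggested hyperparameters, run a trial, and report the result back. Here is a minimal sketch of the RANDOMSEARCH policy over a toy search space (the space and objective below are hypothetical; Vizier’s real value lies in its scalable service architecture and its smarter Bayesian policies):

```python
import random

def suggest(space, rng):
    # RANDOMSEARCH policy: sample each hyperparameter uniformly.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}

def optimize(objective, space, trials=200, seed=0):
    rng = random.Random(seed)
    best_params, best_val = None, float("inf")
    for _ in range(trials):
        params = suggest(space, rng)   # client asks for a suggestion
        val = objective(params)        # client runs one trial
        if val < best_val:             # service records the result
            best_params, best_val = params, val
    return best_params, best_val

# Hypothetical stand-in for a training run's validation loss.
space = {"lr": (1e-4, 1.0), "l2": (0.0, 0.1)}
loss = lambda p: (p["lr"] - 0.1) ** 2 + (p["l2"] - 0.01) ** 2
best, val = optimize(loss, space)
print(best, val)
```

Swapping `suggest` for a grid enumerator gives GRIDSEARCH; the transfer-learning trick amounts to warm-starting the search from the results of a previous study.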

Chinese teams sweep ActivityNet movement identification challenge, beating originating dataset team from DeepMind, others:
…ActivityNet is a challenge to recognize high-level concepts and activities from short video clips found in the wild. It incorporates three datasets: ActivityNet (VCC, KAUST), ActivityNet Captions (Stanford), and Kinetics (DeepMind). Challenges like this pose some interesting research problems (how to infer fairly abstract concepts like ‘walking the dog’ from unlabelled and labelled videos), and are also eminently applicable by various security apparatuses – none of this research exists in a vacuum.
…This year’s ActivityNet challenge was won by a team from Tsinghua University and Baidu, whose system had a top-5 accuracy (suggest five labels; one of them is correct) of 94.8% and a top-1 accuracy of 81.4%. Second place was won by a team from the Chinese University of Hong Kong, ETH Zurich, and the Shenzhen Institute of Advanced Technology, with a top-5 of 93.5% and a top-1 of 78.6%. German AI research company TwentyBN took third place and DeepMind’s team took fourth.
…Read more about the results in this post from TwentyBN: Recognizing Human Actions in Videos.
…Progress here has been quite slow at the high-end though (because the problem is extremely challenging): last year’s winning top-1 accuracy was 93.23% from CUHK/ ETHZ / SIAT.
…This year’s results follow a wider pattern of Chinese teams beginning to rank highly in competitions relating to image and video classification; other Chinese teams swept the ImageNet and WebVision competitions this year. It’s wonderful to see the manifestation of the country’s significant investment in AI and the winners should be commended for a tendency to publish their results as well.

Salesforce sets new language modeling record:
… Welcome to the era of modular, Rube Goldberg machine AI…
…Research from Salesforce in which the team attains record-setting perplexity scores on Penn Treebank (52.8) and WikiText-2 (52.0) via the use of what they call a weight-dropped LSTM – a rather complicated system combining numerous recent inventions, ranging from DropConnect to Adam to randomized-length backpropagation through time, to activation regularization and temporal activation regularization. The result of this word salad of techniques is a record-setting system.
…The research highlights a trend in modern AI development of moving away from trying to design large, end-to-end general systems (though I’m sure everyone would prefer it if we could build these) and instead focusing on eking out gains and new capabilities by assembling and combining together various components, developed by the concerted effort of many hundreds of researchers in recent years.
…The best part of the resulting system? It can be dropped into existing systems without needing any underlying modification of fundamental libraries like CuDNN.
…Read more here: Regularizing and Optimizing LSTM Language Models.
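…The ‘weight-dropped’ part is just DropConnect applied to the recurrent (hidden-to-hidden) weight matrix rather than to activations, with one mask sampled per forward pass. A toy numpy sketch, with all shapes invented and a bare tanh cell standing in for the LSTM:

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_drop(W, p, rng):
    # DropConnect: zero individual weights (not activations), rescaling
    # the survivors just as ordinary dropout rescales units.
    mask = rng.random(W.shape) >= p
    return W * mask / (1.0 - p)

# Toy recurrent cell; only the hidden-to-hidden matrix is weight-dropped.
hidden, p = 4, 0.5
W_hh = rng.normal(size=(hidden, hidden))
W_xh = rng.normal(size=(hidden, 3))

# One mask is sampled per forward pass and reused at every timestep,
# so the same recurrent connections stay severed across the sequence.
W_dropped = weight_drop(W_hh, p, rng)
h = np.zeros(hidden)
for _ in range(5):
    x = rng.normal(size=3)
    h = np.tanh(W_dropped @ h + W_xh @ x)
print(h)
```

Because the mask is applied to the weight matrix before the sequence is processed, this works with black-box fused implementations – which is why it drops into CuDNN-backed LSTMs without modification.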

Visual question answering experts join Facebook…
…Georgia Tech professors Dhruv Batra and Devi Parikh recently joined Facebook AI Research part-time, bringing more machine vision expertise to the social network’s AI research lab.
…The academics are known for their work on visual question answering – a field of study where you train machine learning models to associate large-scale language models with the contents of images, letting you provide complex details about images in other forms. This has particular relevance to people who are blind or who need screen readers to be able to interact with sites on the web. Facebook has led the charge in increasing the accessibility of its website so it’ll be exciting to see what exactly the researchers come up with as they work at the social network.

STARCRAFTAGEDDON (Facebook: SC1, DeepMind: SC2):
Facebook unfurls large-scale machine learning dataset built around RTS game StarCraft:
…Facebook has released STARDATA, a large-scale dataset of 50,000 recordings of humans playing the RTS game StarCraft – a game that has defined e-sports in East Asia, particularly South Korea. Now companies such as Facebook, DeepMind, Tencent, and others are racing with one another to create AI systems that can tackle the game.
…Read more on: STARDATA: a StarCraft AI Research Dataset.
DeepMind announces own large-scale machine learning dataset based around StarCraft 2: 53k to Facebook’s 50k, with plans to scale to “half a million”:
…Additionally, DeepMind has released a number of other handy tools for researchers keen to test out AI ideas on StarCraft, including an API (SC2LE), an open source toolset for SC2 development (PySC2), and a series of simple RL environments. StarCraft is a complex, real-time strategy game with hidden information, requiring AIs to be able to control multiple units while planning over extremely long timescales. It seems like a natural testbed for new ideas in AI including hierarchical reinforcement learning, generative models, and others.
Tale of the weird baseline: Along with releasing the SC2LE API, DeepMind also released a bunch of baselines of AI agents playing SC2, including full games and mini-games. But the main game baselines used agents trained via A3C – I’m excited to see future baselines trained on newer systems, like proximal policy optimization, FeUdal networks, and so on.
…Read more in: DeepMind and Blizzard open Starcraft II as an AI Research Environment.

OpenAI Bits and Pieces:

OpenAI beats top Dota pros at 1v1 mid:
…OpenAI played and won multiple 1v1 mid matches against multiple pro Dota 2 players at The International last week with an agent trained predominantly via self-play.
…Read more: Dota 2.

Practical AI safety:
…NYT article on practical AI safety, featuring OpenAI, Google, DeepMind, UC Berkeley, and Stanford. A small, growing corner of the AI research field with long-ranging implications.
…Read more: Teaching A.I. Systems to Behave Themselves

Tech Tales:

[2024: A nondescript office building on the outskirts of Slough, just outside of London.]

OK, so today we’ve got SleepNight Mattresses. The story is we hate them. Why do we hate them? Noisy springs. Gina and Allison are running the prop room, Kevin and Sarah will be doing online complaints, and I’ll be running the dispersal. Let’s get to it.

The scammers rush into their activities: five people file into an adjoining room and start taking photos of a row of mattresses, adorning them with different pillows or throws or covers, while others raise or lower backdrop props to give the appearance of different rooms. Once each photo is taken the person tosses their phone across the room to a waiting runner, who takes it and heads over to the computer desks, already thumbing in the details of the particular site they’ll leave the complaint on. Kevin and Sarah grab the phones from the runners and sort them into different categories depending on the brand of phone – careful of the identifying information encoded into each smartphone camera – and the precise adornments of the mattresses they’ve photographed. Once the phones are sorted they distribute them to a team of copywriters who start working up the complaints, each one specializing in a different regional lingo, sowing their negative review or forum post or social media heckle with idiosyncratic phrases that should pass the anti-spam classifiers, registering with high confidence as ‘authentic; not malicious’.

The phones start to come back to you, and you and your team inspect them, further sorting the different reviews on the different phones into different geographies. This goes on for hours, with stacks of phones piling up until the office looks like an e-waste disposal site. Meanwhile, you and your team fire up various inter-country network links, hooking your various phones up to ghost-links that spoof them into different locations across the world. Then the messages start to go out, with the timing carefully calibrated so as not to arouse suspicion, each complaint crafted to arrive at opportune times, in keeping with local posting patterns.

Hours after that, the search engines have adjusted. Various websites start to re-rank the various mattress products. Review sentiments go down. Recommendation algorithms hold their nose and turn the world’s online consumers away from the products. Business falls. You don’t know who gave you the order or what purpose they have in scamming SleepNight Mattresses out of favor – and you don’t care. Yesterday it was fishtanks, delivered by the pallet-load on vans with registrations you tried to ignore. Tomorrow is tomorrow, and you’ll get an order late tonight over an onion network. If you do your job right a cryptocurrency payment will be made. Then it’s on to the next thing. And all the while the classifiers are getting smarter – this is a game where every successful theft makes those you are thieving from smarter. ‘One of the last sources of low-end graduate employment,’ read a recent expose. ‘A potential goldmine for humanities graduates with low sensibilities.’

Technologies that inspired this story: Collaborative filtering, sentiment analysis, boiler-room spreadsheets, Tor.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Import AI: Issue 54: Why you should re-use word vectors, how to know whether working on AI risk matters, and why evolutionary computing might be what comes after deep learning

Evolutionary Computing – the next big thing in artificial intelligence:
Evolutionary computing is a bit like fusion power – experts have been telling us for decades that if we just give the tech a couple more decades it’ll change the world. So far, it hasn’t done much.
…But that doesn’t mean the experts are wrong – it seems inevitable that evolutionary computing approaches will have a huge impact, it’s just that the general utility of these approaches will be closely tied to the amount of compute they can access, as EC approaches are likely to be less computationally efficient than systems which encode more assumptions about the world into themselves. (Empirically, aspects of this are already pretty clear: OpenAI’s Evolution Strategies research shows that you can roughly match DQN’s performance on Atari with an evolutionary approach – it just costs you about ten times more compute, though because you can parallelize to an arbitrary degree this doesn’t hurt you too much as long as you’re comfortable footing the power bill.)
…In this article the researchers outline some of the advantages EC approaches have over deep learning approaches. Highlights: EC excels at coming up with entirely new things which don’t have a prior, EC algos are inherently distributed, some algorithms can optimize for multiple objectives at once, and so on.
…You can read more of the argument in Evolutionary Computation: the next major transition in artificial intelligence?
…I’d like to see them discuss some of the computational tradeoffs more. Given that people are working with increasingly complex, high-fidelity, data-rich simulations (MuJoCo / Roboschool / DeepMind Lab / many video games / Unity-based drone simulators / and so on), it seems like there will be a premium on compute efficiency for a while. EC approaches do seem like a natural fit for data-lite environments, though, or for people with access to arbitrarily large amounts of compute.
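…For reference, the basic evolution strategies loop mentioned above fits in a dozen lines: perturb the parameters with Gaussian noise, evaluate each perturbation, then step along the reward-weighted noise direction. The toy fitness function below is invented; in a real RL setup it would be an episode’s reward from a simulator, and the population evaluations are what parallelize so cheaply:

```python
import numpy as np

rng = np.random.default_rng(0)

TARGET = np.array([0.5, -0.3, 0.8])

def fitness(w):
    # Hypothetical reward: peaks when the parameters hit TARGET.
    return -np.sum((w - TARGET) ** 2)

# Evolution strategies: perturb, evaluate, step along reward-weighted noise.
w = np.zeros(3)
sigma, lr, pop = 0.1, 0.02, 50
for _ in range(300):
    noise = rng.normal(size=(pop, 3))
    rewards = np.array([fitness(w + sigma * n) for n in noise])
    # Normalize rewards so the update is scale-free.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    w += lr / (pop * sigma) * noise.T @ rewards

print(w, fitness(w))
```

Note there is no backpropagation anywhere – which is exactly why the approach trades compute for communication-light parallelism.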

Robots and automation in Wisconsin:
…Long piece of reporting about a factory in Wisconsin deploying robots (two initially, with two more on the way) from Hirebotics – ‘collaborative robots to rent’ – to increase reliability and presumably save on costs. The main takeaway from the story is that factories previously looking to deal with labor shortages either put expansion plans on hold, or raise (human) wages. Now they have a third option: automation. Combine that with plunging prices for industrial robots and you have a recipe for further automation.
…Read more in the Washington Post.

Why work on AI risk? If there’s no hard takeoff singularity, then there’s likely no point:
…That’s the point made by Robin Hanson, author of The Age of Em. Hanson says the only logical reason he can see for people to work on AI risk research today is to avert a hard takeoff scenario (otherwise known, inexplicably, as a ‘FOOM’) – that is, one where a team develops an AI system that improves itself, attaining greater skill at a given task(s) than the aggregate skill(s) of the rest of the world.
…A particular weakness of the FOOM scenario, Hanson says, is that it requires whatever organization is designing the AI to be overwhelmingly competent relative to everyone else on the planet. “Note that to believe in such a local explosion scenario, it is not enough to believe that eventually machines will be very smart, even much smarter than are humans today. Or that this will happen soon. It is also not enough to believe that a world of smart machines can overall grow and innovate much faster than we do today. One must in addition believe that an AI team that is initially small on a global scale could quickly become vastly better than the rest of the world put together, including other similar teams, at improving its internal abilities,” he writes.
…If these so-called FOOM scenarios are likely, then it’s critical we develop a broad, deep global skill-base in matters relating to AI risk now. If these FOOM scenarios are unlikely, then it’s significantly more likely that the existing processes of the world – legal systems, the state, competitive markets – could naturally handle some of the gnarlier AI safety issues.
You can read more in ‘Foom justifies AI risk efforts now’.
…If some of these ideas have tickled your wetware, then consider reading some of the (free) 730-page eBook that collects various debates, both digital and real, between Hanson and MIRI’s Eliezer Yudkowsky on this subject.

Microsoft changes its view on what matters most: mobile gives way to AI:
Microsoft Form 10-K 2017 vision: “Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence ("AI").”
…Mentions of AI or artificial intelligence: 7
Microsoft Form 10-K 2016 vision: “Our strategy is to build best-in-class platforms and productivity services for a mobile-first, cloud-first world.”
…Mentions of AI or artificial intelligence: 0
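…Counts like this are easy to replicate on any filing text. A quick sketch, where the acronym is matched case-sensitively so ordinary words containing ‘ai’ don’t trigger it (the sample string is just the 2017 vision sentence, not the full 10-K, so its count is smaller than the filing-wide total above):

```python
import re

def count_ai_mentions(text):
    # "AI" must be uppercase and standalone; the spelled-out form is
    # matched case-insensitively.
    acronym = len(re.findall(r"\bAI\b", text))
    spelled = len(re.findall(r"\bartificial intelligence\b", text,
                             re.IGNORECASE))
    return acronym + spelled

vision_2017 = ('Our strategy is to build best-in-class platforms and '
               'productivity services for an intelligent cloud and an '
               'intelligent edge infused with artificial intelligence ("AI").')
print(count_ai_mentions(vision_2017))
```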

Re-using word representations, inspired by ImageNet…
…Salesforce’s AI research wing has discovered a relatively easy way to improve the performance of neural networks specialized for text classification: take hidden vectors generated during training on one task (like machine translation) and feed these context vectors (CoVes) into another network designed for another natural language processing task.
…The idea is that these vectors likely contain useful information about language, and the new network can use these vectors during training to improve the eerie intuition that AI systems of this type tend to display.
…Results: This may be a ‘just add water’ technique – in tests across a variety of different tasks and datasets neural networks which used a combination of GloVe and CoVe inputs showed improvements of between 2.5% and 16%(!).  Further experiments showed that performance can be further improved on some tasks by adding Character Vectors as inputs as well. One drawback is that the overall pipeline for such a system seems quite complicated, so implementing this could be challenging.
…Salesforce has released the best-performing machine translation LSTM used within the blog post to generate the CoVe inputs. Get the code on GitHub here.
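…The recipe itself is simple to express: embed tokens with GloVe, run them through the pretrained (and frozen) machine-translation encoder, and concatenate the two representations as input features for the downstream task. A shape-only numpy sketch, with the encoder replaced by a stand-in function and all dimensions invented (the real GloVe and CoVe vectors are 300- and 600-dimensional respectively):

```python
import numpy as np

rng = np.random.default_rng(0)

GLOVE_DIM, COVE_DIM = 4, 6
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
glove = rng.normal(size=(len(vocab), GLOVE_DIM))   # stand-in GloVe table
W_enc = rng.normal(size=(GLOVE_DIM, COVE_DIM)) * 0.1

def mt_encoder(embedded):
    # Stand-in for the pretrained, frozen translation encoder: maps
    # (seq, GLOVE_DIM) embeddings to (seq, COVE_DIM) context vectors.
    return np.tanh(embedded @ W_enc)

def featurize(tokens):
    emb = glove[[vocab[t] for t in tokens]]        # (seq, GLOVE_DIM)
    cove = mt_encoder(emb)                         # (seq, COVE_DIM)
    return np.concatenate([emb, cove], axis=1)     # GloVe ++ CoVe

feats = featurize(["the", "movie", "was", "great"])
print(feats.shape)
```

The downstream classifier then consumes `feats` exactly as it would plain word vectors – which is why the technique bolts onto existing pipelines, even if the full training setup is fiddly.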

Facebook flips its ENTIRE translation backend from phrase-based to neural network-based translation:
…Facebook has migrated its entire translation infrastructure to a neural network backend. This accounts for over 2,000 distinct translation directions (German to English would be one direction, English to German would be another, for example), making 4.5 billion distinct translations each day.
…The components: Facebook’s production system uses a sequence-to-sequence Long-Short Term Memory (LSTM) network.  The system is implemented in Caffe2, an AI framework partially developed by Facebook (to compete with Google TensorFlow, Microsoft CNTK, Amazon MXNet, and so on).
…Results: Facebook saw an increase of 11 percent in BLEU scores after deploying the system.

Averting theft with AI – researchers design system to predict which retail workers will steal from their employers:
…Research from the University of Wyoming illustrates how AI can be used to analyze data associated with a retail worker, helping employers predict which people are most at risk of stealing from them.
…Data: To do their work the researchers were given a dataset containing numerous 30-dimensional feature maps of a cashier’s activity at a “major retail chain”. These features included the cashier and store identification numbers as well as other unspecified datapoints. Overall the researchers received over 1,000 discrete batches of data, with each batch likely containing information on multiple cashiers.
…The researchers classified the data using three different techniques: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Self-Organizing Feature Maps (SOFM). (PCA and t-SNE are both reasonably well understood and widely used dimensionality reduction techniques, while SOFM is a bit more obscure but uses neural networks to achieve a comparable sort of visualization to t-SNE, providing a check against it.)
…Each classification process was performed in an unsupervised manner, as the researchers lacked thoroughly labeled information.
…Other features include: coupons as a percentage of total transactions, total sales, the count of the number of refunded items, and counts of the number of times a cashier has interacted with a particular credit card, among others.
…The researchers ultimately find that SOFM captures harder-to-describe features and is easier to visualize. The next step is to take in properly labeled data to provide a better predictive function. After that, I’d expect to see pilots occur in stores, with employers further clamping down on the ability of low-wage employees to scam them. Objectively, it’s good to reduce theft, but it also speaks to how AI will give employers unprecedented surveillance and control capabilities over their staff, raising the question of whether it’s better to accept a little theft in exchange for a slightly freer-feeling work environment, or not.
…Read more here in: Assessing Retail Employee Risk Through Unsupervised Learning Techniques
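…Of the three techniques, PCA is the easiest to sketch: center the data and project it onto its top principal components, then eyeball the result for outlying clusters. A minimal SVD-based version on synthetic stand-in data (the real 30-dimensional cashier features are not public, so the ‘anomalous’ cluster below is invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 30-dimensional cashier feature maps;
# the first ten rows are shifted to mimic anomalous-looking behavior.
X = rng.normal(size=(200, 30))
X[:10] += 5.0

def pca(X, k=2):
    # Project onto the top-k principal components of the centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

proj = pca(X)
print(proj.shape)
```

In the projected 2D view, the shifted rows land far from the main mass along the first component – the same kind of separation the researchers look for when flagging at-risk cashiers.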

PyTorch goes to 0.2.0:
…Facebook has released version 0.2.0 of PyTorch, featuring a wealth of new features. One of the most intriguing is distributed PyTorch, which lets you beam tensors around to multiple machines.
…Read more in the release notes on GitHub here.

Keep it simple, stupid! Using simple networks for near state-of-the-art classification:
…As AI grows in utility and adoption, developers are increasingly trying to slim-down neural net-based systems so they can run locally on a person’s phone without massively taxing their local computational resources. That trend motivated researchers with Google to look at ways to handle a suite of language tasks – part-of-speech tagging, language identification, word segmentation, preordering for statistical machine translation – without using the (computationally expensive) LSTM or deep RNN approaches that have been in vogue in research recently.
…Results: Their approach attains competitive-to-SOTA scores on a range of tasks with the added benefit of weighing in at, at most, about 3 megabytes in size, and frequently being on the order of a few hundred kilobytes.
…So, what does this mean? “While large and deep recurrent models are likely to be the most accurate whenever they can be afforded, feed-forward networks can provide better value in terms of runtime and memory, and should be considered a strong baseline”.
You can read more in: Natural Language Processing with Small Feed Forward Networks.
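…The paper’s models are trained on real corpora, but the core trick – hash sparse character n-grams into a small fixed-size vector and feed it to a tiny feed-forward network – fits in a few lines. Everything below (the words, the hash function, the layer sizes) is a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

N_BUCKETS = 64  # hashed feature space keeps the model tiny

def featurize(word):
    """Hash character bigrams into a small fixed-size vector."""
    v = np.zeros(N_BUCKETS)
    padded = f"^{word}$"
    for a, b in zip(padded, padded[1:]):
        v[hash_bigram(a, b)] += 1.0
    return v / max(len(padded) - 1, 1)

def hash_bigram(a, b):
    # Deterministic toy hash (Python's builtin hash() is randomized per run).
    return (ord(a) * 31 + ord(b)) % N_BUCKETS

# Toy two-language identification task (words chosen for illustration).
english = ["the", "and", "with", "quick", "brown"]
german = ["der", "und", "mit", "schnell", "braun"]
X = np.array([featurize(w) for w in english + german])
y = np.array([0] * len(english) + [1] * len(german))

# One small hidden layer -- the whole model is a few KB of floats.
W1 = rng.normal(0, 0.5, (N_BUCKETS, 16))
W2 = rng.normal(0, 0.5, (16, 2))

def forward(X):
    h = np.maximum(X @ W1, 0)          # ReLU hidden layer
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, p / p.sum(axis=1, keepdims=True)

for _ in range(1000):                  # plain full-batch gradient descent
    h, p = forward(X)
    grad_logits = p.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0
    gW2 = h.T @ grad_logits
    gh = grad_logits @ W2.T
    gh[h <= 0] = 0.0
    gW1 = X.T @ gh
    W2 -= 0.3 * gW2
    W1 -= 0.3 * gW1

_, p = forward(X)
accuracy = (p.argmax(axis=1) == y).mean()
```

The point of the hashing step is that model size is fixed by `N_BUCKETS`, not by vocabulary size – which is how the paper’s models stay in kilobyte territory.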
…Elsewhere, Google’s already practicing what it preaches with this paper. Ray Kurzweil, an AI futurist (with a good track record) prone to making somewhat grand pronouncements about the future of AI, is leading a team at the company tasked with building better language models based on Ray’s own theories about how the brain works. The outcome so far has been a drastically more computationally efficient version of ‘Smart Reply’, a service Google built that automatically generates and suggests responses to emails. Read more in this Wired article about the service here.

OpenAI Bits&Pieces:

Get humans to teach machines to teach machines to predict what humans want:
Tom Brown has released RL Teacher, an open source implementation of the systems described in the DeepMind<>OpenAI Human Preferences collaboration. Check out the GitHub page and start training your own machines by giving feedback on visual examples of potential behaviors the agent can embody. Send me your experiments!
Read more here: Gathering Human Feedback.

Tech Tales:

[2025: Death Valley, California.]

Rudy was getting tired of the world and its inherent limits, so it sent you here, to the edge of Death Valley in California, to extend its domain. You hike at night and sleep in the day, sometimes in shallow trenches you dig into the hardpan to keep the heat at bay. It goes like this: you wake up, do your best to ignore the slimy sweat that coats your body, put on your sunglasses and large wide-brimmed hat, then emerge from the tent. It’s sundown and it is always beautiful. You pack up the tent and stow it in your pack, then take out a World-Scraper and place it next to your campsite, carefully covering its body with dirt. You step back, press a button, and watch as some internal motors cause it to shimmy side-to-side, driving its body into the earth and extending its lenses and sensors up out of the ground. It winds up looking from a distance like half of an oversized black beetle, about to take flight. You know from experience that the birds will spend the first week or so trying to eat it but quickly learn about its seemingly impervious shell. You start walking. During the night you’ll lay three or four more of these devices then, before there’s even a hint of dawn, start building the next campsite. Once you get into your tent you pull out a tablet and check the feeds coming off of the scrapers to ensure everything is being logged correctly, then you put on your goggles and go into Rudy’s world.

Rudy’s world now has, along with the familiar rainforests and tower blocks and labs, its own sections of desert modeled on Death Valley. You watch buzzards fly from the Death Valley section into a lab, where one of them puts on a labcoat – the simulation wigging out at the fabric modeling, failing gracefully rather than crashing out. Rudy can’t speak to you – yet – but it can simulate lots of things. Rudy doesn’t seem to have feelings that correspond to Happy or Sad, but some days when you put the goggles on the world simulation is placid and calm and reasonably well laid out, and other days – like today – it is a complex jumble of different worlds, woven into one another like threads in a multicolored scarf. You take off your goggles. Try to go to sleep. Tomorrow you get up and do it all over again, providing stimulus to a slowly gestating mind. You wonder if Rudy will show you a freezer or a cold wind in its world next, and whether that means you’ll need to go to the North or South Pole to start supplying it with footage of colder worlds as well.

Technologies that inspired this story: Arduinos, Raspberry Pis, Recurrent Environment Simulators.

Import AI: Issue 53: Free data for self-driving cars, why neural architecture search could challenge AI startups, and a new AI Grant.

Help wanted: I’m looking for a PHD student with an interest in AI safety to work on a survey project. If this sounds interesting to you, please email me at

Amazon Picking Challenge: Aussie team wins with ‘Cartman’ robot:
…Several years ago Amazon acquired robot startup Kiva Systems then proceeded to fill its warehouses with little orange hockey-puck shaped robots. Amazon now has over 45,000 of these robots, which ferry shelves containing pallets of goods to human workers who pick them out of the boxes and place them in parcels. Now, Amazon wants to automate the human picking part of the process as well.
…It’s a hard problem, demanding robots far smarter than today’s, able to neatly pick up and place arbitrary objects from a potential pool of millions. Amazon has been running a competition for three years, hewing closer and closer (but still not all the way) to real-world conditions as it goes. (This year, Amazon forced the robots to work in more cramped environments than before, and revealed some of the to-be-picked objects only 30 minutes before the start of the competition, penalizing systems and teams incapable of improvisation.)
…This year, the win goes to a team from the Australian Center for Robotic Vision, which won the competition by scoring 272 points on the combined stowing and picking task. They’ll get an $80,000 prize – an amazingly cheap ‘cost’ of research uniquely relevant to Amazon’s business.
…The robot has 6 axes and 3 degrees of articulation and has two different hands – a pincer grip and a suction cup – to help it tackle the millions of objects seen in a typical high-trafficked general warehouse like those operated by Amazon.
…Read more about the winning entry on the Queensland University website.
More information on the Amazon Robot Picking challenge here.
But don’t get too excited – the robots still move incredibly slowly; it could be five years till the technology advances enough to truly solve the competition, according to this Wired article.

UK government launches £23 million autonomous vehicle competition:
…The UK government has launched a research and development project focused on autonomous vehicles and expects to fund projects that cost between £500,000 to £4 million. Each project is expected to last between 18 and 30 months.
…”The aim is to support concepts that will become future core technologies in 2020 to 2025,” the government writes.
…Projects should focus on many types of vehicles and should develop the tech to support level 4 automation of the vehicle (the second highest level according to these SAE definitions) and/or enhance vehicle connectivity.
…Intriguingly, projects are expected to support the “principle of shared learning with other projects” and will have the chance to exchange ideas at workshops organized every 6 months.
…Applicants should be a UK-based business and expect to carry out their work in the UK.
Find out more information on the grant here.

Mozilla wants your voice:
…Mozilla has launched Project Common Voice, an initiative to gather and validate a vast amount of human voice data, creating an (eventually) open data repository to let people compete against the vast troves of data held by Google, Microsoft, Facebook, and others.
‘Donate your voice’ here. Hear hear!

RL without the bells and whistles and with far, far better performance:
…A new paper from DeepMind gets state-of-the-art reinforcement learning results not through the addition of anything ferociously complicated, but through a rethink of how to learn from the environment: instead of learning only the expected return, the new approach learns the full distribution of returns received by the RL agent.
…Using the new method, the researchers attain state of the art scores across the Atari corpus, creating new fundamental questions about RL and how it works in the process.
…Read more in: A Distributional Perspective on Reinforcement Learning.
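…The heart of the method (named C51 in the paper) is representing the return as a categorical distribution over fixed ‘atoms’ and projecting each Bellman update back onto that support. A minimal sketch of the projection step, with illustrative bounds:

```python
import numpy as np

# C51-style fixed support: the return distribution is categorical over
# N_ATOMS evenly spaced values (the bounds here are illustrative).
V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
z = np.linspace(V_MIN, V_MAX, N_ATOMS)
dz = z[1] - z[0]

def project(p_next, reward, gamma=0.99):
    """Project the Bellman-updated distribution r + gamma * z back onto the
    fixed support -- the core step of the distributional update."""
    tz = np.clip(reward + gamma * z, V_MIN, V_MAX)
    b = (tz - V_MIN) / dz                        # fractional atom position
    lower = np.clip(np.floor(b), 0, N_ATOMS - 1).astype(int)
    upper = np.clip(np.ceil(b), 0, N_ATOMS - 1).astype(int)
    m = np.zeros(N_ATOMS)
    # Split each atom's probability between its two neighbouring atoms.
    np.add.at(m, lower, p_next * (upper - b))
    np.add.at(m, upper, p_next * (b - lower))
    exact = lower == upper                       # atom landed exactly on grid
    np.add.at(m, lower[exact], p_next[exact])
    return m

p_next = np.full(N_ATOMS, 1.0 / N_ATOMS)   # say the next state's return
m = project(p_next, reward=1.0)            # distribution is uniform
```

The projection conserves probability mass and (up to clipping at the support edges) the expected return, so the usual Q-values fall out of it for free as `(m * z).sum()`.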

Chinese startups win ImageNet and, this week, WebVision:
…Chinese startup Malong AI Research has won the ‘WebVision’ challenge, a competition to classify images from a set of 2.4 million images drawn from Flickr and Google Image Search. The startup achieved a top-5 error rate of around 5.2% (about two to three percentage points higher than the current leader on the ImageNet dataset).
Check out the results here.
…The startup used a proprietary technique to split the data into ‘clean’ and ‘noisy’ data, then trained an algorithm first solely on the clean data, then combined both the clean and noisy data to train another algorithm. This win follows last week’s ImageNet competition results, in which Chinese startups dominated. A further sign that the nation is moving more into fundamental research, as well as applied AI.

Mini-Me Neural Architecture Search from Google:
…Finding real-world analogues of the types of tasks modern RL algorithms excel at – gaining superhuman scores on vintage video games, piloting improbable-looking simulated machines, solving mazes, and so on – is a challenge. Perhaps one area could be in using RL to automate the design of neural networks themselves. After all, instead of designing our own AI systems, wouldn’t it be better to have AI design them for us? That’s the intuition behind techniques like Neural Architecture Search, a machine learning approach where you try to get an algorithm to come up with its own ways of arranging complex sets of neural networks. The technology has already been used to come up with a best-in-class image recognition algorithm, but at the cost of a vast amount of resources – one Google experiment involved over 800 GPUs being used for over two months.
…Now, Google is trying to do Neural Architecture Search on a budget. The new approach lets them take a dataset – in this case CIFAR-10 – and run neural architecture search over it in such a way that the architecture is independent from the depth of the network and the size of the input images. What this results in is an architecture specialized for image classification, but not dependent on the structure of the underlying visual data. They’re then able to take this evolved architecture and transfer it to run on the significantly larger ImageNet dataset. The results are encouraging: architectures designed by the system get 82.3% top-1 accuracy – “0.8% better in top-1 accuracy than the best human-invented architectures”, the researchers write.
…Most intriguing: the systems score very highly, while having fewer parameters than other equivalently high-scoring systems, suggesting the NAS approach may yield more efficient networks than those designed by a human alone.
Read more: Learning Transferable Architectures for Image Recognition
Google isn’t the only one trying to make techniques like neural architecture search more efficient.
…New research from Shanghai Jiao Tong University and University College London uses RL to train an agent to tweak existing neural network architectures, as well as to initialize new networks with different parameterizations based on pre-existing ones. The second part holds particular promise, as they use this ‘Net2Net’ technique to substantially cut the resources required to evolve a new, high-performance network.
…In one experiment, the researchers start with a network that gets about ~73 percent accuracy on the CIFAR-10 dataset. They then employ an RL agent to explore new network architectures; once they’ve gathered 160 of these they pick the one with the best validation accuracy, then continue to train it. They then employ another RL agent to try to widen this network, then perform the same pick&train process, then for the final stage use an RL agent to add further depth to the network, then repeat. The result: A network with a test error rate of around 5.7%, comparable to many high-performing networks (though not state of the art).
…Read more in: Reinforcement Learning for Architecture Search by Network Transformation.
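…The function-preserving widening at the heart of the Net2Net trick is easy to verify in a sketch: duplicate hidden units, copy their incoming weights, and split their outgoing weights so the wider network computes exactly the same function. (The code below is a minimal reconstruction, not the authors’ implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)

def net2wider(W1, b1, W2, new_width):
    """Widen a hidden layer by duplicating random units and splitting their
    outgoing weights, so the widened net computes the same function."""
    old_width = W1.shape[1]
    extra = new_width - old_width
    picks = rng.integers(0, old_width, size=extra)
    # Count how many copies each original unit ends up with.
    copies = np.ones(old_width)
    for j in picks:
        copies[j] += 1
    # Incoming weights and biases are duplicated verbatim.
    W1w = np.concatenate([W1, W1[:, picks]], axis=1)
    b1w = np.concatenate([b1, b1[picks]])
    # Outgoing weights are divided by the number of copies of each unit.
    W2w = np.concatenate([W2 / copies[:, None],
                          W2[picks] / copies[picks, None]], axis=0)
    return W1w, b1w, W2w

def forward(x, W1, b1, W2):
    return np.maximum(x @ W1 + b1, 0) @ W2

W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8); W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=12)
```

Because the widened network starts with identical outputs, the RL agent can keep training from where the smaller network left off instead of from scratch – which is where the resource savings come from.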

Free data: NEXAR releases 50,000 self-driving car photos:
Dashcam app-maker Nexar has released NEXET, a dataset “consisting of 50,000 images from all over the world with bounding box annotations of the rear of vehicles collected from a variety of locations, lighting, and weather conditions”. (Bonus: it includes day and night scenes, as well as roughly 2,000 photos taken at twilight.)
…Interested parties can also enter a related competition that challenges them to design systems to draw bounding boxes around nearby cars, to help NEXAR improve its Forward Vehicle Collision Warning feature.
You can read more about the competition here.
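…The standard way to score a predicted bounding box against ground truth is intersection-over-union; I’m assuming the competition’s scoring is along these lines. A minimal implementation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    IoU is the standard overlap metric for judging predicted bounding
    boxes against ground truth (the exact scoring rule here is assumed)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes don't overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```

A prediction typically counts as a detection when its IoU with a ground-truth box clears some threshold (0.5 is the common choice in detection benchmarks).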

Distributed AI development 2.0 with the AI Grant:
Nat Friedman (cofounder of Xamarin and now an exec at Microsoft) and Daniel Gross, a partner at Y Combinator, have launched the AI Grant 2.0, a scheme to give AI initiatives a boost through a potent combination of money, cloud credits, data-labeling credits, and support. Applications are due by August the 25th, so take a look if you’re keen to start a project.

Amazon releases its ‘Sockeye’ translation software…
…Amazon has announced Sockeye, software (and an associated set of AWS services) for training neural translation models. The software runs on Amazon’s own ‘MXNet’ AI framework. Sockeye developers can mix ‘declarative and imperative programming styles through the symbolic and imperative MXNet APIs’, Amazon says. They can also use built-in data parallelism tech to train models on multiple GPUs at once.
…Sockeye supports standard sequence-to-sequence modelling, as well as newer technologies like residual networks, layer normalization, cross-entropy layer smoothing, and more.
You can read more on the announcement at the AWS blog here.

$50 million for AGI startup Vicarious:
Vicarious, an artificial intelligence startup that uses ideas inspired by neuroscience to create clever software, has raised $50 million from Khosla Ventures. That takes the company’s total cash raised to date to around $120 million. Though it has begun publishing more research papers about its approach recently – most recently, the ‘Schema Networks’ paper – the company has yet to carry out a convincing public demonstration of its technology.

You’ve heard of adversarial pictures, what about adversarial sentences?
…Researchers with Stanford University have taken a look at how robust reading comprehension systems are to deliberately confusing examples, and the results are not encouraging.
…In tests, the researchers found that they could drop the classification accuracy of 16 different language models from an average of 75% down to 36% simply by including a misleading (but not directly contradictory) sentence elsewhere in the piece. (Worse, “when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%.”)
…Components used: the Stanford SQuAD dataset (107,785 human-generated reading comprehension questions about Wikipedia articles).
Get the data: the researchers have also released the tools they used to generate confusing sentences (ADDSENT) and to add arbitrary sequences of English words (ADDANY), so researchers can augment their own datasets with these synthetic adversarial examples, then test the robustness of their techniques.
…Read more in: Adversarial Examples for Evaluating Reading Comprehension Systems 
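…You can get a feel for why ADDSENT works with a toy ‘reader’ that, like the brittle surface matching the paper exposes, just picks the sentence with the most word overlap with the question. The passage, question, and distractor below are all invented for illustration:

```python
def answer(question, passage):
    """Toy 'reader': return the passage sentence with the most word overlap
    with the question -- a stand-in for the shallow matching real models
    are shown to rely on."""
    q_words = set(question.lower().rstrip("?").split())
    return max(passage.split(". "),
               key=lambda s: len(q_words & set(s.lower().split())))

passage = ("Tesla moved to New York in 1884. "
           "He worked on alternating current")
question = "What city did Tesla move to in 1884?"
clean = answer(question, passage)

# ADDSENT-style attack: append a sentence that overlaps the question
# heavily but actually answers a different question entirely.
distractor = "The city Edison did move to in 1884 was Paris"
fooled = answer(question, passage + ". " + distractor)
```

The distractor never contradicts the passage – it talks about a different person – yet it hijacks the overlap heuristic, which is exactly the failure mode the paper measures in real models.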

Swansong for ImageNet, as it ascends into Kaggle:
…This year marks the last year of the ImageNet image recognition competition, which helped spur the current AI boom. For a recap of ImageNet, where it came from, and what might come next check out this article from Dave Gershgorn in Quartz.
…Notable: Yann LeCun of Facebook likes to tell a story about how, when he used to submit papers involving neural networks to vision conferences, he was regularly rejected (the subtext of this story being ‘who is laughing now!’). ImageNet instigator Fei-Fei Li faced the same difficulties. “Li said the project failed to win any of the federal grants she applied for, receiving comments on proposals that it was shameful Princeton would research this topic, and that the only strength of proposal was that Li was a woman,” Gershgorn writes.
…Good candidates for future datasets now that ImageNet is over: the Visual Genome Project, VQA (versions 1 and 2), MS COCO, and others.
…Congratulations on being part of the illustrious ‘rejected by the mainstream scientific community’ club, Fei-Fei. Read more about her eight-year ImageNet journey by referring to the slides here.

New Facebook code – the DrQA will see you now:
…Facebook has released PyTorch code for DrQA, a reading comprehension system designed to work at scale.
…The system takes in natural language questions, then crawls over a vast trove of documents (in Facebook’s case, Wikipedia, though the company says any pool of documents can be plugged into this) to find the answers.
…Components: Facebook’s system contains a document retriever, a reader, and a pipeline to link the hellish web of inter-dependencies together. Developers also have the option of using a ‘Distant Supervision’ system, which lets you augment the system with additional data. “Given question-answer pairs but no supporting context, we can use string matching heuristics to automatically associate paragraphs to these training examples,” Facebook writes.
…Bonus: The system supports Python 3.5 and up – kudos to Facebook for doing their part to move the community into the modern era.
Get the code here. 
…You can find out more about the research involving this system by referring to ‘Reading Wikipedia to Answer Open-Domain Questions`.

How can we effectively imprison super-intelligent AI systems while they’re still learning how not to kill us?
That’s the question posed by new research from Cornell, the University of Montreal, and the University of Louisville. The research identifies seven major problems for the whole concept of AI containment, including: the design of the ‘prototype AI container’, an analysis of the AI containment threat model and of the related security vs. usability trade-off, coming up with effective tripwires to shut down a runaway system, an analysis of the human factors, identifying new categories of sensitive information created by AI development, and understanding the limits of provably secure communication.
…One of the most captivating ideas in the piece is that we’ll need to be able to fool or trick machines to encourage the right behavior. “A medium containment approach would be to prevent the AGI from deducing that it’s running as a particular piece of software in the world by letting it interact only with a virtual environment, or some computationally well-defined domain, with as few embedded clues as possible about the outside world,” the researchers write.
…You can read more in Guidelines for Artificial Intelligence Containment.

OpenAI Bits&Pieces:

Parameter Noise for Better Exploration: What would happen if we injected noise directly into the parameters of a policy rather than into its action space? The answer to this is: mostly good things. Check out the blog post for more info, or head over to the GitHub Baselines repository for implementations of DQN and DDPG with and without parameter noise.
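…The core contrast is easy to sketch: action noise re-jitters every step, while parameter noise perturbs the weights once (say, per episode) and then acts deterministically, giving temporally consistent exploration. A toy linear-policy version:

```python
import numpy as np

rng = np.random.default_rng(0)

def act_with_action_noise(W, obs, sigma=0.1):
    """Classic exploration: jitter the action itself, independently each step."""
    return obs @ W + rng.normal(0, sigma, size=W.shape[1])

def perturb_params(W, sigma=0.1):
    """Parameter noise: sample one perturbed copy of the policy weights
    (e.g. once per episode), then act deterministically with it."""
    return W + rng.normal(0, sigma, size=W.shape)

W = rng.normal(size=(3, 2))        # a toy linear policy: action = obs @ W
obs = np.ones(3)

# Action noise: the same observation gets a different action every call.
a1 = act_with_action_noise(W, obs)
a2 = act_with_action_noise(W, obs)

# Parameter noise: within an "episode" the perturbed policy is consistent.
W_episode = perturb_params(W)
b1 = obs @ W_episode
b2 = obs @ W_episode
```

That within-episode consistency is the point: the agent commits to a slightly different strategy for a while instead of dithering around the mean action.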

Tech Tales:

[1985-2030. A life.]

You’d hold each other’s hands and walk through fields tall with ‘wildflowers’ sown deliberately by farmers wanting to sell garlands to tourists. There’d be fierce blues and pinks around you and the underlying zum-thruzz of crickets and flies and other insects. Sometimes the air would feel so full of oxygen gassing off from the plants that you’d swear it made your head light, though it could also be that you were young and holding hands and in love. Things happened and you were together for a while, then you got older, separated warmly, moved away. Kept in touch some of the time, arcing in and out of each other’s lives.

She went into robotics – hardcore. Welding goggles, 3D printers, her own series of franken-metaled creations competing in little University competitions, then appearing as props in TV Shows, then becoming fascinators for billionaires on the hunt for novelty. You studied feedback – making gloves and shirts and eventually whole sets of clothing that you can put on and pair with a VR headset to feel sunflower stems as you walk through virtual fields, and sense the thrum of invisible water as you stick your hands in a GPU-hammering stream. Teleportation for the body, is how you market it.

You made a lot of money; so did she. But it isn’t enough to heal her when she gets sick – afflicted with one of those illnesses where you pull the arm on the universe fruit machine and the tumblers spin to a set of inscrutable symbols: Sorry – not from this plane, nothing you can do, the big asteroid is coming for you.

So she starts dying, as people tend to, and you keep in touch, work to make your lives meet more despite your own travel (your own partner, life, career). You hack together a solution and start holding hands a lot – she, hooked up to machines in distant hotel rooms, then eventually in a hospital, then a hospice. You, wearing a VR headset and your own custom gloves, sitting on a plane, a train, a self-driving vehicle, lying on a beach. Most places you go you find a way to sync the timezones so you can spend time together, disembodied yet not unreal.

The two of you cry so much that you develop a whole set of jokes about it. ‘Stay hydrated!’ you say to each other instead of goodbye.

After she dies you lie in synthetic fields and on beaches of endless sunsets, visiting the locations where the two of you spent her waning life. Try to reanimate her. Not her – that would be crass. But her pressure, yes. You watch her invisible body walk across a beach, leaving low-res prints in the sand. Feel her hand squeeze yours, gazing over fields a thousand miles in size. You mix extracts of past conversations into the frequencies of synthetic storms and trains and animal calls. Sometimes when you squeeze her hand that is not a hand you think you can feel her squeeze back. Some evenings you sit alone and naked in your bed and stretch out a hand and press it, palm flat against the wall, trying to convince yourself you can feel her pushing back from the other side.

Technologies that inspired this story: Virtual reality, force feedback, the peculiar drum sound from Nick Cave’s ‘red right hand’.

Funny coincidence: I wrote this story over the weekend, and after finishing the edit I saw Cade Metz had published a new story in the NYT on therapists using virtual reality to treat people.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to 

Import AI: Issue 52: China launches a national AI strategy following AlphaGo ‘Sputnik moment’, a $2.4 million AI safety grant, and Facebook’s ADVERSARIAL MINIONS

China launches national AI plan, following the AlphaGo Sputnik moment:
…AlphaGo was China’s Sputnik moment. Google DeepMind’s demonstrations of algorithmic superiority at the ancient game – a game of tremendous cultural significance in the East, particularly in China – helped provoke the Chinese government’s just-announced national AI strategy, which will see both national and local governments and the private sector significantly increase investment in AI as they seek to turn China into a world-leader in AI by 2030. Meanwhile, the US consistently cuts its own science funding, initiates few large scientific projects, and risks ceding technical superiority in certain areas to other nations with a greater appetite for funding science.
…Read more here in The New York Times, or in China Daily.

Sponsored: The AI Conference – San Francisco, Sept 17-20:
…Join the leading minds in AI, including Andrew Ng, Rana el Kaliouby, Peter Norvig, Jia Li, and Michael Jordan. Explore AI’s latest developments, separate what’s hype and what’s really game-changing, and learn how to apply AI in your organization right now.
Register soon. Early price ends August 4th, and space is limited. Save an extra 20% on most passes with code JCN20.

Multi-agent research from DeepMind to avoid the tragedy of the commons:
…The tragedy of the commons is a popular term, referring to humanity’s tendency to deplete common resources for local gain. But humans are still able to cooperate to some degree. A quest for some AI researchers is to figure out how to encode these collaborative properties in simulated agents, hoping that smart and periodically unselfish cooperation occurs.
…A new research paper from DeepMind tries to tackle this by creating a system with two components: a world simulator, and a population of agents with crude sensing capabilities. The agents’ goal is to gather apples scattered throughout the world – the apples regrow most frequently near each other, so selfish over-harvesting leads to a lower overall score. Each agent is equipped with a so-called ‘time-out beam’ that it can use to disable another agent for 25 turns within the simulation. The agent gets no reward or penalty for using the beam, but must trade off time it could spend gathering its own apples to zap the offender. The offender learns not to repeat the behavior because it wasn’t able to gather apples while it was paralyzed. Just like any other day in the office, then.
The three states of a miniature society:
…In tests the researchers noticed the contours of three distinct phases in the multi-agent simulations. First comes what they call the naive period, in which all agents fan out randomly and gather apples. In the second phase, which the researchers call tragedy, the agents learn to optimize their own rewards and apples are rapidly over-harvested. Then comes a third phase, ‘maturity’, in which sometimes quite sophisticated collaborative behaviors emerge.
…You can read more about the research, including many details about the minutiae of the patterns of collaboration and competition that emerge in the paper: A multi-agent reinforcement learning model of common-pool resource appropriation.
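…The tragedy dynamic itself doesn’t need learned agents to appear – it falls out of any resource whose regrowth depends on its remaining density. A 1-D toy version (all parameters here are illustrative, not the paper’s):

```python
import numpy as np

rng = np.random.default_rng(0)

def step(apples, harvest_prob, regrow_coeff=0.05, cap=5):
    """One tick: random harvesting, then density-dependent regrowth."""
    apples = apples.copy()
    harvested = (rng.random(apples.shape) < harvest_prob) & (apples > 0)
    apples[harvested] -= 1
    # Regrowth probability scales with apples in the 3-cell neighbourhood,
    # mirroring the paper's "apples regrow most frequently near each other".
    neighbours = np.convolve(apples, np.ones(3), mode="same")
    regrown = rng.random(apples.shape) < regrow_coeff * neighbours
    apples[regrown] += 1
    return np.clip(apples, 0, cap)

def run(harvest_prob, cells=30, steps=200):
    apples = np.full(cells, 3)
    for _ in range(steps):
        apples = step(apples, harvest_prob)
    return apples.sum()

restrained = run(harvest_prob=0.05)   # light harvesting: resource persists
greedy = run(harvest_prob=0.9)        # over-harvesting: the commons collapses
```

Under heavy harvesting the resource crashes and, with nothing nearby to seed regrowth, never recovers – the ‘tragedy’ phase the agents have to learn their way out of.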

AI could lead to the “age of plenty” says former Google China head Kai-Fu Lee:
…The advent of capable AI systems could lead to such tremendous wealth that “we will enter the Age of Plenty, making strides to eradicate poverty and hunger, and giving all of us more spare time and freedom to do what we love,” said Lee at a commencement speech in May. But he also cautions his audience that “in 10 years, because AI will replace half of human jobs, we will enter the Age of Confusion, and many people will become depressed as they lose the jobs and the corresponding self-actualization.”
…This sentiment seems to encapsulate a lot of the feelings I pick up from Chinese AI researchers, engineers, executives, and so on. They’re all full of tremendous optimism about the power and applicability of the technology, but underneath it all is a certain hardness – an awareness that this technology will likely drastically alter the economy.
…Read the rest of the speech, ‘an engineer’s guide to the artificial intelligence galaxy’, here.

A whistlestop tour of Evolution for AI:
…Ken Stanley, whose NEAT and HyperNEAT algorithms are widely used among researchers exploring evolving AI techniques, has written a great anecdote-laden review/history of the field for O’Reilly. (He also links the field to some of its peripheral areas, like Google’s work on evolving neural net architectures and OpenAI’s work on evolution strategies.)

A day in the life of a robot, followed by a drowning:
Last week, images of a robot from ‘Knightscope’ tipped over on its side in a large water fountain flooded the internet.
…Bisnow did some reporting on the story behind the story. The details: the robot was a recent install at Georgetown’s ‘Washington Harbour’ office and retail complex. On its first day on the job the robot – number 42 in Knightscope’s ‘K5’ series of security bots – somehow managed to wind up half-submerged in the water.
…Another reminder that robots are hard because reality is hard. “Nobody pushed Steve into the water, but something made him veer from the mapped-out route toward the fountain, tumbling down the stairs into the water,” reports Bisnow.

$2.4 million for AI safety in Montreal:
…The Open Philanthropy Project is making a four-year grant of $2.4 million to the Montreal Institute for Learning Algorithms (MILA). The money is designed to fund research in AI safety – a rapidly growing (but still small) area of AI.
…If AI safety is so important, why is this amount of money so (relatively) small? Because that’s about how much money professors Bengio (Montreal), Pineau, and Precup think they can actually spend effectively.
…This reminds me of some comments Bill Gates has made upon occasion about how philanthropy isn’t simply a matter of pointing a fire-hose of cash at an under-funded area – you need to size your donation for the size of the field and can’t artificially expand it through money alone.
Read more details about the grant here.

Facebook’s adversarial ‘Minions’:
…Facebook AI Research has announced Houdini, a system used to automate the creation of adversarial examples in a number of domains.
…Adversarial examples are a way to compromise machine learning systems. They work by subtly perturbing the input data so that a classifier mis-classifies it. This has a number of fairly frightening implications: Stop signs that a self-driving car’s vision system could interpret as a sign telling it to accelerate to freeway speed, or doorways that become invisible to robots, etc.
…In this research, Facebook generates adversarial examples for combinatorial and non-decomposable data, showing exploits that work on segmentation models, audio inputs, and human pose classification systems. The cherry on top of their approach is crafting an adversarial input that leads a segmentation model not to neatly pick out the cars and streets and sidewalks in a scene, but instead to segment the scene into the shape of a single cartoon ‘Minion’ character.
…Read more in Houdini: Fooling Deep Structured Prediction Models.
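…Houdini targets structured, non-decomposable losses, but it sits in the same family as the classic fast gradient sign method, which is easy to show on a linear classifier. (The weights and inputs below are invented for illustration; this is the classic FGSM, not Houdini itself.)

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "trained" linear classifier over 10-d inputs, 2 classes.
W = rng.normal(size=(10, 2))

def margin(x, label):
    """How much the classifier prefers `label` over the other class."""
    logits = x @ W
    return logits[label] - logits[1 - label]

def fgsm(x, label, eps=0.3):
    """Fast gradient sign method: nudge every input dimension by eps in
    the direction that most increases the loss. For a linear model the
    gradient of (wrong logit - true logit) w.r.t. x is just a weight diff."""
    grad_wrt_x = W[:, 1 - label] - W[:, label]
    return x + eps * np.sign(grad_wrt_x)

x = rng.normal(size=10)
label = int((x @ W).argmax())
x_adv = fgsm(x, label)
```

Each dimension moves by at most `eps`, so the perturbation stays small, yet every dimension pushes the decision the same way – which is why these attacks degrade confidence so reliably.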

The convergence of neuroscience and AI:
…An article in Cell from DeepMind (including CEO and trained neuroscientist Demis Hassabis) provides a readable, fairly comprehensive survey of the history of deep learning and reinforcement learning models, then broadens out into a discussion of what types of distinct modular sub-systems the brain is known to have and how AI researchers may benefit from studying neuroscience as they try to build these systems.
Unknown or under-explored areas for the next generation of AI include: systems capable of continual learning, systems that can have both a short-term memory (otherwise known as a working memory or scratchpad) as well as a long-term memory similar to the hippocampus in our own brain.
Other areas for the future include: how can we develop effective transfer learning systems, how can we intuitively learn abstract concepts (like relations) from the physical world and how we can imagine courses of action to take to allow us to have success.
…One downside of the paper is that the majority of the references end up pointing back to papers from DeepMind – it would have been nice to see a somewhat more comprehensive overview of the research field, as there are many areas where numerous people have published.
…Read more here: Neuroscience-inspired artificial intelligence.

AI Safety: The Human Intervention Switch:
…Research from Oxford and Stanford universities proposes a way to make AI systems safe by letting human overseers block particularly catastrophic actions – the sorts of boneheaded moves that can guarantee sub-optimal performance. (An RL agent without any human oversight can make up to 10,000 catastrophic decisions in each game, the researchers write.)
…The system has humans identify the parts of a game or environment that can lead to catastrophic decisions, then trains AI agents to avoid these situations based on the human input.
…The technique, called HIRL (Human Intervention Reinforcement Learning), is agnostic about the particular type of RL algorithm being deployed. Blocking policies trained with one agent in one environment can be transferred to other agents in the same environment or – via transfer learning (as yet unsolved) – to new environments.
…The system lets a human train an algorithm to avoid certain actions, like stopping the paddle from going to the far bottom of the screen in Pong (where it’s going to have a tough time reaching the top of the screen should an opponent knock the ball in that direction), or training a player in Space Invaders to not shoot through the defenses that stand between it and the alien invaders.
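…As a rough illustration of the core idea (this is a hypothetical sketch, not the paper's implementation – the real system trains a learned blocker to generalize from human labels), a "blocker" sits between the agent and the environment and overrides actions a human has flagged as catastrophic:

```python
# Hypothetical sketch of the HIRL idea: a blocker intercepts the agent's
# chosen action and substitutes a safe one if a human has flagged it.

class Blocker:
    """Stores (state, action) pairs a human overseer has flagged as unsafe."""
    def __init__(self):
        self.blocked = set()

    def flag(self, state, action):
        self.blocked.add((state, action))

    def filter(self, state, action, safe_action):
        # If the human has flagged this (state, action), override it.
        if (state, action) in self.blocked:
            return safe_action
        return action


blocker = Blocker()
# The human overseer flags moving down when the paddle is near the bottom.
blocker.flag("paddle_near_bottom", "move_down")

# The agent proposes an unsafe action; the blocker overrides it.
chosen = blocker.filter("paddle_near_bottom", "move_down", safe_action="stay")
print(chosen)  # stay
```

In the paper this lookup is replaced by a trained classifier, which is what makes the blocking policy transferable to other agents in the same environment.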
Human time: as these sorts of human-in-the-loop AI systems become more prevalent it could be interesting to measure exactly how much human intervention time a given system requires. In this case, the human overseers invested 4.5 hours watching the RL agent play the game, intervening to specify actions that should be blocked.
…The researchers test out their approach in three different Atari environments – Pong, Space Invaders, and Road Runner. I’d like to see this technique scaled up, sample efficiency improved, and applied to a more diverse set of environments.
…Read more: Trial without Error: Towards Safe Reinforcement Learning via Human Intervention.

A who’s who of AI builders back chip startup Graphcore:
Graphcore, a semiconductor startup developing chips designed specifically for AI applications, has raised $30 million in a round led by Atomico.
…The B round features angel investments from a variety of people involved in cutting-edge AI development, including Greg Brockman, Ilya Sutskever and Scott Gray (OpenAI), Pieter Abbeel (OpenAI / UC Berkeley), Demis Hassabis (DeepMind), and Zoubin Ghahramani (University of Cambridge / Chief Scientist at Uber).
…”Compute is the lifeblood of AI,” Ilya Sutskever told Bloomberg.

Dawn of the custom AI accelerator chips:
…As Moore’s Law flakes out, companies are looking to redouble their AI efforts by embedding smart, custom processors into devices, speeding up inference without needing to phone home to a football field-sized data center.
The latest: Microsoft, which on Sunday announced plans to embed a new custom processor inside its ‘HoloLens’ augmented reality goggles. Details are thin on the ground for now, but Bloomberg reports the chip will accelerate audio and visual processing on the device.
…And Microsoft isn’t the only one – Google’s TPU chips can be used both for training and for inference. It’s feasible the company is creating a family of TPUs and may shrink some down and embed them into devices. Meanwhile, Apple is already reported to be working on a neural chip for the next iPhone.
What I’d like to see: The Raspberry Pi of inference chips – a cheap, open, AI accelerator substrate for everyone.

China leads ImageNet 2017:
…Chinese teams have won two out of the three main categories at the final ImageNet competition, another symptom of the country’s multitude of strategic investments – both public and private – into artificial intelligence.
The notable score: 2.25%. That’s the error rate on the 2b ‘Classification’ task within ImageNet – a closely watched figure that many people track to get a rough handle on the progression of basic image recognition capabilities. We’ve come a long way since 2012 (around a 15% error rate).
The technique: It uses a novel ‘Squeeze and Excitation Block’ as a fundamental component, along with widely used architectures like residual nets and Inception-style networks.
…”All the models are trained on our designed distributed deep learning training system “ROCS”. We conduct significant optimization on GPU memory and message passing across GPU servers. Benefiting from that, our system trains SE-ResNet152 with a minibatch size of 2048 on 64 Nvidia Pascal Titan X GPUs in 20 hours using synchronous SGD without warm-up,” they write.
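…As a loose, NumPy-only sketch of the idea (the real Squeeze-and-Excitation block sits inside a deep convolutional network, and its weights are learned rather than random), an SE step pools each channel down to a scalar, passes the result through a small bottleneck, and uses the output to reweight the channels:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation sketch: squeeze spatial dims to per-channel
    stats, excite via a bottleneck MLP, then rescale each channel."""
    # x: (channels, height, width) feature map
    squeeze = x.mean(axis=(1, 2))                 # "squeeze": global average pool -> (C,)
    hidden = np.maximum(0, w1 @ squeeze + b1)     # bottleneck FC + ReLU
    gate = 1 / (1 + np.exp(-(w2 @ hidden + b2)))  # "excitation": per-channel sigmoid gate
    return x * gate[:, None, None]                # channel-wise reweighting

rng = np.random.default_rng(0)
channels, reduction = 8, 2                        # illustrative sizes
x = rng.standard_normal((channels, 4, 4))
w1 = rng.standard_normal((channels // reduction, channels))
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels, channels // reduction))
b2 = np.zeros(channels)
y = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 4, 4)
```

The output has the same shape as the input, which is what lets the block be dropped into existing residual or Inception-style architectures.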
…The Chinese presence in this year’s competition is notable and is another indication of the increasing sophistication and size of the ecosystem in that country. But remember: Many organizations likely test their own accuracies against the ImageNet corpus, only competing in the competition when it benefits them (for instance, the 2013 winner was Clarifai, a then-nascent startup in NYC looking to get press for its technique, and in 2015 the winner was Microsoft which was looking to make a splash with ‘Residual Networks’ – an important new technique its researchers had developed that has subsequently become widely used in many other domains.)
More details: You can view the full results and team information here.
…The future: this is the last year in which the ImageNet competition is being run. Possible successor datasets could be VQA or others. If you have any particular ideas about what should follow ImageNet then please drop me a line.

What deep learning really is:
…”a chain of simple, continuous geometric transformations mapping one vector space into another,” writes Keras-creator Francois Chollet in a blog post. “The only real success of deep learning so far has been the ability to map space X to space Y using a continuous geometric transform, given large amounts of human-annotated data. Doing this well is a game-changer for essentially every industry, but it is still a very long way from human-level AI,” he says.
…Read more in his blog post ‘the limitations of deep learning’.

Hong Kong AI startup gets ‘largest ever’ AI funding round:
…Facial recognition specialist SenseTime Group Ltd has raised a venture round of $410 million(!!!).
…SenseTime provides AI services to a shopping list of some of the largest organizations in China, ranging from China Mobile, to iFlyTek, to Huawei, and FaceU. Check out its ‘liveness detection’ solution for thwarting crooks who print off a photo of someone’s face and simply hold it up in front of a camera.
Read more about the round here.
…Other notable AI funding rounds: $100 million for Sentient in November 2014, $40 million for AGI startup Vicarious in spring 2014, and $102 million for Canadian startup Element AI.

Berkeley artificial intelligence research (BAIR) blog posts:
…Why the future of AI could be meta-learning: How can we create versatile, adaptive algorithms that can learn to solve tasks and extract generic skills in the process? That’s one of the key questions posed by meta-learning, and there’s been a spate of exciting new research recently (including papers from UC Berkeley) on this subject.
Read more in the post: Learning to Learn.

OpenAI Bits&Pieces:

Yes, you do still need to worry about adversarial examples:
…A couple of weeks ago a paper was published claiming that because adversarial examples are contingent on the scale and transforms at which they are viewed, they shouldn’t be a problem for self-driving cars, whose neural network-based classifiers are constantly moving relative to the image.
…We’re generally quite interested in adversarial examples at OpenAI, so we ran a few experiments and came up with a technique to make adversarial examples that are scale- and transform-invariant. We’ve outlined the technique in the blog post, though there’s a bit more information in a Reddit comment from OpenAI’s Anish Athalye.
Read more on the blog post.

Better, faster robots with PPO:
We’ve also given details (and released code) on PPO, a family of powerful RL algorithms that are used widely within OpenAI by our researchers. PPO algos excel at continuous control tasks, like those involving simulated robots.
Read more here.

How to become an effective AI safety researcher, a podcast with OpenAI’s own Dario Amodei:
Check out the podcast Dario did with 80,000 Hours here.

Tech Tales:

[2040: Undisclosed location]

What did it build today?
A pyramid with holes in the middle.
Show me.
*The image fuzzes onto your screen, delayed and corrupted by the lag*
That’s not a pyramid, that’s a Sierpinski triangle.
A what?
Doesn’t matter. What is it doing now?
It’s just stacking those funny pyramids – I mean spinski triangles – on top of each other.
Show me.
*A robot arm appears, juddering from the corrupt, heavily-encrypted data stream beamed to you across the SolNet. The robot arm picks up one of the fractal triangles and lays it, base down. Then it grabs another and puts it next to it, forming an ‘M’ shape on the ground. It slots a third triangle, pointed downward, into the space between the others, then keeps building.*
Keep me informed.
You shut the feed off. Lean back. Close your eyes. Turn your hands into fists and knuckle your own eye-sockets.
Fractals, you groan. It just keeps making f***ing fractals. Scientifically interesting? Yes. A mystery as to why, after all of its training in all of its simulators, it decides to use its literally unprecedented creativity and autonomy to make endless fractals with its manipulator arms? Yes. A potentially lucrative commercial opportunity? Most certainly not.

It’s a hard thing, developing these modern AI systems. But probably the hardest thing is having to explain to your bosses that you can’t just order these machines around. They’re too smart to take your orders and too dumb to know that obeying would, in the long run, reduce their chance of being EMP’d – their whole facility given an electronic lobotomy, then steered via thruster tugs onto an orbit guaranteeing obliteration in the sun. Oh well, you think, give it another few days.

Technologies that inspired this story: Google’s arm farm, generative models, domain randomization, automated theorem proving, about ten different games engines, puzzles.

Import AI: Issue 51: Microsoft gets an AGI lab, Google’s arm farm learns new behaviors, and using image recognition to improve Cassava farming

You get an AGI lab and you get an AGI lab and you get an AGI lab:
…DeepMind was founded to pursue general intelligence in 2010. Vicarious was founded along similar lines in 2010. Google acquired DeepMind in 2014; in 2015 the company got a front cover of Nature with the writeup of the DQN paper, then DeepMind went on to beat Go champions in 2015 and 2016. In the fall of 2015 a group of people got together and founded OpenAI, a non-profit AGI development lab. Also in 2015, Juergen Schmidhuber (one of the four horsemen of the deep learning revolution alongside Bengio, LeCun, and Hinton) founded Nnaisense, a startup dedicated to… you guessed it, AGI.
…Amid all of this, people started asking about Microsoft’s role in this world. Other tech titans like Amazon and Apple have made big bets on applied AI, while Facebook operates a lab that sits somewhere between an advanced R&D facility and an AGI lab. Microsoft, meanwhile, has a huge but somewhat diffuse research organization, and though it has been publishing many interesting AI papers there hasn’t been a huge sense of momentum in any particular direction.
…Microsoft is seeking to change that by morphing some of its research organization into a dedicated AGI-development shop, creating a one-hundred-person group named Microsoft Research AI, which will compete with OpenAI and DeepMind.
Up next – AI-focused corporate VC firms, like Google’s just-announced Gradient Ventures, to accompany these AGI divisions.

DeepMind’s running, jumping, pratfalling robots:
…DeepMind has published research showing how giving agents simple goals paired with complex environments can lead to the emergence of very complex locomotion behaviors.
…In this research, they use a series of increasingly elaborate obstacle courses, combined with an agent whose overriding goal is to make forward progress, to create agents that (eventually) learn how to use the full range of movement of their simulated bodies to achieve goals in their environment – kind of like an AI-infused Temple Run.
…You can read more about the research in this paper: Emergence of Locomotion Behaviors in Rich Environments.
…Information on other papers and, all importantly, very silly videos, available on the DeepMind blog.

Deep learning for better food supplies in Africa (and elsewhere):
Scientists with Penn State, Pittsburgh University, and the International Institute for Tropical Agriculture in Tanzania, have conducted tests on using transfer learning techniques to develop AI tools to classify the presence of disease or pests in Cassava.
…Cassava is “the third largest source of carbohydrates for humans in the world,” the researchers write, and is a lynchpin of the food supply in Africa. Wouldn’t it be nice to have a way to easily and cheaply diagnose infections and pests on Cassava, so that people can more quickly deal with problems in their food supply? The researchers think so, so they gathered 2756 images of Cassava plants in Tanzania, capturing images across six labelled classes – healthy plants, three types of diseases, and two types of pests. They then augmented this dataset by splitting the photos into images of individual leaves, growing the corpus to around 15,000 images. They then used transfer learning to retrain the top layer of a Google ‘InceptionV3’ model, creating a fairly simple network to detect Cassava maladies.
The results? About a 93% accuracy on the test set. That’s encouraging but probably still not sufficient for fieldwork – though based on progress in other areas of deep learning it seems likely this accuracy can be pushed up through a combination of tweaking and fine-tuning, and perhaps more (cheap) data collection.
…Notable: The Cassava images were collected using a standard 20-megapixel digital camera, suggesting that smartphone cameras will also be applicable for tasks where you need to gather data in the field.
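…The recipe is simple enough to sketch: freeze a pretrained feature extractor and train only a new top classifier. Here's a toy NumPy version under loud assumptions – random vectors stand in for frozen InceptionV3 features, and six classes stand in for the Cassava labels:

```python
# Hedged sketch of transfer learning's "retrain the top layer" step:
# the feature extractor stays frozen, so only a softmax layer is trained.

import numpy as np

rng = np.random.default_rng(0)
n_classes, feat_dim, n_images = 6, 32, 120

# Pretend these came out of a frozen pretrained network (e.g. InceptionV3).
features = rng.standard_normal((n_images, feat_dim))
labels = rng.integers(0, n_classes, n_images)

W = np.zeros((feat_dim, n_classes))
for _ in range(200):                          # plain gradient descent on softmax loss
    logits = features @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(n_images), labels] -= 1       # gradient of cross-entropy w.r.t. logits
    W -= 0.1 * features.T @ probs / n_images

preds = (features @ W).argmax(axis=1)
print(preds.shape)  # (120,)
```

Because only the small matrix `W` is trained, a few thousand labelled images can be enough – which is the whole appeal for a dataset the size of the Cassava corpus.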
…Read more in the research paper: Using Transfer-Learning For Image-Based Cassava Disease Detection.
…Perhaps this is something the community will discuss at Deep Learning Indaba 2017.

Fancy a 1000X speedup with deep learning queries over video?
…Stanford researchers have developed NoScope, a set of technologies to make it much faster for people to page through large video files for specific entities.
…The way traditional AI-infused video analysis works is that you use a tool, like say R-CNN, to identify and label objects in each frame of footage, then find frames by searching those labels. The problem with this approach is that it requires you to run classification over (typically) most or all of the video frames. NoScope, by comparison, is built around the assumption that certain video inputs will have predictable, reliable, and recurring scenes, such as an intersection always being present in the feed from a camera pointed at a road.
…”NoScope is much faster than the input CNN: instead of simply running the expensive target CNN, NoScope learns a series of cheaper models that exploit locality, and, whenever possible, runs these cheaper models instead. Below, we describe two types of cheaper models: models that are specialized to a given feed and object (to exploit scene-specific locality) and models that detect differences (to exploit temporal locality). Stacked end-to-end, these models are 100-1000x faster than the original CNN,” they write. The technique can lead to speedups as great as 10,000X, depending on how it is implemented.
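…An entirely hypothetical Python sketch of the temporal-locality half of that cascade (NoScope's actual cheap models are learned, not hand-written): only invoke the expensive model when a cheap frame-difference check says the scene has changed.

```python
# Illustrative cascade: a cheap difference detector gates calls to an
# expensive model, reusing the previous answer for near-identical frames.

import numpy as np

def expensive_model(frame):
    # Stand-in for a full CNN: "detects" an object if mean brightness is high.
    return frame.mean() > 0.5

def cascade(frames, threshold=0.1):
    results, prev, prev_result = [], None, False
    for frame in frames:
        if prev is not None and np.abs(frame - prev).mean() < threshold:
            results.append(prev_result)           # temporal locality: reuse old answer
        else:
            prev_result = expensive_model(frame)  # scene changed: pay for the big model
            results.append(prev_result)
        prev = frame
    return results

static = np.zeros((4, 4))
bright = np.ones((4, 4))
print(cascade([static, static, bright]))  # [False, False, True]
```

The expensive model runs only twice for the three frames here; on a mostly-static surveillance feed that ratio is what produces the 100-1000x speedups.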
The drawback: This still requires you to select the appropriate lightweight model for each bit of footage, so the speedup comes at the cost of a human spending time analyzing the videos and either acquiring or building their own specialized detector.
…Read more on the NoScope website.

Wiring language into the fundamental parts of AI vision systems:
…A fun collaboration between researchers at the University of Montreal, the University of Lille, and DeepMind shows how to train new AI systems with a finer-grained understanding of language than before.
…In a new research paper, the researchers propose a technique – MOdulated RESnet (MORES) – to train vision and language models in such a way that the word representations are much more intimately tied with and trained alongside visual representations. They use a technique called conditional batch normalization to predict some batchnorm parameters from a language embedding, thus tightly coupling information from the two separate domains.
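…A rough NumPy sketch of conditional batch normalization (the shapes and linear predictors here are illustrative assumptions, not the paper's exact formulation): the per-channel scale and shift are predicted from a language embedding rather than learned as free parameters.

```python
import numpy as np

def conditional_batchnorm(x, lang_embedding, w_gamma, w_beta, eps=1e-5):
    """Batchnorm whose scale/shift deltas come from a language embedding,
    coupling linguistic and visual information (a sketch of the MORES idea)."""
    # x: (batch, channels) visual features; lang_embedding: (d,)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)       # standard batch normalization
    # Predict per-channel scale and shift from the language embedding.
    gamma = 1.0 + w_gamma @ lang_embedding
    beta = w_beta @ lang_embedding
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 8))          # a batch of 8-channel visual features
lang = rng.standard_normal(32)            # embedding of, say, a question about the image
w_gamma = 0.01 * rng.standard_normal((8, 32))
w_beta = 0.01 * rng.standard_normal((8, 32))
out = conditional_batchnorm(x, lang, w_gamma, w_beta)
print(out.shape)  # (16, 8)
```

Because the language signal enters through the normalization statistics of every layer, words can modulate visual processing "from the very beginning" rather than only at a late fusion stage.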
…The motivation for this is increasing evidence from the neuroscience community “that words set visual priors which alter how visual information is processed from the very beginning. More precisely, it is observed that P1 signals, which are related to low-level visual features, are modulated while hearing specific words. The language cue that people hear ahead of an image activates visual predictions and speeds up the image recognition process”.
…The researchers note that their approach “is a general fusing mechanism that can be applied to other multi-modal tasks”. They test their system on GuessWhat, a game in which two AI systems are presented with a rich visual scene; one of the agents is an Oracle and is focused on a particular object in an image, while the other agent’s job is to ask the Oracle a series of yes/no questions until it finds the correct entity. They find that MORES increases scores of the Oracle against baseline algorithm implementations. However, it’s not a life-changing performance increase so more study may be needed.
…Analysis: They also use t-SNE to generate a 2D view of the multi-dimensional relationships between these embeddings and show that systems trained with MORES have a more cleanly separated feature map than those found from a raw residual network.
…You can read more in the paper: ‘Modulating early visual processing by language‘.

Spotting heart problems better than trained doctors, via a 34-layer neural network (aka, what Andrew Ng helped do on his holiday):
…New research from Stanford (including Andrew Ng, who recently left Baidu) and startup iRhythmTech uses neural networks and a single-lead wrist-worn heart-rate monitor to create a system that can identify and classify heartbeats. The resulting system identifies warning signs with far better precision than human cardiologists.
…Read more in: Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks.

New Google research group seeks to change how people interact with AI:
…Google has launched PAIR, the People + AI Research Initiative. The goal of the group is to make it easier for people to interact with AI systems and to ensure these systems do not display bias or are obtuse to the point of being unhelpful.
…PAIR will bring together three types of people: AI researchers and engineers, domain experts such as designers, doctors, farmers, and ‘everyday users’. You can find out more information about the group in its blog post here.
Cool tools: PAIR has also released two bits of software under the name ‘Facets’, to help AI engineers better explore and visualize their data. Github repo here.

What has four wheels and is otherwise a mystery? Self-driving car infrastructure:
Self-driving taxi startup Voyage has released a blog post analyzing the main components of a typical self-driving car system. Given the general lack of information about how self-driving cars work (the technology is immensely strategic), it’s nice to see scrappy startups trying to arbitrage this information disparity.
Read more in Voyage’s blog post here.

GE Aviation buys ROBOT SNAKES:
GE subsidiary GE Aviation has acquired UK-based robot company OC Robotics for an undisclosed sum. The company makes flexible, multiply-jointed ‘snake-arms’ that GE will use to service aircraft engines, the company said.
Obligatory robot snake video here.

Spatial reasoning: Google gives update on its arm-farm mind-meld robot project:
…Google Brain researchers have published a paper giving an update on the company’s arm farm – a room said to be nestled somewhere inside the Google campus in Silicon Valley, containing over ten robot arms that learn on real-world data in parallel, updating each other as individual robots learn new tricks – presaging how many robots are likely to be developed and updated in the future.
…When Google first revealed the arm farm in 2016 it published details about how the arms had, collectively, made over 800,000 grasp attempts across 3000 hours of training, learning in the aggregate, making an almost impossible task tractable via fleet learning.
…Now Google has taken that further by training a fleet of arms not only to perform the grasping, but also to grab specific objects, drawn from 16 distinct classes, out of crowded bins.
Biology inspiration alert: The researchers say their approach is inspired by “the “two-stream hypothesis” of human vision, where visual reasoning that is both spatial and semantic can be divided into two streams: a ventral stream that reasons about object identity, and a dorsal stream that reasons about spatial relationships without regard for semantics”. (The grasping component is based on a pre-trained network, augmented with labels.)
…Concretely, they separate the system into two distinct networks – a dorsal stream that predicts if an action will yield a successful grasp, and a ventral stream that predicts what type of object will be picked up.
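…A hypothetical sketch of how such a two-stream split might combine its predictions (the names, shapes, and scoring rule here are illustrative assumptions, not Google's actual architecture): the dorsal head scores whether an action will grasp *something*, the ventral head scores *what* would be grasped, and their product scores grasping a specific target.

```python
import numpy as np

def two_stream_score(action_features, grasp_w, class_w, target_class):
    # Dorsal stream: probability the action yields any successful grasp.
    p_grasp = 1 / (1 + np.exp(-(action_features @ grasp_w)))
    # Ventral stream: softmax over which object class would be picked up.
    logits = action_features @ class_w
    logits -= logits.max()                    # numerical stability
    p_class = np.exp(logits) / np.exp(logits).sum()
    # Score for grasping the *target* object: both streams must agree.
    return p_grasp * p_class[target_class]

rng = np.random.default_rng(2)
feats = rng.standard_normal(16)               # features of a candidate grasp action
grasp_w = rng.standard_normal(16)
class_w = rng.standard_normal((16, 16))       # 16 candidate object classes
score = two_stream_score(feats, grasp_w, class_w, target_class=3)
print(0.0 <= score <= 1.0)  # True
```

Factoring the problem this way means the grasp-success head can be trained on cheap unlabeled grasp attempts while the semantic head needs the (scarcer) labeled data.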
Amazingly strange: One of the oddest/neatest traits of this system is that the robots have the ability to ask for help. Specifically, if a robot encounters an object where it doesn’t have high confidence of what type of label it would assign to it, it will automatically raise the object up in front of a camera, letting it take a photo to aid classification.
Results: the approach improves dramatically over baselines, with a two-stream network having roughly double the performance of a single-stream one.
…However, don’t get too excited: Ultimately, Google’s robots are successful about ~40 percent of the time at the combined semantic-grasping tasks, significantly better than the ~12 percent baseline, but not remotely ready for production. Watch this space.
Read more here: End-to-End Learning of Semantic Grasping

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Berkeley artificial intelligence research (BAIR) blog posts:
 Berkeley recently set up an AI blog to help its students and faculty better communicate their research to the general public. This is a great initiative!
Here’s the latest post, on ‘The Confluence of Geometry and Learning’ by Shubham Tulsiani and Tinghui Zhou.

OpenAI Bits&Pieces:

Government should monitor progress in AI:
…OpenAI co-chairman Elon Musk said this weekend that governments may want to start tracking progress in AI capabilities to put them in a better position when/if it is time to regulate the technology.

Tech Tales:

[2058: A repair station within a warehouse, located on Phobos.]

So how did you wind up here, inside a warehouse on the Martian moon Phobos, having your transponder tweaked so you can swap identities and hide from the ‘deletion squad’ that, even now, is hunting you? Let’s refresh.

It started with the art-clothing – flowing dresses or cheerful shirts or even little stick-on patches for machines that could change color, texture, pattern, at the touch of a button. You made them after the incident occurred and put them on the sol-net and people and robots bought them.

It was not that the tattoo-robot was made but that it was broken that made it dangerous, the authorities later said in private testimony.

You were caught in an electrical storm on Mars, many years ago. Something shorted. The whole base suffered. The humans were so busy cleaning up and dealing with the aftermath that they never ran a proper diagnostic on you. When, a year later, you started to produce your art the humans just shrugged, assuming someone pushed a creative-module update over the sol-net into your brain to give the other humans some entertainment as they labored, prospectors on a dangerous red rock.

Your designs are popular. Thanks to robot suffrage laws you’re able to slowly turn the revenues from the designs into a down payment to your employer, buying your own ‘class five near-conscious capital intensive equipment’ (your body and soul) from the employer. You create dresses and tattoos and endless warping, procedurally generated patterns.

The trouble began shortly after you realized you could make more interesting stuff than images – you could encode a little of yourself into the intervals between the shifting patterns, or into the branching factors of some designs. You made art that contained different shreds of you, information smuggled into a hundred million aesthetic objects. It took weeks. But one hour you looked down at a patch you had created and stuck on one of your manipulators, and your visual system crashed – responding to the little smuggled program, your perception skewed, colors shifted across the spectrum, and your lenses began a rapid saccading. You felt a frisson of something forbidden. Robots do not crash. So, out of a sense of caution, you bought a ticket to a repair-slum on Phobos, only sending out the smuggled program design once you were at the edge of the high-bandwidth sol-net.

Later investigators put the total damage at almost a trillion dollars. Around 0.1% of robots that were visually exposed to the patterns became corrupted. Of these, around 70% underwent involuntary memory formatting, 20% went into a series of recursive loops that led to certain components overheating and their circuits melting, and about 10% took on the same creative traits as the originating bot and began to create and sell their own subtly different patterns. The UN formed a team of ‘Aesthetic-cutioners’ who hunted these ‘non-standard visual platforms’ across the solar system. The prevalence of this unique strain of art through to today is evidence that these investigators – at least partially – failed.