Import AI: Issue 56: Neural architecture search on a budget, Google reveals how AI can improve its ad business, and a dataset for building personal assistants
by Jack Clark
New dataset: Turning people into personal assistants — for SCIENCE…
…As AI researchers look to build the next generation of personal assistants, there’s an open question as to how these systems should interact with people. Now, a new dataset and research study from Microsoft aims to provide some data about how humans and machines could work together to solve information-seeking problems.
…The dataset consists of 22 pairs of people (questioners and answerers), who each spent around two hours trying to complete a range of information-seeking tasks. The questioner has no access to the internet themselves, but can speak to the answerer, who has access to a computer with the internet. The questioner asks some pre-assigned questions, like “I’ve been reading about the HPV vaccine, how can I get it?” or “I want to travel around America seeing as much as possible in three months without having to drive a vehicle myself, what’s the best route using public transit I should take?”. The answerer plays the role of a modern Google Now/Cortana/Siri and uses a web browser to find out more information, asking clarifying questions of the other person when necessary. This human-to-human dataset is designed to capture some of the weird and wacky ways people try to get answers to questions.
…You can get the full Microsoft Information Seeking Conversations (MISC) dataset from here.
…Find out more information in the research paper, MISC: A dataset of information-seeking conversations.
…”We hope that the MISC data can be used to support a range of investigations, including, for example: understanding the relationship between intermediaries’ behaviours and seekers’ satisfaction; mining seekers’ behavioural signals for correlations with success, engagement, or satisfaction; examining the tactics used in conversational information retrieval and how they differ from tactics in other circumstances; the importance of conversational norms or politeness; or investigating the relationship between conversational structure and task progress,” they write.
Sponsored: The AI Conference – San Francisco, Sept 17-20:
…Join the leading minds in AI, including Andrew Ng, Rana el Kaliouby, Peter Norvig, Jia Li, and Michael Jordan. Explore AI’s latest developments, separate what’s hype and what’s really game-changing, and learn how to apply AI in your organization right now.
…Register soon. Space is limited. Save an extra 20% on most passes with code JCN20.
Number of the week: 80 EXABYTES:
…That’s the size of the dataset of heart ultrasound videos shared by Chinese authorities with companies participating in a large-scale digital medicine project in Fuzhou, a city of around seven million people. (For comparison, the 2014 ImageNet competition dataset clocked in at about 200 gigabytes, aka 0.2 terabytes, aka 0.0000002 exabytes.)
…Read more in this good Bloomberg story about how China is leveraging its massive stores of data to spur its AI economy: China’s Plan for World Domination in AI Isn’t So Crazy After All.
Bonus number of the week: 4.5 million:
…That’s (roughly) the number of transcribed speeches in a dataset just published by researchers with Clemson University and the University of Essex. The dataset covers speeches given in the Irish parliament between 1919 and 2013.
…There’ll be a wealth of cool things that can be developed with such a dataset. As a preliminary example, the researchers try to predict the policy positions of Irish finance ministers by analyzing their speeches over time in the parliament. You could also try to use the dataset to analyze the discourse of all speakers in the same temporal cohort, then model how their positions change relative to each other and to their starting points over time. For bonus points, train a language model to generate your own Irish political arguments?
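…A toy sketch of that position-over-time idea: the speeches, keyword lists, and scoring rule below are all invented for illustration, and are far cruder than the text-scaling methods researchers actually apply to such corpora.

```python
from collections import defaultdict

# Hypothetical miniature of the parliamentary dataset:
# (year, speaker, speech_text) tuples.
speeches = [
    (1980, "minister_a", "we must cut spending and reduce the deficit"),
    (1985, "minister_a", "spending restraint remains essential"),
    (1990, "minister_a", "we will invest in public services and spending"),
]

# Crude one-dimensional "position" score: net count of expansion vs
# austerity keywords (a stand-in for real text-scaling methods).
AUSTERITY = {"cut", "restraint", "deficit", "reduce"}
EXPANSION = {"invest", "services", "increase"}

def position(text):
    words = text.split()
    return sum(w in EXPANSION for w in words) - sum(w in AUSTERITY for w in words)

trajectory = defaultdict(list)
for year, speaker, text in speeches:
    trajectory[speaker].append((year, position(text)))

print(trajectory["minister_a"])
# [(1980, -3), (1985, -1), (1990, 2)]
```

Plotting these per-speaker trajectories over decades is one simple way to visualize how a politician’s rhetoric drifts.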
…Read more here: Database of Parliamentary Speeches in Ireland (1919 – 2013).
…Get the data here from the Harvard Dataverse.
The growing Amazon Web Services AI Cloud:
…Amazon, which operates the largest cloud computing service in AWS, is beginning to thread machine learning capabilities throughout its many services. The latest? Macie, a ML service that trawls through files stored in AWS, using machine learning to look for sensitive data (personally identifiable information, intellectual property, etc) in a semi-supervised way. Seems like RegEx on steroids.
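…Amazon hasn’t published how Macie works under the hood, but the “RegEx on steroids” framing can be illustrated with a minimal pattern-based PII scan. The two patterns below are deliberately simplified assumptions for illustration, not Macie’s actual rules, and real PII detection is far more involved.

```python
import re

# Simplified, illustrative patterns -- real PII detection (and Macie
# itself) goes well beyond a couple of regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(text):
    """Return {label: [matches]} for every pattern that fires."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits

doc = "Contact jane.doe@example.com, SSN 123-45-6789 on file."
print(scan_for_pii(doc))
# {'email': ['jane.doe@example.com'], 'ssn': ['123-45-6789']}
```

The machine-learning part is what raw regexes lack: classifying documents by sensitivity and flagging anomalous access, rather than just pattern-matching strings.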
…Read more here about Amazon Macie.
AI matters; matter doesn’t:
…A Chinese company recently released Eufy, a cute hockey-puck-shaped personal speaker/mic system that runs Amazon’s ‘Alexa’ voice service. Amazon is letting people build different types of hardware that can connect to its fleet of proprietary Alexa AI services – a clear indication that Amazon thinks its underlying AI software is strategic, while hardware (like its own ‘Echo’ systems) is just a vessel.
…Read more here: This company copied the Amazon Dot and will sell for less – with Amazon’s blessing.
Making computer dreams happen in high-resolution:
…Artist Mike Tyka has spent the past few months trying to scale up synthetic images dreamed up by neural networks. It’s a tricky task because today it’s infeasible to generate images at resolutions much above 256×256 pixels, due to RAM/GPU and other processing constraints.
…In a great, practical post Tyka describes some of the steps he has taken to scale up the various images, generating large, freaky portraits of imaginary people. There’s also an excellent ‘insights’ section where he talks about some of the commonsense bits of knowledge he has gained from this experiment. Also, check out the latest images. “Getting better skin texture but hair seems to have gotten worse,” he writes.
…Read more: Superresolution with semantic guide.
Psycho (Digital) Killer, Qu’est-ce que c’est?
…Talking Heads frontman David Byrne believes technology is making each of us more alone and more atomized by swapping out humans in our daily lives for machines (tellers for ATMs, checkout clerks for checkout scanners, drivers for self-driving software, delivery drivers for drones and landbots, and so on).
…”Our random accidents and odd behaviors are fun—they make life enjoyable. I’m wondering what we’re left with when there are fewer and fewer human interactions. Remove humans from the equation, and we are less complete as people and as a society,” he writes.
…Read more here in: Eliminating the Human
Google reveals way to better predict click-through rates for web adverts:
…Google is an AI company whose main business is advertising, so it’s notable to see the company publish a technical research paper at the intersection of the two areas, defining a new AI technique that it says can lead to substantially better predictions of click-through rates for given adverts. (To get an idea of how core this topic is to Google’s commercial business, think of this paper as being equivalent to Facebook publishing research on improving its ability to predict which actions friends can take that will turn a dormant account into an active one, or Kraft Foods coming up with a better, cheaper, quicker-to-cook instant cheese.)
…The paper outlines “the Deep & Cross Network (DCN) model that enables Web-scale automatic feature learning with both sparse and dense inputs.” This is a new type of neural network component that is potentially far better and simpler at learning the sorts of patterns that advertising companies are interested in. “Our experimental results have demonstrated that with a cross network, DCN has lower logloss than a DNN with nearly an order of magnitude fewer number of parameters,” they write.
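…The cross network at DCN’s heart is compact: per the paper, each cross layer computes x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, so every added layer raises the degree of feature interactions being modeled while adding only O(d) parameters. A minimal NumPy sketch (toy dimensions and random weights, not the paper’s trained setup):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    # x_{l+1} = x0 * (xl . w) + b + xl -- each layer raises the degree
    # of feature interactions by one while adding only one weight
    # vector and one bias vector (O(d) parameters).
    return x0 * (xl @ w) + b + xl

rng = np.random.default_rng(0)
d = 8                          # toy embedded-feature dimension
x0 = rng.standard_normal(d)    # the original input, reused every layer

# Stack three cross layers on top of the input.
x = x0
for _ in range(3):
    w = rng.standard_normal(d)
    b = rng.standard_normal(d)
    x = cross_layer(x0, x, w, b)

print(x.shape)  # (8,)
```

The residual `+ xl` term and the reuse of `x0` at every layer are what keep the parameter count roughly an order of magnitude below an equivalent plain DNN.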
…How effective is it? In tests, DCN systems get the best scores while being more computationally efficient than other systems, Google says. The implications of the results seem financially material to any large-scale advertising company. “DCN outperforms all the other models by a large amount. In particular, it outperforms the state-of-art DNN model but uses only 40% of the memory consumed in DNN,” Google writes. The company also tested the DCN system on non-advertising datasets, noting very strong performance in these domains as well, implying significant generality of the approach.
…Read more here: Deep & Cross Network for Ad Click Predictions.
Neural architecture search on a pauper’s compute budget:
…University of Edinburgh researchers have outlined SMASH, a system that makes it substantially cheaper to use AI to search through possible neural network architectures, while trading off only a small amount of accuracy.
…Resources: SMASH can be trained on a handful of GPUs, or even a single one, whereas traditional neural architecture search approaches from Google and others can require 800 GPUs or more.
…The approach relies on randomly sampling neural network architectures, using an auxiliary network (in this case a HyperNetwork) to generate the weights of each dreamed-up network, then using backpropagation to train the system end-to-end. The essential gamble in this approach is that the space of networks being sampled from is sufficiently broad, and that the parameters dreamed up by the HyperNet map relatively closely to the sorts of parameters you’d use in such generated classifiers. This sidesteps some of the costs inherent to large-scale NAS systems, but at the cost of some accuracy.
…SMASH uses a “memory-bank” view of neural networks to sample them. In this view “each layer [in the neural network] is thus an operation that reads data from a subset of memory, modifies the data, and writes the result to another subset of memory.”
…Armed with this set of rules, SMASH is able to generate a large range of modern neural network components on the fly, helping it efficiently dream up a variety of networks, which are then evaluated by the hypernetwork. (To get an idea of what this looks like in practice, refer to Figure 3 in the paper.)
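…The core loop is simple enough to sketch. Everything below is a toy stand-in, not the paper’s setup: a fixed random linear map plays the role of the trained HyperNet, and a linear classifier plays the role of a sampled architecture. The pattern is the same, though: sample an encoding, generate weights from it, measure validation loss, keep the best candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: 2-class problem, 10 features.
X = rng.standard_normal((200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_code = 4
H = rng.standard_normal((20, n_code))  # frozen toy "HyperNet"

def hypernet(code):
    # Map an architecture encoding to classifier weights. In SMASH this
    # mapping is itself a trained network; a random linear map stands in.
    return (H @ code).reshape(10, 2)

def val_loss(W):
    # Softmax cross-entropy of the generated classifier on the data.
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y] + 1e-9).mean()

best_loss, best_code = np.inf, None
for _ in range(50):                     # randomly sample "architectures"
    code = rng.standard_normal(n_code)  # encodes one candidate network
    loss = val_loss(hypernet(code))     # evaluate with generated weights
    if loss < best_loss:
        best_loss, best_code = loss, code

print(round(best_loss, 3))
```

In the real system the winning architecture is then retrained from scratch with ordinary backpropagation; the hypernet-generated weights only have to rank candidates, not match fully trained performance.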
…The approach seems promising. In experiments, the researchers saw meaningful links between the validation loss predicted by SMASH for given networks, and the actual loss seen when testing in reality. In other tests they find that SMASH can generate networks with performance approaching the state-of-the-art, at a fraction of the compute budget of other systems. (And, most importantly, without requiring AI researchers to fry their brains for months to invent such architectures.)
…Read more here: SMASH: One-Shot Model Architecture Search through HyperNetworks.
…Explanatory video here.
…Components used: PyTorch
…Datasets tested on: CIFAR-10 / CIFAR-100 / ImageNet 32 / STL-10 / ModelNet
A portfolio approach to AI Safety Research:
…(Said with a hint of sarcasm:) How do we prevent a fantastical future superintelligence from turning the entirety of the known universe into small, laminated pictures of the abstract dreams within its God-mind? One promising approach is AI safety! The thinking is that if we develop more techniques today to make agents broadly predictable and safe, then we have a better chance at ensuring we live in a future where our machines work alongside and with us in ways that seem vaguely interpretable and sensible to us.
…But how do we achieve this? DeepMind AI safety researcher Victoria Krakovna has some ideas, which loosely come down to ‘don’t put all your eggs in one basket’, and which she has outlined in a blog post.
…Read more here: A portfolio approach to AI safety research.
…Get Rational about AI Safety at CFAR!
…The Center for Applied Rationality has opened up applications for its 2017 AI Summer Fellows Program, which is designed to prepare eager minds for working on the AI Alignment Problem (the problem is regularly summarized by some people in the community as getting a computer to go and bring you a strawberry without it also carrying out any actions that have gruesome side effects.)
…You can read more and apply to the program here.
Chinese AI chip startup gets $100 million investment:
…Chinese chip startup Cambricon has pulled in $100 million in a new investment round from a fund linked to the Chinese government’s State Development and Investment Corp, as well as funding from companies like Alibaba and Lenovo.
…The company produces processors designed to accelerate deep learning tasks.
…Read more on the investment in China Money Network.
…Cambricon’s chips ship with a proprietary instruction set designed for a range of neural network operations, with reasonable performance across around ten distinct benchmarks. They can also be fabricated via TSMC’s venerable 65nm process node, which makes them relatively cheap and easy to manufacture at scale.
…More information here: Cambricon: An Instruction Set Architecture for Neural Networks.
Facial recognition at the Notting Hill Carnival in the UK:
…The UK’s Metropolitan Police will conduct a large-scale test of facial recognition this month when they use the tech to surveil the hordes of revelers at the Notting Hill Carnival street party in London. Expect to see a lot of ML algorithms get confused by faces occluded by jerk chicken, cans of Red Stripe, and personal cellphones used for selfies.
…Read more here: Met police to use facial recognition software at Notting Hill carnival.
Automation’s connection to politics, aka, Republicans live near more robots than Democrats:
…The Brookings Institution has crunched data from the International Federation for Robotics to figure out where industrial robots are deployed in America. The results highlight the uneven distribution of the technology.
…State with the most robots: Michigan, ~28,000, around 12 percent of the nation’s total.
…Most surprising: Could the distribution of robots tell us a little bit about the conditions in the state and allow us to predict certain political moods? Possibly! “The robot incidence in red states that voted for President Trump in November is more than twice that in the blue states that voted for Hillary Clinton,” Brookings writes.
…Read more here: Where the robots are.
OpenAI Bits & Pieces:
Exponential improvement and self-play:
…We’ve published some more details about our Dota 2 project. The interesting thing to me is the implication that if you combine a small amount of human effort (creating experimental infrastructure, structuring your AI algorithm to interface with the environment, etc) and pair that with a large amount of compute, you can use self-play to rapidly go from sub-human to super-human performance within certain narrow domains. A taste of things to come, I think.
…Read more here: More on Dota 2.
OpenAI co-founder and CTO Greg Brockman makes MIT 35 under 35:
…Greg Brockman has made it onto MIT Technology Review’s 35 under 35 list due to his work at OpenAI. Congrats Greg “visionary” Brockman.
…Read more here on the MIT Technology Review.
Move outta the way A2C and TRPO, there’s a new ACKTR in town:
…OpenAI has released open source code for ACKTR, a new algorithm by UofT/NYU that demonstrates tremendous sample efficiency and works on both discrete and continuous tasks. We’ve also released A2C, a synchronous version of A3C.
…Read more here: OpenAI Baselines: ACKTR & A2C.
[2028: A large data center complex in China]
Mine-Matrix Derivatives(™), aka MMD, sometimes just M-D, the world’s largest privately-held bitcoin company, spent a billion dollars on the AI conversion in year one, $4 billion in year two, $6 billion in year three, and then more. Employees were asked to sign vast, far-reaching NDAs in exchange for equity. Those who didn’t were fired or otherwise pressured to leave. What remained was a group of people held together by mutually agreed-upon silence, becoming like monks tending to cathedrals. The company continued to grow its cryptocurrency business, providing the necessary free cash flow to support its AI initiative. Its workers turned their skills from designing large, football-field-sized computer facilities to mine currencies, to designing equivalent housings for AIs.
The new processing system, code-named Olympus, had the same features of security and anonymity native to MMD’s previous cryptocurrency systems, as well as radically different processing capacities. MMD began to carry out its own fundamental AI research, after being asked to make certain optimizations for clients that required certain theoretical breakthroughs.
One day, a Russian arrived; a physicist, specializing in thermodynamics. He had washed out of some Russian government project, one of MMD’s employees said. More like drunked out, said another. Unstable, remarked someone else. The Russian walked around the Olympus datacenters wearing dark glasses treated with chemical and electrical components that let him accurately see minute variations in heat, allowing him to diagnose the facility. Two days later he had a plan and, in one of the company’s innermost meeting rooms, outlined his ideas using pencil on paper.
These walls, he said, Get rid of them.
This space, he said, Must be different.
The ceiling, he said, Shit. You must totally replace.
MMD carried out renovations based on the Russian’s suggestions. The paper map was sealed in plastic and placed in a locked safe at an external facility, to be included in the company’s long-term archives.
The plan works: into the vacant spaces created by the Russian’s renovations come more computers. More powerful ones, built on different processing substrates. New networking equipment is installed to help shuttle data around the facility. Though from the outside it appears like any other large CryptoFarm, inside, things exist that do not exist anywhere else. The demands from MMD’s clients become more elaborate. More computers are installed. One winter morning an encrypted call comes in, offering larger amounts of money for the creation of an underground, sealed data center. MMD accepts. Continues.
MMD didn’t exactly disappear after that. But it did go on a wave of mergers and acquisitions in which it added, in no particular order: an agricultural equipment maker, a bowling ball factory, a (self-driving) trucking company, a battery facility, two sportswear brands, and more. Some of these businesses were intended to be decoys to its competitors and other interested governments, while others represented its true intentions.
They say it’s building computers on the moon, now.
Technologies that inspired this story: data centers, free air cooling, and this Quartz article about visiting a Bitcoin mine.