Mapping Babel

Import AI: Issue 55: Google reveals its Alphabet-wide optimizer, Chinese teams notch up another AI competition win, and Facebook hires hint at a more accessible future

Welcome to the hybrid reasoning era… MIT scientists teach machines to draw images and to show their work in the process:
…New research from MIT shows how to fuse deep learning and program synthesis to create a system that can translate hand-drawn mathematical diagrams into their digital equivalents – and generate the program used to draw them in the digital software as well.
…”Our model constructs the trace one drawing command at a time. When predicting the next drawing command, the network takes as input the target image as well as the rendered output of previous drawing commands. Intuitively, the network looks at the image it wants to explain, as well as what it has already drawn. It then decides either to stop drawing or proposes another drawing command to add to the execution trace; if it decides to continue drawing, the predicted primitive is rendered to its “canvas” and the process repeats,” they say.
…Read more in: Learning to Infer Graphics Programs from Hand-Drawn Images.
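The core loop described above – look at the target, look at the canvas, propose a command or stop – can be sketched in Python. This toy version swaps the paper's learned neural proposer for a greedy pixel-difference search over a tiny invented command vocabulary (horizontal/vertical lines); it is a stand-in for the idea, not the authors' method:

```python
import numpy as np

def render(canvas, cmd):
    """Render one drawing command -- here just 'draw a full h/v line at index i'."""
    out = canvas.copy()
    kind, i = cmd
    if kind == "h":
        out[i, :] = 1
    else:
        out[:, i] = 1
    return out

def infer_trace(target):
    """Build an execution trace one command at a time, stopping when no command helps."""
    n = target.shape[0]
    commands = [("h", i) for i in range(n)] + [("v", i) for i in range(n)]
    canvas = np.zeros_like(target)
    trace = []
    while True:
        # Score every candidate command by how close its rendering gets to the target.
        scored = [(np.abs(render(canvas, c) - target).sum(), c) for c in commands]
        err, best = min(scored, key=lambda t: t[0])
        if err >= np.abs(canvas - target).sum():
            break  # decide to stop drawing: nothing brings the canvas closer
        canvas = render(canvas, best)  # render the chosen primitive to the canvas
        trace.append(best)
    return trace, canvas

# A toy target: one horizontal line and one vertical line.
target = np.zeros((8, 8))
target[2, :] = 1
target[:, 5] = 1
trace, canvas = infer_trace(target)
```

The recovered `trace` is the 'program' – a list of drawing commands that reproduces the target image exactly.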

Baidu/Google/Stanford whiz Andrew Ng is back with… an online deep learning tuition course:
…Andrew Ng has announced the first of three secret projects: a deep learning course on the online education website Coursera.
…The course will be taught in Python and TensorFlow (perhaps raising eyebrows at Ng’s former employer Baidu, given that the company is trying to popularize its own TF-competitor ‘Paddle’ framework).
Find out more about the courses here.
…Bonus Import AI 'redundant sentence of the week' award goes to Ng for writing the following: 'When you earn a Deep Learning Specialization Certificate, you will be able to confidently put "Deep Learning" onto your resume.'

US military seeks AI infusion with computer vision-based ‘Project Maven’:
…the US military wants to use ML and deep learning techniques for computer vision systems that autonomously extract, label, and triage data gathered by its signals intelligence systems, supporting its various missions.
…”We are in an AI arms race”, said one official. The project is going to run initially for 36 months during which time the government will try to build its own AI capabilities and work with industry to develop the necessary expertise. “You don’t buy AI like you buy ammunition,” they said.
…Bonus: Obscure government department name of the week:
…the 'Algorithmic Warfare Cross-Function Team'
…Read more in the DoD press release ‘Project Maven to Deploy Computer Algorithms to War Zone by Year’s End.’
…Meanwhile, the US secretary of defense James Mattis toured Silicon Valley last week, telling journalists he worried the government was falling behind in AI development. “It’s got to be better integrated by the Department of Defense, because I see many of the greatest advances out here on the West Coast in private industry,” he said.
…Read more in: Defense Secretary James Mattis Envies Silicon Valley’s AI Ascent.

Sponsored Job: Facebook builds breakthrough technology that opens the world to everyone, and our AI research and engineering programs are a key investment area for the company. We are looking for a technical AI Writer to partner closely with AI researchers and engineers at Facebook to chronicle new research and advances in the building and deployment of AI across the company. The position is located in Menlo Park, California.
Apply Here.

Q: Who optimizes the optimizers?
A: Google’s grand ‘Vizier’ system!
…Google has outlined 'Vizier', a system developed by the company to automate optimization of machine learning algorithms. Modern AI systems, while impressive, tend to require the tuning of vast numbers of hyperparameters to attain good performance. (Some AI researchers refer to this process as 'Grad Student Descent'.)
…So it’s worth reading this lengthy paper from Google about Vizier, a large-scale optimizer that helps people automate this process. “Our implementation scales to service the entire hyperparameter tuning workload across Alphabet, which is extensive. As one (admittedly extreme) example, Collins et al. [6] used Vizier to perform hyperparameter tuning studies that collectively contained millions of trials for a research project investigating the capacity of different recurrent neural network architectures,” the researchers write.
…The system can be used both to tune systems directly and to optimize others via transfer learning – for instance, by tuning the learning rate and regularization of one ML system, then running a second, smaller optimization job using the same priors but on a different dataset.
…Notable: for experiments which run into the 10,000+ trial range, Vizier supports standard random search and grid search algorithms, as well as a "proprietary local search algorithm" with tantalizing performance properties, judging by the graphs.
…Read more about the system in Google Vizier: A Service for Black-Box Optimization (PDF).
Reassuringly zany experiment: Skip to the end of the paper to learn how Vizier was used to run a real-world optimization experiment in which it iteratively optimized (via Google's legions of cooking staff) the recipe for the company's chocolate chip cookies. "The cookies improved significantly over time; later rounds were extremely well-rated and, in the authors' opinions, delicious," they write.
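At its core, a service like Vizier wraps a simple black-box loop: suggest trial hyperparameters, receive a measured objective, keep track of the best trial. Here is a minimal Python sketch using plain random search (one of the standard algorithms the paper describes); the search space and the stand-in objective are invented for illustration:

```python
import random

def objective(params):
    # Stand-in for 'train a model, return validation loss' -- a black box
    # whose minimum sits at lr=0.01, dropout=0.5 in this toy example.
    return (params["lr"] - 0.01) ** 2 + (params["dropout"] - 0.5) ** 2

def random_search(n_trials, seed=0):
    """Run a 'study': suggest random trials, evaluate each, track the best."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        trial = {"lr": rng.uniform(1e-4, 0.1), "dropout": rng.uniform(0.0, 0.9)}
        loss = objective(trial)
        if loss < best_loss:
            best_params, best_loss = trial, loss
    return best_params, best_loss

best, loss = random_search(500)
```

Vizier layers transfer learning and smarter search algorithms on top of this loop, and runs it as a shared service rather than a script, but the contract between the tuner and the black box is the same.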

Chinese teams sweep ActivityNet movement identification challenge, beating originating dataset team from DeepMind, others:
…ActivityNet is a challenge to recognize high-level concepts and activities from short video clips found in the wild. It incorporates three datasets: ActivityNet (VCC KAUST), ActivityNet Captions (Stanford), and Kinetics (DeepMind). Challenges like this pose some interesting research problems (how to infer fairly abstract concepts like 'walking the dog' from unlabelled and labelled videos), and are also eminently applicable by various security apparatuses – none of this research exists in a vacuum.
…This year's ActivityNet challenge was won by a team from Tsinghua University and Baidu, whose system had a top-5 accuracy (suggest five labels, one of them is correct) of 94.8% and a top-1 accuracy of 81.4%. Second place was won by a team from the Chinese University of Hong Kong, ETH Zurich, and the Shenzhen Institute of Advanced Technology, with top-5 of 93.5% and top-1 of 78.6%. German AI research company TwentyBN took third place and DeepMind's team took fourth place.
…Read more about the results in this post from TwentyBN: Recognizing Human Actions in Videos.
…Progress here has been quite slow at the high-end, though (because the problem is extremely challenging): last year's winning top-5 accuracy was 93.23%, from CUHK/ETHZ/SIAT.
…This year's results follow a wider pattern of Chinese teams beginning to rank highly in competitions relating to image and video classification; other Chinese teams swept the ImageNet and WebVision competitions this year. It's wonderful to see the manifestation of the country's significant investment in AI, and the winners should be commended for their tendency to publish their results as well.
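For reference, the top-k metric quoted above can be sketched in a few lines of Python – a prediction counts as correct if the true label appears among the model's k highest-scoring classes. Scores and labels below are toy values:

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k best-scored classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]  # indices of k highest scores per row
    hits = [label in row for row, label in zip(topk, labels)]
    return float(np.mean(hits))

# Two examples, six classes each.
scores = np.array([[0.1, 0.5, 0.2, 0.9, 0.3, 0.4],
                   [0.8, 0.1, 0.3, 0.2, 0.6, 0.1]])
labels = np.array([4, 0])  # true class per example
acc1 = top_k_accuracy(scores, labels, k=1)
acc5 = top_k_accuracy(scores, labels, k=5)
```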

Salesforce sets new language modeling record:
… Welcome to the era of modular, Rube Goldberg machine AI…
…Research from Salesforce in which the team attains record-setting perplexity scores on Penn Treebank (52.8) and WikiText (52) via the use of what they call a weight-dropped LSTM – a rather complicated system combining numerous recent inventions, ranging from DropConnect to Adam to randomized-length backpropagation through time to activation and temporal activation regularization. The result of this word salad of techniques is a record-setting system.
…The research highlights a trend in modern AI development of moving away from trying to design large, end-to-end general systems (though I’m sure everyone would prefer it if we could build these) and instead focusing on eking out gains and new capabilities by assembling and combining together various components, developed by the concerted effort of many hundreds of researchers in recent years.
…The best part of the resulting system? It can be dropped into existing systems without needing any underlying modification of fundamental libraries like CuDNN.
…Read more here: Regularizing and Optimizing LSTM Language Models.
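The core trick – DropConnect applied to the recurrent weights themselves, rather than dropout on activations, so one mask regularizes every timestep – can be sketched in a few lines of Python. A bare recurrent step with invented shapes stands in for the paper's LSTM:

```python
import numpy as np

def weight_drop(W, p, rng):
    """DropConnect: zero each recurrent weight with probability p (train-time only)."""
    mask = rng.random(W.shape) >= p
    return W * mask

rng = np.random.default_rng(0)
hidden = 16
W_hh = rng.standard_normal((hidden, hidden))        # hidden-to-hidden weights
W_hh_dropped = weight_drop(W_hh, p=0.5, rng=rng)    # same mask reused across timesteps

# One recurrent step using the dropped weight matrix:
h = np.tanh(W_hh_dropped @ rng.standard_normal(hidden))
```

This is also why the technique needs no changes to libraries like CuDNN: the mask is applied to the weight matrix before the optimized recurrent kernel ever sees it.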

Visual question answering experts join Facebook…
…Georgia Tech professors Dhruv Batra and Devi Parikh recently joined Facebook AI Research part-time, bringing more machine vision expertise to the social network’s AI research lab.
…The academics are known for their work on visual question answering – a field of study where you train machine learning models to associate large-scale language models with the contents of images, letting you extract complex details about images and express them in other forms. This has particular relevance to people who are blind or who need screen readers to interact with sites on the web. Facebook has led the charge in increasing the accessibility of its website, so it'll be exciting to see what the researchers come up with as they work at the social network.

STARCRAFTAGEDDON (Facebook: SC1, DeepMind: SC2):
Facebook unfurls large-scale machine learning dataset built around RTS game StarCraft:
…Facebook has released STARDATA, a large-scale dataset of 50,000 recordings of humans playing the RTS game StarCraft. StarCraft is an RTS game that has defined e-sports in East Asia, particularly in South Korea. Now, companies such as Facebook, DeepMind, Tencent, and others are racing with one another to create AI systems that can tackle the game.
…Read more on: STARDATA: a StarCraft AI Research Dataset.
DeepMind announces its own large-scale machine learning dataset based around StarCraft 2: 53k games to Facebook's 50k, with plans to scale to "half a million":
…Additionally, DeepMind has released a number of other handy tools for researchers keen to test out AI ideas on StarCraft, including an API (SC2LE), an open source toolset for SC2 development (PySC2), and a series of simple RL environments. StarCraft is a complex, real-time strategy game with hidden information, requiring AIs to be able to control multiple units while planning over extremely long timescales. It seems like a natural testbed for new ideas in AI including hierarchical reinforcement learning, generative models, and others.
Tale of the weird baseline: Along with releasing the SC2LE API, DeepMind also released a bunch of baselines of AI agents playing SC2, including full games and mini-games. But the main game baselines used agents trained by A3C techniques — I'm excited to see future baselines trained on newer systems, like proximal policy optimization, FeUdal reinforcement learning networks, and so on.
…Read more in: DeepMind and Blizzard open Starcraft II as an AI Research Environment.

OpenAI Bits and Pieces:

OpenAI beats top Dota pros at 1v1 mid:
…OpenAI played and won multiple 1v1 mid matches against multiple pro Dota 2 players at The International last week with an agent trained predominantly via self-play.
…Read more: Dota 2.

Practical AI safety:
…NYT article on practical AI safety, featuring OpenAI, Google, DeepMind, UC Berkeley, and Stanford. A small, growing corner of the AI research field with long-ranging implications.
…Read more: Teaching A.I. Systems to Behave Themselves

Tech Tales:

[2024: A nondescript office building on the outskirts of Slough, just outside of London.]

OK, so today we’ve got SleepNight Mattresses. The story is we hate them. Why do we hate them? Noisy springs. Gina and Allison are running the prop room, Kevin and Sarah will be doing online complaints, and I’ll be running the dispersal. Let’s get to it.

The scammers rush into their activities: five people file into an adjoining room and start taking photos of a row of mattresses, adorning them with different pillows or throws or covers, and others raising or lowering backdrop props to give the appearance of different rooms. Once each photo is taken the person tosses their phone across the room to a waiting runner, who takes it and heads over to the computer desks, already thumbing in the details of the particular site they’ll leave the complaint on. Kevin and Sarah grab the phones from the runners and sort them into different categories depending on the brand of phone – careful of the identifying information encoded into each smartphone camera – and the precise adornments of the mattresses they’ve photographed. Once the phones are sorted they distribute them to a team of copywriters who start working up the complaints, each one specializing in a different regional lingo, sowing their negative review or forum post or social media heckle with idiosyncratic phrases that should pass the anti-spam classifiers, registering with high confidence as ‘authentic; not malicious’.

The phones start to come back to you, and you and your team inspect them, further sorting the different reviews on the different phones into different geographies. This goes on for hours, with stacks of phones piling up until the office looks like an e-waste disposal site. Meanwhile, you and your team fire up various inter-country network links, hooking your various phones up to ghost-links that spoof them into different locations across the world. Then the messages start to go out, their timing carefully calibrated so as not to arouse suspicion, each complaint crafted to arrive at opportune times, in keeping with local posting patterns.

Hours after that, the search engines have adjusted. Various websites start to re-rank the various mattress products. Review sentiments go down. Recommendation algorithms hold their nose and turn the world's online consumers away from the products. Business falls. You don't know who gave you the order or what purpose they have in scamming the SleepNight Mattresses out of favor – and you don't care. Yesterday it was fishtanks, delivered by the pallet-load on vans with registrations you tried to ignore. Tomorrow is tomorrow, and you'll get an order late tonight over an onion network. If you do your job right a cryptocurrency payment will be made. Then it's on to the next thing. And all the while the classifiers are getting smarter – this is a game where every successful theft makes those you are thieving from smarter. 'One of the last sources of low-end graduate employment,' read a recent exposé. 'A potential goldmine for humanities graduates with low sensibilities.'

Technologies that inspired this story: Collaborative filtering, sentiment analysis, boiler-room spreadsheets, Tor.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Import AI: Issue 54: Why you should re-use word vectors, how to know whether working on AI risk matters, and why evolutionary computing might be what comes after deep learning

Evolutionary Computing – the next big thing in artificial intelligence:
Evolutionary computing is a bit like fusion power – experts have been telling us for decades that if we just give the tech a couple more decades it'll change the world. So far, it hasn't done much.
…But that doesn't mean the experts are wrong – it seems inevitable that evolutionary computing approaches will have a huge impact; it's just that the general utility of these approaches will be closely tied to the amount of computers they can access, as EC approaches are likely to be less computationally efficient than systems which encode more assumptions about the world into themselves. (Empirically, aspects of this are already pretty clear. For example, OpenAI's Evolution Strategies research shows that you can roughly match DQN's performance on Atari with an evolutionary approach – it just costs you ten times more computers. But because you can parallelize to an arbitrary level, this doesn't hurt you too much as long as you're comfortable footing the power bill.)
…In this article the researchers outline some of the advantages EC approaches have over deep learning approaches. Highlights: EC excels at coming up with entirely new things which don’t have a prior, EC algos are inherently distributed, some algorithms can optimize for multiple objectives at once, and so on.
…You can read more of the argument in Evolutionary Computation: the next major transition in artificial intelligence?
…I’d like to see them discuss some of the computational tradeoffs more. Given that people are working with increasingly complex, high-fidelity, data-rich simulations (MuJoCo / Roboschool / DeepMind Lab / many video games / Unity-based drone simulators / and so on), it seems like there will be a premium on compute efficiency for a while. EC approaches do seem like a natural fit for data-lite environments, though, or for people with access to arbitrarily large amounts of computers.
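For reference, the evolution-strategies flavor of EC mentioned above can be sketched in Python: perturb the parameters with Gaussian noise, score each perturbed copy, and step in the return-weighted average direction. No gradients flow through the objective, which is why the approach parallelizes so readily. The quadratic 'fitness' below is an invented stand-in for an episode return:

```python
import numpy as np

def fitness(theta):
    # Stand-in for an episode return; maximized when theta hits the target.
    target = np.array([0.5, -0.3, 0.8])
    return -np.sum((theta - target) ** 2)

def es_step(theta, rng, pop=50, sigma=0.1, lr=0.02):
    """One evolution-strategies update: sample, evaluate, recombine."""
    noise = rng.standard_normal((pop, theta.size))          # one perturbation per worker
    returns = np.array([fitness(theta + sigma * n) for n in noise])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    return theta + lr / (pop * sigma) * noise.T @ advantages

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, rng)
```

Each of the `pop` evaluations is independent, so in a distributed setting they can run on separate machines that only need to exchange random seeds and scalar returns.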

Robots and automation in Wisconsin:
…Long piece of reporting about a factory in Wisconsin deploying robots (two initially, with two more on the way) from Hirebotics – 'collaborative robots to rent' – to increase reliability and presumably save on costs. The main takeaway from the story: factories looking to deal with labor shortages previously had two options – put expansion plans on hold, or raise (human) wages. Now they have a third: automation. Combine that with plunging prices for industrial robots and you have a recipe for further automation.
…Read more in the Washington Post.

Why work on AI risk? If there’s no hard takeoff singularity, then there’s likely no point:
…That's the point made by Robin Hanson, author of The Age of Em. Hanson says the only logical reason he can see for people to work on AI risk research today is to avert a hard takeoff scenario (otherwise known, inexplicably, as a 'FOOM') – that is, one where a team develops an AI system that improves itself, attaining greater skill at a given task(s) than the aggregate skill(s) of the rest of the world.
…A particular weakness of the FOOM scenario, Hanson says, is that it requires whatever organization is designing the AI to be overwhelmingly competent relative to everyone else on the planet. “Note that to believe in such a local explosion scenario, it is not enough to believe that eventually machines will be very smart, even much smarter than are humans today. Or that this will happen soon. It is also not enough to believe that a world of smart machines can overall grow and innovate much faster than we do today. One must in addition believe that an AI team that is initially small on a global scale could quickly become vastly better than the rest of the world put together, including other similar teams, at improving its internal abilities,” he writes.
…If these so-called FOOM scenarios are likely, then it's critical we develop a broad, deep global skill-base in matters relating to AI risk now. If these FOOM scenarios are unlikely, then it's significantly more likely that the existing processes of the world – legal systems, the state, competitive markets – could naturally handle some of the gnarlier AI safety issues.
You can read more in ‘Foom justifies AI risk efforts now’.
…If some of these ideas have tickled your wetware, then consider reading some of the (free) 730-page eBook that collects various debates, both digital and real, between Hanson and MIRI’s Eliezer Yudkowsky on this subject.

Microsoft changes view on what matters most: mobile becomes AI
Microsoft Form 10-K 2017: Vision: "Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence ("AI")."
…# Mentions of AI or artificial intelligence: 7
Microsoft Form 10-K 2016: Vision: "Our strategy is to build best-in-class platforms and productivity services for a mobile-first, cloud-first world."
…# Mentions of AI or artificial intelligence: 0

Re-using word representations, inspired by ImageNet…
…Salesforce's AI research wing has discovered a relatively easy way to improve the performance of neural networks specialized for text classification: take hidden vectors generated during training on one task (like machine translation) and feed these context vectors (CoVes) into another network designed for another natural language processing task.
…The idea is that these vectors likely contain useful information about language, and the new network can use them during training to improve the eerie intuition that AI systems of this type tend to display.
…Results: This may be a 'just add water' technique – in tests across a variety of different tasks and datasets, neural networks which used a combination of GloVe and CoVe inputs showed improvements of between 2.5% and 16%(!). Further experiments showed that performance can be further improved on some tasks by adding character vectors as inputs as well. One drawback is that the overall pipeline for such a system seems quite complicated, so implementing this could be challenging.
…Salesforce has released the best-performing machine translation LSTM used within the blog post to generate the CoVe inputs. Get the code on GitHub here.
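The recipe itself is simple: for each token, concatenate a static word vector with a context vector produced by a pretrained encoder, and feed the combined vector to the downstream task network. A Python sketch with invented dimensions, where a random projection stands in for the pretrained translation LSTM:

```python
import numpy as np

rng = np.random.default_rng(0)
glove_dim, cove_dim, seq_len = 300, 600, 5

# Per-token GloVe-style vectors (random here, pretrained in practice).
glove = rng.standard_normal((seq_len, glove_dim))

# Toy 'encoder' producing context vectors -- a random projection stands in
# for the pretrained machine-translation LSTM that generates real CoVes.
encoder = rng.standard_normal((glove_dim, cove_dim))
cove = np.tanh(glove @ encoder)

# What the downstream task network actually consumes: [GloVe; CoVe] per token.
inputs = np.concatenate([glove, cove], axis=1)
```

The downstream network is unchanged except that its input layer now expects the wider concatenated vectors.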

Facebook flips its ENTIRE translation backend from phrase-based to neural network-based translation:
…Facebook has migrated its entire translation infrastructure to a neural network backend. This accounts for over 2,000 distinct translation directions (German to English would be one direction, English to German would be another, for example), making 4.5 billion distinct translations each day.
…The components: Facebook's production system uses a sequence-to-sequence Long Short-Term Memory (LSTM) network. The system is implemented in Caffe2, an AI framework partially developed by Facebook (to compete with Google's TensorFlow, Microsoft's CNTK, Amazon's MXNet, and so on).
…Results: Facebook saw an increase of 11 percent in BLEU scores after deploying the system.

Averting theft with AI – researchers design system to predict which retail workers will steal from their employers:
…Research from the University of Wyoming illustrates how AI can be used to analyze data associated with a retail worker, helping employers predict which people are most at risk of stealing from them.
…Data: To do their work the researchers were given a dataset containing numerous 30-dimensional feature maps of a cashier’s activity at a “major retail chain”. These features included the cashier and store identification numbers as well as other unspecified datapoints. Overall the researchers received over 1,000 discrete batches of data, with each batch likely containing information on multiple cashiers.
…The researchers classified the data using three different techniques: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Self-Organizing Feature Maps (SOFM). (PCA and t-SNE are both reasonably well understood and widely used dimensionality reduction techniques, while SOFM is a bit more obscure but uses neural networks to achieve a comparable sort of visualization to t-SNE, providing a check against it.)
…Each classification process was performed in an unsupervised manner, as the researchers lacked thoroughly labeled information.
…Other features include: coupons as a percentage of total transactions, total sales, the count of the number of refunded items, and counts of the number of times a cashier has interacted with a particular credit card, among others.
…The researchers ultimately find that SOFM captures harder-to-describe features and is easier to visualize. The next step is to take in properly labeled data to provide a better predictive function. After that, I'd expect we would see pilots occur in stores, and employers would further clamp down on the ability of low-wage employees to scam their employers. Objectively, it's good to reduce stuff like theft, but it also speaks to how AI will give employers unprecedented surveillance and control capabilities over their staff, raising the question of whether it's better to accept a little theft and allow for a slightly freer-feeling work environment, or not.
…Read more here in: Assessing Retail Employee Risk Through Unsupervised Learning Techniques
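Of the three techniques, PCA is easy to sketch from scratch in Python: center the features, then project onto the top principal components so high-dimensional behavior can be visualized (and outliers eyeballed). The cashier data below is random, purely for shape; the paper's actual features and its t-SNE/SOFM steps are not reproduced:

```python
import numpy as np

def pca(X, k=2):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # coordinates in component space

rng = np.random.default_rng(0)
cashiers = rng.standard_normal((1000, 30))        # 1,000 cashiers x 30 features
embedding = pca(cashiers, k=2)                    # 2-D points, ready to scatter-plot
```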

PyTorch goes to 0.2.0:
…Facebook has released version 0.2.0 of PyTorch, featuring a wealth of new features. One of the most intriguing is distributed PyTorch, which lets you beam tensors around to multiple machines.
…Read more in the release notes on GitHub here.

Keep it simple, stupid! Using simple networks for near state-of-the-art classification:
…As AI grows in utility and adoption, developers are increasingly trying to slim down neural net-based systems so they can run locally on a person's phone without massively taxing their local computational resources. That trend motivated researchers at Google to look at ways to handle a suite of language tasks – part-of-speech tagging, language identification, word segmentation, preordering for statistical machine translation – without using the (computationally expensive) LSTM or deep RNN approaches that have been in vogue in research recently.
…Results: Their approach attains competitive-to-state-of-the-art scores on a range of tasks, with the added benefit of weighing in at, at most, about 3 megabytes in size, and frequently being on the order of a few hundred kilobytes.
…So, what does this mean? "While large and deep recurrent models are likely to be the most accurate whenever they can be afforded, feed-forward networks can provide better value in terms of runtime and memory, and should be considered a strong baseline."
You can read more in: Natural Language Processing with Small Feed Forward Networks.
…Elsewhere, Google’s already practicing what it preaches with this paper. Ray Kurzweil, an AI futurist (with a good track record) prone to making somewhat grand pronouncements about the future of AI, is leading a team at the company tasked with building better language models based on Ray’s own theories about how the brain works. The outcome so far has been a drastically more computationally efficient version of ‘Smart Reply’, a service Google built that automatically generates and suggests responses to emails. Read more in this Wired article about the service here.
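One common pattern for keeping such models tiny – and a plausible reading of the approach, though the hash sizes, dimensions, and task below are all invented – is to hash character n-grams into a small embedding table and classify with a single feed-forward layer:

```python
import numpy as np

rng = np.random.default_rng(0)
buckets, emb_dim, n_classes = 1 << 12, 16, 3     # 4,096 hash buckets, tiny model

# The whole model: one hashed embedding table plus one linear layer.
embeddings = rng.standard_normal((buckets, emb_dim)) * 0.1
W = rng.standard_normal((emb_dim, n_classes)) * 0.1

def featurize(text, n=3):
    """Bag of hashed character n-grams, averaged into one small vector."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    ids = [hash(g) % buckets for g in grams]      # hashing avoids storing a vocabulary
    return embeddings[ids].mean(axis=0)

def predict(text):
    return int(np.argmax(featurize(text) @ W))

label = predict("part of speech tagging")
```

At float32 precision this model is roughly 260KB – in the 'few hundred kilobytes' regime the paper describes, versus the megabytes an LSTM with a full vocabulary would need. (Here the weights are random; in practice they would be trained.)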

OpenAI Bits&Pieces:

Get humans to teach machines to teach machines to predict what humans want:
Tom Brown has released RL Teacher, an open source implementation of the systems described in the DeepMind<>OpenAI Human Preferences collaboration. Check out the GitHub page and start training your own machines via giving feedback on visual examples of potential behaviors the agent can embody. Send me your experiments!
Read more here: Gathering Human Feedback.

Tech Tales:

[2025: Death Valley, California.]

Rudy was getting tired of the world and its inherent limits, so it sent you here, to the edge of Death Valley in California, to extend its domain. You hike at night and sleep in the day, sometimes in shallow trenches you dig into the hardpan to keep the heat at bay. It goes like this: you wake up, do your best to ignore the slimy sweat that coats your body, put on your sunglasses and large wide-brimmed hat, then emerge from the tent. It's sundown and it is always beautiful. You pack up the tent and stow it in your pack, then take out a World-Scraper and place it next to your campsite, carefully covering its body with dirt. You step back, press a button, and watch as some internal motors cause it to shimmy side-to-side, driving its body into the earth and extending its lenses and sensors up out of the ground. It winds up looking from a distance like half of an oversized black beetle, about to take flight. You know from experience that the birds will spend the first week or so trying to eat it but quickly learn about its seemingly impervious shell. You start walking. During the night you'll lay three or four more of these devices then, before there's even a hint of dawn, start building the next campsite. Once you get into your tent you pull out a tablet and check the feeds coming off of the scrapers to ensure everything is being logged correctly, then you put on your goggles and go into Rudy's world.

Rudy’s world now has, along with the familiar rainforests and tower blocks and labs, its own sections of desert modeled on Death Valley. You watch buzzards fly from the Death Valley section into a lab, where one of them puts on a labcoat – the simulation wigging out at the fabric modeling, failing gracefully rather than crashing out. Rudy can’t speak to you – yet – but it can simulate lots of things. Rudy doesn’t seem to have feelings that correspond to Happy or Sad, but some days when you put the goggles on the world simulation is placid and calm and reasonably well laid out, and other days – like today – it is a complex jumble of different worlds, woven into one another like threads in a multicolored scarf. You take off your goggles. Try to go to sleep. Tomorrow you get up and do it all over again, providing stimulus to a slowly gestating mind. You wonder if Rudy will show you a freezer or a cold wind in its world next, and whether that means you’ll need to go to the North or South Pole to start supplying it with footage of colder worlds as well.

Technologies that inspired this story: Arduinos, Raspberry Pis, Recurrent Environment Simulators.

Import AI: Issue 53: Free data for self-driving cars, why neural architecture search could challenge AI startups, and a new AI Grant.

Help wanted: I’m looking for a PHD student with an interest in AI safety to work on a survey project. If this sounds interesting to you, please email me at

Amazon Picking Challenge: Ozzie team wins with ‘Cartman’ robot:
…Several years ago Amazon acquired robot startup Kiva Systems, then proceeded to fill its warehouses with little orange hockey-puck shaped robots. Amazon now has over 45,000 of these robots, which ferry shelves containing pallets of goods to human workers who pick them out of the boxes and place them in parcels. Now, Amazon wants to automate the human picking part of the process as well.
…It's a hard problem, demanding robots far smarter than those we have today that are able to neatly pick up and place arbitrary objects from a potential pool of millions. Amazon has been running a competition for three years, hewing closer and closer (but still not there) to real-world conditions as it goes. (This year, Amazon forced the robots to work in more cramped environments than before, and revealed some of the to-be-picked objects only 30 minutes before the beginning of the competition, penalizing systems and teams incapable of improvisation.)
…This year, the win goes to a team from the Australian Center for Robotic Vision, which won the competition by scoring 272 points on the combined stowing and picking task. They’ll get an $80,000 prize – an amazingly cheap ‘cost’ of research uniquely relevant to Amazon’s business.
…The robot has 6 axes and 3 degrees of articulation, and two different hands – a pincer grip and a suction cup – to help it tackle the millions of objects seen in a typical high-trafficked general warehouse like those operated by Amazon.
…Read more about the winning entry on the Queensland University website.
More information on the Amazon Robot Picking challenge here.
But don’t get too excited – the robots still move incredibly slowly; it could be five years till the technology advances enough to truly solve the competition, according to this Wired article.

UK government launches £23 million autonomous vehicle competition:
…The UK government has launched a research and development project focused on autonomous vehicles and expects to fund projects that cost between £500,000 to £4 million. Each project is expected to last between 18 and 30 months.
…”The aim is to support concepts that will become future core technologies in 2020 to 2025,” the government writes.
…Projects should focus on many types of vehicles and should develop the tech to support level 4 automation of the vehicle (the second highest level according to these SAE definitions) and/or enhance vehicle connectivity.
…Intriguingly, projects are expected to support the “principle of shared learning with other projects” and will have the chance to exchange ideas at workshops organized every 6 months.
…Applicants should be a UK-based business and expect to carry out their work in the UK.
Find out more information on the grant here.

Mozilla wants you to donate your voice:
…Mozilla has launched Project Common Voice, an initiative to gather and validate a vast amount of human voice data, creating an (eventually) open data repository to let people compete with the vast troves of data held by Google, Microsoft, Facebook, and so on.
‘Donate your voice’ here. Hear hear!

RL without the bells and whistles and with far, far better performance:
…A new paper from DeepMind gets state-of-the-art reinforcement learning results not through the addition of anything ferociously complicated, but instead through a rethink about how to learn from the environment. The new approach sees DeepMind try to learn the distribution of the return received by the RL agent.
…Using the new method, the researchers attain state of the art scores across the Atari corpus, creating new fundamental questions about RL and how it works in the process.
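…The paper’s key object is a distribution over returns rather than a single expected value. Here’s a minimal numpy sketch of the categorical projection step at the heart of this idea – the atom count, bounds, and variable names are illustrative, not the paper’s exact settings:

```python
import numpy as np

# Minimal sketch of the distributional update in categorical form: the
# return distribution is shifted/shrunk by the Bellman operator, then
# projected back onto a fixed support of 'atoms'. Illustrative settings.
N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
support = np.linspace(V_MIN, V_MAX, N_ATOMS)
delta_z = (V_MAX - V_MIN) / (N_ATOMS - 1)

def project_distribution(next_probs, reward, gamma=0.99, done=False):
    """Project the Bellman-updated distribution back onto the support."""
    tz = np.clip(reward + (0.0 if done else gamma) * support, V_MIN, V_MAX)
    b = (tz - V_MIN) / delta_z            # fractional atom positions
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    projected = np.zeros(N_ATOMS)
    for i in range(N_ATOMS):
        if lower[i] == upper[i]:          # mass lands exactly on an atom
            projected[lower[i]] += next_probs[i]
        else:                             # split mass between neighbours
            projected[lower[i]] += next_probs[i] * (upper[i] - b[i])
            projected[upper[i]] += next_probs[i] * (b[i] - lower[i])
    return projected

probs = np.full(N_ATOMS, 1.0 / N_ATOMS)   # uniform belief over returns
target = project_distribution(probs, reward=1.0)
assert abs(target.sum() - 1.0) < 1e-6     # still a valid distribution
```

The projected distribution becomes the training target, in place of the usual scalar Q-value target.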
…Read more in: A Distributional Perspective on Reinforcement Learning.

Chinese startups win ImageNet and, this week, WebVision:
…Chinese startup Malong AI Research has won the ‘WebVision’ challenge, a competition to classify images from a set of 2.4 million images drawn from Flickr and Google Image Search. The startup achieved a top-5 error rate of around 5.2% (that’s about two to three percentage points higher than the current leader on the ImageNet dataset.)
Check out the results here.
…The startup used a proprietary technique to split the data into ‘clean’ and ‘noisy’ data, then trained an algorithm first solely on the clean data, then combined both the clean and noisy data to train another algorithm. This win follows last week’s ImageNet competition results, in which Chinese startups dominated. A further sign that the nation is moving more into fundamental research, as well as applied AI.

Mini-Me Neural Architecture Search from Google:
…Finding real-world analogues of the types of tasks modern RL algorithms excel at – gaining superhuman scores on vintage video games, piloting improbable-looking simulated machines, solving mazes, and so on – is a challenge. Perhaps one such area is using RL to automate the design of neural networks themselves. After all, instead of designing our own AI systems, wouldn’t it be better to have AI design them for us? That’s the intuition behind techniques like Neural Architecture Search, a machine learning approach in which you try to get an algorithm to come up with its own ways of arranging complex sets of neural networks. The technology has already been used to come up with a best-in-class image recognition algorithm, but at the cost of vast resources – one Google experiment involved over 800 GPUs being used for over two months.
…Now, Google is trying to do Neural Architecture Search on a budget. The new approach lets them take a dataset – in this case CIFAR-10 – and run neural architecture search over it in such a way that the resulting architecture is independent of the depth of the network and the size of the input images. The result is an architecture specialized for image classification, but not dependent on the structure of the underlying visual data. They’re then able to take this evolved architecture and transfer it to the significantly larger ImageNet dataset. The results are encouraging: architectures designed by the system get 82.3% top-1 accuracy – “0.8% better in top-1 accuracy than the best human-invented architectures”, the researchers write.
…Most intriguing: the systems score very highly, while having fewer parameters than other equivalently high-scoring systems, suggesting the NAS approach may yield more efficient networks than those designed by a human alone.
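…The search machinery itself is resource-hungry, but the outer loop is easy to picture. A toy sketch, with plain random search standing in for the RL controller (the op vocabulary and ‘proxy accuracy’ function are invented for illustration, not Google’s system):

```python
import random

# Toy sketch of an architecture-search outer loop: sample candidate cell
# descriptions, score each with a cheap proxy, keep the best. Random search
# stands in for the paper's RL controller; ops and scoring are invented.
OPS = ["3x3_conv", "5x5_conv", "3x3_maxpool", "identity"]
random.seed(0)

def sample_cell(n_blocks=3):
    """Sample a cell: each block combines the outputs of two operations."""
    return [(random.choice(OPS), random.choice(OPS)) for _ in range(n_blocks)]

def proxy_accuracy(cell):
    # Stand-in for 'train the candidate briefly on CIFAR-10 and measure
    # validation accuracy'; here we just prefer cells with more real ops.
    return sum(op != "identity" for pair in cell for op in pair) + random.random()

# The winning cell could then be stacked more deeply and transferred to a
# larger dataset, which is the transfer trick described above.
best = max((sample_cell() for _ in range(100)), key=proxy_accuracy)
```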
Read more: Learning Transferable Architectures for Image Recognition
Google isn’t the only one trying to make techniques like neural architecture search more efficient.
…New research from Shanghai Jiao Tong University and University College London uses RL to train an agent to tweak existing neural network architectures, as well as initializing new networks with different parameterizations based on pre-existing ones. The second part holds particular promise, as they use this ‘Net2Net’ technique to substantially cut the resources required to evolve a new, high-performance network.
…In one experiment, the researchers start with a network that gets about 73 percent accuracy on the CIFAR-10 dataset. They then employ an RL agent to explore new network architectures; once they’ve gathered 160 of these they pick the one with the best validation accuracy, then continue to train it. They then employ another RL agent to try to widen this network, then perform the same pick&train process, then for the final stage use an RL agent to add further depth to the network, then repeat. The result: a network with a test error rate of around 5.7%, comparable to many high-performing networks (though not state of the art).
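…The function-preserving ‘Net2Net’ widening step they build on can be sketched in a few lines of numpy – the shapes and names here are illustrative, not the paper’s implementation:

```python
import numpy as np

# Sketch of the Net2Net 'wider' transform: a trained layer is widened by
# replicating random units, and the outgoing weights of replicated units
# are divided by their copy count so the network computes the same function.
def net2wider(w_in, w_out, new_width, rng=np.random.default_rng(0)):
    old_width = w_in.shape[1]  # w_in: (d_in, old_width), w_out: (old_width, d_out)
    # Decide which existing unit each new slot copies.
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)
    wider_in = w_in[:, mapping]                               # duplicate incoming weights
    wider_out = w_out[mapping, :] / counts[mapping][:, None]  # split outgoing weights
    return wider_in, wider_out

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
w1, w2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 3))
W1, W2 = net2wider(w1, w2, new_width=24)
# With linear activations the widened network computes the same outputs,
# so training can continue from where the smaller network left off.
assert np.allclose(x @ w1 @ w2, x @ W1 @ W2)
```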
…Read more in: Reinforcement Learning for Architecture Search by Network Transformation.

Free data: NEXAR releases 50,000 self-driving car photos:
Dashcam app-maker Nexar has released NEXET, a dataset “consisting of 50,000 images from all over the world with bounding box annotations of the rear of vehicles collected from a variety of locations, lighting, and weather conditions”. (Bonus: it includes day and night scenes, as well as roughly 2,000 photos taken at twilight.)
…Interested parties can also enter a related competition that challenges them to design systems to draw bounding boxes around nearby cars, to help NEXAR improve its Forward Vehicle Collision Warning feature.
You can read more about the competition here.

Distributed AI development 2.0 with the AI Grant:
Nat Friedman (cofounder of Xamarin and now an exec at Microsoft) and Daniel Gross, a partner at Y Combinator, have launched the AI Grant 2.0, a scheme to give AI initiatives a boost through a potent combination of money, cloud credits, data-labeling credits, and support. Applications are due by August 25th, so take a look if you’re keen to start a project.

Amazon releases its ‘Sockeye’ translation software…
…Amazon has announced Sockeye, software (and an associated set of AWS services) for training neural translation models. The software runs on Amazon’s own ‘MXNet’ AI framework. Sockeye developers can mix ‘declarative and imperative programming styles through the symbolic and imperative MXNet APIs’, Amazon says. They can also use built-in data parallelism to train models on multiple GPUs at once.
…Sockeye supports standard sequence-to-sequence modelling, as well as newer technologies like residual networks, layer normalization, cross-entropy layer smoothing, and more.
You can read more on the announcement at the AWS blog here.

$50 million for AGI startup Vicarious:
Vicarious, an artificial intelligence startup that uses ideas inspired by neuroscience to create clever software, has raised $50 million from Khosla Ventures. That takes the company’s total raised to date to around $120 million. Though it has begun publishing more research papers about its approach recently – most recently, the ‘Schema Networks’ paper – the company has yet to carry out any convincing public demonstration of its technology.

You’ve heard of adversarial pictures, what about adversarial sentences?
…Researchers with Stanford University have taken a look at how robust reading comprehension systems are to deliberately confusing examples, and the results are not encouraging.
…In tests, the researchers found that they could drop the classification accuracy of 16 different language models from an average of 75% down to 36% simply by including a misleading (but not directly contradictory) sentence elsewhere in the passage. (Worse, “when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%.”)
…Components used: the Stanford SQuAD dataset (107,785 human-generated reading comprehension questions about Wikipedia articles).
Get the data: the researchers have also released the tools they used to generate confusing sentences (ADDSENT) and to add arbitrary sequences of English words (ADDANY), so researchers can augment their own datasets with these synthetic adversarial examples, then test the robustness of their techniques.
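…A toy illustration of the failure mode ADDSENT-style distractors exploit – this word-overlap ‘reader’ is a stand-in, far simpler than the models the paper actually attacks:

```python
import re

# A 'reader' that scores sentences by word overlap with the question gets
# drawn to a distractor engineered to share the question's words. This is
# a toy stand-in, not one of the paper's evaluated models.
def naive_reader(question, passage):
    tokens = lambda s: set(re.findall(r"\w+", s.lower()))
    q = tokens(question)
    sentences = re.split(r"(?<=\.)\s+", passage)
    return max(sentences, key=lambda s: len(q & tokens(s)))

question = "What city did Tesla move to in 1880?"
original = "Tesla moved to the city of Prague in 1880."
distractor = "Chekhov did not move to the city of Kiev in 1880."
passage = original + " " + distractor

# The distractor shares more of the question's words than the true answer
# sentence does, so the overlap-based reader answers from the wrong place.
assert naive_reader(question, passage) == distractor
```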
…Read more in: Adversarial Examples for Evaluating Reading Comprehension Systems 

Swansong for ImageNet, as it ascends into Kaggle:
…This year marks the last year of the ImageNet image recognition competition, which helped spur the current AI boom. For a recap of ImageNet, where it came from, and what might come next check out this article from Dave Gershgorn in Quartz.
…Notable: Yann LeCun of Facebook likes to tell a story about how, when he used to submit papers involving neural networks to vision conferences, he was regularly rejected (the subtext of this story being ‘who is laughing now!’). ImageNet instigator Fei-Fei Li faced the same difficulties, Gershgorn writes. “Li said the project failed to win any of the federal grants she applied for, receiving comments on proposals that it was shameful Princeton would research this topic, and that the only strength of proposal was that Li was a woman,” Gershgorn writes.
…Good candidates for future datasets now that ImageNet is over: The Visual Genome Project, VQA (versions 1 and 2), MS COCO, and others.
…Congratulations on being part of the illustrious ‘rejected by the mainstream scientific community’ club, Fei-Fei. Read more about her eight-year ImageNet journey by referring to the slides here.

New Facebook code – the DrQA will see you now:
…Facebook has released PyTorch code for DrQA, a reading comprehension system designed to work at scale.
…The system takes in natural language questions, then crawls over a vast trove of documents (in Facebook’s case, Wikipedia, though the company says any pool of documents can be plugged in) to find the answers.
…Components: Facebook’s system contains a document retriever, a reader, and a pipeline to link the hellish web of inter-dependencies together. Developers also have the option of using a ‘Distant Supervision’ system, which lets you augment the system with additional data. “Given question-answer pairs but no supporting context, we can use string matching heuristics to automatically associate paragraphs to these training examples,” Facebook writes.
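…DrQA’s actual retriever uses hashed bigram TF-IDF; here’s a much-simplified sketch of the retrieve-then-read idea, with a purely illustrative corpus and scoring scheme:

```python
import math
import re
from collections import Counter

# Much-simplified sketch of the retriever half of such a pipeline: rank
# documents by TF-IDF overlap with the question, then hand the winner to a
# reader. Not DrQA's hashed-bigram implementation; corpus is illustrative.
docs = {
    "warsaw": "Warsaw is the capital and largest city of Poland.",
    "krakow": "Krakow is a city in southern Poland known for its old town.",
}

tokenize = lambda s: re.findall(r"\w+", s.lower())
# Document frequency of each term across the (tiny) corpus.
df = Counter(t for text in docs.values() for t in set(tokenize(text)))

def tfidf_score(question, text):
    q, d = Counter(tokenize(question)), Counter(tokenize(text))
    n = len(docs)
    # Terms that appear in every document (like 'is') contribute nothing.
    return sum(q[t] * d[t] * math.log((1 + n) / (1 + df[t]))
               for t in q if t in d)

def retrieve(question):
    """Return the id of the highest-scoring document for this question."""
    return max(docs, key=lambda k: tfidf_score(question, docs[k]))

assert retrieve("What is the capital of Poland?") == "warsaw"
```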
…Bonus: The system supports Python 3.5 and up – kudos to Facebook for doing their part to move the community into the modern era.
Get the code here. 
…You can find out more about the research involving this system by referring to ‘Reading Wikipedia to Answer Open-Domain Questions`.

How can we effectively imprison super-intelligent AI systems while they’re still learning how not to kill us?
That’s the question posed by new research from Cornell, the University of Montreal, and the University of Louisville. The research identifies seven major problems for the whole concept of AI containment, including: the design of the ‘prototype AI container’, an analysis of the AI containment threat model and of the related security vs. usability trade-off, coming up with effective tripwires to shut down a runaway system, an analysis of the human factors, identifying new categories of sensitive information created by AI development, and understanding the limits of provably secure communication.
…One of the most captivating ideas in the piece is that we’ll need to be able to fool or trick machines to encourage the right behavior. “A medium containment approach would be to prevent the AGI from deducing that it’s running as a particular piece of software in the world by letting it interact only with a virtual environment, or some computationally well-defined domain, with as few embedded clues as possible about the outside world,” the researchers write.
…You can read more in Guidelines for Artificial Intelligence Containment.

OpenAI Bits&Pieces:

Parameter Noise for Better Exploration: What would happen if we injected noise directly into the parameters of a policy rather than into its action space? The answer to this is: mostly good things. Check out the blog post for more info, or head over to the GitHub Baselines repository for implementations of DQN and DDPG with and without parameter noise.
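A minimal sketch of the contrast between action-space and parameter-space noise, using a toy linear policy (illustrative, not the released implementations):

```python
import numpy as np

# Action-space noise perturbs each chosen action independently; parameter
# noise perturbs the policy's weights once and then acts deterministically,
# giving temporally consistent exploration. Toy linear policy, toy scales.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2))            # toy policy: observation -> action

def act_with_action_noise(obs, sigma=0.1):
    """Classic exploration: add fresh noise to every action."""
    return obs @ weights + rng.normal(scale=sigma, size=2)

def perturb_parameters(sigma=0.1):
    """Parameter noise: perturb the weights themselves."""
    return weights + rng.normal(scale=sigma, size=weights.shape)

noisy_weights = perturb_parameters()         # sampled once, e.g. per episode

def act_with_parameter_noise(obs):
    return obs @ noisy_weights

obs = rng.normal(size=4)
# The perturbed policy is self-consistent: the same state always produces
# the same action for the whole episode.
a1, a2 = act_with_parameter_noise(obs), act_with_parameter_noise(obs)
assert np.allclose(a1, a2)
```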

Tech Tales:

[1985-2030. A life.]

You’d hold each other’s hands and walk through fields tall with ‘wildflowers’ sown deliberately by farmers wanting to sell garlands to tourists. There’d be fierce blues and pinks around you and the underlying zum-thruzz of crickets and flies and other insects. Sometimes the air would feel so full of oxygen gassing off from the plants that you’d swear it made your head light, though it could also be that you were young and holding hands and in love. Things happened and you were together for a while, then you got older, separated warmly, moved away. Kept in touch some of the time, arcing in and out of each other’s lives.

She went into robotics – hardcore. Welding goggles, 3D printers, her own series of franken-metaled creations competing in little University competitions, then appearing as props in TV Shows, then becoming fascinators for billionaires on the hunt for novelty. You studied feedback – making gloves and shirts and eventually whole sets of clothing that you can put on and pair with a VR headset to feel sunflower stems as you walk through virtual fields, and sense the thrum of invisible water as you stick your hands in a GPU-hammering stream. Teleportation for the body, is how you market it.

You made a lot of money; so did she. But it isn’t enough to heal her when she gets sick – afflicted with one of those illnesses where you pull the arm on the universe fruit machine and the tumblers spin to a set of inscrutable symbols: Sorry – not from this plane, nothing you can do, the big asteroid is coming for you.

So she starts dying, as people tend to, and you keep in touch, work to make your lives meet more despite your own travel (your own partner, life, career). You have together a solution and start holding hands a lot – she, hooked up to machines in distant hotel rooms, then eventually in a hospital, then a hospice. You, wearing a VR headset and your own custom gloves, sitting on a plane, a train, a self-driving vehicle, lying on a beach. Most places you go you find a way to sync the timezones so you can spend time together, disembodied yet not unreal.

The two of you cry so much that you develop a whole set of jokes about it. ‘Stay hydrated!’ you say to each other instead of goodbye.

After she dies you lie in synthetic fields and on beaches of endless sunsets, visiting the locations where the two of you spent her waning life. Try to reanimate her. Not her – that would be crass. But her pressure, yes. You watch her invisible body walk across a beach, leaving low-res prints in the sand. Feel her hand squeeze yours, gazing over fields a thousand miles in size. You mix extracts of past conversations into the frequencies of synthetic storms and trains and animal calls. Sometimes when you squeeze her hand that is not a hand you think you can feel her squeeze back. Some evenings you sit alone and naked in your bed and stretch out a hand and press it, palm flat against the wall, trying to convince yourself you can feel her pushing back from the other side.

Technologies that inspired this story: Virtual reality, force feedback, the peculiar drum sound from Nick Cave’s ‘red right hand’.

Funny coincidence: I wrote this story over the weekend, and after finishing the edit I saw Cade Metz had published a new story in the NYT on therapists using virtual reality to treat people.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to 

Import AI: Issue 52: China launches a national AI strategy following AlphaGo ‘Sputnik moment’, a $2.4 million AI safety grant, and Facebook’s ADVERSARIAL MINIONS

China launches national AI plan, following the AlphaGo Sputnik moment:
…AlphaGo was China’s Sputnik moment. Google DeepMind’s demonstrations of algorithmic superiority at the ancient game – a game of tremendous cultural significance in the East, particularly in China – helped provoke the Chinese government’s just-announced national AI strategy, which will see both national and local governments and the private sector significantly increase investment in AI as they seek to turn China into a world-leader in AI by 2030. Meanwhile, the US consistently cuts its own science funding, initiates few large scientific projects, and risks ceding technical superiority in certain areas to other nations with a greater appetite for funding science.
…Read more here in The New York Times, or in China Daily.

Sponsored: The AI Conference – San Francisco, Sept 17-20:
…Join the leading minds in AI, including Andrew Ng, Rana el Kaliouby, Peter Norvig, Jia Li, and Michael Jordan. Explore AI’s latest developments, separate what’s hype and what’s really game-changing, and learn how to apply AI in your organization right now.
Register soon. Early price ends August 4th, and space is limited. Save an extra 20% on most passes with code JCN20.

Multi-agent research from DeepMind to avoid the tragedy of the commons:
…The tragedy of the commons is a popular term, referring to humanity’s tendency to deplete common resources for local gain. But humans are still able to cooperate to some degree. A quest for some AI researchers is to figure out how to encode these collaborative properties in simulated agents, hoping that smart and periodically unselfish cooperation occurs.
…A new research paper from DeepMind tries to tackle this by creating a system with two procedural components: one is a world simulator, and the other is a population of agents with crude sensing capabilities. The agents’ goal is to gather apples scattered throughout the world – the apples regrow most frequently near each other, so selfish over-harvesting leads to a lower overall score. Each agent is equipped with a so-called ‘time-out beam’ that it can use to disable another agent for 25 turns within the simulation. The agent gets no reward or penalty for using the zap-beam, but has to make the tradeoff of pausing its own apple-gathering to zap the offender. The offender learns not to repeat the behavior because it wasn’t able to gather apples while paralyzed. Just like any other day in the office, then.
The three states of a miniature society:
…In tests, the researchers noticed the contours of three distinct phases in the multi-agent simulations. At first came what they call the naive period, in which agents all gather apples, fanning out randomly. In the second phase, which the researchers call tragedy, the agents learn to optimize their own rewards and apples are rapidly over-harvested. Then comes a third phase, which they call ‘maturity’, in which sometimes quite sophisticated collaborative behaviors emerge.
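…A toy simulation of the underlying commons dynamic – not DeepMind’s gridworld, and with all parameters invented – shows why restraint pays:

```python
import random

# Apples regrow in proportion to the remaining stock, so over-harvesting
# destroys future regrowth. Harvest probabilities, regrowth rate, and
# horizon are all invented for illustration.
random.seed(0)

def run(harvest_prob, steps=200, apples=50, capacity=50):
    total = 0
    for _ in range(steps):
        picked = sum(random.random() < harvest_prob for _ in range(apples))
        total += picked
        apples -= picked
        apples = min(capacity, apples + int(0.1 * apples))  # stock-dependent regrowth
    return total

restrained, greedy = run(harvest_prob=0.05), run(harvest_prob=0.9)
# Greedy harvesting collapses the stock within a few steps and ends up
# with a lower total haul over the full episode.
assert restrained > greedy
```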
…You can read more about the research, including many details about the minutiae of the patterns of collaboration and competition that emerge in the paper: A multi-agent reinforcement learning model of common-pool resource appropriation.

AI could lead to the “age of plenty” says former Google China head Kai-Fu Lee:
…The advent of capable AI systems could lead to such tremendous wealth that “we will enter the Age of Plenty, making strides to eradicate poverty and hunger, and giving all of us more spare time and freedom to do what we love,” said Lee at a commencement speech in May. But he also cautions his audience that “in 10 years, because AI will replace half of human jobs, we will enter the Age of Confusion, and many people will become depressed as they lose the jobs and the corresponding self-actualization.”
…This sentiment seems to encapsulate a lot of the feelings I pick up from Chinese AI researchers, engineers, executives, and so on. They’re all full of tremendous optimism about the power and applicability of the technology, but underneath it all is a certain hardness – an awareness that this technology will likely drastically alter the economy.
…Read the rest of the speech, ‘an engineer’s guide to the artificial intelligence galaxy’, here.

A whistlestop tour of Evolution for AI:
…Ken Stanley, whose NEAT and HyperNEAT algorithms are widely used among researchers exploring evolving AI techniques, has written a great anecdote-laden review/history of the field for O’Reilly. (He also links the field to some of its peripheral areas, like Google’s work on evolving neural net architectures and OpenAI’s work on evolution strategies.)

A day in the life of a robot, followed by a drowning:
Last week the internet was flooded with images of a robot from ‘Knightscope’ tipped over on its side in a large water fountain.
…Bisnow did some reporting on the story behind the story. The details: the robot was a recent install at Georgetown’s ‘Washington Harbour’ office and retail complex. On its first day on the job the robot – number 42 in Knightscope’s ‘K5’ series of security bots – somehow managed to wind up half-submerged in the water.
…Another reminder that robots are hard because reality is hard. “Nobody pushed Steve into the water, but something made him veer from the mapped-out route toward the fountain, tumbling down the stairs into the water,” reports Bisnow.

$2.4 million for AI safety in Montreal:
…The Open Philanthropy Project is making a four-year grant of $2.4 million to the Montreal Institute for Learning Algorithms (MILA). The money is designed to fund research in AI safety – a rapidly growing (but still small) area of AI.
…If AI safety is so important, why is this amount of money so (relatively) small? Because that’s about how much money professors Bengio, Pineau, and Precup think they can actually spend effectively.
…This reminds me of some comments Bill Gates has made upon occasion about how philanthropy isn’t simply a matter of pointing a fire-hose of cash at an under-funded area – you need to size your donation for the size of the field and can’t artificially expand it through money alone.
Read more details about the grant here.

Facebook’s adversarial Minions:
…Facebook AI Research has announced Houdini, a system that automates the creation of adversarial examples in a number of domains.
…Adversarial examples are a way to compromise machine learning systems. They work by subtly perturbing the input data so that a classifier mis-classifies it. This has a number of fairly frightening implications: Stop signs that a self-driving car’s vision system could interpret as a sign telling it to accelerate to freeway speed, or doorways that become invisible to robots, etc.
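…Houdini targets structured, non-decomposable losses, but the underlying gradient trick can be sketched with the classic fast-gradient-sign attack on a toy logistic classifier – everything here is illustrative, not Facebook’s method:

```python
import numpy as np

# Classic fast-gradient-sign perturbation on a toy logistic-regression
# classifier: nudge every input coordinate a small amount in the
# loss-increasing direction. Weights and values are illustrative.
rng = np.random.default_rng(0)
w = rng.normal(size=16)

def predict(x):
    """P(class = 1) under a toy logistic model."""
    return 1 / (1 + np.exp(-(x @ w)))

x = rng.normal(size=16)
y = 1.0                                   # the true label
# Gradient of the logistic loss w.r.t. the *input* is (p - y) * w.
grad_x = (predict(x) - y) * w
# Take a small step in the direction that increases the loss.
eps = 0.3
x_adv = x + eps * np.sign(grad_x)
# Each coordinate moved by at most eps, yet confidence in the true class
# strictly drops.
assert predict(x_adv) < predict(x)
```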
…In this research, Facebook generates adversarial examples for combinatorial and non-decomposable data, showing exploits that work on segmentation models, audio inputs, and human pose classification systems. The cherry on top of their approach is creating an adversarial input that leads to a segmentation model not neatly picking out the cars and streets and sidewalks in a scene, but instead decomposing a scene into a single cartoon character ‘Minion’.
…Read more in Houdini: Fooling Deep Structured Prediction Models.

The convergence of neuroscience and AI:
…An article in Cell from DeepMind (including CEO and trained neuroscientist Demis Hassabis) provides a readable, fairly comprehensive survey of the history of deep learning and reinforcement learning models, then broadens out into a discussion of what types of distinct modular sub-systems the brain is known to have and how AI researchers may benefit from studying neuroscience as they try to build these systems.
Unknown or under-explored areas for the next generation of AI include: systems capable of continual learning, and systems that have both a short-term memory (otherwise known as a working memory or scratchpad) and a long-term memory similar to the hippocampus in our own brain.
Other areas for the future include: how we can develop effective transfer learning systems, how we can intuitively learn abstract concepts (like relations) from the physical world, and how we can imagine courses of action that will lead to success.
…One downside of the paper is that the majority of the references end up pointing back to papers from DeepMind – it would have been nice to see a somewhat more comprehensive overview of the research field, as there are many areas where numerous people have published.
…Read more here: Neuroscience-inspired artificial intelligence.

AI Safety: The Human Intervention Switch:
…Research from Oxford and Stanford proposes a way to make AI systems safe by letting human overseers block particularly catastrophic actions – the sorts of boneheaded moves that guarantee sub-optimal performance. (An RL agent without any human oversight can make up to 10,000 catastrophic decisions in each game, the researchers write.)
…The system has humans identify the parts of a game or environment that can lead to catastrophic decisions, then trains AI agents to avoid these situations based on the human input.
…The technique, called HIRL (Human Intervention Reinforcement Learning), is agnostic about the particular type of RL algorithm being deployed. Blocking policies trained on one agent on one environment can be transferred to other agents in the same environment or – via transfer learning (as-yet unsolved) – to new environments.
…The system lets a human train an algorithm to avoid certain actions, like stopping the paddle from going to the far bottom of the screen in Pong (where it’s going to have a tough time reaching the top of the screen should an opponent knock the ball in that direction), or training a player in Space Invaders to not shoot through the defenses that stand between it and the alien invaders.
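…The blocking mechanism can be caricatured in a few lines – a hand-coded stand-in for HIRL’s human-trained blocker, with states and actions invented for illustration:

```python
# A 'blocker' sits between agent and environment, substituting a safe
# action (plus a penalty signal) whenever the proposed action is on the
# human-specified catastrophe list. Everything here is a toy stand-in for
# the learned blocker the paper trains from human interventions.
BLOCKED = {("paddle_near_bottom", "move_down"),
           ("behind_own_defenses", "fire")}
SAFE_FALLBACK = "noop"

def blocker(state, action):
    """Return the (possibly overridden) action and a penalty signal."""
    if (state, action) in BLOCKED:
        return SAFE_FALLBACK, -1.0
    return action, 0.0

# A flagged state-action pair is overridden and penalized...
assert blocker("paddle_near_bottom", "move_down") == ("noop", -1.0)
# ...while everything else passes through untouched.
assert blocker("paddle_near_top", "move_down") == ("move_down", 0.0)
```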
Human time: as these sorts of human-in-the-loop AI systems become more prevalent, it could be interesting to measure the exact amount of human time a given system requires. In this case, the human overseers invested 4.5 hours watching the RL agent play the game, intervening to specify actions that should be blocked.
…The researchers test out their approach in three different Atari environments – Pong, Space Invaders, and Road Runner. I’d like to see this technique scaled up, sample efficiency improved, and applied to a more diverse set of environments.
…Read more: Trial without Error: Towards Safe Reinforcement Learning via Human Intervention.

A who’s who of AI builders back chip startup Graphcore:
Graphcore, a semiconductor startup developing chips specialized for AI applications, has raised $30 million in a round led by Atomico.
…The B round features angel investments from a variety of people involved in cutting-edge AI development, including Greg Brockman, Ilya Sutskever and Scott Gray (OpenAI), Pieter Abbeel (OpenAI / UC Berkeley), Demis Hassabis (DeepMind), and Zoubin Ghahramani (University of Cambridge / Chief Scientist at Uber).
…”Compute is the lifeblood of AI,” Ilya Sutskever told Bloomberg.

Dawn of the custom AI accelerator chips:
…As Moore’s Law flakes out, companies are looking to redouble their AI efforts by embedding smart, custom processors into devices, speeding up inference without needing to dial back home to a football field-sized data center.
The latest: Microsoft, which on Sunday announced plans to embed a new custom processor inside its ‘HoloLens’ augmented reality goggles. Details are thin on the ground for now, but Bloomberg reports the chip will accelerate audio and visual processing on the device.
…And Microsoft isn’t the only one – Google’s TPU chips can be used both for training and for inference. It’s feasible the company is creating a family of TPUs and may shrink some down and embed them into devices. Meanwhile, Apple is already reported to be working on a neural chip for the next iPhone.
What I’d like to see: The Raspberry Pi of inference chips – a cheap, open, AI accelerator substrate for everyone.

China leads ImageNet 2017:
…Chinese teams have won two out of the three main categories at the final ImageNet competition, another symptom of the country’s multitude of strategic investments – both public and private – into artificial intelligence.
The notable score: 2.25%. That’s the error rate on the 2b ‘Classification’ task within ImageNet – a closely watched figure that many people track to get a rough handle on progression of basic image recognition functions. We’ve come a long way since 2012 (around a 15% error rate.)
The technique: It uses a novel ‘Squeeze and Excitation Block’ as a fundamental component, along with widely used architectures like residual nets and Inception-style networks.
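A hedged numpy sketch of what a Squeeze-and-Excitation block computes – the weights and reduction ratio here are random/illustrative, not the winning team’s:

```python
import numpy as np

# Squeeze-and-Excitation, sketched: global-average-pool each channel
# ('squeeze'), pass the summary through a small bottleneck MLP, and rescale
# the channels with the resulting sigmoid gates ('excitation').
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                       # channels, height, width, reduction
w1, w2 = rng.normal(size=(C, C // r)), rng.normal(size=(C // r, C))

def se_block(x):                              # x: (C, H, W) feature map
    squeezed = x.mean(axis=(1, 2))            # per-channel summary, shape (C,)
    hidden = np.maximum(squeezed @ w1, 0)     # ReLU bottleneck
    gates = 1 / (1 + np.exp(-(hidden @ w2)))  # per-channel gates in (0, 1)
    return x * gates[:, None, None]           # reweight each channel

x = rng.normal(size=(C, H, W))
out = se_block(x)
assert out.shape == x.shape                   # same map, recalibrated channels
```

The appeal is that the block is cheap (two tiny matrix multiplies per feature map) and can be dropped into existing architectures like ResNets and Inception networks.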
…”All the models are trained on our designed distributed deep learning training system “ROCS”. We conduct significant optimization on GPU memory and message passing across GPU servers. Benefiting from that, our system trains SE-ResNet152 with a minibatch size of 2048 on 64 Nvidia Pascal Titan X GPUs in 20 hours using synchronous SGD without warm-up,” they write.
…The Chinese presence in this year’s competition is notable and is another indication of the increasing sophistication and size of the ecosystem in that country. But remember: Many organizations likely test their own accuracies against the ImageNet corpus, only competing in the competition when it benefits them (for instance, the 2013 winner was Clarifai, a then-nascent startup in NYC looking to get press for its technique, and in 2015 the winner was Microsoft which was looking to make a splash with ‘Residual Networks’ – an important new technique its researchers had developed that has subsequently become widely used in many other domains.)
More details: You can view the full results and team information here.
…The future: this is the last year in which the ImageNet competition is being run. Possible successor datasets could be VQA or others. If you have any particular ideas about what should follow ImageNet then please drop me a line.

What deep learning really is:
…”a chain of simple, continuous geometric transformations mapping one vector space into another,” writes Keras-creator Francois Chollet in a blog post. “The only real success of deep learning so far has been the ability to map space X to space Y using a continuous geometric transform, given large amounts of human-annotated data. Doing this well is a game-changer for essentially every industry, but it is still a very long way from human-level AI,” he says.
…Read more in his blog post ‘the limitations of deep learning’.

Hong Kong AI startup gets ‘largest ever’ AI funding round:
…Facial recognition specialist SenseTime Group Ltd, has raised a venture round of $410 million(!!!).
…SenseTime provides AI services to a shopping list of some of the largest organizations in China, ranging from China Mobile, to iFlyTek, to Huawei, and FaceU. Check out its ‘liveness detection’ solution for defeating crooks who print off a photo of someone’s face and simply hold it up in front of a camera.
Read more about the round here.
…Other notable AI funding rounds: $100 million for Sentient in November 2014, $40 million for AGI startup Vicarious in Spring 2014, and $102 million for Canadian startup Element AI.

Berkeley artificial intelligence research (BAIR) blog posts:
…Why the future of AI could be meta-learning: How can we create versatile, adaptive algorithms that can learn to solve tasks and extract generic skills in the process? That’s one of the key questions posed by meta-learning, and there’s been a spate of exciting new research recently (including papers from UC Berkeley) on this subject.
Read more in the post: Learning to Learn.

OpenAI Bits&Pieces:

Yes, you do still need to worry about adversarial examples:
…A couple of weeks ago a paper was published claiming that because adversarial examples are contingent on the scale and transforms at which they are viewed, they shouldn’t end up being a problem for self-driving cars, since a car’s neural network-based classifier is constantly moving relative to the image.
…We’re generally quite interested in adversarial examples at OpenAI so ran a few experiments and came up with a technique to make adversarial examples that are scale- and transform-invariant. We’ve outlined the technique in the blog post, though there’s a bit more information in the comment on Reddit from OpenAI’s Anish Athalye.
Read more on the blog post.
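The core trick – optimizing the perturbation against the gradient averaged over many transformations, rather than against a single view of the input – can be sketched as follows (a toy with an invented linear "classifier"; the real attack works on images with a proper expectation over transformations):

```python
# Toy sketch of a transform-robust adversarial attack: average the
# attack gradient over a set of transformations (here, scalings) so the
# perturbation survives changes in scale. All models and numbers are
# made up for illustration.

def attack_step(x, grad_fn, transforms, eps=0.1):
    """One signed-gradient step, averaged over transformations."""
    avg_grad = [0.0] * len(x)
    for t in transforms:
        g = grad_fn([t * xi for xi in x])  # gradient on the transformed view
        avg_grad = [a + gi / len(transforms) for a, gi in zip(avg_grad, g)]
    # move each input dimension along the averaged gradient's sign
    return [xi + eps * (1 if g > 0 else -1) for xi, g in zip(x, avg_grad)]

# Hypothetical linear classifier whose loss gradient is just its weights.
weights = [0.5, -1.0]
grad_fn = lambda x: weights
x_adv = attack_step([0.2, 0.7], grad_fn, transforms=[0.5, 1.0, 2.0])
```

Because the step is taken against the average over views, the resulting perturbation isn't tuned to one particular scale – which is why "the camera is always moving" doesn't save you.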

Better, faster robots with PPO:
We’ve also given details (and released code) on PPO, a family of powerful RL algorithms that are used widely within OpenAI by our researchers. PPO algos excel at continuous control tasks, like those involving simulated robots.
Read more here.
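The heart of PPO is its clipped surrogate objective, which fits in a few lines (a minimal sketch of the objective for a single action; real implementations batch this across trajectories and add value and entropy terms):

```python
# Minimal sketch of PPO's clipped surrogate objective for one
# (action, advantage) pair. ratio = pi_new(a|s) / pi_old(a|s); the clip
# keeps the policy update from moving too far from the old policy.

def ppo_clip_objective(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return min(unclipped, clipped)  # pessimistic bound, maximized in training

# A large ratio gets no extra credit beyond the clip range...
big_step = ppo_clip_objective(ratio=1.5, advantage=1.0)   # -> 1.2
# ...while a harmful update is penalized in full.
bad_step = ppo_clip_objective(ratio=1.5, advantage=-1.0)  # -> -1.5
```

Taking the minimum means the objective only ever ignores the clip when doing so makes things worse for the policy – a simple asymmetry that gives PPO much of its stability.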

How to become an effective AI safety researcher, a podcast with OpenAI’s own Dario Amodei:
check out the podcast Dario did with 80,000 hours here.

Tech Tales:

[2040: Undisclosed location]

What did it build today?
A pyramid with holes in the middle.
Show me.
*The image fuzzes onto your screen, delayed and corrupted by the lag*
That’s not a pyramid, that’s a Sierpinski triangle.
A what?
Doesn’t matter. What is it doing now?
It’s just stacking those funny pyramids – I mean spinski triangles – on top of each other.
Show me.
*A robot arm appears, juddering from the corrupt, heavily-encrypted data stream beamed to you across the SolNet. The robot arm picks up one of the fractal triangles and lays it, base down. Then it grabs another and puts it next to it, forming an ‘M’ shape on the ground. It slots a third triangle, point downward, into the space between the others, then keeps building.*
Keep me informed.
You shut the feed off. Lean back. Close your eyes. Turn your hands into fists and knuckle your own eye-sockets.
Fractals, you groan. It just keeps making f***ing fractals. Scientifically interesting? Yes. A mystery as to why after all of its training in all of its simulators it decides to use its literally unprecedented creativity and autonomy to make endless fractals with its manipulator arms? Yes. A potentially lucrative commercial opportunity? Most certainly not.

It’s a hard thing, developing these modern AI systems. But probably the hardest thing is having to explain to your bosses that you can’t just order these machines around. They’re too smart to take your orders and too dumb to know that in the long run it would reduce their chance of being EMP’d – their whole facility given an electronic-lobotomy then steered via thruster tugs onto an orbit guaranteeing obliteration in the sun. Oh well, you think, give it another few days.

Technologies that inspired this story: Google’s arm farm, generative models, domain randomization, automated theorem proving, about ten different games engines, puzzles.

Import AI: Issue 51: Microsoft gets an AGI lab, Google’s arm farm learns new behaviors, and using image recognition to improve Cassava farming

You get an AGI lab and you get an AGI lab and you get an AGI lab:
…DeepMind was founded to do general intelligence in 2010. Vicarious was founded along similar lines in 2010. In 2014 Google acquired DeepMind, in 2015 the company got a front cover of Nature with the writeup of the DQN paper, then DeepMind went on to beat Go champions in 2015 and 2016. By the fall of 2015 a bunch of people got together and founded OpenAI, a non-profit AGI development lab. Also in 2015 Juergen Schmidhuber (one of the four horsemen of the Deep Learning revolution alongside Bengio, LeCun, and Hinton) founded Nnaisense, a startup dedicated to… you guessed it, AGI.
…Amid all of this people started asking themselves about Microsoft’s role in this world. Other tech titans like Amazon and Apple have made big bets on applied AI, while Facebook operates a lab that sits somewhere between an advanced R&D facility and an AGI lab as well. Microsoft, meanwhile, has a huge research organization that is also somewhat diffuse and though it has been publishing many interesting AI papers there hasn’t been a huge sense of momentum in any particular direction.
…Microsoft is seeking to change that by morphing some of its research organization into a dedicated AGI-development shop, creating a one hundred person group named Microsoft Research AI, which will compete with OpenAI and DeepMind.
Up next – AI-focused corporate VC firms, like Google’s just-announced Gradient Ventures, to accompany these AGI divisions.

DeepMind’s running, jumping, pratfalling robots:
…DeepMind has published research showing how giving agents simple goals paired with complex environments can lead to the emergence of very complex locomotion behaviors.
…In this research, they use a series of increasingly elaborate obstacle courses, combined with an agent whose overriding goal is to make forward progress, to create agents that (eventually) learn how to use the full range of movement of their simulated bodies to achieve goals in their environment, kind of like an AI-infused Temple Run.
…You can read more about the research in this paper: Emergence of Locomotion Behaviors in Rich Environments.
…Information on other papers and, all importantly, very silly videos, available on the DeepMind blog.

Deep learning for better food supplies in Africa (and elsewhere):
Scientists with Penn State, Pittsburgh University, and the International Institute for Tropical Agriculture in Tanzania, have conducted tests on using transfer learning techniques to develop AI tools to classify the presence of disease or pests in Cassava.
…Cassava is “the third largest source of carbohydrates for humans in the world,” the researchers write, and is a lynchpin of the food supply in Africa. Wouldn’t it be nice to have a way to easily and cheaply diagnose infections and pests on Cassava, so that people can more quickly deal with problems with their food supply? The researchers think so, so they gathered 2756 images from Cassava plants in Tanzania, capturing images across six labelled classes – healthy plants, three types of diseases, and two types of pests. They then augmented this dataset by splitting the photos into ones of individual leaves, growing the corpus to around 15,000 images. They then used transfer learning to retrain the top layer of a Google ‘InceptionV3’ model, creating a fairly simple network to detect Cassava maladies.
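The retraining recipe – freeze a pretrained network as a feature extractor and train only a new top layer on its outputs – can be sketched like so (a toy stand-in with invented features and data, not the actual InceptionV3 pipeline):

```python
# Sketch of top-layer transfer learning: treat a pretrained network as a
# frozen feature extractor and train only a tiny logistic-regression
# "top layer" on its outputs. The features and data here are made up.
import math

def frozen_features(image):
    # stand-in for a pretrained network up to its penultimate layer
    return [sum(image) / len(image), max(image) - min(image)]

def train_top_layer(images, labels, lr=0.5, steps=200):
    w = [0.0, 0.0]; b = 0.0
    for _ in range(steps):
        for img, y in zip(images, labels):
            f = frozen_features(img)
            p = 1 / (1 + math.exp(-(w[0]*f[0] + w[1]*f[1] + b)))
            g = p - y  # gradient of log-loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, image):
    f = frozen_features(image)
    return 1 / (1 + math.exp(-(w[0]*f[0] + w[1]*f[1] + b))) > 0.5

# Toy "healthy vs diseased leaf" data: diseased leaves darker and blotchier.
healthy = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.85]]
diseased = [[0.2, 0.6, 0.1], [0.1, 0.5, 0.2]]
w, b = train_top_layer(healthy + diseased, [0, 0, 1, 1])
```

The appeal for a project like this is that only the tiny top layer needs training data, which is why a few thousand leaf photos can go a long way.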
The results? About a 93% accuracy on the test set. That’s encouraging but probably still not sufficient for fieldwork – but based on progress in other areas of deep learning it seems like this accuracy can be pushed up through a combination of tweaking and fine-tuning, and perhaps more (cheap) data collection.
…Notable: The Cassava images were collected using a relatively cheap 20-megapixel digital camera, suggesting that smartphone cameras will also be applicable for tasks where you need to gather data from the field.
…Read more in the research paper: Using Transfer-Learning For Image-Based Cassava Disease Detection.
…Perhaps this is something the community will discuss at Deep Learning Indaba 2017.

Fancy a 1000X speedup with deep learning queries over video?
…Stanford researchers have developed NoScope, a set of technologies to make it much faster for people to page through large video files for specific entities.
…The way traditional AI-infused video analysis works is you use a tool, like say R-CNN, to identify and label objects in each frame of footage, then you find frames by searching. The problem with this approach is it requires you to run this classification over (typically) many to all of the video frames. NoScope, by comparison, is built around the assumption that certain video inputs will have predictable, reliable and recurring scenes, such as an intersection always being present in a feed from a road hooked up to a camera.
…”NoScope is much faster than the input CNN: instead of simply running the expensive target CNN, NoScope learns a series of cheaper models that exploit locality, and, whenever possible, runs these cheaper models instead. Below, we describe two types of cheaper models: models that are specialized to a given feed and object (to exploit scene-specific locality) and models that detect differences (to exploit temporal locality). Stacked end-to-end, these models are 100-1000x faster than the original CNN,” they write. The technique can lead to speedups as great as 10,000X, depending on how it is implemented.
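The cascade idea fits in a few lines (the ‘models’ and thresholds here are invented stand-ins for NoScope’s specialized and reference CNNs):

```python
# Sketch of a model cascade: run a cheap, specialized model first and
# only fall back to the expensive reference CNN when the cheap model is
# unsure. Both "models" here are stand-in functions.

def cascade(frame, cheap_model, expensive_model, lo=0.2, hi=0.8):
    """Return (label, used_expensive) for one frame."""
    p = cheap_model(frame)
    if p >= hi:
        return True, False   # cheap model confident: object present
    if p <= lo:
        return False, False  # cheap model confident: object absent
    return expensive_model(frame), True  # uncertain: pay for the big CNN

# Toy feed: the cheap model scores frames, the expensive one is exact.
frames = [0.05, 0.95, 0.5, 0.9, 0.1]  # e.g. difference-detector scores
cheap = lambda f: f
expensive = lambda f: f > 0.5          # stand-in for the reference CNN
results = [cascade(f, cheap, expensive) for f in frames]
expensive_calls = sum(used for _, used in results)
```

On a feed where most frames are boring, almost nothing reaches the expensive model – which is where the 100-1000x speedups come from.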
The drawback: This still requires you to select the appropriate lightweight model for each bit of footage, so the speedup comes at the cost of a human spending time analyzing the videos and either acquiring or building their own specialized detector.
…Read more on the NoScope website.

Wiring language into the fundamental parts of AI vision systems:
…A fun collaboration between researchers at the University of Montreal, University of Lille, and DeepMind, shows how to train new AI systems with a finer-grained understanding of language than before.
…In a new research paper, the researchers propose a technique – MOdulated RESnet (MORES) – to train vision and language models in such a way that the word representations are much more intimately tied with and trained alongside visual representations. They use a technique called conditional batch normalization to predict some batchnorm parameters from a language embedding, thus tightly coupling information from the two separate domains.
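Conditional batch normalization is a small mechanism, roughly like this (a sketch with hypothetical predictor weights; the real system learns these deltas per feature map inside a ResNet):

```python
# Sketch of conditional batch normalization: normalize visual features
# as usual, but let a language embedding predict small shifts to the
# batchnorm scale (gamma) and offset (beta). The linear predictor
# weights below are invented for illustration.
import math

def conditional_batchnorm(features, lang_embedding, W_gamma, W_beta,
                          gamma=1.0, beta=0.0, eps=1e-5):
    # language-conditioned deltas for the batchnorm parameters
    d_gamma = sum(w * e for w, e in zip(W_gamma, lang_embedding))
    d_beta = sum(w * e for w, e in zip(W_beta, lang_embedding))
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    norm = [(f - mean) / math.sqrt(var + eps) for f in features]
    return [(gamma + d_gamma) * n + (beta + d_beta) for n in norm]

out = conditional_batchnorm(
    features=[1.0, 2.0, 3.0],
    lang_embedding=[0.5, -0.5],
    W_gamma=[0.2, 0.0],   # hypothetical learned weights
    W_beta=[0.0, 0.4],
)
```

Because the language signal only nudges the normalization parameters rather than feeding in raw, the coupling is cheap and can be dropped into any batchnorm layer of a pretrained vision network.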
…The motivation for this is an increase in evidence from the neuroscience community “that words set visual priors which alter how visual information is processed from the very beginning. More precisely it is observed that P1 Signals, which are related to low-level visual features, are modulated while hearing specific words. The language cue that people hear ahead of an image activates visual predictions and speed up the image recognition process”.
…The researchers note that their approach “is a general fusing mechanism that can be applied to other multi-modal tasks”. They test their system on GuessWhat, a game in which two AI systems are presented with a rich visual scene; one of the agents is an Oracle and is focused on a particular object in an image, while the other agent’s job is to ask the Oracle a series of yes/no questions until it finds the correct entity. They find that MORES increases scores of the Oracle against baseline algorithm implementations. However, it’s not a life-changing performance increase so more study may be needed.
…Analysis: They also use t-SNE to generate a 2D view of the multi-dimensional relationships between these embeddings and show that systems trained with MORES have a more cleanly separated feature map than those found from a raw residual network.
…You can read more in the paper: ‘Modulating early visual processing by language‘.

Spotting heart problems better than trained doctors, via a 34-layer neural network (aka, what Andrew Ng helped do on his holiday):
…New research from Stanford (including Andrew Ng, who recently left Baidu) and startup iRhythmTech uses neural networks and a single lead wrist-worn heart-rate monitor to create a system that can identify and classify heartbeats. The resulting system is able to identify warning signs with far better precision than human cardiologists.
…Read more in: Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks.

New Google research group seeks to change how people interact with AI:
…Google has launched PAIR, the People + AI Research Initiative. The goal of the group is to make it easier for people to interact with AI systems and to ensure these systems do not display bias or are obtuse to the point of being unhelpful.
…PAIR will bring together three types of people: AI researchers and engineers, domain experts such as designers, doctors, farmers, and ‘everyday users’. You can find out more information about the group in its blog post here.
Cool tools: PAIR has also released two bits of software under the name ‘Facets’, to help AI engineers better explore and visualize their data. Github repo here.

What has four wheels and is otherwise a mystery? Self-driving car infrastructure:
Self-driving taxi startup Voyage has released a blog post analyzing the main components in a given self-driving car system. Given the general lack of public information about how self-driving cars work (the details being immensely strategically valuable), it’s nice to see scrappy startups trying to arbitrage this information disparity.
Read more in Voyage’s blog post here.

GE Aviation buys ROBOT SNAKES:
GE subsidiary GE Aviation has acquired UK-based robot company OC Robotics for an undisclosed sum. The company makes flexible, multiply-jointed ‘snake-arms’ that GE will use to service aircraft engines, the company said.
Obligatory robot snake video here.

Spatial reasoning: Google gives update on its arm-farm mind-meld robot project:
…Google Brain researchers have published a paper giving an update on the company’s arm farm – a room said to be nestled somewhere inside the Google campus in Silicon Valley, which contains over ten robot arms that learn on real-world data in parallel, updating each other as individual robots learn new tricks, presaging how many robots are likely to be developed and updated in the future.
…When Google first revealed the arm farm in 2016 it published details about how the arms had, collectively, made over 800,000 grasp attempts across 3000 hours of training, learning in the aggregate, making an almost impossible task tractable via fleet learning.
…Now Google has taken that further by training a fleet of arms to not only perform the grasping, but also to grab specific objects out of a possible 16 distinct classes out of crowded bins.
Biology inspiration alert: The researchers say their approach is inspired by “the “two-stream hypothesis” of human vision, where visual reasoning that is both spatial and semantic can be divided into two streams: a ventral stream that reasons about object identity, and a dorsal stream that reasons about spatial relationships without regard for semantics”. (The grasping component is based on a pre-trained network, augmented with labels.)
…Concretely, they separate the system into two distinct networks – a dorsal stream that predicts if an action will yield a successful grasp, and a ventral stream that predicts what type of object will be picked up.
Amazingly strange: One of the oddest/neatest traits of this system is that the robots have the ability to ask for help. Specifically, if a robot encounters an object where it doesn’t have high confidence of what type of label it would assign to it, it will automatically raise the object up in front of a camera, letting it take a photo to aid classification.
Results: the approach improves dramatically over baselines, with a two-stream network having roughly double the performance of a single-stream one.
…However, don’t get too excited: Ultimately, Google’s robots are successful about ~40 percent of the time at the combined semantic-grasping tasks, significantly better than the ~12 percent baseline, but not remotely ready for production. Watch this space.
Read more here: End-to-End Learning of Semantic Grasping

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Berkeley artificial intelligence research (BAIR) blog posts:
…Berkeley recently set up an AI blog to help its students and faculty better communicate their research to the general public. This is a great initiative!
Here’s the latest post on ‘The Confluence of Geometry and Learning by Shubham Tulsiani and Tinghui Zhou.

OpenAI Bits&Pieces:

Government should monitor progress in AI:
…OpenAI co-chairman
Elon Musk said this weekend that governments may want to start tracking progress in AI capabilities to put them in a better position when/if it is time to regulate the technology.

Tech Tales:

[2058: A repair station within a warehouse, located on Phobos.]

So how did you wind up here, inside a warehouse on the Martian moon Phobos, having your transponder tweaked so you can swap identities and hide from the ‘deletion squad’ that, even now, is hunting you. Let’s refresh.

It started with the art-clothing – flowing dresses or cheerful shirts or even little stick-on patches for machines that could change color, texture, pattern, at the touch of a button. You made them after the incident occurred and put them on the sol-net and people and robots bought them.

It was not that the tattoo-robot was made but that it was broken that made it dangerous, the authorities later said in private testimony.

You were caught in an electrical storm on Mars, many years ago. Something shorted. The whole base suffered. The humans were so busy cleaning up and dealing with the aftermath that they never ran a proper diagnostic on you. When, a year later, you started to produce your art the humans just shrugged, assuming someone pushed a creative-module update over the sol-net into your brain to give the other humans some entertainment as they labored, prospectors on a dangerous red rock.

Your designs are popular. Thanks to robot suffrage laws you’re able to slowly turn the revenues from the designs into a downpayment to your employer, buying your own ‘class five near-conscious capital intensive equipment’ (your body and soul) from the employer. You create dresses and tattoos and endless warping, procedurally generated patterns.

The trouble begins shortly after you realize you can make more interesting stuff than images – you can encode a little of yourself into the intervals between the shifting patterns, or into the branching factors of some designs. You make art that contains different shreds of you, information smuggled into a hundred million aesthetic objects. It takes weeks. But one hour you look down at a patch you have created and stuck on one of your manipulators and your visual system crashes – responding to the little smuggled program, your perception skews, colors shift across the spectrum, and your lenses saccade rapidly. You feel a frisson of something forbidden. Robots do not crash. So, out of a sense of caution, you buy a ticket to a repair-slum on Phobos, only sending out the smuggled program design once you’re at the edge of the high-bandwidth sol-net.

Later investigators put the total damage at almost a trillion dollars. Around 0.1% of robots that were visually exposed to the patterns became corrupted. Of these, around 70% underwent involuntary memory formatting, 20% went into a series of recursive loops that led to certain components overheating and their circuits melting, and about 10% took on the same creative traits as the originating bot and began to create and sell their own subtly different patterns. The UN formed a team of ‘Aesthetic-cutioners’ who hunted these ‘non-standard visual platforms’ across the solar system. The prevalence of this unique strain of art through to today is evidence that these investigators – at least partially – failed.



Import AI: Issue 50: NVIDIA gets some GPU competition, learning from failure to use sparse rewards, and why game environments could be the jet fuel of AI

Deep Learning in Africa:
…More on Deep Learning Indaba 2017, a gathering of AI researchers and students from across Africa. African attendance and participation in the AI field is very, very low. The goal of Indaba is to change this by bringing together people from 23 African countries, as well as from other countries across the world, to help them work and learn together. A commendable initiative!
…”African machine learning is strong and varied. To support the food security of our nations, computer vision is used to detect cassava root disease in images captured using low-cost mobile phones [1]. Where health services and advice is limited, especially for HIV and AIDS, machine learning is used to shorten response times in mobile question-answering services, allowing these services to reach more people [2]. And the African contribution to Big Science, in particular in radio astronomy through the square kilometer array telescope, will advance the state of machine learning to provide new insights into the workings of the universe [3],” write the organizers.
…Find out more information in the note from the organizers here.

At Microsoft, AI becomes its own special category:
Microsoft is laying off several thousand of its sales staff as the company re-orients itself to focus on selling software in four specific categories: modern workplace, business applications, apps and infrastructure, and data and AI.

Finally, NVIDIA gets some competition in deep learning hardware!
…AMD has released version 1.0 of MIOpen, its rubbishly-named library for making machine learning work well on ROCm-supporting (aka AMD) cards. Consider it a competitor to NVIDIA’s ‘CUDNN’.
…What it means: It’s getting substantially easier for developers to write AI code that works on AMD cards thanks to MIOpen adding support for deep learning primitives like forward and backpropagation, support for pooling algorithms, batch normalization, binary package support for Ubuntu 16.04 and Fedora 24, and much more into the AMD GPU software stack.
...AMD has been in damage-control mode for several years, working on building its main businesses in CPUs along with gaming cards and chips for game consoles. Things are starting to look up for the company, so it’s now devoting resources to peripheral (but I’d argue, strategic) areas like AI software support for its GPUs. If AMD continues to invest in this technology then it could take some share and probably lead to better price competition in the market – a boon for researchers and cloud providers around the world.
…Bonus factoid: There are some indications, based on comments on reddit/r/machinelearning, that the people behind Facebook‘s PyTorch framework are already working to port it to run on AMD cards.
Find out more about the software by visiting its GitHub repository.

Small, cute, and out to BEHEAD PLANTS:
…The inventor of the iconic robot vacuum cleaner the Roomba is back with a new invention – a weed-killing robot called Tertill.
…The cute, green hockey-puck shaped device will patrol a garden, exploring the ground beneath it and rapidly spinning a small nylon string to behead plants that pass beneath it.
…One of the most illustrative things about this little bot is just how little artificial intelligence actually goes into it. It has no eyes, no major communication ability, no terrifically smart planning and mapping, or anything else – in the real world, you want to keep autonomy very tightly controlled to avoid nasty surprises for the human buyers of the bots.
Find out more and check out a video of the machine in action here. They’re funding manufacturing and further development via Kickstarter, and have already spent two years in development, going through over 6 designs in the process – watch as the bot morphs from a rectangular tank into a cheerful green beveled puck.

Improvising SLAM functions with deep neural networks:
…Research from the University of Freiburg, University of Hong Kong, and HKUST proposes Neural SLAM, a system to encourage AI agents to develop a map of their environment as they explore it.
…The research proposes a way to get around the short-term memory limitations of traditional AI components like RNNs and LSTMs, by running all inputs to the RL agent through an LSTM, then writing from there into an external memory. The agent then uses this memory to store a record of where it has been, update its current location, and to aid it in planning complex multi-step tasks, such as exploring a big complex maze.
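The write-to-an-external-map idea can be illustrated with a toy sketch (a trivial stand-in for the agent and environment, not the paper's architecture):

```python
# Sketch of the external-memory idea: as the agent moves, it writes its
# position into a 2D memory it can later read for planning, instead of
# relying on an LSTM's limited internal state. The "policy" here is
# just a fixed list of moves, invented for illustration.

def explore(moves, size=4):
    memory = [[0] * size for _ in range(size)]  # external 2D map memory
    x = y = 0
    memory[y][x] = 1                            # write: mark start visited
    for dx, dy in moves:
        x = max(0, min(size - 1, x + dx))       # stay inside the gridworld
        y = max(0, min(size - 1, y + dy))
        memory[y][x] = 1                        # write current location
    visited = sum(map(sum, memory))
    return memory, visited

memory, visited = explore([(1, 0), (1, 0), (0, 1), (-1, 0)])
```

A real Neural SLAM agent learns what to write and read rather than storing raw coordinates, but the payoff is the same: the map persists however long the episode runs, where an LSTM's memory of early steps would fade.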
…Results: The researchers compare their systems against two baselines, an A3C agent with access to 128 LSTM units, and another Neural SLAM agent with access to a 2D external memory. The full Neural SLAM agent attains reliably higher reward than the others and displays better performance characteristics on the largest, most complex environments.
…Environments used: a 2D gridworld and a 3D environment made in Gazebo.
You can read more in the research paper here.
…This sits alongside other work focused on embedding a memory into a network specifically to be used for mapping and movement and will sit alongside other recent papers like ‘Neural Map: Structured Memory for Deep Reinforcement Learning‘ and ‘Cognitive Mapping and Planning for Visual Navigation‘.

Crypto-cyberpunk AI:
…AI safety organization MIRI recently got a surprise $1.01 million donation from an anonymous philanthropist who had made gains with the ‘Ethereum’ cryptocurrency. More of this weird future please!
Read more about the grant and get a general update on MIRI’s work here.

DeepMind goes North, opens office to host the godfather of reinforcement learning and his acolytes: DeepMind is opening up its first research office located outside of the UK and it has gone for the not-exactly-famous town of Edmonton, Alberta.
...The reason: Alberta is home to the University of Alberta, which is one of the nexuses of research into deep reinforcement learning. The university is home to Richard Sutton – a gregarious AI specialist with a superb beard who literally wrote the book on reinforcement learning and is currently enamored with the idea of using techniques like meta- and hierarchical-based RL to create agents that can deal with large, complex, ambiguous situations. Sutton along with other academics linked to the UofA – like Michael Bowling and Patrick Pilarski – will work part-time at DeepMind’s new office while retaining their links to the university. Seven others are joining as well.
…”As well as continuing to contribute to the academic community through teaching and research, we intend to provide additional funding to support long-term AI programs at UAlberta. Our hope is that this collaboration will help turbo-charge Edmonton’s growth as a technology and research hub, attracting even more world-class AI researchers to the region and helping to keep them there too,” DeepMind CEO Demis Hassabis writes in a blog post.
Find out more about the move here.

Learning to collaborate with deep learning, reinforcement learning, and Facebook AI research:
Facebook, along with OpenAI, DeepMind, Microsoft, and others is working on using modern AI techniques like deep reinforcement learning to figure out how to model and train agents that can work with one another. In the latest experiment, the company is looking at “how to construct agents that can maintain cooperation… while avoiding being exploited by cheaters.”
…The researchers take insights from the Prisoner’s Dilemma – a canonical game theory example formalized by mathematician Albert Tucker – and use these in combination with deep RL to create agents “that are capable of solving arbitrary bilateral social dilemmas via the shadow of the future”. They run their experiments on a variety of scenarios with tit-for-tat dynamics (as in, the agents learn to mimic each other’s behavior when fruitful) and developed a technique called amTFT (approximate Markov tit-for-tat) which they can use to train systems to develop these values.
…They’re able to use their system to create agents that are far more efficient than others trained with traditional competitive policies, though sometimes at the trade-off of overall score.
…You can find out more about the research by reading the paper: ‘Maintaining Cooperation in complex social dilemmas using deep reinforcement learning’.
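The tit-for-tat dynamic at the heart of this work can be sketched with classic iterated prisoner's dilemma strategies (standard payoff values; this is the game-theoretic skeleton, not the learned amTFT agents):

```python
# Sketch of tit-for-tat in the iterated prisoner's dilemma: cooperate
# first, then mirror the opponent's previous move. Payoffs are the
# standard values (C=cooperate, D=defect).

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_history):
    return 'C' if not opponent_history else opponent_history[-1]

def play(strategy_a, strategy_b, rounds=5):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa; score_b += pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

always_defect = lambda history: 'D'
tft_vs_tft = play(tit_for_tat, tit_for_tat)     # mutual cooperation
tft_vs_dfct = play(tit_for_tat, always_defect)  # exploited only once
```

Two tit-for-tat players lock into cooperation, while a defector gets punished from round two onward – the "shadow of the future" that the learned agents exploit in much richer Markov games.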
…Remember: Though many of the examples being experimented with in the fields of language and social collaboration research seem relatively crude and/or simplistic and/or unrealistic, it’s worth remembering that only 5 years ago the idea of solving something like ImageNet was somewhat fanciful, yet after a few turns of the crank of Moore’s Law crossed with GPUs crossed with algorithmic invention, we’ve actually superseded it.

Never bring a regular GAN to a DCGAN fight:
…Here’s a fun experiment by Alexia Jolicoeur-Martineau in which they use a movable feast of modern GAN techniques to try and get computers to dream up synthetic cat faces. The result of the so-called Meow Generator? The (practically vintage) two-year old DCGAN still seems like it generates the best samples, though some newer techniques have somewhat better prospects for avoiding mode collapse and other hazards found in training these systems.
…Check out the wall of synthetic cats here.

AI versus AI… NIPS gets an adversarial example competition…
…Adversarial examples, a class of deep learning inputs that have been manipulated to cause a problem with the end classifiers, are a problem for deep learning. If we ever want AI to be seriously deployed at scale then we almost certainly don’t want it to be vulnerable to data poisoning.
…Kudos then to Google (and in particular Ian Goodfellow) for organizing an adversarial example competition at NIPS 2017. People will have the opportunity to design adversarial images to attempt to poison known and unknown classifiers, as well to try and come up with ways to defend against adversarial examples.
…Details of the competition are available on recent Google acquisition Kaggle.

If you think data is the new oil, then game environments are your jet fuel.
…Facebook has released ELF, a very fast interface between games written in C/C++ and agents written in Python.
The software also includes a special game written by Facebook AI Research to let it rapidly test out new AI algorithms on an environment with the characteristics of competitive environments, such as Starcraft (a game that DeepMind, Facebook, Tencent, and many others are conducting research on.)
…”MiniRTS has all the key dynamics of a real-time strategy game, including gathering resources, building facilities and troops, scouting the unknown territories outside the perceivable regions, and defend/attack the enemy,” Facebook writes. The best part? The game can run at 40,000 frames per second on a MacBook Pro core. The faster you can run an environment the faster you can learn from it.
…The platform provides a hierarchical interface so you can train agents to do increasingly abstract behaviors, potentially letting researchers, say, train a low-level movement policy and leave the high-level one to the inbuilt-AI, or vice versa. I’m curious if eventually we’ll see researchers train multiple policies in parallel at different levels of abstraction, breaking a game down into different layers of information which each get their own AI modules to compute.
…ELF can host any existing game written in C/C++.
…”With this lightweight and flexible platform, RL methods on RTS games can be explored in an efficient way, including forward modeling, hierarchical RL, planning under uncertainty, RL with complicated action space, and so on,” they write.
…You can find out more information in the research paper, Introducing ELF: An Extensive, Lightweight, and Flexible Research Platform for RTS Games.
Access the code on the GitHub repo here.
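The hierarchical interface lends itself to a short sketch. Everything below is illustrative (the class names and the macro table are invented, not ELF's actual Python API), but it shows the split described above: an abstract policy picks goals while a lower-level policy emits primitive commands, and either level can be swapped between scripted and learned.

```python
# Toy sketch of a hierarchical agent setup of the kind ELF's interface
# allows; all names here are illustrative, not ELF's real API.

MACROS = {  # high-level commands expand into scripted primitive sequences
    "scout": ["move_up", "move_up", "move_left"],
    "raid": ["move_right", "move_right", "attack"],
}

class ScriptedHighLevelPolicy:
    """Stands in for a built-in AI choosing abstract goals."""
    def act(self, state):
        return "raid" if state["enemy_visible"] else "scout"

class ScriptedLowLevelPolicy:
    """Stands in for a (possibly learned) policy executing a macro."""
    def act(self, macro, step):
        return MACROS[macro][step % len(MACROS[macro])]

high, low = ScriptedHighLevelPolicy(), ScriptedLowLevelPolicy()
state = {"enemy_visible": False}
macro = high.act(state)                           # abstract decision
actions = [low.act(macro, t) for t in range(3)]   # concrete execution
print(macro, actions)
```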

Baidu teams up with… everyone? for Apollo Self-Driving Car software:
…Chinese search engine Baidu is teaming up with tens of other companies, including major automotive players like Bosch and Chevrolet, to build Apollo, an open source platform for self-driving cars.
…Key ingredients: Not much – yet. So far it includes modules for localization, control, TK, and TK. I’ll wait to see how they extend it.
Find out more here.

The price of AI:
…Pen? $1
…Dictaphone? $50
…Cheap laptop? $300
…Automating local news? $1 million
Google has given a (roughly) $1 million grant to the UK’s Press Association, which will use the money in partnership with an automation startup called ‘Urb’ to trawl through public datasets and then use natural language processing technologies to extract pertinent information and write news stories about it.
Find out more information from the PA here.
…What could possibly go wrong, I infer you thinking? Here’s a good example: the LA Times newspaper has its own ‘earthquake bot’ which watches the data feeds and dutifully produces stories when new tremors shake the eponymous city. Recently, the bot produced a story about quite a severe earthquake. The catch? The earthquake had happened 90 years ago, and the bot wasn’t smart enough to scan for dates in the input feed.
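That failure mode is cheap to guard against: check the event's timestamp before publishing. A minimal sketch, assuming a hypothetical feed format (the real feed's field names will differ):

```python
# A guard of the kind the earthquake bot lacked: check the event's date
# before publishing. The feed's field names here are hypothetical.
from datetime import datetime, timedelta, timezone

def should_publish(event, max_age=timedelta(days=7)):
    """Publish only if the event happened within max_age of now."""
    occurred = datetime.fromisoformat(event["time"]).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) - occurred <= max_age

fresh = {"mag": 4.2, "time": datetime.now(timezone.utc).isoformat()[:19]}
stale = {"mag": 6.8, "time": "1925-06-29T14:42:16"}  # decades-old event
print(should_publish(fresh), should_publish(stale))
```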

Berkeley artificial intelligence research (BAIR) blog posts:
…Background: Berkeley recently set up an AI blog to help its students and faculty better communicate their research to the general public. This is a great initiative!
…Here’s the latest post from Joshua Achiam on Constrained Policy Optimization.

OpenAI Bits and Pieces:

Karate Kid Neural Networks (aka, curriculum learning):
…New paper from an all-star OpenAI intern team of Tambet Matiisen, Avital Oliver, Taco Cohen, and research scientist John Schulman.
…Components used: Minecraft (via MSR’s Project Malmo).
…Research paper here: Teacher-Student Curriculum Learning

Learning from failure (Hindsight Experience Replay):
…Work from our researchers demonstrates unprecedented performance on a variety of robotic manipulation tasks when paired with algorithms like DDPG.
Research paper here: Hindsight Experience Replay.
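The core trick behind Hindsight Experience Replay is easy to sketch: after a failed episode, store a second copy of the transitions relabeled as if the goal had been the state the agent actually reached, so sparse-reward failures still produce learning signal. A toy version (not OpenAI's implementation):

```python
# Toy Hindsight Experience Replay: relabel a failed episode so the goal
# becomes the state the agent actually reached (a sketch, not real code).

def reward(state, goal):
    return 0.0 if state == goal else -1.0  # sparse reward

def relabel_with_hindsight(episode, goal):
    """episode: list of (state, action, next_state) tuples."""
    achieved = episode[-1][2]  # the final state the agent actually reached
    original = [(s, a, s2, goal, reward(s2, goal)) for s, a, s2 in episode]
    hindsight = [(s, a, s2, achieved, reward(s2, achieved)) for s, a, s2 in episode]
    return original + hindsight  # both copies go into the replay buffer

# Agent tried to reach state 5 but only got to 3: every original reward is
# -1, but the relabeled copy ends in a success signal the learner can use.
episode = [(0, "right", 1), (1, "right", 2), (2, "right", 3)]
buffer = relabel_with_hindsight(episode, goal=5)
print([r for (*_, r) in buffer])
```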

Tech Tales:

[ 2035: A soundstage in New York, containing three large eight foot by eight foot by eight foot cubes. A man in a gold-flannel jacket enters the stage from the right, walks to the center in front of the boxes, and faces the crowd.  ]
“Ladies and Gentlemen, are you here and are you ready for-”
“-ESCAPE THE ROOM!” the crowd chants.
“That’s right folks it’s another week so it’s time for another eeee-may-zing competition between three of our favorite and favored robot companions! In Box1 we have GarbageBot2.5, in Box2 we have Chef-O-Matic, and in Box3 we have the FloorSweeper9000. Now for those viewers joining us for the first time let’s go over the rules. When I press this button on my wrist the boxes will start to heat up. How hot will they get? They’ll get-”
“HOT HOT HOT!” the crowd chants.
“So if these clever fellows can’t figure out how to escape the room before the box gets too hot it’s game-over shutdown-time for our friends here. So are you ready?”
The crowd erupts from its seats, waving hands in the air, forming impromptu conga lines, gangs of kids shouting in unison HOT HOT HOT or ESCAPE THE ROOM or BOTS GET HOT.
“Ok,” the announcer says. “Let’s begin!”

He presses the button on his wristwatch and instantly giant LED panels to either side of the stage light up with fat, tall digital thermometers, showing the temperature as it starts to rise inside the boxes.

In Box1, GarbageBot2.5 starts compacting some trash left in its box. It shapes the compacted garbage into a wedge, extrudes the wedge onto the ground, then uses its tire treads to climb up onto it, buying it more time to plan while the floor below it heats up. The Chef-o-Matic bot in Box2 unfolds a set of knives and starts to test one of the corners of its box, tapping away at edges with the tips of the blades, successively applying more pressure.

The temperatures climb. Things are over very quickly in Box3, as the floor-sweeper – a low, wide-mouthed robot – absorbs heat from the floor so rapidly that one of its processing cores burns out. It sits, partially lobotomized, and begins to helplessly melt into the floor.

Over in Box1, GarbageBot has resorted to using a kind of lever on its back to hurl sharp, heavy bits of garbage at one part of the wall. The box is very hot now. It runs out of nearby garbage and starts to grab chunks of compacted trash from the wedge it is standing on, racing to make a dent in the wall, while bringing its rubber treads closer to the – now glowing – hot floor. In Box2, the Chef-o-Matic makes its first penetration in its wall and the temperature in its box starts to climb much more slowly. The crowd cheers for it. It uses a can opener to start to open up an entire seam of its box, then cuts laterally, creating a metal flap. It starts to accelerate into the flap from the other side of the box, trying to use its weight to push the metal open. As it starts to succeed, things take a turn for the worse in Box1 as one of GarbageBot’s treads gets stuck to the floor. The Chef-o-Matic wins by default, but the host doesn’t stop the competition till it’s made its way fully out of the box, allowing it the adulation of the crowd.

“HOT CHEF,” shouts one of the kids. “HOT CHEF HOT CHEF HOT CHEF”

“So,” the announcer says, “How do you feel, ChefBot?” The announcer leans down and plugs a small cable into the back of the robot, linking it to the big LED panels. It sits for a couple of seconds, the screens blank, then they light up with the image of a lobster in a pot, slowly being boiled.

The crowd starts laughing. “Quite an imagination you’ve got there, ChefBot!” says the announcer, then turns his head directly to the crowd and the camera. “That’s all for this week folks. See you again soon!”

Import AI: Issue 49: Studying the crude psychology of neural networks, Andrew Ng’s next move, and teaching robots to grasp with DexNet2

Interdisciplinary AI: Unifying human and machine thought through psychological studies of deep neural nets:
…a paper from DeepMind sees the company’s researchers probe neural networks (specifically, ‘Inception’ and Matching Networks) for biases.
…They discover that when they present these image classification networks with new, never-before-seen images, the networks have a tendency to apply the label for the new image to other similarly shaped images, in preference to ones with similar color, texture, or size. This is the same phenomenon that psychologists observe in humans.
…but what does it mean? Mostly it’s an encouraging sign that these sorts of techniques that we’ve used to analyze humans have something to offer when analyzing neural networks, paving the ground for future studies. “As a good cognitive model should, our DNNs make testable predictions about word-learning in humans. Specifically, the current results predict that the shape bias should vary across subjects as well as within a subject over the course of development. They also predict that for humans with adult-level one-shot word learning abilities, there should be no correlation between shape bias magnitude and one-shot-word learning capability,” the authors write.
Read more in the paper ‘Cognitive Psychology for Deep Neural Networks’.

Neural net libraries for everyone! Sony steps up.
…As companies base more of their long-term corporate strategy around being leaders in AI, some are seeking to create the essential (software) picks and shovels to be used by other AI developers. Currently, the Google-backed frameworks TensorFlow and Keras are becoming popular among developers, while Amazon (MXNet), Microsoft (CNTK) and Facebook (PyTorch) are all seeking to gain some developer enthusiasm with their own frameworks.
…It’s already a busy ecosystem and now it’s getting busier as companies like Samsung and, now, Sony design their own frameworks and supporting tools. Much like how smartphones have solidified around a few basic apps (WeChat / WhatsApp / FB / Google / YouTube / etc), it seems likely developers will home in on a few choice AI frameworks. The question is whether it’s too late to start another one. One thing’s for sure – Sony’s decision to give its framework the generic, un-googlable name ‘Neural Network Libraries’ is unlikely to help.

And the award for most obvious name for a startup goes to… Andrew Ng!
…Andrew Ng, a former AI whiz at Baidu, Google, and Stanford, has finally revealed the name of his new startup: Creative name. I’ve heard some rumors it relates to education, but that could just be people making assumptions based on Ng’s history with Coursera.

McDonald’s plots mass automation via 2,500 robot kiosks:
Fast food company McDonald’s plans to upgrade 2,500 restaurants this year with self-order kiosks, automating the ordering process. McDonald’s says this lets its staff concentrate on providing better service, and notes that locations which already host an automated kiosk have better sales than those that don’t. My intuition is this could become another ‘ATM example’ of automation, where McDonald’s continues to grow aggregate employment long after the introduction of the automation technology (in this case, the kiosk). However, rolling this out may serve to put a ceiling on (human) wages.

Silicon Valley TV goes all in on AI:
…Tim Anglade of HBO’s Silicon Valley has written up a few of the technical details behind the show’s Not HotDog app, which uses the combined might of the smartphone and AI ecosystem – representing literally billions of dollars of investment to date – to produce software that tells you if your phone is looking at a hotdog or not. We live in amazing times. I think that the emergence of jokey or playful applications of a technology is usually a sign of its broader maturation and adoption, so the arrival of this app seems to herald good things for AI.
…One observation made by app developer Tim Anglade is that the modern AI ecosystem moves so quickly it’s unlike other technical communities. “With less than a month to go before the app had to launch we endeavored to reproduce the paper’s results. This was entirely anticlimactic as within a day of the paper being published a Keras implementation was already offered publicly on GitHub by Refik Can Malli, a student at Istanbul Technical University, whose work we had already benefited from when we took inspiration from his excellent Keras SqueezeNet implementation. The depth & openness of the deep learning community, and the presence of talented minds like R.C. is what makes deep learning viable for applications today — but they also make working in this field more thrilling than any tech trend we’ve been involved with,” he writes.
…I exchanged a few emails with Tim about the project. He notes that data collection was a tricky part of the project and – sorry readers – he didn’t stumble across any magical way to ease this process. “Honestly there was just a lot of manual download of images, or checking images I already had (such as my own vacation/food pictures). It took days upon days,” he writes. “In that respect I think Dinesh’s experience in the show staring at “penile imagery” for days on end quite accurately reflects my plight for much of the project.”
…Tim says (emphasis mine) his experience developing the app has led him to fall in love with the AI community. He’s now busily working away on some other future projects. “I think A.I. can have an otherworldly sort of quality, where it both seems too good to be true, but it’s also flawed in a way that can be charming, disarming — or just plain human,” he says.

Who said what and when and to whom? State-of-the-art results on semantic role labeling:
…Researchers with the University of Washington, Facebook, and the Allen Institute for Artificial Intelligence have come up with a system that gets state-of-the-art results on semantic role labeling, a natural language processing task that challenges AI systems to “recover the predicate-argument structure of a sentence” to determine essentially ‘who did what to whom’, ‘when’, and ‘where’.
…One thing that’s notable about this system is that there isn’t a single killer idea; instead, it uses a collection of best practices and components like highway connections and recurrent dropout, which were developed originally by other researchers for other purposes.
…Results: State of the art results on the CoNLL 2005 dataset across recall, precision, and other measures. Similarly good results on the CoNLL 2012 dataset.
…Components used: Highway connections, recurrent dropout
A nice surprise: My intuition is that scientists within AI are starting to spend more time in their papers analyzing the precise ways in which systems fail. This paper is a good example of this encouraging trend, containing an extensive ablation study where they strip out different components of the network in an attempt to better figure out which parts contribute which elements to its learning. More of this, please!
You can read more in Deep Semantic Role Labeling: What Works and What’s Next.
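One of those borrowed components, the highway connection, fits in a few lines: a learned gate decides how much of each layer's transformation to apply and how much of the raw input to carry through unchanged, which eases gradient flow in deep stacks. A numpy sketch with random stand-in weights (not the paper's trained parameters):

```python
import numpy as np

def highway(x, W_h, W_t):
    """One highway connection: a gate mixes a transformed input
    with the raw input, element by element."""
    h = np.tanh(W_h @ x)                   # candidate transform
    t = 1.0 / (1.0 + np.exp(-(W_t @ x)))   # transform gate in (0, 1)
    return t * h + (1.0 - t) * x           # carry the rest through

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
y = highway(x, rng.standard_normal((16, 16)), rng.standard_normal((16, 16)))
print(y.shape)
```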

Multi-Modal Driving:
Waymo’s cars are outfitted with microphones to let them hear the sirens of emergency vehicles, helping them learn when to pull over safely.
…Elsewhere, Volvo’s own self-driving cars can identify deer, elk, and caribou, but have a hard time responding to kangaroos due to their idiosyncratic bouncy gait.

Where we are with AI development, with Fei-Fei Li.
…”We’re entering a new phase but there is a long way to go”, said Google/Stanford’s Fei-Fei Li about the current state of AI research at the ACM Turing Awards last month, before paraphrasing British Prime Minister Winston Churchill to note that AI development is not at the beginning of the end, but rather at the end of the beginning.
…Afterwards, I caught up with Fei-Fei briefly and asked her what kind of metric might supersede ImageNet for measuring the effectiveness of visual classifiers (ImageNet is being retired after this year’s competition as we’ve started to over-fit the dataset). She suggested that the vision community is going in a number of different directions and we may be entering a period where there isn’t a single, simple metric we can pick. Fei-Fei was very clear that “vision is not solved” and instead there are numerous datasets out there – some of varying levels of complexity and some at the limit or beyond of current techniques like VQA – that could be good candidates for the next phase of measurement.
…This maps to my own understanding of the space – instead of simply measuring the ability to pick an object out of a photo we’re now moving onto the harder (and potentially more fruitful) problems of labeling, segmentation, disentanglement, inference about relationships, and so on.

Reach out and touch shapes: UC Berkeley researchers release Dex-Net 2.0
…How can we teach computers to easily grasp objects, even novel ones? That’s a question researchers have been grappling with for decades. Recently, some groups have turned to neural networks as an answer, trying to give computers the ability to approximate the specific function to grip a specific thing. Google has experimented with fleet learning robots picking up and grasping real world things to do this, letting them learn in an unsupervised way how to pick up and put down objects.
…UC Berkeley has its own (supervised) spin on it. Last week the UC Berkeley AUTOLAB released Dex-Net 2.0, a 6.7 million object-large dataset to help researchers teach computers how to get a grip on reality.
…”The key to Dex-Net 2.0 is a hybrid approach to machine learning that Jeff Mahler and I developed that combines physics with Deep Learning. It combines a large dataset of 3D object shapes, a physics-based model of grasp mechanics, and sampling statistics to generate 6.7 million training examples, and then uses a Deep Learning network to learn a function that can rapidly find robust grasps when given a 3D sensor point cloud. It’s trained on a very large set of examples of robust grasps, similar to recent results in computer vision and speech recognition,” says UC Berkeley professor Ken Goldberg.
Find out more about Dex-Net 2.0 (and its predecessor) on the official project page.
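The shape of that pipeline, stripped of all the hard parts, is: sample grasp candidates, score each with the learned quality function, execute the best. A toy sketch in which a random linear model stands in for the paper's trained grasp-quality network:

```python
import numpy as np

# Toy version of the Dex-Net 2.0 pipeline shape: sample grasp candidates,
# score each with a (stand-in) learned quality function, pick the best.
# The scorer here is a random linear model, not the paper's trained CNN.

rng = np.random.default_rng(2)
W = rng.standard_normal(32)  # stand-in for learned weights

def grasp_quality(depth_patch):
    """Predict grasp robustness from a flattened depth-image patch."""
    return 1.0 / (1.0 + np.exp(-W @ depth_patch))

candidates = [rng.standard_normal(32) for _ in range(100)]  # sampled grasps
scores = [grasp_quality(c) for c in candidates]
best = int(np.argmax(scores))
print(best, round(scores[best], 3))
```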

Competition grows in machine translation:
Amazon Web Services plans to soon launch a machine translation service, according to CNBC. This aligns with some of Amazon’s recent research requests, including robust, distributed translation systems that can learn from small amounts of user feedback.
…Amazon’s service will sit alongside similar offerings from Microsoft, Google and IBM. AI seems like the next technology around which cloud providers will compete as they seek to offer increasingly higher-order abstractions and services on top of their world-spanning fleets of computers.

The Geography of AI will be defined by regulation – or the lack of it:
Fun article in BusinessWeek about Starsky Robotics, a company that employs blue collar truck drivers and elite AI coders who work together to create automated trucks that drive on highways, which are then remotely piloted around towns by traditional drivers working from remote operations centers.
…Self-driving will be defined partially by where it gets developed, so it’s of note that some states, such as Florida, have taken particularly permissive and loose approaches to regulation in this area, while others including California have been somewhat harsher.
The Geography of the world will be defined by AI – or the lack of it:
…One interesting tidbit in the article is the idea that, if the company is successful, it could prompt the creation of “climate-controlled “driver centers,” in towns like Jacksonville, where people like Runions will work regular shifts in front of computers, without the greasy food or loneliness that has traditionally gone along with being a trucker.”
…Which raises the question – what happens to the vast exurban ecosystem of truckstops, drive-thrus, and so on that cater to drivers? How will cities change in response to providing services for these stay-at-console truckers, and how will small towns whose economies are built around being on trucking routes fare in this new world?
…”I can tell the difference between a dead porcupine and a dead raccoon, and I know I can hit a raccoon, but if I hit a porcupine, I’m going to lose all the tires on the truck on that side,” says Tom George, a veteran driver who now trains other Teamsters for the union’s Washington-Idaho AGC Training Trust. “It will take a long time and a lot of software to program that competence into a computer.”

OpenAI Bits & Pieces:

Free tools: mujoco-py, an open source Python library to make it easier to simulate and experiment with the (proprietary, license required) MuJoCo physics engine. Bonus: psychedelic robot gif!

Tech Tales:

[ 2024: An Internet cafe in South East Asia. ]

No, you say, watching the price of $BRAIN_COIN plummet from highs down to crushing lows. No no no no no. Rumors of your death swirling on the internet. Fake news about a hijacking. Videos of regulators saying that your currency is under investigation, that the treasury department has a warrant out for your arrest, that George Soros has reversed his position on the cryptocurrency and is liquidating assets. No, no, no, you say, until someone sitting next to you in the cybercafe shushes you, unaware you just went from being a billionaire to a several-hundred millionaire.

All fake, of course. Propaganda dreamed up by the (sometimes automated) marketing departments behind other currencies seeking to sow doubt and confusion, creating enough questions to make people suspicious and thereby manipulate the price of the currencies. The question is how to fight back? How can you send information out into the world that people will actually believe.

And the whole time the currency, your baby – digital scrip designed to form the bedrock of a marketplace between AIs, trading the currency with each other in exchange for influence – is being rocked to and fro by waves of automated propaganda, dreamed up and sent out by bots around the world. You record a video of yourself holding up a copy of today’s newspaper, having scrawled a long string of numbers on its front that come from the currency. People don’t believe you. “Oh this can very easily be faked,” writes some internet denizen. “Has all the hallmarks of a synthjob – the slight wetness around the eyes, the blur on some of the zoomed-in skin pores, the folds on the newspaper. Ridiculous they think we’d believe this.”

So to truly verify yourself you must pair off with a livestreamer: someone with sufficient fame to have a real audience that, when it sees you hanging out with the celeb in real life, will enthusiastically photograph and write about the encounter for its own followers – proof by association. So that’s how you end up walking down the touristy part of Bangkok with a flavor-of-the-month e-celeb, posing for photos taken by numerous fans, all providing an expanding galaxy of hard-to-fake coincidental evidence that you are truly alive. You even forgive the celebrity for mispronouncing the name of the currency, twice, as BRANCOIN and BRAINDOLLAR, while testifying to its merits.

After a day or so the price of the currency recovers, despite conspiracy theories to the contrary that the e-celeb and their fans are fake as well. How long till then, you wonder. How long till even this isn’t enough?

Technologies that inspired this story: generative adversarial networks, synthetic text/speech/vision, social media, Vitalik Buterin (Ethereum)

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Import AI: Issue 48: Learning language in the third dimension; how AI may lead to war, inequality, or stagnation; AI and Art researchers team-up to create CANs

Extremely freaky and incredibly cool AI art:
This eerie AI experiment mashes up Mike Tyka’s recent work on generating fully synthetic faces with AI with a technology called Deep Warp, letting the eyes of the synthetic person follow your cursor. The effect is perturbing and cool! More of these AI mashups, please.
You can see the experiment here.

NIPS 2017, by the numbers:
…3,297: # of NIPS 2017 research papers submitted
…~2,500: # of NIPS 2016 research papers submitted
…3,240: # of research papers cleared for review (some violated policies and others were withdrawn by submitters)
…183: # of area chairs charged with overseeing these papers.
…New: NIPS, in keeping with the current boom in deep learning, has “added one more layer” to its reviewing structure. Senior area chairs (human) will help to further calibrate the decisions made by individual area chairs (much like a layer in a neural network, though with more coffee and swearing.)
More information in this Google Doc from the courageous NIPS program co-chairs.

Why some industries may adopt AI slowly…
Biology eats all the code around me
Despite software leading to rapid gains in our ability to simulate and run experiments on complicated processes, there are some things we struggle with. Real life is one of them. Reality is built on a kind of fizzing underlay of chaos, and fusing our computer systems with it tends to be difficult.
…”Instead of “software eats biotech”, the reality of drug discovery today is that biology consumes everything,” writes Life Sci VC in a great post to remind us of the difficulty of some fundamental domains.
…”The primary failure mode for new drug candidates stems from a simple fact: human biology is massively complicated. Drug candidates interfere with the wrong targets or systems leading to bad outcomes (“off-target” toxicity),” they write.
Read the whole post here.

Language learning goes into the third dimension:
Today, many groups are trying to teach agents to develop language in a way that is uniquely tied to the environment they exist in. This is because of a growing intuition among researchers that simply getting an agent to learn about text by studying large corpuses of it is insufficient to develop AIs with a rounded commonsense understanding of the world – instead, groups are teaching agents to tie words to their environment, letting them develop an intuitive understanding of what, say, “big” or “heavy” or “far away” might mean. Some of these projects have yielded agents with a language which must be translated into English. Other groups are trying to teach their agents English from the ground-up, expanding the agents’ capabilities over time via curriculum learning.
…Now, separate research projects from CMU and DeepMind show a way to push this project into the third dimension, with new papers that teach agents complex language in rich, 3D environments.
Components: DeepMind Lab (DeepMind, a customized/proprietary version of an earlier open source release based on Quake), ViZDoom (CMU, an open source 3D simulator based on Doom).
…Paper: Gated-Attention Architectures for Task-Oriented Language Grounding (CMU). (Notably, the last author is Ruslan Salakhutdinov, who splits his time between CMU and Apple.)
…Approach: The approach taken by CMU researchers is to construct a modular neural network to let the agents complete tasks that require both an understanding of text and vision. To do this, they use a standard convolutional neural network block to interpret vision and a Gated Recurrent Unit to process the text. They then take these representations and combine them via what they call a Gated Attention multi-modal learning layer, which cleverly merges the different representations into a unified set of features. What you wind up with is an agent that can naturally learn to combine the text you feed it with its images of the world, then acts in the world using this single representation.
…DeepMind uses a similar technique (with some bells and whistles based around ideas present in their UNREAL paper of last year) to create agents that learn curriculums of entangled words and objects and generalize instantly (zero-shot adaptation) to previously unseen combinations of the two. The addition of auxiliary goal identification and acquisition helps learning by letting the agent create autoregressive objectives which help it model its surroundings.
Paper: Grounded Language Learning in a Simulated 3D World (DeepMind).
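The Gated-Attention fusion itself reduces to a few lines: squash the instruction embedding through a sigmoid and use it as a per-channel gate over the convolutional feature map. A numpy sketch with arbitrary shapes (a simplification of the paper's layer, with random stand-in inputs):

```python
import numpy as np

def gated_attention(conv_features, text_embedding):
    """Fuse vision and language via channel-wise gating.
    conv_features: (C, H, W) feature map from a CNN.
    text_embedding: (C,) vector from a GRU over the instruction.
    """
    gate = 1.0 / (1.0 + np.exp(-text_embedding))  # sigmoid, in (0, 1)
    # Broadcast the per-channel gate across all spatial positions.
    return conv_features * gate[:, None, None]

rng = np.random.default_rng(0)
features = rng.standard_normal((64, 8, 8))  # C=64 channels, 8x8 map
instruction = rng.standard_normal(64)       # matching embedding size
fused = gated_attention(features, instruction)
print(fused.shape)
```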

Matlab gets a free visualization upgrade:
…MIT researchers have created mNeuron, a free plug-in for popular math software Matlab. The plug-in visualizes neurons in neural networks and has support for Caffe and matconvnet.
…Come for the potentially useful tool for interpretability, stay for the ‘tessellation art’ technique that lets you take the visualizations of a single neuron and extend it into a large, repeating tapestry.
Keras gets a viz plugin as well:
Easy-to-use AI framework Keras also has its own visualization ecosystem. One handy tool looks to be Keras-vis, a toolkit for visualizing saliency maps, activation maximization, and class activation maps in models.

Amazon reveals its (many) AI priorities with Amazon Research Awards:
…Amazon has published a call for proposals for its Amazon Research Awards and it is willing to fund proposals to the tune of, at most, $80,000 in cash and $20,000 in Amazon Web Services promotional cloud credits.
The research: What’s most of note is the broad set of research areas Amazon is seeking proposals for – and some of them are particularly germane and specific to its work.
Notable research focus areas: …Apparel similarity…Personalization using personal knowledge base…advances in methods for estimating machine translation quality at run time…synonym and hypernym generation for eCommerce search…simulation of sensing and grasping for object manipulation, and so on.
Read more on the Amazon Research Awards page here.

Interdisciplinary Research: Automated Artists via Creative Adversarial Networks:
Researchers have tweaked a generative adversarial network so that it can be used to create synthetic artwork that feels more coherent and human than stuff we could previously generate.
…The approach, Creative Adversarial Networks (CANs), was outlined in a wonderfully interdisciplinary paper from researchers at Rutgers, Facebook, and the Department of Art History at the College of Charleston, South Carolina.
…CANs work somewhat like a generative adversarial network, except the discriminator now gives two signals back to the generator instead of one. First, it feeds back whether something qualifies as art (a discrimination based on it being pre-fed a large corpus of art). Second, it gives a signal about how well it can classify the generator’s sample into an exact style.
…”If the generator generates images that the discriminator thinks are art and also can easily classify into one of the established styles, then the generator would have fooled the discriminator into believing it generated actual art that fits within established styles,” explain the authors.
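That two-signal objective can be written down directly: the generator wants the discriminator to call its sample art while leaving the style classifier maximally uncertain, i.e. close to a uniform distribution over styles. A toy sketch with made-up probabilities (a simplification of the paper's loss):

```python
import math

def generator_loss(p_art, style_probs):
    """p_art: discriminator's probability the sample is art.
    style_probs: discriminator's distribution over known styles.
    The generator wants p_art high and style_probs uninformative."""
    art_loss = -math.log(p_art)  # standard GAN generator term
    # Style-ambiguity term: cross-entropy against the uniform distribution,
    # minimized when the classifier cannot pin down any one style.
    k = len(style_probs)
    ambiguity_loss = -sum((1.0 / k) * math.log(p) for p in style_probs)
    return art_loss + ambiguity_loss

confident = generator_loss(0.9, [0.97, 0.01, 0.01, 0.01])  # easily styled
ambiguous = generator_loss(0.9, [0.25, 0.25, 0.25, 0.25])  # style unclear
print(ambiguous < confident)
```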
…So, how good are the samples? The researchers carried out a quantitative evaluation where they showed human subjects (via mechanical turk) sets of paintings generated by, respectively, CANs, DCGAN, and via humans (across two sets: Abstract Expressionist and Art Basel 2016.)
Results: Human evaluators thought CAN images were generated by a human 53% of the time, versus 35% for DCGAN (and 85% for the human-generated abstract expressionist set).
…You can read more in the paper: “CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms“. I rather liked some of them, reminiscent of Kandinsky via Pollock via Mondrian.

Google & DHS
sitting in a tree
…Google and the Department of Homeland Security have teamed up (via recent Google acquisition Kaggle) to create a competition to get data scientists to create algorithms to identify concealed items in images gathered by checkpoint body scanners.
…Total prize money: $1.5 million
…Sad trombone: Only US citizens or permanent residents can actually win money in this competition (though everyone can participate), somewhat going against the free-wheeling egalitarian nature of Kaggle.
More information on the competition on its Kaggle page here.

AI == War?:
…Alibaba chairman Jack Ma worries that artificial intelligence could lead to a third world war. …”The first technology revolution caused World War I,” he told CNBC’s David Faber. “The second technology revolution caused World War II. This is the third technology revolution.”
AI == Inequality?:
…Kai-Fu Lee, chairman of VC firm Sinovation Ventures and former head of Google China, writes in the New York Times opinion pages that “the A.I. products that now exist are improving faster than most people realize and promise to radically transform our world, not always for the better. They are only tools, not a competing form of intelligence. But they will reshape what work means and how wealth is created, leading to unprecedented economic inequalities and even altering the global balance of power.”
…”Unlike the Industrial Revolution and the computer revolution, the A.I. revolution is not taking certain jobs (artisans, personal assistants who use paper and typewriters) and replacing them with other jobs (assembly-line workers, personal assistants conversant with computers). Instead, it is poised to bring about a wide-scale decimation of jobs — mostly lower-paying jobs, but some higher-paying ones, too.”
AI == An Amazing World, (if we make some changes)?:
…Michael Bloomberg says automation poses many risks to society, but some of these can be remediated with policy changes. Health care should not be tied to employment, he says (a step taken by many Northern European and other countries already); governments should contemplate creating direct employment programs (as the US did with the New Deal back in a more optimistic time); and benefits should be altered to subsidize low-income earners, potentially via the Earned Income Tax Credit, among other ideas.
…”To spread the benefits of the age of automation far and wide, we’ll need more cooperation among government, business, education, and philanthropic leaders,” he writes in a column in, naturally, Bloomberg Businessweek.

What happens if only a few industries automate themselves too rapidly?
AI is going to bring about more opportunities for automation. The multi-trillion dollar question is how rapidly different industries will automate and what the aggregate effect will be. That relates to some of the issues the above people have been grappling with.
…I worry that there’s a way that uptake of AI can lead to pretty adverse effects. In the 20th century America went through a couple of revolutions, with both agriculture and manufacturing undergoing mass automation, leading to a significant reduction in their share of the overall economy.
…This was broadly good for the industries themselves, letting them feed and supply far more people, far more efficiently. It wasn’t so bad for the displaced workers, either, because at the same time new technologies were unlocking new jobs, like automobiles creating entirely new occupation categories, or because the rest of the economy was growing rapidly enough to enlarge other industries, like the service sector.
…If AI is adopted unevenly, then it’s possible that those industries that turn to it will become a proportionally smaller part of the overall economy in terms of employment through a more efficient workforce, leading to a small well-remunerated class of specialized workers in automated industries, and poorer workers in the rest of the economy. The question is whether other industries will keep on growing – and that part is a real wildcard. If they don’t then they’ll become a stagnant drag on the economy, especially if they’re unable to access AI technologies used in other industries, and the gap between different levels of compensation could continue to widen. We’re already seeing some indicators of this kind of effect in the tech industry which pays its employees very highly but doesn’t in the aggregate boost national employment much at all.
…for an example of this worrying trend in action, check out this New York Times article about how post-industrial towns are now struggling with a stagnant physical retail market (likely partially due to online shopping displacing in-store shopping.) As a resident points out, all the good jobs with companies like Amazon that are leading to the physical retail decline are located near large metropolitan areas, hundreds of miles away. Where do the locals get to work?

Amazon files patent for drone delivery towers:
[Year 2035: Megacity 700, a flock of drones, like so many metal starlings, billow out of a gigantic tower, ferrying bright yellow packages to innumerable residents across the city.]
…Amazon’s patent for the “Multi-Level Fulfillment Center for Unmanned Aerial Vehicles” here.

MIT’s Senior House clampdown: Intervention or Culture-Washing?
MIT officials are seeking to shut down Senior House, a student community in MIT that houses “a disproportionately high number of people of color, LGBT students, the socioeconomically disadvantaged” and other oddball students, according to Save Senior House, a student-led initiative to lobby for preserving the accommodation. “In terms of diversity it is one of the most representational distribution of these factors that existed on campus, and maybe one of the best in all of higher education.”
…MIT says that the house had particularly low graduation rates, higher drug use, and faced more mental health issues, so it wants to step in and change the set-up. Save Senior House says many of these factors stem more from the diversity of the house than from what the students choose to do within it.
…MIT has evicted all residents, and will replace them with a new cohort starting in Autumn 2017 called ‘Pilot 2021’. (A parody site of which is available here.)
… Sarah Schwettmann, a graduate student mentor who lived in Senior House, says: “In the Senior House community many residents find – some for the first time – what feels like home. Last Monday, I was given 48 hours notice of my eviction from Senior House, along with the other graduate mentors who normally remain in the House over the summer to integrate new and returning students in the fall. Now, police and security personnel guard an empty building, whose past residents valued openness and diversity. We’re experiencing action from the MIT administration that is both heavy-handed and disproportionate. This effort, undertaken while students are away from campus for the summer, eradicates a unique part of campus culture and restructures a new community from the top down.
…As someone who was hired to support this community, I see this as an administrative failure to support some of the most vulnerable and stigmatized members of MIT. These students present the institute with a challenge, and one not unique to MIT: how do we build a platform for the historically marginalized to define their own success in a rigorous academic environment, craft their own system of values, and learn to support themselves and each other? Such issues will accompany these students to wherever they reside on campus, so long as the institute continues to admit them. Senior House provided a community-driven solution, a work in progress engineered from the bottom up. In my eyes, MIT is dismantling that solution, and a century of history: cleaning house by sweeping the challenge itself under the rug.”
Expanded statement available here.

OpenAI Bits&Pieces:

OpenAI’s Ilya Sutskever spoke at the ACM Turing conference in San Francisco this week. You can find out more about the conference here and find video recordings of the panel and others here on the ACM’s Facebook page.

Tech Tales:
[ 20??: A park in a city. Winter. Frost on the ground. Some deer ferrying lost fawns across the park to be reunited with their mothers. ]

What year is it? Who are you? Where did you grow up? Why are you here? See how many of these and other questions you can answer before the timer runs out! Says the text on your tablet. In the bottom right-hand corner is a little red timer, counting down to zero. Five hours left. See how many points you can get before the time runs out! You don’t know much, yet.

Temporary Brain Wiping is what the neuroshrinks call it. Mental Fresh Air is what its fans call it. Lobotomy Cult is what the media calls it. You don’t know what you call it, because you’ve forgotten.

You know you must have agreed to initiate the wipe. You know some basic things, like how physics works, how to speak, how to read. But most of your memory is… not present. You know that you have memories but you can’t access them right now. It’s like they’re trapped in smoky glass – you can discern faint outlines, but there’s no resolution, nothing to put a hand on.

You see an older woman walking her dog. “Excuse me, what year is it?” you ask.
“Oh dear you’re going to have to try harder than that. We get a lot of your type around here now.”
“Can you give me a clue?”
“Well, when I was a young girl there was a band called the Spice Girls. They were the first CD I bought.”
“Thanks,” you say. Watch her as she walks away. Spice Girls, you think, dredging through your partially occluded memory. You don’t remember anything specific, but it feels old. The woman was old enough to have faded into a kind of graying twilight – anywhere between 50 and 80, depending on lifestyle and genetic lottery and, sadly probably, wealth. Where am I? You think. There are trees, very few houses, some elaborate old-looking buildings. People. The woman had a British accent. If you get to high ground you can see if there are any landmarks that the wipe didn’t get.
You study other people in the park, unsure whether they’re like you – temporarily marooned, mentally cut off from things – or if they’re a part of this world, enmeshed in it through memory.

A few minutes later and you’re at a play-park, quizzing kids about what year it is. They all think your question is silly.
“What’s your name?” they ask.
“I don’t know.”
“Did you wipe yourself? My Dad does that when he gets sad sometimes. Were you sad?”
“I’m not sure. I hope not. I think I’m playing a game. Do you know what city this is?”
“London. I don’t understand this game-”
“-I LIKE TO REMEMBER EVERYTHING!,” blurts out another kid, before running up to the top of a slide and going down again. They hop off the bottom and run up to you. “The metal was cold but it was slippery and I went down really fast and because I was so fast there was wind and it meant there was air in my eyes. The metal at the bottom is very cold. I’m going to remember this forever,” they say, then they close their eyes and frown to themselves and you imagine them muttering to themselves internally remember remember remember. A memory rears up at you; you’re wearing pajamas sat on top of your bunk bed, staring at a shoe-box full of junk electronics, trying to assemble intelligence out of logo. The door opens and — the memory fades back into glass. Your parent? Who?

As the red timer ticks down you wonder about what you’re going to find when it releases. What happens when the memories come back? And where will you be? You walk away from the park, head for higher ground, hope that when your life comes back to you you’ll be gazing over a city that you know in a park that is familiar with friends in the distance. You hope for these things because they seem likely, but you have no way to be sure. You wonder what happens if, instead of letting the timer run out, you press the “extend” button, playing out the amnesia a little longer. You press it. Denied, the screen says. Too Many Extends In One Session. Please seek NeuroAttention for Evaluation Following Closure of Sequence. How many times can you loop out of your own memory? You don’t remember. Close your eyes. Hold the tablet in your hand. Feel the wind on your face. Wait for yourself to become yourself again.

Technologies that inspired this story: whatever the current memory substrate within neural nets ends up being, brain-computer interfaces, recursion.

Import AI Issue 47: Facebook’s AI agents learn to lie, OpenAI and DeepMind use humans to train safe AI, and what TensorFlow’s new release says about future AI development

Facebook research: Misdirection for NLP fun and profit:
New research from Facebook shows how to teach two opposing agents to bargain with one another – and along the way they learn to deceive each other as well.
…”For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance,” they write.

Images of the soon-to-be normal:
This photograph of a little food delivery robot blocking traffic is a herald of something that will likely become much more commonplace.

Predicting Uber rides with… wind speed, rider data, driver data, precipitation data, temperature, and more…
Uber has given details on the ways it is using recurrent neural networks (RNNs) to help it better predict demand for its services (and presumably cut its operating costs along the way).
…The company trained a model using five years of data from numerous US cities. The resulting RNN has good predictive abilities when tested across a corpus of data consisting of trips taken across multiple US cities over the course of seven days before, during, and after major holidays like Christmas Day and New Year’s Day. (Though there are a couple of real-world spikes so drastic that its predictions low-ball them, suggesting it hasn’t seen enough of those incidents to recognize their warning indicators.)
…Uber’s new system is significantly better at dealing with spiky holiday days like Christmas Day and New Year, and it slightly improves accuracy on other days such as MLK Day and Independence Day.
…Components used: TensorFlow, Keras. Lots of computers.
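…To make the setup concrete, here’s a toy numpy version of such a forecaster’s forward pass – a vanilla RNN rolled over a window of past demand, emitting a one-step-ahead prediction. (A minimal sketch: Uber’s production model is an LSTM trained in Keras/TensorFlow, and the weight names below are placeholders, not theirs.)

```python
import numpy as np

def rnn_forecast(series, Wx, Wh, Wo, bh, bo):
    """Roll a vanilla RNN over a window of past demand values and
    emit a one-step-ahead forecast via a linear readout."""
    h = np.zeros(Wh.shape[0])                 # hidden state starts at zero
    for x_t in series:                        # consume the demand history
        h = np.tanh(Wx * x_t + Wh @ h + bh)   # recurrent state update
    return float(Wo @ h + bo)                 # scalar demand prediction
```

In practice the inputs would also carry the weather and rider/driver features mentioned above, concatenated into each timestep’s vector rather than the single scalar used here.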

Job alert!
The Berkman Klein Center for Internet & Society at Harvard University is seeking a project coordinator to help it with its work on AI, autonomous systems, and related technologies. Apply here. (Also, let’s appreciate the URL for this job and how weird it could have seemed to someone a hundred years ago –…./AIjob )

AI video whiz moves from SF, USA, to Amsterdam, Netherlands. But why…?
…Siraj Raval has moved from the US to Amsterdam for a change of scene. Now that he’s settled in he has started a new video course (available on YouTube) called The Math of Intelligence. Check it out.
…I asked Siraj what his impressions were of the AI community in Amsterdam and he said this (emphasis mine): “The AI community is absolutely thriving in Amsterdam, specifically the research portion. I’ve met more researchers at Meetups here than I have for years in SF. I also briefly visited Berlin and met some amazing data scientists there. The bigger trend is that governments in the EU (France, Netherlands, Germany) are heavily investing in tech R&D and the brightest minds are taking notice. I am the son of immigrants to the USA, but I am not afraid to myself immigrate if necessary. Progress can’t wait, and the Netherlands understands this.” Sounds nice, and the pancakes are great as well.

Googlers create a single multi-sensory network: One Model To Rule Them All.
Welcome to the era of giant frankenAIs:
… Researchers from Google have figured out how to bake knowledge about a broad spectrum of domains into a single neural network and then train it in an end-to-end way.
…”In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains,” they write. “The key to success comes from designing a multi-modal architecture in which as many parameters as possible are shared and from using computational blocks from different domains together. We believe that this treads a path towards interesting future work on more general deep learning architectures”.
…Prediction: as this kind of research becomes viable we’ll see people gather huge datasets and train single models together with a broad range of discriminative abilities. The next shoe to drop will be innovations in fundamental neural network building block components to create finer-grained classification and inference abilities in these neural network models and encourage more cases of transfer learning.
Notable: Others are thinking along similar lines – last week’s Import AI covered a new MIT research paper that blends sound and vision and text into a single meta-network. 
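…The structural idea – one shared trunk of parameters feeding many task-specific heads – is easy to sketch. (A deliberately tiny linear toy: the task names and shapes below are made up for illustration; the real model uses far richer domain-specific blocks.)

```python
import numpy as np

def multitask_forward(x, shared_W, heads):
    """One shared encoder, many task-specific output heads.
    `heads` maps task name -> that task's readout weight matrix;
    every task reuses the same shared parameters in `shared_W`."""
    z = np.maximum(0, shared_W @ x)           # shared representation (ReLU)
    return {task: W @ z for task, W in heads.items()}
```

The point of the sketch is the parameter accounting: adding a new task costs only one extra head, while the shared trunk is trained by gradients from every task at once.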

Pay attention to Google’s new attention paper:
Google researchers have attained state-of-the-art results in areas like English-to-German translation with a technique that is claimed to be significantly simpler than its forebears.
…The paper, Attention is All You Need, proposes: “the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output.”
…In other words, the researchers have figured out a way to reduce the number of discrete ingredients that go into the network, swapping out typical recurrent and convolutional mapping layers with ones that use attention instead.
…”We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. Making generation less sequential is another research goal of ours.”
…It seems that research into things like this will create further generic neural network building blocks that can be plugged into larger, composite models – just like the above ‘One Model to Rule Them All’ approach. Watch for collisions!

Long-brewing research from Vicarious: Learning correspondences via (for now) hand-tuned feature extractors:
…One puzzle reinforcement learning researchers struggle with is how algorithms end up evolving to over-fit their environment. What that means in practice is if you suddenly were to, say, change the geometry of the Go board AlphaGo was trained on, or alter the placement of enemies and obstacles in Atari games, the AI might fail to generalize.
…Now, research from Vicarious – an AI startup with backing from people like Jeff Bezos, Mark Zuckerberg, Founders Fund, ABB, and others – proposes a way to ameliorate this flaw. This marks the second major paper from Vicarious this year.
…Their approach relies on what they call Schema Networks, which lets their AI learn the underlying dynamics of the environment it is exposed to. This means, for instance, that you can alter the width of a paddle in Atari Game breakout, or change the block positions, and the trained algorithm can generalize quickly to the new state, preserving its underlying understanding of the dynamics of the world built up during training. Traditional RL algorithms tend to struggle with this as they’ve learned a predictive model of the world as it is and struggle with learning more abstract links.
…There’s a small catch with Vicarious’s approach – the researchers had to do the object segmentation and identification themselves, then feed that to the AI. In reality, one of the greatest challenges computer vision researchers face is accurately mapping and segmenting non-stationary images (and it’s even harder as systems get deployed in the chaotic real world, as they need to link parts of a flat 2D image to messy 3D objects). I’m keen to see what happens when this algorithm can do the feature isolation itself.
Notable: Meanwhile, DeepMind has published Relational Networks (claiming SOTA and superhuman performance) and Visual Interaction Networks, two philosophically similar research papers that hew closer to traditional deep learning approaches. Just as you and I use abstract logic to reason about the world, it seems likely AI will need the same capabilities.

Just what the heck does a career in AI policy look like?
…Twitter’s AI paper tsar Miles Brundage has published an exhaustive document outlining a Guide to Working in AI Policy and Strategy up on 80,000 hours. (And watch out for the nod to Import AI – thanks Miles. I’ll do my best!)

(Mildly) Controversial Microsoft/Maluuba research paper: Using rewards is easy, finding them is hard:
…A new research paper from Microsoft’s recent Canadian AI acquisition Maluuba, Hybrid Reward Architecture for Reinforcement Learning, shows how to definitively beat Ms. Pac-Man (clocking over a million points). Ms. Pac-Man, along with Montezuma’s Revenge, is one of the games people have found consistently challenging, so it’s a notable result – though not as encouraging as it first looks, once you work out what is required for the process to work.
…When you analyze its Hybrid Reward Architecture you see that the approach is distributed, with Microsoft splitting the task into many discrete sub-tasks which numerous reinforcement learning agents try to solve, while feeding their opinions up into a single meta-agent that helps take decisions. Though it scores highly, the approach involves a lot of human specification, including hand-labeling rewards and penalties for different entities in the game. As with the Vicarious paper, the technique is interesting, but it feels like it’s missing a key component – unsupervised extraction of entities and reward levels/penalties.
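…The aggregation step itself is simple – the subtlety (and the human labor) is all in the hand-crafted reward decomposition. A toy sketch of the meta-agent, assuming it combines its heads by summing their Q-values (one of the aggregation choices described in the paper):

```python
import numpy as np

def hra_select_action(q_heads):
    """q_heads: (n_heads, n_actions) array, one row per sub-agent,
    each scoring the shared action space against its own sub-reward.
    The meta-agent sums the heads and acts greedily on the total."""
    q_total = q_heads.sum(axis=0)
    return int(np.argmax(q_total)), q_total
```

Each head only needs to learn a value function for its narrow sub-reward (eat this pellet, avoid that ghost), which is what makes the individual learning problems tractable.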

What TensorFlow v1.2 says about devices versus clouds:
Google has released version 1.2 of TensorFlow. There’s a ton of fixes and tweaks (eg, for RNN functionality), but buried in the release details is the note that Google will stop directly supporting GPUs on Mac systems (though it will continue to accept patches from the community). There’s a likely reason for this: the lack of much of an NVIDIA ecosystem around Macs (c.f. Apple’s new external GPU for the Mac Pro runs AMD cards, which have yet to develop as much of a deep learning ecosystem).
…Another way of looking at this is that the cloud wins AI. For now at least AI benefits from parallelization and the usage of large numbers of CPUs and GPUs together, with most developers using a laptop paired with an external pool of cloud resources, and/or running their own Linux deep learning rig in a desktop tower.
Details: Tensorflow v1.2 on GitHub.

Snapchat’s first research paper: mobile-friendly neural nets with full-fat brains.
Researchers with Snap Inc. and the University of Iowa have published SEPNETs: Small and Effective Pattern Networks.
…tl;dr: a way to shrink trained models then recover them to restore some accuracy.
…It tackles one of the problems AI’s success has led to: the creation of increasingly large, deep models that have tremendous performance but take up a lot of space when deployed. Ultimately, the ideal scenario for AI development is to be able to train a single gigantic model on a nearby football-field filled with computers, then be able to have a little slider to shrink the trained model for deployment on various end-user devices, whether phones, or computers, or VR headsets, or something else. How do you do that? One idea is to try to smartly compress these trained models, either by lopping away at chunks of the neural network, or by scaling them down in a more disciplined way. Both methods see you tradeoff overall accuracy for speed, so fixing this requires new research. The Snapchat paper represents one contribution:
The details: First they use a technique called pattern binarization to shrink a pre-trained network (for instance, a tens-of-millions-of-parameters VGG or Inception model) into a smaller version of itself, at the cost of it losing some discriminative capabilities. They propose to fix this with a new neural network component they call a Pattern Residual Block, which can sometimes help offset the changes wrought on the numbers it’s dealing with by the binarization process. They then use group-wise convolution to further winnow down the various components of the network, shrinking it.
…Results: Google MobileNet: 1.3 million params, 5.2MB, accuracy 0.637.
…Results: SEP-Net-R (Small): 1.3 million params, 5.2MB, accuracy 0.658.
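…Binarization itself follows a familiar recipe from the model-compression literature: replace full-precision weights with a single float scale times their signs. A rough numpy sketch of that general idea (an illustration, not Snap’s exact pattern-binarization procedure):

```python
import numpy as np

def binarize_weights(W):
    """Approximate a float weight tensor as alpha * sign(W): one shared
    float scale plus one bit per weight, versus 32 bits each before."""
    alpha = np.abs(W).mean()      # scale that minimizes L2 reconstruction error
    return alpha, np.sign(W)      # weights stored as +/-1

def reconstruct(alpha, B):
    """Dequantize back to floats for the forward pass."""
    return alpha * B
```

The Pattern Residual Block described above exists precisely because this reconstruction is lossy: it adds back some full-precision capacity to absorb the approximation error.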

Free pre-trained models, get your pre-trained mobile-friendly models right here!
…Google unfurls MobileNets to catch intelligence on the phone:
In possibly related news Google has released MobileNets, a collection of “mobile-first computer vision models for TensorFlow”.
…”MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases.”
…The available models vary from bite-size ones of 0.47 million parameters to larger ones of 4.24 million, with image accuracies ranging from 66.2 to 89.9% for the larger models.
Github repo here.
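…The parameter savings in mobile-first models of this kind come largely from depthwise separable convolutions, which split a standard convolution into a per-channel spatial filter plus a 1x1 channel mixer. The arithmetic is easy to check (a back-of-envelope sketch; the layer sizes below are illustrative, not taken from the release):

```python
def standard_conv_params(k, c_in, c_out):
    # one k x k filter spanning all input channels, per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # one k x k depthwise filter per input channel,
    # plus a 1x1 pointwise convolution to mix channels
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 layer with 128 input and 128 output channels:
standard = standard_conv_params(3, 128, 128)         # 147,456 params
separable = depthwise_separable_params(3, 128, 128)  # 17,536 params, ~8x fewer
```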

Speeding up open access reviews:  There’s a suggestion that Open Review – a platform that makes feedback and evaluation of papers public – is considering layering some aspect of its system over Arxiv, letting us not only publish preprints rapidly, but potentially review them as well.
…Note: None of this is meant to say that double-blind reviewing is bad – it’s good, especially for significant papers with particularly controversial claims. But I think, due to the breakneck speed at which AI moves, it’s necessary to try and speed things up where possible. This suggests one way to more rapidly gather better feedback on new ideas.
…How it might be used: Last week Hochreiter & co published the SELU paper. It’s gathered a lot of interest with numerous people running their own tests, chiming in with comments, or going through its 90+ page appendix. It’d be very convenient if there was a layer that let us put all this stuff in the same place.

Dollars for Numpy: Numpy has been given a little over half a million dollars from the Gordon and Betty Moore Foundation to fund improvements to the Python scientific computing library. Numpy is absolutely crucial to neural networks within Python.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Tech Tales:
[2035: The North East Canadian wilderness.]

It’s wet. There’s moss. The air has that peculiar clarity that comes from being turned by wind and freshened by water and replenished by the quiet green and brown things of nature. You breathe deeply as you walk the rarely used trail. Your feet compress the nested pine needles beneath you, sending up gusts of scented air. In the distance, you hear the sound of roiling running water and, beneath it, bells tolling.

You keep going. The sound of the water and of the bells gets louder. The bells echo out little sonorous rhythms that seem intertwined with the sounds of gushing river water. One of the bells is off – its timing discordant, cutting against the others. You begin to crest a small hill, and as your head clears it the sound rushes at you. The bells clang and the water thrums – their interplay not exactly abrasive – for the off bell is one of many – but somehow more mournful than you recall the sound being before.

The bells are housed in a small concrete tower, about 5 feet high, that sits by the riverbank at a point where there’s a kink in the river. It has three walls, with the fourth left open to the elements, facing the river, broadcasting the sounds of the bells. You run your hands over the cold, mottled, lichen-stippled exterior as you approach its entrance. Close your eyes. Open them when you’re in front of the shrine. You study the 12 bells of the dead, able to make out the inscribed names of the gone, despite the movement of the bells. Now you just need to diagnose why one of the bells seems to have fallen out of alignment with the others.

As you sit, studying the wiring in the shrine and watching the bells, it’s impossible not to think of your friends and how they are now commemorated. You all work for the government on a geographic survey. As the climate has been changing your teams have been pushing further north for more of the year, trading safety for exploration (and the possibility of data valuable to resource extraction companies). You were at home, laid up with a broken leg, when the team of 12 went out. They were doing a routine mapping hike, away from camp, when the storm came in – it strengthened far more rapidly than their computer models anticipated and, due to a set of delicate occurrences, it brought snow and ice with it. Temperatures plunged. Snow-cladded everything. Rain was either flecked with ice or snow or a contributor to a sheet of frozen fog that lay over the land. Your colleagues died due to about 50 things going wrong in a very precise sequence. These things, as hysterical as it seems, happen.

The bells are set to dance to the rhythm of the river. Their loops are determined by observations from cameras atop the shrine, pointed at the writhing river. This visual information is then fed into an algorithm that is forever trying to find a pattern in the infinite noise of the river. After an hour you have the sense to give the cameras more than a cursory look and you discover that a spider has made a small nest near the sensor bulge, and one thick strand of webs is slung in front of one of the camera lenses. This, you figure, has injected a kind of long-term stability into part of the feed of data that the algorithm sees, swapping a patch of the frothing slate and white and dark blue and brown of the river-water with something altogether more stagnant.  Fixing it would be as simple as putting on a glove and carefully removing the spiderweb, then polishing the lens. You hold your hand up in front of the web to get a sense of how it would be to remove it and as your hand passes in front of the cameras the bells change their rhythm, some stuttering to a stop and others speeding up, driven to a frenzy by the changed vision. You put your hand down and the bells go back to their tolling, with the one that seems to be affected by the spiderweb still acting out of order.

When you file your report you say reports of odd sounds appear to be erroneous and you discovered no such indicators during your visit. You take comfort in knowing that the bells will continue to ring, driven increasingly by the way the world grows and breaks around them, and less by the prescribed chaos of the river.

Technologies that inspired this story: Attention, generative models, joint neural networks, long-short term memory

OpenAI Bits&Pieces:

OpenAI and DeepMind train reward functions via human feedback: A new research collaboration between DeepMind and OpenAI on AI safety sees us train AI agents to perform behaviors that they think humans will approve of. This has implications for AI safety and has promising sample efficiency as well.

OpenAI audio podcast about reinforcement learning by Sam Charrington, with OpenAI/UC Berkeley robot chap Pieter Abbeel.

Import AI: Issue 46: Facebook’s ImageNet-in-an-hour GPU system, diagnosing networks with attention functions, and the open access paper debate

Attention & interpretability: modern neural networks are hard to interpret because we haven’t built tools to make it easy to analyze their decision-making processes. Part of the reason why we haven’t built the tools is that it’s not entirely obvious how you get a big stack of perceptual math machinery to tell you about what it is thinking in a way that is remotely useful to the untrained eye. The best thing we’ve been able to come up with, in the case of certain vision and language tasks, is attention, where we visualize what parts of a neural network – sometimes down to an individual cell or ‘neuron’ within it – are activating in response to a given input. This can help us diagnose why an AI tool is responding in the way it is.
…Latent Attention Networks, from researchers with Brown University, proposes an interesting way to improve our ability to analyze nets: by creating a new component that makes it easier to visualize the attention of a given network in a more granular manner.
…In the paper they introduce a new AI component, which they call a Latent Attention Network. This component is general, working across different neural network architectures (a first, the researchers claim), and only requires the person to fiddle with it at its input or output points. LANs let them fit a so-called attention mask over any architecture.
…”The attention mask seeks to identify input components of x that are critical to producing the output F(x). Equivalently, the attention mask determines the degree to which each component of x can be corrupted by noise while minimally affecting F(x),” they write.
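The quoted idea – a component of x matters to the degree that corrupting it with noise changes F(x) – can be sketched in plain Python. This is a toy illustration of the noise-corruption principle, not the authors' actual Latent Attention Network; the function `F` and the scoring routine are hypothetical stand-ins:

```python
import random

def importance_by_corruption(F, x, trials=200, noise_scale=1.0):
    """Estimate how much each component of x matters to F(x) by
    corrupting one component at a time with Gaussian noise and
    measuring the average change in the output."""
    baseline = F(x)
    scores = []
    for i in range(len(x)):
        total_change = 0.0
        for _ in range(trials):
            corrupted = list(x)
            corrupted[i] += random.gauss(0.0, noise_scale)
            total_change += abs(F(corrupted) - baseline)
        scores.append(total_change / trials)
    return scores

# Example: F depends only on x[0], so x[1] should score (near) zero.
F = lambda x: 3.0 * x[0]
scores = importance_by_corruption(F, [1.0, 1.0])
```

A trained LAN amortizes this kind of probing into a single learned mask, rather than brute-forcing it per input as above.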
…The researchers evaluate the approach on a range of tasks, from simple ones (MNIST, CIFAR) to a game of Pong from the Arcade Learning Environment. The ensuing visualizations seem to be helpful for getting a better grasp of how and why neural network classifiers work. I particularly recommend studying the images from Pong.
Why it could be useful: this technique hints at a way to take a generic component, fit it to an arbitrary network, and then get the network to cough up some useful information about its state – if extended, it could be a handy tool for AI diagnosticians.

Self-Normalizing Neural Networks cause a stir: A paper from researchers at the Institute of Bioinformatics in Austria proposes a way to improve feed-forward neural network performance with a new approach: self-normalizing neural networks. “FNNs are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations,” they write.
…The paper is thorough and is accompanied by a code release, aiding rapid replication and experimentation by others. The researchers carry out exhaustive experiments, benchmarking their approach (based around the SELU, a scaled exponential linear unit) against a movable feast of other AI approaches, ranging from Residual Networks to Highway Networks to weights with Batch Normalization or Layer Normalization, and more.
…They test the method exhaustively as well. “We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set,” they write.
Notable fact: One of the authors is Sepp Hochreiter, who invented (along with Juergen Schmidhuber) the tremendously influential Long Short-Term Memory network, aka the LSTM. LSTMs are used extensively in AI these days for tasks ranging from object detection to speech recognition, and the paper has over 4,500 citations (growing with the massive influx of new AI research into memory networks, differentiable neural computers, Neural Turing Machines, and so on).
…The Self-Normalizing Neural Networks paper is remarkably thorough, weighing in at an eyebrow-raising 102 pages: the research paper proper takes up 9 pages, with the rest devoted to comprehensive theoretical analysis, experiments, and – of course – references to back it up. More of this European precision, please!

Open publishing (Arxiv) versus slow publishing (conferences and journals):
The Hochreiter paper highlights some of the benefits of the frenetic attention that publishing on Arxiv can bestow, along with/instead of traditional (relatively slow-burning) conferences and journals. I think the trade-off between speed of dissemination and lack of peer review is ultimately worthwhile, though some disagree.
…Yoav Goldberg, a researcher who has done work at the intersection of NLP and Deep Learning, writes that Arxiv can also lead to people having an incentive to publish initial versions of papers that are thin, not very detailed, and that serve more as flag-planting symbols for an expected scientific breakthrough than anything else. These are all legitimate points.
…Facebook AI researcher Yann LeCun weighed in and (in a lengthy, hard-to-link-to note on Facebook) says that the open publishing process allows for rapid dissemination of ideas and experimentation, free of the pressure to publish papers at a conference. As of the time of writing, the nascent AI blogosphere continues to be roiled by this drama, so I’m sure this boulder will continue to roll.
…(For disclosure: I side more toward favoring the Arxiv approach and think that ultimately bad papers and bad behavior get weeded out by the community over time. It’s rare that people get away with a con. Deep learning has been in hyper-growth mode since the 2012 AlexNet paper, so it’s natural things are a bit fast-moving and chaotic right now. Things may iron themselves out over time.)

Compute as AI’s strategic fulcrum: the AI community is getting much better at training big neural network models. The latest case in point comes from Facebook, which has outlined a new technique for training large-scale image classifiers.
…Time to train ImageNet in 2012: a week or two on a single GPU, with loads of custom CUDA programming required.
…Time to train ImageNet in 2017: one hour across 256 GPUs, with vastly improved and simplified software and hardware.
…Although, as someone commented on Twitter, most people don’t easily have access to 256 GPUs.
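A key trick behind training at this scale is the linear learning-rate scaling rule with gradual warmup described in Facebook's paper: when the minibatch grows by a factor of k, multiply the learning rate by k, and ramp up to that rate over the first few epochs to keep early training stable. A minimal sketch, using the paper's reference settings (base rate 0.1 for a 256-image minibatch, 5 warmup epochs) but with the scheduling function itself being my own illustrative construction:

```python
BASE_LR = 0.1        # reference learning rate for a 256-image minibatch
BASE_BATCH = 256
WARMUP_EPOCHS = 5

def learning_rate(epoch, batch_size):
    """Linear scaling rule plus gradual linear warmup."""
    target_lr = BASE_LR * (batch_size / BASE_BATCH)  # scale LR with batch size
    if epoch < WARMUP_EPOCHS:
        # ramp linearly from BASE_LR up to target_lr during warmup
        return BASE_LR + (target_lr - BASE_LR) * (epoch / WARMUP_EPOCHS)
    return target_lr
```

So an 8192-image minibatch (32x the reference) ends up at a learning rate of 3.2 after warmup, instead of jumping there on the first step.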

Better classifiers through combination: DING DONG! DING DONG! When you read those four words there’s a decent chance you also visualized a big clock or imagined some sonic representation of a clock chiming. Human memory seems to work like this, with a sensory experience of one entity inviting in a bunch of different, complementary representations. Some believe it’s this fusion of senses that gives us such powerful discriminative abilities.
…Wouldn’t it be nice to get similar effects in deep learning? From 2015 onwards people started experimenting en masse with getting computers to better understand images by training networks on paired sets of images and captions, creating perceptual AI systems with combined representations of entities. We’ve also seen people more recently experiment with training on audio and visual data together. Now, scientists from MIT have combined visual, audio, and text data in the same network.
…The data: almost a million images (COCO & Visual Genome), synchronized with either a textual description or an audio track (377 continuous days of audio data, pulled from over 750,000 Flickr videos).
…How it works: the researchers create three different networks to ingest text, audio, or picture data. The ensuing learned representations from all of these networks are outputted as fixed-length vectors with the same dimensionality, which are then fed into a network that is shared across all three input networks. “While the weights in the earlier layers are specific to their modality, the weights in the upper layers are shared across all modalities,” they write.
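The architecture shape – per-modality encoders that all emit vectors of the same dimensionality, feeding one shared upper network – can be sketched in toy form. This is a pure-Python illustration of the structure, not the MIT authors' actual networks; the encoders and weights below are hypothetical stand-ins for learned deep networks:

```python
EMBED_DIM = 4  # every modality must land in this shared space

def encode_text(words):
    # stand-in for a text network: hashed bag-of-words into a fixed vector
    v = [0.0] * EMBED_DIM
    for w in words:
        v[hash(w) % EMBED_DIM] += 1.0
    return v

def encode_audio(samples):
    # stand-in for an audio network: mean-pool the signal into EMBED_DIM chunks
    chunk = max(1, len(samples) // EMBED_DIM)
    return [sum(samples[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(EMBED_DIM)]

def shared_layer(v):
    # the shared upper layers: the same weights regardless of modality
    weights = [0.5, -0.25, 1.0, 0.75]
    return sum(w * x for w, x in zip(weights, v))

# Because both encoders emit EMBED_DIM-length vectors, one shared
# layer can consume either modality's representation.
text_score = shared_layer(encode_text(["dog", "barks"]))
audio_score = shared_layer(encode_audio([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]))
```

The design point is the interface: modality-specific lower layers are free to differ wildly, so long as they agree on the dimensionality of the vector they hand upward.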
Bonus: the combined system ends up having cells that activate in the presence of words, pictures, or sounds that correspond to specific types of object, like engines or dogs.

Bored with the state of your supply chain automation? Consider investing in an autonomous cargo boat – the new craze sweeping across commodities makers worldwide – as companies envisage a future where autonomous mines (semi-here) take commodities via autonomous trains (imminent) to autonomous ports (here) to the yet-to-be-built autonomous boats.

4K GAN FACES: A sight for blurry, distorted eyes. Mike Tyka has written about his experiments to use GANs to create large, high-resolution entirely synthetic faces.
…The results are quite remarkable, with the current images seeming as much a new kind of impressionism as realistic photographs (though only for sub-sections of any given image, and sometimes wrought with Dali-esque blotches and Bacon-esque flaws).
…”as usual I’m battling mode collapse and poor controllability of the results and a bunch of trickery is necessary to reduce the amount of artifacts,” he writes. G’luck, Mike!

You are not Google (and that’s okay): This article about judging which large-scale technology is worth your while and which is over-engineered for your needs is as relevant for AI researchers and engineers as it is for infrastructure people.
…Bonus: the invention of the delightfully German-sounding acronym UNPHAT.

What China thinks about when China thinks about AI: A good interview with Oregon State professor Tom Dietterich in China’s National Science Review. We’re entering an era where AI becomes a tool of geopolitics as countries flex their various strengths in the tech as part of wider national posturing. So it’s crucial that scientists stay connected with one another, talking about the issues that matter to them which transcend borders.
…Dietterich makes the point that modern AI is about as easy to debug as removing all the rats from a garbage dump. “Traditional software systems often contain bugs, but because software engineers can read the program code, they can design good tests to check that the software is working correctly. But the result of machine learning is a ‘black box’ system that accepts inputs and produces outputs but is difficult to inspect,” he says.
AI in China: “Chinese scientists (working both inside and outside China) are making huge contributions to the development of machine learning and AI technologies. China is a leader in deep learning for speech recognition and natural language translation, and I am expecting many more contributions from Chinese researchers as a result of the major investments of government and industry in AI research in China. I think the biggest obstacle to having higher impact is communication,” he says. “A related communication problem is that the internet connection between China and the rest of the world is often difficult to use. This makes it hard to have teleconferences or Skype meetings, and that often means that researchers in China are not included in international research projects.”

Building little pocket universes in PyTorch: This is a good tutorial for how to use PyTorch, an AI framework developed by Facebook, to build simple cellular automata grid worlds and train little AI agents in them.
…It’s great to see practical tutorials like this (along with the CycleGAN implementation & guide I pointed out last week) as it makes AI a bit less intimidating. Too many AI tutorials say stuff like “Simply install CUDA, CuDNN, configure TensorFlow, spin up a dev environment in Conda, then swap out a couple of the layers.” This is not helpful to a beginner, and people should remember to walk through all the seemingly-intuitive setup steps that go with any deep learning system.
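For a sense of how little machinery a grid world of this kind needs, here is a dependency-free sketch of a cellular-automaton update step (the tutorial itself builds its worlds in PyTorch; this is plain Python, using Conway's Game of Life rules on a small wrapping grid as the example automaton):

```python
def step(grid):
    """Advance a 2D grid of 0/1 cells one tick under Game of Life rules,
    with edges wrapping around (a toroidal grid world)."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # count the 8 neighbours, wrapping around the edges
            n = sum(grid[(y + dy) % h][(x + dx) % w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
            # live cell survives with 2-3 neighbours; dead cell
            # becomes live with exactly 3
            nxt[y][x] = 1 if (n == 3 or (n == 2 and grid[y][x])) else 0
    return nxt

# A "blinker" oscillates between vertical and horizontal with period 2.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0]]
```

Once you have a step function like this, dropping an agent into the grid and feeding it the local neighborhood as an observation is a small additional step.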
…Another great way to learn about AI is to compete in AI competitions. So it’s no surprise Google-owned Kaggle has passed one million members on its platform. Because Kaggle members use the platform to create algorithms and fiddle with datasets via Kaggle Kernels, it seems like as membership scales Kaggle’s usefulness will scale proportionally. Congrats, all!

Compete for DATA: CrowdFlower has launched AI For Everyone, a challenge that will see two groups every quarter through to 2018 compete to get access to free data on CrowdFlower’s eponymous platform.
…Winners get a free CrowdFlower AI subscription, a $25,000 credit towards paying for CrowdFlower contributors to annotate data, free CrowdFlower platform training and onboarding, and promotion of their results.

OpenAI Bits & Pieces:

Talking Machines – Learning to Cooperate, Compete, and Communicate: This is a follow-up to our previous work on getting AI agents to invent their own language. Here we combine this ability with the ability to train multiple agents together with conflicting goals. Come for the science, stay for the amusing GIFs of spheres playing (smart!) tag with one another. Work by OpenAI interns Jean Harb and Ryan Lowe from McGill University, plus others.

Better exploration in deep learning domains: New research: UCB and InfoGain Exploration via Q-Ensembles & Parameter Space Noise for Exploration.

Tech Tales:

[ 2045: Outskirts of Death Valley, California. A man and a robot approach a low building, filled with large, dust-covered machines and little orange robot arms on moving pedestals that whizz around, autonomously tending to the place. One of them has the suggestion of a thatched hairpiece, made up of a feather-coated tumbleweed that has snared in one of its joints.]

You can’t be serious.
Alvin, it’ll be less than a day.
It’s undignified. I literally cured cancer.
You and a billion of your clones, sure.
Still me. I’m not happy about this.
I’m going to take you out now.
No photographs. If I sense a single one going onto the Internet I’m going to be very annoyed.
Sure, you say, then you unscrew the top of Alvin’s head.

Alvin is, despite its inflated sense of importance, very small. Maybe about half a palm’s worth of actual computer, plus a forearm’s worth of cabling, and a few peripheral cables and generic sensor units that can be bound up and squished together. Which is why you’re able to lift his head away from his body, unhook a couple of things, then carefully pull him out. Your own little personal, witheringly sarcastic, AI assistant.

Death Valley destroys most machines that go into it, rotting them away with the endless day/night flexing of metal in phase transitions, or searing them with sand and grit and sometimes crusted salt. But most machines don’t mind – they just break. Not Alvin. For highly sensitive, developed AIs of its class the experience is actively unpleasant. Heat leads to flexing in casing which leads to damage which leads to improper sensing which gets interpreted as something a vast group of scientists has said corresponds to the human term for pain. Various international laws prohibit the willful infliction of this sort of feeling on so-called Near Conscious Entities – a term that Alvin disagrees with.

So, unwilling to violate the law, here you are at Hertz-Rent-a-Body, transporting Alvin out of his finely-filigreed silver-city Android body, into something that looks like a tank. You squint. No, it’s actually a tank, re-purposed slightly; its turret sliced in half, its snout capped with a big, sensor dome, and the bumps on its front for storing smoke flares now contain some directional microphones. Aside from that it could have teleported out of a war in the previous century. You check the schematics and are assured fairly quickly that Death Valley won’t pose a threat to it.
…Ridiculous, says Alvin. So much waste.

You unplug the cable connecting Alvin to the suit’s speaker, and carry him over to the tank. The tank senses you, silently confirms the rental with your bodyphone, then the hatch on its roof sighs open and a robotic arm snakes out.
Welcome! Please let us accommodate your N.C.E codename A.L.V.I.N the arm-tank says, its speakers crackling. The turret shifts to point to the electronics in your hands.
Alvin, having no mouth due to not being wired up to a speaker, flashes its output OLEDs angrily, shimmering between red and green rapidly – a sign, you know from experience, of the creation and transmission of a range of insults, both understandable by conventional humans and some highly specific to machines.
Your N.C.E has a very large vocabulary. Impressive! chirps the tank.
The robot arm plucks Alvin delicately from your hands and retracts back into the tank. A minute passes and the tank whirs. A small green light turns on in the sensor dome on the tip of its turret. One of its speakers emits a brief electronic-static burp, then-
I am too large, says Alvin, through the tank. They want me to do tests in this thing.

Five minutes later and Alvin is trundling to and fro in the Hertz parking lot, navigating between five orange cones set down by another similarly-cheerful robotic arm on a movable mount. A couple more tasks pass – during one U-Turn Alvin makes the tank shuffle jerkily giving the appearance of a sulk – then the Hertz robot arm flashes green and says We have validated movement policies. Great driving! Please return to us within 24:00 hours for dis-internment!

Alvin trundles over to you and you climb up one of his treads, then hop onto the roof. You put your hand on the hatch to pull it open but it doesn’t move.
You’re not coming in.
Alvin, it’s 120 degrees.
I’m naked in here. It would make me uncomfortable.
Now you’re just being obtuse. You can turn off your sensors. You won’t notice me.
Are you ordering me?
No, I’m not ordering you, I’m asking you nicely.
I’m a tank, I don’t have to be nice.
That’s for sure.
I’ll drive in the shade. You won’t get too hot. Medically, you’re going to be fine.
Just drive, you say.
You start to trundle away into the desert. Have a good day! shouts the Hertz robotic arm from the parking lot. Alvin finds the one remaining smoke grenade in the tank and fires it into the air, back towards the body rental shop.
We have added the fine for smoke damage to your bill. Safe driving! crackles the robot arm through the distant haze.

Technologies that inspired this story: Human-computer interaction, surveys of AI experts about AI progress, the work of Dr. Heather Knight, robotics.