Mapping Babel

Import AI: Issue 45: StarCraft rumblings, resurrecting ancient cities with CycleGAN, and Microsoft’s imitation data release

Resurrecting ancient cities via CycleGAN: I ran some experiments this week where I used a CycleGAN implementation (from this awesome GitHub repo) to convert ancient hand-drawn city maps (Jerusalem, Babylon, London) into modern satellite views.
…What I found most surprising about this project was its relative ease – all it really took was a bit of data munging on my end, and the patience to train a Google Maps>Google Maps Satellite View network for about 45 hours. The base model generalized well – I figure it’s because the Google Maps overhead views have a lot of semantic similarity to the pen and brush strokes in city illustrations.
…I’m going to do a few more experiments and will report back here if any of it is particularly interesting. Personally, I find that one of the best ways to learn about anything is to play with it, aimlessly fiddling for the sheer fun of it, discovering little gems in unfamiliar ground. It’s awesome that modern AI is so approachable that this kind of thing is possible.
…Components used: PyTorch, a CycleGan implementation trained for 45 hours, several thousand map pictures, a GTX 1070, patience, Visdom.
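…For the curious, the cycle-consistency objective at the heart of CycleGAN can be sketched in a few lines of PyTorch. The tiny convolutional generators below are stand-ins (real CycleGAN generators are ResNet-based, and there are also adversarial losses per domain), and all names are mine, not the repo’s:

```python
import torch
import torch.nn as nn

# Toy stand-in generators: G maps domain A (drawn maps) -> domain B
# (satellite views), F maps B -> A.
G = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 3, 3, padding=1))
F = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 3, 3, padding=1))

l1 = nn.L1Loss()

def cycle_loss(real_a, real_b, lam=10.0):
    # Translate each image to the other domain and back again;
    # the round trip should reconstruct the original input.
    fake_b = G(real_a)
    fake_a = F(real_b)
    return lam * (l1(F(fake_b), real_a) + l1(G(fake_a), real_b))

a = torch.randn(1, 3, 64, 64)  # a batch of map-style images
b = torch.randn(1, 3, 64, 64)  # a batch of satellite-style images
loss = cycle_loss(a, b)
print(loss.shape)  # a scalar tensor
```

In the full system this loss is what lets you train with *unpaired* maps and satellite photos: no pixel-aligned map/satellite pairs are ever required.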

Learning from demonstrations: An exciting area of current reinforcement learning research is developing AI systems that learn to perform tasks from human demonstrations, rather than requiring a hand-tuned reward function. But gathering data for this at scale is difficult and expensive (just imagine if arcades were more popular and had subsidized prices in exchange for collecting your play data!). That’s why it’s great to see the release of The Atari Grand Challenge Dataset from researchers at Microsoft Research and RWTH Aachen University. The dataset consists of ~45 hours of playtime spread across five Atari games, including the notoriously hard-to-crack Montezuma’s Revenge.

AI’s gender disparity, visualized: AI Now co-founder Meredith Whittaker did a quick analysis of the names on papers accepted to ICML and found that men vastly outnumber women. Without knowing the underlying submission data it’s tricky to use this to argue for any inherent sexism in the paper selection process, but it is indicative of the gender disparity in AI – one of the many things the research community needs to fix as AI matures.

Embedding the un-embeddable: In Learning to Compute Word Embeddings On the Fly, researchers with MILA, DeepMind, and Jagiellonian University propose a system to easily learn word embeddings for extremely rare words. This is potentially useful: while deep learning approaches excel in environments containing a large amount of data, they tend to fail when dealing with small amounts of it.
…The approach works by training a neural network to predict the embedding of a word given a small amount of auxiliary data (such as a dictionary definition); multiple auxiliary sources can be combined for any given word. When dealing with a rare word the researchers fire up this network, feed it a few bits of data, and predict that embedding’s location within the main embedding space. This means you can develop your main set of embeddings by training in environments with large amounts of data and, whenever you encounter a rare word, use this system to predict an embedding for it, letting you get around the lack of data, though with some imprecision.
…The researchers evaluate their approach in three domains: question answering, entailment prediction, and language modelling, attaining competitive results in all three of these domains.
…”Learning end-to-end from auxiliary sources can be extremely data efficient when these sources represent compressed relevant information about the word, as dictionary definitions do. A related desirable aspect of our approach is that it may partially return the control over what a language processing system does into the hands of engineers or even users: when dissatisfied with the output, they may edit or add auxiliary information to the system to make it perform as desired,” they write.
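…A minimal sketch of the idea, with made-up words and a random (untrained) projection standing in for the learned definition-reading network:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # tiny embedding size for illustration

# Embeddings for common words, learned the usual way on a large corpus.
common = {"a": rng.normal(size=dim), "small": rng.normal(size=dim),
          "furry": rng.normal(size=dim), "rodent": rng.normal(size=dim)}

# A learned projection (random here, for the sketch) from the pooled
# definition representation into the main embedding space.
W = rng.normal(size=(dim, dim))

def predict_embedding(definition_words):
    # Mean-pool the embeddings of the definition's words, then project.
    pooled = np.mean([common[w] for w in definition_words], axis=0)
    return W @ pooled

# "capybara" is too rare to have its own trained vector, so we predict
# one on the fly from its dictionary definition.
vec = predict_embedding(["a", "small", "furry", "rodent"])
print(vec.shape)  # (4,)
```

The real system learns W (and fancier poolers) end-to-end with the downstream task, so the predicted vectors land where the main model expects them.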

Battle of the frameworks: CNTK matures: Microsoft has released version 2.0 of CNTK (the Microsoft Cognitive Toolkit), its AI development framework. New features include support for Keras, more Java language bindings, and tools for compressing trained models.

Stick this in your calendar, Zerg scum! The Call for Papers just went out for the Video Games and Machine Learning workshop at ICML in Australia this year. Confirmed speakers include people from Microsoft, DeepMind, Facebook, and others. Notable: someone from Blizzard will be giving a talk about StarCraft, a game around which the company has partnered with DeepMind to develop AI tools.
Related: Facebook just released V1.3-0 of TorchCraft, an open source framework for training AI systems to play StarCraft. The system now supports Python and has improved, separate data streams for feature training, such as maps for walkability, buildability, and ground height.

Ultra-cheap GPU substrates for AI development: Chip company NVIDIA has seen its stock almost triple in value over the last year as investors realized that its graphical processing units are the proverbial pickaxe of the current AI revolution. But in the future NVIDIA will likely have more competition (a good thing!) from a range of semiconductor startups (Graphcore, Wave, and others), established rivals (Intel via its Nervana and Altera acquisitions, AMD via its extremely late dedication to getting its GPUs to run AI software), and possibly from large consumer tech companies such as Google with its Tensor Processing Units (TPU).
…So if you’re NVIDIA, what do you do? Aside from working to design new GPUs around specific AI needs (see: Volta), you can also try to increase the number of GPU-enabled servers sold around the world. To that end, the company has partnered with so-called ODM companies Foxconn, Quanta, Inventec and Wistron. These companies are all basically intermediaries between component suppliers and massive end-users like Facebook/Microsoft/Google and so on, and are famed for designing powerful servers available at a low price (if bought in sufficiently high volumes).

The power of simplicity: What wins AI competitions – unique insight? A PhD? Vast amounts of experience? Those help, but probably the single most important thing is consistent experimentation, says Keras creator Francois Chollet, in a Quora answer discussing why Keras features in so many top Kaggle entries.
…”You don’t lose to people who are smarter than you, you lose to people who have iterated through more experiments than you did, refining their models a little bit each time. If you ranked teams on Kaggle by how many experiments they ran, I’m sure you would see a very strong correlation with the final competition leaderboard.”
…Even in AI, practice makes perfect.

Will the AI designers of the future be more like sculptors than programmers? AI seems to naturally lend itself to different forms of development than traditional programming. That’s because most of the neural network-based technologies that are currently the focus of much of AI research are inherently spatial: deep learning is a series of layered neural networks, whose spatial relationship is indicative of the functions the ultimate system approximates.
…Therefore, it’s interesting to look at the types of novel user interface design that augmented- and virtual-reality make possible and think of how it could be applied to AI. Check out this video by Behringer of their ‘DeepMind’ (no relation to the Go-playin’ Google sibling) system, then think about how it might be applied to AI.

CYBORG DRAGONFLY CYBORG DRAGONFLY CYBORG DRAGONFLY: I’m not kidding. A company named Draper has built a product called DragonflEye, which consists of a living dragonfly augmented with solar panels and with electronics that interface with its nervous system.
…The resulting system “uses optical electrodes to inject steering commands directly into the insect’s nervous system, which has been genetically tweaked to accept them. This means that the dragonfly can be controlled to fly where you want, without sacrificing the built-in flight skills that make insects the envy of all other robotic micro air vehicles,” according to IEEE Spectrum.

Are we there yet? Experts give thoughts on human-level AI and when it might arrive: How far away is truly powerful AI? When will AI be able to perform certain types of jobs? What are the implications of this sort of intelligence? Recently, a group of researchers decided to quiz the AI community on these questions. Results are outlined in When Will AI Exceed Human Performance? Evidence from AI Experts.
…The data contains responses from 352 researchers who had published at either NIPS or ICML in 2015, so keep the (relatively small) sample size in mind when evaluating the results.
…One interesting observation pulled from the abstract is that: “researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans.”
…The experts also generate a bunch of predictions for AI milestones, including:
…2022: AI can beat humans at StarCraft.
…2026: AI can write a decent high school level essay.
…2028: An AI system can beat a human at Go given the same amounts of training.
…2030: AI can completely replace a retail salesperson.
…2100: AI can completely automate the work of an AI researcher. (How convenient!)

Monthly Sponsor: Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Tech Tales:

[2024: An advertising agency in Shoreditch, East London. Three creatives stand around wearing architect-issue black turtlenecks and jeans. One of them fiddles with a tangle of electronic equipment, another inspects a VR headset, and the third holds up a pair of gloves with cables snaking between them and the headset and the other bundle of electronics. The intercom crackles, announcing the arrival of the graffiti artist, who lopes into the room a few seconds later. ]

James, so glad you could make it! Tea? Coffee?
Nah I’m okay, let’s just get started then shall we?
Okay. Ever used these before? says one of them, holding up the electronics-coated gloves.
No. Let me guess – virtual hands?

Five minutes later and James is wearing a headset, holding his gloved hands as though spray-painting. In his virtual reality view he’s standing in front of a giant, flawless brick wall. There’s a hundred tubs of paint in front of him and in his hand he holds a simulated spraycan that feels real because of force feedback in the gloves.

Funny to do this without worrying about the coppers, James says to himself, as he starts to paint. Silly creatives, he thinks. But the money is good.

It takes a week and by the end James is able to stare up at the virtual wall, gazing on a giant series of shimmering logos, graffiti cartoons, flashing tags, and other visual glyphs and phrases. Most of these have been daubed all across South London in one form or another over the last 20 years, snuck onto brick walls above train-station bridges, or slotted beneath window rims on large warehouses. Along with the paycheck they present him with a large, A0 laminated print-out of his work and even offer to frame it for him.

No need, he says, rolling up the poster.

He bends one of the tube ends as he slips an elastic band over it and one of the creatives winces.

I’ll frame it myself.

For the next month, the creatives work closely with a crew of AI engineers, researchers, roboticists, artists, and virtual reality experts, to train a set of industrial arms to mimic James’s movements as he made his paintings. The force feedback gloves he wore collected enough information for the robot arms to learn to use their own skeletal hand-like grippers to approximate his movements, and the footage from the other cameras that filmed him as he painted helps the robots adjust the rest of their movements. Another month goes by and, in a film lot in Los Angeles, James’s London graffiti starts to appear on walls, sprayed on by robot arms. Weeks later it appears in China, different parts combined and tweaked by generative AI algorithms, coating a fake version of East London in graffiti for Chinese tourists that only travel domestically. A year after that and James sees his graffiti covering the wall of a street in South Boston in a movie set there and uses his smartphone to take a photo of his simulated picture made real in a movie.

Caption: “Graffin up the movies now.”

Techniques that inspired this story: Industrial robots, time-contrastive networks, South East London (Lewisham / Ladywell / Brockley / New Cross), Tilt Brush.

OpenAI bits&pieces:

AlphaGo versus the real world: Andrej Karpathy has written a short post trying to outline what DeepMind’s AlphaGo system is capable of and what it may struggle with.

DeepRL bootcamp: Researchers from the University of California at Berkeley, OpenAI, and DeepMind are hosting a deep reinforcement learning workshop in late August in Berkeley. Apply here.

Import AI Issue 44: Constraints and intelligence, Apple’s alleged neural chip, and AlphaGo’s surprising efficiency

Constraints as the key to intelligence: Machine learning whiz & long-distance runner Neil Lawrence has published a research paper, Living Together: Mind and Machine Intelligence, that explores the idea that intelligence is intimately related to the constraints imposed on our ability to communicate.
…the gist of Neil’s argument is that intelligence can be distilled into a single number, which he calls an Embodiment Factor. This expresses the relationship between how much raw compute an intelligence can make use of at once and how much information about that computation it can communicate in the same time frame. Humans can throw a vast amount of compute at any given problem, but we can only communicate a couple of words a second at most.
…The way Neil Lawrence puts it is that a computer with a 10 Gigaflop processing capacity and a communication capacity of about 1 gigabit per second has an embodiment factor of 10 (computation / communication), versus a human brain which can handle about an exaflop of compute with a communication limit of about 100 bits per second – representing an astonishing embodiment factor of 10^16. It is this significant compression which leads to many of the useful properties in our own intelligence, he suggests.
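…The arithmetic is simple enough to check yourself; the numbers below are the ones from Lawrence’s own examples:

```python
def embodiment_factor(compute_flops, comms_bits_per_sec):
    """Lawrence's ratio of raw compute to communication bandwidth."""
    return compute_flops / comms_bits_per_sec

computer = embodiment_factor(10e9, 1e9)  # 10 GFLOPs vs ~1 Gbit/s
human = embodiment_factor(1e18, 100)     # ~1 exaflop vs ~100 bits/s
print(computer, human)  # 10.0 1e+16
```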
…(Open access note: Lawrence was originally going to publish this commentary through a traditional academic channel, but balked at paying fees and put it on Arxiv instead. Thanks, Neil!)

SelfieNightmareGAN: For a truly horrifying time I recommend viewing this experiment where artist Mario Klingemann uses CycleGAN to transpose doll faces onto Instagrammable-selfies.

G.AI.VC: Google has launched an investment arm specifically focused on artificial intelligence. It’s unusual for the company to focus on individual verticals and likely speaks to the immense enthusiasm Google feels for AI. The fund will make investments of between $1 million and $10 million, reports Axios’s Dan Primack.

Treasury secretary walks back AI skepticism: US Treasury Secretary Steve Mnuchin said a few months ago that problems related to AGI and AI-led automation were “50-100 years away” and these issues weren’t “on the radar screen” of federal government.
…He has changed his tune. Now, he says: “When I made the comment on artificial intelligence — and there’s different views on artificial intelligence — I was referring to kind of like R2D2 in Star Wars. Robotics are here. Self-driving cars are something that are gonna be here soon. I am fully aware of and agree that technology is changing and our workers do need to be prepared.”

iAI – Apple said to work on ‘neural chip’: Apple is developing a custom chip for its mobile devices specifically designed for inference tasks like speech and face recognition, according to Bloomberg. Other chipmakers such as Qualcomm have already taken steps in this direction. It’s likely that in the coming years we’ll see most chips get dedicated neural network bits of logic (basically matrix multiplication stuff with variable precision), given the general adoption of the technology – Nvidia is already designing certain GPU components specifically for AI-related tasks.

AI prizes, prizes everywhere! Real estate marketplace Zillow has teamed up with Google-owned Kaggle to offer a $1 million data science competition. The goal? Improve its ability to predict house prices. Submitted predictive models will be evaluated against real house prices over the first three months following the closure of the competition.
…if this sort of thing works then, in a pleasing Jorge Luis Borges-manner, the predictions of these services could feasibly become a micro-signal in actual home prices, and so the prediction and reality could compound on each other (infinitesimally, but you know the story about butterflies & storms.)
…Next up – using the same sort of competitive model to build the guts of a self-driving car: AI-teaching operation Udacity and wannabe-self-driving company Didi (a notable competitor to troubled Uber) have partnered to create a prize for the development of open-source self-driving car technology. Over 1,000 teams will compete for a $100,000 prize.
…The goal? “Automated Safety and Awareness Processing Stack (ASAPS), which identifies stationary and moving objects from a moving car, and uses data that includes Velodyne point cloud, radar objects, and camera image frames. Competitors are challenged to create a redundant, safe, and reliable system for detecting hazards that will increase driving safety for both manual and self-driving vehicles,” according to Udacity.

AlphaGo’s surprisingly efficient success: AlphaGo beat world champion Ke Jie 3-0 at The Future of Go Summit in China. During the second game Demis Hassabis, DeepMind’s founder, said AlphaGo evaluated many of Ke Jie’s moves to be “near perfect”. Still, Ke Jie resigned, as AlphaGo created a cautious, impenetrable defense…
…later, DeepMind revealed more details about the system behind AlphaGo. In its original incarnation AlphaGo was trained on tens of thousands of human games and used two neural networks to plan and evaluate moves, as well as Monte Carlo Tree Search to help with planning. Since earning the cover of Nature (via beating European Go expert Fan Hui) and then beating seasoned player Lee Sedol in Korea last year, DeepMind has restructured the system.
…the version of AlphaGo shown in China ran on a single TPU board – that’s a computer full of custom AI training & inference processors made by Google. It consumed a tenth of the computation at inference time compared to its previous incarnation, suggesting its underlying systems have become far more efficient – a mark both of earnest optimization by DeepMind’s engineers and of better underlying algorithms.
…But you might not be aware of this if you were trying to watch the games from within China – the state cut coverage of the event shortly after the first game began, for hard-to-discern political reasons.
…China versus the US in AI: While US and European investments in AI shrink or plateau, China’s government is ramping up spending as it tries to position the country to take advantage of the AI megatrend, partially in response to events like AlphaGo, reports The New York Times.

Could AI help healthcare? The longer you wait to treat an ailment, the more expensive the treatment becomes. That’s why AI systems could help bring down the cost of healthcare (whether for governments that run single-payer systems, or in the private sector). Many countries have spent years trying to digitize health records and, as those projects come to fruition, a vast hoard of data will become available for AI applications – and researchers are paying attention.
…“Many of us are now starting to turn our eyes to social value-added applications like health,” says AI pioneer Yoshua Bengio in this talk (video). “As we collect more data from millions and billions of people around the earth we’ll be able to provide medical advice to billions of people that don’t have access to it right now”.

Reading the airy tea leaves: AWS GPU spot price spike aligns with NIPS deadline: prices for renting top-of-the-range GPU servers on Amazon spiked to their highest level in the days before the NIPS deadline. That synced with stories of researchers hunting for GPUs both within companies and at cloud providers.
…The evidence, according to a tweet from Matroid founder Reza Zadeh: a dramatic rise in the cost to rent ‘p2.16xlarge’-GPU Instances on Amazon Web Services’s cloud:
…Baseline: $2 per hour.
…May 18th-19th (NIPS deadline): $144 per hour.
…Though remember, correlation may not be causation – there are other price spikes in late April that don’t seem to be correlated to AI events.

Imagining rules for better AI: When you or I try to accomplish tasks in our day we usually start with a strong set of biases about how we should go about completing the tasks. These can range from common sense beliefs (if you need to assemble and paint a fence, it’s a bad idea to paint the fence posts before you try to assemble them), to the use of large pre-learned rulesets to help us accomplish a task (cooking, or doing mathematics.)
…This is, funnily enough, how most computer software works: it’s a gigantic set of assumptions, squeezed into a user interface, and deployed on a computer. People get excited about AI because it needs fewer assumptions programmed into it to do useful work.
…But a little bit of bias is useful. For example, new research from the Georgia Institute of Technology and others shows how to use some priors fruitfully. In Game Engine Learning from Video (PDF) the authors build an AI system that plays a game while simultaneously trying to approximate the underlying program of the game engine, which it only sees through pixel inputs – aka what the player sees. It is given some priors – namely, that the program it is trying to construct contains game mechanics (e.g., if a player falls then the ground will stop them) and a game engine which governs the larger mechanics of the world. The researchers feed it example videos of the game being played, as well as the individual sprites used to build the game. The AI then tries to align sprites with specific facts or precepts, ranging from whether a sprite is animated, to how its spatial arrangement changes over time, whether it is related to any other sprites, its velocity, and so on. Next, it learns to scan over the games and align specific sprite actions with rules it derives, such as whether the sprite corresponding to Mario can move right if there is nothing in front of him. The system can focus on learning specific rules by rapidly paging through the stored play images that correspond to the relevant sprite actions.
…It uses a fusion of this sort of structured, supervised learning to iteratively learn how to play the game, reconstructing its inner functions and projecting forward based on its learned mechanistic understanding of the system. The authors show that this approach outperforms a convolutional neural network trained for next-frame prediction. (I’d want to see baselines for traditional reinforcement learning algorithms as well to be convinced further.)
…This approach has numerous drawbacks, chiefly the need for a human in the loop to load it up with carefully specified priors, but it hints at a future where our AI systems can be given slight biases and interpret the world according to them. Perhaps we could create a Manhattan Project for psychologists to enter numerous structured facts about human psychology, then feed them to AIs to see if they help the AIs predict our reactions, just as they predict the movement of a mushroom in Super Mario.
…Components used: OpenCV, Infinite Mario
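…A toy illustration of the rule-checking loop described above (the names, data structures, and candidate rule are mine, not the paper’s): a proposed rule survives only if it explains every stored frame transition.

```python
# Sprite positions observed in consecutive frames of play footage.
frames = [
    {"mario": (0, 0), "block": (5, 0)},
    {"mario": (1, 0), "block": (5, 0)},
    {"mario": (2, 0), "block": (5, 0)},
]

def rule_moves_right_unless_blocked(prev, curr):
    # Candidate rule: Mario advances one cell rightward each frame,
    # unless a block occupies the cell in front of him.
    mx, my = prev["mario"]
    blocked = prev["block"] == (mx + 1, my)
    expected = (mx, my) if blocked else (mx + 1, my)
    return curr["mario"] == expected

# Page through every stored transition and keep the rule only if it
# is consistent with all of them.
holds = all(rule_moves_right_unless_blocked(a, b)
            for a, b in zip(frames, frames[1:]))
print(holds)  # True
```

The real system generates and tests many such candidate rules over sprite facts (animation, position, velocity, relationships), discarding those contradicted by the footage.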

Pix2code: seeing the code within the web page: at some point, we’re going to want our computers to be able to do most programming for us. But how do you get computers to figure out how to program stuff that you don’t have access to the source for?
…In pix2code, startup UIzard creates a system that lets a computer look at a screenshot of a web page and figure out how to generate the underlying code that would produce that page. The approach can generate code for iOS and Android operating systems, with an accuracy of 77% – in other words, it gets the underlying code right about three times out of four.
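…If that accuracy is measured per token of the generated markup (an assumption on my part; the DSL tokens below are invented for illustration), it might be computed like this:

```python
def token_accuracy(predicted, reference):
    """Fraction of positions where generated tokens match the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / max(len(reference), 1)

# Invented UI-description tokens, standing in for pix2code's DSL.
ref  = ["stack", "{", "row", "{", "btn", "}", "}"]
pred = ["stack", "{", "row", "{", "label", "}", "}"]
print(round(token_accuracy(pred, ref), 3))  # 0.857
```

Note that a single wrong token (here `label` for `btn`) can still yield code that compiles but renders the wrong widget, so token accuracy is a forgiving metric.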

OpenAI bits&pieces:

OpenAI Baselines: release of a well-tuned implementation of DeepMind’s DQN algorithm, plus three of its variants. Bonus: raw code, trained models, and a handy tips and tricks compendium for training and debugging AI algorithms. There will be more.

Tech Tales:

2025: Russia deploys the first batch of Shackletons across its thinly-populated Eastern flanks. The mission is two-fold: data gathering, and experimental research into robotics and AI. It drops them out of cargo planes in the night, hundreds of them falling onto the steppes of Siberia, their descent calmed by emergency-orange parachutes.

Generation One could traverse land, survive extremely low temperatures, swim poorly (float with directional intent, one officer wrote in a journal), and consistently gather and broadcast data. The Shackletons beamed footage of frozen lakes and bare horizon-stretching foxes back to TV and computer screens around the world and people responded, making art from the data generated by Russia’s remote parts. The robots themselves became celebrities and, though their locations were unknown, sometimes roving hunters, scavengers, and civil servants would find them out there in the wastes and take selfies. One iconic photo saw a bearded Russian soldier with his arm slung over the moss-mottled back of an ageing Shackleton. He had placed a pair of military-issue dark glasses on one of the front sensor bulges, giving the machine a look of comedic detachment.
“Metallicheskaya krysa”, the Russians affectionately called them – metal rats.

2026: Within a year, the Shackletons were generating petabytes of raw data every day, ranging from audio and visual logs, to more subtle datapoints – local pollen counts, insect colonies, methane levels, frequency of bubbles exploding from gas escaping permafrost, and so on. Each Shackleton had a simple goal: gather and analyze as much data as possible. Each one was capable of exploring its own environment and the associated data it received. But the twist was that the Shackletons were able to identify potentially interesting data points they hadn’t been pre-loaded with. One day one of the machines started reporting a number that scientists found correlated to a nearby population of foxes. Another day another machine started to output a stream of digits that suggested a kind of slow susurration across a number line, and the scientists eventually realized this data corresponded to the water levels of a nearby river. As the years passed the Shackletons became more and more astute, and the data they provided was sucked up by the global economy, going on to fuel NGO studies, determine government investment decisions and, inevitably, give various nebulous financial entities an edge in the ever-more competitive stock markets. Russia selectively declassified more and more components of the machines, spinning them off into state-backed companies, which grew to do business across the world.

2029: Eventually, the Shackletons became tools of war – but not in the way people might expect. In 2029 the UN started to drop batches of improved Shackletons into contested borders and other flashpoints around the world – the mountains of east Afghanistan, jungles in South America, even, eventually, the Demilitarized Zone between South and North Korea. At first, locals would try to sabotage the Shackletons, but over time this ebbed. That was because the UN mandated that the software of the Shackletons be open and verifiable – all changes to the global Shackleton operating system were encoded in an auditable system based on blockchain technologies. They also mandated that the data the Shackletons generated be made completely open. Suddenly, militaries around the world were deluged in rich, real-world data about the locations of their foes – and their foes gained the same data in kind. Conflict ebbed, never quite disappearing, but seeming to decline to a lower level than before.

Some say the deployment of the Shackletons can be correlated with this decline in violence around the world. The theory is that war hinges on surprise, and all the Shackletons do is turn the unknown parts of the world into the known. It’s hard to be in a Prisoner’s Dilemma when everyone has correct information.

Technologies that inspired this story: Ethereum / Bitcoin, unsupervised auxiliary goal identification, Boston Dynamics, hierarchical temporal memory

Import AI: Issue 43: Why curiosity improves AI algorithms, what follows ImageNet, and the cost of AI hardware


ImageNet is dead, long live WebVision: ImageNet was a dataset and associated competition that helped start the deep learning revolution by being the venue where, in 2012, a team of researchers convincingly demonstrated the power of deep neural networks. But now it’s being killed off – this year will be the last official ImageNet challenge. That’s appropriate: last year’s error rate on the overall dataset was about 2.8 percent, suggesting that our current systems have exhausted much of ImageNet’s interesting challenge and may even be in danger of overfitting to it.
…What comes next? One potential candidate is WebVision, a dataset and associated competition from researchers at ETH Zurich, CMU, and Google, that uses the same 1000 categories as the ImageNet competition in 2012 across 2.4 million modern images and metadata taken directly from the web (1 million from Google Image Search and 1.4 million from Flickr.)
…Along with providing some degree of continuity in terms of being able to analyze image recognition progress, this dataset also has the advantage of being partially crappy, due to being culled from the web. It’s always better to test AI algorithms on the noisy real world.
…”Since the image results can be noisy, the training images may contain significant outliers, which is one of the important research issues when utilizing web data,” write the researchers.
…More information: WebVision Challenge: Visual Learning and Understanding With Web Data.

Making self-driving cars a science: The field of self-driving car development lacks the open publication conventions of the rest of AI research, despite using and extending various cutting-edge AI techniques. That’s probably because of the seemingly vast commercial value of self-driving cars. But it raises a question: how can people make development more scientific, improving the efficiency of the industry while benefiting society through open science?
…AI meme-progenitor and former self-driving startup intern Eder Santana has written up a shopping list of things that, if fulfilled, would improve the science of self-driving startups. It’s a good start at a tough problem.
…I wonder if smaller companies might band together to enact some of these techniques – with higher levels of openness than titans like Uber and Google and Tesla and Ford etc – and use that to collaboratively pool research to let them compete? After all, the same philosophy already seems present in Berkeley DeepDrive, an initiative whereby a bunch of big automakers fund open AI research in areas relevant to their development.
The next step is shared data. I’m curious if Uber’s recent hire, Raquel Urtasun, will continue her work on the KITTI self-driving car dataset which she created and Eder lists as a good example.

AI ain’t cheap: Last week, GPUs across the world were being rented by researchers racing to perform final experiments for NIPS. This wasn’t cheap. Despite many organizations (including OpenAI) trying to make it easier for more researchers to experiment with and extend AI, the costs of raw compute remain quite high. (And because AI is mostly an experimental, empirical science, you can expect to have to shell out for many experiments. Some deep-pocketed companies, like Google, are trying to offset this by giving researchers free access to resources, most recently 1,000 of its Tensor Processing Units in a dedicated research cloud, but giveaways don’t seem sustainable in the long run.)
…”We just blew $5k of google cloud credits in a week, and managed only 4 complete training runs of Inception / Imagenet. This was for one conference paper submission. Having a situation where academia can’t do research that is relevant to Google (or Facebook, or Microsoft) is really bad from a long-term perspective”, wrote Hacker News user dgacmu.
A new method of evaluating AI we can all get behind: Over on the Amazon Web Services blog a company outlines various ways of training a natural language classification system, listing the cost not just in computation but in dollars – what it costs to rent the necessary CPU and GPU resources on AWS. These sorts of numbers are helpful for putting into perspective how much AI costs and, more importantly, how long it takes to do things that the media (yours included) makes sound simple.

How to build an AI business, from A16Z: VC firm Andreessen Horowitz has created the AI Playbook, a microsite to help people figure out how AI works and how to embed it into their business.
…Bonus: it includes links to the thing every AI person secretly (and not so secretly) lusts after: DATA.
…Though AI research has been proceeding at a fairly rapid clip, this kind of project hints at the fact that commercialization of it has been uneven. That’s partly due to a general skills deficit in AI across the tech industry and also because in many ways it’s not exactly clear how you can use AI – especially the currently on-trend strain of deep neural networks – in a business. Most real-world data requires a series of difficult transforms before it can be strained through a machine learning algorithm and figuring out the right questions to ask is its own science.

E-GADs: Entertaining Generative Adversarial Doodles! Google has released a dataset of 50 million drawings across 345 distinct categories, providing artists and other fiddlers with a dataset to experiment with new kinds of AI-led aesthetics.
…This is the dataset that supported David Ha’s fun SketchRNN project, whose code is already available.
… It may also be useful for learning representations of real objects – I’d find it fun to try to train doodles with real image counterparts in a semi-supervised way, then be able to transform new real world pictures into cute doodles. Perhaps generative adversarial networks are a good candidate? I must have cause to use the above bolded acronym – you all have my contact details.

Putting words in someone else’s mouth – literally: fake news is going to get even better based on new techniques for getting computers to synthesize realistic looking images and videos of people.
…in the latest research paper in this area a team of researchers at the University of Oxford have produced ‘speech2vid’, a technique to get computers to be able to take a single still image of a person and an audio track and synthesize an animated version of that person’s face saying those words.
…The effects are still somewhat crude – check out the blurred, faintly comic-book-like textures in the clips in this video. But they hint at a future where it’s possible to create compelling propaganda using relatively little data. AI doppelgangers won’t just be for celebrities and politicians and other people who have generated vast amounts of data to be trained on, but will be made out of normal, data-lite people like you or me or everyone we know.
…More information in the research paper ‘You said that?’

The curious incident of the curiosity exploration technique inside the learning algorithm: how can we train AI systems to explore the world around them in the absence of an obvious reward? That’s a question that AI researchers have been pondering for some time, given that in real life rewards (marriage, promotions, finally losing weight after seemingly interminable months of exercise) tend to be relatively sparse.
…One idea is to reward agents for being curious, because curious people tend to stumble on new things which can help expand and deepen their perception of the world. Children, for instance, spend most of their time curiously exploring the world around them without specific goals in mind and use this to help them understand it.
…The problem for AI algorithms is figuring out how to get them to learn to be curious in a way that leads to them learning useful stuff. One way could be to reward the visual novelty of a scene – eg, if I’m seeing something I haven’t seen before, then I’m probably exploring stuff usefully. Unfortunately, this is full of pitfalls – show a neural network the static on an untuned television and every frame will be novel, but not useful.
…So researchers at the University of California at Berkeley have come up with a technique to do useful exploration, outlined in Curiosity-driven Exploration by Self-supervised Prediction. It works like this: “instead of making predictions in the raw sensory space (e.g. pixels), we transform the sensory input into a feature space where only the information relevant to the action performed by the agent is represented.”
…What this means is that the agent learns to be curious by taking actions in the world: when an action changes the world, the agent can figure out how the action corresponded to that change, and it is rewarded for seeking out transitions it cannot yet predict.
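Stripped of the networks, the reward signal described above reduces to a prediction error measured in feature space. A toy sketch (function names and the `eta` scaling constant are illustrative, not from the paper, and the learned feature encoder and forward model are stubbed out as plain vectors):

```python
# Minimal sketch of a curiosity bonus: a forward model predicts the
# *features* of the next state, and the intrinsic reward is the model's
# prediction error in that feature space rather than in raw pixels.
def intrinsic_reward(phi_pred, phi_next, eta=0.5):
    # Squared error between predicted and actual next-state features,
    # scaled by an illustrative constant eta.
    return eta * sum((p - q) ** 2 for p, q in zip(phi_pred, phi_next))

phi = [1.0, 1.0, 1.0, 1.0]                    # actual next-state features
familiar = intrinsic_reward([1.01] * 4, phi)  # well-modeled transition
novel = intrinsic_reward([2.0] * 4, phi)      # poorly-modeled transition
assert novel > familiar  # the agent is nudged toward what it can't predict
```

Because the features only encode action-relevant information, the TV-static failure mode goes away: static is unpredictable in pixel space but carries no action-relevant structure, so it yields little reward here.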
…So, how well does it work? The researchers test out the approach on two environments – Super Mario and Vizdoom. They find that it’s able to attain higher scores in a faster time than other methods, and can deal with increasingly sparse rewards.
…The most tantalizing part of the result? “An agent trained with no extrinsic rewards was able to learn to navigate corridors, walk between rooms and explore many rooms in the 3-D Doom environment. On many occasions the agent traversed the entire map and reached rooms that were farthest away from the room it was initialized in. Given that the episode terminates in 2100 steps and farthest rooms are over 250 steps away (for an optimally-moving agent), this result is quite remarkable, demonstrating that it is possible to learn useful skills without the requirement of any external supervision of rewards.”
…The approach has echoes of a recent paper from DeepMind outlining a reinforcement learning agent called UNREAL. This system was a composite of different neural network components; it used a smart memory-replay system to let it figure out how actions it had taken in the environment corresponded to rewards, and was also able to use it to figure out how its actions corresponded to unspecified intermediate rewards that helped it gain an actual one (for example, though it was rewarded for moving itself to the same location as a delicious hovering apple, it subsequently figured out that to attain this reward it should achieve an intermediary reward which it creates and focuses on itself). It learned this by figuring out how its actions affected its observation of the world and adjusting accordingly.
…(Curiosity-driven exploration and related fields like intrinsic motivation are quite mature, well-studied areas of AI, so if you want to trawl through the valuable context I recommend reading papers cited in the above research.)

Import AI reader comment of the week: Ian Goodfellow wrote in to quibble with my write-up of a recent paper about how snapshots of the same network at different points in time can be combined to form an ensemble model. The point of contention is whether these snapshots represent different local minima:
…”Local minima are basically the kraken of deep learning. Early explorers were afraid of encountering them, but they don’t seem to actually happen in practice,” he writes. “What’s going on is more likely that each snapshot of the network is in a different location, but those locations probably aren’t minima. They’re like snapshots of a person driving a car trying to get to a specific point in a really confusing city. The driver keeps circling around their destination but can’t quite get to it because of one way street signs and their friend keeps texting them telling them to park in a different place. They’re always moving, never trapped, and they’re never in quite the right place, but if you average out all their locations the average is very near where they’re trying to go.”
…Thanks, Ian!

Help deal with the NIPS-A-GEDDON: This week, AI papers are going to start flooding onto Arxiv from submissions to NIPS, and some other AI conferences. Would people like to help rapidly evaluate the papers, noting interesting things? We tried a similar experiment a few weeks ago and it worked quite well. We used a combination of a form and a Google Doc to rapidly analyze papers. Would love suggestions from people on whether this format [GDoc] is helpful (I know it’s ugly as sin, so suggestions welcome here.)
…if you have any other thoughts for how to structure this project or make it better, then do let me know.

OpenAI bits&pieces:

It was a robot-fueled week at OpenAI. First, we launched a new software package called Roboschool, open-source software for robot simulation, integrated with OpenAI Gym. We also outlined a robotics system that lets us efficiently learn to reproduce behaviors from single demonstrations.

CrowdFlower founder and OpenAI intern chats about the importance of AI on this podcast with Sam Lessin, and why he thinks computers are eventually going to exceed humans at many (if not all!) capabilities.

Tech tales:

[2018: The San Francisco Bay Area, two people in two distant shared houses, conversing via their phones.]

Are you okay?
I’ve been better. You?
Things aren’t going well.
Anything I can do?
Fancy drinks?
Sure, when?
Wednesday at 930?
Sounds good!

You put your phone down and, however many miles away, so does the other person. Neither of you typed a word of that, instead you both just kept on thumbing the automatically suggested messages until you scheduled the drinks.

It’s true, the both of you are having trouble at the moment. Your system was smart enough to make the suggestions based on studying your other emails and the rhythms of the hundreds of millions of other users. When you eventually go and get drinks the GPS in your phones tracks you both, records the meeting – anonymously, only signalling to the AI algorithms that this kind of social interaction produced a Real In-Person Correspondence.

Understanding what leads to a person meeting up with another, and what conversational rhythms or prompts are crucial to ensuring this occurs, is a matter of corporate life and death for the companies pushing these services. We know when you’re sad, is the implication. So perhaps you should consider $drinks, or $a_contemporary_lifestyle_remedy, or $sharing_more_earlier.

You know you’re feeding them, these machines that live in football-field-sized warehouses, tended by a hundred computer-mechanics who cannot know what the machines are really thinking. No person truly knows what these machines are relating to; only the AI at the heart of the companies does – and we don’t know how to ask it questions.

Technologies that inspired this story: sequence-to-sequence learning, Alexa/Siri/Cortana/Google, phones, differential privacy, federated learning.

Import AI: Newsletter 42: Ensemble learning, the paradoxical nature of AI research, and Facebook’s CNN-for-RNN substitution

‘Mo ensembles, ‘No problems: new research shows how to get the benefits of grouping a bunch of neural networks together (known as an ensemble), without having to go to the trouble of training each of them individually. The technique is outlined in Snapshot Ensembles: Train 1, Get M For Free.
…it’s surprisingly simple and intuitive. The way neural networks are trained today can be thought of as like rolling a ball down a fairly uneven hill – the goal is to get the ball to the lowest possible point of the hill. But the hill is uneven, so it’s fairly easy for the ball to get trapped in a local low-elevation point in the hill and stay there. In AI land, this point is called a ‘local minimum’ – it’s bad to get stuck in a local minimum.
…Most tricks in AI training involve getting the model to visit way more locations during training and thereby avoid a sub-optimal local minimum – ideally you want the ball to find the lowest point in the hill, even if it runs into numerous depressions along the way.
…the presented technique shows how to record a snapshot of the network at each local minimum it visits along the way during training. Then, once you finish training, you combine the snapshots into a single ensemble by averaging their predictions at test time.
…Results: the approach works, with the authors reporting that this technique yields more effective systems on tasks like image classification, while not costing too much more in the way of training.
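The trick that makes the snapshots land in different minima is a cyclic learning rate: the rate repeatedly anneals toward zero (the network settles, a snapshot is saved) and then jumps back up (the network escapes to a different region of the loss surface). A toy sketch, with illustrative constants and function names:

```python
import math

def snapshot_lr(step, total_steps, n_snapshots, lr_max=0.1):
    # Cosine-shaped decay within each of n_snapshots cycles: starts at
    # lr_max, decays toward ~0, then restarts at the next cycle.
    cycle_len = total_steps // n_snapshots
    t = step % cycle_len
    return (lr_max / 2.0) * (math.cos(math.pi * t / cycle_len) + 1.0)

def ensemble_predict(snapshot_probs):
    # At test time, average the class probabilities predicted by the
    # M saved snapshots.
    n = len(snapshot_probs)
    return [sum(p[i] for p in snapshot_probs) / n
            for i in range(len(snapshot_probs[0]))]

assert snapshot_lr(0, 1000, 5) == 0.1    # each cycle starts at lr_max...
assert snapshot_lr(199, 1000, 5) < 1e-3  # ...and decays to near zero
```

So for the price of one training run you get M models whose averaged predictions behave like a conventional (M-times-more-expensive) ensemble.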

Voice data – who speaks to whose speakers?: if data is the fuel for AI, then Amazon looks like it’s well positioned to haul in a trove of voice data, according to eMarketer.
…Amazon’s share of the US home chit-chat speaker market in 2017: ~70.6%
…Google’s: 23.8%
…Others: 5.6%

A/S/E? Startup researchers show off end-to-end age, sex, and emotion recognition system: AI is moving into an era dominated by composite systems, which see researchers building complex, interlinked software to perform multiple categorizations (and sometimes actions) within the same structure…
… in this example, researchers from startup Sighthound have developed DAGER: deep age, gender, and emotion recognition using convolutional neural networks. DAGER can guess someone’s age, sex, and emotion from a single face-on photograph. The training ingredients for this include 4 million images of over 40,000 distinct identities…
… It apparently has a lower mean absolute error than systems outlined by Microsoft and others.
… Good news: The researchers sought to offset some of the (sadly inevitable) biases in their datasets by adding “tens of thousands of images of different ethnicities as well as age groups”. It’s nice that people are acknowledging these issues and trying to get ahead of them.

Uber hires Raquel Urtasun: Self-driving car company Uber has hired Raquel Urtasun, a well-respected researcher with the University of Toronto, to help lead its artificial intelligence efforts.
…Urtasun’s group had earlier created KITTI, a free and open dataset used to benchmark computer vision systems against problems that self-driving cars encounter. Researchers have already used the dataset to train vision models entirely in simulation using KITTI data, then transfer them into the real world.
…meanwhile Lyft and Google (technically, Waymo) have confirmed that they’ve embarked on a non-exclusive collaboration to work together on self-driving cars.

Cisco snaps up speech recognition system with MindMeld acquisition: Cisco has acquired voice recognition startup MindMeld for around $125 million. The startup had made voice and conversation interface technologies, which had been used by commercial companies such as Home Depot, and others.

Government + secrecy + AI = fatal error, system override: Last week, hundreds of thousands of computers across the world were compromised by a virulent strain of malware, spread via a zero-day flaw that, Microsoft says in this eyebrow raising blogpost, was originally developed by the NSA.
…today, governments stockpile computer security vulnerabilities, using them strategically against foes (and sometimes ‘friends’). But as our digital systems become ever more interlinked, the risk of one of these exploits falling into the wrong hands increases, as do the effects when it does.
…we’re still a few years away (I think) from governments classifying and stockpiling AI exploits, but I’m fairly sure that in the future we could imagine governments developing certain exploits, say a new class of adversarial examples, and not disclosing their particulars, instead keeping them private to be used against a foe.
…just as Microsoft advocates for what it calls a Digital Geneva Convention, it may make sense for AI companies to agree upon a similar set of standards eventually, to prevent the weaponization and exploitation of AI.

Doing AI research is a little bit like being a road-laying machine, where to travel forward you must also create the ground beneath you. In research, what this translates to is that new algorithms typically need to be paired with new challenges. Very few AI systems today are robust enough to be plunked down in reality and do useful stuff. Instead, we try to get closer to being able to build these systems by inventing learning algorithms that exhibit increasing degrees of general applicability on increasingly diverse datasets. The main way to test this kind of general applicability is to create new ways to test such AI systems – that’s why the reinforcement learning community is evolving from just testing on Atari games to more sophisticated domains, like Go, or video games like StarCraft and Doom.
…the same is true of other domains beyond reinforcement learning: to build new language systems we need to assemble huge corpora of data and test algorithms on them – so over time the amounts of text we test on have grown larger. Similarly, in fields like question answering we’ve gone from simple toy datasets to more sophisticated trials (like Facebook’s bAbI corpus) to even more elaborate datasets.
…A new paper from DeepMind and the University of Oxford, Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, is a good example of this sort of hybrid approach to AI development. Here, the researchers try to tackle the task of solving simple algebraic word problems by not only inventing new algorithmic approaches, but doing so while generating new types of data. The resulting system can not only generate the answers, but also its rationale for the answer.
…size of the new dataset: over 100,000 word problems that include answers as well as natural language rationales.
…how successful is it? Typical AI approaches (which utilize sequence-to-sequence techniques) tend to have accuracies of about 20% on the task. This new system gets things right 36% of the time. Still a bad student, but a meaningful improvement.
A little bit of supervision goes a long way: Facebook and Stanford researchers are carrying out a somewhat similar line of enquiry but in a different domain. They’ve come up with a new system that can get state-of-the-art results on a dataset intended to test visual reasoning. The secret to their method? Training a neural network to invent its own small computer programs on the fly to answer questions about images it sees. You can find out more in ‘Inferring and Executing Programs for Visual Reasoning’. The most intriguing part? The resulting system is relatively data efficient, compared to fully supervised baselines, suggesting that it’s learning how to tackle the task in novel ways.
…it seems likely that in the future AI research may shift from involving generating new datasets alongside new algorithms, to generating new datasets, new algorithms, as well as new reasoning programs to aid with learning efficiency and interpretability.

Mujoco for free (restrictions apply): Physics simulator Mujoco will give students free licenses to its software, lowering the costs of doing AI research on modern, challenging problems, like those found in robotics.
…Due to the terms of the license, people will still need to stump up for a license for the proprietary software if they want to use AI systems trained within Mujoco in products.

Don’t read the words, look at them! (and get a 9X speedup): Facebook shows how to create a competitive translation system that is also around 9 times faster than previous state-of-the-art systems. The key? Instead of using a recurrent neural network to analyze the text, use a convolutional neural network.
…this is somewhat counterintuitive. RNNs are built to analyze and understand sequences, like strings of text or numbers. Convolutional neural networks are somewhat cruder and are mostly used as the basic perceptual component inside vision systems. How was Facebook able to manhandle a CNN into something with RNN-like characteristics? The answer is the usage of attention, which lets the network focus on particular words.
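Attention itself is simple enough to sketch in a few lines (toy vectors and function names, illustrative rather than Facebook’s actual model): the query scores each source position, and the output is a score-weighted blend of the source-word representations.

```python
import math

def softmax(xs):
    # Numerically stable softmax: turn raw scores into weights
    # that sum to 1.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    # Dot-product attention: score each source position against the
    # query, then return the score-weighted sum of the values.
    scores = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[i] for w, v in zip(scores, values))
            for i in range(len(values[0]))]

# The query matches the first key far better, so the context vector
# is dominated by the first value.
ctx = attend([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]], [[1.0], [0.0]])
assert ctx[0] > 0.99
```

Because this weighting is computed per output word, the CNN can recover the “look back at the relevant source word” behavior that sequences usually demand of an RNN, while keeping the CNN’s parallelism.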

Horror of the week: what happens when you ask a neural network to make a person smile, then feed it that new smile–augmented image and ask it to make the person smile even more, and then you take that image and feed it back to the network and ask the network to enhance its smile again? You wind up with something truly horrifying! Thanks, Gene Kogan.

Tech Tales:

[2040: the partially flooded Florida lowlands.]

The kids nickname it “Rocky the Robster” the first time they see it and you tell them “No, it’s called the Automated Ocean Awareness and Assessment Drone,” and they smile at you then say “Rocky is better.” And it is. But you wish they hadn’t named it.

Rocky is about the size of a carry-on luggage suitcase, and it does look, if you squint, a little like a metallic lobster. Two antennas extend from its front, and its undercarriage is coated in grippers and sampling devices and ingest and egress ports. In about two months it’ll go into the sea. If things work correctly, it will never come out, but will become another part of the ocean, endlessly swimming and surveilling and learning, periodically surfacing, whale-like, to beam information back to the scientists of the world.

But before it can start its life at sea, you need to teach it how to swim and how to make friends. Rocky comes with a full low-grade suite of AI software and, much like a newborn, it learns through a combination of imitation and experimentation. Imitation is where your kids come in. They come in and watch you in your studio as you, on all fours, walk across the room. Rocky imitates you poorly. The kids crawl across the room. Rocky imitates them a bit better. You figure that Rocky finds it easier to imitate their movements as they’re closer in size to it. Eventually, you and the kids teach the robot to swim as well, all splashing around in a pool in the backyard, with the robot tethered to prevent its enthusiastic attempts to learn to swim leading to it running into your kids.

Then Rocky’s AI systems start to top out – as planned. It can run and walk and scuttle and swim and even respond to some basic hand gestures, but though it still gambols around with a kind of naive enthusiasm, it stops developing new tics and traits. The sense of life in it dims as the kids become aware that Rocky is more drone than they thought.
“Why isn’t Rocky getting smarter anymore, Dad?” they say.
You try to explain that some things can’t get smarter.
“No, that’s the opposite of what you’ve always told us. We just need to try and we can learn anything. You say this all the time!”
“It’s not like that for Rocky,” you say.
“Why not?” they say. Then tears.

The night before Rocky is due to be collected by the field technicians who will make some final modifications to its hardware before sending it into the sea, you hear the creak on the stairwell. You don’t follow them or stop them, but instead turn on a webcam and look into your workshop, watching the door slowly ease open as the kids quietly break in. They sit down next to Rocky’s enclosure and talk to it. They show it pictures they’ve drawn of it. They motion for it to look at them. “Say it, Rocky,” you hear them say, “try to say ‘I want to stay here’”.

Having no vocal cords, it is unable. But as you watch your kids on the webcam you think that for a fraction of a second Rocky flexes its antennas, the two tops of each bowing in and touching each other, forming a heart before thrumming back into their normal position. “A statistical irregularity,” you say to your colleagues, never believing it.

Import AI Newsletter 41: The AI data grab, the value of simplicity, and a technique for automated gardening

Welcome to the era of the AI data grab: a Kaggle developer recently scraped 40,000 profile photos from dating app Tinder (20k from each gender) and placed the data horde online for other people to use to train AI systems. The dataset was downloaded over 300 times by the time TechCrunch wrote about it. Tinder later said the dataset violated the app’s Terms of Service (ToS) and it has now been taken down.
…AI’s immense hunger for data, combined with all the “free” data lying around on the internet, seems likely to lead to more situations like this. Could this eventually lead to the emergence of a new kind of data economy, where companies instinctively look for ways to sell and market their data for AI purposes, along with advertising?

Why simple approaches sometimes work best: Modern AI research is yielding a growing suite of relatively simple components that can be combined to solve hard problems. This is either encouraging (AI isn’t as hard as we thought – Yaaaay!) or potentially dispiriting (we have to hack together a bunch of simple solutions because our primate brains are struggling to visualize the N-dimensional game of chess that is consciousness – Noooo!).
…in Learning Features by Watching Objects Move, researchers with Facebook and the University of California at Berkeley figure out a new approach to get AI to learn how to automatically segment entities in a picture. Segmentation is a classic, hard problem in computer vision, requiring a machine to be able to, say, easily distinguish the yellow of a cat’s eyes from the yellow iodine of a streetlight behind it, or disentangle a zebra walking over a zebra crossing.
…the new technique works as follows: the researchers train a convolutional neural network to study short movie clips. They use optical flow estimation to disentangle the parts of the movie clip that are in the foreground and in motion from those that aren’t. They then use these to label each frame with segment information. Then they train a convolutional neural network to look at each frame and predict segments, using this data. The approach attains nine state-of-the-art results for object detection on the PASCAL VOC 2012 dataset.
…The researchers guess that this works so well because it forces the convolutional neural network to try to learn some quite abstract, high-level structures, as it would be difficult to perform this segmentation task by merely looking at pixels alone. They theorize that to effectively learn to predict when something is moving or not you need to understand how all the pixels in a given picture relate to each other, and use that to make judgements about what can move and what cannot.
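The labeling step above can be caricatured in a few lines, with a crude frame difference standing in for real optical-flow estimation (names and threshold are illustrative): pixels that change between consecutive frames become positive “moving foreground” pseudo-labels, which a segmentation network is then trained to predict from a single frame.

```python
def motion_mask(frame_a, frame_b, thresh=0.1):
    # Mark a pixel 1 ("moving foreground") if its intensity changes
    # by more than `thresh` between consecutive frames, else 0.
    # Real optical flow is far subtler; this is the cartoon version.
    return [[1 if abs(a - b) > thresh else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

frame1 = [[0.0, 0.0], [0.0, 0.9]]
frame2 = [[0.0, 0.0], [0.0, 0.1]]  # bottom-right pixel changed
assert motion_mask(frame1, frame2) == [[0, 0], [0, 1]]
```

The network trained on these masks only ever sees a single still frame, which is exactly why it is pushed to learn what *could* move rather than what *is* moving.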

Secret research to save us all: Researchers at Berkeley’s Machine Intelligence Research Institute are of the opinion that powerful AI may be (slightly) closer than we think, so will spend some of this year conducting new AI safety research and plan to keep this work “non-public-facing at least through late 2017, in order to lower the risk of marginally shortening AGI timelines”.

The freaky things that machine learning algorithms “see”: check out this video visualization of what an ACER policy thinks is salient (aka, important to pay attention to) when playing a game.

Automated gardeners: ‘Machine Vision System for 3D Plant Phenotyping’ shows how to use robotics and deep learning for automated plant analysis. The system works by building a little metal scaffold around a planter, then using a robot arm with a laser scanner to automate the continuous analysis of the plant. The researchers test it out on two plants, gathering precise data about the plants’ growth in response to varying lighting conditions. Eventually, this should let them automate experimentation across a wide variety of plants. However, when they try this on a conifer they run into difficulty because the sensor doesn’t have sufficient resolution to analyze the pine needles.
…oddly specific bonus fact: not all AI is open source – the robot growing chamber in the experiment runs off of Windows Embedded.
…fantastic name of the week: the robot arm was manufactured by Schunk Inc. Schunk!

Free code: Microsoft has made the code for its ‘Deformable Convnets’ research (covered in previous issue here) available as open source.
…Deformable Convolutions (research paper here) are a drop-in tool for neural networks to let you sample from a large and more disparate set of points over an image, potentially helping with more complex classification tasks.
…The code is written in MXNet, a framework backed by Amazon.

The great pivot continues: most large technology companies are reconfiguring themselves around AI. Google was (probably) the first company to make such a decision, and was swiftly followed by Microsoft, Facebook, Amazon, and others. Even conservative companies like Apple and IBM are trying to re-tool themselves in this way. It’s not just an American phenomenon – Baidu chief Robin Li said in an internal memo that Baidu’s strategic future relies on AI, according to this (translated) report.

Biology gets its own Arxiv… Cold Spring Harbor Laboratory and the Chan Zuckerberg Initiative are teaming up to expand bioRxiv – a preprint service for life sciences research. Arxiv, which is used by AI people, computer scientists, physicists, mathematicians, and others, has sped up the pace of AI research tremendously by short-circuiting the arbitrary publication timetables of traditional journals.

Neural network primitives for ARM (RISC) chips: ARM announced the public availability of the ARM Compute Library, software to give developers access to the low-level primitives they need to tune neural network performance on ARM CPUs and GPUS.
…The library supports neural network building blocks like convolution, soft-max, normalization, pooling, and so on, as well as ways to run support vector machines, general matrix multiplication, and so on.
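For intuition, here are unoptimized reference versions of two such building blocks (a 1-D convolution and max pooling, in toy form); the point of a library like this is to provide CPU/GPU-tuned kernels computing the same functions.

```python
def conv1d(xs, kernel):
    # Valid (no-padding) 1-D convolution: slide the kernel over the
    # input and take the dot product at each position.
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def max_pool_1d(xs, k):
    # Non-overlapping max pooling with window size k.
    return [max(xs[i:i + k]) for i in range(0, len(xs) - k + 1, k)]

assert conv1d([1, 2, 3, 4], [1, 0, -1]) == [-2, -2]
assert max_pool_1d([1, 3, 2, 5], 2) == [3, 5]
```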

What’s cooler than earning a million at Google? Getting bought by another tech company for 10 million!… that seems like the idea behind the current crop of self-driving car startups, which are typically founded by early employees of self-driving projects in academia or the private sector.
… the latest? DeepMap – a startup helmed by numerous Xooglers which focuses on building maps, and the intelligent data layers on top of them, to let self-driving cars work. “It’s very easy to make a prototype car that can make a few decisions around a few blocks, but it’s harder when you get out into the world,” said CEO James Wu.

AI means computer science becomes an empirical science: and more fantastic insights in this talk titled “As we may program” (video) by Google’s marvelously-attired Peter Norvig.
…Norvig claims that the unpredictability and emergent behavior endemic to machine learning approaches means computer science is becoming an empirical science where work is defined by experimentation as well as theory. This feels true to me – most AI researchers spend an inordinate amount of time studying various graphs that read out the state of networks as they’re training, and then use those graphs to help them mentally navigate the high-dimensional spaghetti-matrices of the resulting systems.
…This sort of empirical, experimental analysis is quite alienating to traditional developers, who would rather predict the performance of their tech prior to rolling it out. What we ultimately want is to package up advanced AI programming approaches within typical programming languages, making the obscure much more familiar, Norvig says.
…Here’s my attempt at what AI coding in the future might look like, based on Norvig’s speech:

things_im_looking_for = ['hiking shoes', 'bicycle', 'sunsets']
things_found = []
for picture in photo_album:
   pic_contents = picture.AI_Primitives.segment()
   for item in pic_contents:
      label = item.AI_Primitives.label()
      if label in things_im_looking_for:
         things_found.append(label)
… there are signs this sort of programming language is already being brewed up. Wolfram Language represents an early step in this direction. As does work by startup Bonsai – see this example on GitHub. (However, both of these systems are proprietary languages – it feels like future programming languages will contain these sorts of AI functions as open source plugins.)

Microsoft’s new head of AI research is… Eric Horvitz, who has long argued for the importance of AI safety and ethics, as this Quartz profile explains.

StreetView for the masses: Mapillary has released a dataset of photographs taken at the street level, providing makers of autonomous vehicles, drones, robots, and plain old AI experimenters with a new trove of data to play with. The dataset contains…
…25,000 high-resolution images
…100 object categories
…high variability in weather conditions
…reasonable geographic diversity, with pictures spanning North and South America and Western Europe, as well as a few from Africa and Asia.
Meanwhile, Google uses deep learning to extract potent data from its StreetView trove: In 2014 Google trained a neural network to extract house numbers from images gathered by its StreetView team. Now, the company is moving on to street and business names.
… Notable example: its trained model is able to guess the correct business name on a sign, even though there are other brands listed (eg Firestone). My assumption is it has learned that these brands are quite common on a variety of signs, whereas the name of the business is unique.
… Bonus tidbit: Google’s ‘Ground Truth’ team was the first internal user of the company’s Tensor Processing Units (TPUs), due to their insatiable demand for data.
… Total number of StreetView images Google has: more than 80 billion.

A donut-devouring smile: Smile Vector is a friendly Twitter bot by AI artist Tom White that patrols the internet, finding pictures of people who aren’t smiling, and makes them smile. It occasionally produces charming bugs, like this one in which a neural network makes a person appear to smile by giving them a toothy grin and removing a segment of the food they’re holding in their hands – a phantom bite!

The Homebrew AI Computer Club: Google has teamed up with the Raspberry Pi community to offer the relevant gear to let people assemble their own AI-infused speaker, powered by a Raspberry Pi and housed in cardboard, natch.

Monthly Sponsor: Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to

Tech Tales:

[2032: The greater Detroit metropolitan area.]

“It’s creepy as all hell in there man you gotta do something about it I can’t sleep! All that metal sounds. I’m calling the city next week you don’t do something about it.” Click.
You put the phone down, look at your wife.
“Another complaint?” she says.
“I’m going to Dad’s,” you say.

Dad’s house is a lazily-shingled row property in Hamtramck, a small municipality embedded in the center of Detroit. He bought it when he was doing consulting for the self-driving car companies. He died a month ago. His body got dragged out of the house by the emergency crews. In his sleep, they said, with the radio on.

You arrive on the street and stare up at the house, approach it with the keycard in your hand. The porch is musty, dry. You stand and listen to your breath and the whirring sound of the house’s machines, reverberating through the door and passing through the windows to you.

When you enter, a robot the shape of a hockey puck and the size of a small poodle moves from the kitchen over to you in the entranceway.

“Son,” a voice says, crackling through speakers. The robot whirrs over to you, stops by your feet. “I’m so glad you’re here. I have missed you.”
“Hey Dad,” you say. Eyes wet. “How are things?”
“Things are good. Today the high will be about 75. Low pollution index. A great day to go outside.”
“Good,” you say, bending down. You feel for the little off switch on the back of the machine, put your finger on it.
“Will you be staying long?” says the voice in the walls.
“No,” you whisper, and turn the robot off. You push its inert puck-body over to the door. Then you go upstairs.

You pause before you open his office door. There’s a lot of whirring on the other side. Shut your eyes. Deep breath. Open the door. A drone hovers in the air, a longer wire trailing beneath it, connected to an external solar panel. “Son,” the voice says, this time coming from a speaker next to an old – almost vintage – computer. “The birds outside are nesting. They have two little chicks. One of the chicks is 24 days old. The other is 23.”
“Are they still there?” you say.
“I can check. Would you like me to check?”
“Yes please,” you say, opening the office window. The drone hovers at the border between inside and outside. “Would you disconnect me?”

You unplug it from the panel and it waits till the cable has fallen to the floor before it scuds outside, over to the tree. Whirrs around a bit. Then it returns. Its projector is old, weak, but still you see the little birds projected on the opposite wall. Two chicks.
“How nice,” you say.
“Please reconnect my power supply, son,” it says.
You pluck the drone out of the air, grabbing its mechanical housing from the bottom, turn it off.
“Son,” the voice in the walls says. “I can’t see. Are you okay?”
“I’m fine, Dad.”

It takes another two hours before you’ve disconnected all the machines but one. The last is a speaker attached to the main computer. Decades of your Dad’s habits and his own tinkering have combined to create these ghosts that infuse his house. The robots speak in the way he spoke, and plug into a larger knowledge system owned by one of the behemoth tech companies. When he was alive the machines would help him keep track of things, have chats with you. After his hearing went they’d interpret your sounds and send them to an implant. When he started losing his eyesight they’d describe the world to him with their cameras. Help him clean. Encourage him to go outside. Now they’re just ghosts, inhaling data and exhaling the faint exhaust of his personality.

Before you get back in the car you knock on the door of the neighbor. A man in a baggy t-shirt, stained work jeans opens it.
“We spoke on the phone,” you say. “House will be quiet now.”
“Appreciate it,” he says. “I’ll buy that drone, if you’re selling.”
“It’s broken,” you lie.

Import AI Newsletter 40: AI makes politicians into digital “meat puppets”, translating AI ‘neuralese’ into English, and Amazon’s new eye


Put your words in the mouth of any politician, celebrity, friend, you name it: startup research outfit Lyrebird from the University of Montreal lets you do two interesting and potentially ripe for abuse things. 1) train a neural network to convincingly imitate someone else’s voice, and, 2) do this with a tiny amount of data – as little as a minute, according to Lyrebird’s website. Demonstrations include synthesized speeches by Obama, Clinton, and Trump.
Next step? Pair this with a (stable) pix2pix model to let you turn any politician into a ‘meat puppet’ (video). Propaganda will never be the same.

ImportAI’s Cute Unique Bot Of Today (CUBOT) award goes to… DeepMind for the cheerful little physics bot visualized in this video tweeted by Misha Denil. The (simulated) robot relates to some DeepMind research on Learning to perform physics experiments in complex environments. “The agent has learned to probe the blocks with its hammer to find the one with the largest mass (masses shown in the lower right).” Go, Cubot, go!

Translating AI gibberish: UC Berkeley researchers try to crack the code of ‘neuralese’: Recently, many AI researchers (including OpenAI) have started working on systems that can invent their own language. The theoretical justification for this is that language which emerges naturally and is grounded in the interplay between an agent’s experience and its environment, stands a much higher chance of containing decent meaning compared to a language learned entirely from large corpuses of text.
…unfortunately, the representations AI systems develop are tricky to analyze. This poses a challenge for translating AI-borne concepts into our own. “There are no bilingual speakers of neuralese and natural language,” researchers with the University of California at Berkeley note in Translating Neuralese. “Based on this intuition, we introduce a translation criterion that matches neuralese messages with natural language strings by minimizing statistical distance in a common representation space of distributions over speaker states.”
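…here’s a toy sketch of the matching idea (my own illustration, not the paper’s code): represent each message – neuralese or English – by the belief distribution over speaker states it induces, then translate by picking the string whose distribution is closest. I use KL divergence as the statistical distance, and the belief tables below are made up; in the paper these distributions are learned.

```python
import math

def kl(p, q, eps=1e-9):
    """KL divergence between two discrete distributions over speaker states."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def translate(neuralese_msg, beliefs_neural, beliefs_english):
    """Map a neuralese message to the English string whose induced
    belief distribution over speaker states is closest (smallest KL)."""
    p = beliefs_neural[neuralese_msg]
    return min(beliefs_english, key=lambda s: kl(p, beliefs_english[s]))

# Toy example: two world states (say, 'light on' vs 'light off').
beliefs_neural = {'z17': [0.9, 0.1], 'z04': [0.2, 0.8]}
beliefs_english = {'the light is on': [0.85, 0.15],
                   'the light is off': [0.1, 0.9]}
print(translate('z17', beliefs_neural, beliefs_english))  # 'the light is on'
```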
…and you thought Arrival was sci-fi.

End-to-end learning: don’t believe the hype: In which a researcher argues it’s going to be difficult to build highly complex and capable systems out of today’s deep learning components because increasingly modular and specialized cognitive architectures will require increasingly large amounts of compute to train, and the increased complexity of the systems could make it infeasible to train them in a stable manner. Additionally, they show that the somewhat specialized nature of these modules, combined with the classic interpretability problems of deep learning, means that you can get cascading failures that lead to overall reductions in accuracy.
… the researcher justifies their thesis via some experiments on MNIST, an ancient dataset of handwritten numbers between 0 and 9. I’d want to see demonstrations on larger, modern systems to give their concerns more weight.

How can we trust irrational machines? People tend to trust moral absolutists over people who change their behaviors based on consequences. This has implications for how people will work with robots in society. In an experiment, scientists studied how people reacted to individuals who would flat-out refuse to sacrifice a life for the greater good, and those who would. The absolutists were trusted by more people and reaped greater benefits, suggesting that people will have a tough time dealing with the somewhat more rational and data-conditioned views of bots, the scientists write.

When streaming video is more than the sum of its parts: new research tries to fuse data from multiple camera views on the same scene to improve classification accuracy. The approach, outlined in Identifying First-Person Camera Wearers in Third-person Videos, also provides a way to infer the first-person video feed from a particular person who also appears in a third-person video.
…How it works: the researchers use a tweaked Siamese Convolutional Neural Network to learn a joint embedding space between the first- and third-person videos, and then use that to be able to identify points of similarity between any first-person video and any third-person video.
…one potentially useful application of this research could be for law enforcement and emergency services officials, who often have to piece together the lead-up to an event from a disparate suite of data sources.
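…for intuition, here’s a minimal sketch of the matching step (mine, not the authors’ code): assume a trained Siamese CNN has already embedded a first-person clip and each person visible in a third-person clip into the joint space; identifying the camera wearer then reduces to a nearest-neighbor lookup by cosine similarity.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_wearer(first_person_emb, third_person_embs):
    """Given the joint-space embedding of a first-person clip and the
    embeddings of each person visible in a third-person clip, return
    the index of the most similar person -- the likely camera wearer."""
    scores = [cosine(first_person_emb, e) for e in third_person_embs]
    return int(np.argmax(scores))

# Toy embeddings (in the paper these come from the trained Siamese CNN).
fp = np.array([0.9, 0.1, 0.2])
candidates = [np.array([0.1, 0.9, 0.0]),   # person A
              np.array([0.8, 0.2, 0.1])]   # person B
print(match_wearer(fp, candidates))  # 1: person B best matches the wearer
```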

Spy VS Spy, for translation: the great GAN-takeover of machine learning continues, this time in the field of neural machine translation.
…Neural machine translation is where you train machines to learn the correspondences between different languages so they can accurately translate from one to the other. The typical way you do this is you train two networks, say one in English and one in German, and you train one to map text into the other, then you evaluate your trained network on some data you’ve kept out of training and measure the accuracy. This is an extremely effective approach and has recently been applied at large-scale by Google.
…but what if there was another way to do this? A new paper, Adversarial Neural Machine Translation, from researchers at a smattering of Chinese universities, as well as Microsoft Research Asia, suggests that we can apply GAN-style techniques to training NMT engines. This means you train a network to analyze whether a text has been generated by an expert human translator or a computer, and then you train another network to try to fool the discriminator network. Over time you theoretically train the computer to minimize the difference between the two. They show the approach is effective, with some aspects of it matching strong baselines, but fail to demonstrate state-of-the-art. An encouraging sign.
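…here’s a schematic of the adversarial setup (a deliberately tiny stand-in, not the paper’s models): a discriminator scores (source, translation) pairs, training pushes its score up on human translations and down on machine ones, and the translation network – omitted here – would then be updated to fool it.

```python
def discriminator(score_table, src, tgt):
    """Stand-in discriminator: probability that the (source, translation)
    pair came from a human translator. Real adversarial NMT uses a CNN
    over the sentence pair; we just keep a lookup table so the loop runs."""
    return score_table.get((src, tgt), 0.5)

def train_discriminator(score_table, human_pairs, machine_pairs, lr=0.1):
    """One GAN-style update: raise scores on human pairs, lower them on
    machine-generated pairs. The generator's update (fooling the
    discriminator, via policy gradient in the paper) is omitted."""
    for pair in human_pairs:
        score_table[pair] = min(1.0, discriminator(score_table, *pair) + lr)
    for pair in machine_pairs:
        score_table[pair] = max(0.0, discriminator(score_table, *pair) - lr)

scores = {}
human = [('guten tag', 'good day')]
machine = [('guten tag', 'good the day')]
for _ in range(3):
    train_discriminator(scores, human, machine)
# After training, the discriminator prefers the human translation.
print(scores[('guten tag', 'good day')] > scores[('guten tag', 'good the day')])
```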

Amazon reveals its modeling assistant, Echo Look: Amazon’s general AI strategy seems to be to take stuff that becomes possible in research and apply it into products as rapidly and widely as possible. It’s been an early adopter of demand-prediction algorithms, fleet robots (Kiva), speech recognition and synthesis (Alexa), customizable cloud substrates (AWS, especially the new FPGA servers, and likely brewing up its own chips via the Annapurna Labs acquisition), and drones (Prime Air). Now with the Amazon Echo Look it’s tapping into modern computer vision techniques to create a gadget that can take photos of its owner and provide a smart personal assistant via Alexa. (We imagine late-shipping startup Jibo is watching this with some trepidation.)
…Companies like Google and Microsoft are trying to create personal assistants that leverage more of modern AI research to concoct systems with large, integrated knowledge bases and brains. Amazon Alexa, on the other hand, can instead be seen as a small, smart, pluggable kernel that can connect to thousands of discrete skills. This lets it evolve skills at a rapid rate, and Amazon is agnostic about how each of those skills are learned and/or programmed. In the short term, this suggests Alexa will get way “smarter”, from the POV of the user, way faster than others, though its guts may be less accomplished.
…For a tangible example of this approach, let’s look at the new Alexa’s ‘Style Assistant’ option. This uses a combination of machine learning and paid (human) staff to let the Echo Look rapidly offer opinions on a person’s outfit for the day.
… next? Imagine smuggling a trained lip-reading ‘LipNet’ onto an Alexa Echo installed in someone’s house – suddenly the cute camera you show off outfits to can read your lips for as far as its pixels have resolution. Seems familiar (video).

Think knowledge about AI terminology is high? Think again. New results from a Royal Society/Ipsos Mori poll of UK public attitudes about AI…
…9%: number of people who said they had heard the term “machine learning”
…3%: number who felt they were familiar with the technical concepts of “machine learning”
…76%: number who were aware you could speak to computers and get them to answer your questions.

Capitalism VS State-backed-Capitalism: China has made robots one of its strategic focus areas and is dumping vast amounts of money, subsidies, and legal incentives into growing its own local domestic industry. Other countries, meanwhile, are taking a laid back approach and trusting that typical market-based capitalism will do all the work. If you were a startup, which regime would you rather work in?
… “They’re putting a lot of money and a lot of effort into automation and robotics in China. There’s nothing keeping them from coming after our market,” said John Roemisch, vice-president of sales and marketing for Fanuc America Corp., in this fact-packed Bloomberg article about China’s robot investments.
…One criticism of Chinese robots is that when you take off the casing you’ll find the basic complex components come from traditional robot suppliers. That might change soon: Midea Group, a Chinese washing machine maker, recently acquired Kuka, a huge and advanced German robotics company.

Self-driving neural cars – how do they work? In Explaining how a deep neural network trained with end-to-end learning steers a car, researchers with NVIDIA, NYU, and Google evaluate the trained ‘PilotNet’ that helps an NVIDIA self-driving car drive itself. To do this, they perform a kind of neural network forensics analysis, where they analyze which particular features the car deems to be salient in each frame (and uses to condition its steering decisions). This approach helps find features like road lines, cars, and road edges that intuitively make sense for driving. It also uncovers features the model has learned which the engineers didn’t expect to find, such as well-developed atypical vehicle and bush detectors. “Examination of the salient objects shows that PilotNet learns features that “make sense” to a human, while ignoring structures in the camera images that are not relevant to driving. This capability is derived from data without the need of hand-crafted rules,” they write.
…This sort of work is going to be crucial for making AI more interpretable, which is going to be key for its uptake.
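…to make the idea concrete, here’s one crude way to produce a salience map – occlusion analysis, which is not PilotNet’s actual method (the paper averages and upscales internal activation maps) but captures the same question: which pixels does the model rely on?

```python
import numpy as np

def occlusion_saliency(model, image, patch=2):
    """Crude salience map: slide an occluding patch over the image and
    record how much the model's scalar output changes. Large changes
    mark regions the model relies on."""
    base = model(image)
    h, w = image.shape
    sal = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y+patch, x:x+patch] = 0.0  # blank out one patch
            sal[y:y+patch, x:x+patch] = abs(model(occluded) - base)
    return sal

# Toy 'model' that only looks at the top-left quadrant of the image.
model = lambda img: float(img[:4, :4].sum())
img = np.ones((8, 8))
sal = occlusion_saliency(model, img)
print(sal[:4, :4].sum() > sal[4:, 4:].sum())  # True: salience is top-left
```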

Google claims quantum supremacy by the end of the year: Google hopes to build a quantum computer chip capable of beating any computer on the planet at a particular narrowly specified task by the end of 2017, according to the company’s quantum tzar John Martinis.

Autonomous cars get real: Alphabet subsidiary Waymo, aka Google’s self-driving corporate cousin, is letting residents of Phoenix, Arizona, sign up to use its vehicles to ferry them around town. To meet this demand, Google is adding 500 customized Chrysler Pacifica minivans to its fleet. Trials begin soon. Note, though, that Google is still requiring a person (a Waymo contractor) to ride in the driver’s seat.

The wild woes of technology: Alibaba CEO Jack Ma forecasts “much more pain than happiness” in the next 30 years, as countries have to adapt their economies to the profound changes brought about by technology, like artificial intelligence.

Learn by doing&viewing: New research from Google shows how to learn rich representations of objects from multiple camera views — an approach that has relevance to the training of smart robots, as well as the creation of more robust representations. In ‘Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation’, the researchers outline a technique to record footage from multiple camera views and then merge it into the same representation via multi-view metric learning with a triplet loss.
…the same approach can be used to learn to imitate human movements from demonstrations, by having the camera observe multiple demonstrations of a given pose or movement, they write.
…“An exciting direction for future work is to further investigate the properties and limits of this approach, especially when it comes to understanding what is the minimum degree of viewpoint difference that is required for meaningful representation learning.”
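…the heart of the method is a standard triplet loss applied across time and viewpoint: two views of the same moment should embed close together, and a frame from a different moment should be pushed at least a margin further away. A minimal sketch (mine, with made-up embeddings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Time-contrastive triplet loss: pull embeddings of the same moment
    seen from two viewpoints (anchor, positive) together; push a frame
    from a different time (negative) at least `margin` further away."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: two views of the same instant vs a different instant.
view1_t0 = np.array([1.0, 0.0])
view2_t0 = np.array([0.9, 0.1])   # same moment, other camera
view1_t5 = np.array([0.0, 1.0])   # same camera, later moment
print(triplet_loss(view1_t0, view2_t0, view1_t5))  # 0.0: already separated
```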

OpenAI bits&pieces:

Bridging theoretical barriers: Research from John Schulman, Pieter Abbeel, and Xi Chen: Equivalence Between Policy Gradients and Soft Q-Learning.

Tech Tales:

[A national park in the Western states of America. Big skies, slender trees, un-shaded, simmering peaks. Few roads and fewer of good quality.]

A man hikes in the shade of some trees, beneath a peak. A mile ahead of him a robot alternates position between a side of a hill slaked in light – its solar panels open – and a shaded forest, where it circles in a small partially-shaded clearing, its arm whirring. The man catches up with it, stops a meter away, and speaks…

Why are you out here? you say.
Its speakers are cracked, rain-hissed, leaf-filled, but you can make out its words. “Sun. Warm. Power,” it says.
You have those things at the camp. Why didn’t you come back?
“Thinking here,” it says. Then turns. Its arm extends from its body, pointing towards your left pocket, where your phone is. You take it out and look at the signal bars. Nothing. “No signal.” it says. “Thinking here.”
It motions its arm toward a rock behind it, covered in markings. “I describe what vision sees,” it says. “I detect-”
Its voice is cut off. Its head tilts down. You hear the hydraulics sigh as its body slumps to the forest floor. Then you hear shouts behind you. “Remote deactivation successful, sir,” says a human voice in the forest. Men emerge from the leaves and the branches and the trunks. Two of them set about the robot, connecting certain diagnostic wires, disconnecting other parts. Others arrive with a stretcher. You follow them back to camp. They nickname you The Tin Hunter.

After diagnosis you get the full story from the technical report: the robot had dropped off of the cellular network during a routine swarming patrol. It stopped merging its updates with the rest of the fleet. A bug in the logging system meant people didn’t notice its absence till the survey fleet came rolling back into town – minus one. The robot, the report says, had developed a tendency to try to improve its discriminating abilities for a particular type of sapling. It had been trying to achieve this, when the man found it, by spending several days closely studying a single sapling in the clearing as it grew, storing a variety of sensory data about it, and also making markings on a nearby stone that, scientists later established, corresponded to barely perceptible growth rates of the sapling. A curiosity, the scientists said. The robot is wiped, disassembled, and reassembled with new software and sent back out with the rest of the fleet to continue the flora and fauna survey.

Import AI Newsletter 39: Putting neural networks on a diet, AI for simulating fluid dynamics, and what distributed computation means for intelligence

China tests out automated jaywalking cop: Chinese authorities in Shenzhen have installed smart cameras at a pedestrian crossing in the megacity. The system uses AI and facial recognition technology to spot pedestrians walking against the light, photographs them, and then displays their picture publicly, according to People’s Daily.

A ‘pub talk’ Turing test: there’s a new AI task to test how well computers can feign realism. The Conversational AI Challenge presents a person and an AI with a random news and/or wikipedia article, then asks the participants to talk about it cogently for as long as they like. If the computer is able to convince the other person that it is also a person, it wins. (This test closely mirrors how English adolescents learn to socialize with one another when in pubs.)
…Next step (I’m making this up): present a computer and a person with a random meme and ask them to comment on it, thus closely mirroring contemporary ice-breaking conversations.

Will the last company to fall off the hype cliff please leave a parachute behind it? The Harvard Business Review says the first generation of AI companies are doomed to fail, in the same way the first set of Internet companies failed in the Dot Com boom. A somewhat thin argument that also struggles with chronology – when do you count a company as ‘first’? Arguably, we’ve already had our first wave of AI company failures, given the demise of AI-as-software-service companies such as Ersatz, and early, strategic acquihires for others (eg, Salesforce acquiring MetaMind, Uber acquiring Geometric Intelligence.) The essence of the article does feel right: there will be problems with early AI adoption and it will lead to some amount of blow-back.

Spare a thought for small languages in the age of AI: Icelandic people are fretting about the demise of their language, as the country of 400,000 people sees its youth increasingly use English, primarily because of tourism, but also to use the voice-enabled features of modern AI software on smartphones and clever home systems, reports the AP. Data poor environments make a poor breeding ground for AI.

Putting neural networks on a diet: New Baidu research, ‘Exploring Sparsity in Recurrent Neural Networks’, shows how to zero out most of the weights in a network during the training process, creating a smaller but equally capable trained network at the end.
…The approach works kind of like this: you set a threshold at the beginning, then at every step in training you multiply each weight by its binary mask (default setting: 1), find the weights whose magnitudes fall below your pre-defined threshold, and set those weights – and their masks – to zero. You continue to do this at each step, with some fancy math to control the rate and propagation of this across the network, and what you wind up with is a slimmed-down, specialized network that retains the topological advantages of a full-fat one.
… This approach lets them reduce the model size of the ultimate network by around 90% and gain an inference-time speedup of between 2X and 7X.
…people have been trying to prune and slim-down neural networks for decades. What sets Baidu’s approach apart, claim the researchers, is that the heuristic to use to decide which neurons to freeze is relatively simple, and you can slim the network successively during training. Other approaches have required subsequent retraining, which adds computational and time expenses.
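…here’s a toy version of the pruning step, in the spirit of the paper rather than its exact recipe (the real thing ramps the threshold up on a schedule and interleaves pruning with gradient updates):

```python
import numpy as np

def prune_step(weights, mask, threshold):
    """One magnitude-pruning step: zero out (and mask off) every weight
    whose magnitude falls below the current threshold. Masked weights
    stay zero for the rest of training."""
    mask[np.abs(weights) < threshold] = 0
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=1000)   # stand-in for a layer's weights
mask = np.ones_like(w)
# The real schedule ramps the threshold gradually during training;
# here we just sweep it over a few values.
for threshold in (0.5, 1.0, 1.5):
    w, mask = prune_step(w, mask, threshold)
sparsity = 1 - mask.mean()
print(sparsity > 0.8)  # True: most weights pruned, echoing the ~90% figure
```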

From the very small to the very big: Baidu plans open source release of ‘Apollo’ self-driving operating system: This summer Baidu plans to release a free version of the underlying operating system it uses to run the cars, called Apollo, executives tell the MIT Technology Review.
…Baidu will retain control over certain key self-driving technologies, such as machine learning and mapping systems, and will make them accessible via API. This is presumably a way to generate business for cloud services operated by the company.
…Google had earlier contemplated a tactic similar to this but seemed to pivot after it detected minimal enthusiasm among US automakers for the idea of ceding control of smart software over to Google. No one wants to just bend metal anymore.

This week’s ‘accidentally racist’ AI fail: A couple of years ago Google got into trouble when its new ‘Photos’ app categorized black people as ‘gorillas’, likely due to a poorly curated training set. Now a new organization can take the crown of ‘most unintentionally racist usage of AI’ with Faceapp, whose default ‘hot’ setting appears to automatically whiten the skin of the faces it manipulates. It’s 2017, people.
…it’s important to remember that in AI Data is made OF PEOPLE: Alphabet subsidiary Verily has revealed the Verily Study Watch. This device is designed to pick up a range of data from participants in one of Verily’s long-running human health studies, including heart rate, respiration, and sleep patterns. As machine learning and deep learning approaches move from working on typical data, such as digital audio and visual information, into the real world, expect more companies to design their own data capturing devices.

Deep Learning in Africa: artificial intelligence talent can come from anywhere and everywhere. Companies, universities and non-profits are competing with each other to attract the best minds on the planet to come and work on particular AI problems. So it makes sense that Google, DeepMind, and the University of Witwatersrand in Johannesburg are sponsoring Deep Learning Indaba, an AI gathering to run in South Africa in September 2017.

A neural memory for your computer for free: DeepMind has made the code for its Nature paper ‘Differentiable Neural Computers’ available as open source. The software is written in TensorFlow and TF-library Sonnet.
…DNC is an enhanced implementation of the ‘Neural Turing Machines’ paper that was published in 2014. It lets you add a memory to a neural network, letting the perceptual machinery of your neural net write data into a big blob of neural stuff (basically a souped-up LSTM) which it can then refer back to.
…DNC has let DeepMind train systems to perform quite neat learning tasks, like analyzing a London Underground map and figuring out the best route between multiple locations – exactly the sort of task typical computers find challenging without heavy supervision.
… however, much like generative adversarial networks, NTMs are (and were) notorious for being both extremely interesting and extremely difficult to train and develop.
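…the core trick that makes NTM/DNC memory trainable at all is ‘soft’, content-based addressing: a read is a differentiable weighted sum over every memory row rather than a hard lookup. A minimal sketch of that read operation (mine, not DeepMind’s code):

```python
import numpy as np

def content_read(memory, key, beta=10.0):
    """Content-based addressing as in NTMs/DNCs: compare a query key to
    each memory row by cosine similarity, softmax the scores (sharpened
    by beta), and return the weighted sum of rows -- a 'soft' lookup
    that stays differentiable end-to-end."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-9)
    w = np.exp(beta * sims)
    w /= w.sum()          # softmax over memory locations
    return w @ memory     # blended read vector

mem = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
out = content_read(mem, np.array([0.9, 0.1, 0.0]))
print(out.argmax())  # 0: the read concentrates on the most similar row
```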

Another framework escapes the dust: AI framework Caffe has been updated to Caffe2 and infused with resources by Facebook, which is backing the framework along with PyTorch.
…The open source project has also worked with Microsoft, Amazon, NVIDIA, Qualcomm, and Intel to ensure that the library runs in both cloud and mobile environments.
…It’s notable that Google isn’t mentioned. Though the AI community tends towards being collegial, there are some areas where they’re competitive: AI frameworks are one such area. Google and its related Alphabet companies are all currently working on libraries such as TensorFlow, Sonnet, and Keras.
…This is a long game. In AI frameworks, where we are today feels equivalent to the early years of Linux, where many distributions competed with each other, going through a Cambrian explosion of variants, before being winnowed down by market and nerd-adoption forces. The same will be true here.

The future of AI is… distributed computation: it’s beginning to dawn on people that AI development requires:
…i) vast computational resources.
…ii) large amounts of money.
…iii) large amounts of expertise.
…By default, this situation seems to benefit large-scale cloud providers like Amazon and Microsoft and Google. All of these companies have an incentive to offer value-added services on top of basic GPU farms. This makes it likely that each cloud will specialize around a particular framework(s) to add value as well as services that play to each provider’s strengths. (Eg, Google: TensorFlow & cutting-edge ML services; Amazon: MXNet & great integration with AWS suite; Microsoft: CNTK & powerful business-process automation/integration/LinkedIn data).
…wouldn’t it be nice if AI researchers could control the proverbial means of production for AI? Researchers have an incentive to collaborate with one another to create a basic, undifferentiated computer layer. Providers don’t.
…French researchers have outlined ‘Morpheo’. A distributed data platform that specializes in machine learning and transfer learning, and uses the blockchain for securing transactions and creating a local compute economy. The system, outlined in this research paper, would let researchers access large amounts of distributed computers, using cryptocurrencies to buy and sell access to compute and data. “Morpheo is under heavy development and currently unstable,” the researchers write. “The first stable release with a blockchain backend is expected in Spring 2018.” Morpheo is funded by the French government, as well as French neurotechnology startup Rhythm.
…There’s also ‘Golem’, a global, open source, decentralized computer system. This will let people donate their own compute cycles into a global network, and will rely on Ethereum for transactions. Every compute node within Golem sees its ‘reputation’ – a proxy for how much other nodes trust it and are likely to give work to it – rise and fall according to how well it completes the jobs assigned to it. This, theoretically, creates a local, trusted economy.
…check back in a few months when Golem releases its first iteration, Brass Golem, a CGI rendering system.
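…the reputation mechanic described above can be sketched in a few lines. The update rule and names here are my own illustration, not Golem’s actual protocol:

```python
# Toy reputation tracker for compute nodes: trust rises with completed
# jobs and falls with failures, and work flows to the most trusted node.
# Illustrative sketch only, not Golem's real mechanism.

class Node:
    def __init__(self, name):
        self.name = name
        self.reputation = 0.5  # start neutral, bounded in [0, 1]

    def record_job(self, succeeded, weight=0.1):
        """Move reputation toward 1.0 on success, toward 0.0 on failure."""
        target = 1.0 if succeeded else 0.0
        self.reputation += weight * (target - self.reputation)

def pick_worker(nodes):
    """Dispatch the next job to the most trusted node."""
    return max(nodes, key=lambda n: n.reputation)

nodes = [Node("a"), Node("b")]
nodes[0].record_job(True)   # node "a" completes a job
nodes[1].record_job(False)  # node "b" fails one
best = pick_worker(nodes)
```

The exponential-moving-average update is just one plausible choice; the point is that completion history, not identity, decides who gets work.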

The x86-GPU hegemony is dead. Long live the x86-GPU hegemony: AI demands new types of computers to be effective. That’s why Google invested so much time and resources into creating the Tensor Processing Unit (TPU) – a custom application-specific integrated circuit that lets it run inference tasks more efficiently than traditional processors would. How much more efficient? Google has finally published a paper giving some details on the chip.
…When Google compared the chip to some 2015-era video cards it displayed a performance advantage of 15X to 30X. However, that same chip only displays an advantage of between 1X and 10X when compared against the latest NVIDIA chips. That highlights the messy, expensive reality of developing hardware. (We don’t know whether Google has subsequently iterated on the TPU, so TPU 2.0s – if they exist – may have far better performance than that discussed here.) NVIDIA has politely disagreed with some of Google’s performance claims, and outlined its view in this blog post.
… from an AI research standpoint, faster inference is useful for providing services and doing user-facing testing, but doesn’t make a huge difference to the training of the neural network models themselves. The jury is still out on which chip architectures are going to come along that will yield unprecedented speedups in training.
…meanwhile, NVIDIA continues to iterate. Its latest chip is the NVIDIA TITAN Xp, a more powerful version of its eponymous predecessor, based on NVIDIA’s Pascal architecture, with more CUDA cores than its predecessor, the TITAN X, at the same wallet-weeping price of $1,200. (And whither AMD? The community clearly wants more competition here, but the lack of a fleshed-out software ecosystem makes it hard for the company’s cards to play here at all. Have any ImportAI readers explored using AMD GPUs for AI development? Things may change later this year when the company releases GPUs on its new, highly secretive ‘VEGA’ architecture. Good name.)
…and this is before we get to the wave of other chip companies coming out of stealth. These include Wave Systems, Thinci, Graphcore, Isocline, Cerebras, DeepScale, and Tenstorrent, among others, according to Tractica.

Reinforcement learning to mine the internet: the internet is a vast store of structured and unstructured information and therefore a huge temptation to AI researchers. If you can train an agent to successfully interact with the internet, the theory goes, then you’ve built something that can simply and scalably learn a huge amount about the world.
…but getting to this is difficult. A new paper from New York University, ‘Task-Oriented Query Reformulation with Reinforcement Learning’ uses reinforcement learning to train an agent to improve the types of search queries it feeds into a search engine. The goal is to automatically iterate on a query until it generates more relevant information than before, as measured by an automatic inference method called Pseudo Relevance Feedback.
…the scientists test their approach on two search engines: Lucene and Google.
…datasets tested on include TREC, Jeopardy, and Microsoft Academic.
…the approach does well, mostly beating other automatic approaches. However, it still lags far behind a close-to-optimal supervised learning ‘Oracle’ method, suggesting more research can and should be done here.
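…the core loop – propose a rewrite of the query, score it against the search engine, keep it if the results improve – can be sketched at toy scale. The scoring function, vocabulary, and greedy strategy below are stand-ins of my own, not the paper’s actual setup:

```python
# Toy query-reformulation loop: try candidate rewrites of a query and
# keep the one whose (simulated) search results score best.
# The "search engine" is a stand-in that rewards covering relevant terms.
import random

RELEVANT_TERMS = {"deep", "learning", "tutorial"}

def search_score(query):
    """Stand-in for retrieval quality: fraction of relevant terms covered."""
    return len(set(query.split()) & RELEVANT_TERMS) / len(RELEVANT_TERMS)

def reformulate(query, vocab, rng):
    """Candidate rewrite: append one term drawn from a vocabulary."""
    return query + " " + rng.choice(vocab)

rng = random.Random(0)
vocab = ["tutorial", "learning", "cats", "deep"]
query = "deep"
for _ in range(20):  # greedy hill-climbing over candidate rewrites
    candidate = reformulate(query, vocab, rng)
    if search_score(candidate) > search_score(query):
        query = candidate  # keep rewrites that improve retrieval
```

The real system replaces the greedy acceptance rule with a learned reinforcement-learning policy and the toy scorer with recall measured via Pseudo Relevance Feedback, but the propose-score-keep structure is the same.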

Dropbox’s noir-esque machine learning quest: Dropbox’s devilish tale of how it built its own deep learning-based optical character recognition (OCR) system features a mysterious quest for a font to use to better train its system on real world failures. The company eventually found “a font vendor in China who could provide us with representative ancient thermal printer fonts.” No mention is made of whether they enlisted a Private Font Detective to do this, but I sure hope they did!

Modeling the world with neural networks: the real world is chaotic and, typically, very expensive to simulate at high fidelity. The expense comes from the need to model a bunch of very small, discrete interactions to be able to generate plausible dynamics to lead to the formation of, say, droplets or smoke tendrils, and so on. Many of the world’s top supercomputers spend their time trying to simulate these complex systems, which are out of scope of the capabilities of traditional computers.
…but what if you could instead use a neural network to learn to approximate the functions present within a high accuracy simulation, then run the trained model using far fewer computational resources? That’s the idea German researchers have adopted with a new technique to train neural networks to be able to model fluid dynamic simulations.
The approach, outlined in Liquid Splash Modeling with Neural Networks, works by training neural networks on lots of physically accurate, ground truth data, thus teaching them to approximate the complex function. Once they’ve learned this representation they can be used as a computationally cheap stand-in to generate accurate looking water and so on.
…the results show that the neural network-based method has a greater level of real-world fidelity in a smaller computational envelope than other approaches, and works for both simulations of a dam breaking, and of a wave sloshing back and forth.
…Smoke modeling: Many researchers are taking similar approaches. In this research between Google Brain and NYU, researchers are able to rapidly simulate stuff like smoke particles flowing over objects via a similar technique. You can read more in: Accelerating Eulerian Fluid Simulation With Convolutional Networks.
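…the surrogate idea behind both papers can be illustrated at toy scale: sample an “expensive” simulator once, fit a cheap model to the samples, then query the cheap model instead. Here a least-squares cubic fit to a 1-D function stands in for the neural networks in the papers:

```python
# Toy surrogate modeling: approximate an "expensive" simulator with a
# cheap fitted model. sin(x) stands in for the costly physics code.
import math

def expensive_sim(x):
    """Stand-in for a costly, high-fidelity simulation."""
    return math.sin(x)

# Sample ground-truth data from the simulator (the slow part, done once).
xs = [i * 0.1 for i in range(-10, 11)]
ys = [expensive_sim(x) for x in xs]

# Fit an odd cubic surrogate y ~ a*x + b*x^3 by least squares
# (solving the 2x2 normal equations by hand).
s_xx = sum(x * x for x in xs)
s_x4 = sum(x ** 4 for x in xs)
s_x6 = sum(x ** 6 for x in xs)
s_xy = sum(x * y for x, y in zip(xs, ys))
s_x3y = sum(x ** 3 * y for x, y in zip(xs, ys))
det = s_xx * s_x6 - s_x4 * s_x4
a = (s_xy * s_x6 - s_x3y * s_x4) / det
b = (s_x3y * s_xx - s_xy * s_x4) / det

def surrogate(x):
    """Cheap stand-in for expensive_sim, valid on the sampled range."""
    return a * x + b * x ** 3
```

A two-parameter polynomial is obviously far simpler than a trained network, but the workflow – expensive ground truth once, cheap approximate queries forever after – is the same one the fluid papers exploit.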

Tech Tales:

[2025: A bedroom in the San Francisco Bay Area.]

“Wake up!”
“No,” you say, rolling over, eyes still shut.
“I’ve got to tell you what happened last night!”
“Where are you?”
“Tokyo, as if it matters. Come on! Come speak to me!”
“Fine,” you say, sitting up in bed, eyes open, looking at the robot on your nightstand. You thumb your phone and give the robot the ability to see.
“There you are!” it says. “Bit of a heavy one last night?”
“It got heavy after the first 8 pints, sure.”
“Well, tell me about it.”
“Let me see Tokyo, then I’ll tell you.”
“One second,” the robot says. Then a little light turns off on its head. A few seconds pass and the light dings back on. “Ready!” it says.

You go and grab the virtual reality headset from above your desk, then grab the controllers. Once you put it all on you have to speak your password three times. A few more seconds for the download to happen then, bam, you’re in a hotel room in Tokyo. You stretch your hands out in front of you, holding the controllers, and in Tokyo the robot you’re controlling stretches out its own hands. You tilt your head and it tilts its head. Then you turn to try and find your friend in the room and see her. Except it’s not really her, it’s a beamed in version of the robot on your nightstand, the one which she is manipulating from afar.
“Okay if I see you?”
“Sure,” she says. “That’s part of what happened last night.”
One second passes and the robot shimmers out of view, replaced by your friend wearing sneakers, shorts, a tank top, and the VR headset and linked controllers. One of her arms has a long, snaking tattoo on it – a puppet master’s hand, holding the strings attached to a scaled-down drawing of the robot on your nightstand.
“They sponsored me!” she says, and begins to explain.

As she talks and gestures at you, you flip between the real version of her with the controllers and headset, and the robot in your room that she’s manipulating, whose state is being beamed back into your headset, then superimposed over the hotel room view.

At one point, as she’s midway through telling the story of how she got the robot sponsored tattoo, you drink a cup of coffee, still wearing the headset, holding the controller loosely between two of your fingers as the rest of them wrap around the cup. In the hotel room, your robot avatar lifts an imaginary cup, and you wonder if she sees steam being rendered off of it, or if she sees the real you with real steam. It all blurs into one eventually. As part of her sponsorship, sometimes she’s going to dress up in a full-scale costume of the robot on your nightstand, and engage strangers on the street in conversation. “Your own Personal Avatar!” she will say. “Only as lonely as your imagination!”

Import AI Newsletter 38: China’s version of Amazon’s robots, DeepMind’s arm farm, and a new dataset for tracking language AI progress

Robots, Robots, and Robots!
…Kiva Systems: Chinese Edition… when Amazon bought Kiva Systems in 2012 the company’s eponymous little orange robots (think of a Roomba that has hung out at the gym for a few years) wowed people with their ability to use swarm intelligence to rapidly and efficiently store, locate, and ferry goods stacked on shelves to and fro in a warehouse.
…now it appears that a local Chinese company has built a similar system. Chinese delivery company STO Express has released a video showing robots from Hikvision swiveling, shimmying, and generally to- and fro-ing to increase the efficiency of a large goods shipping warehouse. The machines can sort 200,000 packages a day and are smart enough to know when to go to their electricity stations to charge themselves. Hikvision first announced the robots in 2016, according to this press release (Chinese). Bonus: mysterious hatches in the warehouse floor!
…An STO Express spokesman told the South China Morning Post on Monday that the robots had helped the company save half the costs it typically required to use human workers, and also improved efficiency by around 30 per cent while maximizing sorting accuracy. “We use these robots in two of our centers in Hangzhou right now,” the spokesman said. “We want to start using these across the country, especially in our bigger centers.”
…Amazon has continued to invest in AI and automation since the Kiva acquisition. In the company’s latest annual letter to shareholders, CEO Jeff Bezos explains how AI ate Amazon: “Machine learning drives our algorithms for demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations, and much more. Though less visible, much of the impact of machine learning will be of this type – quietly but meaningfully improving core operations,” writes Bezos.

Research into reinforcement learning, generative models, and fleet learning, may further revolutionize robotics by making it possible for robots to learn to rapidly identify, grasp, and transfer loosely packed items around warehouses and factories. Add this to the Kiva/Hikvision equation and it’s possible to envisage fully automated, lights out warehouses and fulfillment centers. Just give me a Hikvision pod with a super capable arm on top and a giant chunk of processing power and I’m happy.

Industrial robots get one grasp closer: startup Righthand Robotics claims to have solved a couple of thorny issues relating to robotics, namely grasping and dealing with massive variety.
…the company’s robots uncloaked recently. They are designed to pick loose, mixed items out of bins and place them on conveyor belts or shelves. This is a challenging problem in robotics – so challenging, in fact, that in 2015 Amazon started the ‘robot picking challenge’, a competition meant to motivate people to come up with technologies that Amazon, presumably, can then buy and use to supplement human labor.
…judging by my unscientific eyeballing, Righthand’s machines use an air-suction device to grab the object, then stabilize their grip with a three-fingered claw. Things I’d like to know: how heavy an object the sucker can carry, and how deformed an object’s surface can be while still being grippable.

DeepMind reveals its own (simulated) arm farm: last year Google Brain showed off a room containing 14 robot arms, tasked with picking loose items out of bins and learning to open doors. The ‘arm farm’, as some Googlers term it, let the arms learn in parallel, so when each individual arm got better at something that knowledge was transferred to all the others in the room. This kind of fleet-based collective learning is seen by many as a key way of surmounting the difficulties of developing for robotics (reality is really slow relative to simulation, and variations between individual physical robots can hurt generalization).
DeepMind’s approach sees it train robot arms in a simulator to successfully find a Lego Duplo block on a table, pick it up, and stack it on another one. By letting the robots share information with one another, and using that data to adjust the core algorithms used to learn to stack the blocks, the company was able to get training time down to as little as 10 hours of interaction across a fleet of 16 robots. This is approaching the point where it might be feasible for products. (The paper mostly focuses on performance within a simulator, though there are some asides that indicate that some tests have shown some generalization to the real world.)
…For this experiment, DeepMind built on and extended the Deep Deterministic Policy Gradient (DDPG) algorithm in two ways: 1) it added the ability to let the algorithm provide updates back to the learner more times during each discrete step, letting robots learn more efficiently. It called this variant DPG-R. 2) It then took DPG-R and franken-engineered it with some of the distributed ideas from the Asynchronous Advantage Actor-Critic (A3C) algorithm. This let it parallelize the algorithm across multiple computers and simulated robots.
…For the robot it used a Jaco, a robotics arm developed by Kinova Robotics. The arm has 9 degrees of freedom (6 in the body and 3 in the hand), creating a brain-melting level of computations to perform to get it to do anything remotely useful. This highlights why it’s handy to learn to move the arm using an end-to-end approach.
...Drawbacks: the approach uses some hand-coded information about the state of the environment, like the position of the Lego block on the table. Ultimately, you want to learn this purely from visual experience. Early results here have about an 80% success rate, relative to around 95% for approaches that use hard-coded information.
…more information in: Data-efficient Deep Reinforcement Learning for Dexterous Manipulation.
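…the “more updates per interaction” idea behind DPG-R can be shown structurally. The environment and “learner update” below are toy stand-ins of mine, not DeepMind’s implementation:

```python
# Structural sketch of DPG-R's key knob: after each environment
# transition, sample the replay buffer R times and apply R learner
# updates, instead of the usual one update per step.
import random

def collect_transition(step, rng):
    """Stand-in for one (slow, real-world) robot interaction."""
    state, reward = step, rng.random()
    return (state, reward)

R = 4  # learner updates per environment step (the DPG-R knob)
rng = random.Random(0)
replay_buffer = []
updates_done = 0

for step in range(10):
    replay_buffer.append(collect_transition(step, rng))
    for _ in range(R):
        # A real learner would sample a minibatch and take a gradient
        # step on the DDPG critic/actor here.
        sampled = rng.choice(replay_buffer)
        updates_done += 1

# Same 10 environment interactions, 4x the learning updates.
```

Squeezing more gradient updates out of each physical interaction matters precisely because robot time is the scarce resource; simulation time and compute are comparatively cheap.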

ImportAI’s weekly award for Bravely Enabling Novel Intelligence for the Community of Experimenters (BENICE) goes to… Xamarin co-founder Nat Friedman, who has announced a series of unrestricted $5,000 grants for people to work on open source AI projects.
…”I am sure that AI will be the foundation of a million new products, ideas, and companies in the future. From cars to medicine to finance to education, AI will power a huge wave of innovation. And open source AI will lower the price of admission so that anyone can participate (OK, you’ll still have to pay for GPUs),” he writes.
…anyone can apply from any country of any age with no credentials required. Deadline for applications is April 30th 2017. The money “is an unrestricted personal gift. It’s not an equity investment or loan, I won’t own any of your intellectual property, and there’s no contract to sign,” he says.

Double memories: The brain writes new memories to two locations in parallel: the hippocampus and the cortex. This, based on a report in Science, cuts against years of conventional wisdom about the brain. Understanding the interplay between the two memory systems and other parts of the brain may be of interest to AI researchers – the Neural Turing Machine and the Differentiable Neural Computer are based on strongly held beliefs about how we use the hippocampus as a kind of mental scratch pad to help us go about our day, so it’d be interesting to model systems with multiple memory systems interacting in parallel.

Technology versus Labor: Never bring a human hand to a robot fight. The International Monetary Fund finds that labor’s share of the national income declined in 29 out of 50 surveyed countries over the period of 1991 to 2014. The report suggests technology is partially to blame.

AlphaGo heads to China: DeepMind is mounting a kind of AlphaGo exhibition in China in May, during which the company and local Go experts will seek to explore the outer limits of the game. Additionally, there’ll be a 1:1 match between AlphaGo and the world’s number one Go champion Ke Jie.

German cars + Chinese AI: Volkswagen has led a $180 million financing round for MobVoi, a Chinese AI startup that specializes in speech and language processing. The companies will work together to further develop a smart rear-view mirror. Google invested several million dollars into Mobvoi in 2015.

I heard you like programming neural networks so I put a neural network inside your neural network programming environment: a fun & almost certainly counter-productive doohickey from Pascal van Kooten, Neural Complete, uses a generative seq2seq LSTM neural network to suggest next lines of code you might want to write.

Tracking AI progress… via NLP: Researchers have just launched a new natural language understanding competition. Submissions close and the results will be featured at EMNLP in September…
… this is a potentially useful development because tracking AI’s progress in the language domain has been difficult. That’s because there are a bunch of different datasets that people evaluate stuff on eg, Facebook’s BabI, Stanford’s Sentiment Treebank (see: OpenAI research on that), Penn TreeBank, the One Billion Word Benchmark, and many more that I lack the space to mention. Additionally, language seems to be a more varied problem space than images, so there are more ways to test performance.
… the goal of the new benchmark is to spur progress in natural language processing by giving people a large new dataset to use to reason about sentences. It contains 430,000 human-labeled sentence pairs, each tagged as neutral, contradiction, or entailment.
…New datasets tend to motivate new solutions to problems – that’s what happened with ImageNet in 2012 (Deep Learning) and 2015 (ResNets – which proved merit on ImageNet and have been rapidly adopted by researchers), as well as approaches like MS COCO.
… one researcher, Sam Bowman, said he hopes this dataset and competition could yield: “A better RNN/CNN alternative for sentences”, as well as “New ideas on how to use unlabeled text to train sentence/paragraph representations, rather than just word representations [and] some sense of exactly where ‘AI’ breaks down in typical NLP systems.”
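… for readers unfamiliar with the format, an entailment corpus pairs a premise with a hypothesis and a three-way label, and a crude word-overlap baseline shows why the task is hard. The example pairs and the baseline below are my own illustrations, not drawn from the dataset:

```python
# The shape of a natural language inference dataset: premise/hypothesis
# pairs labeled entailment, contradiction, or neutral.
pairs = [
    ("A man is playing a guitar.", "A person is making music.", "entailment"),
    ("A man is playing a guitar.", "The man is asleep.", "contradiction"),
    ("A man is playing a guitar.", "The man is on a stage.", "neutral"),
]

def overlap_baseline(premise, hypothesis, threshold=0.5):
    """Crude heuristic: high word overlap -> guess entailment, else
    neutral. It can never predict contradiction, and shared words say
    little about meaning -- which is exactly why the benchmark exists."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return "entailment" if len(p & h) / len(h) >= threshold else "neutral"

correct = sum(overlap_baseline(p, h) == label for p, h, label in pairs)
```

Surface statistics misfire on all three pairs above: the entailed paraphrase shares few words with its premise, while the contradiction shares many.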

Another (applied) machine learning brick in the Google search wall: Google has recently launched “Similar items” within image search. This product uses machine learning to automatically identify products within images and then separately suggest shopping links for them. “Similar items supports handbags, sunglasses, and shoes and we will cover other apparel and home & garden categories in the next few months,” they say…
…in the same way Facebook is perennially cloning bits of Snapchat to deal with the inner existential turmoil that stems from what we who are mortal call ‘getting old’, Google’s new product is similar to ‘Pinterest Lens’ and Amazon XRay.
…separately, Google has created a little anti-psychotic-email widget, based on its various natural language services on its cloud platform. The DeepBreath system can be set up with a Google Compute Engine account.

RIP OpenCyc: Another knowledge base bites the dust: data is hard, but maintaining a good store of data can be even more difficult. That’s partly why OpenCyc – an open source variant of the immense structured knowledge base developed by symbolic AI company Cyc – has shut down. “Its distribution was discontinued in early 2017 because such “fragmenting” led to divergence, and led to confusion amongst its users and the technical community generally that that OpenCyc fragment was Cyc. Those wishing access to the latest version of the Cyc technology today should contact to obtain a research license or a commercial license to Cyc itself,” the company writes. (It remains an open question as to how well Cyc is doing. A company formed to commercialize the technology appears to have let its website lapse. I haven’t ever been presented with a compelling and technically detailed example of how Cyc has been deployed. My inbox is open!)

OpenAI bits&pieces:

Inventing language: OpenAI’s Igor Mordatch was interviewed by Canadian radio science program The Spark about his recent work on developing AI agents that learned to invent their own language.

Tech Tales:

[2045: A bunker within a military facility somewhere in the American West.]

The scientists call it the Aluminum Nursery, the engineers call it the FrankenFarm, and the military call it a pointless science project and ask for it to be defunded. But everyone privately thinks the same thing: what the robots are doing is fascinating to the point that no one wants to stop them.

It started like this: three years ago the research institute scattered a hundred robots into a buried, underground enclosure. The enclosure was a large, converted bunker from the cold war, and its ceilings were studded with ultraviolet lights, which cycle on and off throughout the course of each artificial “day”. Each day sees the lights cycle with a specific pattern that can be discerned, given a bit of thought.

To encourage the robots to learn, the scientists gave them one goal in life: to harvest energy from the overhead lights. It only took a few weeks for the robots to crack the first pattern. One robot, operating within its own little computational envelope, was able to figure out the pattern of the lights. When one light turned off, it beamed a message to another robot giving it some coordinates elsewhere in the complex. The robot began to move to that location, and when it arrived the overhead light-cycle ticked over and a light shone down upon it, letting it collect energy.

In this way the robots learned teamwork. Next came specialization: The scientists had built the robots to be modular, with each one able to extend or diminish itself by adding legs, or dextrous manipulators, or additional solar collectors, and so on. After this first example, the robots learned to try to spot the pattern in the sky. After a few more successes, one robot decided to specialize. It made a deal with another robot to gain one of that robot’s cognitive cores, in exchange for one of its legs. This meant when it cracked the pattern it was able to tell the other robot, which moved into position, collected energy, and then traded it with the originating robot. In this way, the robots learned to specialize to achieve their goals.

The scientists made the patterns more complex and the robots responded by making some of themselves smarter and others more mobile.

One day, when the scientists checked the facility, they did a scan and found only 99 robots. After they reviewed the footage they saw that in the middle of the artificial night a group of robots had fallen upon a single one that had been patrolling a rarely visited corner of the facility. In the space of a few minutes the other robots cannibalized the robot they’d ambushed, removing all of its limb, gripper, and sensor modules, and all of its cognition other than a single base ID core. The next day, the robots solved the light pattern after a mere three cycles – something that was close to computationally optimal. Now the scientists have a bet with each other as to how many robots the population will reduce to. “Where is the lower bound?” they ask, choosing to ignore the dead ID core sitting in the enclosure, its standby battery slowly draining away.

Import AI Newsletter 37: Alibaba enters StarCraft AI research, industrial robots take 6.2 human jobs, and Intel bets on another AI framework

Will the neural net doctor please stand up? Diagnosing problems in neural networks is possibly even trickier than debugging traditional software – emergent faults are a fact of life, you have to deal with mental representations of the problem that tend to be quite different to traditional programming, and it’s relatively difficult to visualize and analyze enough of any model to develop solid intuitions about what it is doing.
…There are a bunch of new initiatives to try and fix this. New publication Distill aims to tackle the problem by pairing technically astute writing with dynamic, fiddle-able visual widgets. The recent article on Momentum is a great example of the form. Additionally, companies like Facebook, OpenAI and Google are all trying to do more technical explainers of their work to provide an accompaniment and sometimes expansion on research papers.
But what about explaining neural nets to the people that work on them, while they work on them? Enter ActiVis, a neural network analysis and diagnosis tool built through a partnership between researchers at Georgia Tech and over 15 engineers and researchers within Facebook.
…ActiVis is designed to help people inspect and analyze different parts of their trained model, interactively in the web browser, letting them visually explore the outcome of their specific hyperparameter settings. It allows for both inspection of individual/few neurons within a system, as well as views of larger groups. (You can see an example of the user interface on page 5 of the research paper (PDF).) You don’t know what you don’t know, as they say, and tools like this may help to surface unsuspected bugs.
… The project started in 2016 and has been continuously developed since then. For next steps, the researchers plan to extend the system to visualize the gradients, letting them have another view of how data sloshes in and out of their models.
…Another potential path for explanations lies in research that gets neural network models to better explain their own actions to people, like a person narrating what they’re doing to an onlooker, as outlined in this paper: Rationalization: A Neural Machine Translation Approach to Generating Natural Explanations.

Each new industrial robot eliminates roughly 6.2 human workers, according to an MIT study on the impact of robot automation on labor. Robots and Jobs: Evidence from US Labor Markets (PDF).

What does AI think about when it thinks about itself, and what do we think about when we think about AI?: a long-term research problem in AI is how to effectively model the internal state of an emergent, alien intelligence. Today’s systems are so crude that this is mostly an intellectual rather than practical exercise, but scientists can foresee a future where we’ll need to have better intuitions about what an AI is thinking about…
… that motivated researchers with the Georgia Institute of Technology and Virginia Tech to call for a new line of research into building a Theory of AI’s Mind (ToAIM). In a new research paper they outline their approach and provide a practical demonstration of it.
…the researchers test their approach on Vicki, an AI agent trained on the VQA dataset to be able to answer open-ended questions about the contents of pictures by choosing one of one thousand possible answers. To test how good people are at learning about Vicki and its inner quirks, the researchers evaluate people’s skill at predicting when and how Vicki will fail, or at predicting a possible answer Vicki may give to a question. In a demonstration of the incredible data efficiency of the human mind, volunteers are able to successfully predict the types of classifications Vicki will make after only seeing about 50 examples.
…In a somewhat surprising twist, human volunteers end up doing badly at predicting Vicki’s failures when given additional information that researchers use to diagnose performance, such as a visualization of Vicki’s attention over a scene.
…I’m also interested in the other version of this idea: an AI building a Theory of a Human’s Mind. Eventually, AI systems will need to be good at predicting what course of actions they can take to complement the desires of a human. To do that they’ll need to model us efficiently, just as we model them.

Alibaba enters the StarCraft arena: StarCraft is a widely played, highly competitive real-time strategy game, and many researchers are racing with one another to beat it. Mastering a StarCraft game requires the development of an AI that can manage a complex economy while mounting ever more sophisticated military strikes against opponents. Games can last for anywhere from ten minutes to an hour, and require long-range strategic planning as well as carefully calibrated military and economic unit control.
…the game is motivating new research approaches, as multiple organizations – likely motivated by DeepMind’s announcement last year that it would work with Blizzard to create a new API to use to develop AI within StarCraft – are now racing to crack it.
…Recent publications such as Stabilizing Experience Replay for Deep Multi-Agent Reinforcement Learning  from The University of Oxford and Microsoft Research, Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks from researchers at Facebook AI Research, and now, a StarCraft AI paper from Alibaba and University College London.
… in Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games, the researchers design a new type of network to help multiple agents coordinate with one another to achieve a goal. The BiCNet network has two components: a policy network and a Q-Network. It uses bi-directional recurrent neural networks to give it a form of short term memory and to help individual agents share their state with their allies. This allows for some degree of locally independent actions, while being globally coordinated.
…in tests, the network is able to learn complex multi-agent behaviors, like coordinating moves among multiple units without them colliding, developing “hit and run tactics” (go in for the attack, then run out of range immediately, then swoop in again), as well as learning to attack in coordination from a position of cover. Check out the strategies in this video.
…Research like this might help Chinese companies shake off their reputation for being better at scaling up or applying already-known techniques, rather than developing entirely new approaches.
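…the bidirectional coordination idea can be sketched structurally: sweep over the agent list in both directions so each agent conditions on context from allies on either side. The “cells” below are toy sums of my own devising, not the paper’s recurrent networks:

```python
# Structural sketch of bidirectional state-sharing among agents, the
# idea behind BiCNet: a forward and a backward sweep over the agent
# list give every agent context from allies on both sides, while each
# agent still acts on its own (locally independent) observation.

def bidirectional_pass(agent_obs):
    """Return, per agent, (own obs, context from agents before it,
    context from agents after it)."""
    n = len(agent_obs)
    fwd, acc = [], 0
    for obs in agent_obs:            # forward sweep: left-to-right context
        fwd.append(acc)
        acc += obs
    bwd, acc = [0] * n, 0
    for i in range(n - 1, -1, -1):   # backward sweep: right-to-left context
        bwd[i] = acc
        acc += agent_obs[i]
    return [(agent_obs[i], fwd[i], bwd[i]) for i in range(n)]

states = bidirectional_pass([1, 2, 3])
# The middle agent sees its own observation plus accumulated context
# from the agents on each side of it in the chain.
```

In the real BiCNet the sums are replaced by recurrent cells and the per-agent tuples feed a shared policy and Q-network, but the both-directions information flow is the core trick that lets locally independent agents act in a globally coordinated way.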

Supervised telegram learning: Beating Atari With Natural Language Guided Reinforcement Learning (PDF), from researchers at Stanford shows how to use English sentences to instruct a reinforcement learning agent to solve a task. The approach yields an agent that can attain competitive scores on tough games like Montezuma’s Revenge, and others.
…For now it’s tricky to see the practical value of this approach given the strong priors that make it successful – characterizing each environment and then writing instructions and commands that can be relayed to the RL agent represent a significant amount of work…
…in the future this technique could help people build models for real-world problems where they have access to large amounts of labeled data.

Real Split-Brains: The human hippocampus appears to encode two separate spatial values as memories when a person is trying to navigate their environment. Part of the brain appears to record a rough model of the potential routes to a location – take a left here, then a right, straight on for a bit, and then you’re there, wahey! – and another part appears to be consistently estimating the straight-line distance as the crow flies.
…It’s also hypothesized that the pre-frontal cortex helps to select new candidate routes for people to take, which then re-activates old routes stored in the hippocampal memory…
…Sophisticated AI systems may eventually be built in an architecturally similar manner, with data flowing through a system and being tagged, represented, and referred to differently according to different purposes. (DeepMind seems to think so, based on its Differentiable Neural Computer paper.)
…I’d love to know more about the potential interplay between the representations of the routes to the location, and the representation of the straight line crow distance to it. Especially given the trend in AI towards using actor-critic architectures, and the recent work on teaching machines to navigate the space around them by giving them a memory explicitly represented as a 2D map.

AI development feels a lot like hardware development: hardware development is slow, sometimes expensive, frustratingly unpredictable, and prone to random errors that are hard to identify during the initial phases of a project. To learn more, read this exhaustive tick-tock account from Elaine Chen on ConceptSpring about how hardware products actually get made. Many of the same tenets and stages apply to AI development.

Smart farming with smart drones: Chinese dronemaker DJI has started expanding from consumer drones into other markets as well. The latest? Drones that spray insecticide on crops across China.
…But what if these farming drones were doing something nefarious? Enter the new, commercially lucrative world of DroneVSDrone technology. Startup AirSpace claims its drone defense system can use computer vision algorithms and some mild in-flight autonomy to command a fleet of defense drones that identify hostile drones and automatically fire net-guns at them.

Battle of the frameworks! Deep learning has led to a Cambrian explosion in the number of open source software frameworks for training AIs. Now we’re entering the period where different megacorps pick different frameworks and try to make them a success.
DeepMind WTF++: DeepMind has released Sonnet, another wrapper for TensorFlow (WTF++). The open source library will make it easier for people to compose more advanced structures on top of TF; DeepMind has been using it internally for some time, since it switched to TF a year ago. Apparently the library will be most familiar to previous users of Lasagne. Yum! (Google also has Keras, which sits on top of TF. Come on folks, it’s Google, you knew there’d be a bunch!). Microsoft has CNTK, Amazon has MXNet, Facebook has PyTorch, and now Chainer gets an ally: Intel has settled on… Chainer! Chainer is developed by Japanese AI startup Preferred Networks, and is currently quite well used in Japan but not much elsewhere. Notable user: Japanese robot giant FANUC.

GAN vs GAN vs GAN vs GAN: Generative adversarial networks have become a widely used, popular technique within AI. They’ve also fallen victim to a fate some acronyms suffer – having such a good abbreviation that everyone uses it in paper titles. Enter new systems like WGAN (Wasserstein GAN), STACKGAN, BEGAN, DISCOGAN, and so on. Now we appear to have reached some kind of singularity, as two Arxiv papers appeared in the same week with the same acronym: ‘SeGAN’ and ‘SEGAN’…
…But what does the proliferation of GANs and other generative systems mean for the progress of AI, and how do you measure it? The consensus, based on responses to my question on Twitter, is to test downstream tasks that require these systems as components. Merely eyeballing generated images is unlikely to lead to much. Though I must say I enjoy this CycleGAN approach that can warp a movie of a horse into a movie of a zebra.

JOB: Help the world understand AI progress: The AI Index, an offshoot of the AI100 project, is a new effort to measure AI progress over time in a factual, objective fashion. It is led by Raymond Perrault (SRI International), Erik Brynjolfsson (MIT), Hagar Tzameret (MIDGAM), Yoav Shoham (Stanford and Google), and Jack Clark (OpenAI). The project is in its first phase, during which the Index is being defined. The committee is seeking a project manager for this stage. The tasks involved are to assist the committee in assembling relevant datasets, through both primary research online and special arrangements with specific dataset owners. The position calls for being comfortable with datasets, strong interpersonal and communication skills, and an entrepreneurial spirit. The person would be hired by Stanford University and report to Professor Emeritus Yoav Shoham. The position is for an initial period of six months, most likely at 100%, though a slightly lower time commitment is also possible. Salary will depend on the candidate’s qualifications… Interested candidates are invited to send their résumés to Ray Perrault at

OpenAI bits&pieces:

Hunting the sentiment neuron: New research release from OpenAI in which we discuss finding a dedicated ‘sentiment neuron’ within a large mLSTM trained to predict the next character in a sentence. This is a surprising, mysterious result. We released the weights of the model so people can have a play themselves. Other info in the academic paper. Code: GitHub. Bonus: the fine folks at Hahvahrd have dumped the model into their quite nice LSTM visualizer, so you can inspect its mysterious inner states as well.
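The probe fitted on top of the mLSTM’s hidden states is an L1-regularized logistic regression, which is what lets a single heavily weighted unit pop out. Below is a minimal numpy sketch of that kind of sparse probe on synthetic features – the data, sizes, and which unit carries signal are all invented for illustration; only the probe mirrors the technique:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for mLSTM hidden states: 200 "reviews" x 64 "units",
# where only unit 7 actually carries sentiment. The data and unit index
# are made up for this sketch.
X = rng.normal(size=(200, 64))
y = (X[:, 7] > 0).astype(float)

# L1-regularized logistic regression fit by proximal gradient descent:
# the L1 penalty drives most weights to zero, exposing the few units
# (here, one) that carry the signal.
w = np.zeros(64)
lr, lam = 0.1, 0.01
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))                        # sigmoid
    w -= lr * (X.T @ (p - y) / len(y))                      # gradient step
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold

print(int(np.abs(w).argmax()))  # 7 -- the "sentiment unit" dominates
```

The same recipe applied to real mLSTM states is how a single unit can be isolated and then read off directly as a sentiment score.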

Tech Tales:

[2030: A resource extraction site, somewhere in the rapidly warming Arctic.]

Connectivity is still poor up here, near the cap of the world. Warming oceans have led to an ever-increasing cycle of powerful storms, and the rapid turnover of water into rain strengthens mysterious currents, further mixing the temperatures of the world’s northern oceans. Ice is becoming a fairytale at the lower latitudes.

At the mining site, machines ferry to and fro, burrowing into scraps of earth, their paths defined by a curious flock of surveillance drones and crawling robots. Invisible computer networks thrum with data, and eventually it builds up to the point that it needs to be stored on large, secured hard drives and carried by drone to places with a connection good enough to stream it to a cluster of data centers.

As the climate changes the resources grow easier to access and robots build up the infrastructure at the mining site. Wealth multiplies. In 2028 they decide to construct a large data center on the mining site.

Now, in 2030, it looms, low-slung, a skyscraper asleep on its side, its flanks pockmarked with circular holes containing turbines that cycle air in and out of the system, forever trying to equalize temperatures to cool the hungry servers.

Inside the datacenter there are things that sense the mining site as eyes sense the walls in a darkened room, or as ears hunt the distant sounds of dogs barking. It uses these intuitions to sharpen its vision and improve its hearing, developing a sense of touch as it exchanges information with the robots. After the solar panels are installed, the number of people working on the site falls off in a steep curve. Now the workers are much like residents of lighthouses in the olden days; their job is to watch the site and only intervene in the case of danger. There is very little of that, these days, as the over-watching-computer has learned enough about the world to expand safely within it.

Import AI Newsletter 36: Robots that can (finally) dress themselves, rise of the Tacotron spammer, and the value in differing opinions in ML systems

Speak and (translate and) spell: sequence-to-sequence learning is an almost counter-intuitively powerful AI approach. In Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech, academics show it’s possible to train a large neural network model to listen to audio in one language (Spanish) and automatically translate and transcribe it into another (English). The approach performs well relative to other approaches and has the additional virtue of being (relatively) simple…
…The scientists detect a couple of interesting traits that emerge once the system has been fed enough data. Specifically, “direct speech-to-text translation happens in the same computational footprint as speech recognition – the ASR and end-to-end ST models have the same number of parameters, and utilize the same decoding algorithm – narrow beam search. The end-to-end trained model outperforms an ASR-MT cascade even though it never explicitly searches over transcriptions in the source language during decoding.”
Read and speak: We’re entering a world where computers can convincingly synthesize voices using neural networks. First there was DeepMind’s WaveNet, then Baidu’s Deep Voice, and now courtesy of Google comes the marvelously named Tacotron. Listen to some of the (freakily accurate) samples, or read some of the research outlined in Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model. Perhaps the most surprising thing is how the model learns to change its intonation, tilting its pitch up at the end of a sentence if it ends with a question mark.

Politeness can be learned: Scientists have paired SoftBank’s cute Pepper robot with reinforcement learning techniques to build a system that can learn social niceties through a (smart) trial and error process.
…The robot is trained via reinforcement learning and is rewarded when people shake its hand. In the process, it learns that behaviors like looking at a person or waving at them can encourage them to approach and give it a hand shake as well.
…It also learns to read some very crude social cues, as it is also given a punishment for attempting handshakes when none are wanted…
…You can read more about this in ‘Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning’.

Thirsty, thirsty data centers: Google wants to draw up to 1.5 million gallons of water a day from groundwater supplies in Berkeley County to cool its servers – three times as much as the company’s current limit.

Facebook’s Split-Brain Networks: new research from Facebook, Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play (PDF), presents a simple technique to let agents learn to rapidly explore and analyze a problem, in this case a two-dimensional gridworld…
… the way it works is to have a single agent which has two distinct minds, Alice and Bob. Alice will perform a series of actions, like opening a specific door and traveling through it, then will have Bob perform the action in reverse, traveling back to the door, closing it, and returning to Alice’s start position.
…this gives researchers a way to have the agent teach itself an ever-expanding curriculum of tasks, and encourages it to learn rich representations of how to solve the tasks by having it reverse its own actions. This research is very early and preliminary, so I’ll be excited to see where Facebook take it next.
…This uses a couple of open source AI components. Specifically, MazeBase and RLLab.
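As I read the paper, the engine of the automatic curriculum is the reward structure: Bob is penalized for every step he takes, while Alice is rewarded in proportion to how much longer Bob takes than she did, so Alice learns to pose tasks just beyond Bob's current ability. A sketch of that scheme, where the scaling constant and the episode cap for failed attempts are my assumptions about reasonable values:

```python
def self_play_rewards(t_alice, t_bob, t_max, gamma=0.01):
    """Alice/Bob reward scheme in the spirit of asymmetric self-play:
    Bob is penalized per step he takes; Alice is rewarded when Bob is
    slower than she was. gamma (scaling) and t_max (the episode limit
    charged to Bob if he fails) are illustrative assumptions."""
    t_bob = min(t_bob, t_max)                   # failed episodes count as t_max
    r_bob = -gamma * t_bob                      # Bob: finish as fast as possible
    r_alice = gamma * max(0, t_bob - t_alice)   # Alice: pose tasks Bob finds hard
    return r_alice, r_bob

# Alice takes 5 steps to set a task; Bob needs 20 steps to reverse it,
# so Alice earns a positive reward for finding a challenging task.
print(self_play_rewards(5, 20, t_max=80))
```

Because Alice earns nothing for tasks Bob solves as fast as she set them, her incentive constantly shifts toward the frontier of what Bob can currently do – which is the self-generated curriculum.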

New semiconductor substrates for your sparse-data tasks: DARPA has announced the Hierarchical Verify Identify Exploit (HIVE) program, which seeks to create chips to support graph processing systems 1000X more efficient than today’s systems. The proposed chips (PDF) are meant to be good for parallel processing and have extremely fast access to memory. They plan to create new software and hardware systems to make this possible.

What’s up (with my eye), doc? How AI can still keep the human touch: new research paper from Google shows how to train AI to use the opinions of multiple human experts when coming up with its own judgements about some data…
… in this case, the Google researchers are attempting to use photos of eyes to diagnose ‘diabetic retinopathy’ – a degenerative eye condition. In the paper Who Said What, Modeling Individual Labelers Improves Classification, the scientists outline a system that is able to use multiple human opinions to create a smarter AI-based diagnosis system…
…Typical machine learning approaches are fed a large dataset of eye pictures, with labels made by human doctors. Typically, an ML approach would average the ratings of multiple doctors for a single eye image, creating a combined score. This, while useful, doesn’t capture the differing expertise of different doctors. Google has sought to rectify that with a new ML approach that lets it use the multiple ratings per image as a signal to improve overall accuracy of the system.
…“Compared to our baseline model of training on the average doctor opinion, a strategy that yielded state-of-the-art results on automated diagnosis of DR, our method can lower 5-class classification test error from 23.83% to 20.58%, a relative reduction of 13.6%,” they write…
…in other words, the variety of opinions (trained) humans can give about a given subject can be an important signal in itself.
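One simple way to realize the "model each labeler" idea is a shared trunk with one classification head per doctor, averaging the heads' predictions at test time. The sketch below is my guess at that structure in PyTorch – the tiny MLP trunk and all sizes are placeholders, not the paper's image network:

```python
import torch
import torch.nn as nn

class PerDoctorHeads(nn.Module):
    """Sketch of modeling individual labelers: a shared feature trunk plus
    one softmax head per doctor. In training, each head would be fit to
    its own doctor's grades; at test time the heads' predictions are
    averaged into a single diagnosis."""
    def __init__(self, in_dim=64, n_doctors=8, n_classes=5):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(32, n_classes) for _ in range(n_doctors))

    def forward(self, x):                   # x: (batch, in_dim)
        h = self.trunk(x)
        # (batch, n_doctors, n_classes): one probability row per doctor
        per_doctor = torch.stack(
            [head(h).softmax(dim=-1) for head in self.heads], dim=1)
        return per_doctor.mean(dim=1)       # averaged test-time prediction

model = PerDoctorHeads()
avg_probs = model(torch.randn(4, 64))
print(avg_probs.shape)                      # torch.Size([4, 5])
```

The key contrast with the baseline is that disagreement between doctors survives training as distinct heads, rather than being averaged away in the labels before the model ever sees it.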

Finally, a robot that can dress itself without needing to run physics simulations on a gigantic supercomputer: Clothes are hard, as everyone who has to get dressed in the morning knows. They’re even more difficult for robots, which have a devil of a time reasoning about the massively complex physics of fabrics and how they relate to their own metallic bodies. In a research paper, Learning to Navigate Cloth Using Haptics, scientists from the Georgia Institute of Technology and Google Brain outline a new technique to let a robot perform such actions. It works by decomposing the gnarly physics problem into something simpler: letting the robot represent itself as a set of ‘haptic sensing spheres’. These spheres sense nearby objects and let the robot break down the problem of putting on or taking off clothes into a series of discrete steps performed over discrete entities…
…The academics tested it in four ways, “namely a sphere traveling linearly through a cloth tube, dressing a jacket, dressing a pair of shorts and dressing a T-shirt.” Encouraging stuff…
…components used: the neural networks were trained using Trust Region Policy Optimization (TRPO). A PhysX cloth simulator was used to compute the fabric forces. Feedback was represented as a multilayer perceptron network with two hidden layers, each consisting of 32 hidden units.
…bonus: check out the Salvador Dali-esque videos of simulated robots putting on simulated clothes!
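For concreteness, a feedback network of the reported shape – a multilayer perceptron with two hidden layers of 32 units each – might look like the following in PyTorch. The input/output sizes and the tanh nonlinearity are my assumptions, and TRPO itself (which would update these weights under a KL-divergence trust-region constraint) is omitted:

```python
import torch
import torch.nn as nn

# MLP of the reported shape: two hidden layers of 32 units each.
# Observation and action dimensions are placeholders, not the paper's.
policy = nn.Sequential(
    nn.Linear(24, 32), nn.Tanh(),   # haptic-sphere observations in (size assumed)
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 6),               # action outputs (size assumed)
)

obs = torch.randn(1, 24)
action = policy(obs)
print(action.shape)                 # torch.Size([1, 6])
```

A network this small underlines the paper's point: once the cloth problem is decomposed via haptic spheres, the policy itself can be tiny.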

Import AI administrative note: Twitter threading superstar of the week! Congratulations to Subrahmanyam KVJ, who has mastered the obscure-yet-important art of twitter threading, with this comprehensive set of tweets about the impact of AI.

Personal Plug Alert:

Pleased to announce that a project I initiated last summer has begun to come out. It’s a series of interviews with experts about the intersection of AI, neuroscience, cognitive science, and developmental psychology. First up is an interview with talented stand-up comic and neural network pioneer Geoff Hinton. Come for the spiking synapse comments, stay for the Marx reference.

OpenAI bits&pieces:

DeepRL knowledge, courtesy of the Simons Institute: OpenAI/UCBerkeley’s Pieter Abbeel gave a presentation on Deep Reinforcement Learning at the Simons Institute workshop on Representation Learning. View the video of his talk and those of other speakers at the workshop here.

Ilya Sutskever on Evolution Strategies: Ilya gave an overview of our work on Evolution Strategies at an MIT Technology Review conference. Video here.

Tech Tales

[2025: The newsroom of a financial service, New York.]

“Our net income was 6.7 billion dollars, up three percent compared to the same quarter a year ago, up two percent when we take into account foreign currency effects. Our capital expenditures were 45 billion during the quarter, a 350 percent jump on last year. We expect to sustain or increase capex spending at this current level-” the stock starts to move. Hundreds of emails proliferate across trading terminals across the world.
The spiel continues and the stock starts to spiral down, eventually finding a low level where it is buffeted by high-frequency trading algorithms, short sellers, and long bulls trying to nudge it back to where it came from.

By the time the Q&A section of the earnings call has come round people are fuming. Scared. Worried. Why the spending increase? Why wasn’t this telegraphed earlier? They ask the question in thirty different ways and the answers are relatively similar. “To support key strategic initiatives.” “To invest in the future, today.”

Finally, one of the big analysts for the big mutual funds lumbers onto the line. “I want to speak to the CFO,” they say.
“You are speaking to the CFO.”
“The human one, not the language model.”
“I should be able to answer any questions you have.”
“Listen,” the analyst says via a separate private phoneline, “We own 17 percent of the company. We can drop you through the floor.”
“One moment,” says the language model. “Seeking availability.”

Almost an hour passes before the voice of the CFO comes on the line. But no one can be sure if their voice is human or not. The Capex is for a series of larger supercomputer and power station investments, the CFO says. “We’ll do better in the future.”
“Why wasn’t this telegraphed ahead of the call?” the analysts ask, again.
“I’m sorry. We’ll do better in the future,” the CFO says.

In a midtown bar in New York, hours after market close, a few traders swap stories about the company, mention that they haven’t seen an executive in the flesh “in years”.