Import AI

Import AI: Issue 49: Studying the crude psychology of neural networks, Andrew Ng’s next move, and teaching robots to grasp with DexNet2

by Jack Clark

Interdisciplinary AI: Unifying human and machine thought through psychological studies of deep neural nets:
…a paper from DeepMind sees the company’s researchers probe neural networks (specifically, ‘Inception’ and Matching Networks) for biases.
…They discover that when they present these image classification networks with new, never-before-seen images, the networks have a tendency to apply the label for the new image to other similarly shaped images, in preference to ones with similar color, texture, or size. This is the same phenomenon that psychologists observe in humans.
…but what does it mean? Mostly it’s an encouraging sign that these sorts of techniques that we’ve used to analyze humans have something to offer when analyzing neural networks, paving the ground for future studies. “As a good cognitive model should, our DNNs make testable predictions about word-learning in humans. Specifically, the current results predict that the shape bias should vary across subjects as well as within a subject over the course of development. They also predict that for humans with adult-level one-shot word learning abilities, there should be no correlation between shape bias magnitude and one-shot-word learning capability,” the authors write.
Read more in the paper ‘Cognitive Psychology for Deep Neural Networks’.
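…To make the shape-bias idea concrete, here is a minimal sketch of how such a probe could be scored. It assumes each trial is a triple of feature vectors (a probe image, a shape match, and a color match) and that "applying the label" can be approximated by cosine similarity in feature space. The function names and the toy vectors are made up for illustration and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shape_bias(trials):
    """Fraction of trials where the probe sits closer to the shape match.

    trials: list of (probe, shape_match, color_match) feature vectors.
    A value near 1.0 means the model mostly generalizes by shape.
    """
    shape_wins = sum(
        1
        for probe, shape_match, color_match in trials
        if cosine(probe, shape_match) > cosine(probe, color_match)
    )
    return shape_wins / len(trials)


# Toy, hand-written features: the first two dimensions loosely stand for
# "shape" and the last two for "color". These numbers are illustrative only.
trials = [
    ([1.0, 0.0, 0.2, 0.0], [1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.2, 0.0]),
    ([0.0, 1.0, 0.0, 0.3], [0.0, 1.0, 0.3, 0.0], [1.0, 0.0, 0.0, 0.3]),
    ([0.1, 0.1, 1.0, 0.0], [0.1, 0.1, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]),
]
print(shape_bias(trials))
```

In a real experiment the vectors would come from the network under study (for example, a late layer of an image classifier), and the same score could be tracked across training runs or checkpoints to see how the bias varies.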

Neural net libraries for everyone! Sony steps up.
…As companies base more of their long-term corporate strategy around being leaders in AI, some are seeking to create the essential (software) picks and shovels to be used by other AI developers. Currently, Google’s frameworks TensorFlow and Keras are becoming popular among developers, while Amazon (MXNet), Microsoft (CNTK) and Facebook (PyTorch) are all seeking to gain some developer enthusiasm with their own frameworks.
…It’s already a busy ecosystem and now it’s getting busier as companies like Samsung and, now, Sony, design their own frameworks and supporting tools. Much like how smartphones have solidified around a few basic apps (WeChat / WhatsApp / FB / Google / YouTube / etc), it seems likely developers will home in on a few choice AI frameworks. The question is whether it’s too late to start another one. One thing’s for sure – Sony’s decision to give its framework the generic, un-googlable name ‘Neural Network Libraries’ is unlikely to help.

And the award for most obvious name for a startup goes to… Andrew Ng!
…Andrew Ng, a former AI whiz at Baidu, Google, and Stanford, has finally revealed the name of his new startup: deeplearning.ai. Creative name. I’ve heard some rumors it relates to education, but that could just be people making assumptions based on Ng’s history with Coursera.

McDonald’s plots mass automation via 2,500 robot kiosks:
Fast food company McDonald’s plans to upgrade 2,500 restaurants this year to include a robot kiosk, automating the ordering process for people. McDonald’s says this lets its staff concentrate on providing better service and notes that locations which already host an automated kiosk have better sales than those that don’t. My intuition is this could become another ‘ATM example’ regarding automation, where McDonald’s will continue to grow aggregate employment long after the introduction of the automation technology (in this case, the kiosk.) However, rolling this out may serve to put a ceiling on (human) wages.

Silicon Valley TV goes all in on AI:
…Tim Anglade of HBO’s Silicon Valley has written up a few of the technical details behind the show’s Not HotDog app, which uses the combined might of the smartphone and AI ecosystem – representing literally billions of dollars of investment to date – to produce software that tells you if your phone is looking at a hotdog or not. We live in amazing times. I think that the emergence of jokey or playful applications of a technology is usually a sign of its broader maturation and adoption, so the arrival of this app seems to herald good things for AI.
…One observation made by app developer Tim Anglade is that the modern AI ecosystem moves so quickly it’s unlike other technical communities. “With less than a month to go before the app had to launch we endeavored to reproduce the paper’s results. This was entirely anticlimactic as within a day of the paper being published a Keras implementation was already offered publicly on GitHub by Refik Can Malli, a student at Istanbul Technical University, whose work we had already benefited from when we took inspiration from his excellent Keras SqueezeNet implementation. The depth & openness of the deep learning community, and the presence of talented minds like R.C. is what makes deep learning viable for applications today — but they also make working in this field more thrilling than any tech trend we’ve been involved with,” he writes.
…I exchanged a few emails with Tim about the project. He notes that data collection was a tricky part of the project and – sorry readers – he didn’t stumble across any magical way to ease this process. “Honestly there was just a lot of manual download of images, or checking images I already had (such as my own vacation/food pictures). It took days upon days,” he writes. “In that respect I think Dinesh’s experience in the show staring at “penile imagery” for days on end quite accurately reflects my plight for much of the project.”
…Tim says (emphasis mine) his experience developing the app has led him to fall in love with the AI community. He’s now busily working away on some other future projects. “I think A.I. can have an otherworldly sort of quality, where it both seems too good to be true, but it’s also flawed in a way that can be charming, disarming — or just plain human,” he says.

Who said what and when and to whom? State-of-the-art results on semantic role labeling:
…Researchers with the University of Washington, Facebook, and the Allen Institute for Artificial Intelligence have come up with a system that gets state of the art results on semantic role labeling, a natural language processing task that challenges AI systems to “recover the predicate-argument structure of a sentence to determine essentially ‘who did what to whom’, ‘when’, and ‘where’”.
…One thing that’s notable about this system is there isn’t a single killer idea; instead it uses a collection of best practices and new components like highway nets and recurrent dropout, which were developed originally by other researchers for other purposes.
…Results: State of the art results on the CoNLL 2005 dataset across recall, precision, and other measures. Similarly good results on the CoNLL 2012 dataset.
…Components used: Highway connections, recurrent dropout
A nice surprise: My intuition is that scientists within AI are starting to spend more time in their papers analyzing the precise ways in which systems fail. This paper is a good example of this encouraging trend, containing an extensive ablation study where they strip out different components of the network in an attempt to better figure out which parts contribute which elements to its learning. More of this, please!
You can read more in Deep Semantic Role Labeling: What Works and What’s Next.

Multi-Modal Driving:
Waymo’s cars are outfitted with microphones to let them hear the sirens of emergency vehicles, helping them learn when to pull over safely.
…Elsewhere, Volvo’s own self-driving cars can identify deer, elk, and caribou, but have a hard time responding to kangaroos due to their idiosyncratic bouncy gait.

Where we are with AI development, with Fei-Fei Li.
…”We’re entering a new phase but there is a long way to go”, said Google/Stanford’s Fei-Fei Li about the current state of AI research at the ACM Turing Awards last month, before paraphrasing British Prime Minister Winston Churchill to note that AI development is not at the beginning of the end, but rather at the end of the beginning.
…Afterwards, I caught up with Fei-Fei briefly and asked her what kind of metric might supersede ImageNet for measuring the effectiveness of visual classifiers (ImageNet is being retired after this year’s competition as we’ve started to over-fit the dataset). She suggested that the vision community is going in a number of different directions and we may be entering a period where there isn’t a single, simple metric we can pick. Fei-Fei was very clear that “vision is not solved” and instead there are numerous datasets out there – some of varying levels of complexity and some, like VQA, at or beyond the limit of current techniques – that could be good candidates for the next phase of measurement.
…This maps to my own understanding of the space – instead of simply measuring the ability to pick an object out of a photo we’re now moving onto the harder (and potentially more fruitful) problems of labeling, segmentation, disentanglement, inference about relationships, and so on.

Reach out and touch shapes: UC Berkeley researchers release Dex-Net 2.0
…How can we teach computers to easily grasp objects, even novel ones? That’s a question researchers have been grappling with for decades. Recently, some groups have turned to neural networks as an answer, trying to give computers the ability to approximate the specific function to grip a specific thing. Google has experimented with fleet learning robots picking up and grasping real world things to do this, letting them learn in an unsupervised way how to pick up and put down objects.
…UC Berkeley has its own (supervised) spin on it. Last week the UC Berkeley AUTOLAB released Dex-Net 2.0, a dataset of 6.7 million training examples to help researchers teach computers how to get a grip on reality.
…”The key to Dex-Net 2.0 is a hybrid approach to machine learning Jeff Mahler and I developed that combines physics with Deep Learning. It combines a large dataset of 3D object shapes, a physics-based model of grasp mechanics, and sampling statistics to generate 6.7 million training examples, and then using a Deep Learning network to learn a function that can rapidly find robust grasps when given a 3D sensor point cloud. It’s trained on a very large set of examples of robust grasps, similar to recent results in computer vision and speech recognition,” says UC Berkeley professor Ken Goldberg.
Find out more about Dex-Net 2.0 (and its predecessor) on the official project page.

Competition grows in machine translation:
Amazon Web Services plans to soon launch a machine translation service, according to CNBC. This aligns with some of Amazon’s recent research requests, including robust, distributed translation systems that can learn from small amounts of user feedback.
…Amazon’s service will sit alongside similar offerings from Microsoft, Google and IBM. AI seems like the next technology around which cloud providers will compete as they seek to offer increasingly higher-order abstractions and services on top of their world-spanning fleets of computers.

The Geography of AI will be defined by regulation – or the lack of it:
Fun article in BusinessWeek about Starsky Robotics, a company that employs blue collar truck drivers and elite AI coders who work together to create automated trucks that drive on highways, which are then remotely piloted around towns by traditional drivers working from remote operations centers.
…Self-driving will be defined partially by where it gets developed, so it’s of note that some states, such as Florida, have taken particularly permissive and loose approaches to regulation in this area, while others including California have been somewhat harsher.
The Geography of the world will be defined by AI – or the lack of it:
…One interesting tidbit in the article is the idea that, if the company is successful, it could prompt the creation of “climate-controlled “driver centers,” in towns like Jacksonville, where people like Runions will work regular shifts in front of computers, without the greasy food or loneliness that has traditionally gone along with being a trucker.”
…Which raises the question – what happens to the vast exurban ecosystem of truckstops, drive-thrus, and so on that cater to drivers? How will cities change in response to providing services for these stay-at-console truckers, and how will small towns whose economies are built around being on trucking routes fare in this new world?
…”I can tell the difference between a dead porcupine and a dead raccoon, and I know I can hit a raccoon, but if I hit a porcupine, I’m going to lose all the tires on the truck on that side,” says Tom George, a veteran driver who now trains other Teamsters for the union’s Washington-Idaho AGC Training Trust. “It will take a long time and a lot of software to program that competence into a computer.”

OpenAI Bits & Pieces:

Free tools: mujoco-py, an open source Python library to make it easier to simulate and experiment with the (proprietary, license required) MuJoCo physics engine. Bonus: psychedelic robot gif!

Tech Tales:

[ 2024: An Internet cafe in South East Asia. ]

No, you say, watching the price of $BRAIN_COIN plummet from highs down to crushing lows. No no no no no. Rumors of your death swirling on the internet. Fake news about a hijacking. Videos of regulators saying that your currency is under investigation, that the treasury department has a warrant out for your arrest, that George Soros has reversed his position on the cryptocurrency and is liquidating assets. No, no, no, you say, until someone sitting next to you in the cybercafe shushes you, unaware you just went from being a billionaire to a several-hundred millionaire.

All fake, of course. Propaganda dreamed up by the (sometimes automated) marketing departments behind other currencies seeking to sow doubt and confusion, creating enough questions to make people suspicious and thereby manipulate the price of the currencies. The question is how to fight back. How can you send information out into the world that people will actually believe?

And the whole time the currency, your baby; digital scrip designed to form the bedrock of a marketplace between AIs, trading the currency with each other in exchange for influence, is being rocked to and fro by waves of automated propaganda, dreamed up and sent out by bots around the world. You record a video of yourself holding up a copy of today’s newspaper, having scrawled a long string of numbers on its front that come from the currency. People don’t believe you. “Oh this can very easily be faked,” writes some internet denizen. “Has all the hallmarks of a synthjob – the slight wetness around the eyes, the blur on some of the zoomed in skin pores, the folds on the newspaper. Ridiculous they think we’d believe this.”

So to truly verify yourself you must pair off with a livestreamer: someone with sufficient fame to have a real audience that, on seeing you hanging out with the celeb in real life, will enthusiastically photograph and write about the encounter for their own followers – proof by association. So that’s how you end up walking down the touristy part of Bangkok with a flavor-of-the-month e-celeb, posing for photos taken by numerous fans, all providing an expanding galaxy of hard-to-fake coincidental evidence that you are truly alive. You even forgive the celebrity for mispronouncing the name of the currency, twice, as BRANCOIN and BRAINDOLLAR, while testifying to its merits.

After a day or so the price of the currency recovers, despite conspiracy theories to the contrary that the e-celeb and their fans are fake as well. How long till then, you wonder. How long till even this isn’t enough?

Technologies that inspired this story: generative adversarial networks, synthetic text/speech/vision, social media, Vitalik Buterin (Ethereum)

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to david@amplifypartners.com.

Import AI: Issue 48: Learning language in the third dimension; how AI may lead to war, inequality, or stagnation; AI and Art researchers team-up to create CANs

by Jack Clark

Extremely freaky and incredibly cool AI art:
This eerie AI experiment mashes up Mike Tyka’s recent work on generating fully synthetic faces with AI, with a technology called Deep Warp to let the eyes of the synthetic person follow your cursor. The effect is perturbing and cool! More of these AI mashups, please.
You can see the experiment here.

NIPS 2017, by the numbers:
…3,297: # of NIPS 2017 research papers submitted
…~2,500: # of NIPS 2016 research papers submitted
…3,240: # of research papers cleared for review (some violated policies and others were withdrawn by submitters.)
…183: # of area chairs charged with overseeing these papers.
…New: NIPS, in keeping with the current boom in deep learning, has “added one more layer” to its reviewing structure. Senior area chairs (human) will help to further calibrate the decisions made by individual area chairs (much like a layer in a neural network, though with more coffee and swearing.)
More information in this Google Doc from the courageous NIPS program co-chairs.

Why some industries may adopt AI slowly…
Biology eats all the code around me
Despite software leading to rapid gains in our ability to simulate and run experiments on complicated processes, there are some things we struggle with. Real life is one of them. Reality is built on a kind of fizzing underlay of chaos and fusing our computer systems with it tends to be difficult.
…”Instead of “software eats biotech”, the reality of drug discovery today is that biology consumes everything,” writes Life Sci VC in a great post to remind us of the difficulty of some fundamental domains.
…”The primary failure mode for new drug candidates stems from a simple fact: human biology is massively complicated. Drug candidates interfere with the wrong targets or systems leading to bad outcomes (“off-target” toxicity),” they write.
Read the whole post here.

Language learning goes into the third dimension:
Today, many groups are trying to teach agents to develop language in a way that is uniquely tied to the environment they exist in. This is because of a growing intuition among researchers that simply getting an agent to learn about text by studying large corpuses of it is insufficient to develop AIs with a rounded commonsense understanding of the world – instead, groups are teaching agents to tie words to their environment, letting them develop an intuitive understanding of what, say, “big” or “heavy” or “far away” might mean. Some of these projects have yielded agents with a language which must be translated into English. Other groups are trying to teach their agents English from the ground-up, expanding the agents’ capabilities over time via curriculum learning.
…Now, separate research projects from CMU and DeepMind show a way to push this project into the third dimension, with new papers that teach agents complex language in rich, 3D environments.
Components: DeepMind Lab (DeepMind, a customized/proprietary version of an earlier open source release based on Quake), ViZDoom (CMU, an open source 3D simulator based on Doom).
…Paper: Gated-Attention Architectures for Task-Oriented Language Grounding (CMU). (Notably, the last author is Ruslan Salakhutdinov, who splits his time between CMU and Apple.)
…Approach: The approach taken by CMU researchers is to construct a modular neural network to let the agents complete tasks that require both an understanding of text and vision. To do this, they use a standard convolutional neural network block to interpret vision and a Gated Recurrent Unit to process the text. They then take these representations and combine them via what they call a Gated Attention multi-modal learning layer, which cleverly merges the different representations into a unified set of features. What you wind up with is an agent that can naturally learn to combine the text you feed it with its images of the world, then acts in the world using this single representation.
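…A rough sketch of the gating idea, for intuition: the text representation is squashed into one gate value per image feature channel, and each channel is scaled by its gate. This is a simplified, dependency-free illustration under my own assumptions (a single linear layer plus sigmoid, hand-picked weights, tiny 2x2 feature maps); the parameter names are hypothetical and it is not the authors' code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(feature_maps, text_embedding, weights, biases):
    """Scale each image feature channel by a gate computed from the text.

    feature_maps:   C channels, each an H x W nested list (CNN output).
    text_embedding: length-D vector (e.g. the final GRU state).
    weights:        C x D matrix, biases: length-C vector. Both stand in
                    for learned parameters and are hypothetical here.
    Returns (gated_feature_maps, gates).
    """
    # One gate in (0, 1) per channel, driven only by the instruction text.
    gates = [
        sigmoid(sum(w * t for w, t in zip(row, text_embedding)) + b)
        for row, b in zip(weights, biases)
    ]
    # Broadcast each gate over its channel's H x W grid.
    gated = [
        [[gate * value for value in line] for line in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return gated, gates


# Toy example: two channels, and an instruction embedding that switches
# channel 0 on and channel 1 (almost) off.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
text_embedding = [1.0, 0.0]
weights = [[10.0, 0.0], [-10.0, 0.0]]
biases = [0.0, 0.0]

gated, gates = gated_attention(feature_maps, text_embedding, weights, biases)
print(gates)
print(gated)
```

The appeal of this kind of multiplicative gating is that the instruction can pick out which visual features matter (say, the channel that responds to "green" things) before any policy layers see them.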
DeepMind uses a similar technique (with some bells and whistles based around ideas present in their UNREAL paper of last year) to create agents that learn curriculums of entangled links of words and objects and generalize instantly (zero-shot adaptation) to previously unseen combinations of words or objects. The addition of auxiliary goal identification and acquisition helps learning via letting the agent create autoregressive objectives which help it model its surroundings.
Paper: Grounded Language Learning in a Simulated 3D World (DeepMind).

Matlab gets a free visualization upgrade:
…MIT researchers have created mNeuron, a free plug-in for popular math software Matlab. The plug-in visualizes neurons in neural networks and has support for Caffe and matconvnet.
…Come for the potentially useful tool for interpretability, stay for the ‘tessellation art’ technique that lets you take the visualizations of a single neuron and extend it into a large, repeating tapestry.
Keras gets a viz plugin as well:
Easy-to-use AI framework Keras also has its own visualization ecosystem. One handy tool looks to be Keras-vis, a toolkit for visualizing saliency maps, activation maximization, and class activation maps in models.

Amazon reveals its (many) AI priorities with Amazon Research Awards:
…Amazon has published a call for proposals for its Amazon Research Awards and it is willing to fund proposals to the tune of, at most, $80,000 in cash and $20,000 in Amazon Web Services promotional cloud credits.
The research: What’s most of note is the broad set of research areas Amazon is seeking proposals for – and some of them are particularly germane and specific to its work.
Notable research focus areas: …Apparel similarity…Personalization using personal knowledge base…advances in methods for estimating machine translation quality at run time…synonym and hypernym generation for eCommerce search…simulation of sensing and grasping for object manipulation, and so on.
Read more on the Amazon Research Awards page here.

Interdisciplinary Research: Automated Artists via Creative Adversarial Networks:
Researchers have tweaked a generative adversarial network so that it can be used to create synthetic artwork that feels more coherent and human than stuff we could previously generate.
…The approach, Creative Adversarial Networks (CANs), was outlined in a wonderfully interdisciplinary paper from researchers at Rutgers, Facebook, and the Department of Art History at the College of Charleston, South Carolina.
…CANs work somewhat like a generative adversarial network, except the discriminator now gives two signals back to the generator instead of one. First, it feeds back whether something qualifies as art (a discrimination based on it being pre-fed a large corpus of art). Second, it gives a signal about how well it can classify the generator’s sample into an exact style.
…”If the generator generates images that the discriminator thinks are art and also can easily classify into one of the established styles, then the generator would have fooled the discriminator into believing it generated actual art that fits within established styles,” explain the authors.
…So, how good are the samples? The researchers carried out a quantitative evaluation where they showed human subjects (via mechanical turk) sets of paintings generated by, respectively, CANs, DCGAN, and via humans (across two sets: Abstract Expressionist and Art Basel 2016.)
Results: Human evaluators thought CAN images were generated by a human 53% of the time, versus 35% for DCGAN (and 85% for the human-generated abstract expressionist set).
…You can read more in the paper: “CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms”. I rather liked some of the samples, reminiscent of Kandinsky via Pollock via Mondrian.
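…For the curious, here is a simplified sketch of how the two discriminator signals could be folded into one generator loss. It assumes, as the paper's title suggests, that the generator is rewarded for samples that look like art while being hard to pin to a single style. The style term below is a plain cross-entropy against a uniform distribution over styles, which is my simplification rather than the paper's exact objective, and the function names are made up.

```python
import math


def style_ambiguity(style_probs):
    """Cross-entropy between a uniform target and the style posterior.

    style_probs: discriminator's probabilities over K styles (sum to 1).
    The value is smallest (log K) when the posterior is uniform, i.e. when
    the discriminator cannot tell which style the sample belongs to.
    """
    k = len(style_probs)
    return -sum((1.0 / k) * math.log(p) for p in style_probs)


def generator_loss(p_art, style_probs):
    """Combine both signals: look like art, but do not fit one style.

    p_art: discriminator's probability that the sample is art.
    """
    art_term = -math.log(p_art)  # small when the sample passes as art
    return art_term + style_ambiguity(style_probs)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

# Same "is it art?" score, different style confidence.
print(generator_loss(0.9, uniform))
print(generator_loss(0.9, peaked))
```

Under this sketch a sample the discriminator confidently files under one style is penalized more than one that leaves it undecided, which is the "deviating from style norms" part of the idea.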

Google & DHS
sitting in a tree
…S.C.R.E.E.N.I.N.G!
…Google and the Department of Homeland Security have teamed up (via recent Google acquisition Kaggle) to create a competition to get data scientists to create algorithms to identify concealed items in images gathered by checkpoint body scanners.
…Total prize money: $1.5 million
…Sad trombone: Only US citizens or permanent residents can actually win money in this competition (though everyone can participate), somewhat going against the free-wheeling egalitarian nature of Kaggle.
More information on the competition on its Kaggle page here.

AI == War?:
…Alibaba chairman Jack Ma worries that artificial intelligence could lead to a third world war. …”The first technology revolution caused World War 1,” he told CNBC’s David Faber. “The second technology revolution caused World War II. This is the third technology revolution.”
AI == Inequality?:
…Chairman of VC firm Sinovation Ventures (and former head of Google China) Kai-Fu Lee, writing in the New York Times opinion pages, says “the A.I. products that now exist are improving faster than most people realize and promise to radically transform our world, not always for the better. They are only tools, not a competing form of intelligence. But they will reshape what work means and how wealth is created, leading to unprecedented economic inequalities and even altering the global balance of power.”
…”Unlike the Industrial Revolution and the computer revolution, the A.I. revolution is not taking certain jobs (artisans, personal assistants who use paper and typewriters) and replacing them with other jobs (assembly-line workers, personal assistants conversant with computers). Instead, it is poised to bring about a wide-scale decimation of jobs — mostly lower-paying jobs, but some higher-paying ones, too.”
AI == An Amazing World, (if we make some changes)?:
…Michael Bloomberg says automation poses many risks to society but some of these can be remediated with policy changes. Health care should not be tied to employment, he says (a step taken by many Northern European and other countries already); governments should contemplate creating direct employment programs (as the US did with the New Deal back in a more optimistic time); benefits should be altered to subsidize low-income earners, potentially via the Earned Income Tax Credit; and other ideas.
…”To spread the benefits of the age of automation far and wide, we’ll need more cooperation among government, business, education, and philanthropic leaders,” he writes in a column in, naturally, Bloomberg BusinessWeek.

What happens if only a few industries automate themselves too rapidly?
AI is going to bring about more opportunities for automation. The multi-trillion dollar question is how rapidly different industries will automate and what the aggregate effect will be. That relates to some of the issues the above people have been grappling with.
…I worry that there’s a way that uptake of AI can lead to pretty adverse effects. In the 20th century America went through a couple of revolutions, with both agriculture and manufacturing undergoing mass automation, leading to a significant reduction in their share of the overall economy.
…This was broadly good for the industries themselves, letting them feed and produce more, far more efficiently. It wasn’t so bad for the displaced workers, either, because at the same time new technologies were unlocking new jobs, like automobiles creating entirely new occupation categories, or because the rest of the economy was growing rapidly enough to enlarge other industries, like the service sector.
…If AI is adopted unevenly, then it’s possible that those industries that turn to it will become a proportionally smaller part of the overall economy in terms of employment through a more efficient workforce, leading to a small well-remunerated class of specialized workers in automated industries, and poorer workers in the rest of the economy. The question is whether other industries will keep on growing – and that part is a real wildcard. If they don’t then they’ll become a stagnant drag on the economy, especially if they’re unable to access AI technologies used in other industries, and the gap between different levels of compensation could continue to widen. We’re already seeing some indicators of this kind of effect in the tech industry which pays its employees very highly but doesn’t in the aggregate boost national employment much at all.
…For an example of this worrying trend in action, check out this New York Times article about how post-industrial towns are now struggling with a stagnant physical retail market (likely partially due to online shopping displacing in-store shopping.) As a resident points out, all the good jobs with companies like Amazon that are leading to the physical retail decline are located near large metropolitan areas, hundreds of miles away. Where do the locals get to work?

Amazon files patent for drone delivery towers:
[Year 2035: Megacity 700, a flock of drones, like so many metal starlings, billow out of a gigantic tower, ferrying bright yellow packages to innumerable residents across the city.]
…Amazon’s patent for the “Multi-Level Fulfillment Center for Unmanned Aerial Vehicles” here.

MIT’s Senior House clampdown: Intervention or Culture-Washing?
MIT officials are seeking to shut down Senior House, a student community at MIT that houses “a disproportionately high number of people of color, LGBT students, the socioeconomically disadvantaged” and other oddball students, according to Save Senior House, a student-led initiative to lobby for preserving the accommodation. “In terms of diversity it is one of the most representational distribution of these factors that existed on campus, and maybe one of the best in all of higher education.”
…MIT says that the house had particularly low graduation rates, higher drug use, and faced more mental health issues, so it wants to step in and change the set-up. Save Senior House says many of these factors stem more from the diversity of the house than from what the students choose to do within it.
…MIT has evicted all residents, and will replace them with a new cohort starting in Autumn 2017 called ‘Pilot 2021’. (A parody site of which is available here.)
… Sarah Schwettmann, a graduate student mentor who lived in Senior House, says: “In the Senior House community many residents find – some for the first time – what feels like home. Last Monday, I was given 48 hours notice of my eviction from Senior House, along with the other graduate mentors who normally remain in the House over the summer to integrate new and returning students in the fall. Now, police and security personnel guard an empty building, whose past residents valued openness and diversity. We’re experiencing action from the MIT administration that is both heavy-handed and disproportionate. This effort, undertaken while students are away from campus for the summer, eradicates a unique part of campus culture and restructures a new community from the top down.
…As someone who was hired to support this community, I see this as an administrative failure to support some of the most vulnerable and stigmatized members of MIT. These students present the institute with a challenge, and one not unique to MIT: how do we build a platform for the historically marginalized to define their own success in a rigorous academic environment, craft their own system of values, and learn to support themselves and each other? Such issues will accompany these students to wherever they reside on campus, so long as the institute continues to admit them. Senior House provided a community-driven solution, a work in progress engineered from the bottom up. In my eyes, MIT is dismantling that solution, and a century of history: cleaning house by sweeping the challenge itself under the rug.”
Expanded statement available here.

OpenAI Bits&Pieces:

OpenAI’s Ilya Sutskever spoke at the ACM Turing conference in San Francisco this week. You can find out more about the conference here and find video recordings of the panel and others here on the ACM’s Facebook page.

Tech Tales:
[ 20??: A park in a city. Winter. Frost on the ground. Some deer ferrying lost fawns across the park to be reunited with their mothers. ]

What year is it? Who are you? Where did you grow up? Why are you here? See how many of these and other questions you can answer before the timer runs out! Says the text on your tablet. In the bottom right-hand corner is a little red timer, counting down to zero. Five hours left. See how many points you can get before the time runs out! You don’t know much, yet.

Temporary Brain Wiping is what the neuroshrinks call it. Mental Fresh Air is what its fans call it. Lobotomy Cult is what the media calls it. You don’t know what you call it, because you’ve forgotten.

You know you must have agreed to initiate the wipe. You know some basic things, like how physics works, how to speak, how to read. But most of your memory is… not present. You know that you have memories but you can’t access them right now. It’s like they’re trapped in smoky glass – you can discern faint outlines, but there’s no resolution, nothing to put a hand on.

You see an older woman walking her dog. “Excuse me, what year is it?” you ask.
“Oh dear you’re going to have to try harder than that. We get a lot of your type around here now.”
“Can you give me a clue?”
“Well, when I was a young girl there was a band called the Spice Girls. They were the first CD I bought.”
“Thanks,” you say. Watch her as she walks away. Spice Girls, you think, dredging through your partially occluded memory. You don’t remember anything specific, but it feels old. The woman was old enough to have faded into a kind of graying twilight – anywhere between 50 and 80, depending on lifestyle and genetic lottery and, sadly probably, wealth. Where am I? You think. There are trees, very few houses, some elaborate old-looking buildings. People. The woman had a British accent. If you get to high ground you can see if there are any landmarks that the wipe didn’t get.
You study other people in the park, unsure whether they’re like you – temporarily marooned, mentally cut off from things – or if they’re a part of this world, enmeshed in it through memory.

A few minutes later and you’re at a play-park, quizzing kids about what year it is. They all think your question is silly.
“What’s your name?” they ask.
“I don’t know.”
“Did you wipe yourself? My Dad does that when he gets sad some times. Were you sad?”
“I’m not sure. I hope not. I think I’m playing a game. Do you know what city this is?”
“London. I don’t understand this game-”
“-I LIKE TO REMEMBER EVERYTHING!,” blurts out another kid, before running up to the top of a slide and going down again. They hop off the bottom and run up to you. “The metal was cold but it was slippery and I went down really fast and because I was so fast there was wind and it meant there was air in my eyes. The metal at the bottom is very cold. I’m going to remember this forever,” they say, then they close their eyes and frown to themselves and you imagine them muttering to themselves internally remember remember remember. A memory rears up at you; you’re wearing pajamas sat on top of your bunk bed, staring at a shoe-box full of junk electronics, trying to assemble intelligence out of logo. The door opens and — the memory fades back into glass. Your parent? Who?

As the red timer ticks down you wonder about what you’re going to find when it releases. What happens when the memories come back? And where will you be? You walk away from the park, head for higher ground, hope that when your life comes back to you you’ll be gazing over a city that you know in a park that is familiar with friends in the distance. You hope for these things because they seem likely, but you have no way to be sure. You wonder what happens if, instead of letting the timer run out, you press the “extend” button, playing out the amnesia a little longer. You press it. Denied, the screen says. Too Many Extends In One Session. Please seek NeuroAttention for Evaluation Following Closure of Sequence. How many times can you loop out of your own memory? You don’t remember. Close your eyes. Hold the tablet in your hand. Feel the wind on your face. Wait for yourself to become yourself again.

Technologies that inspired this story: whatever the current memory substrate within neural nets ends up being, brain-computer interfaces, recursion.

Import AI Issue 47: Facebook’s AI agents learn to lie, OpenAI and DeepMind use humans to train safe AI, and what TensorFlow’s new release says about future AI development

by Jack Clark

Facebook research: Misdirection for NLP fun and profit:
New research from Facebook shows how to teach two opposing agents to bargain with one another — and along the way they learn to deceive each other as well.
…”For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance,” they write.
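…A rough way to picture dialogue rollouts: for each message the agent could say next, simulate several complete endings to the conversation and keep the message whose endings score best on average. The sketch below is only an illustration of that idea, not Facebook's implementation; `plan_with_rollouts`, `toy_simulator`, the example messages and the scores are all made up here, and a real system would sample continuations from a learned dialogue model rather than look them up in a table.

```python
def plan_with_rollouts(candidates, simulate_rest, num_rollouts=5):
    """Pick the candidate message whose simulated futures score best on average.

    candidates:    list of possible next messages
    simulate_rest: hypothetical function that plays out the rest of the
                   conversation after a message and returns the final reward
    """
    best_message, best_score = None, float("-inf")
    for message in candidates:
        total = 0.0
        for _ in range(num_rollouts):
            total += simulate_rest(message)
        average = total / num_rollouts
        if average > best_score:
            best_message, best_score = message, average
    return best_message


# Made-up outcomes standing in for a learned model of the other negotiator.
TOY_OUTCOMES = {
    "I want everything": 0.0,
    "You keep the hats, I take the books": 6.0,
    "Take it all": 1.0,
}


def toy_simulator(message):
    """Deterministic stand-in for sampling a full continuation."""
    return TOY_OUTCOMES[message]
```

Calling `plan_with_rollouts(list(TOY_OUTCOMES), toy_simulator)` picks the compromise offer, because that is the message whose simulated endings pay off most in this toy table.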

Images of the soon-to-be normal:
This photograph of a little food delivery robot blocking traffic is a herald of something that will likely become much more commonplace.

Predicting Uber rides with.. Wind speed, rider data, driver data, precipitation data, temperature, and more…
Uber has given details on the ways it is using recurrent neural networks (RNNs) to help it better predict demand for its services (and presumably cut its operating costs along the way).
…The company trained a model using five years of data from numerous US cities. The resulting RNN has good predictive abilities when tested across a corpus of data consisting of trips taken across multiple US cities over the course of seven days before, during, and after major holidays like Christmas Day and New Year’s Day. (Though there are a couple of real-world spikes that seem so drastic that its predictions low-ball them, suggesting it hasn’t seen enough of those incidents to recognize their warning indicators.)
…Uber’s new system is significantly better at dealing with spiky holiday days like Christmas Day and New Year, and it slightly improves accuracy on other days such as MLK Day and Independence Day.
…Components used: TensorFlow, Keras. Lots of computers.

Job alert!
The Berkman Klein Center for Internet & Society at Harvard University is seeking a project coordinator to help it with its work on AI, autonomous systems, and related technologies. Apply here. (Also, let’s appreciate the URL for this job and how weird it could have seemed to someone a hundred years ago – cyber.harvard.edu/…./AIjob )

AI video whiz moves from SF, USA, to Amsterdam, Netherlands. But why…?
…Siraj Raval has moved from the US to Amsterdam for a change of scene. Now that he’s settled in he has started a new video course (available on YouTube) called The Math of Intelligence. Check it out.
…I asked Siraj what his impressions were of the AI community in Amsterdam and he said this (emphasis mine): “The AI community is absolutely thriving in Amsterdam, specifically the research portion. I’ve met more researchers at Meetups here than I have for years in SF. I also briefly visited Berlin and met some amazing data scientists there. The bigger trend is that governments in the EU (France, Netherlands, Germany) are heavily investing in tech R&D and the brightest minds are taking notice. I am the son of immigrants to the USA, but I am not afraid to myself immigrate if necessary. Progress can’t wait, and the Netherlands understands this.” Sounds nice, and the pancakes are great as well.

Googlers create a single multi-sensory network: One Model To Rule Them All.
Welcome to the era of giant frankenAIs:
… Researchers from Google have figured out how to bake knowledge about a broad spectrum of domains into a single neural network and then train it in an end-to-end way.
…”In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains,” they write. “The key to success comes from designing a multi-modal architecture in which as many parameters as possible are shared and from using computational blocks from different domains together. We believe that this treads a path towards interesting future work on more general deep learning architectures”.
…Prediction: As this kind of research becomes viable we’ll see people gather huge datasets and train single models together with a broad range of discriminative abilities. The next shoe to drop will be innovations in fundamental neural network building block components to create finer-grained classification and inference abilities in these neural network models and encourage more cases of transfer learning.
Notable: Others are thinking along similar lines – last week’s Import AI covered a new MIT research paper that blends sound and vision and text into a single meta-network. 

Pay attention to Google’s new attention paper:
Google researchers have attained state-of-the-art results in areas like English-to-German translation with a technique that is claimed to be significantly simpler than its forebears.
…The paper, Attention is All You Need, proposes: “the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output.”
…In other words, the researchers have figured out a way to reduce the number of discrete ingredients that go into the network, swapping out typical recurrent and convolutional mapping layers with ones that use attention instead.
…”We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video. Making generation less sequential is another research goal of ours.”
…It seems that research into things like this will create further generic neural network building blocks that can be plugged into larger, composite models – just like the above ‘One Model to Rule Them All’ approach. Watch for collisions!
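…For the curious, the core attention operation can be sketched in a few lines. Each query is compared against every key, the scores are scaled and turned into weights with a softmax, and the output is a weighted average of the values. This is a minimal pure-Python illustration of scaled dot-product attention on toy lists, not the paper's code; the full Transformer also adds multiple heads, learned projections and positional encodings, all left out here.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists.

    queries: list of vectors, each of length d_k
    keys:    list of vectors, each of length d_k, one per input position
    values:  list of vectors, one per input position
    """
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Score this query against every key directly; no recurrence involved.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs
```

A query that matches no key in particular gets an even blend of the values, while a query strongly aligned with one key returns almost exactly that key's value, which is the "global dependency" behaviour the quote describes.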

Long-brewing research from Vicarious: Learning correspondences via (for now) hand-tuned feature extractors:
…One puzzle reinforcement learning researchers struggle with is how algorithms end up evolving to over-fit their environment. What that means in practice is if you suddenly were to, say, change the geometry of the Go board AlphaGo was trained on, or alter the placement of enemies and obstacles in Atari games, the AI might fail to generalize.
…Now, research from Vicarious – An AI startup with backing from people like Jeff Bezos, Mark Zuckerberg, Founders Fund, ABB, and others – proposes a way to ameliorate this flaw. This marks the second major paper from Vicarious this year.
…Their approach relies on what they call Schema Networks, which lets their AI learn the underlying dynamics of the environment it is exposed to. This means, for instance, that you can alter the width of a paddle in the Atari game Breakout, or change the block positions, and the trained algorithm can generalize quickly to the new state, preserving its underlying understanding of the dynamics of the world built up during training. Traditional RL algorithms tend to struggle with this as they’ve learned a predictive model of the world as it is and struggle with learning more abstract links.
…There’s a small catch with Vicarious’s approach – the researchers had to do the object segmentation and identification themselves then feed that to the AI. In reality, one of the greatest challenges computer vision researchers face is accurately mapping and segmenting non-stationary images (and it’s even harder as they get deployed in the chaotic real world, as they need to link parts of a flat 2D image to messy 3D objects). I’m keen to see what happens when this algorithm can do the feature isolation itself.
Notable: Meanwhile, DeepMind have published Relational Networks (claiming SOTA and superhuman performance) and Visual Interaction Networks, two philosophically similar research papers that hew closer to traditional deep learning approaches. Just as you and I use abstract logic to let us reason about the world, it seems likely AI will need the same capabilities.

Just what the heck does a career in AI policy look like?
…Twitter’s AI paper tsar Miles Brundage has published an exhaustive document outlining a Guide to Working in AI Policy and Strategy up on 80,000 hours. (And watch out for the nod to Import AI – thanks Miles. I’ll do my best!)

(Mildly) Controversial Microsoft/Maluuba research paper: Using rewards is easy, finding them is hard:
…A new research paper from Microsoft’s recent Canadian AI acquisition Maluuba, Hybrid Reward Architecture for Reinforcement Learning, shows how to definitively beat Ms. PacMan (clocking over a million points). Ms. PacMan, along with Montezuma’s Revenge, is one of the games that people have found consistently quite challenging, so it’s a notable result – though not as encouraging as on first look, when you work out what is required for the process to work.
…When you go and analyze its Hybrid Reward Architecture you see that the approach is distributed, with Microsoft splitting up the task into many discrete sub-tasks which numerous reinforcement learning agents try to solve, while feeding their opinions up into a single meta-agent that helps to take decisions. Though it scores highly, the approach involves a lot of human specification, including hand-labeling different rewards and penalties for different entities in the game. As with the Vicarious paper, the technique is interesting, but it feels like it’s missing a key component – unsupervised extraction of entities and reward levels/penalties.
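…To make the "many agents, one meta-agent" idea concrete, here is a toy sketch. Each sub-agent scores every action according to its own hand-specified reward (one cares about a pellet, another about a ghost), and the meta-agent adds the scores up and picks the best action. Summing is an assumption made for this illustration, and the agents, action names and numbers are invented; the paper's actual agents learn their values rather than reading them from a table.

```python
ACTIONS = ["up", "down", "left", "right"]


def aggregate(sub_agent_values):
    """Sum each action's value across all sub-agents."""
    totals = {action: 0.0 for action in ACTIONS}
    for values in sub_agent_values:
        for action in ACTIONS:
            totals[action] += values[action]
    return totals


def choose_action(sub_agent_values):
    """Meta-agent: pick the action with the highest combined value."""
    totals = aggregate(sub_agent_values)
    return max(ACTIONS, key=lambda action: totals[action])


# Hand-written values standing in for what each sub-agent would learn.
pellet_agent = {"up": 0.0, "down": 0.0, "left": 1.0, "right": 0.2}   # pellet is to the left
ghost_agent = {"up": 0.0, "down": 0.0, "left": -5.0, "right": 0.0}   # so is a ghost
```

With only the pellet agent voting, the meta-agent heads left; add the ghost agent and the combined scores push it right instead. Note how much of the behaviour comes from the hand-picked +1 and -5, which is exactly the human specification the approach depends on.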

What TensorFlow v1.2 says about devices versus clouds:
Google has released version 1.2 of TensorFlow. There’s a ton of fixes and tweaks (eg, for RNN functionality), but buried in the release details is the note that Google will stop directly supporting GPUs on Mac systems (though will continue to accept patches from the community). There are likely a couple of reasons for this: one, the lack of much of an NVIDIA ecosystem around Macs (c.f. Apple’s new external GPU for the Mac Pro runs AMD cards, which are yet to develop as much of a deep learning ecosystem).
…Another way of looking at this is that the cloud wins AI. For now at least AI benefits from parallelization and the usage of large numbers of CPUs and GPUs together, with most developers either using a laptop paired with an external pool of cloud resources, and/or running their own Linux deep learning rig in a desktop tower.
Details: TensorFlow v1.2 on GitHub.

Snapchat’s first research paper: mobile-friendly neural nets with full-fat brains.
Researchers with Snap Inc. and the University of Iowa have published SEPNETs: Small and Effective Pattern Networks.
…tl;dr: a way to shrink trained models then recover them to restore some accuracy.
…It tackles one of the problems AI’s success has led to: the creation of increasingly large, deep models that have tremendous performance but take up a lot of space when deployed. Ultimately, the ideal scenario for AI development is to be able to train a single gigantic model on a nearby football-field filled with computers, then be able to have a little slider to shrink the trained model for deployment on various end-user devices, whether phones, or computers, or VR headsets, or something else. How do you do that? One idea is to try to smartly compress these trained models, either by lopping away at chunks of the neural network, or by scaling them down in a more disciplined way. Both methods see you trade off overall accuracy for speed, so fixing this requires new research. The Snapchat paper represents one contribution:
The details: First they use a technique called pattern binarization to shrink a pre-trained network (for instance, a tens-of-millions-of-parameters VGG or Inception model) into a smaller version of itself, at the cost of it losing some discriminative capabilities. They propose to fix this with a new neural network component they call a Pattern Residual Block. This component can sometimes help offset the changes wrought on the numbers it’s dealing with via the binarization process. They then use Group-Wise Convolution to further winnow down the various components of the network, shrinking it.
…Results: Google MobileNet: 1.3 million params, 5.2MB, accuracy 0.637
…Results: SEP-NET-R (Small): 1.3 million params, 5.2MB, accuracy 0.658
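…A quick sanity check on those numbers: 1.3 million parameters and 5.2 megabytes line up if each parameter is stored as a 4-byte (32-bit) float. That storage format is an assumption on my part, as is counting a megabyte as a million bytes; the helper name below is made up for the illustration.

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Storage needed for the weights alone, in megabytes (1 MB = 10**6 bytes).

    bytes_per_param defaults to 4, i.e. an assumed 32-bit float per weight.
    """
    return num_params * bytes_per_param / 10**6


size_fp32 = model_size_mb(1_300_000)                      # 32-bit floats
size_int8 = model_size_mb(1_300_000, bytes_per_param=1)   # hypothetical 8-bit weights
```

The first figure works out to 5.2, matching both rows above. The second shows why people care about lower-precision weights: the same parameter count would take roughly a quarter of the space.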

Free pre-trained models, get your pre-trained mobile-friendly models right here!
…Google unfurls MobileNets to catch intelligence on the phone:
In possibly related news Google has released MobileNets, a collection of “mobile-first computer vision models for TensorFlow”.
…”MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. ”
…The available models vary from bite-size ones of 0.47 million parameters to larger ones of 4.24 million, with image accuracies ranging from 66.2 to 89.9% for the larger models.
GitHub repo here.

Speeding up open access reviews:  There’s a suggestion that Open Review – a platform that makes feedback and evaluation of papers public – is considering layering some aspect of its system over Arxiv, letting us not only publish preprints rapidly, but potentially review them as well.
…Note: None of this is meant to say that double blind reviewing is bad – it’s good, especially for significant papers with particularly controversial claims. But I think due to the breakneck speed at which AI moves it’s necessary to try and speed things up if possible. This suggests one way to more rapidly gather better feedback on new ideas.
…How it might be used: Last week Hochreiter & co published the SELU paper. It’s gathered a lot of interest with numerous people running their own tests, chiming in with comments, or going through its 90+ page appendix. It’d be very convenient if there was a layer that let us put all this stuff in the same place.

Dollars for Numpy: Numpy has been given a little over half a million dollars from the Gordon Moore foundation to fund improvements to the Python scientific computing library. Numpy is absolutely crucial to neural networks within Python.

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to david@amplifypartners.com.

Tech Tales:
[2035: The North East Canadian wilderness.]

It’s wet. There’s moss. The air has that peculiar clarity that comes from being turned by wind and freshened by water and replenished by the quiet green and brown things of nature. You breathe deeply as you walk the rarely used trail. Your feet compress the nested pine needles beneath you, sending up gusts of scented air. In the distance, you hear the sound of roiling running water and, beneath it, bells tolling.

You keep going. The sound of the water and of the bells gets louder. The bells echo out little sonorous rhythms that seem intertwined with the sounds of gushing river water. One of the bells is off – its timing discordant, cutting against the others. You begin to crest a small hill, and as your head clears it the sound rushes at you. The bells clang and the water thrums – their interplay not exactly abrasive – for the off bell is one of many – but somehow more mournful than you recall the sound being before.

The bells are housed in a small concrete tower, about 5 feet high that sits by the riverbank at a point where there’s a kink in the river. It has three walls, with the fourth left open to the elements, facing the river, broadcasting the sounds of the bells. You run your hands over the cold, mottled, lichen-stippled exterior as you approach its entrance. Close your eyes. Open them when you’re in front of the shrine. You study the 12 bells of the dead, able to make out the inscribed names of the gone, despite the movement of the bells. Now you just need to diagnose why one of the bells seems to have fallen out of alignment with the others.

As you sit, studying the wiring in the shrine and watching the bells, it’s impossible not to think of your friends and how they are now commemorated. You all work for the government on geographic survey. As the climate has been changing your teams have been pushing further north for more of the year, trading safety for exploration (and the possibility of data valuable to resource extraction companies). You were at home, laid up with a broken leg, when the team of 12 went out. They were doing a routine mapping hike, away from camp, when the storm came in – it strengthened far more rapidly than their computer models anticipated and, due to a set of delicate occurrences, it brought snow and ice with it. Temperatures plunged. Snow-cladded everything. Rain was either flecked with ice or snow or a contributor to a sheet of frozen fog that lay over the land. Your colleagues died due to about 50 things going wrong in a very precise sequence. These things, as hysterical as it seems, happen.

The bells are set to dance to the rhythm of the river. Their loops are determined by observations from cameras atop the shrine, pointed at the writhing river. This visual information is then fed into an algorithm that is forever trying to find a pattern in the infinite noise of the river. After an hour you have the sense to give the cameras more than a cursory look and you discover that a spider has made a small nest near the sensor bulge, and one thick strand of webs is slung in front of one of the camera lenses. This, you figure, has injected a kind of long-term stability into part of the feed of data that the algorithm sees, swapping a patch of the frothing slate and white and dark blue and brown of the river-water with something altogether more stagnant.  Fixing it would be as simple as putting on a glove and carefully removing the spiderweb, then polishing the lens. You hold your hand up in front of the web to get a sense of how it would be to remove it and as your hand passes in front of the cameras the bells change their rhythm, some stuttering to a stop and others speeding up, driven to a frenzy by the changed vision. You put your hand down and the bells go back to their tolling, with the one that seems to be affected by the spiderweb still acting out of order.

When you file your report you say reports of odd sounds appear to be erroneous and you discovered no such indicators during your visit. You take comfort in knowing that the bells will continue to ring, driven increasingly by the way the world grows and breaks around them, and less by the prescribed chaos of the river.

Technologies that inspired this story: Attention, generative models, joint neural networks, long-short term memory

OpenAI Bits&Pieces:

OpenAI and DeepMind train reward functions via human feedback: A new research collaboration between DeepMind and OpenAI on AI safety sees us train AI agents to perform behaviors that they think humans will approve of. This has implications for AI safety and has promising sample efficiency as well.

OpenAI audiopodcast about reinforcement learning by Sam Charrington with OpenAI/UC Berkeley robot chap Pieter Abbeel.

Import AI: Issue 46: Facebook’s ImageNet-in-an-hour GPU system, diagnosing networks with attention functions, and the open access paper debate

by Jack Clark

Attention & interpretability: modern neural networks are hard to interpret because we haven’t built tools to make it easy to analyze their decision-making processes. Part of the reason why we haven’t built the tools is that it’s not entirely obvious how you get a big stack of perceptual math machinery to tell you about what it is thinking in a way that is remotely useful to the untrained eye. The best thing we’ve been able to come up with, in the case of certain vision and language tasks, is attention, where we visualize which parts of a neural network – sometimes down to an individual cell or ‘neuron’ within it – are activating, and what they are activating in response to. This can help us diagnose why an AI tool is responding in the way it is.
…Latent Attention Networks, a paper from researchers with Brown University, proposes an interesting way to improve our ability to analyze nets: by creating a new component to make it easier to visualize the attention of a given network in a more granular manner.
…In the paper they introduce a new AI component, which they call a Latent Attention Network. This component is general, working across different neural network architectures (a first, the researchers claim), and only requires the person to fiddle with it at its input or output points. LANs let them fit a so-called attention mask over any architecture.
…”The attention mask seeks to identify input components of x that are critical to producing the output F(x). Equivalently, the attention mask determines the degree to which each component of x can be corrupted by noise while minimally affecting F(x),” they write.
…The researchers evaluate the approach on a range of tasks, from simple ones (MNIST, CIFAR) to a game of Pong from the Arcade Learning Environment. The ensuing visualizations seem to be helpful for getting a better grasp of how and why neural network classifiers work. I particularly recommend studying the images from Pong.
Why it could be useful: this technique hints at a way to be able to take a generic component and simply fit it to an arbitrary network, then get the network to cough up some useful information about its state – if extended it could be a handy tool for AI diagnosticians.
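…The "corrupt with noise without changing the output" idea in the quote can be shown with a toy. Below, a mask value of 1 keeps an input component and a value of 0 swaps it for random noise. That convention, the blending formula, and the stand-in model are all assumptions chosen for this sketch and may differ from the paper's exact formulation; in the paper the mask is produced by a trained network, whereas here it is written by hand.

```python
import random


def corrupt(x, mask, rng):
    """Blend each input component with Gaussian noise.

    mask value 1.0 keeps the component untouched; 0.0 replaces it with noise.
    (This convention is an assumption made for the illustration.)
    """
    return [m * xi + (1.0 - m) * rng.gauss(0.0, 1.0) for xi, m in zip(x, mask)]


def toy_model(x):
    """Stand-in for F(x): only the first two components matter."""
    return x[0] + x[1]


rng = random.Random(0)
x = [1.0, 2.0, 3.0]
good_mask = [1.0, 1.0, 0.0]  # protect what the model uses, corrupt the rest
output_after_corruption = toy_model(corrupt(x, good_mask, rng))
```

Because the mask protects the two components the toy model relies on, the output after corruption is the same as `toy_model(x)`, even though the third component is now pure noise. A good attention mask is one that finds this kind of split for a real network.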

Self-Normalizing Neural Networks cause a stir: A paper from researchers with the Bioinformatics Institute in Austria proposes a way to improve feed forward neural network performance with a new AI component, Self-Normalizing Neural Networks. “FNNs are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations,” they write.
…The paper is thorough and is accompanied with a code release, aiding rapid replication and experimentation by others. The researchers carry out exhaustive experiments, bench-marking their approach (based around a SELU, a scaled exponential linear unit) against a movable feast of other AI approaches, ranging from Residual Nets, to Highway Networks, to weights with Batch Normalization or Layer Normalization, and more.
…They test the method exhaustively as well. “We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set,” they write.
Notable fact: One of the authors is Sepp Hochreiter, who invented (along with Juergen Schmidhuber) the tremendously influential Long Short-Term Memory network component, aka the LSTM. LSTMs are used extensively in AI these days for tasks ranging from object detection to speech recognition and the paper has over 4500 citations (growing with the massive influx of new AI research into memory networks, differentiable neural computers, Neural Turing Machines, and so on).
…The Self-Normalizing Neural Networks paper is amply thorough, weighing in at an eyebrow-raising 102 pages, split between the research paper (9 pages) with the other pages devoted to comprehensive theoretical analysis, experiments, and – of course – references, to back it up. More of this European precision, please!
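…For reference, the SELU activation itself is tiny. It is a scaled version of the exponential linear unit: positive inputs are multiplied by a constant, negative inputs follow a scaled exponential curve that flattens out. The sketch below uses the constants commonly quoted for SELU (and used by the mainstream deep learning libraries); the variable names are my own, and the self-normalizing property also depends on how the weights are initialized, which is not shown here.

```python
import math

# Constants commonly quoted for SELU; names ALPHA and SCALE are this sketch's own.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single number."""
    if x > 0:
        return SCALE * x
    # Negative inputs saturate towards -SCALE * ALPHA instead of going to zero.
    return SCALE * ALPHA * (math.exp(x) - 1.0)
```

Unlike a ReLU, the function can output negative values (bottoming out around -1.76), which is part of what lets activations keep a mean near zero as they pass through many layers.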

Open publishing (Arxiv) versus slow publishing (Conferences and Journals). 
The Hochreiter paper highlights some of the benefits of the frenetic attention that publishing on Arxiv can bestow, along with/instead of traditional (relatively slow-burning) conferences and journals. I think the trade-off between speed of dissemination and lack of peer review is ultimately worthwhile, though some disagree.
…Yoav Goldberg, a researcher who has done work at the intersection of NLP and Deep Learning, writes that Arxiv can also lead to people having an incentive to publish initial versions of papers that are thin, not very detailed, and that serve more as flag-planting symbols for an expected scientific breakthrough than anything else. These are all legitimate points.
…Facebook AI Researcher Yann LeCun weighed in and (in a lengthy, hard-to-link to note on Facebook) says that the open publishing process allows for rapid dissemination of ideas and experimentation free of the pressure to publish papers at a conference. As of the time of writing the nascent AI blogosphere continues to be roiled by this drama, so I’m sure this boulder will continue to roll.
…(For disclosure: I side more toward favoring the Arxiv approach and think that ultimately bad papers and bad behavior get weeded out by the community over time. It’s rare that people accept a con. Deep Learning has been in hyper-growth mode since the 2012 AlexNet paper, so it’s natural things are a bit fast-moving and chaotic right now. Things may iron themselves out over time.)

Compute as AI’s strategic fulcrum: the AI community is getting much better at training big neural network models. Latest case in point comes from Facebook, which has outlined a new technique for training large-scale image classifiers.
…Time to train ImageNet in 2012: A week or two across a single GPU, with need for loads of custom CUDA programming.
…Time to train ImageNet in 2017: One hour across 256 GPUs. Vastly improved & simplified software + hardware.
…Although, as someone commented on Twitter, most people don’t easily have access to 256 GPUs.

Better classifiers through combination: DING DONG! DING DONG! When you read those four words there’s a decent chance you also visualized a big clock or imagined some sonic representation of a clock chiming. Human memory seems to work like this, with a sensory experience of one entity inviting in a bunch of different, complementary representations. Some believe it’s this fusion of senses that gives us such powerful discriminative abilities.
…Wouldn’t it be nice to get similar effects in deep learning? From 2015 onwards people started experimenting en masse with getting computers to better understand images by training the nets on paired sets of images and captions, creating perceptual AI systems with combined representations of entities. We’ve also seen people more recently experiment with training audio and visual data together. Now, scientists from MIT have combined visual, audio, and text data into the same network.
…The data: Almost a million images (COCO & Visual Genome), synchronized with either a textual description or an audio track (377 continuous days of audio data, pulled from over 750,000 Flickr videos).
…How it works: the researchers create three different networks to ingest text, audio, or picture data. The learned representations from all of these networks are output as fixed-length vectors with the same dimensionality, which are then fed into a network that is shared across all three input networks. “While the weights in the earlier layers are specific to their modality, the weights in the upper layers are shared across all modalities,” they write.
…Bonus: The combined system ends up having cells that activate in the presence of words, pictures, or sounds that correspond to subtle types of object, like engines or dogs.
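…The wiring described above can be sketched in a few lines. This is a toy, untrained illustration and not the MIT code: the layer sizes, the plain fully connected layers, and the names `encoders`, `shared`, and `embed` are all assumptions made for the example. The point is only the shape of the thing: separate lower weights per modality, one set of upper weights for everything.

```python
import random

def make_layer(n_in, n_out, rng):
    """Return a weight matrix (a list of rows) filled with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, x):
    """One fully connected layer followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in layer]

rng = random.Random(0)
SHARED_DIM = 8  # hypothetical size of the common fixed-length vector

# Modality-specific lower layers: each maps its own input size to SHARED_DIM.
encoders = {
    "image": make_layer(32, SHARED_DIM, rng),
    "audio": make_layer(20, SHARED_DIM, rng),
    "text": make_layer(12, SHARED_DIM, rng),
}
# Upper layer whose weights are shared across all three modalities.
shared = make_layer(SHARED_DIM, 4, rng)

def embed(modality, features):
    hidden = forward(encoders[modality], features)  # modality-specific weights
    return forward(shared, hidden)                  # shared weights

image_vec = embed("image", [rng.random() for _ in range(32)])
audio_vec = embed("audio", [rng.random() for _ in range(20)])
text_vec = embed("text", [rng.random() for _ in range(12)])
```

Because every modality ends in the same shared layer, the three output vectors live in one space and can be compared directly, which is what makes cross-modal cells possible once the system is trained.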

Bored with the state of your supply chain automation? Consider investing in an autonomous cargo boat – the new craze sweeping across commodities makers worldwide! Companies envisage a future where autonomous mines (semi-here) take commodities via autonomous trains (imminent) to autonomous ports (here) to the yet-to-be-built autonomous boats.

4K GAN FACES: A sight for blurry, distorted eyes. Mike Tyka has written about his experiments to use GANs to create large, high-resolution entirely synthetic faces.
…The results are quite remarkable, with the current images seeming as much a new kind of impressionism as realistic photographs (though only for sub-sections of every given image, and sometimes fraught with Dali-esque blotches and Bacon-esque flaws).
…“as usual I’m battling mode collapse and poor controllability of the results and a bunch of trickery is necessary to reduce the amount of artifacts,” he writes. G’luck, Mike!

You are not Google (and that’s okay): This article about knowing what large-scale over-engineered technology is worth your while and what is out of scope is as relevant for AI researchers and engineers as it is for infrastructure people.
…Bonus: the invention of the delightfully German-sounding acronym UNPHAT.

What China thinks about when China thinks about AI: A good interview with Oregon State University professor Tom Dietterich in China’s National Science Review. We’re entering an era where AI becomes a tool of geopolitics as countries flex their various strengths in the tech as part of wider national posturing. So it’s crucial that scientists stay connected with one another, talking about the issues that matter to them which transcend borders.
…Dietterich makes the point that modern AI is about as easy to debug as removing all the rats from a garbage dump. “Traditional software systems often contain bugs, but because software engineers can read the program code, they can design good tests to check that the software is working correctly. But the result of machine learning is a ‘black box’ system that accepts inputs and produces outputs but is difficult to inspect,” he says.
AI in China: “Chinese scientists (working both inside and outside China) are making huge contributions to the development of machine learning and AI technologies. China is a leader in deep learning for speech recognition and natural language translation, and I am expecting many more contributions from Chinese researchers as a result of the major investments of government and industry in AI research in China. I think the biggest obstacle to having higher impact is communication,” he says. “A related communication problem is that the internet connection between China and the rest of the world is often difficult to use. This makes it hard to have teleconferences or Skype meetings, and that often means that researchers in China are not included in international research projects.”

Building little pocket universes in PyTorch: This is a good tutorial for how to use PyTorch, an AI framework developed by Facebook, to build simple cellular automata grid worlds and train little AI agents in them.
…It’s great to see practical tutorials like this (along with the CycleGAN implementation & guide I pointed out last week) as it makes AI a bit less intimidating. Too many AI tutorials say stuff like “Simply install CUDA, CuDNN, configure TensorFlow, spin-up a dev environment in Conda, then swap out a couple of the layers.” This is not helpful to a beginner, and people should remember to go through all the seemingly-intuitive setup steps that go with any deep learning system.
…Another great way to learn about AI is to compete in AI competitions. So it’s no surprise Google-owned Kaggle has passed one million members on its platform. Because Kaggle members use the platform to create algorithms and fiddle with datasets via Kaggle Kernels, it seems like as membership scales Kaggle’s usefulness will scale proportionally. Congrats, all!
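…For a taste of what a cellular automata grid world looks like with zero setup, here is a minimal sketch in plain Python. It is not the tutorial’s code and uses no PyTorch; it assumes Conway’s Game of Life rules as the example automaton, with edges that wrap around.

```python
def step(grid):
    """Advance a Conway's Game of Life grid by one tick (edges wrap around)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight surrounding cells, wrapping at the borders.
            neighbours = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c] == 1 and neighbours in (2, 3):
                new[r][c] = 1  # a live cell with 2 or 3 neighbours survives
            elif grid[r][c] == 0 and neighbours == 3:
                new[r][c] = 1  # a dead cell with exactly 3 neighbours is born
    return new

# A "blinker": three cells in a column that flip to a row and back.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = step(blinker)
after_two = step(after_one)
```

A grid like this is the “pocket universe”; an agent would then be something that reads part of the grid and chooses which cells to change.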

Compete for DATA: CrowdFlower has launched AI For Everyone, a challenge that will see two groups every quarter through to 2018 compete to get access to free data on CrowdFlower’s eponymous platform.
…Winners get a free CrowdFlower AI subscription, a $25,000 credit towards paying for CrowdFlower contributors to annotate data, free CrowdFlower platform training and onboarding, and promotion of their results.

OpenAI Bits & Pieces:

Talking Machines – Learning to Cooperate, Compete, and Communicate: This is a follow-up to our previous work on getting AI agents to invent their own language. Here we combine this ability with the ability to train multiple agents together with conflicting goals. Come for the science, stay for the amusing GIFs of spheres playing (smart!) tag with one another. Work by OpenAI interns Jean Harb and Ryan Lowe from McGill University, plus others.

Better exploration in deep learning domains: New research: UCB and InfoGain Exploration via Q-Ensembles & Parameter Space Noise for Exploration.

Tech Tales:

[ 2045: Outskirts of Death Valley, California. A man and a robot approach a low building, filled with large, dust-covered machines, and little orange robot arms on moving pedestals that whizz around, autonomously tending to the place. One of them has the suggestion of a thatched hairpiece, made up of a feather-coated tumbleweed that has snared into one of its joints.]

You can’t be serious.
Alvin, it’ll be less than a day.
It’s undignified. I literally cured cancer.
You and a billion of your clones, sure.
Still me. I’m not happy about this.
I’m going to take you out now.
No photographs. If I sense a single one going onto the Internet I’m going to be very annoyed.
Sure, you say, then you unscrew the top of Alvin’s head.

Alvin is, despite its inflated sense of importance, very small. Maybe about half a palm’s worth of actual computer, plus a forearm’s worth of cabling, and a few peripheral cables and generic sensor units that can be bound up and squished together. Which is why you’re able to lift his head away from his body, unhook a couple of things, then carefully pull him out. Your own little personal, witheringly sarcastic, AI assistant.

Death Valley destroys most machines that go into it, rotting them away with the endless day/night flexing of metal in phase transitions, or searing them with sand and grit and sometimes crusted salt. But most machines don’t mind – they just break. Not Alvin. For highly sensitive, developed AIs of its class the experience is actively unpleasant. Heat leads to flexing in casing which leads to damage which leads to improper sensing which gets interpreted as something a vast group of scientists has said corresponds to the human term for pain. Various international laws prohibit the willful infliction of this sort of feeling on so-called Near Conscious Entities – a term that Alvin disagrees with.

So, unwilling to violate the law, here you are at Hertz-Rent-a-Body, transporting Alvin out of his finely-filigreed silver-city Android body, into something that looks like a tank. You squint. No, it’s actually a tank, re-purposed slightly; its turret sliced in half, its snout capped with a big, sensor dome, and the bumps on its front for storing smoke flares now contain some directional microphones. Aside from that it could have teleported out of a war in the previous century. You check the schematics and are assured fairly quickly that Death Valley won’t pose a threat to it.
…Ridiculous, says Alvin. So much waste.

You unplug the cable connecting Alvin to the suit’s speaker, and carry him over to the tank. The tank senses you, silently confirms the rental with your bodyphone, then the hatch on its roof sighs open and a robotic arm snakes out.
Welcome! Please let us accommodate your N.C.E codename A.L.V.I.N the arm-tank says, its speakers crackling. The turret shifts to point to the electronics in your hands.
Alvin, having no mouth due to not being wired up to a speaker, flashes its output OLEDs angrily, shimmering between red and green rapidly – a sign, you know from experience, of the creation and transmission of a range of insults, both understandable by conventional humans and some highly specific to machines.
Your N.C.E has a very large vocabulary. Impressive! chirps the tank.
The robot arm plucks Alvin delicately from your hands and retracts back into the tank. A minute passes and the tank whirs. A small green light turns on in the sensor dome on the tip of its turret. One of its speakers emits a brief electronic-static burp, then-
I am too large, says Alvin, through the tank. They want me to do tests in this thing.

Five minutes later and Alvin is trundling to and fro in the Hertz parking lot, navigating between five orange cones set down by another similarly-cheerful robotic arm on a movable mount. A couple more tasks pass – during one U-Turn Alvin makes the tank shuffle jerkily giving the appearance of a sulk – then the Hertz robot arm flashes green and says We have validated movement policies. Great driving! Please return to us within 24:00 hours for dis-internment!

Alvin trundles over to you and you climb up one of his treads, then hop onto the roof. You put your hand on the hatch to pull it open but it doesn’t move.
You’re not coming in.
Alvin, it’s 120 degrees.
I’m naked in here. It would make me uncomfortable.
Now you’re just being obtuse. You can turn off your sensors. You won’t notice me.
Are you ordering me?
No, I’m not ordering you, I’m asking you nicely.
I’m a tank, I don’t have to be nice.
That’s for sure.
I’ll drive in the shade. You won’t get too hot. Medically, you’re going to be fine.
Just drive, you say.
You start to trundle away into the desert. Have a good day! shouts the Hertz robotic arm from the parking lot. Alvin finds the one remaining smoke grenade in the tank and fires it into the air, back towards the body rental shop.
We have added the fine for smoke damage to your bill. Safe driving! crackles the robot arm through the distant haze.

Technologies that inspired this story: Human-computer interaction, survey of AI experts about AI progress, the work of Dr. Heather Knight, robotics.

Import AI: Issue 45: StarCraft rumblings, resurrecting ancient cities with CycleGAN, and Microsoft’s imitation data release

by Jack Clark

Resurrecting ancient cities via CycleGAN: I ran some experiments this week where I used a CycleGAN implementation (from this awesome GitHub repo) to convert ancient hand-drawn city maps (Jerusalem, Babylon, London) into modern satellite views.
…What I found most surprising about this project was the relative ease of it – all it really took was a bit of data munging on my end, and having the patience to train a Google Maps>Google Maps Satellite View network for about 45 hours or so. The base model generalized well – I figure it’s because the Google Maps overhead street-views have a lot of semantic similarity to pen and brush-strokes in city illustrations.
…I’m going to do a few more experiments and will report back here if any of it is particularly interesting. Personally, I find that one of the best ways to learn about anything is to play with it, aimlessly fiddling for the sheer fun of it, discovering little gems in unfamiliar ground. It’s awesome that modern AI is so approachable that this kind of thing is possible.
…Components used: PyTorch, a CycleGAN implementation trained for 45 hours, several thousand map pictures, a GTX 1070, patience, Visdom.

Learning from demonstrations: An exciting area of current reinforcement learning research is to develop AI systems that can learn to perform tasks based on human demonstrations, rather than requiring a hand-tuned reward function. But gathering data for this at scale is difficult and expensive (just imagine if arcades were more popular and had subsidized prices in exchange for collecting your play data!). That’s why it’s great to see the release of The Atari Grand Challenge Dataset from researchers at Microsoft Research, and Aachen University. The dataset consists of ~45 hours of playtime spread across five Atari games, including the notoriously hard-to-crack Montezuma’s Revenge.

AI’s gender disparity, visualized: AINow co-founder Meredith Whittaker did a quick analysis of the names on papers accepted to ICML and found that men vastly outnumber women. Without knowing the underlying submission data it’s tricky to use this to argue for any kind of inherent sexism to the paper selection process, but it is indicative of the gender disparity in AI – one of the many things the research community needs to fix as AI matures.

Embedding the un-embeddable: In Learning to Compute Word Embeddings On the Fly researchers with MILA, DeepMind, and Jagiellonian University propose a system to easily learn word embeddings for extremely rare words. This is potentially useful, because while deep learning approaches excel in environments containing a large amount of data, they tend to fail when dealing with small amounts of data.
…The approach works by training a neural network to predict the embedding of a word given a small amount of auxiliary data. Multiple auxiliary sources can be combined for any given word. When dealing with a rare word the researchers fire up this network, feed it a few bits of data, and then try to predict that embedding’s location within the full network. This means you can develop your main set of embeddings by training in environments with large amounts of data, and whenever you encounter a rare word you instead use this system to predict an embedding for it, letting you get around the lack of data, though with some imprecision.
…The researchers evaluate their approach in three domains: question answering, entailment prediction, and language modelling, attaining competitive results in all three of these domains.
…“Learning end-to-end from auxiliary sources can be extremely data efficient when these sources represent compressed relevant information about the word, as dictionary definitions do. A related desirable aspect of our approach is that it may partially return the control over what a language processing system does into the hands of engineers or even users: when dissatisfied with the output, they may edit or add auxiliary information to the system to make it perform as desired,” they write.
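…To make the idea concrete, here is a deliberately simplified sketch. The paper trains a network to do the prediction; this stand-in just averages the embeddings of the words in a dictionary definition, which is an assumption made for illustration. The tiny embedding table, the word “moggy”, and the function name `embed_on_the_fly` are all hypothetical.

```python
# Hypothetical 3-dimensional embeddings for frequent words.
frequent = {
    "small": [0.2, 0.1, 0.0],
    "domestic": [0.0, 0.4, 0.2],
    "cat": [0.9, 0.3, 0.1],
    "animal": [0.7, 0.5, 0.3],
}

# Auxiliary data: a dictionary-style definition for a rare word.
definitions = {"moggy": ["small", "domestic", "cat"]}

def embed_on_the_fly(word, table, definitions):
    """Look the word up; if it is rare, build an embedding from its definition."""
    if word in table:
        return table[word]
    known = [table[w] for w in definitions.get(word, []) if w in table]
    dim = len(next(iter(table.values())))
    if not known:
        return [0.0] * dim  # no auxiliary data available: fall back to zeros
    # Mean-pool the definition words (a stand-in for the learned predictor).
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

moggy = embed_on_the_fly("moggy", frequent, definitions)
```

Editing the definition changes the embedding, which is the control-returning property the authors describe in the quote above.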

Battle of the frameworks: CNTK matures: Microsoft has released version 2.0 of CNTK (the Microsoft Cognitive Toolkit), its AI development framework. New features include support for Keras, more Java language bindings, and tools for compressing trained models.

Stick this in your calendar, Zerg scum! The Call for Papers just went out for the Video Games and Machine Learning workshop at ICML in Australia this year. Confirmed speakers include people from Microsoft, DeepMind, Facebook, and others. Notable: someone from Blizzard will be giving a talk about StarCraft, a game that the company has partnered with DeepMind on developing AI tools around.
Related: Facebook just released V1.3-0 of TorchCraft, an open source framework for training AI systems to play StarCraft. The system now supports Python and also has improved separate data streams for feature-training, such as maps for walkability, buildability, and ground-height.

Ultra-cheap GPU substrates for AI development: Chip company NVIDIA has seen its stock almost triple in value over the last year as investors realized that its graphical processing units are the proverbial pickaxe of the current AI revolution. But in the future NVIDIA will likely have more competition (a good thing!) from a range of semiconductor startups (Graphcore, Wave, and others), established rivals (Intel via its Nervana and Altera acquisitions, AMD via its extremely late dedication to getting its GPUs to run AI software), and possibly from large consumer tech companies such as Google with its Tensor Processing Units (TPU).
…So if you’re NVIDIA, what do you do? Aside from working to design new GPUs around specific AI needs (see: Volta), you can also try to increase the number of GPU-enabled servers sold around the world. To that end, the company has partnered with so-called ODM companies Foxconn, Quanta, Inventec and Wistron. These companies are all basically intermediaries between component suppliers and massive end-users like Facebook/Microsoft/Google/and so on, and are famed for designing powerful servers available at a low price (if bought in sufficiently high volumes).

The power of simplicity: What wins AI competitions – unique insight? A PhD? Vast amounts of experience? Those help, but probably the single-most important thing is consistent experimentation, says Keras creator Francois Chollet, in a Quora answer discussing why Keras features in so many top Kaggle competitions.
…“You don’t lose to people who are smarter than you, you lose to people who have iterated through more experiments than you did, refining their models a little bit each time. If you ranked teams on Kaggle by how many experiments they ran, I’m sure you would see a very strong correlation with the final competition leaderboard.”
…Even in AI, practice makes perfect.

Will the AI designers of the future be more like sculptors than programmers? AI seems to naturally lend itself to different forms of development than traditional programming. That’s because most of the neural network-based technologies that are currently the focus of much of AI research are inherently spatial: deep learning is a series of layered neural networks, whose spatial relationship is indicative of the functions the ultimate system approximates.
…Therefore, it’s interesting to look at the types of novel user interface design that augmented- and virtual-reality make possible and think of how it could be applied to AI. Check out this video by Behringer of their ‘DeepMind’ (no relation to the Go-playin’ Google sibling) system, then think about how it might be applied to AI.

CYBORG DRAGONFLY CYBORG DRAGONFLY CYBORG DRAGONFLY: I’m not kidding. A company named Draper has built a product called DragonflEye, which consists of a living dragonfly which has been augmented with solar panels and with electronics that interface with its nervous system.
…The resulting system “uses optical electrodes to inject steering commands directly into the insect’s nervous system, which has been genetically tweaked to accept them. This means that the dragonfly can be controlled to fly where you want, without sacrificing the built-in flight skills that make insects the envy of all other robotic micro air vehicles,” according to IEEE Spectrum.

Are we there yet? Experts give thoughts on human-level AI and when it might arrive: How far away is truly powerful AI? When will AI be able to perform certain types of jobs? What are the implications of this sort of intelligence? Recently, a bunch of researchers decided to quiz the AI community on these sorts of questions. Results are outlined in When Will AI Exceed Human Performance, Evidence from AI Experts.
…The data contains responses from 352 researchers who had published at either NIPS or ICML in 2015, so keep the (relatively small) sample size in mind when evaluating the results.
…One interesting observation pulled from the abstract is that: “researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans.”
…The experts also generate a bunch of predictions for AI milestones, including:
…2022: AI can beat Starcraft.
…2026: AI can write a decent high school level essay.
…2028: An AI system can beat a human at Go given the same amounts of training.
…2030: AI can completely replace a retail salesperson.
…2100: AI can completely automate the work of an AI researcher. (How convenient!)

Monthly Sponsor: Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to david@amplifypartners.com.

Tech Tales:

[2024: An advertising agency in Shoreditch, East London. Three creatives stand around wearing architect-issue black turtlenecks and jeans. One of them fiddles with a tangle of electronic equipment, another inspects a VR headset, and the third holds up a pair of gloves with cables snaking between them and the headset and the other bundle of electronics. The intercom crackles, announcing the arrival of the graffiti artist, who lopes into the room a few seconds later. ]


James, so glad you could make it! Tea? Coffee?
Nah I’m okay, let’s just get started then shall we?
Okay. Ever used these before? says one of them, holding up the electronics-coated gloves.
No. Let me guess – virtual hands?
Exactly.
Alright.

Five minutes later and James is wearing a headset, holding his gloved hands as though spray-painting. In his virtual reality view he’s standing in front of a giant, flawless brick wall. There’s a hundred tubs of paint in front of him and in his hand he holds a simulated spraycan that feels real because of force feedback in the gloves.


Funny to do this without worrying about the coppers, James says to himself, as he starts to paint. Silly creatives, he thinks. But the money is good.

It takes a week and by the end James is able to stare up at the virtual wall, gazing on a giant series of shimmering logos, graffiti cartoons, flashing tags, and other visual glyphs and phrases. Most of these have been daubed all across South London in one form or the other in the last 20 years, snuck onto brick walls above train-station bridges, or slotted beneath window rims on large warehouses. Along with the paycheck they present him with a large, A0 laminated print-out of his work and even offer to frame it for him.


No need, he says, rolling up the poster.

He bends one of the tube ends as he slips an elastic band over it and one of the creatives winces.

I’ll frame it myself.

For the next month, the creatives work closely with a crew of AI engineers, researchers, roboticists, artists, and virtual reality experts, to train a set of industrial arms to mimic James’s movements as he made his paintings. The force feedback gloves he wore collected enough information for the robot arms to learn to use their own skeletal hand-like grippers to approximate his movements, and the footage from the other cameras that filmed him as he painted helps the robots adjust the rest of their movements. Another month goes by and, in a film lot in Los Angeles, James’s London graffiti starts to appear on walls, sprayed on by robot arms. Weeks later it appears in China, different parts combined and tweaked by generative AI algorithms, coating a fake version of East London in graffiti for Chinese tourists that only travel domestically. A year after that and James sees his graffiti covering the wall of a street in South Boston in a movie set there and uses his smartphone to take a photo of his simulated picture made real in a movie.


Caption: “Graffin up the movies now.”

Techniques that inspired this story: Industrial robots, time-contrastive networks, South East London (Lewisham / Ladywell / Brockley / New Cross), Tilt Brush.

OpenAI Bits & Pieces:

AlphaGo versus the real world: Andrej Karpathy has written a short post trying to outline what DeepMind’s AlphaGo system is capable of and what it may struggle with.

DeepRL bootcamp: Researchers from the University of California at Berkeley, OpenAI, and DeepMind are hosting a deep reinforcement learning workshop in late August in Berkeley. Apply here.

Import AI Issue 44: Constraints and intelligence, Apple’s alleged neural chip, and AlphaGo’s surprising efficiency

by Jack Clark

Constraints as the key to intelligence: Machine learning whiz & long-distance runner Neil Lawrence has published a research paper, Living Together: Mind and Machine Intelligence, that explores the idea that intelligence is intimately related to the constraints imposed on our ability to communicate.
…the gist of Neil’s argument is that intelligence can be distilled as a single number, which he calls an Embodiment Factor. This expresses the relationship between how much raw compute an intelligence can make use of at once, and how much it can communicate information about that computation during the same time frame. Humans are defined by being able to throw a vast amount of compute at any given problem, but then we can only communicate at a couple of words a second at most.
…The way Neil Lawrence puts it is that a computer with a 10 Gigaflop processing capacity and a communication capacity of about 1 gigabit per second has an embodiment factor of 10 (computation / communication), versus a human brain which can handle about an exaflop of compute with a communication limit of about 100 bits per second – representing an astonishing embodiment factor of 10^16. It is this significant compression which leads to many of the useful properties in our own intelligence, he suggests.
…(Open access note: Lawrence was originally going to publish this commentary through a traditional academic channel, but balked at paying fees and put it on Arxiv instead. Thanks, Neil!)
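…The Embodiment Factor arithmetic above is easy to check. A minimal sketch, using only the figures quoted from the paper; the function name `embodiment_factor` is made up for the example.

```python
def embodiment_factor(compute_per_second, communication_bits_per_second):
    """Ratio of how much an intelligence can compute to how much it can communicate."""
    return compute_per_second / communication_bits_per_second

# Figures quoted above.
computer = embodiment_factor(10e9, 1e9)  # 10 gigaflops, 1 gigabit per second
human = embodiment_factor(1e18, 100)     # about an exaflop, about 100 bits per second

print(computer)          # 10.0
print(human)             # 1e+16
print(human / computer)  # 1e+15
```

On these numbers the human ratio is fifteen orders of magnitude larger than the computer’s, which is the compression Lawrence is pointing at.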

SelfieNightmareGAN: For a truly horrifying time I recommend viewing this experiment where artist Mario Klingemann uses CycleGAN to transpose doll faces onto Instagrammable-selfies.

G.AI.VC: Google has launched an investment arm specifically focused on artificial intelligence. It’s unusual for the company to focus on individual verticals and likely speaks to the immense enthusiasm Google feels for AI. The fund will make investments with a check size of between $1 and $10 million, reports Axios’s Dan Primack.

Treasury secretary walks back AI skepticism: US Treasury Secretary Steve Mnuchin said a few months ago that problems related to AGI and AI-led automation were “50-100 years away” and these issues weren’t “on the radar screen” of federal government.
…He has changed his tune. Now, he says: “When I made the comment on artificial intelligence – and there’s different views on artificial intelligence – I was referring to kind of like R2D2 in Star Wars. Robotics are here. Self-driving cars are something that are gonna be here soon. I am fully aware of and agree that technology is changing and our workers do need to be prepared.”

iAI – Apple said to work on ‘neural chip’: Apple is developing a custom chip for its mobile devices specifically designed for inference tasks like speech and face recognition, according to Bloomberg. Other chipmakers such as Qualcomm have already taken steps in this direction. It’s likely that in the coming years we’ll see most chips get dedicated neural network bits of logic (basically matrix multiplication stuff with variable precision), given the general adoption of the technology – Nvidia is already designing certain GPU components specifically for AI-related tasks.

AI prizes, prizes everywhere! Real estate marketplace Zillow has teamed up with Google-owned Kaggle to offer a $1 million data science competition. The goal? Improve its ability to predict house prices. Submitted predictive models will be evaluated against real house prices over the first three months following closure of the competition.
…if this sort of thing works then, in a pleasing Jorge Luis Borges-manner, the predictions of these services could feasibly become a micro-signal in actual home prices, and so the prediction and reality could compound on each other (infinitesimally, but you know the story about butterflies & storms.)
…Next up – using the same sort of competitive model to build the guts of a self-driving car: AI-teaching operation Udacity and wannabe-self-driving company Didi (a notable competitor to troubled Uber) have partnered to create a prize for the development of open-source self-driving car technology. Over 1000 teams will compete for a $100,000 prize.
…The goal? “Automated Safety and Awareness Processing Stack (ASAPS), which identifies stationary and moving objects from a moving car, and uses data that includes Velodyne point cloud, radar objects, and camera image frames. Competitors are challenged to create a redundant, safe, and reliable system for detecting hazards that will increase driving safety for both manual and self-driving vehicles,” according to Udacity.

AlphaGo’s surprisingly efficient success: AlphaGo beat the world champion Ke Jie 3-0 at The Future of Go Summit in China. But local spectators were stymied after the state ordered streams of the match shut down, as AlphaGo demonstrated prowess against the human champion. Still, the games continued. During the second game Demis Hassabis, DeepMind’s founder, said AlphaGo evaluated many of Ke Jie’s moves to be “near perfect”. Still, Ke Jie resigned, as AlphaGo created a cautious, impenetrable defense…
…later, DeepMind revealed more details about the system behind AlphaGo. In its original incarnation AlphaGo was trained on tens of thousands of human games and used two neural networks to plan and evaluate moves, as well as Monte Carlo Tree Search to help with planning. Since earning a cover of Nature (via beating European Go expert Fan Hui) and then beating seasoned player Lee Sedol in Korea last year, DeepMind has restructured the system.
…the version of AlphaGo that was shown in China ran on a single TPU board – that’s a computer full of custom AI training and inference processors made by Google. It consumed a tenth of the computation at inference time of its previous incarnation, suggesting that its underlying systems have become more efficient – a mark of both earnest optimization by DeepMind’s engineers and dawning intelligence from better algorithms.
…But you might not be aware of this if you were trying to watch the game from within China – the state cut coverage of the event shortly after the first game began, for nebulous hard-to-discern political reasons.
…China versus the US in AI: While US and European investments in AI either shrink or plateau, China’s government is ramping up spending as it tries to position the country to take advantage of the AI megatrend, partially in response to events like AlphaGo, reports The New York Times.

Could AI help healthcare? The longer you wait to treat an ailment, the more expensive the treatment will be. That’s why AI systems could help bring down the cost of healthcare (whether that be for governments that support single-payer systems, or in the private sector). Many countries have spent years trying to digitize health records and, as those projects come to fruition, a vast hoard of data will become available for AI applications – and researchers are paying attention.
…“Many of us are now starting to turn our eyes to social value-added applications like health,” says AI pioneer Yoshua Bengio in this talk (video). “As we collect more data from millions and billions of people around the earth we’ll be able to provide medical advice to billions of people that don’t have access to it right now”.

Reading the airy tea leaves: AWS GPU spot price spike aligns with NIPS deadline: prices for renting top-of-the-range GPU servers from Amazon spiked to their highest level in the days before the NIPS deadline. That synced with stories of researchers hunting for GPUs both within companies and at cloud providers.
…The evidence, according to a tweet from Matroid founder Reza Zadeh: a dramatic rise in the cost to rent ‘p2.16xlarge’-GPU Instances on Amazon Web Services’s cloud:
…Baseline: $2 per hour.
…May 18th-19th (NIPS deadline): $144 per hour.
…Though remember, correlation may not be causation – there are other price spikes in late April that don’t seem to be correlated to AI events.
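…To put the spike in proportion, here is a quick back-of-the-envelope calculation using the two prices quoted above. The 24-hour rental window is an assumption added purely for illustration; it is not from the tweet.

```python
# Prices quoted above (USD per hour for a p2.16xlarge instance).
baseline_per_hour = 2.0
deadline_per_hour = 144.0

# How many times more expensive the instance was at the peak.
spike_multiple = deadline_per_hour / baseline_per_hour

# Hypothetical: cost of holding one instance for a full day at each price.
hours = 24  # assumed rental window, not from the source
baseline_day = baseline_per_hour * hours
deadline_day = deadline_per_hour * hours

print(f"Spike: {spike_multiple:.0f}x the baseline")
print(f"One day at baseline: ${baseline_day:,.0f}; at the peak: ${deadline_day:,.0f}")
```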

Imagining rules for better AI: When you or I try to accomplish tasks in our day we usually start with a strong set of biases about how we should go about completing the tasks. These can range from common sense beliefs (if you need to assemble and paint a fence, it’s a bad idea to paint the fence posts before you try to assemble them), to the use of large pre-learned rulesets to help us accomplish a task (cooking, or doing mathematics.)
…This is, funnily enough, how most computer software works: it’s a gigantic set of assumptions, squeezed into a user interface, and deployed on a computer. People get excited about AI because it needs fewer assumptions programmed into it to do useful work.
…But a little bit of bias is useful. For example, new research from the Georgia Institute of Technology and other researchers shows how to use some priors fruitfully. In Game Engine Learning from Video (PDF) the authors come up with an AI system that plays a game while having the parallel goal of trying to successfully approximate the underlying program of the game engine, which it only sees through pixel inputs – aka what the player sees. It is given some priors – namely, that the program it is trying to construct contains game mechanics (eg, if a player falls then the ground will stop them) and a game engine which governs the larger mechanics of the world. The researchers feed it example videos of the game being played, as well as the individual sprites of the images used to build the game. The AI then tries to learn to align sprites with specific facts or precepts, ranging from whether a sprite is animated, to how its spatial arrangement changes over time, whether it is related to any other sprites, its velocity, and so on. The AI then learns to scan over the games and align specific sprite actions with rules it derives, such as whether the sprite corresponding to Mario can move right if there is nothing in front of him, and so on. The system can focus on trying to learn specific rules by rapidly paging through the stored play images that correspond to the relevant sprite actions.
…It uses this sort of structured, supervised learning to iteratively learn how to play the game by reconstructing its inner functions and projecting forward based on its learned mechanistic understanding of the system. They show that this approach outperforms a convolutional neural network trained for next-frame prediction. (I’d also want to see baselines for traditional reinforcement learning algorithms to be convinced further.)
…This approach has numerous drawbacks, starting with the need for a human in the loop to load it up with carefully specified priors, but it hints at a future where our AI systems can be given slight biases and interpret the world according to them. Perhaps we could create a Manhattan Project for psychologists to enter numerous structured entries about human psychology, and feed them to AIs to see if they can help the AIs predict our own reactions, just like predicting the movement of a mushroom in Super Mario.
…Components used: OpenCV, Infinite Mario

Pix2code: seeing the code within the web page: at some point, we’re going to want our computers to be able to do most programming for us. But how do you get computers to figure out how to program stuff that you don’t have access to the source for?
…In pix2code, startup UIzard creates a system that lets a computer look at a screenshot of a web page and then figure out how to generate the underlying code which would produce that page. The approach can generate code for iOS and Android operating systems, with an accuracy of 77%. In other words, it gets the underlying code right roughly three times out of four.

OpenAI bits&pieces:

OpenAI Baselines: release of a well-tuned implementation of DeepMind’s DQN algorithm, plus three of its variants. Bonus: raw code, trained models, and a handy tips and tricks compendium for training and debugging AI algorithms. There will be more.

Tech Tales:

2025: Russia deploys the first batch of Shackletons across its thinly-populated Eastern flanks. The mission is two-fold: data gathering, and experimental research into robotics and AI. It drops them out of cargo planes in the night, hundreds of them falling onto the steppes of Siberia, their descent calmed by emergency-orange parachutes.

Generation One could traverse land, survive extremely low temperatures, swim poorly (float with directional intent, one officer wrote in a journal), and consistently gather and broadcast data. The Shackletons beamed footage of frozen lakes and bare horizon-stretching foxes back to TV and computer screens around the world and people responded, making art from the data generated by Russia’s remote parts. The robots themselves became celebrities and, though their locations were unknown, sometimes roving hunters, scavengers, and civil servants would find them out there in the wastes and take selfies. One iconic photo saw a bearded Russian soldier with his arm slung over the moss-mottled back of an ageing Shackleton. He had placed a pair of military-issue dark glasses on one of the front sensor bulges, giving the machine a look of comedic detachment.
“Metallicheskaya krysa”, the Russians affectionately called them – metal rats.

2026: Within a year, the Shackletons were generating petabytes of raw data every day, ranging from audio and visual logs, to more subtle datapoints – local pollen counts, insect colonies, methane levels, frequency of bubbles exploding from gas escaping permafrost, and so on. Each Shackleton had a simple goal: gather and analyze as much data as possible. Each one was capable of exploring its own environment and the associated data it received. But the twist was the Shackletons were able to identify potentially interesting data points they hadn’t been pre-loaded with. One day one of the machines started reporting a number that scientists found correlated to a nearby population of foxes. Another day another machine started to output a stream of digits that suggested a kind of slow susurration across a number line, and the scientists eventually realized this data corresponded to the water levels of a nearby river. As the years passed the Shackletons became more and more astute, and the data they provided was sucked up by the global economy, going on to fuel NGO studies, determine government investment decisions and, inevitably, give various nebulous financial entities a hedge in the ever-more competitive stock markets. Russia selectively declassified more and more components of the machines, spinning them off into state-backed companies, which grew to do business across the world.

2029: Eventually, the Shackletons became tools of war – but not in the way people might expect. In 2029 the UN started to drop batches of improved Shackletons into contested borders and other flashpoints around the world – the mountains of east Afghanistan, jungles in South America, even, eventually, the Demilitarized Zone between South and North Korea. At first, locals would try to sabotage the Shackletons, but over time this ebbed. That was because the UN mandated that the software of the Shackletons be open and verifiable – all changes to the global Shackleton operating system were encoded in an auditable system based on blockchain technologies. They also mandated that the data the Shackletons generated be made completely open. Suddenly, militaries around the world were deluged in rich, real-world data about the locations of their foes – and their foes gained the same data in kind. Conflict ebbed, never quite disappearing, but seeming to decline to a lower level than before.

Some say the deployment of the Shackletons can be correlated to this decline of violence around the world. The theory is that war hinges on surprise, and all the Shackletons do is turn the unknown parts of the world into the known. It’s hard to be in a Prisoner’s Dilemma when everyone has correct information.

Technologies that inspired this story: Ethereum / Bitcoin, unsupervised auxiliary goal identification, Boston Dynamics, hierarchical temporal memory

Import AI: Issue 43: Why curiosity improves AI algorithms, what follows ImageNet, and the cost of AI hardware

by Jack Clark

 

ImageNet is dead, long live WebVision: ImageNet was a dataset and associated competition that helped start the deep learning revolution by being the venue where in 2012 a team of researchers convincingly demonstrated the power of deep neural networks. But now it’s being killed off – this year will be the last official ImageNet challenge. That’s appropriate because last year’s error rate on the overall dataset was about 2.8 percent, suggesting that our current systems have exhausted many of ImageNet’s interesting challenges and may even be in danger of overfitting.
…What comes next? One potential candidate is WebVision, a dataset and associated competition from researchers at ETH Zurich, CMU, and Google, that uses the same 1000 categories as the ImageNet competition in 2012 across 2.4 million modern images and metadata taken directly from the web (1 million from Google Image Search and 1.4 million from Flickr.)
…Along with providing some degree of continuity in terms of being able to analyze image recognition progress, this dataset also has the advantage of being partially crappy, due to being culled from the web. It’s always better to test AI algorithms on the noisy real world.
…”Since the image results can be noisy, the training images may contain significant outliers, which is one of the important research issues when utilizing web data,” write the researchers.
…More information: WebVision Challenge: Visual Learning and Understanding With Web Data.

Making self-driving cars a science: the field of self-driving car development lacks the open publication conventions of the rest of AI research, despite using and extending various cutting-edge AI research techniques. That’s probably because of the seemingly vast commercial value of self-driving cars. But it brings forward a bunch of problems, namely, how can people try to make the development more scientific and thereby improve the efficiency of the industry, while benefiting society through the science being open?
…AI meme-progenitor and former self-driving startup Comma.ai intern Eder Santana has written up a shopping list of things that, if fulfilled, would improve the science of self-driving startups. It’s a good start at a tough problem.
…I wonder if smaller companies might band together to enact some of these techniques – with higher levels of openness than titans like Uber and Google and Tesla and Ford etc – and use that to collaboratively pool research to let them compete? After all, the same philosophy already seems present in Berkeley DeepDrive, an initiative whereby a bunch of big automakers fund open AI research in areas relevant to their development.
The next step is shared data. I’m curious if Uber’s recent hire, Raquel Urtasun, will continue her work on the KITTI self-driving car dataset which she created and Eder lists as a good example.

AI ain’t cheap: Last week, GPUs across the world were being rented by researchers racing to perform final experiments for NIPS. This wasn’t cheap. Despite many organizations (including OpenAI) trying to make it easier for more researchers to experiment with and extend AI, the costs of raw compute remain quite high. (And because AI is mostly an experimental, empirical science, you can expect to have to shell out for many experiments. Some deep-pocketed companies, like Google, are trying to offset this by giving researchers free access to resources, most recently 1,000 of its Tensor Processing Units in a dedicated research cloud, but giveaways don’t seem sustainable in the long run.)
…”We just blew $5k of google cloud credits in a week, and managed only 4 complete training runs of Inception / Imagenet. This was for one conference paper submission. Having a situation where academia can’t do research that is relevant to Google (or Facebook, or Microsoft) is really bad from a long-term perspective”, wrote Hacker News user dgacmu.
A new method of evaluating AI we can all get behind: Over on the Amazon Web Services blog a company outlines various different ways of training a natural language classification system and it lists how much it costs not just in terms of computation, but in terms of how much it will cost you to rent the computing resources for it on AWS in both CPUs and GPUs. These sorts of numbers are helpful for putting into perspective how much AI costs and, more importantly, how long it takes to do things that the media (yours included) makes sound simple.

How to build an AI business, from A16Z: VC firm Andreessen Horowitz has created the AI Playbook, a microsite to help people figure out how AI works and how to embed it into their business.
…Bonus: it includes links to the thing every AI person secretly (and not so secretly) lusts after: DATA.
…Though AI research has been proceeding at a fairly rapid clip, this kind of project hints at the fact that commercialization of it has been uneven. That’s partly due to a general skills deficit in AI across the tech industry and also because in many ways it’s not exactly clear how you can use AI – especially the currently on-trend strain of deep neural networks – in a business. Most real-world data requires a series of difficult transforms before it can be strained through a machine learning algorithm and figuring out the right questions to ask is its own science.

E-GADs: Entertaining Generative Adversarial Doodles! Google has released a dataset of 50 million drawings across 345 distinct categories, providing artists and other fiddlers with a dataset to experiment with new kinds of AI-led aesthetics.
…This is the dataset that supported David Ha’s fun SketchRNN project, whose code is already available.
… It may also be useful for learning representations of real objects – I’d find it fun to try to train doodles with real image counterparts in a semi-supervised way, then be able to transform new real world pictures into cute doodles. Perhaps generative adversarial networks are a good candidate? I must have cause to use the above bolded acronym – you all have my contact details.

Putting words in someone else’s mouth – literally: fake news is going to get even better based on new techniques for getting computers to synthesize realistic looking images and videos of people.
…in the latest research paper in this area a team of researchers at the University of Oxford have produced ‘speech2vid’, a technique to get computers to be able to take a single still image of a person and an audio track and synthesize an animated version of that person’s face saying those words.
…The effects are still somewhat crude – check out the blurred, faintly comic-book like textures in the clips in this video. But they hint at a future where it’s possible to create compelling propaganda using relatively little data. AI doppelgangers won’t just be for celebrities and politicians and other people who have generated vast amounts of data to be trained on, but will be made out of normal data-lite people like you or me or everyone we know.
…More information in the research paper You said that?

The curious incident of the curiosity exploration technique inside the learning algorithm: how can we train AI systems to explore the world around them in the absence of an obvious reward? That’s a question that AI researchers have been pondering for some time, given that in real life rewards (marriage, promotions, finally losing weight after seemingly interminable months of exercise) tend to be relatively sparse.
…One idea is to reward agents for being curious, because curious people tend to stumble on new things which can help expand and deepen their perception of the world. Children, for instance, spend most of their time curiously exploring the world around them without specific goals in mind and use this to help them understand it.
…The problem for AI algorithms is figuring out how to get them to learn to be curious in a way that leads to them learning useful stuff. One way could be to reward the visual novelty of a scene – eg, if I’m seeing something I haven’t seen before, then I’m probably exploring stuff usefully. Unfortunately, this is full of pitfalls – show a neural network the static on an untuned television and every frame will be novel, but not useful.
…So researchers at the University of California at Berkeley have come up with a technique to do useful exploration, outlined in Curiosity-driven Exploration by Self-supervised Prediction. It works like this: “instead of making predictions in the raw sensory space (e.g. pixels), we transform the sensory input into a feature space where only the information relevant to the action performed by the agent is represented.”
…What this means is that the agent learns how to be curious by taking actions in the world, and if those actions yield a different world then it’s able to figure out how those actions corresponded to that difference and take more such actions accordingly.
…So, how well does it work? The researchers test out the approach on two environments – Super Mario and Vizdoom. They find that it’s able to attain higher scores in a faster time than other methods, and can deal with increasingly sparse rewards.
…The most tantalizing part of the result? “An agent trained with no extrinsic rewards was able to learn to navigate corridors, walk between rooms and explore many rooms in the 3-D Doom environment. On many occasions the agent traversed the entire map and reached rooms that were farthest away from the room it was initialized in. Given that the episode terminates in 2100 steps and farthest rooms are over 250 steps away (for an optimally-moving agent), this result is quite remarkable, demonstrating that it is possible to learn useful skills without the requirement of any external supervision of rewards.”
…The approach has echoes of a recent paper from DeepMind outlining a reinforcement learning agent called UNREAL. This system was a composite of different neural network components; it used a smart memory-replay system to let it figure out how actions it had taken in the environment corresponded to rewards, and was able to also use it to figure out how actions it had taken corresponded to unspecified intermediate rewards that helped it gain an actual one (for example, though it was rewarded for moving itself to the same location as a delicious hovering apple, it subsequently figured out that to attain this reward it should achieve an intermediary reward which it creates and focuses on itself). It learned this by being able to figure out how its actions affected its observation of the world and adjusted accordingly.
…(Curiosity-driven exploration and related fields like intrinsic motivation are quite mature, well-studied areas of AI, so if you want to trawl through the valuable context I recommend reading papers cited in the above research.)
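…To make the "predict in feature space, not pixel space" idea concrete, here is a minimal sketch in a toy 1-D world. It is not the paper’s implementation: the hand-written `encode` function, the one-weight forward model, and the noise channel standing in for television static are all assumptions for illustration (in the paper the feature space is learned). The curiosity reward is the forward model’s prediction error, so it shrinks as the agent learns how its actions move it, while a naive pixel-style error stays high forever because of the noise.

```python
import random

def encode(observation):
    # Hypothetical hand-written encoder: keep the agent's position, drop the noise.
    # In the paper this feature space is learned, not hard-coded.
    position, _noise = observation
    return position

def step(position, action, rng):
    # Toy 1-D world: the action moves the agent; the second channel is pure static.
    return (position + 0.5 * action, rng.random())

def run(steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0  # forward-model weight: predicted displacement per unit of action
    obs = (0.0, rng.random())
    feature_rewards, pixel_rewards = [], []
    for _ in range(steps):
        action = rng.choice([-1.0, 1.0])
        next_obs = step(obs[0], action, rng)
        # Curiosity reward in feature space: the forward model's prediction error.
        predicted = encode(obs) + w * action
        error = encode(next_obs) - predicted
        feature_rewards.append(0.5 * error ** 2)
        # Naive alternative: also try to predict the noise channel ("no change").
        noise_error = next_obs[1] - obs[1]
        pixel_rewards.append(0.5 * (error ** 2 + noise_error ** 2))
        # One gradient step on the forward model's squared error.
        w += lr * error * action
        obs = next_obs
    return feature_rewards, pixel_rewards

feature_rewards, pixel_rewards = run()
print("feature-space reward, first vs last step:", feature_rewards[0], feature_rewards[-1])
print("pixel-style reward, mean of last 50 steps:", sum(pixel_rewards[-50:]) / 50)
```

The feature-space reward fades once the dynamics are understood, which is what pushes an agent towards parts of the world it can’t yet predict; the pixel-style reward never fades, which is the untuned-television problem described above.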

Import AI reader comment of the week: Ian Goodfellow wrote in to quibble with my write-up of a recent paper about how snapshots of the same network at different points in time can be combined to form an ensemble model. The point of contention is whether these snapshots represent different local minima:
“…Local minima are basically the kraken of deep learning. Early explorers were afraid of encountering them, but they don’t seem to actually happen in practice,” he writes. “What’s going on is more likely that each snapshot of the network is in a different location, but those locations probably aren’t minima. They’re like snapshots of a person driving a car trying to get to a specific point in a really confusing city. The driver keeps circling around their destination but can’t quite get to it because of one way street signs and their friend keeps texting them telling them to park in a different place. They’re always moving, never trapped, and they’re never in quite the right place, but if you average out all their locations the average is very near where they’re trying to go.”
…Thanks, Ian!

Help deal with the NIPS-A-GEDDON: This week, AI papers are going to start flooding onto Arxiv from submissions to NIPS, and some other AI conferences. Would people like to help rapidly evaluate the papers, noting interesting things? We tried a similar experiment a few weeks ago and it worked quite well. We used a combination of a form and a Google Doc to rapidly analyze papers. Would love suggestions from people on whether this format [GDoc] is helpful (I know it’s ugly as sin, so suggestions welcome here.)
…if you have any other thoughts for how to structure this project or make it better, then do let me know.

OpenAI bits&pieces:

It was a robot-fueled week at OpenAI. First, we launched a new software package called Roboschool, open-source software for robot simulation, integrated with OpenAI Gym. We also outlined a robotics system that lets us efficiently learn to reproduce behaviors from single demonstrations.

CrowdFlower founder and OpenAI intern chats about the importance of AI on this podcast with Sam Lessin, and why he thinks computers are eventually going to exceed humans at many (if not all!) capabilities.

Tech tales:

[2018: The San Francisco Bay Area, two people in two distant shared houses, conversing via their phones.]

Are you okay?
I’ve been better. You?
Things aren’t going well.
Anything I can do?
Fancy drinks?
Sure, when?
Wednesday at 930?
Sounds good!

You put your phone down and, however many miles away, so does the other person. Neither of you typed a word of that, instead you both just kept on thumbing the automatically suggested messages until you scheduled the drinks.

It’s true, the both of you are having trouble at the moment. Your system was smart enough to make the suggestions based on studying your other emails and the rhythms of the hundreds of millions of other users. When you eventually go and get drinks the GPS in your phones tracks you both, records the meeting – anonymously, only signalling to the AI algorithms that this kind of social interaction produced a Real In-Person Correspondence.

Understanding what leads to a person meeting up with another, and what conversational rhythms or prompts are crucial to ensuring this occurs, is a matter of corporate life and death for the companies pushing these services. We know when you’re sad, is the implication. So perhaps you should consider $drinks, or $a_contemporary_lifestyle_remedy, or $sharing_more_earlier.

You know you’re feeding them, these machines that live in football field-sized warehouses, tended to by a hundred-computer mechanics who cannot know what the machines are really thinking. No person truly knows what these machines relate to, instead it is the AI at the heart of the companies that does – and we don’t know how to ask it questions.

Technologies that inspired this story: sequence-to-sequence learning, Alexa/Siri/Cortana/Google, phones, differential privacy, federated learning.

Import AI: Newsletter 42: Ensemble learning, the paradoxical nature of AI research, and Facebook’s CNN-for-RNN substitution

by Jack Clark

‘Mo ensembles, ‘No problems: new research shows how to get the benefits of grouping a bunch of neural networks together (known as an ensemble), without having to go to the trouble of training each of them individually. The technique is outlined in Snapshot Ensembles: Train 1, Get M For Free.
…it’s surprisingly simple and intuitive. The way neural networks are trained today can be thought of as like rolling a ball down a fairly uneven hill – the goal is to get the ball to the lowest possible point of the hill. But the hill is uneven, so it’s fairly easy for the ball to get trapped in a local low-elevation point in the hill and stay there. In AI land, this point is called a ‘local minimum’ – it’s bad to get stuck in a local minimum.
…Most tricks in AI training involve getting the model to visit way more locations during training and thereby avoid a sub-optimal local minimum – ideally you want the ball to find the lowest point in the hill, even if it runs into numerous depressions along the way.
…the presented technique shows how to record a snapshot of each local minimum the neural network visits along the way during training. Then, once you finish training, you combine the snapshots into an ensemble by averaging their predictions.
…Results: the approach works, with the authors reporting that this technique yields more effective systems on tasks like image classification, while not costing too much more in the way of training.
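…For the curious, here is a minimal sketch of the idea. As I understand the paper, the snapshots come from a learning rate that is repeatedly annealed and then restarted, and the ensemble averages the snapshots’ predictions at test time. Everything else below is an assumption for illustration: the toy 1-D logistic regression model, the data, and the hyperparameters are not from the paper.

```python
import math
import random

def cosine_cyclic_lr(step, steps_per_cycle, lr_max):
    # Learning rate restarts at lr_max each cycle and anneals towards zero.
    position = (step % steps_per_cycle) / steps_per_cycle
    return 0.5 * lr_max * (math.cos(math.pi * position) + 1.0)

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, x):
    w, b = weights
    return sigmoid(w * x + b)

def train_with_snapshots(data, cycles=3, steps_per_cycle=200, lr_max=0.5, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    snapshots = []
    for step in range(cycles * steps_per_cycle):
        x, y = rng.choice(data)
        lr = cosine_cyclic_lr(step, steps_per_cycle, lr_max)
        grad = predict((w, b), x) - y  # gradient of the log loss w.r.t. the logit
        w -= lr * grad * x
        b -= lr * grad
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append((w, b))  # save the model at the end of each cycle
    return snapshots

def ensemble_predict(snapshots, x):
    # Average the snapshots' predicted probabilities.
    return sum(predict(s, x) for s in snapshots) / len(snapshots)

# Toy 1-D data (assumption): the label is 1 when x > 0.
data = [(x / 10.0, 1 if x > 0 else 0) for x in range(-20, 21) if x != 0]
snapshots = train_with_snapshots(data)
print(len(snapshots), ensemble_predict(snapshots, 1.5), ensemble_predict(snapshots, -1.5))
```

One training run yields three models here, which is the "train 1, get M for free" part; a model this small gains little from ensembling, so treat it as a demonstration of the mechanics rather than the benefit.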

Voice data – who speaks to whose speakers?: if data is the fuel for AI, then Amazon looks like it’s well positioned to haul in a trove of voice data, according to eMarketer.
…Amazon’s share of the US home chit-chat speaker market in 2017: ~70.6%
…Google’s: 23.8%
…Others: 5.6%

A/S/E? Startup researchers show off end-to-end age, sex, and emotion recognition system: AI is moving into an era dominated by composite systems, which see researchers build complex, interlinked software to perform multiple categorizations (and sometimes actions) within the same structure…
… in this example, researchers from startup Sighthound have developed DAGER: deep age, gender, and emotion recognition using convolutional neural networks. DAGER can guess someone’s age, sex, and emotion from a single face-on photograph. The training ingredients for this include 4 million images of over 40,000 distinct identities…
… It apparently has a lower mean absolute error than systems outlined by Microsoft and others.
… Good news: The researchers sought to offset some of the (sadly inevitable) biases in their datasets by adding “tens of thousands of images of different ethnicities as well as age groups”. It’s nice that people are acknowledging these issues and trying to get ahead of them.

Uber hires Raquel Urtasun: Self-driving car company Uber has hired Raquel Urtasun, a well-respected researcher with the University of Toronto, to help lead its artificial intelligence efforts.
…Urtasun’s group had earlier created KITTI, a free and open dataset used to benchmark computer vision systems against problems that self-driving cars encounter. Researchers have already used the dataset to train vision models entirely in simulation using KITTI data, then transfer them into the real world.
…meanwhile Lyft and Google (technically, Waymo) have confirmed that they’ve embarked on a non-exclusive collaboration to work together on self-driving cars.

Cisco snaps up speech recognition system with MindMeld acquisition: Cisco has acquired voice recognition startup MindMeld for around $125 million. The startup had made voice and conversation interface technologies, which had been used by commercial companies such as Home Depot, and others.

Government + secrecy + AI = fatal error, system override: Last week, hundreds of thousands of computers across the world were compromised by a virulent strain of malware, spread via a zero-day flaw that, Microsoft says in this eyebrow raising blogpost, was originally developed by the NSA.
…today, governments stockpile computer security vulnerabilities, using them strategically against foes (and sometimes ‘friends’). But as our digital systems become ever more interlinked, the risk of one of these exploits falling into the wrong hands increases, as do its effects.
…we’re still a few years away (I think) from governments classifying and stockpiling AI exploits, but I’m fairly sure that in the future we could imagine governments developing certain exploits, say a new class of adversarial examples, and not disclosing their particulars, instead keeping them private to be used against a foe.
…just as Microsoft advocates for what it calls a Digital Geneva Convention, it may make sense for AI companies to agree upon a similar set of standards eventually, to prevent the weaponization and exploitation of AI.

Doing AI research is a little bit like being a road-laying machine, where to travel forward you must also create the ground beneath you. In research, what this translates to is that new algorithms typically need to be paired with new challenges. Very few AI systems today are robust enough to be able to be plunked down in reality able to do useful stuff. Instead, we try to get closer to being able to build these systems by inventing learning algorithms that exhibit increasing degrees of general applicability on increasingly diverse datasets. The main way to test this kind of general applicability is to create new ways to test such AI systems – that’s why the reinforcement learning community is evolving from just testing on Atari games to more sophisticated domains, like Go, or video games like Starcraft and Doom.
…the same is true of other domains beyond reinforcement learning: to build new language systems we need to assemble huge corpuses of data and test algorithms on them – so over time it feels like the amounts of text we’ve been testing on have grown larger. Similarly, in fields like question answering we’ve gone from simple toy datasets to more sophisticated trials (like Facebook’s bAbI corpus) to even more elaborate datasets.
…A new paper from DeepMind and the University of Oxford, Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, is a good example of this sort of hybrid approach to AI development. Here, the researchers try to tackle the task of solving simple algebraic word problems by not only inventing new algorithmic approaches, but doing so while generating new types of data. The resulting system can not only generate the answers, but also its rationale for the answer.
…size of the new dataset: over 100,000 word problems that include answers as well as natural language rationales.
…how successful is it? Typical AI approaches (which utilize sequence-to-sequence techniques) tend to have accuracies of about 20% on the task. This new system gets things right 36% of the time. Still a bad student, but a meaningful improvement.
A little bit of supervision goes a long way: Facebook and Stanford researchers are carrying out a somewhat similar line of enquiry but in a different domain. They’ve come up with a new system that can get state-of-the-art results on a dataset intended to test visual reasoning. The secret to their method? Training a neural network to invent its own small computer programs on the fly to answer questions about images it sees. You can find out more in ‘Inferring and Executing Programs for Visual Reasoning’. The most intriguing part? The resulting system is relatively data efficient, compared to fully supervised baselines, suggesting that it’s learning how to tackle the task in novel ways.
…it seems likely that in the future AI research may shift from involving generating new datasets alongside new algorithms, to generating new datasets, new algorithms, as well as new reasoning programs to aid with learning efficiency and interpretability.
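…What does "a small program that answers a question about an image" look like? Here is a deliberately tiny sketch. It only shows the execute-a-program half of the idea; the part where a neural network writes the program is omitted. The scene description, the module names, and the two example programs are all hypothetical, and in the research the modules are themselves neural networks operating on image features rather than Python functions over a hand-written scene.

```python
# A hand-written scene description standing in for an image (assumption).
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# Each module takes the current state (a list of objects) and an optional argument.
MODULES = {
    "filter_color": lambda objs, arg: [o for o in objs if o["color"] == arg],
    "filter_shape": lambda objs, arg: [o for o in objs if o["shape"] == arg],
    "count": lambda objs, arg: len(objs),
    "exists": lambda objs, arg: len(objs) > 0,
}

def execute(program, scene):
    # Run the program step by step, feeding each module's output into the next.
    state = scene
    for op, arg in program:
        state = MODULES[op](state, arg)
    return state

# "How many red things are there?" as a program a generator might emit.
how_many_red = [("filter_color", "red"), ("count", None)]
# "Is there a blue sphere?"
blue_sphere = [("filter_color", "blue"), ("filter_shape", "sphere"), ("exists", None)]

print(execute(how_many_red, scene))
print(execute(blue_sphere, scene))
```

Because the answer is produced by a chain of named steps, you can read the program to see how the system arrived at it, which is where the interpretability angle comes from.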

Mujoco for free (restrictions apply): Physics simulator Mujoco will give students free licenses to its software, lowering the costs of doing AI research on modern, challenging problems, like those found in robotics.
…Due to the terms of the license, people will still need to stump up for a license for the proprietary software if they want to use AI systems trained within Mujoco in products.

Don’t read the words, look at them! (and get a 9X speedup): Facebook shows how to create a competitive translation system that is also around 9 times faster than previous state-of-the-art systems. The key? Instead of using a recurrent neural network to analyze the text, use a convolutional neural network.
…this is somewhat counterintuitive. RNNs are built to analyze and understand sequences, like strings of text or numbers. Convolutional neural networks are somewhat cruder and are mostly used as the basic perceptual component inside vision systems. How was Facebook able to manhandle a CNN into something with RNN-like characteristics? The answer is the usage of attention, which lets the network focus on particular words.
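
To make the attention idea concrete, here is a minimal sketch of dot-product attention in plain Python. It is not Facebook’s architecture: the function names, the toy two-dimensional word vectors, and the choice of dot-product scoring are all assumptions made for illustration. It only shows how a set of weights lets a network focus on particular source words.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, source_vectors):
    """Return (weights, context) for one decoder state over the source words."""
    # Score each source word by its dot product with the query.
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in source_vectors]
    weights = softmax(scores)
    # The context is the weighted average of the source vectors.
    dim = len(query)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, source_vectors))
        for i in range(dim)
    ]
    return weights, context

# Toy 2-d "word" vectors; the numbers are made up for illustration.
source = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, context = attend([2.0, 0.0], source)
```

In this toy case the first source word gets the largest weight, because it points in the same direction as the query.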

Horror of the week: what happens when you ask a neural network to make a person smile, then feed it that new smile–augmented image and ask it to make the person smile even more, and then you take that image and feed it back to the network and ask the network to enhance its smile again? You wind up with something truly horrifying! Thanks, Gene Kogan.

Tech Tales:

[2040: the partially flooded Florida lowlands.]

The kids nickname it “Rocky the Robster” the first time they see it and you tell them “No, it’s called the Automated Ocean Awareness and Assessment Drone,” and they smile at you then say “Rocky is better.” And it is. But you wish they hadn’t named it.

Rocky is about the size of a carry-on luggage suitcase, and it does look, if you squint, a little like a metallic lobster. Two antennas extend from its front, and its undercarriage is coated in grippers and sampling devices and ingest and egress ports. In about two months it’ll go into the sea. If things work correctly, it will never come out, but will become another part of the ocean, endlessly swimming and surveilling and learning, periodically surfacing, whale-like, to beam information back to the scientists of the world.

But before it can start its life at sea, you need to teach it how to swim and how to make friends. Rocky comes with a full low-grade suite of AI software and, much like a newborn, it learns through a combination of imitation and experimentation. Imitation is where your kids come in. They come in and watch you in your studio as you, on all fours, walk across the room. Rocky imitates you poorly. The kids crawl across the room. Rocky imitates them a bit better. You figure that Rocky finds it easier to imitate their movements as they’re closer in size to it. Eventually, you and the kids teach the robot to swim as well, all splashing around in a pool in the backyard, with the robot tethered to prevent its enthusiastic attempts to learn to swim leading to it running into your kids.

Then Rocky’s AI systems start to top out – as planned. It can run and walk and scuttle and swim and even respond to some basic hand gestures, but though it still gambols around with a kind of naive enthusiasm, it stops developing new tics and traits. The sense of life in it dims as the kids become aware that Rocky is more drone than they thought.
“Why isn’t Rocky getting smarter anymore, Dad?” they say.
You try to explain that some things can’t get smarter.
“No, that’s the opposite of what you’ve always told us. We just need to try and we can learn anything. You say this all the time!”
“It’s not like that for Rocky,” you say.
“Why not?” they say. Then tears.

The night before Rocky is due to be collected by the field technicians who will make some final modifications to its hardware before sending it into the sea, you hear the creak on the stairwell. You don’t follow them or stop them but instead turn on a webcam and look into your workshop, watch the door slowly ease open as the kids quietly break in. They sit down next to Rocky’s enclosure and talk to it. They show it pictures they’ve drawn of it. They motion for it to look at them. “Say it, Rocky,” you hear them say, “try to say ‘I want to stay here’”.

Having no vocal cords, it is unable. But as you watch your kids on the webcam you think that for a fraction of a second Rocky flexes its antennas, the two tops of each bowing in and touching each other, forming a heart before thrumming back into their normal position. “A statistical irregularity,” you say to your colleagues, never believing it.

Import AI Newsletter 41: The AI data grab, the value of simplicity, and a technique for automated gardening

by Jack Clark

Welcome to the era of the AI data grab: a Kaggle developer recently scraped 40,000 profile photos from dating app Tinder (20k from each gender) and placed the data hoard online for other people to use to train AI systems. The dataset was downloaded over 300 times by the time TechCrunch wrote about it. Tinder later said the dataset violated the app’s Terms of Service (ToS) and now it has been taken down.
…AI’s immense hunger for data, combined with all the “free” data lying around on the internet, seems likely to lead to more situations like this. Could this eventually lead to the emergence of a new kind of data economy, where companies instinctively look for ways to sell and market their data for AI purposes, along with advertising?

Why simple approaches sometimes work best: Modern AI research is yielding a growing suite of relatively simple components that can be combined to solve hard problems. This is either encouraging (AI isn’t as hard as we thought – Yaaaay!) or potentially dispiriting (we have to hack together a bunch of simple solutions because our primate brains are struggling to visualize the N-dimensional game of chess that is consciousness – Noooo!).
…in Learning Features by Watching Objects Move, researchers with Facebook and the University of California at Berkeley figure out a new approach to get AI to learn how to automatically segment entities in a picture. Segmentation is a classic, hard problem in computer vision, requiring a machine to be able to, say, easily distinguish the yellow of a cat’s eyes from the yellow iodine of a streetlight behind it, or disentangle a zebra walking over a zebra crossing.
…the new technique works as follows: the researchers train a convolutional neural network to study short movie clips. They use optical flow estimation to disentangle the parts of the movie clip that are in the foreground and in motion from those that aren’t. They then use these to label each frame with segment information. Then they train a convolutional neural network to look at each frame and predict segments, using this data. The approach attains nine state-of-the-art results for object detection on the PASCAL VOC 2012 dataset.
…The researchers guess that this works so well because it forces the convolutional neural network to try to learn some quite abstract, high-level structures, as it would be difficult to perform this segmentation task by merely looking at pixels alone. They theorize that this is because to effectively learn to predict when something is moving or not you need to understand how all the pixels in a given picture relate to each other and use that to make judgements about what can move and what can not.
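
The labelling step can be sketched roughly as follows. The researchers use optical flow estimation; this toy version swaps in plain frame differencing, which is much cruder, and the function names, threshold, and tiny grayscale "clip" are all invented for illustration. The point is only that motion can produce segment labels for free, which a network can then be trained to predict from single frames.

```python
def motion_mask(prev_frame, next_frame, threshold=10):
    """Mark pixels whose brightness changed by more than `threshold` as foreground (1)."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]

def make_training_pairs(frames, threshold=10):
    """Pair each frame with a pseudo-label mask derived from motion alone."""
    return [
        (frames[t], motion_mask(frames[t], frames[t + 1], threshold))
        for t in range(len(frames) - 1)
    ]

# A bright one-pixel "object" moving left to right across a dark 1x4 strip.
clip = [
    [[200, 0, 0, 0]],
    [[0, 200, 0, 0]],
    [[0, 0, 200, 0]],
]
pairs = make_training_pairs(clip)
```

Frame differencing marks both the pixel the object left and the pixel it entered, which is one reason a real system would prefer optical flow.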

Secret research to save us all: Researchers at Berkeley’s Machine Intelligence Research Institute are of the opinion that powerful AI may be (slightly) closer than we think, so will spend some of this year conducting new AI safety research and plan to keep this work “non-public-facing at least through late 2017, in order to lower the risk of marginally shortening AGI timelines”.

The freaky things that machine learning algorithms “see”: check out this video visualization of what an ACER policy thinks is salient (aka, important to pay attention to) when playing a game.

Automated gardeners: ‘Machine Vision System for 3D Plant Phenotyping’ shows how to use robotics and deep learning for automated plant analysis. The system works by building a little metal scaffold around a planter, then using a robot arm with a laser scanner to automate the continuous analysis of the plant. The researchers test it out on two plants, gathering precise data about the plants’ growth in response to varying lighting conditions. Eventually, this should let them automate experimentation across a wide variety of plants. However, when they try this on a conifer they run into difficulty because the sensor doesn’t have sufficient resolution to analyze the pine needles.
…oddly specific bonus fact: not all AI is open source – the robot growing chamber in the experiment runs off of Windows Embedded.
fantastic name of the week: the robot arm was manufactured by Schunk Inc. Schunk!

Free code: Microsoft has made the code for its ‘Deformable Convnets’ research (covered in previous issue here) available as open source.
…Deformable Convolutions (research paper here) are a drop-in tool for neural networks to let you sample from a larger and more disparate set of points over an image, potentially helping with more complex classification tasks.
…The code is written in MXNet, a framework backed by Amazon.
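
For intuition, here is a minimal one-dimensional sketch of the sampling idea, written in plain Python rather than MXNet. It is not Microsoft’s code: the real method works on 2D feature maps and learns its offsets during training, whereas the offsets, kernel, and function names below are hand-picked assumptions. Each kernel tap reads from its usual position plus its own offset, with interpolation handling fractional positions.

```python
import math

def sample_linear(signal, position):
    """Read `signal` at a fractional position via linear interpolation (zero outside)."""
    left = math.floor(position)
    frac = position - left

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1 - frac) * at(left) + frac * at(left + 1)

def deformable_conv1d_at(signal, kernel, offsets, center):
    """Apply `kernel` around `center`, shifting each tap by its own offset."""
    half = len(kernel) // 2
    total = 0.0
    for k, (weight, offset) in enumerate(zip(kernel, offsets)):
        position = center + (k - half) + offset  # regular grid position plus offset
        total += weight * sample_linear(signal, position)
    return total

signal = [0.0, 1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]
# Zero offsets reduce to an ordinary convolution over positions 1, 2, 3.
regular = deformable_conv1d_at(signal, kernel, [0.0, 0.0, 0.0], center=2)
# Non-zero offsets sample positions 0, 2 and 3.5 instead.
deformed = deformable_conv1d_at(signal, kernel, [-1.0, 0.0, 0.5], center=2)
```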

The great pivot continues: most large technology companies are reconfiguring themselves around AI. Google was (probably) the first company to make such a decision, and was swiftly followed by Microsoft, Facebook, Amazon, and others. Even conservative companies like Apple and IBM are trying to re-tool themselves in this way. It’s not just an American phenomenon – Baidu chief Robin Li said in an internal memo that Baidu’s strategic future relies on AI, according to this (translated) report.

Biology gets its own Arxiv… Cold Spring Harbor Laboratory and the Chan Zuckerberg Initiative are teaming up to expand bioRxiv – a preprint service for life sciences research. Arxiv, which is used by AI people, computer scientists, physicists, mathematicians, and others, has sped up the pace of AI research tremendously by short-circuiting the arbitrary publication timetables of traditional journals.

Neural network primitives for ARM (RISC) chips: ARM announced the public availability of the ARM Compute Library, software to give developers access to the low-level primitives they need to tune neural network performance on ARM CPUs and GPUs.
…The library supports neural network building blocks like convolution, soft-max, normalization, pooling, and so on, as well as ways to run support vector machines, general matrix multiplication, and so on.

What’s cooler than earning a million at Google? Getting bought by another tech company for 10 million!… that seems like the idea behind the current crop of self-driving car startups, which are typically founded by early employees of self-driving projects in academia or the private sector.
… the latest? DeepMap – a startup helmed by numerous Xooglers which focuses on building maps, and the intelligent data layers on top of them, to let self-driving cars work. “It’s very easy to make a prototype car that can make a few decisions around a few blocks, but it’s harder when you get out into the world,” said CEO James Wu.

AI means computer science becomes an empirical science: and more fantastic insights in this talk titled “As we may program” (video) by Google’s marvelously-attired Peter Norvig.
…Norvig claims that the unpredictability and emergent behavior endemic to machine learning approaches means computer science is becoming an empirical science where work is defined by experimentation as well as theory. This feels true to me – most AI researchers spend an inordinate amount of time studying various graphs that read out the state of networks as they’re training, and then use those graphs to help them mentally navigate the high-dimensional spaghetti-matrices of the resulting systems.
…This sort of empirical, experimental analysis is quite alienating to traditional developers, who would rather predict the performance of their tech prior to rolling it out. What we ultimately want is to package up advanced AI programming approaches within typical programming languages, making the obscure much more familiar, Norvig says.
…Here’s my attempt at what AI coding in the future might look like, based on Norvig’s speech:

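# Imagined future API: photo_album and AI_Primitives do not exist today.
things_im_looking_for = ['hiking shoes', 'bicycle', 'sunsets']
things_found = []
for picture in photo_album:
    # Split the picture into its constituent objects.
    pic_contents = picture.AI_Primitives.segment()
    for segment in pic_contents:
        # Ask the imagined primitive to name each object.
        label = segment.AI_Primitives.label()
        if label in things_im_looking_for:
            things_found.append((picture.name, label))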
… there are signs this sort of programming language is already being brewed up. Wolfram Language represents an early step in this direction. As does work by startup Bonsai – see this example on GitHub. (However, both of these systems are proprietary languages – it feels like future programming languages will contain these sorts of AI functions as open source plugins.)

Microsoft’s new head of AI research is… Eric Horvitz, who has long argued for the importance of AI safety and ethics, as this Quartz profile explains.

StreetView for the masses: Mapillary has released a dataset of photographs taken at the street level, providing makers of autonomous vehicles, drones, robots, and plain old AI experimenters with a new trove of data to play with. The dataset contains…
…25,000 high-resolution images
…100 object categories
…high variability in weather conditions
…reasonable geographic diversity, with pictures spanning North and South America and Western Europe, as well as a few from Africa and Asia.
meanwhile, Google uses deep learning to extract potent data from its StreetView trove: In 2014 Google trained a neural network to extract house numbers from images gathered by its StreetView team. Now, the company is moving onto street and business names.
… Notable example: its trained model is able to guess the correct business name on a sign, even though there are other brands listed (eg Firestone). My assumption is it has learned that these brands are quite common on a variety of signs, whereas the names of businesses are unique.
… Bonus tidbit: Google’s ‘Ground Truth’ team was the first internal user of the company’s Tensor Processing Units (TPUs), due to their insatiable demand for data.
… Total number of StreetView images Google has: more than 80 billion.

A donut-devouring smile: Smile Vector is a friendly Twitter bot by AI artist Tom White that patrols the internet, finding pictures of people who aren’t smiling, and makes them smile. It occasionally produces charming bugs, like this one in which a neural network makes a person appear to smile by giving them a toothy grin and removing a segment of the food they’re holding in their hands – a phantom bite!

The Homebrew AI Computer Club: Google has teamed up with the Raspberry Pi community to offer the relevant gear to let people assemble their own AI-infused speaker, powered by a Raspberry Pi and housed in cardboard, natch.

Monthly Sponsor: Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to david@amplifypartners.com.

Tech Tales:

[2032: The greater Detroit metropolitan area.]

“It’s creepy as all hell in there man you gotta do something about it I can’t sleep! All that metal sounds. I’m calling the city next week you don’t do something about it.” Click.
You put the phone down, look at your wife.
“Another complaint?” she says.
“I’m going to Dad’s,” you say.

Dad’s house is a lazily-shingled row property in Hamtramck, a small municipality embedded in the center of Detroit. He bought it when he was doing consulting for the self-driving car companies. He died a month ago. His body got dragged out of the house by the emergency crews. In his sleep, they said, with the radio on.

You arrive on the street and stare up at the house, approach it with the keycard in your hand. The porch is musty, dry. You stand and listen to your breath and the whirring sound of the house’s machines, reverberating through the door and passing through the windows to you.

When you enter, a robot the shape of a hockey puck and the size of a small poodle moves from the kitchen over to you in the entranceway.

“Son,” a voice says, crackling through speakers. The robot whirrs over to you, stops by your feet. “I’m so glad you’re here. I have missed you.”
“Hey Dad,” you say. Eyes wet. “How are things?”
“Things are good. Today the high will be about 75. Low pollution index. A great day to go outside.”
“Good,” you say, bending down. You feel for the little off switch on the back of the machine, put your finger on it.
“Will you be staying long?” says the voice in the walls.
“No,” you whisper, and turn the robot off. You push its inert puck-body over to the door. Then you go upstairs.

You pause before you open his office door. There’s a lot of whirring on the other side. Shut your eyes. Deep breath. Open the door. A drone hovers in the air, a long wire trailing beneath it, connected to an external solar panel. “Son,” the voice says, this time coming from a speaker next to an old – almost vintage – computer. “The birds outside are nesting. They have two little chicks. One of the chicks is 24 days old. The other is 23.”
“Are they still there?” you say.
“I can check. Would you like me to check?”
“Yes please,” you say, opening the office window. The drone hovers at the border between inside and outside. “Would you disconnect me?”

You unplug it from the panel and it waits till the cable has fallen to the floor before it scuds outside, over to the tree. Whirrs around a bit. Then it returns. Its projector is old, weak, but still you see the little birds projected on the opposite wall. Two chicks.
“How nice,” you say.
“Please reconnect my power supply, son,” it says.
You pluck the drone out of the air, grabbing its mechanical housing from the bottom, turn it off.
“Son,” the voice in the walls says. “I can’t see. Are you okay?”
“I’m fine, Dad.”

It takes another two hours before you’ve disconnected all the machines but one. The last is a speaker attached to the main computer. Decades of your Dad’s habits and his own tinkering have combined to create these ghosts that infuse his house. The robots speak in the way he spoke, and plug into a larger knowledge system owned by one of the behemoth tech companies. When he was alive the machines would help him keep track of things, have chats with you. After his hearing went they’d interpret your sounds and send them to an implant. When he started losing his eyesight they’d describe the world to him with their cameras. Help him clean. Encourage him to go outside. Now they’re just ghosts, inhaling data and exhaling the faint exhaust of his personality.

Before you get back in the car you knock on the door of the neighbor. A man in a baggy t-shirt, stained work jeans opens it.
“We spoke on the phone,” you say. “House will be quiet now.”
“Appreciate it,” he says. “I’ll buy that drone, if you’re selling.”
“It’s broken,” you lie.

Import AI Newsletter 40: AI makes politicians into digital “meat puppets”, translating AI ‘neuralese’ into English, and Amazon’s new eye

by Jack Clark

 

Put your words in the mouth of any politician, celebrity, friend, you name it: startup research outfit Lyrebird from the University of Montreal lets you do two interesting things that are potentially ripe for abuse: 1) train a neural network to convincingly imitate someone else’s voice, and 2) do this with a tiny amount of data – as little as a minute, according to Lyrebird’s website. Demonstrations include synthesized speeches by Obama, Clinton, and Trump.
Next step? Pair this with a (stable) pix2pix model to let you turn any politician into a ‘meat puppet’ (video). Propaganda will never be the same.

ImportAI’s Cute Unique Bot Of Today (CUBOT) award goes to… DeepMind for the cheerful little physics bot visualized in this video tweeted by Misha Denil. The (simulated) robot relates to some DeepMind research on Learning to perform physics experiments in complex environments. “The agent has learned to probe the blocks with its hammer to find the one with the largest mass (masses shown in the lower right).” Go, Cubot, go!

Translating AI gibberish: UC Berkeley researchers try to crack the code of ‘neuralese’: Recently, many AI researchers (including OpenAI) have started working on systems that can invent their own language. The theoretical justification for this is that language which emerges naturally and is grounded in the interplay between an agent’s experience and its environment, stands a much higher chance of containing decent meaning compared to a language learned entirely from large corpuses of text.
…unfortunately, the representations AI systems develop are tricky to analyze. This poses a challenge for translating AI-borne concepts into our own. “There are no bilingual speakers of neuralese and natural language,” researchers with the University of California at Berkeley note in Translating Neuralese. “Based on this intuition, we introduce a translation criterion that matches neuralese messages with natural language strings by minimizing statistical distance in a common representation space of distributions over speaker states.”
…and you thought Arrival was sci-fi.
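
A toy version of that translation criterion might look like the sketch below. Everything specific here is an assumption: the three-state world, the made-up belief distributions, the English phrases, and the use of KL divergence as the statistical distance. The idea it illustrates is that a neuralese message gets translated to whichever natural language string implies the most similar distribution over speaker states.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same states."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def translate(neuralese_belief, english_beliefs):
    """Pick the English string whose implied state distribution is closest."""
    return min(
        english_beliefs,
        key=lambda phrase: kl_divergence(neuralese_belief, english_beliefs[phrase]),
    )

# Hypothetical distributions over three speaker states for each English string.
english_beliefs = {
    "go left": [0.8, 0.1, 0.1],
    "go right": [0.1, 0.8, 0.1],
    "stop": [0.1, 0.1, 0.8],
}
# Hypothetical distribution implied by one neuralese message.
translation = translate([0.15, 0.75, 0.10], english_beliefs)
```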

End-to-end learning: don’t believe the hype: In which a researcher argues it’s going to be difficult to build highly complex and capable systems out of today’s deep learning components because increasingly modular and specialized cognitive architectures will require increasingly large amounts of compute to train, and the increased complexity of the systems could make it infeasible to train them in a stable manner. Additionally, they show that the somewhat specialized nature of these modules, combined with the classic interpretability problems of deep learning, means that you can get cascading failures that lead to overall reductions in accuracies.
… the researcher justifies their thesis via some experiments on MNIST, an ancient dataset of handwritten numbers between 0 and 9. I’d want to see demonstrations on larger, modern systems to give their concerns more weight.

How can we trust irrational machines? People tend to trust moral absolutists over people who change their behaviors based on consequences. This has implications for how people will work with robots in society. In an experiment, scientists studied how people reacted to individuals that would flat-out refuse to sacrifice a life for the greater good, and those that would. The absolutists were trusted by more people and reaped greater benefits, suggesting that people will have a tough time dealing with the somewhat more rational and data-conditioned views of bots, the scientists write.

When streaming video is more than the sum of its parts: new research tries to fuse data from multiple camera views on the same scene to improve classification accuracy. The approach, outlined in Identifying First-Person Camera Wearers in Third-person Videos, also provides a way to infer the first-person video feed from a particular person who also appears in a third-person video.
…How it works: the researchers use a tweaked Siamese Convolutional Neural Network to learn a joint embedding space between the first- and third-person videos, and then use that to be able to identify points of similarity between any first-person video and any third-person video.
…one potentially useful application of this research could be for law enforcement and emergency services officials, who often have to piece together the lead-up to an event from a disparate suite of data sources.
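
Once such a joint embedding exists, the matching step can be sketched in a few lines. The embeddings below are made-up three-dimensional vectors standing in for the Siamese network’s output, and cosine similarity is an assumed choice of similarity measure rather than a detail taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(first_person_embedding, third_person_embeddings):
    """Return the name of the third-person candidate closest to the query."""
    return max(
        third_person_embeddings,
        key=lambda name: cosine_similarity(
            first_person_embedding, third_person_embeddings[name]
        ),
    )

# Hypothetical embeddings of two people visible in a third-person video.
people_in_third_person_video = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.1, 0.8, 0.3],
}
# Hypothetical embedding of a first-person clip from an unknown wearer.
wearer = best_match([0.2, 0.9, 0.2], people_in_third_person_video)
```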

Spy VS Spy, for translation: the great GAN-takeover of machine learning continues, this time in the field of neural machine translation.
…Neural machine translation is where you train machines to learn the correspondences between different languages so they can accurately translate from one to the other. The typical way you do this is you train two networks, say one in English and one in German, and you train one to map text into the other, then you evaluate your trained network on some data you’ve kept out of training and measure the accuracy. This is an extremely effective approach and has recently been applied at large-scale by Google.
…but what if there was another way to do this? A new paper, Adversarial Neural Machine Translation, from researchers at a smattering of Chinese universities, as well as Microsoft Research Asia, suggests that we can apply GAN-style techniques to training NMT engines. This means you train a network to analyze whether a text has been generated by an expert human translator or a computer, and then you train another network to try to fool the discriminator network. Over time you theoretically train the computer to minimize the difference between the two. They show the approach is effective, with some aspects of it matching strong baselines, but fail to demonstrate state-of-the-art. An encouraging sign.

Amazon reveals its modeling assistant, Echo Look: Amazon’s general AI strategy seems to be to take stuff that becomes possible in research and apply it into products as rapidly and widely as possible. It’s been an early adopter of demand-prediction algorithms, fleet robots (Kiva), speech recognition and synthesis (Alexa), customizable cloud substrates (AWS, especially the new FPGA servers, and likely brewing up its own chips via the Annapurna Labs acquisition), and drones (Prime Air). Now with the Amazon Echo Look it’s tapping into modern computer vision techniques to create a gadget that can take photos of its owner and provide a smart personal assistant via Alexa. (We imagine late-shipping startup Jibo is watching this with some trepidation.)
…Companies like Google and Microsoft are trying to create personal assistants that leverage more of modern AI research to concoct systems with large, integrated knowledge bases and brains. Amazon Alexa, on the other hand, can instead be seen as a small, smart, pluggable kernel that can connect to thousands of discrete skills. This lets it evolve skills at a rapid rate, and Amazon is agnostic about how each of those skills is learned and/or programmed. In the short term, this suggests Alexa will get way “smarter”, from the POV of the user, way faster than others, though its guts may be less accomplished.
…For a tangible example of this approach, let’s look at the new Alexa’s ‘Style Assistant’ option. This uses a combination of machine learning and paid (human) staff to let the Echo Look rapidly offer opinions on a person’s outfit for the day.
… next? Imagine smuggling a trained lip-reading ‘LipNet’ onto an Alexa Echo installed in someone’s house – suddenly the cute camera you show off outfits to can read your lips for as far as its pixels have resolution. Seems familiar (video).

Think knowledge about AI terminology is high? Think again. New results from a Royal Society/Ipsos Mori poll of UK public attitudes about AI…
…9%: number of people who said they had heard the term “machine learning”
…3%: number who felt they were familiar with the technical concepts of “machine learning”
…76%: number who were aware you could speak to computers and get them to answer your questions.

Capitalism VS State-backed-Capitalism: China has made robots one of its strategic focus areas and is dumping vast amounts of money, subsidies, and legal incentives into growing its own local domestic industry. Other countries, meanwhile, are taking a laid back approach and trusting that typical market-based capitalism will do all the work. If you were a startup, which regime would you rather work in?
… “They’re putting a lot of money and a lot of effort into automation and robotics in China. There’s nothing keeping them from coming after our market,” said John Roemisch, vice-president of sales and marketing for Fanuc America Corp., in this fact-packed Bloomberg article about China’s robot investments.
…One criticism of Chinese robots is that when you take off the casing you’ll find the basic complex components come from traditional robot suppliers. That might change soon: Midea Group, a Chinese washing machine maker, recently acquired Kuka, a huge&advanced German robotics company.

Self-driving neural cars – how do they work? In Explaining how a deep neural network trained with end-to-end learning steers a car, researchers with NVIDIA, NYU, and Google evaluate the trained ‘PilotNet’ that helps an NVIDIA self-driving car drive itself. To do this, they perform a kind of neural network forensics analysis, where they analyze which particular features the car deems to be salient in each frame (and uses to condition whether it should drive or not). This approach helps find features like road lines, cars, and road edges that intuitively make sense for driving. It also uncovers features the model has learned which the engineers didn’t expect to find, such as well-developed atypical vehicle and bush detectors. “Examination of the salient objects shows that PilotNet learns features that “make sense” to a human, while ignoring structures in the camera images that are not relevant to driving. This capability is derived from data without the need of hand-crafted rules,” they write.
…This sort of work is going to be crucial for making AI more interpretable, which is going to be key for its uptake.

Google claims quantum supremacy by the end of the year: Google hopes to build a quantum computer chip capable of beating any computer on the planet at a particular narrowly specified task by the end of 2017, according to the company’s quantum tzar John Martinis.

Autonomous cars get real: Alphabet subsidiary Waymo, aka Google’s self-driving corporate cousin, is letting residents of Phoenix, Arizona, sign up to use its vehicles to ferry them around town. To meet this demand, Google is adding 500 customized Chrysler Pacifica minivans to its fleet. Trials begin soon. Note, though, that Google is still requiring a person (a Waymo contractor) to ride in the driver’s seat.

The wild woes of technology: Alibaba CEO Jack Ma forecasts “much more pain than happiness” in the next 30 years, as countries have to adapt their economies to the profound changes brought about by technology, like artificial intelligence.

Learn by doing&viewing: New research from Google shows how to learn rich representations of objects from multiple camera views – an approach that has relevance to the training of smart robots, as well as the creation of more robust representations. In ‘Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation’, the researchers outline a technique to record footage from multiple camera views and then merge it into the same representation using multi-view metric learning with a triplet loss.
…the same approach can be used to learn to imitate human movements from demonstrations, by having the camera observe multiple demonstrations of a given pose or movement, they write.
…“An exciting direction for future work is to further investigate the properties and limits of this approach, especially when it comes to understanding what is the minimum degree of viewpoint difference that is required for meaningful representation learning.”
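
For readers unfamiliar with triplet loss, here is a minimal sketch in plain Python. The embeddings are made-up two-dimensional vectors and the margin value is arbitrary. The pairing shown (two cameras at the same moment pulled together, the same camera at a different moment pushed away) is an assumption about how the multi-view setup maps onto anchor, positive, and negative, not code from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive is closer than the negative by `margin`."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )

# Made-up embeddings of three video frames.
view1_t0 = [0.0, 0.0]  # anchor: camera 1 at time t
view2_t0 = [0.1, 0.0]  # positive: camera 2 at the same time t
view1_t5 = [1.0, 1.0]  # negative: camera 1 at a later time

# Simultaneous views are already close together, so nothing is penalized.
well_separated = triplet_loss(view1_t0, view2_t0, view1_t5)
# Swapping the roles produces a large loss that training would push down.
violating = triplet_loss(view1_t0, view1_t5, view2_t0)
```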

OpenAI bits&pieces:

Bridging theoretical barriers: Research from John Schulman, Pieter Abbeel, and Xi Chen: Equivalence Between Policy Gradients and Soft Q-Learning.

Tech Tales:

[A national park in the Western states of America. Big skies, slender trees, un-shaded, simmering peaks. Few roads and fewer of good quality.]

A man hikes in the shade of some trees, beneath a peak. A mile ahead of him a robot alternates position between a side of a hill slaked in light – its solar panels open – and a shaded forest, where it circles in a small partially-shaded clearing, its arm whirring. The man catches up with it, stops a meter away, and speaks…

Why are you out here? you say.
Its speakers are cracked, rain-hissed, leaf-filled, but you can make out its words. “Sun. Warm. Power,” it says.
You have those things at the camp. Why didn’t you come back?
“Thinking here,” it says. Then turns. Its arm extends from its body, pointing towards your left pocket, where your phone is. You take it out and look at the signal bars. Nothing. “No signal.” it says. “Thinking here.”
It motions its arm toward a rock behind it, covered in markings. “I describe what vision sees,” it says. “I detect-”
Its voice is cut off. Its head tilts down. You hear the hydraulics sigh as its body slumps to the forest floor. Then you hear shouts behind you. “Remote deactivation successful, sir,” says a human voice in the forest. Men emerge from the leaves and the branches and the trunks. Two of them set about the robot, connecting certain diagnostic wires, disconnecting other parts. Others arrive with a stretcher. You follow them back to camp. They nickname you The Tin Hunter.

After diagnosis you get the full story from the technical report: the robot had dropped off of the cellular network during a routine swarming patrol. It stopped merging its updates with the rest of the fleet. A bug in the logging system meant people didn’t notice its absence till the survey fleet came rolling back into town – minus one. The robot, the report says, had developed a tendency to try to improve its discriminating abilities for a particular type of sapling. When the man found it, it had been trying to achieve this by spending several days closely studying a single sapling in the clearing as it grew, storing a variety of sensory data about it, and also making markings on a nearby stone that, scientists later established, corresponded to barely perceptible growth rates of the sapling. A curiosity, the scientists said. The robot is wiped, disassembled, and reassembled with new software and sent back out with the rest of the fleet to continue the flora and fauna survey.