Import AI Newsletter 40: AI makes politicians into digital “meat puppets”, translating AI ‘neuralese’ into English, and Amazon’s new eye
by Jack Clark
Put your words in the mouth of any politician, celebrity, friend, you name it: startup research outfit Lyrebird from the University of Montreal lets you do two interesting and potentially ripe for abuse things. 1) train a neural network to convincingly imitate someone else’s voice, and, 2) do this with a tiny amount of data – as little as a minute, according to Lyrebird’s website. Demonstrations include synthesized speeches by Obama, Clinton, and Trump.
…Next step? Pair this with a (stable) pix2pix model to let you turn any politician into a ‘meat puppet’ (video). Propaganda will never be the same.
ImportAI’s Cute Unique Bot Of Today (CUBOT) award goes to… DeepMind for the cheerful little physics bot visualized in this video tweeted by Misha Denil. The (simulated) robot relates to some DeepMind research on Learning to perform physics experiments in complex environments. “The agent has learned to probe the blocks with its hammer to find the one with the largest mass (masses shown in the lower right).” Go, Cubot, go!
Translating AI gibberish: UC Berkeley researchers try to crack the code of ‘neuralese’: Recently, many AI researchers (including OpenAI) have started working on systems that can invent their own language. The theoretical justification for this is that language which emerges naturally and is grounded in the interplay between an agent’s experience and its environment, stands a much higher chance of containing decent meaning compared to a language learned entirely from large corpuses of text.
…unfortunately, the representations AI systems develop are tricky to analyze. This poses a challenge for translating AI-borne concepts into our own. “There are no bilingual speakers of neuralese and natural language”,” researchers with the University of California at Berkeley note in Translating Neuralese. “Based on this intuition, we introduce a translation criterion that matches neuralese messages with natural language strings by minimizing statistical distance in a common representation space of distributions over speaker states.”
…and you thought Arrival was sci-fi.
End-to-end learning: don’t believe the hype: In which a researcher argues it’s going to be difficult to build highly complex and capable systems out of today’s deep learning components because increasingly modular and specialized cognitive architectures will require increasingly large amounts of compute to train, and the increased complexity of the systems could make it infeasible to train them in a stable manner. Additionally, they show that the somewhat specialized nature of these modules, combined with the classic interpretability problems of deep learning, mean that you can get cascading failures that lead to overall reductions in accuracies.
… the researcher justifies their thesis via some experiments on MNIST, an ancient dataset of handwritten numbers between 0 and 9. I’d want to see demonstrations on larger, modern systems to give their concerns more weight.
How can we trust irrational machines? People tend to trust moral absolutists over people who change their behaviors based on consequences. This has implications for how people will work with robots in society. In an experiment, scientists studied how people reacted to individuals that would flat-out refuse to sacrifice a life for the greater good, and those that would. The absolutists were trusted by more people and reaped greater benefits, suggesting that people will have a tough time dealing with the somewhat more rational and data-conditioned views of bots, the scientists write.
When streaming video is more than the sum of its parts: new research tries to fuse data from multiple camera views on the same scene to improve classification accuracy. The approach, outlined in Identifying First-Person Camera Wearers in Third-person Videos, also provides a way to infer the first-person video feed from a particular person who also appears in a third-person video.
…How it works: the researchers use a tweaked Siamese Convolutional Neural Network to learn a joint embedding space between the first- and third-person videos, and then use that to be able to identify points of similarity between any first-person video and any third-person video.
…one potentially useful application of this research could be for law enforcement and emergency services officials, who often have to piece together the lead-up to an event from a disparate suite of data sources.
Spy VS Spy, for translation: the great GAN-takeover of machine learning continues, this time in the field of neural machine translation.
…Neural machine translation is where you train machines to learn the correspondences betweeen different languages so they can accurately translate from one to the other. The typical way you do this is you train two networks, say one in English and one in German, and you train one to map text into the other, then you evaluate your trained network on some data you’ve kept out of training and measure the accuracy. This is an extremely effective approach and has recently been applied at large-scale by Google.
…but what if there was another way to do this? A new paper, Adversarial Neural Machine Translation, from researchers at a smattering of Chinese universities, as well as Microsoft Research Asia, suggests that we can apply GAN-style techniques to training NMT engines. This means you train a network to analyze whether a text has been generated by an expert human translator or a computer, and then you train another network to try to fool the discriminator network. Over time you theoretically train the computer to minimize the difference between the two. They show the approach is effective, with some aspects of it matching strong baselines, but fail to demonstrate state-of-the-art. An encouraging sign.
Amazon reveals its modeling assistant, Echo Look: Amazon’s general AI strategy seems to be to take stuff that becomes possible in research and apply it into products as rapidly and widely as possible. It’s been an early adopter of demand-prediction algorithms, fleet robots (Kiva), speech recognition and synthesis (Alexa), customizable cloud substrates (AWS, especially the new FPGA servers, and likely brewing up its own chips via the Annapurna Labs acquisition), and drones (Prime Air). Now with the Amazon Echo Look it’s tapping into modern computer vision techniques to create a gadget that can take photos of its owner and provide a smart personal assistant via Alexa. (We imagine late-shipping startup Jibo is watching this with some trepidation.)
…Companies like Google and Microsoft are trying to create personal assistants that leverage more of modern AI research to concoct systems with large, integrated knowledge bases and brains. Amazon Alexa, on the other hand, can instead be seen as a small, smart, pluggable kernel that can connect to thousands of discrete skills. This lets it evolve skills at a rapid rate, and Amazon is agnostic about how each of those skills are learned and/or programmed. In the short term, this suggests Alexa will get way “smarter”, from the POV of the user, way faster than others, though its guts may be less accomplished.
…For a tangible example of this approach, let’s look at the new Alexa’s ‘Style Assistant’ option. This uses a combination of machine learning and paid (human) staff to let the Echo Look rapidly offer opinions on a person’s outfit for the day.
… next? Imagine smuggling a trained lip-reading ‘LipNet’ onto an Alexa Echo installed in someone’s house – suddenly the cute camera you show off outfits to can read your lips for as far as its pixels have resolution. Seems familiar (video).
Think knowledge about AI terminology is high? Think again. New results from a Royal Society/Ipsos Mori poll of UK public attitudes about AI…
…9%: number of people who said they had heard the term “machine learning”
…3%: number who felt they were familiar with the technical concepts of “machine learning”
…76%: number who were aware you could speak to computers and get them to answer your questions.
Capitalism VS State-backed-Capitalism: China has made robots one of its strategic focus areas and is dumping vast amounts of money, subsidies, and legal incentives into growing its own local domestic industry. Other countries, meanwhile, are taking a laid back approach and trusting that typical market-based capitalism will do all the work. If you were a startup, which regime would you rather work in?
… “They’re putting a lot of money and a lot of effort into automation and robotics in China. There’s nothing keeping them from coming after our market,” said John Roemisch, vice-president of sales and marketing for Fanuc America Corp.”, in this fact-packed Bloomberg article about China’s robot investments.
…One criticism of Chinese robots is that when you take off the casing you’ll find the basic complex components come from traditional robot suppliers. That might change soon: Midea Group, a Chinese washing machine maker recently acquired Kuka, a huge&advanced German robotics company.
Self-driving neural cars – how do they work? In ‘Explaining how a deep neural network trained with end-to-end learning steers a car’ researchers with NVIDIA, NYU, and Google, evaluate the trained ‘PilotNet’ that helps an NVIDIA self-driving car drive itself. To do this, they perform a kind of neural network forensics analysis, where they analyze which particular features the car deems to be salient in each frame (and uses to condition whether it should drive or not). This approach helps finds features like road lines, cars, and road edges that intuitively make sense for driving. It also uncovers features the model has learned which the engineers didn’t expect to find, such as well-developed atypical vehicle and bush detectors. “Examination of the salient objects shows that PilotNet learns features that “make sense” to a human, while ignoring structures in the camera images that are not relevant to driving. This capability is derived from data without the need of hand-crafted rules,” they write.
…This sort of work is going to be crucial for making AI more interpretable, which is going to be key for its uptake.
Google claims quantum supremacy by the end of the year: Google hopes to build a quantum computer chip capable of beating any computer on the planet at a particular narrowly specified task by the end of 2017, according to the company’s quantum tzar John Martinis.
Autonomous cars get real: Alphabet subsidiary Waymo, aka Google’s self-driving corporate cousin, is letting residents of Phoenix, Arizona, sign up to use its vehicles to ferry them around town. To meet this demand, Google is adding 500 customized Chrysler Pacifica minivans to its fleet. Trials begin soon. Note, though, that Google is still requiring a person (a Waymo contractor) to ride in the driver’s seat.
The wild woes of technology: Alibaba CEO Jack Ma forecasts “much more pain than happiness” in the next 30 years, as countries have to adapt their economies to the profound changes brought about by technology, like artificial intelligence.
Learn by doing&viewing: New research from Google shows how to learn rich representations of objects from multiple camera views — an approach that has relevance to the training of smart robots, as well as the creation of more robust representations In ‘Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation’, the researchers outline a technique to record footage from multiple camera views and then merge it into the same representation via multi-view metric learning via triplet loss.
…the same approach can be used to learn to imitate human movements from demonstrations, by having the camera observe multiple demonstrations of a given pose or movement, they write.
…“ An exciting direction for future work is to further investigate the properties and limits of this approach, especially when it comes to understanding what is the minimum degree of viewpoint difference that is required for meaningful representation learning.”
Bridging theoretical barriers: Research from John Schulman, Pieter Abbeel, and Xi Chen: Equivalence Between Policy Gradients and Soft Q-Learning.
[A national park in the Western states of America. Big skies, slender trees, un-shaded, simmering peaks. Few roads and fewer of good quality.]
A man hikes in the shade of some trees, beneath a peak. A mile ahead of him a robot alternates position between a side of a hill slaked in light – its solar panels open – and a shaded forest, where it circles in a small partially-shaded clearing, its arm whirring. The man catches up with it, stops a meter away, and speaks…
Why are you out here? You say.
Its speakers are cracked, rain-hissed, leaf-filled, but you can make out its words. “Sun. Warm. Power,” it says.
You have those things are the camp. Why didn’t you come back?
“Thinking here,” it says. Then turns. Its arm extends from its body, pointing towards your left pocket, where your phone is. You take it out and look at the signal bars. Nothing. “No signal.” it says. “Thinking here.”
It motions its arm toward a rock behind it, covered in markings. “I describe what vision sees,” it says. “I detect-”
Its voice is cutoff. Its head tilts down. You hear the hydraulics sigh as its body slumps to the forest floor. Then you hear shouts behind you. “Remote deactivation successful,” sir, says a human voice in the forest. Men emerge from the leaves and the branches and the trunks. Two of them set about the robot, connecting certain diagnostic wires, disconnecting other parts. Others arrive with a stretcher. You follow them back to camp. They nickname you The Tin Hunter.
After diagnosis you get the full story from the technical report: the robot had dropped off of the cellular network during a routine swarming patrol. It stopped merging its updates with the rest of the fleet. A bug in the logging system meant people didn’t notice its absence till the survey fleet came rolling back into town – minus one.The robot, the report says, had developed a tendency to try to improve its discriminating abilities for a particular type of sapling. It had been trying to achieve this when the man found it by spending several days closely studying a single sapling in the clearing as it grew, storing a variety of sensory data about it, and also making markings on a nearby stone that, scientists later established, corresponded to barely perceptible growth rates of the sapling. A curiosity, the scientists said. The robot is wiped, dissembled, and reassembled with new software and sent back out with the rest of the fleet to continue the flora and fauna survey.