Import AI: #94: Google Duplex generates automation anxiety backlash, researchers show how easy it is to make a 3D-printed autonomous drone, Microsoft sells voice cloning services.
by Jack Clark
Microsoft sells “custom voice” speech synthesis:
…The commercial voice cloning era arrives…
Microsoft will soon sell “Custom Voice”, a system that lets businesses give their applications a “one-of-a-kind, recognizable brand voice, with no coding required”. The product follows various research breakthroughs in speech synthesis and voice cloning, like work from Baidu on voice cloning, and work from Google and DeepMind on speech synthesis.
Why it matters: As the Google ‘Duplex’ system shows, the era of capable, realistic-sounding natural language systems is arriving. It’s going to be crucial to run as many experiments in society as possible to see how people react to automated systems in different domains. Being able to customize the voice of any given system to a particular context seems like a necessary ingredient for further acceptance of AI systems by the world.
Read more: Custom Voice (Microsoft).
Teaching neural networks to perform low-light amplification:
…Another case where data + learnable components beats hand-designed algorithms…
Researchers with UIUC and Intel Labs have released a dataset for training image processing systems to take in images that are so dark as to be imperceptible to humans and to automatically process those images so that they’re human-visible. The resulting system can be used to amplify low-light images by up to 300 times while displaying meaningful noise reduction and low levels of color transformation.
Dataset: The researchers collect and publish the ‘See-in-the-Dark’ (SID) dataset, which contains 5094 raw short-exposure images, each with a corresponding long-exposure reference image. The dataset spans around 400 distinct scenes; there are more images than scenes because the researchers also captured bursts of short-exposure images of some scenes.
Technique: The researchers tested out their system using a multi-scale aggregation network and a U-net (both networks were selected for their ability to process full-resolution images at 4240×2832 or 6000×4000 in GPU memory). They trained networks by pairing the raw data of the short-exposure image with the corresponding long-exposure image(s). They also applied random flipping and rotation for data augmentation.
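The augmentation step above depends on applying the same random transform to both halves of each training pair. Here's a minimal, library-free sketch of that idea; images are represented as 2D lists of pixel values, and the function name and details are illustrative (the actual training code, linked below, operates on raw sensor tensors):

```python
import random

def augment_pair(short_exp, long_exp, rng):
    """Apply one shared random flip/rotation to a short/long-exposure pair."""
    def flip_h(img):   # mirror an image left-to-right
        return [row[::-1] for row in img]

    def rot90(img):    # rotate an image 90 degrees clockwise
        return [list(r) for r in zip(*img[::-1])]

    if rng.random() < 0.5:                  # random horizontal flip
        short_exp, long_exp = flip_h(short_exp), flip_h(long_exp)
    for _ in range(rng.randrange(4)):       # rotate by a random multiple of 90°
        short_exp, long_exp = rot90(short_exp), rot90(long_exp)
    # The identical transform is applied to input and reference so the
    # pixel correspondence between the two images is preserved.
    return short_exp, long_exp
```

The key design point is that the flip and rotation decisions are sampled once and applied to both images, otherwise the network would be trained against a misaligned reference.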
Results: They compared the results of their network with the output of BM3D, a non-naive denoising algorithm, and a burst denoising technique, and used Amazon’s Mechanical Turk platform to poll people on which images they preferred. Users overwhelmingly preferred the images resulting from the technique described in the paper compared to BM3D, and in some cases preferred images generated by this technique to those created by the burst method.
Why it matters: Techniques like this show how neural networks can change the way we solve problems: instead of developing specific, hand-tuned, single-purpose algorithms, we can learn to mix and match trainable components and data inputs to solve general classes of problems. In the future it’d be interesting if the researchers could further cut the time the trained system takes to process each image, as this would make a real-time view possible, potentially giving people another way to see in the dark.
Read more: Learning-to-See-in-the-Dark (GitHub).
Read more: Learning to See in the Dark (Arxiv).
Google researchers try to boost AI performance via in-graph computation:
…As the AI world relies on more distributed, parallel execution, our need for new systems increases…
Google researchers have outlined many of the steps they’ve taken to improve components of the TensorFlow framework to let them execute more aspects of a distributed AI job within the same computation graph. This increases the performance and efficiency of algorithms, and shows how AI’s tendency towards mass distribution and parallelism is driving significant changes in how we program things (see also: Andrej Karpathy’s “Software 2.0” thesis).
The main idea explored in the paper is how to distribute a modern machine learning job in such a way that it can seamlessly run across CPUs, GPUs, TPUs, and other novel chip architectures. This is trickier than it sounds, since within a large-scale contemporary job there are typically a multitude of components which need to interact with each other, sometimes multiple times. This has led Google to extend and refine various TensorFlow components to better support placing all the computations within a model on the same computational graph, which lets it optimize the graph for the underlying architectures. That differs from traditional approaches, which usually involve specifying aspects of the execution in a separate block of code written in the control logic of the application (e.g., invoking various AI modules written in TensorFlow from within a big chunk of Python code, as opposed to executing everything within one big, unified TensorFlow graph).
Results: There’s some preliminary evidence that this approach can have significant benefits. “A baseline implementation of DQN without dynamic control flow requires conditional execution to be driven sequentially from the client program. The in-graph approach fuses all steps of the DQN algorithm into a single dataflow graph with dynamic control flow, which is invoked once per interaction with the reinforcement learning environment. Thus, this approach allows the entire computation to stay inside the system runtime, and enables parallel execution, including the overlapping of I/O with other work on a GPU. It yields a speedup of 21% over the baseline. Qualitatively, users report that the in-graph approach yields a more self-contained and deployable DQN implementation; the algorithm is encapsulated in the dataflow graph, rather than split between the dataflow graph and code in the host language,” write the researchers.
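The speedup described above comes from how many times control has to cross the boundary between the client program and the system runtime. Here's a toy, framework-free sketch of that contrast; none of these names are TensorFlow APIs, and the "runtime" is just a counter standing in for a real dataflow engine:

```python
class Runtime:
    """Toy stand-in for a dataflow runtime: counts client invocations."""
    def __init__(self):
        self.invocations = 0

    def run(self, fn, *args):
        self.invocations += 1
        return fn(*args)

def client_driven(runtime, steps):
    # Client-driven: each step is a separate runtime call, so control
    # bounces between the host language and the runtime every iteration.
    total = 0
    for i in range(steps):
        total = runtime.run(lambda t, x: t + x, total, i)
    return total

def in_graph(runtime, steps):
    # In-graph: the whole loop lives inside one fused computation, so
    # the runtime is invoked once per environment interaction.
    def fused(n):
        total = 0
        for i in range(n):
            total += i
        return total
    return runtime.run(fused, steps)
```

Both paths compute the same result, but the client-driven version invokes the runtime once per step while the in-graph version invokes it once in total, which is (in crude miniature) why fusing the DQN loop into a single graph with dynamic control flow avoids per-step round trips and enables overlap of I/O with GPU work.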
Read more: Dynamic Control Flow in Large-Scale Machine Learning (Arxiv).
Read more: Software 2.0 (Andrej Karpathy).
Google tries to automate rote customer service with Duplex:
…New service sees Google accidentally take people for a hike through the uncanny valley of AI…
Google has revealed Duplex, an AI system that uses language modelling, speech recognition, and speech synthesis to automate tasks like booking appointments at hair salons, or reserving tables at restaurants. Duplex will let Google’s own automated AI systems talk directly to humans at other businesses, letting the company automate human interactions and also more easily harvest data from the messy real world.
How it works: “The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks,” Google writes. Speech synthesis is achieved via both Tacotron and Wavenet (systems developed respectively by Google Brain and by DeepMind). It also uses human traits, like “hmm”s and “uh”s, to sound more natural to humans on the other end.
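To make the shape of the system concrete, here is an entirely illustrative sketch of the *kinds* of inputs the blog post says feed Duplex's model per turn; none of these names or structures are Google's, and the real system is a recurrent network over learned features, not a dictionary:

```python
def build_turn_input(asr_text, audio_features, history, task_params):
    """Assemble the per-turn inputs the Duplex post describes (illustrative)."""
    return {
        "asr_text": asr_text,                # output of speech recognition
        "audio_features": audio_features,    # features from the raw audio
        "history": list(history),            # the conversation so far
        "task_params": dict(task_params),    # e.g. desired service, time of day
    }

turn = build_turn_input(
    asr_text="what time did you want the reservation",
    audio_features=[0.2, 0.7],
    history=["hi, I'd like to book a table for two"],
    task_params={"service": "restaurant_reservation", "time_of_day": "evening"},
)
```

The notable design choice reported by Google is that the understanding model is trained separately per task (hence the explicit task parameters) while sharing a corpus across tasks.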
Data harvesting: One use of the system is to help Google harvest more information from the world, for instance by autonomously calling up businesses and finding out their opening hours, then digitizing this information and making it available through Google.
Accessibility: The system could be useful for people with accessibility needs, like those with hearing impairments, and could potentially work across languages: you might ask Duplex to accomplish something and it would then use the local language to interface with a local business.
The creepy uncanny valley of AI: Though Google Duplex is an impressive demonstration of advancements in AI technology, its announcement also elicited concern from many people, who worried that it will be used to automate yet more jobs, and that it is ethically dubious to have an AI talk to (typically poorly paid) people and harvest information from them without identifying itself as the AI appendage of a fantastically profitable multinational tech company. Google responded to some of these concerns by subsequently saying Duplex will identify itself as an AI system when talking to people, though it hasn’t given more details on what this will look like in practice.
Why it matters: Systems like Duplex show how AI is going to increasingly be used to automate aspects of day-to-day life that were previously solely the domain of person-to-person interactions. I think it’s this use case that triggered the (quite high) amount of criticism of the service, as people grow worried that the rate of progress in AI doesn’t quite match the rate of wider progress in the infrastructure of society.
Read more: Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone (Google Blog).
Read more: Google Grapples With ‘Horrifying’ Reaction to Uncanny AI Tech (Bloomberg).
Palm-sized auto-navigation drones are closer than you think:
…The era of the cheap, smart, mobile, 3D-printable nanodrones cometh…
Researchers with ETH Zurich, the University of Zurich, and the University of Bologna have shown how to squeeze a crude drone-navigation neural network onto an ultra-portable 3D-printed ‘nanodrone’. The research indicates how drones are going to evolve in the future and serves as a proof-of-concept for how low-cost electronics, 3D printing, and widely available open source components can let people create surprisingly capable and potentially (though this is not discussed in the research but is clearly possible from a technical standpoint) dangerous machines. “In this work, we present what, to the best of our knowledge, is the first deployment of a state-of-art, fully autonomous vision-based navigation system based on deep learning on top of a UAV compute node consuming less than 94 mW at peak, fully integrated within an open source COTS CrazyFlie 2.0 UAV,” the researchers write. “Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and deployed on a 27g commercial, open source CrazyFlie 2.0 nano-quadrotor”.
Approach: To get this system to work the researchers needed to carefully select and integrate a neural network with an ultra-low-power processor. The integration work included designing the various processing stages of the selected neural network to be as computationally efficient as possible, which required them to modify an existing ‘DroNet’ model to further reduce memory use. The resulting drone is able to run DroNet at 12 frames per second, which is sufficient for real-time navigation and collision avoidance.
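The paper's headline numbers imply a tight per-frame budget. A back-of-envelope sketch (my arithmetic, not the authors'): at 12 frames per second each DroNet inference must finish in roughly 83 milliseconds, and at a peak draw under 94 mW the compute node can spend at most about 7.8 millijoules per frame.

```python
def frame_budget(fps, power_w):
    """Per-frame latency and peak-energy budget implied by fps and power."""
    latency_s = 1.0 / fps            # wall-clock time available per frame
    energy_j = power_w * latency_s   # upper bound on energy per frame at peak
    return latency_s, energy_j

latency, energy = frame_budget(12, 0.094)  # 12 fps, 94 mW peak
```

Budgets like this are why the authors had to shrink DroNet's memory footprint and map it onto GAP8's parallel ultra-low-power cores rather than a conventional processor.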
Why it matters: Though this proof-of-concept is somewhat primitive in capability it shows how capable and widely deployable basic neural network systems like ‘DroNet’ are becoming. In the future, we’ll be able to train such systems over more data and use more computers to train larger (and therefore more capable) models. If we’re also able to improve our ability to compress these models and deploy them into the world, then we’ll soon live in an era of DIY autonomous machines.
Read more: Ultra Low Power Deep-Learning-powered Autonomous Nano Drones (Arxiv).
OpenAI Bits & Pieces:
Jack Clark speaking in London on 18th May:
I’m going to be speaking in London on Friday at the AI & Politics meetup, in which I’ll talk about some of the policy challenges inherent to artificial intelligence. Come along! Beer! Puzzles! Fiendish problems!
Read more: AI & Politics Episode VIII – Policy Puzzles with Jack Clark (Eventbrite).
Amusement Park for One.
[Extract from an e-flyer for the premium tier of “experiences at Robotland”, a theme park built over a superfund site in America.]
Before your arrival at ROBOTLAND you will receive a call from our automated customer success agent to your own personal AI (or yourself, please indicate a preference at the end of this form). This agent will learn about your desires and will use these to build a unique psychographic profile of you which will be privately transmitted to our patented ‘Oz Park’ (OP) experience-design system. ROBOTLAND contains over 10,000 uniquely configurable robotic platforms, each of which can be modified according to your specific needs. To give you an idea of the range of experiences we have generated in the past, here are the names of some previous events hosted at ROBOTLAND and developed through our OP system: Metal Noah’s Ark, Robot Fox Hunting, post-Rise of the Machines Escape Game, Pagan Transformers, and Dominance Simulation Among Thirteen Distinct Phenotypes with Additional Weaponry.
Things that inspired this story: Google Duplex, robots, George Saunders’ short stories, Disneyland, direct mail copywriting.