Import AI: Issue 25: Open source neural machine translation, Microsoft acquires language experts Maluuba, Keras tapped for TensorFlow
by Jack Clark
If this, then drive: self-driving startup NuTonomy is using a complex series of rules to get its self-driving cars in Singapore to drive safely, but not be so timid that they can’t get anywhere. Typically, AI researchers prefer to reduce the number of specific rules in a system and instead try to learn as much behavior as possible, inferring proper codes of conduct from data gleaned from reality. NuTonomy’s decision to hand-code a hierarchy of rules into its system provides a notable counterpoint to the general trend towards learning everything from data. The company plans to expand its commercial offering in Singapore next year, though its cars will still be accompanied by a human ‘safety driver’ – for the time being.
Disposable lifesaving drones: Otherlab is building disposable drones with cardboard skins, as part of a research program funded by DARPA. The drones lack an onboard motor and navigate by deforming their wing surfaces as they glide to their targets.
…perhaps one day these cardboard drones will fly in swarms? Scientists have long been fascinated by the science of swarms because they afford distributed resiliency and intelligence. The US military has recently highlighted how swarms of drones can perform the job of much larger, more expensive, single machines. I wonder if we’ll eventually develop two-tiered swarms, where some specialized functions are present in a minority of the swarm. After all, it works for ants and bees.
AI acquisitions: Amazon quietly acquired security startup Harvest.AI, according to TechCrunch. Next, Microsoft acquired Canadian AI startup Maluuba…
…Maluuba has spent a few years conducting research into language understanding, publishing research papers on areas like reading comprehension and dialogue generation. It has also released free datasets for the AI community, like NewsQA…
…Deep learning stalwart Yoshua Bengio will become an advisor to Microsoft as part of the Maluuba acquisition – quite a coup for Microsoft, though worth noting Bengio advises many companies (including IBM, OpenAI, and others). This might make up for Microsoft losing longtime VP Qi Lu, who had done work for the company in AI and is now heading to Baidu to become its COO.
Sponsored: RE•WORK Machine Intelligence Summit, San Francisco, 23-24 March – Discover advances in Machine Learning and AI from world leading innovators and explore how AI will impact transport, manufacturing, healthcare and more. Confirmed speakers include: Melody Guan from Google Brain; Nikhil George from Volkswagen Electronics Research Lab and Zornitsa Kozareva, from Amazon Alexa. The Machine Intelligence in Autonomous Vehicles Summit will run alongside, meaning attendees can enjoy additional sessions and networking opportunities. Register now.
Keras gets TensorFlow citizenship: high-level machine learning library Keras will become an official, supported third-party library for TensorFlow. Keras makes TensorFlow easier to use for certain purposes and has been popular with artists and other people who don’t spend quite so much time coding. Anything that broadens the number of people able to fiddle with and contribute to AI is likely to be helpful in the short term. Congratulations to Keras’s developer Francois!
Don’t regulate AI, have AI regulate the regulators: Instead of regulating AI, we should create ‘AI Guardians’ – technical oversight systems that will be bound up in the logic of the AIs we deploy in the world, says Oren Etzioni, CEO of the Allen Institute for AI Research. (Etzioni doesn’t rule out all cases of regulation but, as with what parents say about sugar or computer games, his attitude seems to be ‘a little bit goes a long way’.)
Self-driving car deployment, AKA Capitalism Variant A, versus Capitalism Variant B: “Industry and government join hands to push for self-driving vehicles within China,” reports Bloomberg, as Chinese search engine Baidu joins up with local government-owned automaker BAIC to speed development of the technology….
… Meanwhile, in America, the Department of Transportation has formed a federal Committee on Automation, which gathers people together to advise the DOT on automation. Members include people from Delphi Automotive, Ford, Zipcar, Zoox, Waymo, Lyft, and others. “This committee will play a critical role in sharing best practices, challenges, and opportunities in automation, and will open lines of communication so stakeholders can learn and adapt based on feedback from each other,” the DOT says…
Open Source Neural translation: Late in 2016 Google flipped a switch that ported a huge chunk of its translation infrastructure over to a Multilingual Neural Machine Translation system. This tech combined the representations of numerous languages into a big neural network, and let you translate between pairs that you didn’t have raw data for. (So, if you had translations for English to Portuguese, as well as ones for Portuguese to German, but no corpus of English to German, this system could attempt to bridge the gap by tunneling through the joint representations from its Portuguese expertise.)…
…Now, researchers Yoon Kim and harvardnlp have released an open source neural machine translation system written in Torch, so people can build their own offline, non-cloud translation systems. The Babelfish gets closer!
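A rough way to picture the bridging idea is explicit pivoting: chain an English-to-Portuguese system into a Portuguese-to-German one. To be clear, this is a simpler, older trick than the joint representations Google describes, and the sketch below is a toy under loud assumptions. The two tiny dictionaries and the names `translate` and `pivot_translate` are invented for illustration and have nothing to do with Google's or harvardnlp's code.

```python
# Hypothetical word-level dictionaries standing in for two trained systems.
# Real translation systems work on whole sentences, not single words.
EN_TO_PT = {"good": "bom", "day": "dia"}
PT_TO_DE = {"bom": "gut", "dia": "Tag"}


def translate(words, table):
    """Translate word by word, leaving unknown words unchanged."""
    return [table.get(word, word) for word in words]


def pivot_translate(words, first_leg, second_leg):
    """Translate via an intermediate language by chaining two systems."""
    return translate(translate(words, first_leg), second_leg)


# English -> Portuguese -> German, with no English-German data at all.
german = pivot_translate(["good", "day"], EN_TO_PT, PT_TO_DE)
```

The appeal of a joint multilingual model is that it can skip the explicit middle step, so errors from the first leg need not compound in the second.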
AI, AI everywhere, and not a Bit of information to send: our automated future consists of many machines and little human-accessible information, according to this airport-hell tale from Quartz. Technology that seems efficient in the aggregate can have exceedingly irritating edge case failures.
$27 million for AI research: Reid Hoffman, Pierre Omidyar, the Knight Foundation, and others have put $27 million toward funding research into ethical AI systems. The funds will support research that combines the humanities with AI, and will help answer questions about how to communicate about the capabilities of the technology, what controls should be placed over it, and how to grow the field to ensure the largest number of people are involved in the design of this powerful technology, among others.
Power-sipping eyes in the sky: the US military says it’s pleased with the performance of IBM’s neuromorphic TrueNorth processor. The chip performs on par with a traditional high-end computer for AI-based image identification tasks, while consuming between one twentieth and one thirtieth the power of an NVidia Jetson TX1 processor, apparently. This represents another endorsement of IBM’s idea that non-Von Neumann architectures are needed for specialized AI chips. However, deploying the software on the chip can be a bit more laborious than going via NVidia’s well supported inbuilt ecosystem, the military says.
Deep learning is made of people! Startup Spare5 has raised $14 million and renamed itself to Mighty AI, as it looks to capitalize on the need for better training data for AI. It will compete with companies like Crowdflower and services like Amazon’s Mechanical Turk to offer companies access to a pool of people they can tap to label data for them. One note to remember: for research, it’s possible to mostly use public datasets when developing new techniques, but for commercial products you’ll typically need highly-specific labelled data as you build products for specific verticals.
Never underestimate the pre-Cambrian computing power of government: a friend of my Dad’s told me a few years ago that he was maintaining some old UK National Health Service systems by writing stuff for them in BASIC – something I recollect whenever I have cause to visit a UK emergency room. It’s almost reassuring that the White House is no different. “We had a computer on our desk. We didn’t have laptops, we didn’t have iPads, we didn’t have iPhones, and we had about a half a bar of service. So if you brought in your own equipment, you couldn’t use it…We had Compaqs running Windows 98 or 2000. No laptops. It was like we had gone back in time,” staffers recall. Technology takes a long time to turn over in large bureaucracies, so while we’re all getting excited about AI it’s worth remembering that uptake in certain areas will be sl-oooo-wwww.
Computer, enhance: just a year ago, researchers were getting excited about deep learning based techniques to upscale the resolution of photos. These methods work, roughly, by showing a neural network loads of small pictures and their big picture counterparts, and training it to figure out how to infer the high resolution details from low-resolution inputs. You wouldn’t want to use this to increase the resolution of keyhole satellite photos of foreign arms dumps (as any new or errant information here could have extremely unpleasant consequences), but you might want to use it to increase the size of your wedding photos…
…Twitter appeared to be enthused by this technique when it acquired UK startup Magic Pony, which had done a lot of research in this area. Now Google is tapping the same techniques to save 75% of bandwidth for users of Google+ by using its RAISR tech, which it first talked about in November. Another demonstration of the rapid rate at which research goes into production within AI.
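For the curious, the 'small pictures and their big picture counterparts' can be manufactured from ordinary photos. The sketch below is a minimal illustration, not Magic Pony's or Google's pipeline: it assumes grayscale images stored as nested lists and assumes the low-resolution copy is made by averaging 2x2 blocks. The names `downsample_2x` and `make_training_pairs` are hypothetical.

```python
def downsample_2x(image):
    """Halve the resolution of a grayscale image by averaging each 2x2 block."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def make_training_pairs(high_res_images):
    """Pair every high-resolution image with a synthetic low-resolution copy."""
    return [(downsample_2x(image), image) for image in high_res_images]


# A tiny 4x4 "photo" with pixel values 0..15.
photo = [[4 * r + c for c in range(4)] for r in range(4)]
pairs = make_training_pairs([photo])
low_res, high_res = pairs[0]
# low_res is the network's input; high_res is the answer it is trained to produce.
```

A network trained on such pairs learns to guess plausible detail, which is exactly why you would trust it with wedding photos and not with satellite imagery.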
Think AI is automated? Think again. You’ve heard of gradient descent – one of the processes by which we train AI systems, nudging a model’s parameters step by step to reduce its error. Well, there’s a joke among professors that for sufficiently hard problems you also turn to another lesser-known but equally important technique called ‘Grad Student Descent’, the formula of which is roughly:
Solution = (N post-doc humans * (Y ramen * Z coffee))…
… so as much as the research community talks about new techniques based around learning to learn, and getting AI to smartly optimize its own structure, it’s worth remembering that most real world applications of the technology rely more on the ingenuity of people than on the amazing power of the algorithms…
…David Brailovsky, who recently solved a traffic light classification competition, explains that “The process of getting higher accuracy involved a LOT of trial and error. Some of it had some logic behind it, and some was just “maybe this will work”.” Some tricks tried include rotating images, training with a lower learning rate, and, inevitably, finding and correcting bugs in the underlying dataset. (Hence the business opportunity for aforementioned companies like Mighty AI, Crowdflower, and so on.)
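For readers who have not seen it written down, the non-joke version of gradient descent fits in a few lines. The sketch below is a toy illustration, not anything from Brailovsky's entry: the function name `gradient_descent`, the quadratic loss, and the two learning-rate values are all assumptions chosen for the example. It hints at why the learning rate is one of the knobs people end up tuning by hand.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Toy loss: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# The minimum sits at x = 3.
def grad(x):
    return 2.0 * (x - 3.0)


# A small learning rate shrinks the distance to the minimum on every step.
careful = gradient_descent(grad, start=0.0, learning_rate=0.1, steps=100)

# A learning rate that is too large overshoots further on every step.
reckless = gradient_descent(grad, start=0.0, learning_rate=1.1, steps=100)
```

On this toy problem `careful` ends up very close to 3 while `reckless` flies off to a huge value. Real networks have millions of parameters and far messier losses, which is where the grad students come in.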
What does it mean to be the CTO of OpenAI, and how did that role come about? Co-founder Greg Brockman explains. Shame he gave away his trick about deadlines, though.
[2019: A cafe, somewhere in the Baltics.]
So it comes down to this: after two years of work, you just write a few lines, and shift the behavior of, hopefully, millions of people. But you need to get this exactly right, or else the algorithms could realize the charade and you burn the accounts for almost no gain, he thinks, hands hovering above the keyboard. He’s about to send out a very particular product endorsement from the account of a famous Internet personality.
He spent years constructing the personality, building it up from the dry seeds of some long-inactive, later-deleted Tumblr and Instagram accounts. It took years, but the ghost has grown into a full internet force with fans and detractors and even a respectable handful of memes.
The next step is product endorsement – and it’s a peculiar one. SideKik, as it’s called, will give the ghost-celeb’s followers the chance to give control over a little bit of their online identity to a small AI, said to be controlled by the celebrity. Be a part of something bigger than yourself!, he wants the celebrity to say and the fans to think, download SideKik and let’s get famous together!
What the fans don’t know is that if they give away SideKik they won’t be gaining the subtle, occasional input of the celebrity; instead, they’ll become an extension of the underlying thicket of AI systems, carefully sculpted and maintained by the man at the keyboard. Slowly, they’ll be used to gather microscopic shreds of data from the internet through targeted messages with their own followers, and they’ll also be used to create the appearance of certain trends or inclinations in specific groups on the internet. The anti-AI detectors are getting better all the time now, so it takes all this work just to create the facsimile of a real community orbiting around a real star. Due to the spike in illegitimate traffic from automated AI readbots, typical internet ads have become so common and so abused as to be almost worthless, so what’s a marketer meant to do?, he thinks, composing his next few words that could give him a legion of unsuspecting guerrilla marketers.