Import AI: Newsletter 42: Ensemble learning, the paradoxical nature of AI research, and Facebook’s CNN-for-RNN substitution

by Jack Clark

Mo’ ensembles, no problems: new research shows how to get the benefits of grouping a bunch of neural networks together (known as an ensemble), without having to go to the trouble of training each of them individually. The technique is outlined in Snapshot Ensembles: Train 1, Get M For Free.
…it’s surprisingly simple and intuitive. The way neural networks are trained today can be thought of as rolling a ball down a fairly uneven hill – the goal is to get the ball to the lowest possible point on the hill. But because the hill is uneven, it’s fairly easy for the ball to get trapped in a local low point and stay there. In AI land, this point is called a ‘local minimum’ – and it’s bad to get stuck in one.
…Most tricks in AI training involve getting the model to visit many more locations during training and thereby avoid sub-optimal local minima – ideally you want the ball to find the lowest point on the hill, even if it rolls through numerous depressions along the way.
…the presented technique repeatedly anneals and restarts the learning rate during a single training run, recording a snapshot of the network at each local minimum it settles into along the way. Then, once training finishes, you combine the snapshots into an ensemble by averaging their predictions at test time – no further training required (see the sketch below).
…Results: the approach works, with the authors reporting that this technique yields more effective systems on tasks like image classification, at essentially no extra training cost compared to training a single network.
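…for the curious, here’s a minimal sketch of the snapshot-ensembling idea in PyTorch. The model, data loader, and hyperparameters are placeholders of mine, and the cyclic schedule is simplified (annealed per epoch rather than per iteration), so treat this as an illustration rather than the authors’ implementation:

```python
# Hedged sketch of snapshot ensembling: anneal the learning rate towards zero
# several times within a single training run, save a checkpoint ("snapshot")
# at the bottom of each cycle, then average the snapshots' predictions at
# test time. The model and loader are assumed to exist.
import copy
import math
import torch
import torch.nn.functional as F

def train_snapshots(model, loader, epochs=50, cycles=5, lr_max=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    epochs_per_cycle = epochs // cycles
    snapshots = []
    for epoch in range(epochs):
        # Cosine annealing, restarted at the start of every cycle.
        t = (epoch % epochs_per_cycle) / epochs_per_cycle
        for group in opt.param_groups:
            group["lr"] = 0.5 * lr_max * (1 + math.cos(math.pi * t))
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        # End of a cycle: the network is sitting near a local minimum, so snapshot it.
        if (epoch + 1) % epochs_per_cycle == 0:
            snapshots.append(copy.deepcopy(model).eval())
    return snapshots

def ensemble_predict(snapshots, x):
    # Average the softmax outputs of all snapshots -- no further training needed.
    with torch.no_grad():
        return torch.stack([F.softmax(m(x), dim=-1) for m in snapshots]).mean(dim=0)
```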

Voice data – who speaks to whose speakers?: if data is the fuel for AI, then Amazon looks like it’s well positioned to haul in a trove of voice data, according to eMarketer.
…Amazon’s share of the US home chit-chat speaker market in 2017: ~70.6%
…Google’s: 23.8%
…Others: 5.6%

A/S/E? Startup researchers show off end-to-end age, sex, and emotion recognition system: AI is moving into an era dominated by composite systems, which see researchers combine complex, interlinked software components to perform multiple categorizations (and sometimes actions) within the same structure…
… in this example, researchers from startup Sighthound have developed DAGER: deep age, gender, and emotion recognition using convolutional neural networks. DAGER can guess someone’s age, sex, and emotion from a single face-on photograph (a toy sketch of one way to structure such a multi-headed classifier appears after this item). The training ingredients for this include 4 million images of over 40,000 distinct identities…
… It apparently has a lower mean absolute error than systems outlined by Microsoft and others.
… Good news: The researchers sought to offset some of the (sadly inevitable) biases in their datasets by adding “tens of thousands of images of different ethnicities as well as age groups”. It’s nice that people are acknowledging these issues and trying to get ahead of them.
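…to make the ‘composite system’ idea concrete, here is a toy sketch of a single network with three prediction heads sharing one convolutional trunk. The layer sizes, input resolution, and head design are assumptions of mine for illustration – this is not Sighthound’s actual DAGER architecture:

```python
# Illustrative (not DAGER's) multi-headed face-attribute network: one shared
# convolutional trunk, three linear heads for age, gender, and emotion,
# trained jointly by summing the three classification losses.
import torch
import torch.nn as nn

class FaceAttributeNet(nn.Module):
    def __init__(self, n_age_bins=101, n_emotions=7):
        super().__init__()
        # Small shared trunk over 3x64x64 face crops (assumed input size).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.age = nn.Linear(128, n_age_bins)    # age as classification over year bins
        self.gender = nn.Linear(128, 2)
        self.emotion = nn.Linear(128, n_emotions)

    def forward(self, x):
        h = self.trunk(x)
        return self.age(h), self.gender(h), self.emotion(h)

# A point estimate of age can be read off as the probability-weighted average
# of the bins: (softmax(age_logits, -1) * torch.arange(n_age_bins)).sum(-1)
```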

Uber hires Raquel Urtasun: Self-driving car company Uber has hired Raquel Urtasun, a well-respected researcher at the University of Toronto, to help lead its artificial intelligence efforts.
…Urtasun earlier helped create KITTI, a free and open dataset used to benchmark computer vision systems against problems that self-driving cars encounter. Researchers have already used the dataset to train vision models entirely in simulation, then transfer them into the real world.
…meanwhile Lyft and Google (technically, Waymo) have confirmed that they’ve embarked on a non-exclusive collaboration to work together on self-driving cars.

Cisco snaps up speech recognition system with MindMeld acquisition: Cisco has acquired voice recognition startup MindMeld for around $125 million. The startup made voice and conversational-interface technologies that have been used by commercial companies such as Home Depot, among others.

Government + secrecy + AI = fatal error, system override: Last week, hundreds of thousands of computers across the world were compromised by a virulent strain of malware, spread via an exploit that, Microsoft says in this eyebrow-raising blogpost, was originally developed by the NSA.
…today, governments stockpile computer security vulnerabilities, using them strategically against foes (and sometimes ‘friends’). But as our digital systems become ever more interlinked, the risk of one of these exploits falling into the wrong hands increases, as do the consequences when it does.
…we’re still a few years away (I think) from governments classifying and stockpiling AI exploits, but I’m fairly sure that in the future we could imagine a government developing certain exploits, say a new class of adversarial examples, and not disclosing their particulars, instead keeping them private to be used against a foe.
…just as Microsoft advocates for what it calls a Digital Geneva Convention, it may make sense for AI companies to agree upon a similar set of standards eventually, to prevent the weaponization and exploitation of AI.

Doing AI research is a little bit like being a road-laying machine, where to travel forward you must also create the ground beneath you. In research, what this translates to is that new algorithms typically need to be paired with new challenges. Very few AI systems today are robust enough to be plunked down in the real world and do useful stuff. Instead, we try to get closer to building such systems by inventing learning algorithms that exhibit increasing degrees of general applicability on increasingly diverse datasets. The main way to test this kind of general applicability is to create new ways to test AI systems – that’s why the reinforcement learning community is evolving from just testing on Atari games to more sophisticated domains, like Go, or video games like StarCraft and Doom.
…the same is true of other domains beyond reinforcement learning: to build new language systems we need to assemble huge corpora of data and test algorithms on them – so over time the amounts of text we test on have grown larger. Similarly, in fields like question answering we’ve gone from simple toy datasets to more sophisticated trials (like Facebook’s bAbI corpus) to even more elaborate datasets.
…A new paper from DeepMind and the University of Oxford, Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, is a good example of this sort of hybrid approach to AI development. Here, the researchers try to tackle the task of solving simple algebraic word problems not only by inventing new algorithmic approaches, but by doing so while generating new types of data. The resulting system can generate not only the answers, but also rationales for those answers.
…size of the new dataset: over 100,000 word problems that include answers as well as natural language rationales.
…how successful is it? Typical AI approaches (which utilize sequence-to-sequence techniques) tend to have accuracies of about 20% on the task; this new system gets things right 36% of the time. Still a bad student, but a meaningful improvement. (A made-up example of what one of these problem-plus-rationale data points might look like is sketched below.)
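…to give a flavor of what “answers as well as natural language rationales” means, here’s an invented example of a single data point. The field names and the problem itself are my own illustration, not an excerpt from the paper’s released dataset:

```python
# Hypothetical word-problem-with-rationale record. The model is trained to
# emit the rationale step by step and then the final answer, rather than the
# answer alone; accuracy is scored on the final answer.
example = {
    "question": "A train travels 60 km in 1.5 hours. What is its average speed?",
    "options": ["A) 30 km/h", "B) 40 km/h", "C) 45 km/h", "D) 60 km/h"],
    "rationale": "Speed = distance / time = 60 / 1.5 = 40 km/h. The answer is B.",
    "answer": "B",
}

def is_correct(predicted_answer: str, record: dict) -> bool:
    # Scoring only checks the predicted option letter against the ground truth.
    return predicted_answer.strip().upper() == record["answer"]
```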
A little bit of supervision goes a long way: Facebook and Stanford researchers are carrying out a somewhat similar line of enquiry, but in a different domain. They’ve come up with a new system that gets state-of-the-art results on a dataset intended to test visual reasoning. The secret to their method? Training a neural network to invent its own small computer programs on the fly to answer questions about the images it sees (a toy illustration of the idea follows below). You can find out more in ‘Inferring and Executing Programs for Visual Reasoning’. The most intriguing part? The resulting system is relatively data efficient compared to fully supervised baselines, suggesting that it’s learning how to tackle the task in novel ways.
…it seems likely that in the future AI research may shift from generating new datasets alongside new algorithms to generating new datasets, new algorithms, and new reasoning programs to aid learning efficiency and interpretability.
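…here’s a toy illustration of the two-stage ‘infer a program, then execute it’ idea, using symbolic objects and hand-written modules in place of the neural program generator and neural modules the actual paper uses; the module names and scene are invented for the example:

```python
# Toy (non-neural) version of program-based visual reasoning: a question such
# as "How many red cubes are there?" is mapped to a short program of modules,
# which is then executed over a scene representation to produce an answer.
from typing import Callable, Dict, List

scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cube", "color": "blue"},
]

MODULES: Dict[str, Callable] = {
    "filter_red":  lambda objs: [o for o in objs if o["color"] == "red"],
    "filter_cube": lambda objs: [o for o in objs if o["shape"] == "cube"],
    "count":       lambda objs: len(objs),
}

def execute(program: List[str], objs):
    # Run each module in sequence, feeding the output of one into the next.
    out = objs
    for name in program:
        out = MODULES[name](out)
    return out

# In the paper, a sequence-to-sequence program generator predicts the program
# from the question; here we just hard-code what it might output.
program = ["filter_red", "filter_cube", "count"]
print(execute(program, scene))  # -> 1
```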

MuJoCo for free (restrictions apply): Physics simulator MuJoCo will give students free licenses to its software, lowering the costs of doing AI research on modern, challenging problems, like those found in robotics.
…Due to the terms of the license, people will still need to stump up for a license for the proprietary software if they want to use AI systems trained within MuJoCo in products.

Don’t read the words, look at them! (and get a 9X speedup): Facebook shows how to create a competitive translation system that is also around 9 times faster than previous state-of-the-art systems. The key? Instead of using a recurrent neural network to analyze the text, use a convolutional neural network.
…this is somewhat counterintuitive. RNNs are built to analyze and understand sequences, like strings of text or numbers. Convolutional neural networks are somewhat cruder and are mostly used as the basic perceptual component inside vision systems. How was Facebook able to manhandle a CNN into something with RNN-like characteristics? The answer is attention, which lets the network focus on the particular source words that matter for each word it generates (a rough sketch follows below).
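…here’s a minimal sketch of the convolutional-encoder-plus-attention idea in PyTorch, with toy dimensions and without the decoder stack, positional embeddings, or the many other details of Facebook’s actual system – an illustration of the mechanism, not the fairseq implementation:

```python
# Hedged sketch: encode a source sentence with stacked, gated 1-D convolutions
# (all positions processed in parallel, unlike an RNN), then let a decoder
# state attend over the encoder outputs to focus on relevant source words.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size, dim=256, layers=4, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Each conv emits 2*dim channels so a gated linear unit (GLU) can halve them back.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, 2 * dim, kernel, padding=kernel // 2) for _ in range(layers)
        )

    def forward(self, tokens):                    # tokens: (batch, src_len)
        x = self.embed(tokens).transpose(1, 2)    # (batch, dim, src_len)
        for conv in self.convs:
            x = x + F.glu(conv(x), dim=1)         # residual connection + gated conv
        return x.transpose(1, 2)                  # (batch, src_len, dim)

def attend(decoder_state, encoder_states):
    # Dot-product attention: score every source position against the current
    # decoder state, softmax the scores, and return the weighted average.
    scores = torch.einsum("bd,bsd->bs", decoder_state, encoder_states)
    weights = F.softmax(scores, dim=-1)           # (batch, src_len)
    return torch.einsum("bs,bsd->bd", weights, encoder_states)
```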

Horror of the week: what happens when you ask a neural network to make a person smile, then feed it that new smile-augmented image and ask it to make the person smile even more, and then take that image and feed it back to the network and ask it to enhance the smile again? You wind up with something truly horrifying! Thanks, Gene Kogan.

Tech Tales:

[2040: the partially flooded Florida lowlands.]

The kids nickname it “Rocky the Robster” the first time they see it and you tell them “No, it’s called the Automated Ocean Awareness and Assessment Drone,” and they smile at you then say “Rocky is better.” And it is. But you wish they hadn’t named it.

Rocky is about the size of a carry-on luggage suitcase, and it does look, if you squint, a little like a metallic lobster. Two antennas extend from its front, and its undercarriage is coated in grippers and sampling devices and ingest and egress ports. In about two months it’ll go into the sea. If things work correctly, it will never come out, but will become another part of the ocean, endlessly swimming and surveilling and learning, periodically surfacing, whale-like, to beam information back to the scientists of the world.

But before it can start its life at sea, you need to teach it how to swim and how to make friends. Rocky comes with a full low-grade suite of AI software and, much like a newborn, it learns through a combination of imitation and experimentation. Imitation is where your kids come in. They gather in your studio and watch as you, on all fours, crawl across the room. Rocky imitates you poorly. The kids crawl across the room. Rocky imitates them a bit better. You figure that Rocky finds it easier to imitate their movements because they’re closer to it in size. Eventually, you and the kids teach the robot to swim as well, all splashing around in a pool in the backyard, with the robot tethered so that its enthusiastic attempts to learn to swim don’t end with it crashing into your kids.

Then Rocky’s AI systems start to top out – as planned. It can run and walk and scuttle and swim and even respond to some basic hand gestures, but though it still gambols around with a kind of naive enthusiasm, it stops developing new tics and traits. The sense of life in it dims as the kids become aware that Rocky is more drone than they thought.
“Why isn’t Rocky getting smarter anymore, Dad?” they say.
You try to explain that some things can’t get smarter.
“No, that’s the opposite of what you’ve always told us. We just need to try and we can learn anything. You say this all the time!”
“It’s not like that for Rocky,” you say.
“Why not?” they say. Then tears.

The night before Rocky is due to be collected by the field technicians who will make some final modifications to its hardware before sending it into the sea, you hear a creak on the stairwell. You don’t follow them or stop them, but instead turn on a webcam and look into your workshop, watching the door slowly ease open as the kids quietly break in. They sit down next to Rocky’s enclosure and talk to it. They show it pictures they’ve drawn of it. They motion for it to look at them. “Say it, Rocky,” you hear them say, “try to say ‘I want to stay here’”.

Having no vocal cords, it is unable. But as you watch your kids on the webcam you think that, for a fraction of a second, Rocky flexes its antennas, the two tips bowing in and touching each other, forming a heart before thrumming back into their normal position. “A statistical irregularity,” you say to your colleagues, never believing it.