Import AI: Issue 43: Why curiosity improves AI algorithms, what follows ImageNet, and the cost of AI hardware

by Jack Clark

 

ImageNet is dead, long live WebVision: ImageNet was a dataset and associated competition that helped start the deep learning revolution by being the venue where, in 2012, a team of researchers convincingly demonstrated the power of deep neural networks. But now it’s being killed off – this year will be the last official ImageNet challenge. That’s appropriate because last year’s error rate on the overall dataset was about 2.8 percent, suggesting that our current systems have exhausted much of ImageNet’s interesting challenges and may even be in danger of overfitting to it.
…What comes next? One potential candidate is WebVision, a dataset and associated competition from researchers at ETH Zurich, CMU, and Google, which uses the same 1,000 categories as the 2012 ImageNet competition across 2.4 million modern images and metadata taken directly from the web (1 million from Google Image Search and 1.4 million from Flickr).
…Along with providing some degree of continuity in terms of being able to analyze image recognition progress, this dataset also has the advantage of being partially crappy, due to being culled from the web. It’s always better to test AI algorithms on the noisy real world.
…“Since the image results can be noisy, the training images may contain significant outliers, which is one of the important research issues when utilizing web data,” write the researchers. (One common mitigation is sketched below.)
…More information: WebVision Challenge: Visual Learning and Understanding With Web Data.
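…For those who want to experiment: label noise of this kind is usually tackled in the loss function. Below is a minimal sketch of one well-known approach – the ‘soft bootstrapping’ loss of Reed et al. (2014) – which blends the given (possibly wrong) label with the model’s own current prediction, so a confident model can partially override bad labels. The PyTorch framing and the blending weight beta are my assumptions, not anything prescribed by the WebVision organizers.

```python
import torch
import torch.nn.functional as F

def bootstrapped_cross_entropy(logits, noisy_labels, beta=0.8):
    """Soft-bootstrapping loss (Reed et al., 2014) for noisy web labels.

    The target is a mix of the provided label and the model's own
    prediction; beta=1.0 recovers ordinary cross-entropy.
    """
    probs = F.softmax(logits, dim=1)                         # model beliefs
    one_hot = F.one_hot(noisy_labels, logits.size(1)).float()
    target = beta * one_hot + (1.0 - beta) * probs.detach()  # blended target
    log_probs = F.log_softmax(logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()

# Usage inside an otherwise ordinary training loop:
# loss = bootstrapped_cross_entropy(model(images), labels)
```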

Making self-driving cars a science: the field of self-driving car development lacks the open publication conventions of the rest of AI research, despite using and extending various cutting-edge AI research techniques. That’s probably because of the seemingly vast commercial value of self-driving cars. But it raises a problem: how can people make development more scientific – improving the efficiency of the industry while benefiting society through open science?
…AI meme-progenitor Eder Santana, a former intern at self-driving startup Comma.ai, has written up a shopping list of things that, if fulfilled, would improve the science of self-driving startups. It’s a good start on a tough problem.
…I wonder if smaller companies might band together to enact some of these techniques – with higher levels of openness than titans like Uber and Google and Tesla and Ford etc – and use that to collaboratively pool research to let them compete? After all, the same philosophy already seems present in Berkeley DeepDrive, an initiative whereby a bunch of big automakers fund open AI research in areas relevant to their development.
…The next step is shared data. I’m curious whether Uber’s recent hire, Raquel Urtasun, will continue her work on the KITTI self-driving car dataset, which she helped create and which Eder lists as a good example.

AI ain’t cheap: Last week, GPUs across the world were being rented by researchers racing to perform final experiments for NIPS. This wasn’t cheap. Despite many organizations (including OpenAI) trying to make it easier for more researchers to experiment with and extend AI, the cost of raw compute remains quite high. (And because AI is mostly an experimental, empirical science, you can expect to shell out for many experiments. Some deep-pocketed companies, like Google, are trying to offset this by giving researchers free access to resources – most recently 1,000 of its Tensor Processing Units in a dedicated research cloud – but giveaways don’t seem sustainable in the long run.)
…“We just blew $5k of google cloud credits in a week, and managed only 4 complete training runs of Inception / Imagenet. This was for one conference paper submission. Having a situation where academia can’t do research that is relevant to Google (or Facebook, or Microsoft) is really bad from a long-term perspective”, wrote Hacker News user dgacmu.
A new method of evaluating AI we can all get behind: Over on the Amazon Web Services blog, a company outlines various ways of training a natural language classification system and lists what each costs – not just in computation, but in the dollars you’d pay to rent the necessary CPU and GPU resources on AWS. These sorts of numbers help put into perspective how much AI costs and, more importantly, how long it takes to do things that the media (yours included) makes sound simple.
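…Some back-of-envelope math makes the scale concrete. A tiny sketch – every number below is an illustrative assumption, not a figure from the AWS post or the Hacker News comment:

```python
# Rough cost model for renting cloud compute for one conference paper.
# All numbers are illustrative assumptions.
gpu_hourly_rate = 0.90   # $/hour for one rented GPU (assumed)
gpus_per_run = 8         # GPUs used in parallel per training run (assumed)
hours_per_run = 90       # wall-clock hours per run to convergence (assumed)
runs_per_paper = 20      # experiments, ablations, and failed runs (assumed)

cost_per_run = gpu_hourly_rate * gpus_per_run * hours_per_run
print(f"one training run:  ${cost_per_run:,.0f}")                   # $648
print(f"one paper's worth: ${cost_per_run * runs_per_paper:,.0f}")  # $12,960
```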

How to build an AI business, from A16Z: VC firm Andreessen Horowitz has created the AI Playbook, a microsite to help people figure out how AI works and how to embed it into their business.
…Bonus: it includes links to the thing every AI person secretly (and not so secretly) lusts after: DATA.
…Though AI research has been proceeding at a fairly rapid clip, this kind of project hints at the fact that commercialization of it has been uneven. That’s partly due to a general skills deficit in AI across the tech industry and also because in many ways it’s not exactly clear how you can use AI – especially the currently on-trend strain of deep neural networks – in a business. Most real-world data requires a series of difficult transforms before it can be strained through a machine learning algorithm and figuring out the right questions to ask is its own science.
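…To make that concrete, here is a minimal sketch of the kind of unglamorous transform pipeline most business data needs before any model sees it. The column names and the model choice are invented for illustration; the pattern is standard scikit-learn.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["age", "account_value"]      # hypothetical columns
categorical = ["region", "plan_type"]   # hypothetical columns

# Fill in missing values, scale numbers, one-hot encode categories:
# the boring transforms that precede any actual learning.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
# model.fit(df[numeric + categorical], df["churned"])   # df is hypothetical
```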

E-GADs: Entertaining Generative Adversarial Doodles! Google has released a dataset of 50 million drawings across 345 distinct categories, providing artists and other fiddlers with a dataset to experiment with new kinds of AI-led aesthetics.
…This is the dataset that supported David Ha’s fun SketchRNN project, whose code is already available.
…It may also be useful for learning representations of real objects – I’d find it fun to pair doodles with real-image counterparts in a semi-supervised way, then transform new real-world pictures into cute doodles. Perhaps generative adversarial networks are a good candidate? I must have cause to use the above bolded acronym – you all have my contact details.
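…If you want to play along at home: the drawings ship in a simplified stroke format, and a minimal sketch for rasterizing one record into an image might look like the below. This assumes the simplified .ndjson export, where each stroke is a pair of x/y coordinate lists in [0, 255]; the filename is hypothetical.

```python
import json
from PIL import Image, ImageDraw

def doodle_to_image(ndjson_line, size=256):
    """Rasterize one simplified Quick, Draw! record onto a white canvas."""
    strokes = json.loads(ndjson_line)["drawing"]
    img = Image.new("L", (size, size), 255)
    pen = ImageDraw.Draw(img)
    for xs, ys in strokes:                    # one stroke = (x list, y list)
        pen.line(list(zip(xs, ys)), fill=0, width=2)
    return img

# with open("cat.ndjson") as f:               # hypothetical filename
#     doodle_to_image(next(f)).save("cat.png")
```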

Putting words in someone else’s mouth – literally: fake news is going to get even better thanks to new techniques for getting computers to synthesize realistic-looking images and videos of people.
…In the latest research paper in this area, a team of researchers at the University of Oxford has produced ‘speech2vid’, a technique that takes a single still image of a person plus an audio track and synthesizes an animated version of that person’s face saying those words (a toy sketch of the general architecture appears below).
…The effects are still somewhat crude – check out the blurred, faintly comic-book-like textures in the clips in this video. But they hint at a future where it’s possible to create compelling propaganda using relatively little data. AI doppelgangers won’t just be for celebrities and politicians and other people who have generated vast amounts of data to be trained on, but will be made from relatively data-lite people like you or me or everyone we know.
…More information in the research paper: You said that?
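…The general shape of such systems is an encoder-decoder: one network embeds the identity image, another embeds a window of audio, and a decoder emits video frames conditioned on both. Below is a deliberately toy sketch of that shape – every layer size, and the idea of feeding pre-computed audio features, are my assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class TalkingHeadSketch(nn.Module):
    """Toy encoder-decoder in the spirit of speech2vid: a still image plus
    an audio window in, one 64x64 video frame out. Sizes are illustrative."""
    def __init__(self, audio_dim=128, face_dim=256):
        super().__init__()
        # Identity encoder: one still image -> identity embedding.
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, face_dim))
        # Audio encoder: a short window of audio features (e.g. MFCCs).
        self.audio_enc = nn.Sequential(
            nn.Linear(audio_dim, 256), nn.ReLU(), nn.Linear(256, 256))
        # Decoder: joint embedding -> one RGB frame.
        self.dec = nn.Sequential(
            nn.Linear(face_dim + 256, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, still_image, audio_window):
        z = torch.cat([self.face_enc(still_image),
                       self.audio_enc(audio_window)], dim=1)
        return self.dec(z)   # one synthesized frame per audio window
```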

The curious incident of the curiosity exploration technique inside the learning algorithm: how can we train AI systems to explore the world around them in the absence of an obvious reward? That’s a question that AI researchers have been pondering for some time, given that in real life rewards (marriage, promotions, finally losing weight after seemingly interminable months of exercise) tend to be relatively sparse.
…One idea is to reward agents for being curious, because curious people tend to stumble on new things which can help expand and deepen their perception of the world. Children, for instance, spend most of their time curiously exploring the world around them without specific goals in mind and use this to help them understand it.
…The problem for AI algorithms is figuring out how to get them to learn to be curious in a way that leads them to learn useful stuff. One way could be to reward the visual novelty of a scene – e.g., if I’m seeing something I haven’t seen before, then I’m probably exploring usefully. Unfortunately, this is full of pitfalls – show a neural network the static on an untuned television and every frame will be novel, but not useful.
…So researchers at the University of California at Berkeley have come up with a technique to do useful exploration, outlined in Curiosity-driven Exploration by Self-supervised Prediction. It works like this: “instead of making predictions in the raw sensory space (e.g. pixels), we transform the sensory input into a feature space where only the information relevant to the action performed by the agent is represented.”
…What this means is that the agent learns a feature space by predicting its own actions from pairs of observations, then rewards itself according to how poorly it can predict the effect of its actions in that feature space – so it seeks out situations it doesn’t yet understand, while ignoring parts of the world (like TV static) that its actions can’t influence.
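…Concretely, the module the paper proposes (an ‘Intrinsic Curiosity Module’) has three parts: a feature encoder; an inverse model that predicts the action from consecutive feature vectors, which forces the features to capture only what the agent can influence; and a forward model whose prediction error becomes the intrinsic reward. A minimal sketch – the layer sizes and the reward scale eta are my own assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICMSketch(nn.Module):
    """Minimal Intrinsic Curiosity Module (Pathak et al., 2017).
    Dimensions and the reward scale eta are illustrative assumptions."""
    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        # Inverse model: (phi(s_t), phi(s_t+1)) -> which action was taken.
        # Training it keeps the features focused on controllable things,
        # so TV static generates no curiosity.
        self.inverse = nn.Linear(2 * feat_dim, n_actions)
        # Forward model: (phi(s_t), action) -> predicted phi(s_t+1).
        self.forward_model = nn.Linear(feat_dim + n_actions, feat_dim)

    def forward(self, obs, next_obs, action, eta=0.01):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.inverse.out_features).float()
        inv_logits = self.inverse(torch.cat([phi, phi_next], dim=1))
        inverse_loss = F.cross_entropy(inv_logits, action)
        phi_pred = self.forward_model(torch.cat([phi, a_onehot], dim=1))
        forward_loss = F.mse_loss(phi_pred, phi_next.detach())
        # Intrinsic reward: the agent is paid for surprising itself.
        reward = eta * (phi_pred - phi_next).pow(2).sum(dim=1).detach()
        return reward, inverse_loss + forward_loss
```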
…So, how well does it work? The researchers test the approach on two environments – Super Mario and VizDoom. They find that it attains higher scores more quickly than other methods, and can deal with increasingly sparse rewards.
…The most tantalizing part of the result? “An agent trained with no extrinsic rewards was able to learn to navigate corridors, walk between rooms and explore many rooms in the 3-D Doom environment. On many occasions the agent traversed the entire map and reached rooms that were farthest away from the room it was initialized in. Given that the episode terminates in 2100 steps and farthest rooms are over 250 steps away (for an optimally-moving agent), this result is quite remarkable, demonstrating that it is possible to learn useful skills without the requirement of any external supervision of rewards.”
…The approach has echoes of a recent paper from DeepMind outlining a reinforcement learning agent called UNREAL. This system was a composite of different neural network components; it used a smart memory-replay system to figure out how actions it had taken in the environment corresponded to rewards, and also used it to discover unspecified intermediate rewards that helped it gain an actual one (for example, though it was rewarded for moving itself to the same location as a delicious hovering apple, it subsequently figured out that to attain this reward it should achieve an intermediary reward which it creates and focuses on itself). It learned this by figuring out how its actions affected its observations of the world and adjusting accordingly.
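…For a flavor of how one of those auxiliary components looks in practice, here is a toy sketch of UNREAL’s reward-prediction task: given a few replayed observation features, classify whether the next reward is negative, zero, or positive. The dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardPredictionHead(nn.Module):
    """Toy version of UNREAL's reward-prediction auxiliary task.
    Sizes are illustrative assumptions."""
    def __init__(self, feat_dim=256, seq_len=3):
        super().__init__()
        # Classify the sign of the *next* reward from a short history.
        self.classifier = nn.Linear(seq_len * feat_dim, 3)

    def forward(self, feature_seq, next_reward):
        # feature_seq: (batch, seq_len, feat_dim), sampled from a replay
        # buffer skewed so that rewarding moments are over-represented.
        logits = self.classifier(feature_seq.flatten(start_dim=1))
        target = (torch.sign(next_reward) + 1).long()  # {-1,0,1} -> {0,1,2}
        return F.cross_entropy(logits, target)
```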
…(Curiosity-driven exploration and related fields like intrinsic motivation are quite mature, well-studied areas of AI, so if you want to trawl through the valuable context I recommend reading papers cited in the above research.)

Import AI reader comment of the week: Ian Goodfellow wrote in to quibble with my write-up of a recent paper about how snapshots of the same network at different points in time can be combined to form an ensemble model. The point of contention is whether these snapshots represent different local minima:
“…Local minima are basically the kraken of deep learning. Early explorers were afraid of encountering them, but they don’t seem to actually happen in practice,” he writes. “What’s going on is more likely that each snapshot of the network is in a different location, but those locations probably aren’t minima. They’re like snapshots of a person driving a car trying to get to a specific point in a really confusing city. The driver keeps circling around their destination but can’t quite get to it because of one way street signs and their friend keeps texting them telling them to park in a different place. They’re always moving, never trapped, and they’re never in quite the right place, but if you average out all their locations the average is very near where they’re trying to go.”
…Thanks, Ian!
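…For readers who want to try the technique being quibbled over: a minimal sketch of snapshot ensembling is below – save copies of the weights periodically along a single training run, then average the snapshots’ predicted distributions at test time. The training helper and schedule are assumptions.

```python
import copy
import torch

def train_with_snapshots(model, optimizer, train_one_epoch,
                         num_epochs=60, snapshot_every=10):
    """Save a frozen copy of the network at intervals along one run.
    train_one_epoch is an assumed user-supplied helper."""
    snapshots = []
    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer)
        if (epoch + 1) % snapshot_every == 0:
            snapshots.append(copy.deepcopy(model).eval())
    return snapshots

@torch.no_grad()
def ensemble_predict(snapshots, x):
    # In Ian's analogy: each snapshot is a car parked somewhere near the
    # destination; averaging their positions lands close to it.
    probs = [torch.softmax(m(x), dim=1) for m in snapshots]
    return torch.stack(probs).mean(dim=0)
```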

Help deal with the NIPS-A-GEDDON: This week, AI papers are going to start flooding onto Arxiv from submissions to NIPS and some other AI conferences. Would people like to help rapidly evaluate the papers, noting interesting things? We tried a similar experiment a few weeks ago and it worked quite well, using a combination of a form and a Google Doc to rapidly analyze papers. Would love suggestions from people on whether this format [GDoc] is helpful (I know it’s ugly as sin, so suggestions welcome here).
…if you have any other thoughts for how to structure this project or make it better, then do let me know.

OpenAI bits&pieces:

It was a robot-fueled week at OpenAI. First, we launched a new software package called Roboschool, open-source software for robot simulation, integrated with OpenAI Gym. We also outlined a robotics system that lets us efficiently learn to reproduce behaviors from single demonstrations.

CrowdFlower’s founder, currently interning at OpenAI, chats about the importance of AI on this podcast with Sam Lessin, and why he thinks computers are eventually going to exceed humans at many (if not all!) capabilities.

Tech tales:

[2018: The San Francisco Bay Area, two people in two distant shared houses, conversing via their phones.]

Are you okay?
I’ve been better. You?
Things aren’t going well.
Anything I can do?
Fancy drinks?
Sure, when?
Wednesday at 930?
Sounds good!

You put your phone down and, however many miles away, so does the other person. Neither of you typed a word of that; instead, you both just kept thumbing the automatically suggested messages until you’d scheduled the drinks.

It’s true, both of you are having trouble at the moment. Your system was smart enough to make the suggestions based on studying your other emails and the rhythms of hundreds of millions of other users. When you eventually go and get drinks the GPS in your phones tracks you both and records the meeting – anonymously, only signalling to the AI algorithms that this kind of social interaction produced a Real In-Person Correspondence.

Understanding what leads to a person meeting up with another, and what conversational rhythms or prompts are crucial to ensuring this occurs, is a matter of corporate life and death for the companies pushing these services. We know when you’re sad, is the implication. So perhaps you should consider $drinks, or $a_contemporary_lifestyle_remedy, or $sharing_more_earlier.

You know you’re feeding them, these machines that live in football-field-sized warehouses, tended to by a hundred computer mechanics who cannot know what the machines are really thinking. No person truly knows what these machines relate to; instead it is the AI at the heart of the companies that does – and we don’t know how to ask it questions.

Technologies that inspired this story: sequence-to-sequence learning, Alexa/Siri/Cortana/Google, phones, differential privacy, federated learning.