Import AI: Issue 35: The end of ImageNet, unsupervised image fiddling with DiscoGan, and Alibaba’s voice data stockpile

by Jack Clark

 

Inside the freaky psychology of a machine learning researcher: being a machine learning researcher is a lot like being an addict at a slot machine, forever running experiments to see if intuitions about hyperparameters or setups are working, writes researcher Filip Piekniewski. …This sort of slot machine mentality does not encourage good science. “Perhaps by chance we get to a set of parameters that ‘looks promising’. Here is the reward signal. Most likely spurious, but the cause that gets rewarded is clear: running more models. Before we know, the researcher is addicted to running simulations and like any other addict he confabulates on why this is great and moves humanity forward.”
… Some of these problems will go away as machine learning matures into more of a scientific discipline in its own right. But until then it’s likely people will continue to get trapped in these dark slot machine patterns. Steer clear of the “nonlinear parameter cocaine”, kids.

Hunter-Killer video analysis, now available to buy! Stanford adjunct professor Reza Zadeh has given details on his startup Matroid. The company makes it easy for people to create and train new AI classifiers for specific objects or people, and helps them automatically analyze videos to find those people or objects in them. “Like a metal detector detects metal, a matroid will detect something in media,” he said at the Scaled Machine Learning Conference at Stanford this week.

17,000 hours of speech: data used by Alibaba’s AI research team to train a speech recognition system.
…“The dataset is created from anonymous online users’ search queries in Mandarin, and all audio file’s sampling rate is 16kHz, recorded by mobile phones. This dataset consists of many different conditions, such as diverse noise even low signal-to-noise, babble, dialects, accents, hesitation and so on,” they write.

Weirdly evocative slide title of the week: ‘Growing AI Muscles at Microsoft’ – seen at the Scaled Machine Learning Conference at Stanford on Saturday. The main image that jumps to mind is a bunch of arms floating in some Microsoft-branded see-through cylinders, endlessly swiping over tablets displaying the company’s blocky ‘Metro’ design language.

Today’s evidence that deep neural networks are not your omniscient savior: DNNs are unable to classify negative images, report researchers at the University of Washington in ‘Deep Neural Networks Fail To Classify Negative Images’. Any human can usually ID the key contents of an image whose colors have been inverted; the fact that DNNs fail to do so is further evidence that they need more research and development before they can classify data as effectively as a person.
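…You can get a feel for the failure mode yourself. Here is a minimal sketch of the negative-image test, assuming torchvision’s pretrained ResNet-18 and a hypothetical local file called dog.jpg; it is not the paper’s exact protocol:

```python
import torch
from torchvision import models, transforms
from PIL import Image, ImageOps

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True).eval()

img = Image.open("dog.jpg").convert("RGB")   # hypothetical input image
negative = ImageOps.invert(img)              # reverse every color value

with torch.no_grad():
    for name, im in [("original", img), ("negative", negative)]:
        pred = model(preprocess(im).unsqueeze(0)).argmax(dim=1).item()
        print(name, pred)   # the two predicted class indices usually disagree
```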

Canada seeks to retain AI crown: The Canadian government is putting $125 million towards AI research, as it seeks to maintain its pre-eminent position in AI research and education. You can throw as much money at things as you like, but it’s going to be challenging to retain a lead when talented professors and students continue to depart for foreign companies or institutions (see Geoff Hinton, Russ Salakhutdinov, a significant percentage of DeepMind, and so on).

ImageNet is dead, long live ImageNet: ImageNet, the image recognition competition that kick-started the deep learning boom when Hinton and his gang won it in 2012 with a deep learning-based approach, is ending. The final competition will be held alongside CVPR this summer. Attendees of the associated workshop will use the time to “focus on unanswered questions and directions for the future”…
…ImageNet was a hugely important competition and dataset. It has also had the rare privilege of being the venue for not one, but two scientific lightning strikes: the 2012 deep learning result, and 2015’s debut of residual networks from Microsoft Research. Like deep learning (at the time, a bunch of stacked convnets), resnets have become a standard best-in-class tool in the AI community.
…But it is sad that ImageNet is going away, as it provided a simple, handy measure of AI progress. Future candidates for measuring progress include competitions like MS COCO, or challenges based around richer datasets, like Fei-Fei Li’s Visual Genome.

Andrew Ng leaves Baidu: Andrew Ng, a genial AI expert who occasionally plays down the emphasis people place on AI safety, has resigned from his position as Chief Scientist at Chinese search engine Baidu. No word yet on what he’ll do next. One note: Ng’s partner runs autonomous vehicle startup Drive.ai, which recently recorded a video of one of its cars cracking a notorious AI challenge: driving, with no human required, in the rain.

Microsoft invents drunk-chap-at-dartboard networks: Microsoft Research has published a new paper, “Deformable Convolutional Networks”, which proposes a new type of basic neural network building block, called a deformable convolution.
…A ‘Deformable Convolution’ is able to sample from a broader set of locations than a traditional convolution, Microsoft says. Think of a standard convolution as sampling from a fixed grid of, say, nine points on an image, arranged in a regular square. A deformable convolution samples from the same number of points, but learns offsets that let the points spread out in relation to one another in weirder ways. By inventing a component which can do this sort of fuzzy sampling, Microsoft is able to create marginally better image recognition and object detection systems, and the underlying flexibility suggests it will make it easier to build systems that classify images and other spatial data.
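…Here’s a minimal sketch of the idea, assuming torchvision’s torchvision.ops.deform_conv2d operator (a later re-implementation, not Microsoft’s original code); the zero-initialized offsets mean the layer starts out behaving like a regular convolution and learns to deform its sampling grid during training:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        # A plain conv predicts an (x, y) offset for each of the k*k taps,
        # letting the sampling grid deform differently at every location.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=padding)
        nn.init.zeros_(self.offset_conv.weight)  # zero offsets = regular conv
        nn.init.zeros_(self.offset_conv.bias)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.padding = padding

    def forward(self, x):
        offsets = self.offset_conv(x)            # shape (N, 2*k*k, H, W)
        return deform_conv2d(x, offsets, self.weight, padding=self.padding)

layer = DeformableConv2d(3, 8)
print(layer(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 8, 32, 32])
```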

Neural networks – keep them secret, keep them safe: a tutorial by Andrew Trask in which he seeks to create a homomorphically encrypted neural network. What’s the point? It makes it hard to steal the output of the model, and it also gives the human natural control over the system, by virtue of their holding the key that not only decrypts the network for other observers, but also decrypts the world for the neural network. “If the AI is homomorphically encrypted, then from it’s perspective, the entire outside world is also homomorphically encrypted. A human controls the secret key and has the option to either unlock the AI itself (releasing it on the world) or just individual predictions the AI makes (seems safer),” he writes.
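…Trask builds his own vector scheme in the tutorial, but a toy version of the core idea works with any additively homomorphic cryptosystem. A minimal sketch, assuming the python-paillier library (pip install phe) rather than his scheme: the weights of a tiny linear “layer” live only in ciphertext, yet the layer can still compute, and only the keyholder can read the result.

```python
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

# A tiny linear "layer" whose weights exist only as ciphertext.
weights = [0.5, -1.2, 0.3]
enc_weights = [pub.encrypt(w) for w in weights]

x = [1.0, 2.0, 3.0]   # plaintext input

# Paillier is additively homomorphic: ciphertexts can be added together and
# scaled by plaintext numbers, so an encrypted dot product needs no key.
enc_activation = sum(w_enc * xi for w_enc, xi in zip(enc_weights, x))

print(priv.decrypt(enc_activation))   # approximately -1.0; keyholder only
```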

DiscoGan: How often do you say to yourself, ‘I wish I had a shoe in the style of this handbag’? Often? Me too. Now researchers with SK T-Brain have a potential solution via DiscoGan, a technique introduced in ‘Learning to Discover Cross-Domain Relations with Generative Adversarial Networks’.
…Careful constraints let them teach the system to generate an image in one domain, say a shoe, that is visually similar to other shoes while retaining the distinguishing features of the input from the other domain, such as a handbag. The technique works in the following way: “When learning to generate a shoe image based on each handbag image, we force this generated image to be an image-based representation of the handbag image (and hence reconstruct the handbag image) through a reconstruction loss, and to be as close to images in the shoe domain as possible through a GAN loss”. (A sketch of these two losses appears at the end of this item.)
…The researchers demonstrate that the approach works across multiple domains: translating the angle of a person’s face, converting the gender of someone in a picture, rotating cars, turning shoes into handbags, and so on.
…DiscoGan has already been re-implemented in PyTorch by Taehoon Kim and published on GitHub, if you want to take it for a boogie.
…See where it fails: it’s always worth looking at both the successes and failures of a given AI approach. In the case of this PyTorch reimplementation, DiscoGan appears to fail to generate good quality images when working from dense segmentation datasets.
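…As promised above, here is a minimal sketch of the handbag-to-shoe direction of the two losses in PyTorch. Toy linear networks stand in for the paper’s convolutional generators and discriminators, and all shapes and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the paper's conv nets.
G_ab = nn.Linear(64, 64)                              # handbag -> shoe
G_ba = nn.Linear(64, 64)                              # shoe -> handbag
D_b = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())   # "is this a real shoe?"

x_a = torch.randn(8, 64)    # a batch of "handbags"

x_ab = G_ab(x_a)            # translate into the shoe domain
x_aba = G_ba(x_ab)          # translate back again

# Reconstruction loss: the round trip must return the original handbag,
# which forces x_ab to retain the handbag's distinguishing features.
recon_loss = F.mse_loss(x_aba, x_a)

# GAN loss: the shoe-domain discriminator should judge x_ab to be real.
gan_loss = F.binary_cross_entropy(D_b(x_ab), torch.ones(8, 1))

loss = recon_loss + gan_loss   # the paper adds the symmetric shoe->handbag terms
```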

Presenting the Inaugural Import AI Award for Advancing Science via Altruistic Community Service: We should all congratulate Taehoon Kim for taking the time to re-implement so many research papers and publish the code on GitHub. Not only is this a great way to teach yourself about AI, but by making it open you help speed up the rate at which other researchers can glom onto new concepts and experiment with them. Go and check out Kim’s numerous GitHub repos and give them some stars.

OpenAI bits&pieces:

OpenAI has a new website which outlines a bit more of our mission, and is accompanied by some bright, colorful imagery. Join us as we try to move AI away from a shiny&chrome design aesthetic into something more influenced by a 3001AD-Hallmark-Card-from-The-Moon.

Evolution Strategies: Andrej Karpathy and others have produced a lengthy write-up of our recent paper on evolution strategies. Feedback request: What do you find helpful about this sort of explanatory blog? What should we write more or less about in these posts? Answers to jack@jack-clark.net, please!
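If you want the core of the method on one screen: perturb the parameters with Gaussian noise, score each perturbation, and step towards the reward-weighted average of the noise. A minimal numpy sketch on a toy objective, in the spirit of the write-up’s example (hyperparameters are illustrative):

```python
import numpy as np

def reward(w):   # toy objective: get close to a fixed target vector
    return -np.sum((w - np.array([0.5, 0.1, -0.3])) ** 2)

npop, sigma, alpha = 50, 0.1, 0.001
w = np.random.randn(3)

for step in range(300):
    noise = np.random.randn(npop, 3)          # one perturbation per "worker"
    rewards = np.array([reward(w + sigma * n) for n in noise])
    rewards = (rewards - rewards.mean()) / rewards.std()   # fitness shaping
    w = w + alpha / (npop * sigma) * noise.T @ rewards     # ES gradient estimate

print(w)   # converges towards [0.5, 0.1, -0.3]
```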

Robotics: Two new robotics papers, one on imitation learning and one on transfer learning.

Tech Tales:

[2021: Branch office of a large insurance company, somewhere in the middle of America.]

“So, the system learns who is influential by studying the flow of messages across the company. This can help you automatically identify the best people to contact for a specific project, and also work out who to talk to if you need to negotiate internal systems as well,” the presenter says.
You sit in the back, running calculations in your head. “This is like automated high school,” you whisper to a colleague.
   “But who’s going to be the popular one?” they say.
Me. You think. It’ll be me. “Who knows?” you say.

Over the course of the next two weeks the system is rolled out. Now, AI monitors everything sent over the company network, cataloguing and organizing files and conversations according to perceived importance, influence, and so on. Slowly, you learn the art of the well-timed email, or the minimum viable edit to a document, and you gain influence. Messages start flowing to you. Queries are shifted in your direction. Your power, somewhat inexorably, grows.

One year passes. It’s time to upgrade the system. And now, as one of its ‘super administrators’, you’re the one giving the presentation. “This system has helped us move faster and more efficiently than our competitors,” you say, “and it has surfaced some previously unknown talent in our organization. Myself excluded!” Pause for laughter. “Now it’s time to take the next step. Over the next month we’re going to change how we do ongoing review here, and we’re going to factor in some signals from the system. We’ve heard your complaints about perceived bias in reviews, and we think this will solve that. Obviously, all decisions will be fully auditable and viewable by anyone in the organization.”

And just like that, you and the system gain the power to diminish, as well as elevate, someone’s standing. All it takes is careful study of the system’s machine learning components, and the construction of a mental model of where the fragile points are: the bits where a single action by you can flip a classifier from positive to negative, or shift a promotion decision in a certain direction. Once you’ve done that you’re able to gain power, manipulate things, and eventually teach the system to mimic your own views on which social structures within the company should be encouraged and which should be punished. Revolutions are silent now; revolutions are now things that can be taught and learned. Next step: train the system to mount its own insurrections.