Import AI: Issue 49: Studying the crude psychology of neural networks, Andrew Ng’s next move, and teaching robots to grasp with DexNet2

by Jack Clark

Interdisciplinary AI: Unifying human and machine thought through psychological studies of deep neural nets:
…a paper from DeepMind sees the company’s researchers probe neural networks (specifically, ‘Inception’ and Matching Networks) for biases.
…They discover that when they present these image classification networks with new, never-before-seen images, the networks have a tendency to apply the label for the new image to other similarly shaped images, in preference to ones with similar color, texture, or size. This is the same phenomenon that psychologists observe in humans.
…but what does it mean? Mostly it’s an encouraging sign that these sorts of techniques that we’ve used to analyze humans have something to offer when analyzing neural networks, paving the ground for future studies. “As a good cognitive model should, our DNNs make testable predictions about word-learning in humans. Specifically, the current results predict that the shape bias should vary across subjects as well as within a subject over the course of development. They also predict that for humans with adult-level one-shot word learning abilities, there should be no correlation between shape bias magnitude and one-shot-word learning capability,” the authors write.
Read more in the paper ‘Cognitive Psychology for Deep Neural Networks’.
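…For the curious, here is a minimal sketch of how a shape bias of this kind could be scored. It is not the paper's code: the `shape_bias` function, the use of cosine similarity, and the toy feature vectors are all assumptions made for illustration. Each trial pairs a probe with one candidate that shares its shape and one that shares its color, and the score is the fraction of trials in which the shape match sits closer to the probe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def shape_bias(trials):
    """Fraction of trials where the shape match is closer to the probe.

    `trials` is a list of (probe, shape_match, color_match) feature vectors.
    In a real study these would come from a trained network; here they are
    hypothetical hand-written values.
    """
    shape_wins = sum(
        1
        for probe, shape_match, color_match in trials
        if cosine(probe, shape_match) > cosine(probe, color_match)
    )
    return shape_wins / len(trials)


# Toy, made-up features: the shape match wins the first two trials,
# the color match wins the third.
trials = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.8], [0.1, 1.0, 0.2]),
    ([0.0, 1.0, 0.5], [0.1, 0.9, 0.0], [1.0, 0.0, 0.5]),
    ([1.0, 1.0, 0.0], [0.0, 0.2, 1.0], [0.9, 1.0, 0.1]),
]

print(shape_bias(trials))
```

A score near 1.0 would indicate a strong preference for shape and a score near 0.5 would indicate no preference between the two cues.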

Neural net libraries for everyone! Sony steps up.
…As companies base more of their long-term corporate strategy around being leaders in AI, some are seeking to create the essential (software) picks and shovels to be used by other AI developers. Currently, the Google-backed frameworks TensorFlow and Keras are becoming popular among developers, while Amazon (MXNet), Microsoft (CNTK) and Facebook (PyTorch) are all seeking to gain some developer enthusiasm with their own frameworks.
…It’s already a busy ecosystem and now it’s getting busier as companies like Samsung and, now, Sony, design their own frameworks and supporting tools. Much like how smartphones have solidified around a few basic apps (WeChat / WhatsApp / FB / Google / YouTube / etc), it seems likely developers will home in on a few choice AI frameworks. The question is whether it’s too late to start another one. One thing’s for sure – Sony’s decision to give its framework the generic, un-googlable name ‘Neural Network Libraries’ is unlikely to help.

And the award for most obvious name for a startup goes to… Andrew Ng!
…Andrew Ng, a former AI whiz at Baidu, Google, and Stanford, has finally revealed the name of his new startup: Creative name. I’ve heard some rumors it relates to education, but that could just be people making assumptions based on Ng’s history with Coursera.

McDonald’s plots mass automation via 2,500 robot kiosks:
Fast food company McDonald’s plans to upgrade 2,500 restaurants this year to include a robot kiosk, automating the ordering process for people. McDonald’s says this lets its staff concentrate on providing better service and notes that locations which already host an automated kiosk have better sales than those that don’t. My intuition is this could become another ‘ATM example’ regarding automation, where McDonald’s will continue to grow aggregate employment long after the introduction of the automation technology (in this case, the kiosk). However, rolling this out may serve to put a ceiling on (human) wages.

Silicon Valley TV goes all in on AI:
…Tim Anglade of HBO’s Silicon Valley has written up a few of the technical details behind the show’s Not HotDog app, which uses the combined might of the smartphone and AI ecosystem – representing literally billions of dollars of investment to date – to produce software that tells you if your phone is looking at a hotdog or not. We live in amazing times. I think that the emergence of jokey or playful applications of a technology is usually a sign of its broader maturation and adoption, so the arrival of this app seems to herald good things for AI.
…One observation made by app developer Tim Anglade is that the modern AI ecosystem moves so quickly it’s unlike other technical communities. “With less than a month to go before the app had to launch we endeavored to reproduce the paper’s results. This was entirely anticlimactic as within a day of the paper being published a Keras implementation was already offered publicly on GitHub by Refik Can Malli, a student at Istanbul Technical University, whose work we had already benefited from when we took inspiration from his excellent Keras SqueezeNet implementation. The depth & openness of the deep learning community, and the presence of talented minds like R.C. is what makes deep learning viable for applications today — but they also make working in this field more thrilling than any tech trend we’ve been involved with,” he writes.
…I exchanged a few emails with Tim about the project. He notes that data collection was a tricky part of the project and – sorry readers – he didn’t stumble across any magical way to ease this process. “Honestly there was just a lot of manual download of images, or checking images I already had (such as my own vacation/food pictures). It took days upon days,” he writes. “In that respect I think Dinesh’s experience in the show staring at “penile imagery” for days on end quite accurately reflects my plight for much of the project.”
…Tim says (emphasis mine) his experience developing the app has led him to fall in love with the AI community. He’s now busily working away on some other future projects. “I think A.I, can have an otherworldly sort of quality, where it both seems to good to be true, but it’s also flawed in a way that can be charming, disarming — or just plain human,” he says.

Who said what and when and to whom? State-of-the-art results on semantic role labeling:
…Researchers with the University of Washington, Facebook, and the Allen Institute for Artificial Intelligence have come up with a system that gets state-of-the-art results on semantic role labeling, a natural language processing task that challenges AI systems to “recover the predicate-argument structure of a sentence to determine essentially ‘who did what to whom’, ‘when’, and ‘where’.”
…One thing that’s notable about this system is that there isn’t a single killer idea; instead, it uses a collection of best practices and new components like highway nets and recurrent dropout, which were originally developed by other researchers for other purposes.
…Results: State of the art results on the CoNLL 2005 dataset across recall, precision, and other measures. Similarly good results on the CoNLL 2012 dataset.
…Components used: Highway connections, recurrent dropout
A nice surprise: My intuition is that scientists within AI are starting to spend more time in their papers analyzing the precise ways in which systems fail. This paper is a good example of this encouraging trend, containing an extensive ablation study where they strip out different components of the network in an attempt to better figure out which parts contribute which elements to its learning. More of this, please!
You can read more in Deep Semantic Role Labeling: What Works and What’s Next.
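…As a rough illustration of one of the components named above, here is a minimal sketch of a highway connection in its generic form, where a learned gate decides how much of a layer's new output to keep versus how much of its input to carry through unchanged. This is not the paper's implementation: the `highway` function, its argument names, and the example values are hypothetical, and in a real network the gate logits would be computed from the input by learned weights.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, h, gate_logits):
    """Blend a layer's candidate output `h` with its input `x`.

    For each unit, t = sigmoid(gate_logit) and the result is
    t * h + (1 - t) * x. A gate near 1 passes the new output through;
    a gate near 0 carries the input forward untouched.
    """
    out = []
    for x_i, h_i, g_i in zip(x, h, gate_logits):
        t = sigmoid(g_i)
        out.append(t * h_i + (1.0 - t) * x_i)
    return out


# Hypothetical values: with gate logits of 0 the gate is exactly 0.5,
# so each output is the midpoint of input and candidate.
print(highway([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
```

The appeal of this design is that the carry path gives gradients a direct route through deep stacks of layers, which is part of why it shows up in deep recurrent models.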

Multi-Modal Driving:
Waymo’s cars are outfitted with microphones to let them hear the sirens of emergency vehicles, helping them learn when to pull over safely.
…Elsewhere, Volvo’s own self-driving cars can identify deer, elk, and caribou, but have a hard time responding to kangaroos due to their idiosyncratic bouncy gait.

Where we are with AI development, with Fei-Fei Li.
…“We’re entering a new phase but there is a long way to go,” said Google/Stanford’s Fei-Fei Li about the current state of AI research at the ACM Turing Awards last month, before paraphrasing British Prime Minister Winston Churchill to note that AI development is not at the beginning of the end, but rather at the end of the beginning.
…Afterwards, I caught up with Fei-Fei briefly and asked her what kind of metric might supersede ImageNet for measuring the effectiveness of visual classifiers (ImageNet is being retired after this year’s competition as we’ve started to over-fit the dataset). She suggested that the vision community is going in a number of different directions and we may be entering a period where there isn’t a single, simple metric we can pick. Fei-Fei was very clear that “vision is not solved” and instead there are numerous datasets out there – some of varying levels of complexity and some, like VQA, at or beyond the limit of current techniques – that could be good candidates for the next phase of measurement.
…This maps to my own understanding of the space – instead of simply measuring the ability to pick an object out of a photo we’re now moving onto the harder (and potentially more fruitful) problems of labeling, segmentation, disentanglement, inference about relationships, and so on.

Reach out and touch shapes: UC Berkeley researchers release Dex-Net 2.0
…How can we teach computers to easily grasp objects, even novel ones? That’s a question researchers have been grappling with for decades. Recently, some groups have turned to neural networks as an answer, trying to give computers the ability to approximate the specific function to grip a specific thing. Google has experimented with fleet learning robots picking up and grasping real world things to do this, letting them learn in an unsupervised way how to pick up and put down objects.
…UC Berkeley has its own (supervised) spin on it. Last week the UC Berkeley AUTOLAB released Dex-Net 2.0, a dataset of 6.7 million training examples to help researchers teach computers how to get a grip on reality.
…“The key to Dex-Net 2.0 is a hybrid approach to machine learning Jeff Mahler and I developed that combines physics with Deep Learning. It combines a large dataset of 3D object shapes, a physics-based model of grasp mechanics, and sampling statistics to generate 6.7 million training examples, and then using a Deep Learning network to learn a function that can rapidly find robust grasps when given a 3D sensor point cloud. It’s trained on a very large set of examples of robust grasps, similar to recent results in computer vision and speech recognition,” says UC Berkeley professor Ken Goldberg.
Find out more about Dex-Net 2.0 (and its predecessor) on the official project page.

Competition grows in machine translation:
Amazon Web Services plans to soon launch a machine translation service, according to CNBC. This aligns with some of Amazon’s recent research requests, including robust, distributed translation systems that can learn from small amounts of user feedback.
…Amazon’s service will sit alongside similar offerings from Microsoft, Google and IBM. AI seems like the next technology around which cloud providers will compete as they seek to offer increasingly higher-order abstractions and services on top of their world-spanning fleets of computers.

The Geography of AI will be defined by regulation – or the lack of it:
Fun article in BusinessWeek about Starsky Robotics, a company that employs blue collar truck drivers and elite AI coders who work together to create automated trucks that drive themselves on highways and are then remotely piloted around towns by traditional drivers working from remote operations centers.
…Self-driving will be defined partially by where it gets developed, so it’s of note that some states, such as Florida, have taken particularly permissive and loose approaches to regulation in this area, while others including California have been somewhat harsher.
The Geography of the world will be defined by AI – or the lack of it:
…One interesting tidbit in the article is the idea that, if the company is successful, it could prompt the creation of “climate-controlled ‘driver centers,’ in towns like Jacksonville, where people like Runions will work regular shifts in front of computers, without the greasy food or loneliness that has traditionally gone along with being a trucker.”
…Which raises the question – what happens to the vast exurban ecosystem of truckstops, drive-thrus, and so on that cater to drivers? How will cities change in response to providing services for these stay-at-console truckers, and how will small towns whose economies are built around being on trucking routes fare in this new world?
…“I can tell the difference between a dead porcupine and a dead raccoon, and I know I can hit a raccoon, but if I hit a porcupine, I’m going to lose all the tires on the truck on that side,” says Tom George, a veteran driver who now trains other Teamsters for the union’s Washington-Idaho AGC Training Trust. “It will take a long time and a lot of software to program that competence into a computer.”

OpenAI Bits & Pieces:

Free tools: mujoco-py, an open source Python library to make it easier to simulate and experiment with the (proprietary, license required) MuJoCo physics engine. Bonus: psychedelic robot gif!

Tech Tales:

[ 2024: An Internet cafe in South East Asia. ]

No, you say, watching the price of $BRAIN_COIN plummet from highs down to crushing lows. No no no no no. Rumors of your death swirling on the internet. Fake news about a hijacking. Videos of regulators saying that your currency is under investigation, that the treasury department has a warrant out for your arrest, that George Soros has reversed his position on the cryptocurrency and is liquidating assets. No, no, no, you say, until someone sitting next to you in the cybercafe shushes you, unaware you just went from being a billionaire to a several-hundred millionaire.

All fake, of course. Propaganda dreamed up by the (sometimes automated) marketing departments behind other currencies seeking to sow doubt and confusion, creating enough questions to make people suspicious and thereby manipulate the price of the currencies. The question is how to fight back. How can you send information out into the world that people will actually believe?

And the whole time the currency, your baby, a digital scrip designed to form the bedrock of a marketplace between AIs, trading the currency with each other in exchange for influence, is being rocked to and fro by waves of automated propaganda, dreamed up and sent out by bots around the world. You record a video of yourself holding up a copy of today’s newspaper, having scrawled a long string of numbers on its front that come from the currency. People don’t believe you. “Oh this can very easily be faked,” writes some internet denizen. “Has all the hallmarks of a synthjob – the slight wetness around the eyes, the blur on some of the zoomed in skin pores, the folds on the newspaper. Ridiculous they think we’d believe this.”

So to truly verify yourself you must pair off with a livestreamer: someone with sufficient fame to have a real audience that, on seeing you hanging out with the celeb in real life, will enthusiastically photograph and write about the encounter for their own followers – proof by association. So that’s how you end up walking down the touristy part of Bangkok with a flavor-of-the-month e-celeb, posing for photos taken by numerous fans, all providing an expanding galaxy of hard-to-fake coincidental evidence that you are truly alive. You even forgive the celebrity for mispronouncing the name of the currency, twice, as BRANCOIN and BRAINDOLLAR, while testifying to its merits.

After a day or so the price of the currency recovers, despite conspiracy theories to the contrary that the e-celeb and their fans are fake as well. How long till then, you wonder. How long till even this isn’t enough?

Technologies that inspired this story: generative adversarial networks, synthetic text/speech/vision, social media, Vitalik Buterin (Ethereum)

Monthly Sponsor:
Amplify Partners is an early-stage venture firm that invests in technical entrepreneurs building the next generation of deep technology applications and infrastructure. Our core thesis is that the intersection of data, AI and modern infrastructure will fundamentally reshape global industry. We invest in founders from the idea stage up to, and including, early revenue.
…If you’d like to chat, send a note to