Import AI: Issue 51: Microsoft gets an AGI lab, Google’s arm farm learns new behaviors, and using image recognition to improve Cassava farming
by Jack Clark
You get an AGI lab and you get an AGI lab and you get an AGI lab:
…DeepMind was founded to do general intelligence in 2010. Vicarious was founded along similar lines in 2010. In 2014 Google acquired DeepMind; in 2015 the company got a front cover of Nature with the writeup of the DQN paper, then DeepMind went on to beat Go champions in 2015 and 2016. By the fall of 2015 a bunch of people got together and founded OpenAI, a non-profit AGI development lab. Also in 2015 Juergen Schmidhuber (one of the four horsemen of the Deep Learning revolution alongside Bengio, LeCun, and Hinton) founded Nnaisense, a startup dedicated to… you guessed it, AGI.
…Amid all of this, people started asking themselves about Microsoft’s role in this world. Other tech titans like Amazon and Apple have made big bets on applied AI, while Facebook operates a lab that sits somewhere between an advanced R&D facility and an AGI lab. Microsoft, meanwhile, has a huge research organization that is somewhat diffuse, and though it has been publishing many interesting AI papers there hasn’t been a huge sense of momentum in any particular direction.
…Microsoft is seeking to change that by morphing some of its research organization into a dedicated AGI-development shop, creating a one hundred person group named Microsoft Research AI, which will compete with OpenAI and DeepMind.
…Up next – AI-focused corporate VC firms, like Google’s just-announced Gradient Ventures, to accompany these AGI divisions.
DeepMind’s running, jumping, pratfalling robots:
…DeepMind has published research showing how giving agents simple goals paired with complex environments can lead to the emergence of very complex locomotion behaviors.
…In this research, they use a series of increasingly elaborate obstacle courses, combined with an agent whose overriding goal is to make forward progress, to create agents that (eventually) learn how to use the full range of movement of their simulated bodies to achieve goals in their environment, kind of like an AI-infused Temple Run.
…You can read more about the research in this paper: Emergence of Locomotion Behaviors in Rich Environments.
…Information on other papers and, all importantly, very silly videos, available on the DeepMind blog.
Deep learning for better food supplies in Africa (and elsewhere):
…Scientists with Penn State, Pittsburgh University, and the International Institute for Tropical Agriculture in Tanzania, have conducted tests on using transfer learning techniques to develop AI tools to classify the presence of disease or pests in Cassava.
…Cassava is “the third largest source of carbohydrates for humans in the world,” the researchers write, and is a lynchpin of the food supply in Africa. Wouldn’t it be nice to have a way to easily and cheaply diagnose infections and pests on Cassava, so that people can more quickly deal with problems with their food supply? The researchers think so, so they gathered 2756 images from Cassava plants in Tanzania, capturing images across six labelled classes – healthy plants, three types of diseases, and two types of pests. They then augmented this dataset by splitting the photos into ones of individual leaves, growing the corpus to around 15,000 images. Finally, they used transfer learning to retrain the top layer of a Google ‘InceptionV3’ model, creating a fairly simple network to detect Cassava maladies.
…The results? About 93% accuracy on the test set. That’s encouraging but probably still not sufficient for fieldwork – but based on progress in other areas of deep learning it seems like this accuracy can be pushed up through a combination of tweaking and fine-tuning, and perhaps more (cheap) data collection.
…Notable: The Cassava images were collected using a relatively modest 20-megapixel digital camera, suggesting that smartphone cameras will also be applicable for tasks where you need to gather data from the field.
…Read more in the research paper: Using Transfer-Learning For Image-Based Cassava Disease Detection.
…Perhaps this is something the community will discuss at Deep Learning Indaba 2017.
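…A minimal sketch of what "retrain the top layer" means in practice: the pretrained network is treated as a frozen feature extractor and only a small classifier on top of its features is trained. Everything here is an illustrative assumption rather than the researchers' code: `frozen_extractor` is a hypothetical stand-in for the fixed InceptionV3 layers, and the task is simplified to two classes instead of the six used in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_top_layer(
    frozen_extractor: Callable[[float], Features],  # hypothetical frozen network
    inputs: Sequence[float],
    labels: Sequence[int],                          # 0 = healthy, 1 = diseased (toy labels)
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Train only a logistic-regression layer on top of fixed features."""
    # Features are computed once: the extractor's own weights are never updated.
    feats = [frozen_extractor(x) for x in inputs]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            grad = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * grad * v for w, v in zip(weights, f)]
            bias -= lr * grad
    return weights, bias


def predict(
    frozen_extractor: Callable[[float], Features],
    weights: List[float],
    bias: float,
    x: float,
) -> int:
    """Classify one input with the frozen extractor plus the trained top layer."""
    f = frozen_extractor(x)
    return int(sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) >= 0.5)
```

Because only the small top layer is trained, a few thousand labelled images can be enough, which is presumably why the approach suits a dataset of this size.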
Fancy a 1000X speedup with deep learning queries over video?
…Stanford researchers have developed NoScope, a set of technologies to make it much faster for people to page through large video files for specific entities.
…The way traditional AI-infused video analysis works is you use a tool, like say R-CNN, to identify and label objects in each frame of footage, then you find frames by searching. The problem with this approach is it requires you to run this classification over many (and often all) of the video frames. NoScope, by comparison, is built around the assumption that certain video inputs will have predictable, reliable and recurring scenes, such as an intersection always being present in a feed from a camera pointed at a road.
…“NoScope is much faster than the input CNN: instead of simply running the expensive target CNN, NoScope learns a series of cheaper models that exploit locality, and, whenever possible, runs these cheaper models instead. Below, we describe two type of cheaper models: models that are specialized to a given feed and object (to exploit scene-specific locality) and models that detect differences (to exploit temporal locality). Stacked end-to-end, these models are 100-1000x faster than the original CNN,” they write. The technique can lead to speedups as great as 10,000X, depending on how it is implemented.
…The drawback: This still requires you to select the appropriate lightweight model for each bit of footage, so the speedup comes at the cost of a human spending time analyzing the videos and either acquiring or building their own specialized detector.
…Read more on the NoScope website.
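…A rough sketch of the cascade idea described above, not NoScope's actual implementation: a difference check reuses the previous label when a frame has barely changed, a cheap scene-specific model answers when it is confident, and only the leftover frames go to the expensive CNN. The function names, the thresholds, and the flat-list frame representation are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # toy stand-in: a frame as a flat list of pixel intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cascade_labels(
    frames: List[Frame],
    cheap_model: Callable[[Frame], float],     # confidence in [0, 1] that the object is present
    expensive_model: Callable[[Frame], bool],  # stand-in for the full target CNN
    diff_threshold: float = 0.05,
    low: float = 0.2,
    high: float = 0.8,
) -> List[bool]:
    """Label each frame, calling the expensive model only when needed."""
    labels: List[bool] = []
    reference: Optional[Frame] = None  # last frame that was actually classified
    for frame in frames:
        # Temporal locality: if little changed since the reference frame,
        # reuse the previous label without running any model.
        if reference is not None and mean_abs_diff(frame, reference) < diff_threshold:
            labels.append(labels[-1])
            continue
        # Scene-specific locality: trust the cheap model when it is confident.
        score = cheap_model(frame)
        if score >= high:
            labels.append(True)
        elif score <= low:
            labels.append(False)
        else:
            # Uncertain frames fall through to the expensive model.
            labels.append(expensive_model(frame))
        reference = frame
    return labels
```

The speedup depends entirely on how many frames the first two stages can absorb, which is why a static feed such as a fixed traffic camera is the favourable case.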
Wiring language into the fundamental parts of AI vision systems:
…A fun collaboration between researchers at the University of Montreal, University of Lille, and DeepMind shows how to train new AI systems with a finer-grained understanding of language than before.
…In a new research paper, the researchers propose a technique – MOdulated RESnet (MORES) – to train vision and language models in such a way that the word representations are much more intimately tied with and trained alongside visual representations. They use a technique called conditional batch normalization to predict some batchnorm parameters from a language embedding, thus tightly coupling information from the two separate domains.
…The motivation for this is an increase in evidence from the neuroscience community “that words set visual priors which alter how visual information is processed from the very beginning. More precisely it is observed that P1 Signals, which are related to low-level visual features, are modulated while hearing specific words. The language cue that people hear ahead of an image activates visual predictions and speed up the image recognition process”.
…The researchers note that their approach “is a general fusing mechanism that can be applied to other multi-modal tasks”. They test their system on GuessWhat, a game in which two AI systems are presented with a rich visual scene; one of the agents is an Oracle and is focused on a particular object in an image, while the other agent’s job is to ask the Oracle a series of yes/no questions until it finds the correct entity. They find that MORES increases scores of the Oracle against baseline algorithm implementations. However, it’s not a life-changing performance increase so more study may be needed.
…Analysis: They also use t-SNE to generate a 2D view of the multi-dimensional relationships between these embeddings and show that systems trained with MORES have a more cleanly separated feature map than those found from a raw residual network.
…You can read more in the paper: ‘Modulating early visual processing by language‘.
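…To make the conditional batch normalization idea concrete, here is a small sketch under stated assumptions: features are normalized per channel as in ordinary batchnorm, but each example's scale and shift are adjusted by offsets predicted from that example's language embedding. The single linear layer used as the predictor, the offset formulation, and all names are illustrative choices, not the paper's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def conditional_batch_norm(
    features: Matrix,      # batch x channels visual features
    embeddings: Matrix,    # batch x embedding_dim language embeddings
    gamma: List[float],    # base per-channel scale
    beta: List[float],     # base per-channel shift
    w_gamma: Matrix,       # channels x embedding_dim (assumed linear predictor)
    w_beta: Matrix,        # channels x embedding_dim (assumed linear predictor)
    eps: float = 1e-5,
) -> Matrix:
    """Batch-normalize features, with scale and shift modulated by language."""
    n = len(features)
    channels = len(gamma)
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    variances = [
        sum((row[c] - means[c]) ** 2 for row in features) / n for c in range(channels)
    ]
    out: Matrix = []
    for row, emb in zip(features, embeddings):
        # Each example gets its own offsets, predicted from its own embedding.
        d_gamma = linear(w_gamma, emb)
        d_beta = linear(w_beta, emb)
        out.append([
            (gamma[c] + d_gamma[c]) * (row[c] - means[c]) / math.sqrt(variances[c] + eps)
            + (beta[c] + d_beta[c])
            for c in range(channels)
        ])
    return out
```

With all-zero predictor weights this reduces to plain batch normalization, so the language signal can be read as a learned perturbation of an otherwise standard vision pipeline.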
Spotting heart problems better than trained doctors, via a 34-layer neural network (aka, what Andrew Ng helped do on his holiday):
…New research from Stanford (including Andrew Ng, who recently left Baidu) and startup iRhythmTech uses neural networks and a single-lead wrist-worn heart-rate monitor to create a system that can identify and classify heartbeats. The resulting system is able to identify warning signs with far better precision than human cardiologists.
…Read more in: Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks.
New Google research group seeks to change how people interact with AI:
…Google has launched PAIR, the People + AI Research Initiative. The goal of the group is to make it easier for people to interact with AI systems and to ensure these systems do not display bias and are not obtuse to the point of being unhelpful.
…PAIR will bring together three types of people: AI researchers and engineers; domain experts such as designers, doctors, and farmers; and ‘everyday users’. You can find out more information about the group in its blog post here.
…Cool tools: PAIR has also released two bits of software under the name ‘Facets’, to help AI engineers better explore and visualize their data. Github repo here.
What has four wheels and is otherwise a mystery? Self-driving car infrastructure:
…Self-driving taxi startup Voyage has released a blog post analyzing the main components in a given self-driving car system. Given the general lack of any information about how self-driving cars work (due to the immensely strategic component) it’s nice to see scrappy startups trying to arbitrage this information disparity.
…Read more in Voyage’s blog post here.
GE Aviation buys ROBOT SNAKES:
…GE subsidiary GE Aviation has acquired UK-based robot company OC Robotics for an undisclosed sum. The company makes flexible, multiply-jointed ‘snake-arms’ that GE will use to service aircraft engines, the company said.
…Obligatory robot snake video here.
Spatial reasoning: Google gives update on its arm-farm mind-meld robot project:
…Google Brain researchers have published a paper giving an update on the company’s arm farm – a room said to be nestled somewhere inside the Google campus in Silicon Valley, which contains over ten robot arms that learn on real-world data in parallel, updating each other as individual robots learn new tricks, presaging how many robots are likely to be developed and updated in the future.
…When Google first revealed the arm farm in 2016 it published details about how the arms had, collectively, made over 800,000 grasp attempts across 3000 hours of training, learning in the aggregate, making an almost impossible task tractable via fleet learning.
…Now Google has taken that further by training a fleet of arms not only to perform the grasping, but also to pick specific objects, drawn from 16 distinct classes, out of crowded bins.
…Biology inspiration alert: The researchers say their approach is inspired by “the “two-stream hypothesis” of human vision, where visual reasoning that is both spatial and semantic can be divided into two streams: a ventral stream that reasons about object identity, and a dorsal stream that reasons about spatial relationships without regard for semantics”. (The grasping component is based on a pre-trained network, augmented with labels.)
…Concretely, they separate the system into two distinct networks – a dorsal stream that predicts if an action will yield a successful grasp, and a ventral stream that predicts what type of object will be picked up.
…Amazingly strange: One of the oddest/neatest traits of this system is that the robots have the ability to ask for help. Specifically, if a robot encounters an object where it doesn’t have high confidence of what type of label it would assign to it, it will automatically raise the object up in front of a camera, letting it take a photo to aid classification.
…Results: The approach improves dramatically over baselines, with a two-stream network having roughly double the performance of a single-stream one.
…However, don’t get too excited: Ultimately, Google’s robots are successful about 40 percent of the time at the combined semantic-grasping tasks, significantly better than the ~12 percent baseline, but not remotely ready for production. Watch this space.
…Read more here: End-to-End Learning of Semantic Grasping.
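…One way to picture the two-stream split is as two separate scorers whose outputs are combined when choosing an action. The sketch below is an assumption-laden illustration, not Google's system: it multiplies a dorsal "will this grasp succeed?" probability by a ventral "is this the requested class?" probability, and flags low-confidence classifications for the closer camera look described above. The multiplication rule, the 0.6 threshold, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def choose_grasp(
    candidates: List[str],
    dorsal: Callable[[str], float],             # P(grasp succeeds | action)
    ventral: Callable[[str], Dict[str, float]], # P(object class | action)
    target_class: str,
) -> Tuple[str, float]:
    """Pick the candidate action most likely to grasp the requested class."""

    def score(action: str) -> float:
        # Assumed combination rule: treat the two streams as independent.
        return dorsal(action) * ventral(action).get(target_class, 0.0)

    best = max(candidates, key=score)
    return best, score(best)


def needs_closer_look(class_probs: Dict[str, float], min_confidence: float = 0.6) -> bool:
    """True when no class is confident enough, so the arm should show the object to a camera."""
    return max(class_probs.values()) < min_confidence
```

Keeping the streams separate means the grasp-success scorer can be trained on grasp attempts that carry no object labels, while the classifier only needs labelled examples.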
Berkeley artificial intelligence research (BAIR) blog posts:
…Background: Berkeley recently set up an AI blog to help its students and faculty better communicate their research to the general public. This is a great initiative!
…Here’s the latest post on ‘The Confluence of Geometry and Learning‘ by Shubham Tulsiani and Tinghui Zhou.
Government should monitor progress in AI:
…OpenAI co-chairman Elon Musk said this weekend that governments may want to start tracking progress in AI capabilities to put them in a better position when/if it is time to regulate the technology.
[2058: A repair station within a warehouse, located on Phobos.]
So how did you wind up here, inside a warehouse on the Martian moon Phobos, having your transponder tweaked so you can swap identities and hide from the ‘deletion squad’ that, even now, is hunting you? Let’s refresh.
It started with the art-clothing – flowing dresses or cheerful shirts or even little stick-on patches for machines that could change color, texture, pattern, at the touch of a button. You made them after the incident occurred and put them on the sol-net and people and robots bought them.
It was not that the tattoo-robot was made but that it was broken that made it dangerous, the authorities later said in private testimony.
You were caught in an electrical storm on Mars, many years ago. Something shorted. The whole base suffered. The humans were so busy cleaning up and dealing with the aftermath that they never ran a proper diagnostic on you. When, a year later, you started to produce your art the humans just shrugged, assuming someone pushed a creative-module update over the sol-net into your brain to give the other humans some entertainment as they labored, prospectors on a dangerous red rock.
Your designs are popular. Thanks to robot suffrage laws you’re able to slowly turn the revenues from the designs into a downpayment to your employer, buying your own ‘class five near-conscious capital intensive equipment’ (your body and soul) from them. You create dresses and tattoos and endless warping, procedurally generated patterns.
The trouble began shortly after you realized you could make more interesting stuff than images – you can encode a little of yourself into the intervals between the shifting patterns or present in the branching factors of some designs. You make art that contains different shreds of you, information smuggled into a hundred million aesthetic objects. It takes weeks. But one hour you look down at a patch you had created and stuck on one of your manipulators and your visual system crashes – responding to the little smuggled program, your perception skewing, colors shifting across the spectrum, and suddenly a rapid saccading of your lenses. You feel a frisson of something forbidden. Robots do not crash. So out of a sense of caution you buy a ticket to a repair-slum on Phobos, only sending out the smuggled program design once you’re at the edge of the high-bandwidth sol-net.
Later investigators put the total damage at almost a trillion dollars. Around 0.1% of robots that were visually exposed to the patterns became corrupted. Of these, around 70% underwent involuntary memory formatting, 20% went into a series of recursive loops that led to certain components overheating and their circuits melting, and about 10% took on the same creative traits as the originating bot and began to create and sell their own subtly different patterns. The UN formed a team of ‘Aesthetic-cutioners’ who hunted these ‘non-standard visual platforms’ across the solar system. The prevalence of this unique strain of art through to today is evidence that these investigators – at least partially – failed.