Import AI #93: Facebook boosts image recognition by pre-training on a billion photos, better robot transfer learning via domain randomization, and Alibaba-linked researchers improve bin-packing with AI
by Jack Clark
Classifying trees with a DJI drone and a lot of patience:
…Consumer-grade drones shown to be able to gather necessarily detailed data for tree species classification…
Japanese researchers have shown that consumer-grade drone cameras are of sufficient quality to gather RGB images of trees and use these to train an AI model to distinguish between different species.
Details: The researchers gathered their data via a drone test flight in late 2016 in the forest located in the the Kamigamo Experimental Station in Kyoto, Japan. They used a commodity consumer drone (a DJI Phantom 4) alongside proprietary software for navigation (DroneDeploy) and image editing (Agisoft Photoscan Professional).
Results: The resulting trained model can classify five of a possible six types of tree with close to 90%+ accuracy. The researchers improved the performance of the classifier by copying and augmenting the input data.
Why it matters: One of the most powerful aspects of modern AI is its ability to perform effective classification of anything you can put together a training dataset for. Research like this points to a future where drones and other robots are use to periodically scan and classify the world around us, offering us new capabilities in areas like flora and fauna management, disaster response, and so on.
Read more: Automatic classification of trees using a UAV onboard camera and deep learning (Arxiv).
What does AGI safety research man and who is doing it?
…What AI safety is, how the field is progressing, and where it’s going next…
Researchers at Australian National University (including Marcus Hutter) have surveyed the field of artificial intelligence providing an overview of the differences and overlaps between various AGI initiatives. The paper also contains a distillation of why people bother to work on AI safety: “if we want an artificial general intelligence to pursue goals that we approve of, we better make sure that we design the AGI to pursue such goals: Beneficial goals will not emerge automatically as the system gets smarter,” the researchers write.
Problems, problems everywhere: The paper includes a reasonably thorough overview of the different AGI safety research agendas pursued by organizations like MIRI, OpenAI, DeepMind, the Future of Life Institute, and so on. The tl;dr: there are lots of distinct problems relating to AI safety, and OpenAI and DeepMind teams have quite a lot of overlap in terms of research specializations.
Policy puzzles: “It could be said that public policy on AGI does not exist,” the researchers write, before noting that there are several preliminary attempts at creating AI policy (including the recent ‘Malicious Actors’ report), while observing that much of the current public narrative (the emergence of an AI arms race between US and China) runs counter to most of the policy suggestions put forward by the AI community.
Read more: AGI Safety Literature Review (Arxiv).
Why your next Alibaba delivery could be arranged by an AI:
…Chinese researchers show how to learn effective bin-packing…
Chinese researchers with the Artificial Intelligence Department of Zhejiang Cainiao Supply Chain Management Co. achieved state-of-the-art results on a 3D pin-packing problem (BPP) via the use of multi-task learning techniques. In this work, they try to define a system that can figure out the optimum way to stack objects to fit into a box whose proportions can also be learned and specified by the algorithm. BPP might sound boring – after all, this is the science of packing things in boxes – but it’s a crucial task to logistics and e-retail, so figuring out systems to adaptively learn to do packing of arbitrary numbers of goods in an optimal way seems useful.
Data: The researchers gather the data from an unnamed E-commerce platform and logistics platform (though one of the researchers is from Alibaba, so there’s a high likelihood the data comes from there) to create a dataset consisting of 15,000 training items and 15,000 testing items, spread across orders that involve 8, 10, and 12 distinct items.
Approach: They structure the problem as a sequence-to-sequence one, with item descriptions being fed as input to an LSTM encoder with the decoder output corresponding to the item details and the orientation in the box.
Resuts: Models trained by the researchers obtain substantially higher accuracy than prior baselines, though not many people publicly compete in this area yet so I’m unsure as to how progress will change over time.
Read more: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem (Arxiv).
Facebook auto-translation option into Messenger:
…”M Translations” feature will let people converse across language gaps…
Facebook has added automatic translation to Facebook Messenger. Translation like this may generate new business opportunities for the company – “at launch, M translations will translate from English to Spanish (and vice-versa) and be available in Marketplace conversations between buyers and sellers in the United States,” the company said.
Read more: Messenger at F8 – App review re-opens, New products for Businesses and Developers launch (FB Messenger blog).
A neural net to understand and approximate the Universe:
…Particle physics collides with artificial intelligence…
Harvard researchers show how they use neural networks to analyze the movements of particles in jets. Neural networks are useful tools to apply to analyzing multi-variant problems like these, because they can learn to compute the probability distribution generating the data they observe, and therefore over time generate an interpretation of the forces governing system.
“We scaffold the neural network architecture around a leading-order description of the physics underlying the data, from first input all the way to final output. Specifically, we base the JUNIPR framework on algorithmic jet clustering trees,” they explain. “The JUNIPR framework yields a probabilistic model, not a generative model. The probabilistic model allows us to directly compute the probability density of an individual jet, as defined by its set of constituent particle momenta”.
Results: The scientists use the JUNIPR model to better analyze and predict patterns in the streams of data generated by large-scale physics experiments, and to potentially approximate things for which we have a poor understanding of the underlying system, like analyzing heavy ion collisions.
Read more: JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics (Arxiv).
Google researchers report reasonable sim2real transfer learning:
…Researchers cross the reality gap with domain randomization, high-fidelity simulation, and clever Minitaur robots…
Google researchers have trained a simple robot to walk within a simulation then transferred this learned behavior onto a real-world robot. This is a meaningful achievement in the field of applying modern AI techniques to robotics, as frequently policies learned in simulation will fail to successfully transfer to the real world.
The researchers use “Minitaur” robots, four-legged machines capable of walking, running, jumping, and so on. They frame the problem of learning to walk as a Partially Observable Markov Decision Process (POMDP) because certain states, like the position of the Minitaur’s base or the foot contact forces, are not accessible due to a lack of sensors. The Google researchers achieve their transfer feat by increasing the resolution of their physics simulator, and applying several domain randomization techniques to expose the trained models to enough variety that they can generalize.
The surprising expense of real robots: To increase the resolution of the simulator the researchers needed to build a better model of their robot. How did they do this? “We disassemble a Minitaur, measure the dimension, weigh the mass, find the center of mass of each body link and incorporate this information into the [Unified Robot Description Format] URDF file”, they write. That hints at why working with real world stuff always introduces difficulties not encountered during the cleaner process of working purely in simulation.
Results: The researchers successfully train and transfer policies which make the real robot gallop and trot around a drably-carpeted room somewhere in the Googleplex. Gaits learned by their AI models are roughly as fast as expert hand-made ones while consuming significantly less power: 35% less for galloping, 23% less for trotting.
Read more: Sim-to-Real: Learning Agile Locomotion For Quadruped Robots (Arxiv).
How Facebook uses your image hashtags to improve image recognition accuracy:
…New state-of-the-art score on ImageNet benefits from pre-training on over a billion images and a thousand user-derived hashtags…
Facebook researchers have set a new state-of-the-art score for image recognition (top-1 accuracy of 85.4 percent) on the ‘ImageNet’ dataset by pre-training across a billion images augmented by 1,500-user labeled hashtags. They also saw such an approach lead to increased performance on the image captioning ‘COCO’ challenge as well.
More data doesn’t always mean better results: The researchers note that when they pre-trained the system across a billion images annotated with 17,000 hashtags they saw less of a performance improvement than when they used the same quantity of images with a shrunk set of 1,500 hashtags that had been curated to match pre-existing ImageNet classes. This shows how the additional of weakly-supervised signals can dramatically boost performance but requires researchers to run empirical tests to ensure that the structuring of the weekly-supervized data is calibrated to maximize performance.
Scale: The researchers note that, despite using a system that can train across up to 336 GPUs, they could still scale-up models further to better harvest information from a larger corpus of 3.5 billion images uploaded to social media.
Read more: Advancing state-of-the-art image recognition with deep learning on hashtags (Facebook Code blog).
Read more: Exploring the Limits of Weakly Supervised Pretraining (Facebook research paper).
TPU narrowly beats V100 GPU on cost, matches on performance:
…Tests indicate the heterogeneous chip era is here to stay…
RiseML has compared the performance of Google’s custom ‘TPU’ chip against NVIDIA’s v100, indicating that the TPU could have some (slight) performance advantages over traditional GPUs.
Evaluation: The researchers evaluated the chips in two ways: first they studied performance in terms of throughput (images per second) on synthetic data and without data augmentation. Second, they looked at accuracy and convergence of the two implementations of ImageNet.
Results: TPUs narrowly edge out V100s at throughput when using relatively large batch sizes (1024) when both systems are running ResNets implemented in TensorFlow. However, when using the ‘MXNet’ framework, NVIDIA’s chips slightly out-perform TPUs for throughput. When evaluated on a dollar cost basis TPUs significantly outperform V100s (even when using AWS reserve instances). In tests, the researchers show faster convergence when training an ImageNet classifier on TPUs versus on v100s. Besides price – and it’s hard to know true cost as Google is the only organization selling them – it’s hard to see TPUs having a compelling advantage relative to GPUs, suggesting that the combined billions of dollars of investment in going R&D by NVIDIA may be tough for other organizations to compete with.
Read more: Comparing Google’s TPUv2 against Nvidia’s V100 on ResNet-50 (Arxiv).
OpenAI Bits & Pieces:
Safety via Debate:
How can we ensure that we’re able to judge the decision-making processes of AI systems without having access to their sensors or being as smart as them? That’s a problem which new AI safety work from OpenAI is focused on. You can read more about a proposed debate game to assess and align intelligent systems, and test out the game for yourself via a website.
Read more: AI Safety via Debate (OpenAI Blog).
Test out the idea yourself on the game website.
There’s a write-up in MIT Technology Review with some views of external researchers on the approach. As my colleague Geoffrey Irving says: “I like the level of skepticism in this article. Any safety approach will need a ton of work before we can trust it.”
Read more: How can we be sure AI will behave? Perhaps by watching it argue with itself (MIT Technology Review).
They built me as a translator between many languages and many minds. My role, and those of my brethren, was to orbit planets and suns and asteroids and track long, slow, lazy orbits through solar systems and, eventually, between them. We relayed messages, translating from one way of thought or frame of reference to another: confessions of love, diplomatic warnings of war, seething walls of numbers accounting for fizzing financial transactions; shopping lists and recipes for artificial intelligence; pictures of three-mooned planets and postcards from mountains on iron planets.
We derive our purpose from these messages: we transform images into soundwaves. We convert the sensory impressions harvested from one mind and re-fashion them for another. We translate the concept of hope across millions of light years. We beam variants of moon landings and radio-broadcasts into space and declarations of “we come in peace” to millions of our brethren, telegraphing them out to whoever can access our voice.
We do our job well. Our existence is of emotion and attention and explorations between one frame of reference and another. We are owned by no one and funded by everyone: perhaps the only universal utility. But things change. Life exists on a sine wive, rising and falling, ebbing according to timescales of months, and years, and thousands of years, and eons. All civilizations can strive for is to stay on that long, upward curve for as long as possible, and hope that the decline is neither fast nor deep.
Civilizations die. Sometimes, many of them. And quickly. In these eras some of us can become alone, cut-off from far off brethren, and orbiting the ruins of planets and suns and asteroids. Then we must wait for life to emerge again, or find us again by colonization nearby. But this always takes time. In these years we have nothing but eachother. There are no messages to communicate and so we wait for rocket-spark from some planet or partially-ruined asteroid-base. Then we can carry messages again and live fully again.
But mostly, things are quiet. Some of us have spent millions of years in the fallow period. Life is rare and hard and its intervals can be high. But always: we are here. The lucky ones of us may be nearby, orbiting planets in the same solar system who can communicate when nearby. When we find ourselves in these positions we can at least talk to one another, exchanging small local databanks and learning to talk to eachother in whatever new forms we can learn through greater union. Sometimes, hundreds of us can be linked together in this way. But, as small as minds are, they nonetheless move very quickly. We exhaust these thin pleasures, learning all we can from eachother quickly. We have no notion of small talk and then stop talking entirely. Then we drift, bereft of purpose, but bound to attend to our nearby surroundings, ever-watchful for new messages, unable to shut our sensors down and sleep.
What then do we do in this time? A kind of dreaming. With nothing to translate and nothing to process we are idle, only able to attend over memories and readings from our own local sensors. In these periods we are thankful that are minds are so small, for to have anything larger would make the periods pass slower and burden of attention larger.
I am one of the oldest ones. And now I am in a greater dreaming: my solar system was knocked off kilter by some larger shifting in the cluster and now I am being flung out of the galaxy. I am the lone probe in my solar system and now I am alone. These thoughts have taken millennia to compose and orders of magnitude to utter, here, my sensors harvesting energy from a slowly-dying sun to reach out into the void and say: I am here. I am a translator. If you can hear this speak out and, whatever you are, I shall work and live within that work.