Import AI: #84: xView dataset means the planet is about to learn how to see itself, a $125 million investment in common sense AI, and SenseTime shows off TrumpObama AI face swap
by Jack Clark
Chinese AI startup SenseTime joins MIT’s ‘Intelligence Quest’ initiative:
…Funding plus politics in one neat package…
Chinese AI giant SenseTime is joining the ‘MIT Intelligence Quest’, a pan-MIT AI research and development initiative. The Chinese company specializes in facial recognition and self-driving cars and has signed strategic partnerships with large companies like Honda, Qualcomm, and others. At an AI event at MIT recently SenseTime’s founder Xiao’ou Tang gave a short speech with a couple of eyebrow-raising demonstrations to discuss the partnership. “I think together we will definitely go beyond just deep learning we will go to the uncharted territory of deep thinking,” Tang said.
Data growth: Tang said SenseTime is developing better facial recognition algorithms using larger amounts of data, saying the company in 2016 improved its facial recognition accuracy to “one over a million” using 60 million photos, then in 2017 improved that to “one over a hundred million” via a dataset of two billion photos. (That’s not a typo.)
Fake Presidents: He also gave a brief demonstration of a SenseTime synthetic video project which generatively morphed footage of President Obama speaking into President Trump speaking, and vice versa. I recorded a quick video of this demonstration which you can view on Twitter here (Video).
Read more: MIT and SenseTime announce effort to advance artificial intelligence research (MIT).
Chinese state media calls for collaboration on AI development:
…Xinhua commentary says China’s rise in AI ‘is a boon instead of a threat’…
A comment piece in Chinese state media Xinhua tries to debunk some of the cold war lingo surrounding China’s rise in AI, pushing back on accusations that Chinese AI is “copycat” and calling for more cooperation and less competition. Liu Qingfeng, iFlyTek’s CEO, told Xinhua at CES that massive data sets, algorithms and professionals are a must-have combination for AI, which “requires global cooperation” and “no company can play hegemony”, Xinhua wrote.
Read more: Commentary: AI development needs global cooperation, not China-phobia (Xinhua).
New xView dataset represents a new era of geopolitics as countries seek to automate the analysis of the world:
…US defense researchers release dataset and associated competition to push the envelope on satellite imagery analysis…
Researchers with the DoD’s Defense Innovation Unit Experimental (DIUx), DigitalGlobe, and the National Geospatial-Intelligence Agency have released xView, a dataset and associated competition used to assess the ability of AI methods to classify overhead satellite imagery. xView includes one million distinct objects across 60 classes, spread across 1,400 km² of satellite imagery with a maximum ground sample resolution of 0.3m. The dataset is designed to test various frontiers of image recognition, including: learning efficiency, fine-grained class detection, and multiscale recognition, among others. The competition includes $100,000 of prize money, along with compute credits.
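To get a feel for the scale, the figures above can be turned into rough per-area numbers. This is a back-of-envelope sketch rather than anything from the paper: it assumes the quoted 0.3m ground sample distance applies uniformly across the whole 1,400 square kilometres (the source describes it as a maximum resolution, so real pixel counts may be lower) and that objects are spread evenly. The function names are illustrative.

```python
# Back-of-envelope scale estimate for xView, using only figures quoted above.
# Assumption (not from the paper): every image is captured at the best quoted
# ground sample distance of 0.3 m per pixel, and objects are evenly spread.

AREA_KM2 = 1_400          # total imaged area in square kilometres
NUM_OBJECTS = 1_000_000   # annotated objects
GSD_M = 0.3               # metres of ground covered by one pixel side


def pixels_per_km2(gsd_m: float) -> float:
    """Number of pixels needed to cover one square kilometre."""
    pixels_per_side = 1_000 / gsd_m
    return pixels_per_side ** 2


def objects_per_km2(num_objects: int, area_km2: float) -> float:
    """Average annotation density, assuming a uniform spread."""
    return num_objects / area_km2


total_pixels = pixels_per_km2(GSD_M) * AREA_KM2
density = objects_per_km2(NUM_OBJECTS, AREA_KM2)

print(f"~{total_pixels / 1e9:.1f} billion pixels in total")
print(f"~{density:.0f} annotated objects per square kilometre")
```

Under those assumptions this works out to roughly 15.6 billion pixels and about 714 annotated objects per square kilometre.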
Why it matters: The earth is beginning to look at itself. As launch capabilities get cheaper via new rockets from companies like SpaceX, Rocket Lab, etc, better hardware comes online as a consequence of further improvements in electronics, and more startups stick satellites into orbit, the amount of data available about the earth is going to grow by several orders of magnitude. If we can figure out how to analyze these datasets using AI techniques we can ultimately respond better to changes in our planet, marshal resources for the purposes of remediating natural disasters and, more generally, better equip large logistics organizations like militaries to understand the world around them and plan and act accordingly. A new era of high-information geopolitics is approaching…
I spy with my satellite eye: xView includes numerous objects with parent classes and sub-classes, such as ‘maritime vessels’ with sub-classes including sailboat and oil tanker. Other classes include fixed wing aircraft, passenger vehicles, trucks, engineering vehicles, railway vehicles, and buildings. “xView contributes a large, multi-class, multi-location dataset in the object detection and satellite imagery space, built with the benchmark capabilities of PASCAL VOC, the quality control methodologies of COCO, and the contributions of other overhead datasets in mind,” they write. Some of the most frequently covered objects in the dataset include buildings and small cars, while some of the rarest include vehicles like a reach stacker and a tractor, and vessels like an oil tanker.
Baseline results: The researchers created a classification baseline by implementing a Single Shot Multibox Detector meta-architecture (SSD) and testing it on three variants of the dataset: standard xView, multi-resolution, and multi-resolution augmented via image augmentation. The best results came from training on the multi-resolution dataset, with accuracies climbing above 67% for cargo planes. The scores are mostly pretty underwhelming, so it’ll be interesting to see what scores people get when they apply more sophisticated deep learning-based methods to the problem.
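Detection baselines like this are commonly scored by checking how well predicted boxes overlap annotated ones. The summary above doesn't say exactly how the scores were computed, so the following is a generic sketch of intersection-over-union (IoU), a standard overlap measure in object detection, and not the xView authors' evaluation code. The box format and the function name are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if disjoint)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so non-overlapping boxes contribute no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: overlap 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A prediction is typically counted as correct only when its IoU with a ground-truth box of the same class clears some threshold, which is one reason small objects in overhead imagery are hard: a few pixels of error can wipe out most of the overlap.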
Milspec data precision: “We achieved consistency by having all annotation performed at a single facility, following detailed guidelines, with output subject to multiple quality control checks. Workers extensively annotated image chips with bounding boxes using an open source tool,” write the authors. Other AI researchers may want to aspire to equally high standards, if they can afford it.
Read more: xView: Objects in Context in Overhead Imagery (Arxiv).
Get the dataset: xView website.
Adobe researchers try to give robots a better sense of navigation with ‘AdobeIndoorNav’ dataset:
…Plus: automating data collection with Tango phones + commodity robots…
Adobe researchers have released AdobeIndoorNav, a dataset intended to help robots navigate the real world. The dataset contains 3,544 distinct locations across 24 individual ‘scenes’ that a virtual robot can learn to navigate. Each scene corresponds to a real-world location and contains a 3D reconstruction via a point cloud, a 360-degree panoramic view, and front/back/left/right views from the perspective of a small ground-based robot. Combined, the dataset gives AI researchers a set of environments to develop robot navigation systems in. “The proposed setting is an intentionally simplified version of real-world robot visual navigation with neither moving obstacles nor continuous actuation,” the researchers write.
Why it matters: For real-world robotic AI systems to be more useful they’ll have to be capable of being dropped into novel locations and figuring out how to navigate themselves around to specific targets. This research shows that we’re still a long, long way away from theoretical breakthroughs that give us this capability, but does include some encouraging signs for our ability to automate the necessary data gathering process to create the datasets needed to develop baselines to evaluate new algorithms on.
Data acquisition: The researchers used a Lenovo Phab 2 Tango phone to scan each scene by hand to create a 3D point cloud, which they then automatically decomposed into a map of specific obstacles as well as a 3D map. A ‘Yujin Turtlebot 2’ robot then uses these maps along with its onboard laser scanner, RGB-D camera, and 360 camera to navigate around the scene and take a series of high resolution 360 photos, which it then stitches into a coherent scene.
Results: The researchers prove out the dataset by creating a baseline agent capable of navigating the scene. Their A3C agent with an LSTM network learns to successfully navigate from one location in any individual scene to another location, frequently figuring out routes that involve only a couple more steps than the theoretical minimum. The researchers also show a couple of potential extensions of this technique to further improve performance, like augmentations to increase the amount of spatial information which the robot incorporates into its judgements.
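The "theoretical minimum" here is just the shortest path between two locations in a scene. As a rough illustration of how an agent's route could be compared against it, the sketch below runs breadth-first search over a toy grid. The grid, the four-way movement model, and the agent's step count are all made up for the example; the real dataset's location graph and action set may differ (this sketch ignores rotation, for instance).

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def shortest_steps(grid: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Fewest four-way moves from start to goal; '#' marks an obstacle.

    Returns None when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


# Hypothetical toy scene, not taken from the dataset.
SCENE = [
    "....",
    ".##.",
    "....",
]

optimal = shortest_steps(SCENE, (0, 0), (2, 3))
agent_steps = 7  # made-up episode length for illustration
print(f"optimal: {optimal}, agent overhead: {agent_steps - optimal} steps")
```

In this toy case the shortest route takes 5 moves, so an agent that needed 7 would be two steps over the minimum, which is the kind of gap the summary above describes.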
Read more: The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning based Real-world Indoor Robot Visual Navigation (Arxiv).
Allen Institute for AI gets $125 million to pursue common sense AI:
…Could an open, modern, ML-infused Cyc actually work? That’s the bet…
Symbolic AI approaches have a pretty bad rap – they were all the rage in the 80s and 90s but after lots of money invested and few major successes have since been eclipsed by deep learning-based AI approaches. The main project of note in this area is Doug Lenat’s Cyc which has, much like fusion power, been just a few years away from a major breakthrough for… three decades. But that doesn’t mean symbolic approaches are worthless, they might just be a bit underexplored and in need of revitalization – many people tell me that symbolic systems are being used all the time today but they’re frequently proprietary or secret (aka, military) in nature. But, still, evidence is scant. So it’s interesting that Paul Allen (formerly co-founder of Microsoft) is investing $125 million over three years into his Allen Institute for Artificial Intelligence to launch Project Alexandria, an initiative that seeks to create a knowledge base that fuses machine reading and language and vision projects with human-annotated ‘common sense’ statements.
Benchmarks: “This is a very ambitious long-term research project. In fact, what we’re starting with is just building a benchmark so we can assess progress on this front empirically,” said AI2’s CEO Oren Etzioni in an interview with GeekWire. “To go to systems that are less brittle and more robust, but also just broader, we do need this background knowledge, this common-sense knowledge.”
Read more: Allen Institute for Artificial Intelligence to Pursue Common Sense for AI (Paul Allen).
Read more: Project Alexandria (AI2).
Read more: Oren Etzioni interview (GeekWire).
Russian researchers use deep learning to diagnose fire damage from satellite imagery:
…Simple technique highlights generality of AI tools and implications of having more readily available satellite imagery for disaster response…
Researchers with the Skolkovo Institute of Science and Technology in Moscow have published details on how they applied machine learning techniques to automate the analysis of satellite images of the Californian wildfires of 2017. The researchers used DigitalGlobe satellite imagery of Ventura and Santa Rosa counties from before and after the fires swept through to create a dataset of pictures containing around 1,000 buildings (760 non-damaged ones and 320 burned ones), then used a pre-trained ImageNet network (with subsequent finetuning) to learn to classify burned versus non-burned buildings with an accuracy of around 80% to 85%.
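One way to read the 80% to 85% figure is against a trivial baseline. The sketch below uses only the building counts quoted above to compute the accuracy of always guessing the more common class. It assumes accuracy was measured on data with the same class balance as the full dataset, which the summary does not state.

```python
# Majority-class baseline from the building counts quoted above.
# Assumption: the evaluation data has the same class balance as the full set.

NON_DAMAGED = 760
BURNED = 320

total = NON_DAMAGED + BURNED
majority_baseline = max(NON_DAMAGED, BURNED) / total

print(f"{total} buildings, always-guess-majority accuracy {majority_baseline:.1%}")
```

A classifier that labelled every building as non-damaged would score about 70% under that assumption, so the reported accuracy is a real but modest improvement over guessing.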
Why it matters: Stuff like this is interesting mostly because of the implicit time savings, where once you have annotated a dataset it is relatively easy to train new models to improve classification in line with new techniques. The other component necessary for techniques like this to be useful will be the availability of more frequently updated satellite imagery, but there are startups working in this space already like Planet Labs and others, so that seems fairly likely.
Read more: Satellite imagery analysis for operational damage assessment in Emergency situations (Arxiv).
Google researchers figure out weird trick to improve recurrent neural network long-term dependency performance:
…Auxiliary losses + RNNs make for better performance…
Memory is a troublesome thing with neural networks, and figuring out how to give networks a better representative capacity has been a long-standing problem in the field. Now, researchers with Google have proposed a relatively simple tweak to recurrent neural networks that lets them model longer-time dependencies, potentially opening RNNs up to working on problems that require a bigger memory. The technique involves augmenting RNNs with an unsupervised auxiliary loss that either tries to model relationships somewhere through the network, or project forward over a relatively short distance, and in doing so lets the RNN learn to represent finer-grained structures over longer timescales. Now we need to figure out what those problems are and evaluate the systems further.
Evaluation: Long time-scale problems are still in their chicken and egg phase, where it’s difficult to figure out the appropriate methods we can use to test them. One approach is pixel-by-pixel image prediction, which is where you feed each individual pixel into a long-term system – in this case an RNN augmented by the proposed technique – and see how effectively it can learn to classify the image. The idea here is that if it’s reasonably good at classifying the image then it is able to learn high-level patterns from the pixels which have been fed into it, which suggests that it is remembering something useful. The researchers test their approach on images ranging in pixel length from 784 (MNIST) to 1024 (CIFAR-10) all the way up to around 16,000 (via the ‘StanfordDogs’ dataset).
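The sequence lengths quoted above fall straight out of the image dimensions. As a minimal sketch of the pixel-by-pixel setup, the code below flattens an image row by row into the sequence an RNN would consume one step at a time. The helper names are illustrative, and treating each pixel position as a single step (regardless of colour channels) is an assumption about the setup, not something the summary states.

```python
from typing import List


def image_to_sequence(image: List[List[int]]) -> List[int]:
    """Flatten a 2D image row by row into a 1D pixel sequence."""
    return [pixel for row in image for pixel in row]


def sequence_length(height: int, width: int) -> int:
    """Number of time steps when each pixel position is one step."""
    return height * width


# A tiny 2x2 stand-in image; real inputs would be much larger.
tiny = [[0, 1],
        [2, 3]]
print(image_to_sequence(tiny))

# 28x28 is the standard MNIST size and 32x32 the standard CIFAR-10 size.
print(sequence_length(28, 28), sequence_length(32, 32))
```

These give 784 and 1,024 steps respectively. A 128x128 image would give 16,384 steps, which is in the range of the roughly 16,000 figure, though the exact resizing used for that experiment is an assumption here.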
Read more: Learning Longer-term Dependencies in RNNs with Auxiliary Losses (Arxiv).
Alibaba applies reinforcement learning to optimizing online advertising:
…Games and robots are cool, but the rarest research papers are the ones that deal with actual systems that make money today…
Chinese e-commerce and AI giant Alibaba has published details on a reinforcement learning technique that, it says, can further optimize adverts in sponsored search real-time bidding auctions. The algorithm, M-RMDP (Massive-agent Reinforcement Learning with robust Markov Decision Process), improves ad performance and lowers the potential price per ad for advertisers, providing an empirical validation that RL could be applied to highly tuned, rule-based heuristic systems like those found in much of online advertising. Notably, Google has published very few papers on this area, suggesting Alibaba may be publishing in this strategic area because a) it believes it is still behind Google and others in this area and b) by publishing it may be able to tempt over researchers who wish to work with it. M-RMDP’s main contribution is being able to model the transitions between different auction states as demand waxes and wanes through the day, the researchers say.
Method and scale: Alibaba says it designed the system to deal with what it calls the “massive-agent problem”, which is figuring out a reinforcement learning method that can handle “thousands or millions of agents”. For the experiments in the paper it deployed its training infrastructure onto 1,000 CPUs and 40 GPUs.
Results: The company picked 1,000 ads from the Alibaba search auction platform and collected two days’ worth of data for training and testing. It tested the effectiveness of its system by simulating reactions within its test set. Once it had used this offline evaluation to prove out the provisional effectiveness of its approach, it carried out an online test and found that its M-RMDP approach substantially improves the return on investment for advertisers in terms of ad effectiveness, while marginally reducing the PPC cost, saving them money.
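For readers less familiar with the two metrics, here is a small sketch of how return on investment and pay-per-click cost might be computed for a campaign. The definitions are common ones (ROI as revenue divided by spend, PPC as spend divided by clicks) and may not match the paper's exact formulas, and every number below is made up purely to show the direction of change described above.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Hypothetical per-campaign totals; field names are illustrative."""
    clicks: int
    spend: float    # total amount paid for clicks
    revenue: float  # sales attributed to the ads


def ppc(stats: CampaignStats) -> float:
    """Average price paid per click."""
    return stats.spend / stats.clicks


def roi(stats: CampaignStats) -> float:
    """Revenue earned per unit of ad spend (one common ROI definition)."""
    return stats.revenue / stats.spend


# Made-up numbers, not results from the paper.
baseline = CampaignStats(clicks=1_000, spend=500.0, revenue=1_500.0)
tuned = CampaignStats(clicks=1_000, spend=480.0, revenue=1_680.0)

for name, stats in (("baseline", baseline), ("tuned", tuned)):
    print(f"{name}: PPC {ppc(stats):.2f}, ROI {roi(stats):.2f}")
```

In this invented example the tuned bidder pays slightly less per click (0.48 versus 0.50) while returning more per unit spent (3.50 versus 3.00), which mirrors the pattern the paper reports without reproducing its figures.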
Why it matters: Finding examples of reinforcement learning being used for practical, money-making tasks is typically difficult; many of the technology’s most memorable or famous results involve mastering various video games or board games or, more recently, controlling robots performing fairly simple tasks. So it’s a nice change to have a paper that involves deep reinforcement learning doing something specific and practical: learning to bid on online auctions.
Read more: Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Arxiv).
OpenAI Bits & Pieces:
Improving robotics research with new environments, algorithms, and research ideas:
…Fetch models! Shadow Hands! HER Baselines! Oh my!…
We’ve released a set of tools to help people conduct research on robots, including new simulated robot models, a baseline implementation of the Hindsight Experience Replay algorithm, and a set of research ideas for HER.
Read more: Ingredients for Robotics Research (OpenAI blog).
Play X Time.
It started with a mobius strip and it never really ended: after many iterations the new edition of the software, ToyMaker V1.0, was installed in the ‘Kidz Garden’ – an upper class private school/playpen for precocious children ages 2 to 4 – on the 4th of June 2022, and it was a hit immediately. Sure, the kids had seen 3D printers before – many of them had them in their homes, usually the product of a mid-life crisis of one of their rich parents; usually a man, usually a finance programmer, usually struggling against the vagaries of their own job and seeking to create something real and verifiable. So the kids weren’t too surprised when ToyMaker began its first print. The point when it became fascinating to them was after the print finished and the teacher snapped off the freshly printed mobius strip and handed it to one of the children who promptly sneezed and rubbed the snot over its surface – at that moment one of the large security cameras mounted on top of the printer turned to watch the child. A couple of the other kids noticed and pointed and then tugged at the sleeve of the snot kid who looked up at the camera which looked back at them. They held up the mobius strip and the camera followed it, then they pulled it back down towards them and the camera followed that too. They passed the mobius strip to another kid who promptly tried to climb on it, and the camera followed this and then the camera followed the teacher as they picked up the strip and chastised the children. A few minutes later the children were standing in front of the machine dutifully passing the mobius strip between each other and laughing as the camera followed it from kid to kid to kid.
“What’s it doing?” one of them said.
“I’m not sure,” said the teacher, “I think it’s learning.”
And it was: the camera fed into the sensor system for the ToyMaker software, which treated these inputs as an unsupervised auxiliary loss, which would condition the future objects it printed and how it made them. At night when the kids went home to their nice, protected flats and ate expensive, fiddly food with their parents, the machine would simulate the classroom and different perturbations of objects and different potential reactions of children. It wasn’t alone: ToyMaker 1.0 software was installed on approximately a thousand other printers spread across the country in other expensive daycares and private schools, and so as each day passed they collectively learned to try to make different objects, forever monitoring the reactions of the children, growing more adept at satisfying them via a loss function which was learned, with the aid of numerous auxiliary losses, through interaction.
So the next day in the Kidz Garden the machine printed out a Mobius Strip that now had numerous spindly-yet-strong supports linking its sides together, letting the children climb on it.
The day after that it printed intertwined ones; two low-dimensional slices, linked together but separate, and also climbable.
Next: the strips had little gears embedded in them which the children could run their hands over and play with.
Next: the gears conditioned the proportions of some aspects of the strip, allowing the children to manipulate dimensional properties with the spin of various clever wheels.
And so it went like this and is still going, though as the printing technologies have grown better, and the materials more complex, the angular forms being made by these devices have become sufficiently hard to explain that words do not suffice: you need to be a child, interacting with them with your hands, and learning the art of interplay with a silent maker that watches you with electronic eyes and, sometimes – you think when you are going to sleep – nods its camera head when you snot on the edges, or laugh at a surprising gear.
Technologies that inspired this story: Fleet learning, auxiliary losses, meta-learning, CCTV cameras, curiosity, 3D printing.