Import AI: #104: Using AirBNB to generate data for robots; Google trains AI to beat humans at lip-reading; and NIH releases massive ‘DeepLesion’ CT dataset
by Jack Clark
Rosie the Robot takes a step closer with new CMU robotics research:
…What’s the best way to gather a new robotics research dataset – AirBNB?!…
Carnegie Mellon researchers have done the robotics research equivalent of ‘having cake and eating it too’ – they have created a new dataset to evaluate generalization within robotics, and have successfully built low-cost robots which show meaningful performance on the dataset. The motivation for the research is that most robotics datasets are specific to highly-controlled lab environments, so it’s worth gathering data from more real-world locations (in this case, homes rented on AirBNB), then seeing whether it’s possible to develop a system that can learn to grasp objects from this data, and whether using such data improves generalization relative to other techniques.
How it works: The approach has three key components: a Grasp Prediction Network (GPN) which takes in pixel imagery and tries to predict the correct grasp to take (and which is fine-tuned from a pretrained ResNet-18 model); a Noise Modelling Network (NMN) which tries to estimate the latent noise based on the image of the scene and information from the robot; and a marginalization layer which helps combine the two data streams to predict the best grasp to use.
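To make the marginalization idea concrete, here is a toy sketch in plain Python. It is not the paper's implementation: the angle bins, the wrap-around of angles, the noise offsets, and every number below are assumptions chosen for illustration. The grasp network is imagined to give a success estimate for each true gripper angle, the noise network gives a distribution over how far the executed angle drifts from the commanded one, and the final score for a commanded angle is the noise-weighted average.

```python
# Toy sketch: combine per-angle grasp estimates with a latent noise distribution.
# All values are hypothetical; this only illustrates marginalizing over noise.

def marginalize(gpn_success, noise_probs):
    """Return the expected success probability for each commanded angle bin.

    gpn_success[k] : estimated success if the gripper truly ends up at bin k
    noise_probs[z] : estimated probability that the executed angle lands
                     z bins away from the commanded one (keys like -1, 0, 1)
    """
    n_bins = len(gpn_success)
    expected = []
    for commanded in range(n_bins):
        total = 0.0
        for offset, prob in noise_probs.items():
            actual = (commanded + offset) % n_bins  # assumption: angles wrap around
            total += prob * gpn_success[actual]
        expected.append(total)
    return expected


gpn_success = [0.1, 0.2, 0.9, 0.3]          # hypothetical grasp-network output
noise_probs = {-1: 0.25, 0: 0.5, 1: 0.25}   # hypothetical noise-network output
expected = marginalize(gpn_success, noise_probs)
best_bin = max(range(len(expected)), key=expected.__getitem__)
print(expected, best_bin)
```

The point of the weighting is that a cheap arm rarely lands exactly where it was told to, so an angle whose neighbors are also good grasps can beat one that only works when executed perfectly.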
The robot: They use a Dobot Magician robotic arm with five degrees of freedom, customized with a two-axis wrist with an electric gripper, and mounted on a Kobuki mobile base. For sensing, they equip it with an Intel R200 RGB camera with a pan-tilt attachment positioned 1m above the ground. The robot’s onboard processor is a laptop with an i5-8250U CPU with 8GB of RAM. Each of these robots costs about $3,000 – far less than the $20k+ prices for most other robots.
Data gathering: To gather data for the robots the researchers used six different properties from AirBNB. They deployed the robot in each of these homes, used a low-cost ‘YOLO’ model to generate bounding boxes around objects near the robot, then let the robot’s GPN and NMN work together to help it predict how to grasp objects. They collect about 28,000 grasps in this manner.
Results: The researchers try to evaluate their new dataset (which they call Home-LCA) as well as their new ‘Robust-Grasp’ two-part GPN&NMN network architecture. First, they examine the test accuracy of their Robust-Grasp network trained on the Home-LCA dataset and applied to other home environments, as well as two datasets which have been collected in traditional lab settings (Lab-Baxter and Lab-LCA). The results here are very encouraging as their approach seems to generalize better to the lab datasets than other approaches, suggesting that the Home-LCA dataset is rich enough to create policies which can generalize somewhat.
They also test their approach on physical robots deployed in unseen home environments (three novel AirBNBs). The results show that models trained on Home-LCA do substantially better than those trained on lab-derived datasets, reaching around 60% accuracy, compared to between 20% and 30% for other approaches – convincing results.
Why it matters: Most robotics research suffers from one of two things: 1) the robot is being trained and tested entirely in simulation, so it’s hard to trust the results, or 2) the robot is being evaluated on such a constricted task that it’s hard to get a sense for whether algorithmic progress leading to improved task performance will generalize to other tasks. This paper neatly deals with both of those problems by situating the task and robot in reality, collecting real data, and also evaluating generalization. It also provides further evidence that robot component costs are falling while network performance is improving sufficiently for academic researchers to conduct large-scale real world robotic trials and development, which will no doubt further accelerate progress in this domain.
Read more: Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias (Arxiv).
Learning to navigate over a kilometer of paths, with generalization:
…Bonus: Dataset augmentation techniques and experimental methodology increase confidence in result…
QUT and DeepMind researchers have successfully trained a robot to learn to navigate over two kilometers of real-world paths connected up to one another by 2,099 distinct nodes. The approach shows that it’s possible to learn sufficiently robust policies in simulation to be subsequently transferred to the real world, and the researchers validate their system by testing it on real world data.
The method: “We propose to train a graph-navigation agent on data obtained from a single coverage traversal of its operational environment, and deploy the learned policy in a continuous environment on the real robot,” the researchers write. They create a map of a given location, framed as a graph with points and connections between them, gathering 360-degree images from an omnidirectional camera to populate each point on the graph and, in addition, gathering the data lying between each point. “This amounts to approximately 30 extra viewpoints per discrete location, given our 15-Hz camera on a robot moving at 0.5 meters per second,” they write. They then use this data to augment the main navigation task. They also introduce techniques to randomize – in a disciplined manner – the brightness of gathered images, which lets them create more synthetic data and better defend against robots trained with the system overfitting to specific patterns of light. They then use curriculum learning to train a simulated agent using A3C to learn to navigate between successively farther apart points of the (simulated) graph. These agents themselves use image recognition systems pre-trained on the Places365 dataset and finetuned on the gathered data.
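Two of the ideas above are easy to sketch in code. First, the viewpoint arithmetic: a 15-Hz camera on a robot moving at 0.5 meters per second captures 30 frames per meter, which lines up with the quoted figure if neighboring graph locations sit roughly a meter apart (that spacing is my inference, not something stated here). Second, brightness randomization: the version below scales every pixel of an image by a single random gain and clips the result. The gain range and the tiny grayscale image are hypothetical, and the paper's actual scheme may differ.

```python
import random

CAMERA_HZ = 15.0       # frame rate quoted in the text
SPEED_M_PER_S = 0.5    # robot speed quoted in the text


def frames_per_meter(camera_hz, speed_m_per_s):
    """Frames captured per meter travelled at constant speed."""
    return camera_hz / speed_m_per_s


def jitter_brightness(image, rng, low=0.7, high=1.3):
    """Scale all pixels by one random gain, then clip to the 0..255 range.

    The [low, high] gain range is a hypothetical choice for illustration.
    """
    gain = rng.uniform(low, high)
    return [[min(255, max(0, round(pixel * gain))) for pixel in row]
            for row in image]


rng = random.Random(0)                    # fixed seed so runs are repeatable
image = [[0, 64, 128], [192, 255, 32]]    # stand-in for a grayscale frame
augmented = [jitter_brightness(image, rng) for _ in range(4)]

print(frames_per_meter(CAMERA_HZ, SPEED_M_PER_S))
for variant in augmented:
    print(variant)
```

Keeping the gain inside a bounded range is one simple way to vary lighting without producing frames that are washed out or black.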
Results: The researchers test their system by deploying it on a real robot (a Pioneer 3DX) and asking it to navigate to specific areas of the campus. There are a couple of reasons to really like this evaluation approach: 1) they’re testing it in reality rather than a simulator, so the results are more trustworthy, and 2) they test on the real robot three weeks after collecting the initial data, allowing for significant intermediary changes in things like the angle of the sun at given times of day, the density of people, placement of furniture, and other things that typically confound robots. They test their system against an ‘oracle’ (aka, perfect) route, as well as what was learned during training in the simulator. The results show that their technique successfully generalizes to reality, navigating successfully to defined locations on ten out of eleven tries, but at a significant cost: on average, the routes the robot comes up with in reality are on the order of 2.42X more complex than optimal routes.
Why it matters: Robots are likely one of the huge markets that will be further expanded and influenced by continued development of AI technology. What this result indicates is that existing, basic algorithms (like A3C), combined with well-understood data collection techniques, are already sufficiently powerful to let us develop proof-of-concept robot demonstrations. The next stage will be learning to traverse far larger areas while reducing the ‘reality penalty’ seen here of selected routes not being as efficient as optimal ones.
Read more: Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal (Arxiv).
Watch videos: Deployable Navigation Policies.
Why better measurements can lead to better robot research:
…New tasks, best practices, and datasets to evaluate smart robot agents…
An interdisciplinary team of researchers from universities and companies has written about the many problems inherent to contemporary robotic agent research and has issued a set of recommendations (along with the release of some specific testing environments) meant to bring greater standardization to robotics research. This matters because standardization on certain tasks, benchmarks, and techniques has led to significant progress in other areas of AI research – standardization on ‘ImageNet’ helped generate enough research to show the viability of deep learning architectures for hard supervised learning problems, and more recently OpenAI’s ‘OpenAI Gym’ helped to standardize some of the experimental techniques for reinforcement learning research. But robotics has remained stubbornly idiosyncratic, even when researchers report results in simulators. “Convergence to common task definitions and evaluation protocols catalyzed dramatic progress in computer vision. We hope that the presented recommendations will contribute to similar momentum in navigation research,” the authors write.
A taxonomy of task-types: Navigation tasks can be grouped into three basic categories: PointGoal (navigate to a specific location); ObjectGoal (navigate to an object of a specific category, eg a ‘refrigerator’); and AreaGoal (navigate to an area of a specific category, eg a kitchen). The first category requires coordinates while the latter two require the robot to assign labels to the world around it.
Specific tasks can be further distinguished by analyzing the extent of the agent’s exposure to the test environment, prior to evaluation. These different levels of exposure can roughly be characterized as: No prior exploration; pre-recorded prior exploration (eg, supplied with a trajectory through the space); and time-limited exploration by the agent (explores for a certain distance before being tested on the evaluation task).
Evaluation: Add a ‘DONE’ action which the agent signals when it completes an episode – this lets the agent characterize runs where it believes it has completed the task, giving scientists an additional bit of information to use when evaluating what the agent did to achieve that task. This differs from other methodologies which simply end the evaluation episode when the agent reaches the goal, which doesn’t require the agent to indicate that it knows it has finished the task.
Avoid using Euclidean measurements to determine the proximity of the goal, as this might reward the agent for placing itself near the object despite being separated from it by a wall. Instead, scientists might consider measuring the shortest-path distance in the environment to the goal, and evaluating on that.
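A tiny grid world shows why the two distance measures can disagree. In the sketch below (the map, the four-connected movement, and the helper names are all made up for illustration), the agent `A` and goal `G` are two cells apart in a straight line but separated by a wall `#`, so the walkable route is three times longer.

```python
from collections import deque
import math

# Hypothetical map: '#' is a wall, '.' is free space.
GRID = [
    "A#G",
    ".#.",
    "...",
]


def find(grid, symbol):
    """Return the (row, col) of the first cell containing symbol."""
    for r, row in enumerate(grid):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not found in grid")


def shortest_path_length(grid, start, goal):
    """Breadth-first search over free cells, moving up/down/left/right."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return math.inf  # goal unreachable


start, goal = find(GRID, "A"), find(GRID, "G")
euclidean = math.dist(start, goal)
walkable = shortest_path_length(GRID, start, goal)
print(euclidean, walkable)
```

An agent scored on Euclidean distance would look nearly done here, while the shortest-path measure correctly reports that most of the work remains.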
Success weighted by (normalized inverse) Path Length (SPL): Assess performance by using the agent’s ‘DONE’ signal and path length in each test episode, then calculate the average score for how close-to-optimal the agent’s paths were across all episodes, with failed episodes scoring zero (so, if an agent was successful on 8 runs out of ten, and each of the successful runs was twice the optimal path distance, its SPL for the full evaluation would be 0.4). “Note that SPL is a rather stringent measure. When evaluation is conducted in reasonably complex environments that have not been seen before, we expect an SPL of 0.5 to be a good level of navigation performance,” the researchers explain.
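As I understand the definition, each episode contributes zero if it failed and otherwise the ratio of the shortest-path length to the length of the path the agent actually took (capped at 1), and SPL is the mean over all episodes. A minimal sketch, with made-up episode data and a hypothetical `spl` helper:

```python
def spl(episodes):
    """Success weighted by (normalized inverse) Path Length.

    episodes: list of (success, shortest_path_length, agent_path_length).
    Failed episodes score 0; successful ones score shortest / max(agent, shortest).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Hypothetical evaluation: 8 successes that each take twice the optimal
# distance, plus 2 failures.
episodes = [(True, 10.0, 20.0)] * 8 + [(False, 10.0, 25.0)] * 2
score = spl(episodes)
print(score)
```

With these numbers each success scores 0.5, so the result is 8 × 0.5 / 10 = 0.4. If the successful paths were only 50% longer than optimal, each would score about 0.67 and the overall SPL would be roughly 0.53.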
Simulators: Use continuous state spaces, as that better approximates the real world conditions agents will be deployed into. Also, simulators should standardize reporting distances as SI Units, eg “Distance 1 in a simulator should correspond to 1 meter”.
Publish full details (and customizations) of simulators, and ideally release the code as open source. This will make it easier to replicate different simulated tasks with a high level of accuracy (whereas comparing robotics results on different physical setups tends to introduce a huge amount of noise, making disciplined measurement difficult). “This customizability comes with responsibility”, they note.
Standard Scenarios: The authors have also published a set of “standard scenarios” by curating specific data and challenges from contemporary environment datasets SUNCG, AI2-THOR, Matterport3D, and Gibson. These tasks closely follow the recommendations made elsewhere in the report and, if adopted, will bring more standardization to robotic research.
Read more: On Evaluation of Embodied Navigation Agents (Arxiv).
Read more: Navigation Benchmark Scenarios (GitHub).
I can see what you’re saying – DeepMind and Google create record-breaking lip-reading system:
…Lipreading network has potential (discussed) applications for people with speech impairment and potential (undiscussed) applications for surveillance…
DeepMind and Google researchers have created a lipreading speech recognition system with a lower word error rate than professional human lipreaders, and which is able to use a far larger vocabulary (127,055 terms versus 17,428 terms) than other approaches. To develop this system they created a new speech recognition dataset consisting of 3,886 hours of video of faces speaking particular phoneme sequences.
How it works: The system relies on “Vision to Phoneme (V2P)”, a network trained to produce a sequence of phoneme distributions given a sequence of video frames. They also implement V2P-Sync, a model that verifies the audio and video channels are aligned (and therefore prevents the creation of bad data, which would lead to poor model performance). V2P uses a 3D convolutional model to extract features from a given video clip and aggregate them over time via a temporal module. They implement their system as a very large model which is trained in a distributed manner.
Results: The researchers tested their approach on a held-out test-set containing 37 minutes of footage, across 63,000 video frames and 7,100 words. They found that their system significantly outperforms people. “This entire lipreading system results in an unprecedented WER of 40.9% as measured on a held-out set from our dataset,” they write. “In comparison, professional lipreaders achieve either 86.4% or 92.9% WER on the same dataset, depending on the amount of context given.”
Motivation: The researchers say the motivation for the work is to provide help for people with speech impairments. They don’t discuss the obvious surveillance implications of this research anywhere in the paper, which seems like a missed opportunity.
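For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the system's output, divided by the number of reference words. The sketch below implements that standard definition in plain Python; the sentences are invented, and the paper's exact scoring pipeline may include normalization steps not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one dropped word and one wrong word out of five.
wer = word_error_rate("please open the window now", "please open window no")
print(wer)
```

Here the output misses "the" and turns "now" into "no", which is two errors against five reference words, so the WER is 0.4. Because insertions count as errors too, WER can exceed 100%.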
Why it matters: This paper is another example of how, with deep learning techniques, if you can access enough data and compute then many problems become trivial – even ones that seem to require a lot of understanding and ‘human context’, like lipreading. Another implication here is that many tasks that we suspect are not that well suited to AI may in fact be more appropriate than we assume.
Read more: Large-Scale Visual Speech Recognition (Arxiv).
Researchers fuse hyperparameter search with neural architecture search:
…Joint optimization lets them explore architectural choices and hyperparameters at the same time…
German researchers have shown how to jointly optimize the hyperparameters of a model while searching through different architectures. This takes one established thing within machine learning (finding the right combination of hyperparameters to maximize performance against cost) and combines it with a newer area that has received lots of recent interest (using reinforcement learning and other approaches to optimize the architecture of the neural network, as well as its hyperparameters). “We argue that most NAS search spaces can be written as hyperparameter optimization search spaces (using the standard concepts of categorical and conditional hyperparameters),” they write.
Results: They test their approach by training a multi-branch ResNet architecture on CIFAR-10 while exploring a combination of ten architectural choices and seven hyperparameter choices. They limited training time to a maximum of three hours for each sampled configuration and performed 256 of these full-length runs (amounting to about 32 GPU days of constant training). They discover that the relationship between hyperparameters, architecture choices, and trained model performance is more subtle than anticipated, indicating that there’s value in optimizing these jointly.
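The quoted claim, that architecture choices can be expressed as categorical and conditional hyperparameters, can be illustrated with a small sketch. Everything below is hypothetical: the parameter names, their ranges, and the use of plain random sampling (which simply stands in for whatever optimizer one would really use) are my assumptions, not the paper's search space. The last lines just check the compute budget quoted above.

```python
import random


def sample_config(rng):
    """Draw one joint architecture + training configuration.

    Architectural choices are ordinary categorical hyperparameters, and
    'momentum' is conditional: it only exists when the optimizer is SGD.
    """
    config = {
        "num_branches": rng.choice([1, 2, 3]),        # architecture (categorical)
        "blocks_per_stage": rng.choice([2, 3, 4]),    # architecture (categorical)
        "optimizer": rng.choice(["sgd", "adam"]),     # training (categorical)
        "learning_rate": 10 ** rng.uniform(-4, -1),   # training (log-uniform)
    }
    if config["optimizer"] == "sgd":
        config["momentum"] = rng.uniform(0.0, 0.99)   # conditional hyperparameter
    return config


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(5)]
for config in configs:
    print(config)

# Budget check from the text: 256 runs of at most 3 hours each.
gpu_days = 256 * 3 / 24
print(gpu_days)
```

Because architecture and training settings live in one dictionary, a single optimizer can trade them off against each other rather than tuning one with the other held fixed.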
Why it matters: As computers get faster it’s going to be increasingly sensible to offload as much of the design and optimization of a given neural network architecture as possible to the computer – further development in fields of automatic model optimization will spur progress here.
Read more: Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search (Arxiv).
NIH releases ‘DeepLesion’ dataset to aid medical researchers:
…Bonus: the data is available immediately online, no sign-up required…
The National Institutes of Health has released ‘DeepLesion’, a set of 32,000 CT images with annotated lesions, giving medical machine learning researchers a significant data resource to use to develop AI systems. The images are from 4,400 unique individuals and have been heavily annotated with bookmarks around the lesions.
The NIH says it hopes researchers will use the dataset to help them “develop a universal lesion detector that will help radiologists find all types of lesions. It may open the possibility to serve as an initial screening tool and send its detection results to other specialist systems trained on certain types of lesions”.
Why it matters: Data is critically important for many applied AI applications and, at least in the realm of medical data, simulating additional data is fraught with dangers, so the value of primary data taken from human sources is very high. Resources like those released by the NIH can help scientists experiment with more data and thereby further develop their AI techniques.
Read more: NIH Clinical Center releases dataset of 32,000 CT images (NIH).
Get the data here: NIH Clinical Center (via storage provider Box).
AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: firstname.lastname@example.org…
AI leaders sign pledge opposing autonomous weapons:
The Future of Life Institute has issued a statement against lethal autonomous weapons (LAWs) which has been signed by 176 organizations and 2,515 individuals, including Elon Musk, Stuart Russell, Max Tegmark, and the cofounders of DeepMind.
What’s wrong with LAWs: The letter says humans should never delegate the decision to use lethal force to machines, because these weapons remove the “risk, attributability, and difficulty” of taking lives, and that this makes them potentially destabilizing, and powerful tools of oppression.
(Self-)Regulation: The international community does not yet possess the governance systems to prevent a dangerous arms race. The letter asks governments to create “strong international norms, regulations and laws against LAWs.” This seems deliberately timed ahead of the upcoming meeting of the UN CCW to discuss the issue. The signatories pledge to self-regulate, promising to “neither participate in nor support the development, manufacture, trade, or use of LAWs.”
Why it matters: Whether a ban on these weapons is feasible, or desirable, remains unclear. Nonetheless, the increasing trend of AI practitioners mobilizing on ethical and political issues will have a significant influence on how AI will be developed. If efforts like this lead to substantive policy changes they could also serve as a useful model to study as researchers try to achieve political ends in other aspects of AI research and development.
Read more: Lethal Autonomous Weapons Pledge (Future of Life Institute).
US military’s AI plans take shape:
The DoD has announced that they will be releasing a comprehensive AI strategy ‘within weeks’. This follows a number of piecemeal announcements, which have included the establishment earlier this month of the Joint Artificial Intelligence Center (JAIC), which will oversee all large AI programs in US defence and intelligence and forge partnerships with industry and academia.
Why it matters: This is just the latest reminder that militaries already see the potential in AI, and are ramping up investment. Any AI arms race between countries carries substantial risks, particularly if parties prioritize pace of development over building safe, robust systems (see below). Whether or not the creation of a military AI strategy will prompt the US to finally release a broader national strategy remains to be seen.
Read more: Pentagon to Publish Artificial Intelligence Strategy ‘Within Weeks’.
Read more: DoD memo announcing formation of the JAIC.
Germany releases framework for their national AI strategy:
The German government has released a prelude to their national AI strategy, which will be announced at the end of November. (NB – the document has not been released in English, so I have relied on Google Translate)
Broad ambitions: The government presents a long list of goals for their strategy. These include fostering a strong domestic industry and research sector, developing and promoting ethical standards and new regulatory frameworks, and encouraging uptake in other industries.
Some specific proposals:
– A Data Ethics Committee to address the ethical and governance issues arising from AI.
– Multi-national research centers with France and other EU countries.
– The development of international organizations to manage labor displacement.
– Ensuring that Germany and Europe lead international efforts towards common technical standards.
– Public dialogue on the impacts of AI.
Read more: Cornerstones of the German AI Strategy (German).
Solving the AI race:
GoodAI, a European AI research organization, held a competition for ideas on how to tackle the problems associated with races in AI development. Here follow summaries of two of the winning papers.
A formal theory of AI coordination: This paper approaches the problem from an international relations perspective. The researchers use game theory to model 2-player AI races, where AI R&D is costly, the outcome of the race is uncertain, and players can either cooperate or defect. They consider four models the race could plausibly take, determined by the coordination regime in place, and suggest which models are the ‘safest’ in terms of players being incentivized against developing risky AI. They suggest policies to promote cooperation within different games, and to shift race dynamics into more safety-conducive set-ups.
Solving the AI race: This paper gives a thorough overview of how race dynamics might emerge, between corporations as well as militaries, and spells out a comprehensive list of the negative consequences of such a situation. The paper presents three mitigation strategies with associated policy recommendations: (1) encouraging and enforcing cooperation between actors; (2) providing incentives for transparency and disclosure; (3) establishing AI regulation agencies.
Why it matters: There are good reasons to be worried about race dynamics in AI. Competing parties could be incentivized to prioritize pace of development over safety, with potentially disastrous consequences. Equally, if advanced AI is developed in an adversarial context, this could make it less likely that its benefits are fairly distributed amongst humanity. More worryingly, it is hard to see how race dynamics can be avoided given the ‘size of the prize’ in developing advanced AI. Given this, researching strategies for managing races and enforcing cooperation should be a priority.
Read more: General AI Challenge Winners (GoodAI).
OpenAI Bits & Pieces:
OpenAI Five Benchmark:
We’ve removed many of the restrictions on our 5v5 bots and will be playing a match in a couple of weeks. Check out the blog for details about the restrictions we’ve removed and the upcoming match.
Read more: OpenAI Five Benchmark (OpenAI blog).
AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him:
Here’s a lengthy interview with Mike Cook, a games AI researcher, who gives some of his thoughts on OpenAI Five.
Read more: AI wizard Mike Cook wants OpenAI’s Dota bots to teach him, not beat him (Rock Paper Shotgun).
So believe it or not the first regulations came in because people liked the drones too much – these delivery companies started servicing areas and, just like in online games, there were always some properties in a given region that massively outspent others by several orders of magnitude. As in any other arena of life where these fountains of money crop up, workers would nickname the people in these properties ‘whales’. The whales did what they did best and spent. But problems emerged as companies continued expanding delivery hours and the whales continued to spend and spend – an area that the companies’ machine learning algorithm had zoned for 50 deliveries a day (and squared away with planning officials) suddenly had an ultra-customer to contend with. And these customers would order things in the middle of the night. Beans would land on lawns at 3am. Sex toys at 4am. Box sets of obscure TV shows would plunk down at 6am. Breakfast burritos would whizz in at 11am. And so on. So the complaints started piling up and that led to some of the “anti-social drone” legislation, which is why most cities now specify delivery windows for suburban areas (and ignore the protests of the companies who point to their record-breakingly-quiet new drones, or other innovations).
Things that inspired this story: Drones, Amazon Prime, everything-as-a-service, online games.