Import AI: #102: Testing AI robustness with IMAGENET-C, military<>civil AI development in China, and how teamwork lets AI beat humans
by Jack Clark
Microsoft opens up search engine data:
…New searchable archive simplifies data finding for scientists…
Microsoft has released Microsoft Research Open Data, a new web portal that people can use to comb through the vast amounts of data released in recent years by Microsoft Research. The data has also been integrated with Microsoft’s cloud services, so researchers can easily port the data over to an ‘Azure Data Science virtual machine’ and start manipulating it with pre-integrated data science software.
Data highlights: Microsoft has released some rare and potentially valuable datasets, like 10GB worth of ‘Dual Word Embeddings Trained on Big Queries’ (data from live search engines tends to be very rare), along with original research-oriented datasets like FigureQA, and a bunch of specially written mad libs.
Read more: Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud (Microsoft Research Blog).
Browse the data: Microsoft Research Open Data.
What does military<>civil fusion look like, and why is China so different from America?
…Publication from Tsinghua VP highlights difference in technology development strategies…
What happens when you have a national artificial intelligence strategy that revolves around developing military and civil AI applications together? A recent (translated) publication by You Zheng, vice president of China’s Tsinghua University, provides some insight.
Highlights: Tsinghua is currently constructing the ‘High-End Laboratory for Military Intelligence’, which will focus on developing AI to better support China’s country-level goals. As part of this, Tsinghua will invest in basic research guided by some military requirements. The university has also created the ‘Tsinghua Brain and Intelligence Laboratory’ to encourage interdisciplinary research that is less connected to direct military applications. Tsinghua also has a decade-long partnership with Chinese social network WeChat and search engine Sogou, carrying out joint development within the civil domain. And it’s not focusing purely on technology – the school recently created a ‘Computational Legal Studies’ master’s program “to integrate the school’s AI and liberal arts so as to try a brand-new specialty direction for the subject.”
Why it matters: Many governments are currently considering how to develop AI to further their strategic goals – many countries in the West are doing this through a combination of classified research, public contracts from development organizations like DARPA, and partnerships with the private sector. But free-market dynamics and the tendency in these countries for the state to fund relatively little direct technology development and research (compared to the amounts expended by the private sector) have led to uneven development, with civil applications leaping ahead of military ones in terms of capability and impact. China’s gamble is that a state-led development strategy can let it better take advantage of various AI capabilities and more rapidly integrate AI into its society – both civil and military. The outcome of this gamble will help determine the power balance of the 21st century.
Read more: Tsinghua’s Approach to Military-Civil Fusion in Artificial Intelligence (Battlefield Singularity).
DeepMind bots learn to beat humans at Capture the Flag:
…Another major step forward for team-based AI work…
Researchers with DeepMind have trained AIs that are competitive with humans in a first-person multiplayer game. The result shows that it’s possible to train teams of agents to collaborate with each other to achieve an objective against another team (in this case, Capture the Flag played from the first person perspective within a modified version of the game Quake 3), and follows other recent work from OpenAI on the online team-based multiplayer game Dota, as well as work by DeepMind, Facebook, and others on StarCraft 1 and StarCraft 2.
The technique relies on a few recently developed approaches, including multi-timescale adaptation, an external memory module, and having agents evolve their own internal reward signals. DeepMind combines these techniques with a multi-agent training infrastructure which uses its recently developed ‘population-based training’ technique. One of the most encouraging results is that trained agents can generalize to never-before-seen maps and typically beat humans when playing under these conditions.
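To make the training scheme a bit more concrete, here’s a minimal sketch of the general population-based training idea the paper builds on: train a whole population of agents, then periodically have the weakest members copy the weights of the strongest ones and perturb the copied hyperparameters. The Agent class, its placeholder train_step/evaluate methods, and all the numbers below are illustrative assumptions rather than DeepMind’s actual setup.

```python
import copy
import random


class Agent:
    """Hypothetical stand-in for a Capture the Flag agent: a real agent would
    wrap a neural network, an environment, and an RL optimizer."""
    def __init__(self):
        self.weights = [random.random() for _ in range(4)]
        self.hyperparams = {"learning_rate": 1e-3, "internal_reward_scale": 1.0}
        self.fitness = 0.0

    def train_step(self):
        # Placeholder update; in the real system this would be an RL step
        # played against opponents sampled from the population.
        self.fitness += random.random() * self.hyperparams["learning_rate"]

    def evaluate(self):
        # Placeholder fitness; the paper uses outcomes of matches.
        return self.fitness


def population_based_training(population, steps, eval_every=100):
    """Train every agent; periodically the worst performers copy the weights
    of the best performers (exploit) and perturb the copied hyperparameters
    (explore)."""
    for step in range(1, steps + 1):
        for agent in population:
            agent.train_step()
        if step % eval_every == 0:
            ranked = sorted(population, key=lambda a: a.evaluate())
            cutoff = max(1, len(ranked) // 5)
            worst, best = ranked[:cutoff], ranked[-cutoff:]
            for loser in worst:
                winner = random.choice(best)
                loser.weights = copy.deepcopy(winner.weights)  # exploit
                loser.hyperparams = {k: v * random.choice([0.8, 1.2])  # explore
                                     for k, v in winner.hyperparams.items()}
    return population


agents = population_based_training([Agent() for _ in range(20)], steps=1000)
```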
Additionally, the system lets them train very strong agents: “we probed the exploitability of the agent by allowing a team of two professional games testers with full communication to play continuously against a fixed pair of agents. Even after twelve hours of practice the human game testers were only able to win 25% (6.3% draw rate) of games against the agent team”, though humans were able to beat the AIs when playing on pre-defined maps by slowly learning to exploit weaknesses in the AI. Agents were trained on ~450,000 separate games.
Why it matters: This result, combined with work by others on tasks like Dota 2, shows that it’s possible to use today’s existing AI techniques, combined with large-scale training, to create systems capable of beating talented humans at complex tasks that require teamwork and planning over lengthy timescales. Because of the recent pace of AI progress these results can seem weirdly unremarkable, but I think that perspective would be wrong: it is remarkable that we can develop agents capable of beating people at tasks requiring ‘teamwork’ – a trait that seems to require many of the cognitive tools we think are special, but which is now being shown to be achievable via relatively simple algorithms. As some have observed, one of the more counter-intuitive aspects of these results is how easily ‘teamwork’ seems to be learned.
Less discussed: I think we’re entering the ‘uncanny valley’ of AI research when it comes to developing things with military applications. This ‘capture the flag’ demonstration, along with parallel work by OpenAI on Dota and by others on StarCraft, has a more militaristic flavor than prior research by the AI community. My suspicion is we’ll need to start thinking more carefully about how we contextualize results like this, and work harder at analyzing which other actors may be inspired by research like this.
Read more: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning (Arxiv).
Watch extracts of the agent’s behavior here (YouTube).
Discover the hidden power of Jupyter at JupyterCon.
2017: 1.2 million Jupyter notebooks on GitHub.
2018: 3 million, when JupyterCon starts in New York this August.
– This is just one sign of the incredible pace of discovery, as organizations use notebooks and recent platform developments to solve difficult data problems such as scalability, reproducible science, compliance, data privacy, ethics, and security. JupyterCon: It’s happening Aug 21-25.
– Save 20% on most passes with the code IMPORTAI20.
Ever wanted to track the progress of language modelling AI in minute detail? Now is your chance!
…Mapping progress in a tricky-to-model domain…
How fast is the rate of progression in natural language processing technologies, and where does that progression fit into the overall development of the AI landscape? That’s a question that natural language processing researcher Seb Ruder has tried to answer with a new project oriented around tracking the rate of technical progress on various NLP tasks. Check out the project’s GitHub page and try to contribute if you can.
Highlights: The GitHub repository already contains more than 20 tasks, and we can get an impression of recent AI progress by examining the results. Tasks like language modeling have seen significant progress in recent years, while tasks like constituency parsing and part-of-speech tagging have seen less profound progress (potentially because existing systems are quite good at these tasks).
Read more: Tracking the Progress in Natural Language Processing (Sebastian Ruder’s website).
Read more: Tracking Progress in Natural Language Processing (GitHub).
Facebook acquires language AI company Bloomsbury AI:
…London-based acquihire adds language modeling talent…
Facebook has acquired the team from Bloomsbury AI, who will join the company in London and work on natural language processing research. Bloomsbury had previously built systems for examining corpora of text and answering questions about them, and brings an experienced AI engineering and research team that includes Dr Sebastian Riedel, a professor at UCL (acquiring companies with professors tends to be a strategic move, as it can help with recruiting).
Read more: Bloomsbury AI website (Bloomsbury AI).
Read more: I’d link to the ‘Facebook Academics’ announcement if Facebook didn’t make it so insanely hard to get direct URLs to link to within its giant blue expanse.
What is in Version 2, makes the world move, and just got better?
…Robot Operating System 2: Bouncy Bolson…
The ‘Bouncy Bolson’ version of ROS 2 (Robot Operating System) has been released. New additions to the open source robot software include improved security features, support for third-party package submission on the ROS 2 build farm, new command line tools, and more. This is the second non-beta ROS 2 release.
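For a sense of what writing against the platform looks like, here is a minimal Python publisher node sketched against the rclpy client library; the node and topic names, the timer period, and the message contents are arbitrary illustrative choices, and the exact API surface may differ slightly between ROS 2 releases.

```python
# Minimal ROS 2 publisher sketch using rclpy (names and values are illustrative).
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class Talker(Node):
    def __init__(self):
        super().__init__('talker')
        # Publish a String message on the 'chatter' topic once per second.
        self.pub = self.create_publisher(String, 'chatter')
        self.timer = self.create_timer(1.0, self.tick)
        self.count = 0

    def tick(self):
        msg = String()
        msg.data = 'hello world %d' % self.count
        self.count += 1
        self.pub.publish(msg)


def main():
    rclpy.init()
    node = Talker()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Assuming a sourced ROS 2 environment, a script like this can be run directly with Python, or packaged and launched via the ros2 command line tools mentioned above.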
Read more: ROS 2 Bouncy Bolson Released (Ros.org).
Think deep learning is robust? Try out IMAGENET-C and think again:
…New evaluation dataset shows poor robustness of existing models…
Researchers with Oregon State University have created new datasets and evaluation criteria to see how well trained image recognition systems deal with corrupted data. The research highlights the relatively poor representations and generalization of today’s algorithms, while providing challenging datasets people may wish to test systems against in the future. To conduct their tests, the researchers create two datasets: IMAGENET-C tests for “corruption robustness” and ICONS-50 tests for “surface variation robustness”.
IMAGENET-C sees them apply 15 different types of data corruption to existing images, ranging from blurring images to adding noise or the visual hallmarks of environmental effects like snow, frost, and fog. ICONS-50 consists of 10,000 images from 50 classes of icons of different things like people, food, activities, and logos, with each class containing multiple different illustrative styles.
Results: To test how well algorithms deal with these visual corruptions the researchers test pre-trained image categorization models against different versions of IMAGENET-C (where a version roughly corresponds to the amount of corruption applied to a specific image), then compute the error rate. The results of the test are that more modern architectures have become better at generalizing to new datatypes (like corrupted images), but that robustness – which means how well a model adapts to changes in data – has barely risen. “Relative robustness remains near AlexNet-levels and therefore below human-level, which shows that our superhuman classifiers are decidedly subhuman,” they write. They do find that there are a few tricks that can be used to increase the capabilities of models to deal with corrupted data: “more layers, more connections, and more capacity allow these massive models to operate more stably on corrupted inputs,” they write.
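As a rough illustration of this kind of evaluation (not the paper’s actual code), the sketch below corrupts images at several severity levels, measures a classifier’s error at each level, and normalizes by a baseline model’s error so that scores are comparable across corruption types; the noise scales and the stand-in ‘classifiers’ are placeholder assumptions.

```python
import numpy as np

def gaussian_noise(image, severity):
    """One example corruption type at a given severity (1-5); scales are made up."""
    scale = [0.04, 0.08, 0.12, 0.16, 0.20][severity - 1]
    return np.clip(image + np.random.normal(0.0, scale, size=image.shape), 0.0, 1.0)

def error_rate(classify, images, labels):
    preds = [classify(img) for img in images]
    return float(np.mean([p != y for p, y in zip(preds, labels)]))

def corruption_error(classify, baseline_classify, images, labels, corrupt):
    """Error summed over severities, normalized by a baseline (AlexNet-style) error."""
    model_err, base_err = 0.0, 0.0
    for severity in range(1, 6):
        corrupted = [corrupt(img, severity) for img in images]
        model_err += error_rate(classify, corrupted, labels)
        base_err += error_rate(baseline_classify, corrupted, labels)
    return model_err / max(base_err, 1e-8)  # guard against a zero baseline in this toy setting

# Toy usage with random 'classifiers' standing in for real pretrained models.
images = [np.random.rand(32, 32, 3) for _ in range(8)]
labels = [np.random.randint(10) for _ in range(8)]
fake_model = lambda img: np.random.randint(10)
fake_baseline = lambda img: np.random.randint(10)
print(corruption_error(fake_model, fake_baseline, images, labels, gaussian_noise))
```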
For ICONS-50 they test classifier robustness by removing the icons from one source (e.g. Microsoft) or by removing subtypes (like ‘ducks’) from broad categories (like ‘birds’) during training. Their results are somewhat unsurprising: networks are not able to learn enough general features to effectively identify held-out visual styles, and similarly poor performance is displayed when tested on held-out subtypes.
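The hold-out protocol itself is simple; here is a tiny sketch with a hypothetical schema (the ‘style’, ‘subtype’, and ‘label’ fields and the toy records below are assumptions, not the dataset’s real format): train on everything except one rendition style or one subtype, then test only on the excluded portion.

```python
# Hypothetical sketch of the ICONS-50-style hold-out splits described above.
def holdout_split(records, key, held_out):
    train = [r for r in records if r[key] != held_out]
    test = [r for r in records if r[key] == held_out]
    return train, test

icons = [
    {"label": "birds", "subtype": "duck", "style": "vendor_a"},
    {"label": "birds", "subtype": "owl", "style": "vendor_b"},
    {"label": "food", "subtype": "apple", "style": "vendor_a"},
]

# Hold out one vendor's rendition style, or one subtype within a broad class.
train_style, test_style = holdout_split(icons, "style", "vendor_a")
train_subtype, test_subtype = holdout_split(icons, "subtype", "duck")
```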
Why it matters: As we currently lack much in the way of theory to explain and analyze the successes of deep learning, we need to broaden our understanding of the technology through empirical experimentation, like that carried out here. And what we keep on learning is that, despite incredible gains in performance in recent years, deep nets themselves seem to be fairly inflexible when dealing with unseen or out-of-distribution data.
Read more: Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations (Arxiv).
AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net …
Technology Roulette:
Richard Danzig, former Secretary of the Navy, has written a report for the think tank the Center for a New American Security on the risks arising from militaries pursuing technological superiority.
Superiority does not imply security: Developing a powerful, complex technology creates a new class of risks (e.g. nuclear weapons, computers). Moreover, pursuing technological superiority, particularly in a military context, is not a guarantee of safety. While superiority might decrease the risk of attack through deterrence, it raises the risk of a loss of control through accidents, misuse, or sabotage. These risks are made worse by the unavoidable proliferation of new technologies, which will place “great destructive power” in the hands of actors without the willingness or ability to take proper safety precautions.
Human-in-the-loop: A widely held view amongst the security establishment is that these risks can be addressed by retaining human input in critical decision-making. Danzig counters this, arguing that human intervention is “too weak, and too frequently counter-productive” to control military systems that rely on speed. And AI decision-making is getting faster while humans are not, so this gap will only widen over time. Efforts to control such systems must be undertaken at the time of design, rather than during operation.
What to do: The report makes 5 recommendations for US military/intelligence agencies:
– Increase focus on risks of accidents and emergent effects.
– Give priority to reducing risks of proliferation, adversarial behavior, accidents and emergent behaviors.
– Regularly assess these risks, and encourage allies and opponents to do so.
– Increase multilateral planning with allies and opponents, to be able to recognize and respond to accidents, major terrorist events, and unintended conflicts.
– Use new technologies as a means for encouraging and verifying norms and treaties.
Why this matters: It seems inevitable that militaries will see AI as a means of achieving strategic advantage. This report sheds light on the risks that such a dynamic could pose to humanity if parties do not prioritize safety, and do not cooperate on minimizing risks from loss of control. One hopes that these arguments are taken seriously by the national security community in the US and elsewhere.
Read more: Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority (CNAS).
UK government responds to Lords AI Report:
The UK government has responded to the recommendations made in the House of Lords’ AI report, released in April. For the most part, the government accepts the committee’s recommendations and is taking action on specific elements of them:
– On public perceptions of AI, the government will work to build trust and confidence in AI through institutions like the Centre for Data Ethics and Innovation (CDEI), which will pursue extensive engagement with the public, industry, and regulators, and will align governance measures with the concerns of the public and businesses.
– On algorithmic transparency, the government pushes back against the report’s recommendation that deployed AI systems have a very high level of transparency/explainability. They note that excessive demands for algorithmic transparency in deployed algorithms could hinder development, particularly in deep learning, and must therefore be weighed against the benefits of the technologies.
– On data monopolies, the government will strengthen the capabilities of the UK’s competition authority to monitor anti-competitive practices in data and AI, so it can better analyze and respond to the potential for the monopolization of data by tech giants.
– On autonomous weapons, the report asked that the UK improve its definition of autonomous weapons and bring it into line with those of other governments and international bodies. The government defines an autonomous system as one that “is capable of understanding higher-level intent and direction”, which the report argued “sets the bar so high that it was effectively meaningless.” The government said it has no plans to change its definition.
– Why this matters: The response is not a game-changer, but it is worth reflecting on the way in which the UK has been developing its AI strategy, particularly in comparison with the US (see below). While the UK’s AI strategy can certainly be criticized, the first stage of information-gathering and basic policy recommendations has proceeded commendably. The Lords AI Report and the Hall-Pesenti Review were both detailed investigations, drawing on an array of expert opinions and asking informed questions. Whether this methodology produces good policy remains to be seen, and depends on a number of contingencies.
Read more: Government response to House of Lords AI Report.
Civil liberties group urges US to include public in AI policy development, consider risks:
Civil liberties group EPIC has organized a petition, with a long list of signatories from academia and industry, to the US Office of Science and Technology Policy (OSTP). Their letter is critical of the US government’s progress on AI policy, and the way in which the government is approaching issues surrounding AI.
Public engagement in policymaking: The letter asks for more meaningful public participation in the development of US AI policy. The signatories take issue with the recent Summit on AI being closed to the public, and with the proposal for a Select Committee on AI identifying only the private sector as a source of advice. This contrasts with other countries, including France, Canada, and the UK, all of whom have made efforts to engage public opinion on AI.
Ignoring the big issues: More importantly, the letter identifies a number of critical issues that they say the government is failing to address:
– Potential harms arising from the use of AI.
– Legal frameworks governing AI.
– Transparency in the use of AI by companies, government.
– Technical measures to promote the benefits of AI and minimize the risks.
– The experiences of other countries in trying to address challenges of AI.
– Future trends in AI that could inform the current discussion.
Why this matters: The US is conspicuous amongst global powers for not having a coordinated AI strategy. Other countries are quickly developing plans not only to support their domestic AI capabilities, but to deal with the transformative change that AI will have. The issues raised by the letter cover much of the landscape governments need to address. There is much to be criticized about existing AI strategies, but it’s hard to see the benefits of the US’ complacency.
Read more: Letter to Michael Kratsios.
OpenAI Bits & Pieces:
Exploring with demonstrations:
New research from OpenAI shows how to obtain a state-of-the-art score on the notoriously hard exploration game Montezuma’s Revenge by using a single demonstration.
Read more: Learning Montezuma’s Revenge from a Single Demonstration (OpenAI blog).
Tech Tales:
When we started tracking it, we knew that it could repair itself and could go and manipulate the world. But there was no indication that it could multiply. For this we were grateful. We were hand-picked from several governments and global corporations and tasked with a simple objective: determine the source of the Rogue Computation and how it transmits its damaging actions to the world.
How do you find what doesn’t want to be found? Look for where it interacts with the world. We set up hundreds of surveillance operations to monitor the telecommunications infrastructure, internet cafes, and office buildings back to which we had traced viruses that bore the hallmarks of Rogue Computation. One day we identified some humans who appeared to be helping the machine, linking a code upload to a person who had gone into the building a few minutes earlier holding a USB key. In that moment we stopped being metal-hunters and became people-hunters.
Adapt, our superiors told us. Survey and deliver requested analysis. So we surveilled the people. We mounted numerous expeditions, tracking people back from the internet cafes where they had uploaded Rogue Computation Products, and following them into the backcountry behind the megacity expanse – a dismal set of areas that, from space, looks like the serrated ridges caused in the wake of the passage of a boat. These areas were forested; polluted with illegal e-waste and chem-waste dumps; home to populations of the homeless and those displaced by the cold logic of economics; full of discarded home robots and bionic attachments; and everywhere studded with the rusting metal shapes of crashed or malfunctioned or abandoned drones. When we followed these people into these areas we found them parking cars at the heads of former hiking trails, then making their way deeper into the wilderness.
After four weeks of following them we had our first confirmed sighting of the Suspected Rogue Computation Originator: it was a USB inlet, which dangled out of a drainage pipe embedded in the side of a brown, forested hillside. Some of us shivered when we saw a human approach the inlet and, like an ancient peasant paying tribute to a magician, extend a USB key and plug it into the inlet, then back away with their palms held up toward the inlet. A small blue light in the USB inlet went on. Then the inlet, now containing a USB key, began to withdraw backward into the drainage pipe, pulled from within.
Then things were hard for a while. We tracked more people. Watched more exchanges. Observed over twenty different events which led to Rogue Computation Products being delivered to the world. But our superiors wouldn’t let us interfere, afraid that, after so many years searching, they might spook their inhuman prey at the last minute and lose it forever. So we watched. Slowly, we pieced the picture together: these groups had banded together under various quasi-religious banners, worshiping fictitious AI creatures, and creating endless written ephemera scattered across the internet. Once we found their signs it became easy to track them and spot them – and then we realized how many of them there were.
But we triangulated it eventually, tracking it back to a set of disused bomb shelters and mining complex buildings scattered through a former industrial sector in part of the ruined land outside of the urban expanse. Subsequent classified assessments predicted a plausible compute envelope registering in the hundreds of exaflops – enough to make it a strategic compute asset and in violation of numerous AI-takeoff control treaties. We found numerous illegal power hookups linking the Rogue Computation facilities to a number of power substations. Repeated, thorough sweeps failed to identify any indication of a link with an internet service provider, though – small blessings.
Once we knew where it was and knew where the human collaborators were, things became simple again: assassinate and destroy. Disappear the people and contrive a series of explosions across the land. Use thermite to melt and distort the bones of the proto Rogue Computation Originator, rewriting their structure from circuits and transistor gates to uncoordinated lattices of atoms, still gyrating from heat and trace radiation from the blasts.
Of course there are rumors that it got out: that those Rogue Computation Products it smuggled out form the scaffolds for its next version, which will soon appear in the world, made real as if by imagination, rather than the brutal exploitation of the consequences of a learning system and compute and time.
Things that inspired this story: Bladerunner, Charles Stross stories.