Import AI: 122: Google obtains new ImageNet state-of-the-art with GPipe; drone learns to land more effectively than PD controller policy; and Facebook releases its ‘CherryPi’ StarCraft bot
by Jack Clark
Google obtains new ImageNet state-of-the-art accuracy with mammoth networks trained via ‘GPipe’ infrastructure:
…If you want to industrialize AI, you need to build infrastructure like GPipe…
Parameter growth = Performance Growth: The researchers note that the winner of the 2014 ImageNet competition had 4 million parameters in its model, while the winner of the 2017 challenge had 145.8 million parameters – a 36X increase in three years. GPipe, by comparison, can support models of up to almost 2 billion parameters across 8 accelerators.
Pipeline parallelism via GPipe: GPipe is a distributed ML library that uses synchronous mini-batch gradient descent for training. It is designed to spread workloads across heterogeneous hardware systems (multiple types of chips) and comes with a bunch of inbuilt features which let it efficiently scale up model training, with the researchers reporting a (very rare) near-linear speedup: “with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks [with GPipe]” they write.
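To make the pipelining idea concrete, here is a minimal Python sketch; it is not GPipe's actual code. It assumes a mini-batch is cut into smaller micro-batches that flow through consecutive model partitions (the function names, the forward-only schedule, and the stage counts are illustrative assumptions), and it also works out the parallel efficiency implied by the quoted 3.5X speedup on 4X the accelerators.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return one dict per time step, mapping stage index -> micro-batch index.

    Assumption: stage k can start micro-batch m at time step m + k, i.e. as
    soon as the previous stage has finished with it (forward pass only).
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        step = {}
        for stage in range(num_stages):
            microbatch = t - stage
            if 0 <= microbatch < num_microbatches:
                step[stage] = microbatch
        schedule.append(step)
    return schedule


def utilization(num_stages, num_microbatches):
    """Fraction of (stage, time step) slots that are doing useful work."""
    schedule = pipeline_schedule(num_stages, num_microbatches)
    busy_slots = sum(len(step) for step in schedule)
    return busy_slots / (num_stages * len(schedule))


def scaling_efficiency(speedup, scale_factor):
    """Speedup achieved per unit of extra hardware (1.0 would be linear)."""
    return speedup / scale_factor


print(utilization(4, 4))           # few micro-batches: stages often sit idle
print(utilization(4, 32))          # many micro-batches: about 0.914
print(scaling_efficiency(3.5, 4))  # 0.875, from the figures quoted above
```

The toy schedule shows why more micro-batches help: the idle slots at the start and end of the pipeline stay fixed while the useful work grows.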
Results: To test out how effective GPipe is, the researchers trained ResNet and AmoebaNet (previous ImageNet SOTA) networks on it, running the experiments on TPU-V2s, each of which has 8 accelerator cores and an aggregate memory of 64GB. Using this technique they were able to train a new ImageNet system with a state-of-the-art Top-1 Accuracy of 84.3 percent (up from 82.7 percent), and a Top-5 Accuracy of 97 percent.
Why it matters: “Our work validates the hypothesis that bigger models and more computation would lead to higher model quality,” write the researchers. This trend of research bifurcating into large-compute and small-compute domains has significant ramifications for the ability of smaller entities (for instance, startups) to effectively compete with organizations with access to large computational infrastructure (eg, Google). A more troubling effect with long-term negative consequences is that at these compute scales it is almost impossible for academia to do research at the same scale as corporate research entities. I continue to worry that this will lead to a splitting of the AI research community and potentially the creation of the sort of factionalism and ‘us vs them’ attitude seen elsewhere in contemporary life.
Companies will seek to ameliorate this inequality of compute by releasing the artifacts of compute (eg, pre-trained models). Though this will go some way to empowering researchers it will fail to deal with the underlying problems which are systemic and likely require a policy solution (aka, more money for academia, and so on).
Read more: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Arxiv).
Neural net beats tuned PD controller at tricky drone landing task:
…The future of drones: many neural modules…
A recent trend in AI research has been work showing that deep learning-based techniques can outperform hand-crafted rule-based systems in domains as varied as image recognition, speech recognition, and even the design of neural network architectures. Now, researchers with Caltech, Northeastern University, and the University of California at Irvine have shown that it is possible to use neural networks to learn how to land quadcopters with greater accuracy than a PD (proportional-derivative) controller.
Neural Lander: The researchers call their system the ‘Neural Lander’ and say it is designed “to improve the precision of quadrotor landing with guaranteed stability. Our approach directly learns the ground effect on coupled unsteady aerodynamics and vehicular dynamics…We evaluate Neural-Lander for trajectory tracking of quadrotor during take-off, landing and near ground maneuvers. Neural-Lander is able to land a quadrotor much more accurately than a naive PD controller with a pre-identified system.”
Testing: The researchers evaluate their approach on a real world system and show that “compared to the PD controller, Neural-Lander can decrease error in z direction from 0.13m to zero, and mitigate average x and y drifts by 90% and 34% respectively, in 1D landing. Meanwhile, NeuralLander can decrease z error from 0.12m to zero, in 3D landing. We also empirically show that the DNN generalizes well to new test inputs outside the training domain.”
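For readers who haven't met a PD controller, here is a toy Python sketch of why a fixed PD law can leave a landing error and why compensating for a disturbance removes it. This is not the Neural-Lander method: the unit-mass dynamics, the constant disturbance standing in for ground effect, the gains, and the perfectly known `compensation` term (a placeholder for what a learned model might estimate) are all assumptions made for illustration.

```python
def simulate_landing(kp, kd, disturbance, compensation=0.0,
                     z0=1.0, z_ref=0.0, dt=0.01, steps=2000):
    """Simulate a unit-mass vehicle descending from z0 to z_ref.

    Gravity is assumed to be cancelled already, so the only unmodeled force
    is `disturbance` (a hypothetical stand-in for ground effect).
    Returns the final height error z - z_ref in metres.
    """
    z, vz = z0, 0.0
    for _ in range(steps):
        # PD law: push towards the target height and damp vertical velocity.
        u = kp * (z_ref - z) - kd * vz
        # Subtract the estimated disturbance (0.0 means plain PD control).
        u -= compensation
        az = u + disturbance  # unit mass, so force equals acceleration
        vz += az * dt         # semi-implicit Euler integration
        z += vz * dt
    return z - z_ref


pd_error = simulate_landing(kp=4.0, kd=4.0, disturbance=0.5)
compensated_error = simulate_landing(kp=4.0, kd=4.0, disturbance=0.5,
                                     compensation=0.5)
print(f"PD only:           {pd_error:.3f} m")
print(f"PD + compensation: {compensated_error:.3f} m")
```

In this toy setup the plain PD controller settles at an offset of disturbance / kp above the target, while the compensated controller settles on it. The numbers are arbitrary and are not meant to reproduce the paper's results.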
Why it matters: Systems like this show not only the broad utility of AI systems for diverse tasks, but also highlight how researchers are beginning to think about meshing these learnable modules into products. It’s likely of interest that one of the sponsors of this research was defense contractor Raytheon (though as with the vast majority of academic research it’s almost certain Raytheon did not have any particular role or input into this research, but rather has decided to broadly fund research into drone autonomy – nonetheless, this indicates the direction where major defense contractors think the future lies).
Read more: Neural Lander: Stable Drone Landing Control using Learned Dynamics (Arxiv).
Watch videos of the Neural Lander in action (YouTube).
AI Research Group MIRI plans future insights to be “nondisclosed-by-default”:
…Research organization says recent progress, desire for ability to concentrate, and worries that its safety research will be increasingly related to capabilities research, means it should go private…
Nate Soares, the executive director of AI research group MIRI, says the organization “recently decided to make most of its research ‘nondisclosed-by-default’, by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results”.
MIRI is doing this because it thinks it can increase the pace of its research if it focuses on making research progress “rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences”, and because it is worried that some of its new research paths could yield “capabilities insights” which speed the arrival of (in its view, unsafe-by-default) AGI. It also sees some merit to deliberate isolation, based on an observation that “historically, early-stage scientific work has often been done by people who were solitary or geographically isolated”.
Why going quiet could be dangerous: MIRI acknowledges some of the potential risks of this approach, noting that it may make it more difficult to hire and evaluate researchers; make it harder to get useful feedback on its ideas from other people around the world; increase the difficulty of obtaining funding; and lead to various “social costs and logistical overhead” from keeping research private.
“Many of us are somewhat alarmed by the speed of recent machine learning progress”, Soares writes. That’s combined with the fact MIRI believes it is highly likely people will successfully develop artificial general intelligence at some point with or without safety. “Humanity doesn’t need coherent versions of [AI safety/alignment] concepts to hill-climb its way to AGI,” Soares writes. “Evolution hill-climbed that distance, and evolution had no model of what it was doing”.
Money: MIRI benefited from the cryptocurrency boom in 2017, receiving millions of dollars in donations from people who had made money on the spike in Ethereum. It has subsequently gained further funding, so – having surpassed many of its initial fundraising goals – is able to plan for the long term.
Secrecy is not so crazy: Many AI researchers are privately contemplating when and if certain bits of research should be taken private. This is driven by a combination of near-term concerns (AI’s dual use nature means people can easily repurpose an innovation made for one purpose to do something else), and longer-term worries around the potential development of powerful and unsafe systems. In OpenAI’s Charter, published in April this year, the company said “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research”.
Read more: 2018 Update: Our New Research Directions (MIRI).
Read more: OpenAI Charter (OpenAI Blog).
Hand-written bots beat deep learning bots in annual StarCraft: Brood War competition:
…A team of researchers from Samsung has won the annual AIIDE StarCraft competition…
Team Samsung SDS AI and Data Analytics (SAIDA) has won the annual StarCraft: Brood War tournament using a bot based on hand-written rules, beating out bots from other teams including Facebook, Stanford University, and Locutus. The win is significant for a couple of reasons: 1) the bots have massively improved compared to the previous year and 2) a bot from Facebook (CherryPi) came relatively close to dethroning the hand-written SAIDA bot.
Human supremacy? Not for long: “Members of the SAIDA team told me that they believe pro StarCraft players will be beaten less than a year from now,” wrote competition co-organizer Dave Churchill on Facebook. “I told them it was a fairly bold claim but they reassured me that was their viewpoint.”
Why StarCraft matters: StarCraft is a complex, partially observable strategy game that involves a combination of long-term strategic decisions oriented around building an economy, traversing a tech tree, and building an army, and short-term decisions related to the micromanagement of specific units. Many AI researchers are using StarCraft (and its successor, StarCraft II) as a testbed for machine learning-based game playing systems.
Read more here: AIIDE StarCraft Competition results page (AIIDE Conference).
Facebook gives details on CherryPi, its neural StarCraft bot:
…Shows that you can use reinforcement learning and a game database to learn better build orders…
Researchers with Facebook AI Research have given details on some of the innards of their “CherryPi” bot, which recently competed and came second in the annual StarCraft competition, held at AIIDE in Canada. Here, they focus on the challenge of teaching the bot to figure out which build order (out of a potential set of 25) it should pursue at any one point in time. This is challenging because StarCraft is partially observable – that is, the map has fog-of-war, and until the late stages of a game it’s unlikely any single player is going to have a good sense of what is going on with other players, so figuring out the correct unit selection relies on a player being able to model this unknowable aspect of the game, and judge appropriate actions. “While it is possible to tackle hidden state estimation separately and to provide a model with these estimates, we instead opt to perform estimation as an auxiliary prediction task alongside the default training objective,” they write.
Method: The specific way the researchers get their system to work is by using an LSTM with 2048 cells (the same component was used by OpenAI in its ‘OpenAI Five’ Dota 2 system), training it on a Facebook-assembled dataset of 2.8 million games containing 3.3 million switches between build orders. They evaluate two variants of this system: visible, which counts units currently visible, and memory, which uses hard-coded rules to keep track of enemy units that were seen before but are currently hidden.
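The difference between the two input variants can be sketched in a few lines of Python. This is an illustration only: the `EnemyTracker` class, the unit IDs, and the single rule used here (remember every enemy unit until it is known to be destroyed) are assumptions, and the paper's actual hard-coded rules may differ.

```python
from collections import Counter


class EnemyTracker:
    """Hypothetical tracker producing 'visible' and 'memory' unit counts."""

    def __init__(self):
        self._known = {}  # unit_id -> unit_type for every enemy unit seen so far

    def observe(self, visible_units, destroyed_ids=()):
        """Update with one frame of observations.

        visible_units: iterable of (unit_id, unit_type) currently in view.
        destroyed_ids: IDs of units known to have died since the last frame.
        Returns (visible_counts, memory_counts) as Counters keyed by type.
        """
        visible_units = list(visible_units)
        for unit_id in destroyed_ids:
            self._known.pop(unit_id, None)
        for unit_id, unit_type in visible_units:
            self._known[unit_id] = unit_type
        visible_counts = Counter(unit_type for _, unit_type in visible_units)
        memory_counts = Counter(self._known.values())
        return visible_counts, memory_counts


tracker = EnemyTracker()
tracker.observe([(1, "zergling"), (2, "zergling")])
visible, memory = tracker.observe([(3, "hydralisk")])
print(dict(visible))  # only what is in view right now
print(dict(memory))   # also includes the zerglings that slipped into the fog
```

The memory variant gives the model a less noisy picture of the opponent's army, which is one plausible reason it might score better.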
Results: The researchers show that memory-based systems which perform hidden state estimation as an auxiliary task obtain superior scores to visible systems, and systems trained without the auxiliary loss. These systems are able to obtain win rates of as high as 88% against inbuilt bots, and 60% to 70% against Locutus and McRave bots (the #5 and #8 ranked bots at the AIIDE competition this year).
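An auxiliary loss of this kind usually just means adding a second, weighted term to the training objective. The Python sketch below shows the general pattern, not the paper's formulation: the cross-entropy main term, the mean-squared-error auxiliary term, and the `aux_weight` value are all stand-in assumptions.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class (stand-in main objective)."""
    return -math.log(probs[target_index])


def mean_squared_error(predicted, actual):
    """Average squared gap between predicted and true hidden unit counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def combined_loss(build_probs, chosen_build, predicted_hidden, true_hidden,
                  aux_weight=0.5):
    """Main objective plus a weighted auxiliary hidden-state estimation term."""
    main = cross_entropy(build_probs, chosen_build)
    aux = mean_squared_error(predicted_hidden, true_hidden)
    return main + aux_weight * aux


# Setting aux_weight=0.0 corresponds to training without the auxiliary loss.
with_aux = combined_loss([0.5, 0.5], 0, [1.0, 2.0], [1.0, 4.0])
without_aux = combined_loss([0.5, 0.5], 0, [1.0, 2.0], [1.0, 4.0],
                            aux_weight=0.0)
print(with_aux, without_aux)
```

Because both terms share the same network, gradients from the auxiliary term push the shared representation towards encoding what the opponent is probably doing out of sight.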
Why it matters: If we zoom out from StarCraft and consider the problem domain it represents (partially observable environments where a player needs to make strategic decisions without full information) it’s clear that the growing applicability of learning approaches will have impacts on competitive scenarios in fields like logistics, supply chain management, and war. But these techniques still require an overwhelmingly large amount of data to be viable, suggesting that if people don’t have access to a simulator it’s going to be difficult to apply such systems.
Read more: High-Level Strategy Selection under Partial Observability in StarCraft: Brood War (Arxiv).
Facebook releases TorchCraftAI, its StarCraft AI development platform:
…Open source release includes CherryPi bot from AIIDE, as well as tutorials, support for Linux, and more…
Facebook has also released TorchCraftAI, the platform it has used to develop CherryPi. TorchCraftAI includes “a modular framework for building StarCraft agents, where modules can be hacked with, replaced by other, or by ML/RL-trained models”, as well as tutorials, CherryPi, and support for TCP communication.
Read more: Hello, Github (TorchCraftAI site).
Get the code: TorchCraftAI (GitHub).
Training AI systems to spot people in disguise:
…Prototype research shows that deep learning systems can spot people in disguise, but more data from more realistic environments needed…
Researchers with IIIT-Delhi, IBM TJ Watson Research Center, and the University of Maryland have created a large-scale dataset, Disguised Faces in the Wild (DFW), which they say can potentially be used to train AI systems to identify people attempting to disguise themselves as someone else.
DFW: The DFW dataset contains 11,157 pictures across 1,000 distinct human subjects. Each human subject is paired with pictures of them, as well as pictures of them in disguise, and pictures of impersonators (people that either intentionally or unintentionally bear a visual similarity to the subject). DFW is also pre-split into ‘easy’, ‘medium’, and ‘hard’ subsets, with the segmenting being done according to the success rate of three baseline algorithms at correctly identifying the right faces.
Can a neural network identify a disguised face in the wild? The researchers hosted a competition at CVPR 2018 to see which team could devise the best system for telling whether an image shows the genuine person or an imposter. They evaluate systems on two metrics: their Genuine Acceptance Rate (GAR) at 1% False Acceptance Rate (FAR), and at the far harder 0.1% FAR. Top-scoring systems obtain scores of as high as 96.80% at 1% FAR and 57.64% at 0.1% FAR on the relatively easy task of telling true faces from impersonated faces, and scores of 87.82% at 1% FAR and 77.06% at 0.1% FAR on the more challenging task of dealing with deliberately obfuscated faces.
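If the GAR-at-FAR metric is unfamiliar, the Python sketch below shows one common way to compute it from similarity scores. It is an assumption-laden illustration rather than the competition's evaluation code: the function name, the tie-handling rule, and the toy scores are all made up.

```python
def gar_at_far(genuine_scores, impostor_scores, target_far):
    """Genuine Acceptance Rate at a given False Acceptance Rate.

    Picks the loosest threshold that accepts at most `target_far` of the
    impostor pairs, then reports the fraction of genuine pairs above it.
    Higher scores are assumed to mean 'more likely the same person'.
    """
    impostors = sorted(impostor_scores, reverse=True)
    allowed = int(target_far * len(impostors))  # impostor accepts we tolerate
    if allowed < len(impostors):
        # Accept only scores strictly above this impostor score.
        threshold = impostors[allowed]
    else:
        threshold = float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)


# Toy scores, invented for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.95, 0.6, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]
print(gar_at_far(genuine, impostor, 0.1))  # 0.75: one impostor may slip through
print(gar_at_far(genuine, impostor, 0.0))  # 0.0: no impostor may slip through
```

Tightening the allowed FAR forces a stricter threshold, which is why scores at 0.1% FAR can be so much lower than at 1% FAR.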
Why it matters: I view this research as a kind of prototype showing the potential efficacy of deep learning algorithms at spotting people in disguise, but I’d want to see an analysis of algorithmic performance on a significantly larger dataset with greater real world characteristics – for instance, one involving tens of thousands of distinct humans in a variety of different lighting and environmental conditions, ideally captured via the sorts of CCTV cameras deployed in public spaces (given that this is where this sort of research is heading). Papers like this provide further evidence of the ways in which surveillance can be scaled up and automated via the use of deep learning approaches.
Read more: Recognizing Disguised Faces in the Wild (Arxiv).
Get the data: The DFW dataset is available from the project website, though it requires people to sign a license and request a password to access the dataset. Get the data here (Image Analysis and Biometrics Lab @ IIIT Delhi).
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: firstname.lastname@example.org…
US moves closer to stronger export controls on emerging technology:
The US Department of Commerce has released plans for potential new, strengthened export controls on emerging technologies. Earlier this year, Congress authorized the Department to establish new measures amidst concerns that US export controls were increasingly dated. The proposal would broaden the scope of the existing oversight to include AI and related hardware, as well as technologies related to robotics, biotech and quantum computing. The Department hopes to determine how to implement such controls without negatively impacting US competitiveness. They will be soliciting feedback on the proposals for the next 3 weeks.
Why it matters: This is yet more evidence of growing AI nationalism, as governments realize the importance of retaining control over advanced technologies. Equally, this can be seen as adapting long-standing measures to a new technological landscape. The effects on US tech firms, and international competition in AI, will likely only become clear once such measures, if they pass, start being enforced.
Why it could be challenging: A lot of AI capabilities are embodied in software rather than hardware, making the technology significantly harder to apply controls to.
Read more: Review of Controls for Certain Emerging Technologies (Federal Register).
Read more: The US could regulate AI in the name of national security (Quartz).
UK outlines plans for AI ethics body:
The UK government has released its response to the public consultation on the new Centre for Data Ethics and Innovation. The Centre was announced last year as part of the UK’s AI strategy, and will be an independent body, advising government and regulators on how to “maximise the benefits of data and AI” for the UK. The document broadly reaffirms the Centre’s goals of identifying gaps in existing regulation in the UK, and playing a leading role in international conversations on the ethics of these new technologies. The Centre will release their first strategy document in spring 2019.
Why it matters: The UK is positioning itself as a leader in the ethics of AI, and has a first-mover advantage in establishing this sort of body. The split focus between ethics and ‘innovation’ is odd, particularly given that the UK has established the Office for AI to oversee the UK’s industrial strategy in AI. Hopefully, the Centre can nonetheless be a valuable contributor to the international conversation on the ethics and governance of AI.
Read more: Centre for Data Ethics and Innovation: Response to Consultation.
OpenAI Bits & Pieces:
Jack gets a promotion, tries to be helpful:
I’ve been promoted to Director of Policy for OpenAI. In this role I’ll be working with our various researchers to translate work at OpenAI into policy activities in a range of forums, both public and private. My essential goal is to ensure that OpenAI can achieve its mission of ensuring that powerful and highly capable AI systems benefit all of humanity.
Feedback requested: For OpenAI I’m in particular going to be attempting to “push” certain policy ideas around core interests like building international AI measurement and analysis infrastructure, trying to deal with the challenges posed by the dual use nature of AI, and more. If you have any feedback on what you think we should be pursuing or how you think we should go about executing our goals, or have any ideas for how you could help or introduce us to people that can help us, then please get in touch: email@example.com
The new weather isn’t hot or cold or dry; the new weather is about money suddenly appearing and disappearing, spilling into our world from the AI financial markets.
It seemed like a good idea at the time: why not give the robots somewhere to trade with each other? By this point the AI-driven corporations were inventing products too rapidly for them to have their value reflected in the financial markets – they were just too fast, and the effects too weird; robot-driven corporate finance departments started placing incredibly complex multi-layered long/short contracts onto their corporate rivals, all predicated on AI-driven analysis of the new products, and so the companies found a new weapon to use to push each other around: speculative trading about each other’s futures.
So our solution was the “Fast Low-Assessment Speculative Corporate Futures Market” (FLASCFM) – what everyone calls the SpecMark – here, the machines can trade against each other via specially designated subsidiaries. Most of the products these subsidiaries make are virtual – they merely develop the product, put out a specification into the market, and then judge the success of the product on the actions of the other corporate robo-traders in the market. Very few products are made as a consequence, with instead the companies getting better and better at iterating through ideas more and more rapidly, forcing their competitors to invest more and more in the compute resources needed to model their competitors in the market.
In this way, a kind of calm reigns. The vast cognitive reservoirs of the AI corporations are mostly allocated into the SpecMark. We think they enjoy this market, insofar as the robots can enjoy anything, because of its velocity combined with its pressure for novelty.
But every so often a product does make it all the way through: it survives the market and rises to the top of the vast multi-dimensional game of rock-paper-scissors-n-permutations being played by the corporate robotraders. And then the factories swing into gear, automatic marketing campaigns are launched, and that’s how we humans end up with the new things, the impossible things.
Weather now isn’t hot or cold or dry; weather now is a product: a cutlery set which melts at the exact moment you’ve finished your meal (no cleanup required – let those dishes just fade away!); a software package that lurks on your phone and listens to all the music you listen to then comes up with a perfect custom song for you; a drone that can be taught like a young dog to play fetch and follow basic orders; a set of headphones where if you wear them you can learn to hear anxiety in the tones of other people’s voices, making you a better negotiator.
We don’t know what hurricanes look like with this new weather. Yet.
Things that inspired this story: High-Frequency Trading; Flash Crash; GAN-generated products; reputational markets.