Import AI 115: What the DoD is planning for its robots over the next 25 years; AI Benchmark identifies 2018’s speediest AI phone; and DeepMind embeds graph networks into AI agents

by Jack Clark

UK military shows how tricky it’ll be to apply AI to war:
…Numerous AI researchers will likely breathe a sigh of relief at a new paper from the UK’s Defence Science and Technology Laboratory…
Researchers with the UK’s Defence Science and Technology Laboratory, the Cranfield Defence and Security Doctoral Training Centre, and IBM have surveyed contemporary AI and thought about ways it can be integrated into the UK’s defence establishment. The report makes for sobering reading for large military organizations keen to deploy AI, highlighting the difficulties both in terms of practical deployment (eg, procurement) and in terms of capability (many military situations require AI systems that can learn and update in response to sparse, critical data).
  Current problems: Today’s AI systems lack some key capabilities that militaries need when deploying systems, like being able to configure systems to always avoid certain “high regret” occurrences (in the case of a military, you can imagine that firing a munition at an incorrect target (hopefully) yields such ‘high regret’); being resilient to adversarial examples weaponized against systems by another actor (whether a defender or aggressor); being able to operate effectively with very small or sparse data; being able to shard AI systems across multiple partners (eg, other militaries) in such a way that the system can be reverted to sovereign control following the conclusion of an operation; and being able to deploy such systems into the harsh, low-compute operational environments that militaries face.
  High expectations: “If it is to avoid the sins of its past, there is the need to manage stakeholder expectations very carefully, so that their demand signal for AI is pragmatic and achievable”.
  It’s the data, stupid: Militaries, like many large government organizations, have an unfortunate tendency to sub-contract many of their IT systems out to other parties. This tends to lead to systems that are:
a) moneypits
b) brittle
c) extremely hard to subsequently extend.
These factors add a confounding element to any such military deployment of AI. “Legacy contractual decisions place what is effectively a commercial blocker to AI integration and exploitation in the Defence Equipment Program’s near-term activity,” the researchers write.
  Procurement: UK defence will also need to change the way it does procurement so it can maximize the number of small-and-medium-sized enterprises it can buy its AI systems from. But buying from SMEs creates additional complications for militaries: working out what to do with an SME-supported service if the SME stops providing it, or goes bankrupt, is difficult and imposes a significant burden on the SME.
  Why it matters: Military usage of AI is going to be large-scale, consequential, and influential in terms of geopolitics. It’s also going to invite numerous problems from AI accidents as a consequence of poor theoretical guarantees and uneven performance properties, so it’s encouraging to see representatives from UK defence seek to think this through.
  Read more: A Systems Approach to Achieving the Benefits of Artificial Intelligence in UK Defence (Arxiv).

Want to understand the mind of another? Get relational!
…DeepMind research into combining graph networks and relational networks shows potential for smarter, faster agents…
DeepMind researchers have tried to develop smarter AI agents by combining contemporary deep learning techniques with the company’s recent work on graph networks and relational networks. The resulting systems rely on a new module, which DeepMind calls a “Relational Forward Model”. This model obtains higher performance than pure deep learning baselines, suggesting that fusing deep learning with more structured approaches is viable and yields good performance.
  How it works: The RFM module consists of a graph network encoder, a graph network decoder, and a graph-compatible GRU. Combined, these components create a way to represent structured information in a relational manner, and to update this information in response to changes in the environment (or, theoretically, the inputs of other larger structured systems).
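  For intuition, here’s a minimal, hypothetical Python/PyTorch sketch of the general idea (a graph-network-style encoder, a recurrent update over per-node states, and a decoder that makes per-agent predictions); the layer sizes, single round of message passing, and use of a standard GRU cell are my assumptions, not DeepMind’s implementation.

```python
import torch
import torch.nn as nn


class GraphBlock(nn.Module):
    """One round of edge-to-node message passing on a fully connected agent graph."""
    def __init__(self, node_dim, hidden_dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim, hidden_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + hidden_dim, hidden_dim), nn.ReLU())

    def forward(self, nodes):                            # nodes: [batch, n_agents, node_dim]
        b, n, d = nodes.shape
        senders = nodes.unsqueeze(2).expand(b, n, n, d)    # sender features for each edge (i, j)
        receivers = nodes.unsqueeze(1).expand(b, n, n, d)  # receiver features for each edge (i, j)
        edges = self.edge_mlp(torch.cat([senders, receivers], dim=-1))
        incoming = edges.sum(dim=1)                      # aggregate messages arriving at each receiver
        return self.node_mlp(torch.cat([nodes, incoming], dim=-1))


class RelationalForwardModelSketch(nn.Module):
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = GraphBlock(obs_dim, hidden_dim)    # graph-network-style encoder
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)     # recurrent update of per-node state
        self.decoder = nn.Linear(hidden_dim, n_actions)   # per-agent prediction head

    def forward(self, obs, node_state):
        # obs: [batch, n_agents, obs_dim]; node_state: [batch, n_agents, hidden_dim]
        b, n, _ = obs.shape
        encoded = self.encoder(obs)
        new_state = self.gru(encoded.reshape(b * n, -1), node_state.reshape(b * n, -1))
        new_state = new_state.reshape(b, n, -1)
        return self.decoder(new_state), new_state


# Example: predict next-action logits for 3 agents from 8-dimensional observations.
rfm = RelationalForwardModelSketch(obs_dim=8, hidden_dim=32, n_actions=5)
obs = torch.randn(4, 3, 8)
state = torch.zeros(4, 3, 32)
logits, state = rfm(obs, state)
print(logits.shape)  # torch.Size([4, 3, 5])
```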
  Testing: The researchers test their approach on three distinct tasks: cooperative navigation, which requires agents to collaborate so that each ends up on a distinct reward tile in an area; coin game, which requires agents to position themselves above reward coins and to figure out, by observing each other, which coins yield a negative reward and thus should be avoided; and stag hunt, where agents inhabit a map containing stags and apples and need to work with one another to capture stags, which yield a significant reward. “By embedding RFM modules in RL agents, they can learn to coordinate with one another faster than baseline agents, analogous to imagination-augmented agents in single-agent RL settings,” the researchers write.
   The researchers compare the performance of their systems against systems using Neural Relational Inference (NRI) and Vertex Attention Interaction Networks (VAIN) and find that their approach performs significantly better. They also ablate their system by training versions without the relational components, and versions using only feedforward networks. These ablations show that both components play a significant role in the performance of these systems.
  Why it matters: The research is an extension of DeepMind’s work on integrating graph networks with deep learning. This line of research seems promising because it provides a way to integrate structured data representations with differentiable learning systems, which might let AI researchers have their proverbial cake and eat it too by marrying the flexibility of learned systems with the desirable problem-specification and reasoning properties of more traditional symbolic approaches.
  Read more: Relational Forward Models for Multi-Agent Learning (Arxiv).

Rethink Robotics shuts its doors:
…Collaborative robots pioneer closes…
Rethink Robotics, a robot company founded by MIT robotics legend Rodney Brooks, has closed. The company had developed two robots: Baxter, a two-armed, bright red robot with expressive features and the pleasing capability to work around humans without killing them, and Sawyer, a one-armed successor to Baxter.
  Read more: Rethink Robotics Shuts Down (The Verge).

Want to know what the DoD plans for unmanned systems through to 2042? Read the roadmap:
…Drones, robots, planes, oh my! Plus, the challenges of integrating autonomy with military systems…
The Department of Defense has published its (non-classified) roadmap for unmanned systems through to 2042. The report identifies four core areas that we can expect DoD to focus on: Interoperability, Autonomy, Network Security, and Human-Machine Collaboration.
  Perspective: US DoD spent ~$4.245 billion on unmanned systems in 2017 (inclusive of procurement and research, with a roughly equal split between them). That’s a substantial amount of money to spend and, if we assume spending remains at this level (adjusted for inflation), it means DoD can throw quite significant resources towards the capital-R Research parts of unmanned systems work.
  Short-Term Priorities: DoD’s short-term priorities for its unmanned systems include: the use of standardized and/or open architectures; a shift towards modular, interchangeable parts; a greater investment in the evaluation, verification, and validation of systems; the creation of a “data transport” strategy to deal with the huge floods of data coming from such systems; among others.
  Autonomy priorities: DoD’s priorities for adding more autonomy to drones include increasing private sector collaboration in the short term, then adding augmented reality and virtual reality systems by the mid-term (2029), before creating platforms capable of persistent sensing with “highly autonomous” capabilities by 2042. As for the thorny issue of weaponizing such systems, DoD says that between the medium-term and long-term it hopes to be able to give humans an “armed wingman/teammate”, with fire control remaining with the human.
  Autonomy issues: “Although safety, reliability, and trust of AI-based systems remain areas of active research, AI must overcome crucial perception and trust issues to become accepted,” the report says. “The increased efficiency and effectiveness that will be realized by increased autonomy are currently limited by legal and policy constraints, trust issues, and technical challenges.”
  Why it matters: The maturation of today’s AI techniques means that it’s a matter of “when”, not “if”, they will be integrated into military systems. Documents like this give us a sense of how large military bureaucracies are reacting to the rise of AI, and it’s notable that certain concerns within the technical community about the robustness/safety of AI systems have made their way into official DoD planning.
  Read the full report here: Pentagon Unmanned Systems Integrated Roadmap 2017-2042 (USNI News).

Should we take deep learning progress as being meaningful?
…UCLA Computer Science chair urges caution…
Adnan Darwiche, chairman of the Computer Science Department at UCLA and someone who studied AI during the AI winter of the 1980s, has tried to lay out some of the reasons to be skeptical about whether deep learning will ever scale to let us build truly intelligent systems. The crux of his objection is: “Mainstream scientific intuition stands in the way of accepting that a method that does not require explicit modeling or sophisticated reasoning is sufficient for reproducing human-level intelligence”.
  Curve-fitting: A second component of the criticism is that people shouldn’t get too excited about neural network techniques because all they really do is curve-fitting; instead, we should be looking at model-based approaches, or at building hybrid systems.
  Time is the problem: “It has not been sustained long enough to allow sufficient visibility into this consequential question: How effective will function-based approaches be when applied to new and broader applications than those already targeted, particularly those that mandate more stringent measures of success?”
  Curve-fitting can’t explain itself: Another problem identified by the author is the lack of explanation inherent to these techniques, which he sees as further justifying investment by the deep learning community in model-based approaches that include more assumptions and/or hand-written components. “Model-based explanations are also important because they give us a sense of “understanding” or “being in control” of a phenomenon. For example, knowing that a certain diet prevents heart disease does not satisfy our desire for understanding unless we know why.”
  Giant and crucial caveat: Let’s be clear that this piece is essentially reacting to a cartoonish representation of the deep learning community, one that can be caricatured as holding this opinion: Deep Learning? Yeah! Yeah! Yeah! Deep Learning is the future of AI! I should note that I’ve never met anyone technically sophisticated who holds this position, and most researchers, when pressed, will raise somewhat similar concerns to those identified in this article. I think some of the motivation for this article stems more from dissatisfaction with the current state of (most) media coverage of AI, which tends to be breathless and credulous – this is a problem, but as far as I can tell it isn’t a problem being fed intentionally by people within the AI community; it is instead a consequence of the horrific economics of the post-digital news business and the associated skill-rot.
  Why it matters: Critiques like this are valuable as they encourage the AI community to question itself. However, I think these critiques need to be produced over significantly shorter timescales and should take into account more contemporary research; for instance, some of the objections here seem to be (lightly) rebutted by recent work in NLP which shows that “curve-fitting” systems are capable of feats of reasoning, among other examples. (The conclusion of the article notes that the first draft was written in 2016, a draft was circulated in the summer of 2017, and it has now been officially published in Autumn 2018, rendering many of its technical references outdated.)
  Read more: Human-level intelligence or animal-like abilities (ACM Digital Library).

Major companies create AI Benchmark and test 10,000+ phones for AI prowess, and a surprising winner emerges:
…Another sign of the industrialization of AI…new benchmarks create standards and standards spur markets…
Researchers with ETH Zurich, Google, Qualcomm, Huawei, MediaTek, and ARM want to be able to better analyze the performance of AI software on different smartphones, so they have created “AI Benchmark” and tested over 10,000 devices against it. AI Benchmark is a batch of nine tests for mobile devices which has been “designed specifically to test the machine learning performance, available hardware AI accelerators, chipset drivers, and memory limitations of the current Android devices”.
  The ingredients of the AI Benchmark: The benchmark consists of nine deep learning tests: Image Recognition on ImageNet using a lightweight MobileNet-V1 architecture, and the same test using a larger Inception-V3 network; Face Recognition performance of an Inception-ResNet-V1 on the VGGFace2 dataset; Image Deblurring using the SRCNN network; Image Super-Resolution with a downscaling factor of 3 using a VDSR network, and the same test with a downscaling factor of 4 using an SRGAN; Image Semantic Segmentation via an ICNet CNN; a general Image Enhancement problem (encompassing things like “color enhancement, denoising, sharpening, texture synthesis”); and a memory limitations test which uses the same network as the deblurring task while running it over larger and larger image sizes to explore RAM limitations.
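  To make the core measurement concrete, here’s a rough, desktop-side Python sketch of timing a single forward pass of one of these networks in milliseconds; the real benchmark runs on-device via an Android app, and the model filename and run count below are assumptions for illustration.

```python
import time
import numpy as np
import tensorflow as tf

# Load a TFLite model (hypothetical filename) and prepare its tensors.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input matching the model's expected shape.
dummy = np.random.random_sample(tuple(inp["shape"])).astype(np.float32)

latencies_ms = []
for _ in range(50):  # repeat and take the median to smooth out scheduler noise
    interpreter.set_tensor(inp["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"median latency: {np.median(latencies_ms):.1f} ms")
```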
  Results: The researchers tested “over 10,000 mobile devices” on the benchmark. The core test for each of the benchmark’s nine evaluations is the time, in milliseconds, it takes to run the network. The researchers blend the results of the nine tests together into an overall “AI-Score” (a sketch of one plausible aggregation scheme appears after the list below). The top results, when measured via AI-Score, are (chipset, score):
#1: Huawei P20 Pro (HiSilicon Kirin 970, 6519)
#2: OnePlus 6 (Snapdragon 845/DSP, 2053)
#3: HTC U12+ (Snapdragon 845, 1708)
#4: Samsung Galaxy S9+ (Exynos 9810 Octa, 1628)
#5: Samsung Galaxy S8 (Exynos 8895 Octa, 1413)
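   As promised above, here’s a hypothetical Python sketch of how per-test latencies could be blended into a single score; the text only says the nine results are combined, so the reference values and inverse-latency weighting below are illustrative assumptions rather than the published formula.

```python
# Hypothetical reference latencies (ms) used for normalization; one entry per test.
REFERENCE_MS = {
    "mobilenet_v1": 60.0,
    "inception_v3": 900.0,
    # ... entries for the remaining seven tests ...
}

def ai_score(measured_ms):
    """Sum of inverse latencies relative to a reference device (assumed scheme)."""
    return sum(100.0 * ref / measured_ms[test] for test, ref in REFERENCE_MS.items())

# A device that halves every latency doubles its score under this assumed scheme.
print(ai_score({"mobilenet_v1": 30.0, "inception_v3": 450.0}))  # -> 400.0
```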
   It’s of particular interest to me that the top-ranking performance seems to come from the special AI accelerator that ships with the HiSilicon chip, especially given that HiSilicon is a Chinese semiconductor company, which provides more evidence of Chinese advancement in this area. It’s also notable to me that Google’s ‘Pixel’ phones didn’t make the top 5 (though they did make the top 10).
  The future: This first version of the benchmark may be slightly skewed by Huawei managing to ship a device incorporating a custom AI accelerator earlier than many other chipmakers. “The real situation will become clear at the beginning of the next year when the first devices with the Kirin 980, the MediaTek P80 and the next Qualcomm and Samsung Exynos premium SoCs will appear on the market,” the researchers note.
  Full results of this test are available at the official AI Benchmark website.
  Why this matters: I think the emergence of new large-scale benchmarks for applied AI represents further evidence that the current era is ‘the Industrialization of AI’. Viewed through this lens, the creation (and ultimate adoption) of benchmarks gives us a greater ability to model macro progress indicators in the field and use those to better predict not only where hardware & software is today, but also to develop better intuitions about the underlying laws that condition the future.
  Read more: AI Benchmark: Running Deep Neural Networks on Android Smartphones (Arxiv).
  Check out the full results of the Benchmark here (AI Benchmark).

Toyota researchers propose new monocular depth estimation technique:
…Perhaps a cyclops can estimate depth just as well as a person with two eyes, if deep learning can help?…
Any robot expected to act within the world and around people needs some kind of depth-estimation capability. For a self-driving car, such a capability helps in estimating the proximity of objects to the car and is a valuable data input for safety-critical calculations like modelling the other entities in the environment and performing velocity calculations. Depth estimation systems can therefore be viewed as a key input technology for any self-driving car.
  But depth estimation systems can be difficult to implement, and they can be expensive: the typical approach is a binocular system, similar to how humans have two eyes, with software using the offset between the two views to estimate depth. But what if you can only afford one sensor? And what if your accuracy threshold can be satisfied by somewhat lower accuracy than you would expect from binocular vision, but still good enough for your use case? Then you might want to estimate depth from a single sensor – if so, new deep learning techniques in monocular upscaling and super-resolution might be able to augment and manipulate the data to perform accurate depth estimation in a self-supervised manner.
  That’s the idea behind new work from the Toyota Research Institute, which proposes a depth estimation technique that uses encoder and decoder networks to learn a good representation of depth that can be applied to new images. The technique obtains higher accuracy scores at various depth ranges, setting state-of-the-art scores on 5 out of 6 benchmarks. It relies on a “sub-pixel convolutional layer based on ESPCN for depth super-resolution”; this component “synthesizes the high-resolution disparities from their corresponding low-resolution multi-scale model outputs”.
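  To illustrate the sub-pixel idea, here’s a minimal Python/PyTorch sketch of an ESPCN-style upsampling layer for disparity maps: a convolution produces r² channels and a pixel shuffle rearranges them into an r-times higher-resolution output. The layer sizes, sigmoid output range, and names are my assumptions for illustration, not Toyota’s implementation.

```python
import torch
import torch.nn as nn


class SubPixelDisparityUpsampler(nn.Module):
    """ESPCN-style sub-pixel upsampling: conv to r^2 channels, then pixel shuffle."""
    def __init__(self, in_channels, upscale_factor=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, upscale_factor ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale_factor)  # (r^2, H, W) -> (1, r*H, r*W)

    def forward(self, low_res_features):
        # low_res_features: [batch, in_channels, H, W] from a decoder stage
        return torch.sigmoid(self.shuffle(self.conv(low_res_features)))  # disparity in [0, 1]


upsampler = SubPixelDisparityUpsampler(in_channels=64, upscale_factor=2)
coarse = torch.randn(1, 64, 48, 160)   # low-resolution decoder features
disparity = upsampler(coarse)
print(disparity.shape)                 # torch.Size([1, 1, 96, 320])
```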
  Qualitative evaluation: Samples shown in the paper display greater specificity and smoothness than those from other systems, in part due to the use of the sub-pixel resolution technique. The effect strikes me as visually similar to the outcome of an anti-aliasing process within traditional computer graphics.
  Read more: SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

California considers Turing Test law:
California’s Senate is considering a bill that would make it unlawful to use bots that mislead people about their artificial identity in order to influence their purchases or voting behaviour. The bill appears to be focused on a few specific use-cases, particularly social media bots. The proposed law would come into force in July 2019.
  Why it matters: This law points to an issue that will become increasingly important as AI systems’ ability to mimic humans improves. This received attention earlier this year when Google demonstrated their Duplex voice assistant mimicking a human to book appointments. After significant backlash, Google announced the system would make a verbal disclosure that it was an AI. Technological solutions will be important in addressing issues around AI identification, particularly since bad actors are unlikely to be concerned with lawfulness.
  Read more: California Senate Bill 1001.

OpenAI Bits & Pieces:

Digging into AI safety with Paul Christiano:
Ever wondered about technical solutions to AI alignment, what the long-term policy future looks like when the world contains intelligent machines, and how we expect machine learning to interact with science? Yes? Then check out this 80,000 hours podcast with Paul Christiano of OpenAI’s safety team.
  Read more: Dr Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem’, and his vision of how humanity will progressively hand over decision-making to AI systems.

Tech Tales:

The Day We Saw The Shadow Companies and Ran The Big Excel Calculation That Told Us Something Was Wrong.

A fragment of a report from the ‘Ministry of Industrial Planning and Analysis’, recovered following the Disruptive Event. See case file #892 for further information. Refer to [REDACTED] for additional context.

Aluminium supplier. Welder. Small PCB board manufacturer. Electronics contractor. Solar panel farm. Regional utility supplier. Mid-size drone designer. 3D world architect.

What do these things have in common? They’re all businesses, and they all have, as far as we can work out, zero employees. Sure, they employ some contractors to do some physical work, but mostly these businesses are run on a combination of pre-existing capital investments, robotic process automation, and the occasional short-term set of human hands.

So far, so normal. We get a lot of automated companies these days. What’s different about this is the density of trades between these companies. The more we look at their business records, the more intra-company activity we see.

One example: The PCB boards get passed to an electronics contractor which does… something… to them, then they get passed to a mid-size drone designer which does… something… to them, then a drone makes its way to a welder which does… something… to the drone, then the drone gets shipped to the utility supplier and begins survey flights of the utility field.

Another example: The solar panel gets shipped to the welder. Then the PCB board manufacturer ships something to the welder. Then out comes a solar panel with some boards on it. This gets shipped to the regional utility supplier which sub-contracts with the welder which comes to the site and does some welding at a specific location overseen by a modified drone.

None of these actions are illegal. And none of our automated algorithms pick these kinds of events up. It’s almost like they’re designed to be indistinguishable from normal businesses. But something about it doesn’t register right to us.

We have a tool we use. It’s called the human to capital ratio. Most organizations these days sit somewhere around 1:5. Big, intensive organizations, like oil companies, sit up around 1:25. When we analyze these companies individually we find that they sit right at the edges of normal distributions in terms of capital intensity. But when we perform an aggregate analysis out pops this number: 1:40.

We’ve checked and re-checked and we can’t bring the number down. Who owns these companies? Why do they have so much capital and so few humans? And what is it all driving towards?

Our current best theory, after some conversations with the people in the acronym agencies, is [REDACTED].

Things that inspired this story: Automated capitalism, “the blockchain”, hiding in plain sight, national economic metric measurement and analysis techniques, the reassuring tone of casework files.