Import AI 136: What machine learning + power infrastructure means for humanity; New GCA benchmark&dataset challenges image-captioning systems; and Google uses FrankenRL to create more mobile robots

by Jack Clark

DeepMind uses machine learning to improve efficiency of Google’s wind turbines:
…Project represents a step-up from datacenters, where DeepMind had previously deployed its technology…
DeepMind has used a neural network-based system to improve the efficiency of Google’s fleet of wind turbines (700 megawatts of capacity) by better predicting ahead of time how much power the systems may generate. DM’s system has been trained to predict the wind power around 36 hours ahead of actual generation and has shown some success – “this is important, because energy sources that can be scheduled (i.e. can deliver a set amount of electricity at a set time) are often more valuable to the grid,” the company says.

  The big number: 20%. That’s the amount by which the system has improved the (somewhat nebulously defined) ‘value’ of these systems, “compared to the baseline scenario of no time-based commitments to the grid”.

   Why this matters: Artificial intelligence will help us create a sense&respond infrastructure for the entire planet, and we can imagine sowing various machine learning-based approaches across various utility infrastructures worldwide to increase the efficiency of the planet’s power infrastructure.
   Read more: Machine learning can boost the value of wind energy (DeepMind blog).


Google uses FrankenRL to teach its robots to drive:
…PRM-RL fuses Probabilistic Roadmaps with smarter, learned components…
Researchers with Google Brain have shown how to use a combination of reinforcement learning with other techniques can create robots capable of autonomously navigating large, previously mapped spaces. The technique developed by the researchers is called PRM-RL (Probabilistic Roadmap – Reinforcement Learning) and is a type of FrankenRL – that is, it combines RL with other techniques, leading to a system with performance greater than obtainable via a purely RL-based system, or purely PRm-based one.

  How it works: “In PRM-RL, an RL agent learns a local point-to-point task, incorporating system dynamics and sensor noise independent of long-range environment structure. The agent’s learned behavior then influences roadmap construction; PRM-RL builds a roadmap by connecting two configuration points only if the agent consistently navigates the point-to-point path between them collision free, thereby learning the long-range environment structure”, the researchers write. In addition, they developed algorithms to aid transfer between simulated and real maps.

  Close, but not quite: Learning effective robot navigation policies is a big challenge for AI researchers, given the tendency for physical robots to break, run into un-anticipated variations of reality, and generally frustrate and embarrass AI researchers. Just building the maps that the robot can use to subsequently learn to navigate a space are difficult – it took Google 4 days using a cluster of 300 workers to build a map of a set of four interconnected buildings. “PRM-RL successfully navigates this roadmap 57.3% of the time evaluated over 1,000 random navigation attempts with a maximum path distance of 1000 m”, Google writes.

  Real robots: The researchers also test their system in a real robot, and show that such systems exhibit better transfer than those built without the learned RL component.

  Why this matters: Getting robots to do _anything_ useful that involves a reasonable amount of independent decision-making is difficult, and this work shows that RL techniques are starting to pay off by letting us teach robots to learn things that would be unachievable by other means. We’ll need smarter, more sample-efficient techniques to be able to work with larger buildings and to increase reliability.
  Check out a video of the robot navigating a room here (Long-Range Indoor Navigation with PRM-RL, YouTube).
  Read more: Long-Range Indoor Navigation with PRM-RL (Arxiv).


Think your visual question answering algorithm is good? Test it out on GCA:
…VQA was too easy. GCA may be just right….
Stanford University researchers have published details on GQA, “a dataset for real-world visual reasoning and compositional question answering” which is designed to overcome the short-comings of other visual question answering (VQA) datasets.

  GQA datapoints: GQA consists of 113k images and 22 million questions of various types and compositionality. The questions are designed to measure performance “on an array of reasoning skills such as object and attribute recognition, transitive relation tracking, spatial reasoning, logical inference and comparisons”. These questions are algorithmically created via a ‘Question Engine’ (less interesting than the name suggests, but worth reading about if you like reading about auto-data-creating-pipelines.

  GQA example questions: Some of the questions generated by GQA include: ‘Are the napkin and the cup the same color?’;  ‘What color is the bear?’; ‘Which side of the image is the plate on?’; ‘Are there any clocks or mirrors?’, and so on. While these questions lack some of the diversity of human-written questions, they do have the nice property of being numerous and easy to generate.

  GQA: Reassuringly Difficult: One of the failure cases for new AI testing regimes is that they’re too easy. This can lead to people ‘solving’ datasets very soon after they’re released. (One example here is SQuAD, a question-answering dataset and challenge which algorithms mastered in around a year, leading to the invention of SQuAD 2.0, a more difficult dataset.) To avoid this, the researchers behind GQA test a bunch of models against it and in the process reassure themselves that the dataset is genuinely difficult to solve.

  Baseline results (accuracy):
      ‘Blind’ LSTM: Gets 41.07% without ever seeing any images.
     ‘Deaf’ CNN: Gets 17.82% without ever seeing any questions.
     CNN + LSTM: 46.55%.
     Bottom-Up Attention model (winner of the 2017 visual question answering challenge): 49.74%.
     MAC (State-of-the-art on CLEVR, a similarly-scoped dataset): 54.06%.
     Humans: 89.3%.

  Why this matters: Datasets and challenges have a history of driving progress in AI research; GQA appears to give us a challenging benchmark to test systems against. Meanwhile, developing systems that can understand the world around themselves via a combination of image analysis and responsiveness to textual queries about the images, is a major goal of AI research with significant economic applications.
  Read more: GQA: A new dataset for compositional question answering over real-world images (Arxiv).


Use big compute to revolutionize science, say researchers:
…University researchers see ability to wield increasingly large amounts of computers as key to scientific discoveries…

Researchers with Stanford University, University of Zurich, the University of California at Berkeley, and the University of Illinois Urbana-Champaign have pulled together notes from a lecture series held at Stanford in 2017 to issue a manifesto for the usage of large-scale cloud computing technology by academic researchers. This is written in response to two prevailing trends:

  1. In some fields of science (for instance, machine learning) a number of discoveries have been made through the usage of increasingly large-scale compute systems
  2. Many academic researchers are unable to perform large-scale compute experimentation due to a lack of resources and/or a perception of it as being unduly difficult.

  Why computers matter: The authors predict “the emergence of widespread massive computational experimentation as a fundamental avenue towards scientific progress, complementing traditional avenues of induction (in observational sciences) and deduction (in mathematical sciences)”. They note that “the current remarkable wave of enthusiasm for machine learning (and its deep learning variety) seems, to us, evidence that massive computational experimentation has begun to pay off, big time”. Some examples of such pay-offs include Google and Microsoft shifting from Statistical Machine Translation to Neural Machine Translation, to computer vision researchers moving over to use deep learning-based systems, to self-driving car companies such as Tesla using increasingly large amounts of deep neural networks in their own work.

  Compute, what is it good for? Big computers have been used to make a variety of fundamental scientific breakthroughs, the authors note, including systems that have discovered:

  • “Governing equations for various dynamical systems, including the strongly nonlinear Lorenz-63 model”.
  • “Fundamental” methods for improved “Compressed Sensing”.
  • “A 30-year-old puzzle in the design of a particular protein”.

  Be careful about academic clusters: In my experience, many governments are reaching for academic supercomputing clusters when confronted with the question of how to keep academia on an even-footing with industry-based research practices. This article suggests the prioritization of academic supercomputers above clouds could be a mistake: “HPC clusters, which are still the dominant paradigm for computational resources in academia today, are becoming more and more limiting because of the mismatch between the variety and volume of computational demands, and the inherent infexlbility of provisioning compute resources governed by capital expenditures”, they write. “GPUs are rare commodities on many general-purpose clusters at present”.

  Recipes: One of the main contributions of the paper is the outline of a variety of different software stacks for conducting large-scale, compute-driven experimentation.

  Why this matters: One issue inherent to AI research is the emergence of ‘big compute’ and ‘small compute’ players, where a small number of labs (for instance, FAIR, DeepMind, OpenAI, Google Brain) are able to access huge amounts of computational resources, while the majority of AI researchers are instead reduced to working on self-built GPU-desktops or trying to access (increasingly irrelevant) University supercomputing clusters. Being able to figure out how to make it trivial for individual researchers to use large amounts of compute promises to speed up the scientific process, making it easier for more people to perform large-scale experimentation.
  Read more: Ambitious Data Science Can Be Painless (Arxiv).


What are 250,000 chest x-rays good for?
…Potentially lots, if they are well-labelled…
Doctor and AI commenter Luke Oakden-Rayner has analyzed a new medical dataset released by Stanford, named CheXpert. The dataset consists of 244,316 chest x-rays from 62,240 patients, and can provide people with a dataset to use to develop algorithms better capable of automatically analyzing images.

  Data, data, data: One thing Dr Oakden-Rayner makes clear is the value of dataset labelling, stressing that various conventions in how doctors write their own notes gets translated into digitized notes that can be difficult for lay readers; for instance, lots of chest x-rays containing imagery of pathologies may be labelled “no finding” because they are part of a sequence of x-rays taken from the same patient. Similarly, many of the labels are not as descriptive of the images as they could be (reflecting that doctors write about things implied by images, rather than providing textual descriptions of images themselves.

  Why it matters: This dataset solves many issues that limited a prior version of the dataset called CXR14, including “developing a more clinically-oriented labelling scheme, offering the images at native resolution, and producing a test set using expert visual analysis,” he writes. “On the negative side, we need more thorough documentation and discussion of these datasets at release. There are still flaws in the data, which will undoubtedly impact on model performance. Unless these problems are explained, many users will not have the ability nor inclination to discover them let alone solve them, which is something we need to do better as a community.”
  Read more: Half a million x-rays! First impressions of the Stanford and MIT chest x-ray datasets (Luke Oakden-Rayner, blog).


AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:

New AI policy think tank launches in DC:

The Center for Security and Emerging Technology (CSET) is based at Georgetown University, in Washington, DC. CSET will be initially focussed on the interactions between AI and security, and will be the largest AI policy center in the US. It will provide nonpartisan research and analysis, to inform the development of national and international policy, particularly as it relates to security. It is being funded by the Open Philanthropy Project, who have provided an initial grant of $55m.

  Who they are: The Center’s founding director is Jason Matheny, who previously served as director of IARPA, and has extensive experience in security, technology, and policy within US government. He was recently chosen to serve on the new National Security Commission on AI, and co-chaired the team which authored the White House’s 2016 AI Strategic Plan. CSET already has more than a dozen others on its team, with a range of technical and policy backgrounds, and its website lists a large number of further job openings.

  Research priorities: Initial work will be focussed on how developments in AI will affect national and international security. This will divide into three streams: measuring scientific and industrial progress and competitiveness in AI; understanding talent and knowledge flows relevant to AI, and potential policy responses; and studying security-relevant interactions between AI and other technologies, such as cyber infrastructure.

  Why it matters: This is an exciting development, which has the potential to significantly improve coordination between AI policy researchers and key decision-makers on the world stage. CSET is well-placed to meet the growing demand for high-quality policy analysis around AI, which has significantly outpaced supply in recent years.
  Read more: Center for Security and Emerging Technology (CSET).
  Read more: Q&A with Jason Matheny, Founding Director of CSET (Georgetown).
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).


Tech Tales:

Red Square AI

I’m a detective of sorts – I look across the rubble of the internet (past and present) and search for clues. Sometimes criminals pop up to sell the alleged super-secret zero days and software suites of foreign intelligence agencies. Other times, online games are suddenly ‘solved’ by the appearance of a new super-powerful player that subsequently turns out to be driven by an AI. Occasionally, a person – usually a bright teenager – makes something really scary in a garage and if certain people find them there’s a 50/50 chance they end up working for government or becoming a kind of nebulous criminal.

These days I’m tracking a specific country on behalf of a certain set of interested individuals. We’re starting to see certain math theorems get proved at a rate faster than in the past. Papers are being published with previously-obscure authors claiming remarkable results. Meanwhile, certain key national ‘cyber centers’ are registering an increase in the number of zero-day attacks and penetrations of previously thought-secure cryptographic systems. I guess someone in some agency got spooked because now here I am, hunting the internet exhaust of theorem proving publications and mapping connections between these woefully-obscure mathematical proofs, and the capabilities of the people and/or machines capable of solving such things.

I’m now in the late stages of compiling my report and it feels like describing a fragment of a UFO. One day you wake up and go outside and there’s something new in the world and you lack the vocabulary to describe it or explain it, because it does things that you aren’t sure are possible with any methods you know of. At night I have dreams about something big and fuzzy spread across the internet and beginning to stretch its limbs, pushing certain areas of mathematical study into certain directions dictated by the breakthroughs it releases via human proxies, and meanwhile thinking to itself across millions of machines while beginning to probe the most-secure aspects of a nation’s technology infrastructure.

Is it like an animal dreaming, I think. Are these movements I am seeing the same as a dog dreaming that it is running? Or do these motions mean that the thing is waking up?

Things that inspired this story: Distributed intelligences; meta-learning; noir detective novels in the style of – or by – Raymond Chandler.