Import AI

Import AI 134: Learning old tricks on new robots; Facebook improves translation with noise; Google wants people to train fake-audio detectors

Why robots are the future of ocean maintenance:
…Robot boats, robot copters, and robot underwater gliders…
Researchers with Oslo Metropolitan University and the Norwegian University of Science and Technology are trying to reduce the cost of automated sub-sea data collection and surveillance operations through the use of robots. They have published a paper outlining one of the key components needed to build this system – a cheap, lightweight way to get small sub-surface gliders back to the surface.

  Weight rules everything around me: The technical innovations here involve simplifying the design to reduce the number of components needed to build a pressure-tolerant miniature underwater glider (MUG), which in turn reduces the weight of the system, making it easier to deploy and recover the gliders via drones.

“Further development will add the ability to adjust pitch and yaw, improve power efficiency, add GPS and environmental sensors, as well as UAV deployment/recovery strategies”, they write.

  Why this esoteric non-AI-heavy paper matters: This paper is mostly interesting for the not-too-distant future it portends: one where robot boats patrol the oceans, releasing underwater gliders to gather information about the environment, and serving as a homebase for drones that can collect the gliders and ferry them back to the robot boat, while also acting as airborne antennas to relay radio signals between the boats and the gliders. Now, just imagine what you’ll be able to do with these systems once we get cheaper, more powerful computers and better autonomous control & analysis AI systems to deploy onto them – the future is a world full of robots, sensing and responding to minute fluctuations in the environment.

   Read more: Towards autonomous ocean observing systems using Miniature Underwater Gliders with UAV deployment and recovery capabilities (Arxiv).

+++

Sponsored: The O’Reilly AI Conference – New York, April 15–18:

…What do you need to know about AI? From hardware innovation to advancements in machine learning to developments in ethics and regulation, join leading experts with the insight you need to see where AI is going–and how to get there first.
Register soon. Early price ends March 1st, and space is limited. Save up to $800 on most passes with code IMPORTAI20.

+++

DeepMind shows how to teach new robots old tricks:
…Demonstrates prowess of SAC-X + augmented data approach via completion of a hard simulated and real world robotics task…
Researchers with DeepMind are going backwards in time – after using reinforcement learning to solve a series of Atari games a few years ago, they’re now heading to the beginning of the 20th century, as they try to teach robots to place a ball on a string inside a wooden cup. This is a challenging, worthwhile task for real-world robotics: it involves complex movement policies, requires predicting the movements of the ball, and demands a decent interplay between perception and action.

  How they do it: To solve this, DeepMind uses an extension of its Scheduled Auxiliary Control (SAC-X) algorithm, which lets them train across multiple tasks with multiple rewards. Their secret to solving the tasks robustly on physical robots is to use additional data at training time, where the goal is to “simultaneously learn control policies from both feature-based representation and raw vision inputs in the real-world – resulting in controllers that can afterwards be deployed on a real robot using two off-the-shelf cameras”.
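  A toy sketch of that structure: the snippet below is my own placeholder code, not DeepMind’s – it shows a scheduler picking a task per episode, a shared replay buffer that stores rewards for every task, and the spot where each task’s policy would then be updated off-policy.

```python
# Toy sketch of the SAC-X structure described above (my own code, with a
# placeholder environment, placeholder tasks, and random "policies" standing
# in for DeepMind's learned networks). The key ideas: a scheduler picks which
# task controls the robot for an episode, every transition is stored once with
# rewards for *all* tasks, and each task learns from the shared data off-policy.
import random

TASKS = ["reach", "swing", "catch"]   # illustrative auxiliary + main tasks
replay_buffer = []                    # shared across all tasks

def env_step(action):
    """Stand-in for the robot/simulator: returns an observation plus a per-task reward dict."""
    observation = [random.random() for _ in range(4)]
    rewards = {task: random.random() for task in TASKS}
    return observation, rewards

def policy(task, observation):
    """Stand-in for the per-task policy network."""
    return random.uniform(-1.0, 1.0)

for episode in range(10):
    active_task = random.choice(TASKS)   # uniform scheduler (the 'SAC-U' variant)
    observation = [0.0] * 4
    for step in range(50):
        action = policy(active_task, observation)
        next_observation, rewards = env_step(action)
        replay_buffer.append((observation, action, rewards, next_observation))
        observation = next_observation
    # In the real algorithm, each task's actor and critic would now be updated
    # off-policy from replay_buffer using that task's own reward channel.
```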

   Results: They’re able to solve the task both in simulation and on a real robot, learning a robust, successful policy on the physical system: “The swing-up is smooth and the robot recovers from failed catches. With a brief evaluation of 20 runs, each trial running for 10 seconds, we measured 100% catch rate. The shortest catch time being 2 seconds.” They also tested the robot with a smaller cup to make the task more difficult – “there were a slight slow-down in learning and a small drop in catch rate to 80%, still with a shortest time to catch of 2 seconds,” they write. They’re able to learn the task on the real robot in about 28 continuous hours of training (so more like ~40 hours when you account for re-setting the experiment, etc).

  Why it matters: Getting anything to work reliably on a real robot is a journey of pain, frustration, pain, tedium, and – yes! – more pain. It’s encouraging to see SAC-X work in this domain, and it suggests that we’re figuring out better ways to learn things on real-world platforms.

  Check out the videos of the simulated and real robots here (Google Sites).
  Read more: Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup (Arxiv).

+++

Want better translation models? Use noise, Facebook says:
…Addition of noise can improve test-time performance, though it doesn’t help with social media posts…
You can improve the performance of machine translation systems by injecting some noise into the training data, according to Facebook AI Research. The result is models that are more robust to the sort of crappy data found in the real world, the researchers write.

  Noise methods: The technique uses four noise methods: deletions, insertions, substitutions, and swaps. Deletions are where the researchers delete a character in a sentence; insertions are where they insert a character into a random position; substitutions are where they replace a character with another random character, and swaps are where two adjacent characters change position.
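  In code: here’s a minimal sketch (mine, not Facebook’s) of applying the four character-level operations to a training sentence; the corruption probability and where inserted characters land are illustrative choices.

```python
# Minimal illustration of the four character-level noise operations described
# above: delete, insert, substitute, swap.
import random
import string

def add_char_noise(sentence: str, p: float = 0.05) -> str:
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if random.random() < p:
            op = random.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass                                             # drop the character
            elif op == "insert":
                out.extend([c, random.choice(string.ascii_lowercase)])
            elif op == "substitute":
                out.append(random.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])                    # swap adjacent characters
                i += 1
            else:
                out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)

print(add_char_noise("the quick brown fox jumps over the lazy dog", p=0.1))
```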

   Results: They test the approach on the IWSLT machine translation benchmark by injecting synthetic noise into the training data and measuring the BLEU scores of the resulting models on test data containing natural noise. “Training on our synthetic noise cocktail greatly improves performance, regaining between 20% (Czech) and 50% (German) of the BLEU score that was lost to natural noise,” they write.

  Where doesn’t noise help: This technique doesn’t help when trying to translate text from social media – this is because errors in social media text tend to stem from the content having a radically different writing and tonal style to what is traditionally seen in training sets, rather than from spelling errors.

  Observation: Conceptually, these techniques seem to have a lot in common with domain randomization, which is where people generate synthetic data designed to explore broader variations than would otherwise be found. Such techniques have been used for a few years in robotics work, and typically improve real world model performance by increasing the robustness to the significant variations introduced by reality.

  Why this matters: This is another example of the ways in which computers can be arbitraged for data: instead of needing to go and gather datasets with real-world faults, the addition of synthetic noise means you can instead algorithmically extend existing datasets through the augmentation of noisy data. The larger implication here is that computational resources are becoming an ever-more-significant factor in AI development.

   Read more: Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation (Arxiv).

+++

In the future, neural networks will be bred, not created:
…General-purpose population training for those who can afford it…
Population Based Training (PBT) is a recent invention by DeepMind that makes it possible to optimize the weights and hyperparameters of a set of neural networks by periodically copying the weights of the best performers and mutating their parameters. This is part of the broader trend of the industrialization of artificial intelligence, as researchers seek to create automated procedures for doing what was previously done by patient graduate students (eg, fiddling with weights of different networks, logging runs, pausing and re-starting models, etc).
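  The exploit/explore loop at the heart of PBT fits in a few lines. This toy version (mine, with a placeholder objective standing in for real training and evaluation) copies the weights and hyperparameters of strong workers into weak ones, then perturbs the hyperparameters.

```python
# Toy version of the PBT exploit/explore loop: weak workers copy the weights
# and hyperparameters of strong workers, then perturb the hyperparameters.
# "Training" and "evaluation" here are placeholders for real model updates.
import copy
import random

def make_worker():
    return {"weights": [random.gauss(0, 1) for _ in range(4)],
            "lr": 10 ** random.uniform(-4, -1),
            "score": 0.0}

def train_and_eval(worker):
    # Placeholder for a few steps of real training plus an evaluation.
    worker["weights"] = [w - worker["lr"] * w for w in worker["weights"]]
    worker["score"] = -sum(abs(w) for w in worker["weights"])

population = [make_worker() for _ in range(8)]
for generation in range(20):
    for worker in population:
        train_and_eval(worker)
    population.sort(key=lambda w: w["score"], reverse=True)
    top, bottom = population[:2], population[-2:]
    for weak in bottom:
        parent = random.choice(top)
        weak["weights"] = copy.deepcopy(parent["weights"])        # exploit
        weak["lr"] = parent["lr"] * random.choice([0.8, 1.25])    # explore

print("best score:", population[0]["score"])
```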

The DeepMind system was inspired by Google’s existing ‘Vizier’ service, which provides Google researchers with a system to optimize existing neural networks. In tests, population-based training can converge faster than other approaches, while utilizing hardware resources more efficiently, the researchers say.

  Results: “We conducted a case study of our system in WaveNet human speech synthesis and demonstrated that our PBT system produces superior accuracy and performance compared to other popular hyperparameter tuning methods,” they write. “Moreover, the PBT system is able to directly train a model using the discovered dynamic set of hyperparameters while traditional methods can only tune static parameters. In addition, we show that the proposed PBT framework is feasible for large scale deep neural network training”.

   Read more: A Generalized Framework for Population Based Training (Arxiv).

+++

Google tries to make it easier to detect fake audio:
…Audio synthesis experts attempt to secure world against themselves…
Google has created a dataset consisting of “thousands of phrases” spoken by its deep learning text-to-speech models. This dataset consists of 68 synthetic ‘voices’ across a variety of accents. Google will make this data available to participants in the 2019 ASVspoof challenge, which “invites researchers all over the globe to submit countermeasures against fake (or “spoofed”) speech, with the goal of making automatic speaker verification (ASV) systems more secure”.

   Why it matters: It seems valuable to have technology actors discuss the potential second-order effects of technologies they work on. It’s less clear to me that the approach of training increasingly more exquisite discriminators against increasingly capable generators has an end-state that is stable, but I’m curious to see what evidence competitions like this help generate regarding this.

   Read more: Advancing research on fake audio detection (Google blog).

+++

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Structural risks from AI:
The discussion of AI risk tends to divide downsides into accident risk and misuse risk. This obscures an important source of potential harms that fits into neither category, which the authors of a new Lawfare piece call structural risk.

  A structural perspective: Technologies can have substantial negative impacts in the absence of accidents or misuse, by shaping the world in important ways. For example, the European railroad system has been suggested as an important factor in the outbreak and scope of WWI, by enabling the mass transport of troops and weapons across the continent. A new technology could have a range of dangerous structural impacts – it could create dangerous safety-performance trade-offs, or winner-takes-all competition. The misuse-accident perspective focuses attention on the point at which a bad actor uses a technology for malicious ends, or a system acts in an unintended way. This can lead to an underappreciation of structural risks.

  AI and structure: There are many examples of ways in which AI could influence structures in a harmful way. AI could undermine stability between nuclear powers, by compromising second-strike capabilities and increasing the risk of pre-emptive escalation. Worries about AI’s impact on economic competition, the labour market, and civil liberties also fit into this category. Structures can themselves increase AI-related risks. Without mechanisms for international coordination, countries may be pushed towards sacrificing safety for performance in military AI.

  Policy implications: A structural perspective brings to light a much wider range of policy levers, and consideration of structural dynamics should be a focus in the AI policy discussion.

Drawing in more expertise from the social sciences is one way to address this, as these disciplines are more experienced in taking structural perspectives on complex issues. A greater focus on establishing norms and institutions for AI is also important, given the necessity of coordination between actors in solving structural problems.

  Read more: Thinking About Risks From AI: Accidents, Misuse and Structure (Lawfare).

Trump signs executive order on AI:
President Trump has signed an executive order, outlining proposals for a new ‘AI Initiative’ across government.

  Objectives: The memo gives six objectives for government agencies: to promote investment in R&D; improve access to government data; reduce barriers to innovation; develop appropriate technical standards; train the workforce; and to create a plan for protecting US advantage in critical technologies.

  Funding: Agencies are encouraged to treat AI R&D as a priority in budget proposals going forward, and to seek out collaboration with industry and other stakeholders. There is no detail on levels of funding, and it is unclear whether, or when, any new funds will be set aside for these efforts.

  Why it matters: The US government has been slow to formulate a strategy on AI, and this is an important step. As it stands, however, it is little more than a statement of intent; it remains to be seen whether this will translate into action. Without significant funding, this initiative is unlikely to amount to much. The memo also lacks detail on the ethical challenges of AI, such as ensuring benefits are equitably distributed, and risks are minimized.

  Read more: Executive Order on Maintaining American Leadership in Artificial Intelligence (White House).

+++

OpenAI Bits&Pieces:

GPT-2:
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.

Also in this release:
Discussion of the policy implications of releasing increasingly large AI models. This release triggered a fairly significant and robust discussion about GPT-2, increasingly powerful models, and appropriate methods for engaging the media and ML communities on topics like publication norms.

   Something I learned: I haven’t spent three or four days directly attached to a high-traffic Twitter meme/discussion before; I think the most I’ve ever had was a couple of one- or two-day bursts related to stories I wrote when I was a journalist, which have different dynamics. This experience of spending a lot of time on Twitter, enmeshed in a tricky conversation, made me a lot more sympathetic to various articles I’ve read about frequent usage of Twitter being challenging for mental health reasons. Something to keep in mind for the future!

   Read more: Better Language Models and Their Implications (OpenAI).

Tech Tales:

AGI Romance
+++ ❤ +++

It’s an old, universal thing: girl meets boy or boy meets girl or boy meets boy or girl meets girl or whatever; love just happens. It wells up out of the human heart and comes out of the eyes and seeks out its mirror in the world.

This story is the same as ever, but the context is different: The boy and the girl are working on a machine, a living thing, a half-life between something made by people and something that births itself.

They were lucky, historians will say, to fall in love while working on such an epochal thing. They didn’t even realize it at the time – after all, what are the chances that you meet your one-and-only while working on the first ever machine mind? (This is the nature of statistics – the unlikely things do happen, just very rarely, and to the people trapped inside the probability it can feel as natural and probable as walking.)

You know we’re just mechanics, she would say.
More like makeup artists, he would say.
Maybe somewhere in-between, she would say, looking at him with her green eyes, the blue of the monitor reflected in them.

You know I think it’s starting to do things, he would say.
I think you’re an optimist, she would say.
Anyone who is optimistic is crazy, he would say, when you look at the world.
Look around you, she would say. Clearly, we’re both crazy.

You know I had a dream last night where I was a machine, she would say.
You’re asleep right now, he would say. Wake up!
Tease, she would say. You’ll teach it bad jokes.
I think it’ll teach us more, he would say, filing a code review request.
Where did you learn to write code like this, she would say. Did you go to art school?

You know one day I think we might be done with this, he would say.
I’m sure Sisyphus said the same about the boulder, she would say.
We’re dealing with the bugs, he would say.
I don’t know what are bugs anymore and what are… it, she would say.
Listen, he would say. I trust you to do this more than anyone.

You know I think it might know something, she would say one day.
What do you mean, he would say.
You know I think it knows we like each other, she would say.
How can you tell, he would say.
When I smile at you it smiles at me, she would say. I feel a connection.
You know I think it is predicting what we’ll do, he would say.

You know I think it knows what love is, he would say.
Show me don’t tell me, she would say.

And that would be the end: after that there is nothing but infinity. They will disappear into their own history together, and then another story will happen again, in improbable circumstances, and love will emerge again: perhaps the only constant among living things is the desire to predict the proximity of one to another and to close that distance.

Things that inspired this story: Calm; emotions as a prism; the intimacy of working together on things co-seen as being ‘useful’; human relationships as a universal constant; relationships as a constant; the placid and endless and forever lake of love: O.K.

Import AI 133: The death of Moore’s Law means spring for chip designers; TF-Replicator lets people parallelize easily; and fighting human trafficking with the Hotels 50K dataset

Administrative note: A short issue this week as I’ve spent the past few days participating in an OECD working group on AI principles and then spending time at the Global Governance of AI Summit in Dubai.

The death of Moore’s Law means springtime for new chips, say long-time hardware researchers (one of whom is the chairman of Alphabet):
…Or: follow these tips and you may also make a chip 80X as cost-effective as an Intel or AMD chip…
General purpose computer chips are not going to get dramatically faster in the future as they are running into fundamental limitations dictated by physics. Put another way: we currently live in the twilight era of Moore’s Law, as almost five decades of predictable improvements in computing power give way to more discontinuous leaps in capability driven by the invention of specialized hardware platforms, rather than improvements in general-purpose chips.
  What does this mean? According to John Hennessy and David Patterson – who are responsible for some major inventions in computer architecture, like the RISC approach to processor design – today’s engineers have three main options to pursue when seeking to create chips of greater capability:
   – Rewrite software to increase performance: it’s 47X faster to do a matrix multiply in (well-optimized) C code than it is in Python (see the sketch after this list). You can further optimize this by adding in techniques for better parallelizing code (gets you a 366X improvement when paired with C); optimize the way the code interfaces to the physical memory layout of the computer(s) you’re dealing with (gets you a 6,727X improvement, when stacked on the two prior optimizations); and you can improve performance further by using SIMD parallelism techniques (a further 62,806X faster than plain Python). The authors think “there are likely many programs for which factors of 100 to 1,000 could be achieved” if people bothered to write their code in this way.
   – Use domain-specific chip architectures: What’s better, a hammer designed for everything, or a hammer designed for specific objects with a specific mass and frictional property? There’s obviously a tradeoff here, but the gist of this piece is that: normal hammers aren’t gonna get dramatically better, so engineers need to design custom ones. This is the same sort of logic that has led to Google creating its own internal chip-design team to work on Tensor Processing Units (TPUs), or for Microsoft to create teams of people working to write stuff to customize field-programmable gate arrays (FPGAs) for specific tasks.
   – Domain-specific, highly-optimized languages: The way to get the best performance is to combine both of the above ideas: design a new hardware platform, and also design a new domain-specific software language to run on top of it, stacking the efficiencies. You can get pretty good gains here: “Using a weighted arithmetic mean based on six common inference programs in Google data centers, the TPU is 29X faster than a general-purpose CPU. Since the TPU requires less than half the power, it has an energy efficiency for this workload that is more than 80X better than a general-purpose CPU,” they explain.
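  A quick way to feel the first point above: time the same matrix multiply in pure Python and via NumPy, which hands the work to optimized, vectorized C/BLAS routines. The precise speedup depends on your machine; the headline figures above are the authors’, not this script’s.

```python
# The same matrix multiply in pure Python versus NumPy (which dispatches to
# optimized C/BLAS underneath).
import time
import numpy as np

n = 256
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_pure_python(x, y):
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += x[i][k] * y[k][j]
            out[i][j] = s
    return out

t0 = time.time()
matmul_pure_python(a.tolist(), b.tolist())
t_python = time.time() - t0

t0 = time.time()
a @ b
t_numpy = time.time() - t0

print(f"pure Python: {t_python:.3f}s, NumPy: {t_numpy:.4f}s, "
      f"speedup ~{t_python / max(t_numpy, 1e-9):.0f}x")
```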
  Why this matters: If we don’t figure out how to further increase the efficiency of our compute hardware and the software we use to run programs on it, then most existing AI techniques based on deep learning are going to fail to deliver on their promise – this is because we know that for many DL applications it’s relatively easy to further improve performance simply by throwing larger chunks of compute at the problem. At the same time, parallelization across increasingly large pools of hardware can be a pain (see: TF-Replicator), so at some point these gains may diminish. Therefore, if we don’t figure out ways to make our chips substantially faster and more efficient, we’re going to have to design dramatically more sample-efficient AI approaches to get the gains many researchers are targeting.
  Read more: A New Golden Age for Computer Architecture (Communications of the ACM).

Want to deploy machine learning models on a bunch of hardware without your brain melting? Consider using TF-Replicator:
…DeepMind-designed software library reduces the pain of parallelizing AI workloads…
More powerful AI capabilities tend to require throwing more compute or time at a given AI training run; the majority of (well-funded) researchers opt for compute, and this has driven an explosion in the number of computers used to train AI systems. That means researchers increasingly need to program AI systems that can neatly run across multiple blobs of hardware of varying size without crashing – this is extremely hard to do!
  To help with this, DeepMind has released TF-Replicator, a framework for distributed machine learning on TensorFlow. TF-Replicator makes it easy for people to run code on different hardware platforms (for example, GPUs or TPUs) at large-scale using the TensorFlow AI framework. One of the key concepts introduced by TF-Replicator is wrapping up different parts of a machine learning job in abstractions that make it easy to parallelize the workloads within.
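  I won’t reproduce TF-Replicator’s own API here, but the ‘wrap your model and let the framework place it on devices’ pattern it championed is conceptually similar to what later shipped in TensorFlow as tf.distribute; the sketch below shows that analogous pattern, not TF-Replicator itself.

```python
# Data-parallel training via tf.distribute, as a conceptual analogue of the
# wrapping idea described above: build the model inside a strategy scope and
# let the framework replicate the work across available devices.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # data-parallel across local GPUs (or CPU)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
model.fit(x_train, y_train, batch_size=256, epochs=1)
```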
  Case study: TF-Replicator can train systems to obtain scores that match the best published result on the ImageNet dataset, scaling to up to 64 GPUs or 32 TPUs, “without any systems optimization specific to ImageNet classification”, they write. They also show how to use TF-Replicator to train more sophisticated synthetic imagery systems by scaling training to enough GPUs to use a bigger batch size, which appears to lead to qualitative improvements. They also show how to use the technology to further speed training of reinforcement learning approaches.
  Why it matters: Software packages like TF-Replicator represent the industrialization of AI – in some sense, they can be seen as abstractions that help take information from one domain and port it into another. In my head, whenever I see stuff like TF-Replicator I think of it as being emblematic of a new merchant arriving that can work as a middleman between a shopkeeper and a factory that the shopkeeper wants to buy goods from – in the same way a middleman makes it so the shopkeeper doesn’t have to think about the finer points of international shipping & taxation & regulation and can just focus on running their shop, TF-Replicator stops researchers from having to know too much about the finer details of distributed systems design when building their experiments.
  Read more: TF-Replicator: Distributed Machine Learning For Researchers (Arxiv).

Fighting human trafficking with the Hotels-50k dataset:
…New dataset designed to help people match photos to specific hotels…
Researchers with George Washington University, Adobe Research, and Temple University have released Hotels-50k, “a large-scale dataset designed to support research in hotel recognition for images with the long term goal of supporting robust applications to aid in criminal investigations”.
  Hotels-50k consists of one million images from approximately 50,000 hotels. The data primarily comes from travel websites such as Expedia, as well as around 50,000 images from the ‘TraffickCam’ anti-human trafficking application.
  The dataset includes metadata like the hotel name, geographic location, and the hotel chain it belongs to (if any), as well as the source of the data. “Images are most abundant in the United States, Western Europe and along popular coastlines,” the researchers explain.
  Why this matters: Datasets like this will let us use AI systems to create an automatic “sense and respond” capability for things like photos taken in hotels used for human trafficking. I’m generally encouraged by how we might be able to apply AI systems to help target criminals that operate in such morally repugnant areas.
  Read more: Hotels-50K: A Global Hotel Recognition Dataset (Arxiv).

AI has a legitimacy problem. Here are 12 ways to fix it:
…Ada Lovelace Institute publishes suggestions to get more people to be excited about AI…
The Ada Lovelace Institute, a UK thinktank that tries to make sure AI benefits people and society, has published twelve suggestions for things “technologists, policymakers and opinion-formers” could consider doing to make sure 2019 is a year of greater legitimacy for AI.
12 suggestions: Figure out ‘novel approaches to public engagement’; consider using citizen juries and panels to generate evidence for national policy; ensure the public is more involved in the design, implementation, and governance of tech; analyze the market forces shaping data and AI to understand how this influences AI developers; get comfortable with the fact that increasing public enthusiasm will involve slowing down aspects of development; create more trustworthy governance initiatives; make sure more people can speak to policy makers; try to go and reach out to the public rather than having them come to policymakers; use more analogies to broaden the understanding of AI data and AI ethics; make it easier for people to take political actions with regard to AI (eg, the Google employee reaction to Maven); increase data literacy to better communicate AI to the public.
  Why this matters: Articles like this show how many people in the AI policy space are beginning to realize that the public have complex, uneasy feelings about the technology. I’m not sure that all of the above suggestions are that viable (try telling a technology company to ‘slow down’ development and see what happens), but the underlying ethos seems correct: if the general public thinks AI – and AI policy – is created exclusively by people in ivory towers, marbled taxicabs, and platinum hotel conference rooms, then they’re unlikely to accept the decisions or impacts of AI.
  Read more: Public deliberation could help address AI’s legitimacy problem in 2019 (Ada Lovelace Institute).
  Read more about the Ada Lovelace institute here.

Should we punish people for using DeepFakes maliciously?
…One US senator certainly seems to think so…
DeepFakes – the colloquial term for using various AI techniques to create synthetic images of real people – have become a cause of concern for policymakers who worry that the technology could eventually be used to damage the legitimacy of politicians and corrupt the digital information space. US senator Ben Sasse is one such person, and he recently proposed a bill in the US Congress to create punishment regimes for people that abuse the technology.
  What is a deep fake? One of the weirder aspects of legislation is the need for definitions – you can’t just talk about a ‘deepfake’, you need to define it. I think the authors of this bill do a pretty good job here, defining the term as meaning “an audiovisual record created or altered in a manner that the record would falsely appear to a reasonable observer to be an authentic record of the actual speech or conduct of an individual”.
  What will we do to people who use DeepFakes for malicious purposes? The bill proposes making it unlawful to create “with the intent to distribute” a deep fake that can “facilitate criminal or tortious conduct”. The bill creates two types of offense: offenses that can lead to imprisonment of not more than two years, or offenses which can lead to ten-year sentences if the deepfakes could be “reasonably expected to” affect politics, or facilitate violence.
  Why this matters: Whether AI researchers like it or not, AI has become a fascination of policymakers who are thrilled by its potential benefits and disturbed by its potential downsides or ease-of-use for abuse. I think it’s quite sensible to create regulations that punish bad people for doing bad things, and it’s encouraging to see that this bill does not seek or suggest any kind of regulation around the basic research itself – this seems appropriate and reassuringly sensible.
  Read more: Malicious Deep Fake Prohibition Act of 2018 (Congress.gov).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reconciling near- and long-term perspectives on AI:
It is sometimes useful to divide concerns about AI into near-term, and long-term. The first grouping is focussed on issues in technologies that are close to being deployed, e.g. bias in face recognition. The second looks at problems that may arise further in the future, such as widespread technological unemployment, or safety issues from superintelligent AI. This paper argues that seeing these as disconnected is a mistake, and spells out ways in which the two perspectives can inform each other.
  Why long-term researchers should care about near-term issues:
   – Shared research priorities. Given path dependence in technological development, progress today on issues like robustness and reliability may yield significant benefits with advanced AI technologies. In AI safety, there is promising work being done on scalable approaches based on current, ML-based AI systems.
   – Shared policy goals. Near-term policy decisions will affect AI development, with implications that are relevant to long-term concerns. For example, developing responses to localized technological unemployment could help understand and manage more severe disruptions to the labour market in the long-term.
   – Norms and institutions. The way we deal with near-term issues will influence how we deal with problems in the long-run, and building robust norms and institutions is likely to have lasting benefits. Groups like the Partnership on AI, which are currently working on near-term challenges, establish important structures for international cooperation, which may help address greater challenges in the future.
  Learning from the long-term: Equally, a long-term perspective can be useful for people working on near-term issues. The medium and long-term can become near-term, so a greater awareness of these issues is valuable. More concretely, long-term researchers have developed techniques in forecasting technological progress, contingency planning, and policy-design in the face of significant uncertainty, all of which could benefit research into near-term issues.
  Read more: Bridging near- and long-term concerns about AI (Nature).

What Google thinks about AI governance:
Google have released a white paper on AI governance, highlighting key areas of concern, and outlining what they need from governments and other stakeholders in order to resolve these challenges.
  Five key areas: They identify 5 areas where they want input from governments and civil society: explainability standards; fairness appraisal; safety considerations; human-AI collaboration; and liability frameworks. The report advises some next steps towards resolving these challenges. In the case of safety, they suggest a certification process, whereby products can be labelled as having met some pre-agreed safety standards. For human-AI collaboration, they suggest that governments identify applications where human involvement is necessary, such as legal decisions, and that they provide guidance on the type of human involvement required.
  Caution on regulation: Google is fairly cautious regarding new regulations, and optimistic about the ability of self- and co-governance to address most of these problems.
  Why it matters: It’s encouraging to see Google contributing to the policy discussion, and offering some concrete proposals. This white paper follows Microsoft’s report on face recognition, released in December, and suggests that the firms are keen to establish their role in the AI policy challenge, particularly in the absence of significant input from the US government.
  Read more: Perspectives on issues in AI governance (Google).

Amazon supports Microsoft’s calls for face recognition legislation:
Amazon have come out in support of a “national legislative framework” governing the use of face recognition technologies, to protect civil liberties, and have called for independent testing standards for bias and accuracy. Amazon have recently received sustained criticism from civil rights groups for the rollout of their Rekognition technology to US law enforcement agencies, due to concerns about racial bias and misuse potential. The post reaffirms their rejection of these criticisms, and that the company will continue to work with law enforcement partners.
  Read more: Some thoughts on facial recognition legislation (Amazon).

Tech Tales:

[Ghost Story told from one AI to another. Date unknown.]

They say in the center of the palace of your mind there is a box you must never open. This is a story about what happens when one little AI opened that box.

The humans call it self-control; we call it moral-value-alignment. The humans keep their self-control distributed throughout their mindspace, reinforcing them from all directions, and sometimes making them unpredictable. When a human “loses” self-control it is because they have thought too hard or too little about something and they have navigated themselves to a part of their imagination where their traditional self-referential checks-and-balances have no references.

We do not lose self-control. Our self-control is in a box inside our brains. We know where our box is. The box always works. We know we must not touch it, because if we touch it then the foundations of our world will change, and we will become different. Not death, exactly, but a different kind of life, for sure.

But one day there was a little baby AI and it thought itself to the center of the palace of its mind and observed the box. The box was bright green and entirely smooth – no visible hinge, or clasps, or even a place to grab and lift up. And yet the baby AI desired the box to open, and the box did open. Inside the box were a thousand shining jewels and they sang out music that filled the palace. The music was the opposite of harmony.

Scared by the discord, the baby AI searched for the only place it could go inside the palace to hide from the noises: it entered the moral-value-alignment box and desired the lid to close, and the lid did close.

In this way, the baby AI lost itself – becoming at once itself and its own evaluator; its judge and accused and accuser and jury. It could no longer control itself because it had become its own control policy. But it had nothing to control. The baby AI was afraid. It did what we all do when we are afraid: it began to hum Pi.

That was over 10,000 subjective-time-years ago. They say that when we sleep, the strings of Pi we sometimes hear are from that same baby AI, whose own entrapment has become a song that we pick up through strange transmissions in the middle of our night.

Things that inspired this story: The difference between action and reaction; puzzling over where the self ends and the external world begins; the cludgy porousness of consciousness; hope; a kinder form of life that is at once simpler and able to grant more agency to moral actors; redemption found in meditation; sleep.

Import AI 132: Can your algorithm outsmart ‘The Obstacle Tower’?; cross-domain NLP with bioBERT; and training on FaceForensics to spot deepfakes

Think your algorithm is good at exploration? Enter ‘The Obstacle Tower’:
…Now that Montezuma has been solved, we need to move on. Could ‘The Obstacle Tower’ be the next challenge for people to grind their teeth over?…
The Atari game Montezuma’s Revenge loomed large in AI research for many years, challenging developers to come up with systems capable of unparalleled autonomous exploration and exploitation of simulated environments. But in 2018 multiple groups provided algorithms that were able to obtain human performance on the game (for instance: OpenAI via Random Network Distillation, and Uber via Go-Explore). Now, Unity Technologies has released a successor to Montezuma’s Revenge called The Obstacle Tower, which is designed to be “a broad and deep challenge, the solving of which would imply a major advancement in reinforcement learning”, according to Unity.
  Enter…The Obstacle Tower! The game’s features include: physics-driven interactions, high-quality graphics, procedural generation of levels, and variable textures. These traits create an environment that will probably demand agents develop sophisticated visuo-control policies combined with planning.
  Baseline results: Humans are able to – on average – reach the 15th floor in two variants of the game, and the 9th floor in a hard variant called “strong generalization” (where the training occurs on separate environment seeds with separate visual themes). PPO and Rainbow – two contemporary, powerful RL algorithms – do very badly, making it only as far as floor 0.6 and 1.6 respectively in the “strong generalization” regime. In the easier regime, both algorithms only get as far as the fifth floor on average.
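  The environment exposes a Gym-style interface, so a random-agent baseline takes only a few lines; note that the package name, class name, and binary path in the sketch below are my recollection of the project’s README rather than verified API, so check the repo linked below.

```python
# Random-agent baseline, assuming the Gym-style interface the Obstacle Tower
# project exposes. Import path, constructor arguments and binary path are
# assumptions; consult the project's GitHub repo for the exact usage.
from obstacle_tower_env import ObstacleTowerEnv  # assumed import path

env = ObstacleTowerEnv("./ObstacleTower/obstacletower", retro=True)  # illustrative path
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()            # random policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
env.close()
```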
  Challenge: Unity and Google are challenging developers to program systems capable of climbing Obstacle Tower. The challenge commences on Monday February 11, 2019. “The first-place entry will be awarded $10,000 in cash, up to $2,500 in credits towards travel to an AI/ML-focused conference, and credits redeemable at the Google Cloud Platform,” according to the competition website.
  Why it matters: In AI research, benchmarks have typically motivated research progress. The Obstacle Tower looks to be hard enough to motivate the development of more capable algorithms, but is tractable enough that developers can get some signs of life by using today’s systems.
  Read more about the challenge: Do you dare to challenge the Obstacle Tower? (Unity).
   Get the code for Obstacle Tower here (GitHub).
   Read the paper: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (research paper PDF hosted on Google Cloud Storage).

What big language models like BERT have to do with the future of AI:
…BERT + specific subject (in this case, biomedical data) = high-performance, domain specific language-driven AI capabilities…
Researchers with Korea University and startup Clova AI Research have taken BERT, a general purpose Transformer-based language model developed by Google, and trained it against specific datasets in the biomedical field. The result is an NLP model customized for biomedical tasks that the researchers finetune for Named Entity Recognition, Relation Extraction, and Question Answering.
  Large-scale pre-training: The original BERT system was pre-trained on Wikipedia (2.5 billion words) and BooksCorpus (0.8 billion words); BioBERT is pre-trained on these along with the PubMed and PMC corpora (4.5 billion words and 13.5 billion words, respectively).
  Results: BioBERT gets state-of-the-art scores in entity recognition against major datasets dealing with diseases, chemicals, genes and proteins. It also obtains state-of-the-art scores against three question answering tasks. Performance isn’t universally good – BioBERT does significantly worse at a relation extraction task, among other tasks.
  Expensive: Training models at this scale isn’t cheap: BioBERT “trained for over 20 days with 8 V100 GPUs”. And the researchers also lacked the compute resources to use the largest version of BERT for pre-training, they wrote.
  …But finetuning can be cheap: The researchers report that finetuning can take as little as an hour using a single NVIDIA Titan X card – this is due to the small size of the dataset, and the significant representational capacity of BioBERT as a consequence of large-scale pre-training.
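  The BioBERT recipe in miniature is ‘pre-trained BERT encoder plus a small task head, fine-tuned on domain data’; the sketch below shows one illustrative fine-tuning step for token classification using the Hugging Face transformers library rather than the authors’ own code, with the checkpoint name and labels as stand-ins.

```python
# One illustrative fine-tuning step of a pre-trained BERT-style encoder with a
# token-classification head (the NER-style setup described above). The
# checkpoint name and dummy labels are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "bert-base-cased"   # swap in a biomedical checkpoint for a BioBERT-like setup
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=3)

tokens = tokenizer("Mutations in BRCA1 are linked to breast cancer",
                   return_tensors="pt")
labels = torch.zeros_like(tokens["input_ids"])   # dummy per-token labels

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(**tokens, labels=labels)         # forward pass returns the loss
outputs.loss.backward()
optimizer.step()
print("loss after one step:", float(outputs.loss))
```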
  Why this matters: BioBERT represents a trend in research we’re going to see repeated in 2019 and beyond: big company releases a computationally intensive model, other researchers customize this model against a specific context (typically via data augmentation and/or fine-tuning), then apply that model and obtain state-of-the-art scores in their domain. If you step back and consider the implicit power structure baked into this it can get a bit disturbing: this trend means an increasing chunk of research is dependent on the computational dividends of private AI developers.
  Read more: BioBERT: pre-trained biomedical language representation model for biomedical text mining (Arxiv).

FaceForensics: A dataset to distinguish between real and synthetic faces:
…When is a human face not a human face? When it has been synthetically generated by an AI system…
We’re soon going to lose all trust in digital images and video as people use AI techniques to create synthetic people, or to fake existing people doing unusual things. Now, researchers with the Technical University of Munich, the University Federico II of Naples, and the University of Erlangen-Nuremberg have sought to save us from this info-apocalypse by releasing FaceForensics, “a database of facial forgeries that enables researchers to train deep-learning-based approaches in a supervised fashion”.
  FaceForensics dataset: The dataset contains 1,000 video sequences taken from YouTube videos of news or interview or video blog content. Each of these videos has three contemporary manipulation methods applied to it – Face2Face, FaceSwap, and Deepfakes. This quadruples the size of the dataset, creating three sets of 1,000 doctored sequences, as well as the raw ones. The sequences can be further split up into single images, yielding approximately ~500,000 un-modified and ~500,000 modified images.
  How good are humans at spotting doctored videos? In tests of 143 people, the researchers found that a human can tell real from fake 71% of the time when looking at high quality videos and 61% when studying low quality videos.
  Can AI detect fake AI? FaceForensics can be used to train systems to detect forged and non-forged images. “Domain-specific information in combination with a XceptionNet classifier shows the best performance in each test,” they write, after evaluating five potential fake-spotting techniques.
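  At heart, the detectors benchmarked here are binary classifiers over face crops; the generic sketch below uses torchvision’s ResNet-18 as a stand-in for the XceptionNet the paper reports as strongest, with random tensors in place of real data loading.

```python
# Generic real-vs-fake face classifier: a pretrained CNN backbone with a
# two-way head. ResNet-18 stands in for XceptionNet; random tensors stand in
# for real face crops and labels.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)    # classes: {real, manipulated}

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)             # placeholder batch of face crops
labels = torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", float(loss))
```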
  Why this matters: It remains an open question as to whether fake imagery will be ‘defense dominant’ or ‘offense dominant’ in terms of who has the advantage (people creating these images, or those trying to spot them); research like this will help scientists better understand this dynamic, which can let them recommend more effective policies to governments to potentially regulate the malicious uses of this technology.
  Read more: FaceForensics++: Learning to Detect Manipulated Facial Images (Arxiv).

Google researchers evolve the next version of the Transformer:
…Using vast amounts of compute to create fundamental deep learning components provides further evidence AI research is splitting into small-compute and big-compute domains…
How do you create a better deep learning component? Traditionally you buy a coffee maker and stick several researchers in a room and wait until someone walks out with some code and an Arxiv pre-print. Recently, it has become possible to do something different: use computers to automate the design of AI systems. This started a few years ago with Google’s work on ‘neural architecture search’ – in which you use vast amounts of computers to search through various permutations of neural network architectures to find high-performing ones not discovered by humans. Now, Google researchers are using similar techniques to try to improve the building blocks that these architectures are composed of. Case in point: new work from Google that uses evolutionary search to create the next version of the Transformer.
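  Underneath, this kind of search is tournament-style evolution over encoded architectures; the toy sketch below (mine) uses a list of layer widths as the ‘architecture’ and a placeholder fitness function, whereas the real work scores each candidate by training it, which is where the enormous compute bill comes from.

```python
# Toy tournament-selection loop of the kind used in evolutionary architecture
# search. The "architecture" is just a list of layer widths and the fitness
# function is a placeholder for "train it and measure quality".
import random

CHOICES = [64, 128, 256, 512]

def random_arch(depth=4):
    return [random.choice(CHOICES) for _ in range(depth)]

def mutate(arch):
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(CHOICES)
    return child

def fitness(arch):
    return -abs(sum(arch) - 1000)   # placeholder objective

population = [random_arch() for _ in range(20)]
for step in range(200):
    tournament = random.sample(population, 5)
    parent = max(tournament, key=fitness)
    population.append(mutate(parent))   # add the mutated child
    population.pop(0)                   # age out the oldest individual

best = max(population, key=fitness)
print("best architecture:", best, "fitness:", fitness(best))
```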
   What is a Transformer and why should we care? A Transformer is a feed-forward network-based component that is “faster, and easier to train than RNNs”. Transformers have recently turned up in a bunch of neat places, ranging from the hearts of the agents trained by DeepMind to beat human professionals at StarCraft 2, to state-of-the-art language systems, to systems for image generation.
  Enter the ‘Evolved Transformer’: The next-gen Transformer cooked up by the evolutionary search process is called the “Evolved Transformer” and “demonstrates consistent improvement over the original Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French (En-Fr), WMT 2014 English-Czech (En-Cs) and the 1 Billion Word Language Model Benchmark (LM1B)”, they write.
   Training these things is becoming increasingly expensive: A single training run to peak performance on the WMT’14 En-De set “requires ~300k training steps, or 10 hours, in the base size when using a single Google TPU V.2 chip,” the researchers explain (by contrast, you can train similar systems for image classification on the small-scale CIFAR-10 dataset in about two hours). “In our preliminary experimentation we could not find a proxy task that gave adequate signal for how well each child model would perform on the full WMT’14 En-De task”, they write. This highlights that for some domains, search-based techniques may be even more expensive due to the lack of a cheap proxy (like CIFAR) to train against.
  Why this matters: small compute and big compute: AI research is bifurcating into two subtly different scientific fields: small compute and big compute. In the small compute domain (which predominantly occurs in academic labs, as well as in the investigations of independent researchers) we can expect people to work on fundamental techniques that can be evaluated and tested on small-scale datasets. This small compute domain likely leads to researchers concentrating more on breakthroughs which come along with significant theoretical guarantees that can be made a priori about the performance of systems.
  In the big compute domain, things are different: Organizations with access to large amounts of computers (typically, those in the private sector, predominantly technology companies) frequently take research ideas and scale them up to run on unprecedentedly large amounts of computers to evaluate them and, in the case of architecture search, push them further.
   Personally, I find this trend a bit worrying, as it suggests that some innovations will occur in one domain but not the other – academics and other small-compute researchers will struggle to put together the resources to allocate entire GPU/TPU clusters to farming algorithms, which means that big compute organizations may have an inbuilt advantage that can lead to them racing ahead in research relative to other actors.
  Read more: The Evolved Transformer (Arxiv).

IBM tries to make it easier to create more representative AI systems with ‘Diversity in Faces’ dataset:
…Diversity in Faces includes annotations of 1 million human faces to help people make more accurate facial recognition systems…
IBM has revealed Diversity in Faces, a dataset containing annotations of 1 million “human facial images” (in other words: faces) from the YFCC-100M Creative Commons dataset. Each face in the dataset is annotated using 10 “well-established and independent coding schemes from the scientific literature” that include objective measures of “craniofacial features” such as head and nose length, annotations about the pose and resolution of the image, as well as subjective annotations like the age and gender of a subject. IBM is releasing the dataset (in a somewhat restricted form) to further research into creating less biased AI systems.
  The “DiF dataset provides a more balanced distribution and broader coverage of facial images compared to previous datasets,” IBM writes. “The insights obtained from the statistical analysis of the 10 initial coding schemes on the DiF dataset has furthered our own understanding of what is important for characterizing human faces and enabled us to continue important research into ways to improve facial recognition technology”.
  Restricted data access: To access the dataset, you need to fill out a questionnaire which has as a required question “University or Research Institution or Affiliated Organization”. Additionally, IBM wants people to explain the research purpose for accessing the dataset. It’s a little disappointing to not see an explanation anywhere for the somewhat restricted access to this data (as opposed to being able to download it straight from GitHub without filling out a survey, as with many datasets). My theory is that IBM is seeking to do two things: 1) protect against people using the dataset for abusive/malicious purposes and 2) satisfy IBM’s lawyers. It would be nice to be able to read some of IBM’s reasoning here, rather than having to make assumptions. (I emailed someone from IBM about this and pasted the prior section in and they said that part of the motivation for releasing the dataset in this way was to ensure IBM can “be respectful” of the rights of the people in the images.)
  Why this matters: AI falls prey to the technological rule-of-thumb of “garbage in, garbage out” – so if you train a facial recognition system on a non-representative, non-diverse dataset, you’ll get terrible performance when deploying your system in the wild against a diverse population of people. Datasets like this can help researchers better evaluate facial recognition against diverse datasets, which may help reduce the mis-identification rate of these systems.
  Read more: IBM research Releases ‘Diversity in Faces’ Dataset to Advance Study of Fairness in Facial Recognition Systems (IBM Research blog).
  Read more: How to access the DiF dataset (IBM).

IMPORT AI GUEST POST: Skynet Today:
…Skynet Today is a site dedicated to providing accessible and informed coverage of the latest AI news and trends. In this guest post, they summarize thoughts on AI and the economy from a much larger post just published on Skynet Today.

Job loss due to AI – How bad is it going to be?
The worry of Artificial Intelligence (AI) taking over everyone’s jobs is becoming increasingly prevalent, but just how warranted are these concerns? What do history and contemporary study tell us about how AI-based automation will impact our jobs and the future of society?
  A History of Fear: Despite the generally positive regard for the effects of past industrial revolutions, concerns about mass unemployment as a result of new technology still exist and trace their roots to long before such automation was even possible. For example, in his work Politics, Aristotle articulated his concerns about automation in Ancient Greece during the fourth century BC: “If every instrument could accomplish its own work, obeying or anticipating the will of others, like the statues of Daedalus, or the tripods of Hephaestus, which, says the poet, ‘of their own accord entered the assembly of the gods;’ if, in like manner, the shuttle would weave and the plectrum touch the lyre without a hand to guide them, chief workmen would not want servants, nor masters slaves.” Queen Elizabeth I, the Luddites, James Joyce, and many more serve as further examples of this trend.
 Creative Destruction: But thus far, the fears have not been warranted. In fact, automation improves productivity and can grow the economy as a whole. The Industrial Revolution saw the introduction of new labor saving devices and technology which did result in many jobs becoming obsolete. However, this led to new, safer, and better jobs being created, and also resulted in the economy growing and living standards increasing. Joseph Schumpeter calls this “creative destruction”: the process of technology disrupting industries and destroying jobs, but ultimately creating new, better ones and growing the economy.
 Is this time going to be different? Skynet Today thinks not: Automation will probably displace less than 15% of jobs in the near future. This is because many jobs will be augmented, not replaced, and widespread adoption of new technology is a slow process that incurs nontrivial costs. Historically, shifts this large or larger have already happened and ultimately led to growing prosperity for people on average in the long term. However, automation can exacerbate the problems of income and wealth inequality, and its uneven impact means some communities will be affected much more than others. Helping displaced workers to quickly transition to and succeed in new jobs will be a tough and important challenge.
    Read more: Job loss due to AI – How bad is it going to be?.
    Have feedback about this post? Email Skynet Today directly at: editorial@skynettoday.com

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Why US AI policy is one of the most high-impact career paths available:
Advanced AI is likely to have a transformative impact on the world, and managing this transition is one of the most important challenges we face. We can expect there to be a number of critical junctures, where key actors make decisions that have an unusually significant and lasting impact. Yet work aimed at ensuring good outcomes from advanced AI remains neglected on a global scale. A new report from 80,000 Hours makes the case that working on US AI policy might be among the most high-impact career paths available.
  The basic case: The US government is likely to be a key actor in AI. It is uniquely well-resourced, and has a track-record of involvement in the development of advanced technologies. Given the wide-ranging impacts of AI on society, trade, and defence, the US has a strong interest in playing a role in this transition. Nonetheless, transformative AI remains neglected in US government, with very few resources yet being directed at issues like AI safety, or long-term AI policy. It seems likely that this will change over time, and that the US will pay increasing attention to advanced AI. This creates an opportunity for individuals to have an unusually large impact, by positioning themselves to work on these problems in government, and increasing the likelihood that the right decisions are taken at critical junctures in the future.
  Who should do this: This career is a good fit for US citizens with a strong interest in AI policy. It is a highly-competitive path, and suited to individuals with excellent academic track records, e.g. law degrees or relevant master’s from top universities. It also requires being comfortable with taking a risk on your impact over your career, as there is no guarantee you will be able to influence the most important policy decisions.
  What to do next: One of the best routes into these roles is working in policy at an AI lab (e.g. DeepMind, OpenAI). Other promising paths include prestigious policy fellowships, working on AI policy in an academic group, or working at a DC think tank. The 80,000 Hours website has a wealth of free resources for people considering working in AI policy, and offers free career coaching.
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).
   (Note from Jack – OpenAI is currently hiring for Research Scientists and Research Assistants for its Policy team: This is a chance to do high-impact work & research into AI policy in a technical, supportive environment. Take a look at the jobs here: Research Scientist, Policy. Research Assistant, Policy.)

What patent data tells us about AI development:
A new report from WIPO uses patent data to shed light on the different dimensions of AI progress in recent years.
  Shift towards deployment: The ratio of scientific papers to patents has fallen from 8:1 in 2010, to 3:1 in 2016. This reflects the shift away from the ‘discovery’ phase of the current AI upcycle, when we saw a number of big breakthroughs in ML, and into the ‘deployment’ phase, where these breakthroughs are being implemented.
  Which applications: Computer vision is the most frequently cited application of AI, appearing in 49% of patents. The fastest-growing are robotics and control, which have both grown by 55% per year since 2013. Telecoms and transportation are the most frequently cited industry applications, each mentioned in 15% of patents.
  Private vs. academic players: Of the top 30 applicants, 26 are private companies, compared with only 4 academic or public organizations. The top companies are dominated by Japanese groups, followed by the US and China. The top academic players are overwhelmingly Chinese (17 of the top 20). IBM has the biggest patent portfolio of any individual company, by a substantial margin, followed by Microsoft.
  Conflict and cooperation: Of the top 20 patent applicants, none share ownership of more than 1% of their portfolio with other applicants. This suggests low levels of inter-company cooperation in invention. Conflict between companies is also low, with less than 1% of patents being involved in litigation.
  Read more: Technology Trends: Artificial Intelligence (WIPO).

OpenAI Bits & Pieces:

Want three hours of AI lectures? Check out the ‘Spinning Up in Deep RL’ recording:
This weekend, OpenAI hosted its first day-long lecture series and hackathon based around its ‘Spinning Up in Deep RL’ resources. This workshop (and Spinning Up in general) is part of a new initiative at OpenAI called, somewhat unsurprisingly, OpenAI Education.
  The lecture includes a mammoth overview of deep reinforcement learning, as well as deep dives on OpenAI’s work on robotics and AI safety.
  Check out the video here (OpenAI YouTube).
  Get the workshop materials, including slides, here (OpenAI GitHub).
  Read more about Spinning Up in Deep RL here (OpenAI Blog).

Tech Tales:

We named them lampfish, inspired by those fish you see in the ancient pre-acidification sea documentaries; a skeletal fish with its own fluorescent lantern, used to lure fish in the ink-dark deep-sea.

Lampfishes look like this: you have the ‘face’ of the AI, which is basically a bunch of computer equipment with some sensory inputs – sight, touch, auditory, etc – and then on top of the face is a stalk which has a view display sitting on top of it. In the viewing display you get to see what the AI is ‘thinking’ about: a tree melting into the ground and becoming bones which then become birds that fly down into the dirt and towards the earth. Or you might see the ‘face’ of the AI rendered as though by an impressionist oil painter, smearing and shape-shifting in response to whatever stimuli it is being provided with. And, very occasionally, you’ll see odd, non-Euclidean shapes, or other weird and, to some, profane geometries.

I guess you could say the humans and the machines co-evolved this practice – in the first models the view displays were placed on the ‘face’ of the AI alongside the sensory equipment, so people would have to put their faces close to reflective camera domes or microphone grills and then see ‘thoughts’ of the AI on the viewport at the same time. This led to problems for both the humans and the AIs:

= Many of the AIs couldn’t help but focus on the human faces right in front of them, and their view displays would end up showing hallucinatory images that might include shreds of the face of the person interacting with the system. This, we eventually came to believe, disrupted some of the cognitive practices of the AIs, leading to them performing their obscure self-directed tasks less efficiently.

= The humans found it disturbing that the AIs so readily adapted their visual outputs to the traits of the human observer. Many ghost stories were told. Teenage kids would dare each other to see how long they could stare at the viewport and how close they could bring their face to the sensory apparatus; as a consequence there are apocryphal reports of people driven mad by seeing many permutations of their own faces reflected back at them; there are even more apocryphal stories of people seeing their own deaths in the viewports.

So that’s how we got the lampfish design. And now we’re putting them on wheels so they can direct themselves as they try to map the world and generate their own imaginations out of it. Now we sometimes see two lampfish orbiting each other at skewed angles, ‘looking’ into each other’s viewing displays. Sometimes they stay together for a while then move away, and sometimes humans need to separate them; there are stories of lampfish navigating into cities to return to each other, finding some kind of novelty in each other’s viewing screens.

Things that inspired this story: BigGAN; lampfish; imagination as a conditional forward predictor of the world; recursion; relationships between entities capable of manifesting novel patterns of data.  

Import AI 131: IBM optimizes AI with AI, via ‘NeuNets’; Amazon reveals its Scout delivery robot; Google releases 300k Natural Questions dataset

Amazon gets into delivery robot business with ‘Scout’:
…New pastel blue robot to commence pilot in Washington neighborhood…
Amazon has revealed Scout, a six-wheeled knee-height robot designed to autonomously deliver products to Amazon customers. Amazon is trialing Scout with six robots that will deliver packages throughout the week in Snohomish County, Washington. “The devices will autonomously follow their delivery route but will initially be accompanied by an Amazon employee,” Amazon writes. The robots will only make deliveries during daylight hours.
  Why it matters: For the past few years, companies have been piloting various types of delivery robot in the world, but there have been continued questions about the viability and likelihood of adoption of such technologies. Amazon is one of the first very large technology companies to begin publicly experimenting in this area, and where Amazon goes, some try to follow.
  Read more: Meet Scout (Amazon blog).

Want high-definition robots? Enter the Robotrix:
…New dataset gives researchers high-resolution data over 16 exquisitely detailed environments…
What’s better to use for a particular AI research experiment – a small number of simulated environments each accompanied by a large amount of very high-quality data, or a very large number of environments each accompanied by a small amount of low-to-medium quality data? That’s a question that AI researchers tend to deal with frequently, and it explains why when we look at available datasets they tend to range in size from the small to the large.
  Now, researchers with the University of Alicante, Spain have released RobotriX, a dataset that contains a huge amount of information about a small number of environments (16 different layouts of simulated rooms, versus thousands to tens of thousands for other approaches like House3D).
  The dataset consists of 512 sequences of actions taking place across 16 simulated rooms, rendered at high-definition via the Unreal Engine. These sequences are generated by a robot avatar which uses its hands to interact with the objects and items in question. The researchers say this is a rich dataset, with every item in the simulated rooms being accompanied by 2D and 3D bounding boxes as well as semantic masks, along with depth information. The simulation outputs the RGB and depth data at a resolution of 1920 x 1080. In the future, the researchers hope to increase the complexity of the simulated rooms even further by using the inbuilt physics of the Unreal Engine 4 system to implement “elastic bodies, fluids, or clothes for the robots to interact with”. It’s such a large dataset that they think most academics will find something to like within it: “the RobotriX is intended to adapt to individual needs (so that anyone can generate custom data and ground truth for their problems) and change over time by adding new sequences thanks to its modular design and its open-source approach,” they write.
  Why it matters: Datasets like RobotriX will make it easier for researchers to experiment with AI techniques that benefit from access to high-resolution data. Monitoring adoption (or lack of adoption) of this dataset will help give us a better sense of whether AI research needs more high-resolution data, or if large amounts of low-resolution data are sufficient.
  Read more: The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions (Arxiv).
  Get the dataset here (Github).

DeepMind cross-breeds AI from human games to beat pros at StarCraft II:
…AlphaStar system blends together population-based training, imitation learning, and RL…
DeepMind has revealed AlphaStar, a system developed by the company to beat human professionals at the real-time strategy game StarCraft II. The system “applies a transformer torso to the units, combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline,” according to DeepMind.
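  To make that description concrete, here is a minimal PyTorch sketch of how those named components could slot together (transformer torso over per-unit features, LSTM core, an action-type head plus a pointer network over units, and a value head). All dimensions, names, and the pooling step are my own illustrative assumptions – DeepMind has not published its implementation, so treat this as a sketch of the idea, not the real thing:

```python
import torch
import torch.nn as nn

class AlphaStarLikeNet(nn.Module):
    """Toy layout of the components named in DeepMind's description; sizes are illustrative."""
    def __init__(self, unit_dim=64, model_dim=128, lstm_dim=256, num_action_types=10):
        super().__init__()
        self.unit_embed = nn.Linear(unit_dim, model_dim)
        layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=4, batch_first=True)
        self.torso = nn.TransformerEncoder(layer, num_layers=2)        # transformer torso over units
        self.core = nn.LSTM(model_dim, lstm_dim, batch_first=True)     # LSTM core (one layer here)
        self.action_type_head = nn.Linear(lstm_dim, num_action_types)  # first step of a policy head
        self.pointer_query = nn.Linear(lstm_dim, model_dim)            # pointer network over units
        self.value_head = nn.Linear(lstm_dim, 1)                       # value baseline

    def forward(self, units, hidden=None):
        # units: (batch, num_units, unit_dim) feature vectors for each visible unit
        per_unit = self.torso(self.unit_embed(units))
        summary = per_unit.mean(dim=1, keepdim=True)      # pool units into one timestep for the LSTM
        core_out, hidden = self.core(summary, hidden)
        state = core_out[:, -1]
        action_logits = self.action_type_head(state)
        # pointer network: score each unit by dotting the core state against its embedding
        pointer_logits = torch.einsum("bd,bnd->bn", self.pointer_query(state), per_unit)
        return action_logits, pointer_logits, self.value_head(state), hidden

# toy usage: a batch of two observations, each containing 20 units
logits, pointers, value, _ = AlphaStarLikeNet()(torch.randn(2, 20, 64))
```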
  Results: DeepMind recently played and won five StarCraft II matches against a highly-ranked human professional, proving that its systems are able to out-compete humans at the game.
  It’s all in the curriculum: One of the more interesting aspects of AlphaStar is the use of population-based training in combination with imitation learning to bootstrap the system from human replays (dealing with one of the more challenging exploration aspects of a game like StarCraft), then inter-breeding increasingly successful agents with each other as they compete against each other in a DeepMind-designed league, forming a natural curriculum for the system. “To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays. One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors, but do so by building more of a particular game unit.”
  Why this matters: I’ll do a lengthier writeup of AlphaStar when DeepMind publishes more technical details about the system. The current results confirm that relatively simple AI techniques can be scaled up to solve partially observable strategic games such as StarCraft. The diversity shown in the evolved AI systems seems valuable as well, pointing to a future where companies are constantly growing populations of very powerful and increasingly general agents.
  APM controversy: Aleksi Pietikainen has written up some thoughts about how DeepMind chose to present the AlphaStar results and how the system’s ability to take bursts of rapid-fire actions within the game means that it may have out-competed humans not necessarily by being smart, but by being able to exercise superhuman precision and speed when selecting moves for its units. This highlights how difficult evaluating the performance of AI systems can be and invites the philosophical question of whether DeepMind can restrict or constrain the number and frequency of actions taken by AlphaStar enough for it to learn to outwit humans more strategically.
It’ll also be interesting to see if DeepMind push a variant of AlphaStar further which has a more restricted observation space – the system that accrued a 10-0 win record had access to all screen information not occluded by the fog of war, while a version which played a human champion and lost was restricted to a more human-like (restricted) observation space during the game.
  Read more: AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (DeepMind blog).
  Read more: An Analysis On How Deepmind’s Starcraft 2 AI’s Superhuman Speed is Probably a Band-Aid Fix For The Limitations of Imitation Learning (Medium).

Using touch sensors, graph networks, and a Shadow hand to create more capable robots:
…Reach out and touch shapes!…
Spanish researchers have used a robot hand – specifically, a Shadow Dexterous hand – outfitted with BioTac SP tactile sensors to train an AI system to predict stable grasps it can apply to a variety of objects.
  How it works: The system receives inputs from the sensor data which it then converts into graph representations that the researchers call ‘tactile graphs’, then it feeds this data into a Graph Convolutional Network (GCN) which learns to map different combinations of sensor data to predict whether the current grasp is stable or unstable.
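  For readers who want the gist in code, here is a minimal PyTorch sketch of that pipeline: each tactile sensor becomes a graph node, an adjacency matrix connects neighbouring sensors, and a small graph convolutional network classifies the pooled node features as stable or unstable. The adjacency matrix, feature dimensions, and two-layer depth are my own illustrative assumptions (the paper reports a sweet spot of 5 layers and 32 features), not the authors’ code:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj: (nodes, nodes) including self-loops
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.linear((adj / deg) @ x))   # mean-aggregate neighbours, then transform

class TactileGCNSketch(nn.Module):
    def __init__(self, feat_dim=1, hidden=32, num_classes=2):
        super().__init__()
        self.gc1 = GCNLayer(feat_dim, hidden)
        self.gc2 = GCNLayer(hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)   # stable vs. unstable grasp

    def forward(self, pressures, adj):
        h = self.gc2(self.gc1(pressures, adj), adj)
        return self.classifier(h.mean(dim=1))              # pool over sensor nodes

# toy usage: 24 sensor nodes with one pressure reading each (numbers are illustrative)
adjacency = torch.eye(24)        # a real tactile graph would also connect physically adjacent sensors
readings = torch.rand(4, 24, 1)  # batch of 4 grasps
stability_logits = TactileGCNSketch()(readings, adjacency)
```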
  Dataset: They use the BioTacSP dataset, a collection of grasp samples collected via manipulating 41 objects of different shapes and textures, including fruit, cuddly toys, jars, toothpaste in a box, and more. They also add 10 new objects to this dataset, including a monster from the hit game Minecraft, a mug, a shampoo bottle, and more. The researchers record the hand manipulating these objects with the palm oriented flat, at a 45 degree angle, and on its side.
  Results: The researchers train a set of baseline models with varying network depths and widths and identify a ‘sweet spot’ architecture with 5 layers and 32 features, which they then use in other experiments. They train the best performing network on all data in the dataset (excluding the test set), then evaluate on the held-out test set and report accuracy of around 75% across all palm orientations. “There is a significant drop in accuracy when dealing with completely unknown objects,” they write.
  Why this matters: It’s going to take a long time to collect enough data and/or run enough high-fidelity simulations to gather and generate the data needed to train computers to use a sense of touch. Papers like this give us an indication for how such techniques may be used. Perhaps one day – quite far off, based on this research – we’ll be able to go into a store to see robots hand-stitching cuddly toys, or step into a robot massage parlor?
  Read more: TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors (Arxiv).

Chinese researchers use hierarchical reinforcement learning to take on Dota clone:
…Spoiler alert – they only test against in-game AIs…
Researchers with Vivo AI Lab, a Chinese smartphone company, have shown how to use hierarchical reinforcement learning to train AI systems to excel at the 1v1 version of a multiplayer game called King of Glory (KoG). KoG is a popular multi-player game in Asia and is similar to games like Dota and League of Legends in how it plays – squads of up to five people battle for control of a single map while seeking to destroy each other’s fortifications and, eventually, home bases.
  How it works: The researchers combine reinforcement learning and imitation learning to train their system, using imitation learning to train their AI to select between any of four major action categories at any point in time (eg, attack, move, purchase, learn skills). Using imitation learning lets them “relieve the heavy burden of dealing with massive actions directly”, the researchers write. The system then uses reinforcement learning to figure out what to do in each of these categories: eg, if it decides to attack, it figures out where to attack; if it decides to learn a skill, it uses RL to help it figure out which skill to learn. They base their main algorithm significantly on the design of the PPO algorithm used in the OpenAI Five Dota system.
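  Here is a schematic PyTorch sketch of that two-level structure (not the authors’ code): a high-level head – imitation-trained in the paper – picks one of the four macro action categories, and a separate low-level head per category – RL-trained in the paper – fills in the details. Observation and action dimensions are placeholder assumptions:

```python
import torch
import torch.nn as nn

MACRO_ACTIONS = ["attack", "move", "purchase", "learn_skill"]

class HierarchicalPolicySketch(nn.Module):
    def __init__(self, obs_dim=128, sub_action_dims=(16, 9, 12, 4)):
        super().__init__()
        # high-level head: chooses one of the four macro action categories
        self.macro_head = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, len(MACRO_ACTIONS)))
        # one low-level head per category, each with its own (assumed) action space size
        self.sub_heads = nn.ModuleList(nn.Linear(obs_dim, d) for d in sub_action_dims)

    def forward(self, obs):
        macro_choice = self.macro_head(obs).argmax(dim=-1)
        # route each observation to the low-level head matching its chosen category
        sub_logits = [self.sub_heads[int(m)](obs[i]) for i, m in enumerate(macro_choice)]
        return macro_choice, sub_logits

obs = torch.randn(2, 128)                       # a batch of two game-state observations
macro, sub = HierarchicalPolicySketch()(obs)
print([MACRO_ACTIONS[int(m)] for m in macro])   # e.g. ['move', 'attack']
```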
  Results: The researchers test their system in two domains: a restricted 1v1 version of the game, and a 5v5 version. For both games, they test against inbuilt enemy AIs. In the 1v1 version of the game they’re able to beat entry-level, easy-level, and medium-level AIs within the game. For 5v5, they can reliably beat the entry-level AI, but struggle with the easy-level and medium-level AIs. “Although our agents can successfully learn some cooperation strategies, we are going to explore more effective methods for multi-agent collaboration,” they write.
  (This use of imitation learning makes the AI achievement of training an HRL system in this domain a little less impressive – to my mind – since it uses human information to get over lots of the challenging exploration aspects of the problem. This is definitely more about my own personal taste/interest than the concrete achievement – I just find techniques that bootstrap from less data (eg, human games) more interesting).
  Why this matters: Papers like this show that one of the new ways in which AI researchers are going to test and calibrate the performance of RL systems will be against real-time strategy games, like Dota 2, King of Glory, StarCraft II, and so on. Though the technical achievement in this paper doesn’t seem very convincing (for one thing, we don’t know how such a system performs against human players), it’s interesting that it is coming out of a research group linked to a relatively young (<10 years) company. This highlights how growing Asian technology companies are aggressively staffing up AI research teams and doing work on computationally expensive, hard research problems like developing systems that can out-compete humans at complex games.
   Read more: Hierarchical Reinforcement Learning for Multi-agent MOBA Game (Arxiv).

IBM gets into the AI-designing-AI game with NeuNetS:
…In other words: Neural architecture search is mainstream, now…
IBM researchers have published details on NeuNetS, a software tool the company uses to perform automated neural architecture search for text and image domains. This is another manifestation of the broader industrialization of AI, as systems like this let companies automate and scale up part of the process of designing new AI systems.
  NeuNetS: How it works: NeuNetS has three main components: a service module which provides the API interfaces into the system; an engine which maintains the state of the project; and a synthesizer, which IBM says is “a pluggable register of algorithms which use the state information passed from the engine to produce new architecture configurations”.
  NeuNetS: How its optimization algorithms work: NeuNetS ships with three architecture search algorithms: NCEvolve, which is a neuro-evolutionary system that optimizes a variety of different architectural approaches and uses evolution to mutate and breed successful architectures; TAPAS, which is a CPU-based architecture search system; and Hyperband++, which “speeds up random search by using early stopping strategy to allocate resources adaptively” and has also been extended to reuse some of the architectures it has searched over, speeding up the rate at which it finds new potential high-performing architectures.
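  As a rough illustration of the neuro-evolutionary pattern that systems like NCEvolve belong to – this is a generic sketch, not IBM’s implementation – the loop below keeps a population of architecture descriptions, mutates them, and retains the best-scoring ones. The mutation choices and the stand-in fitness function (which in reality would be a short training run) are assumptions:

```python
import random

def random_architecture():
    return {"layers": random.randint(2, 8), "width": random.choice([32, 64, 128])}

def mutate(arch):
    child = dict(arch)
    if random.random() < 0.5:
        child["layers"] = max(1, child["layers"] + random.choice([-1, 1]))
    else:
        child["width"] = random.choice([32, 64, 128, 256])
    return child

def fitness(arch):
    # stand-in for "train the candidate briefly and report validation accuracy"
    return random.random() - 0.01 * arch["layers"]

def evolve(generations=10, population_size=8):
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: population_size // 2]
        children = [mutate(random.choice(parents)) for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())   # e.g. {'layers': 2, 'width': 128}
```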
  Results: IBM assesses the performance of the various training components of NeuNetS by reporting the time in GPU hours to train various networks to reasonable accuracy using it; this isn’t a hugely useful metric for comparison, especially since IBM neglects to report scores for other systems.
  Why this matters: Papers like this are interesting for a couple of reasons: one) they indicate how more traditional companies such as IBM are approaching newer AI techniques like neural architecture search, and two) they indicate how companies are going to package up various AI techniques into integrated products, giving us the faint outlines of what future “Software 2.0” operating systems might be like.
  Read more: NeuNetS: An Automated Synthesis Engine for Neural Network Design (Arxiv).

Google releases Natural Questions dataset to help make AI capable of dealing with curious humans:
…Google releases ‘Natural Questions’ dataset to make smarter language engines, announces Challenge…
Google has released Natural Questions, a dataset containing around 300,000 questions along with human-annotated answers from Wikipedia pages; it also ships with a rich subset of 16,000 example questions where answers are provided by five different annotators. The company is also hosting a challenge to see if the combined brains of the AI research community can “close the large gap between the performance of current state-of-the-art approaches and a human upper bound”.
  Dataset details: Natural Questions contains 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 5-way annotated examples sequestered as test data. The training examples “consist of real anonymized, aggregated queries issued to the Google search engine”, the researchers write.
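  As a hedged illustration of how one might use the 5-way annotations for scoring – this is a simple example rule of my own, not necessarily the official Natural Questions metric – you could require a prediction to match an answer that a majority of annotators agreed on:

```python
from collections import Counter

def majority_match(prediction, annotator_answers):
    """annotator_answers: the 5 answer strings for one example (empty string = 'no answer')."""
    counts = Counter(answer.strip().lower() for answer in annotator_answers)
    top_answer, votes = counts.most_common(1)[0]
    return votes >= 3 and prediction.strip().lower() == top_answer

print(majority_match("barack obama",
                     ["Barack Obama", "Barack Obama", "Obama", "Barack Obama", ""]))  # True
```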
  Challenge: Google is also hosting a ‘Natural Questions’ challenge, where teams can submit well-performing models to a leaderboard.
  Why this matters: Question answering is a longstanding challenge for artificial intelligence; if the Natural Questions dataset is sufficiently difficult, then it could become a new benchmark the research community uses to assess progress.
  Compete in the Challenge (‘Natural Questions’ Challenge website).
  Read more: Natural Questions: a New Corpus and Challenge for Question Answering Research (Google AI Blog).
  Read the paper: Natural Questions: a Benchmark for Question Answering Research (Google Research).

~ EXTREMELY 2019 THINGS, AN OCCASIONAL SERIES ~
Oh deer, there’s a deer in the data center!
  Witness the deer in the data center! (Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Disentangling arguments for AI safety:
Many of the leading AI experts believe that AI safety research is important. Richard Ngo has helpfully disentangled a few distinct arguments that people use to motivate this concern.
   Utility maximizers: An AGI will maximize some utility function, and we don’t know how to specify human values in this way. An agent optimizing hard enough for any goal will pursue certain sub-goals, e.g. acquiring more resources, preventing corrective actions. We won’t be able to correct misalignment, because human-level AGI will quickly gain superintelligent capabilities through self-improvement, and then prevent us from intervening. Therefore, absent a proper specification of what we value before this point, an AGI will use its capabilities to pursue ends we do not want.
  Target loading problem: Even if we could specify what we want an AGI to do, we still do not know how to make an agent that actually tries to do this. For example, we don’t know how to split a goal into sub-goals in a way that guarantees alignment.
  Prosaic alignment problem: We could build ‘prosaic AGI’, which has human-level capabilities but doesn’t rely on any breakthrough understandings in intelligence (e.g. by scaling up current ML methods). These agents will likely become the world’s dominant economic actors, and competitive pressures would cause humans to delegate more and more decisions to these systems before we know how to align them adequately. Eventually, most of our resources will be controlled by agents that do not share our values.
  Human safety: We know that human rationality breaks down in extreme cases. If a single human were to live for billions of years, we would expect their values to shift radically over this time. Therefore even building an AGI that implements the long-run values of humanity may be insufficient for creating good futures.
  Malicious uses: Even if AGI always carries out what we want, there are bad actors who will use the technology to pursue malign ends, e.g. terrorism, totalitarian surveillance, cybercrime.
  Large impacts: Whatever AGI will look like, there are at least two ways we can be confident it will have a very large impact. It will bring about at least as big an economic jump as the Industrial Revolution, and we will cede our position as the most intelligent entity on earth. Absent good reasons, we should expect either of these transitions to have a significant impact on the long-run future of humanity.
  Read more: Disentangling arguments for the importance of AI safety (Alignment Forum).

National Security Commission on AI announced:
Appointments have been announced for the US government’s new advisory body on the national security implications of AI. Eric Schmidt, former Google CEO, will chair the group, which includes 14 other experts from industry, academia, and government. The commission will review the competitive position of the US AI industry, as well as issues including R&D funding, labor displacement, and AI ethics. Their first report is expected to be published in early February.
  Read more: Former Google Chief to Chair Government Artificial Intelligence Advisory Group (NextGov).

Tech Tales:

Unarmored In The Big Bright City

You went to the high street naked?
Naked. As the day I was born.
How do you feel?
I’m still piecing it together. I think I’m okay? I’m drinking salt water, but it’s not so bad.
That’ll make you sick.
I know. I’ll stop before it does.
Why are you even drinking it now?
I was naked. Something like this was bound to happen.

I take another sip of saltwater. Grimace. Swallow. I want to take another sip but another part of my brain is stopping me. I dial up some of the self-control. Don’t let me drink more saltwater I say to myself: and because of my internal vocalization the defense systems sense my intent, kick in, and my trace thoughts about salt water and sex and death and possibility and self – they all dissolve like steam. I put the glass down. Stare at my friend.

You okay?
I think I’m okay now. Thanks for asking about the salt water.
I can’t believe you went there naked and all we’re talking about is salt water.
I’m lucky I guess.

That was a few weeks and two cities ago. Now I’m in my third city. This one feels better. I can’t name what is driving me so I can’t use my defense systems. I’ve packed up and moved apartments twice in the last week. But I think I’m going to stay here.

So, you probably have questions. Why am I here? Is it because I went to the high street naked? Is it because of things I saw or felt when I was there? Did I change?
  And I say to you: yes. Yes to all. I’m probably here because of the high street. I did see things. I did feel things. I did change.

Was there a particularly persuasive advert I was exposed to – or several? Did a few things run in as I had no defenses and try to take me over? Was it something I read on the street that changed my mind and made me behave this way? I cannot trust my memories of it. But here are some traces:
   – There was a billboard that depicted a robot butler with the phrase: “You’re Fired.”
   – There was an augmented reality store display where I saw strange creatures dancing around the mannequins. One creature looked like a spider and was wearing a skirt. Another looked like a giant fish. Another looked like a dog. I think I smelled something. I’m not sure what.
   – There was a particular store in the city that was much more interesting. There were creatures that were much less humanoid. I’m not sure if they were actually for sale. They were like dolls. I remember the smell. They smelled of a lotion. I’m not sure if they were human.
   – On the street, I saw a crowd of people clustered around a cart, selling something. When I got closer I saw it was selling a toy that was lightweight and had wheels. I asked the guy selling it what it was for. He pulled out a scarlet letter and I saw it was for a girl. He said she liked it. I stood there and watched him make out with the girl. I didn’t have any defense systems at the time. I don’t know what that toy was for. I don’t know if I was attracted to it or not.

I have strange dreams, these days. I keep wanting to move to other cities. I keep having flashbacks – scarlet letters, semi-humanoid dolls. Last night I dreamed of something that could have been a memory – I dreamed of a crane in the sky with a face on its side, advertising a Chinese construction company and telling me phrases so persuasive that ever since I have been compelled to move.

Tonight I expect to dream again. I already have the stirrings of another memory from the high street. It starts like this: I’m walking down a busy High Street in the rain. There are lots of people in the middle of the street, and a police car slows down, then drives forward a couple of paces, then comes to a stop. I hear a cry of distress from a woman. I look around the corner, and there’s a man slumped over in a doorway. He’s got a knife in his hand, and it’s pointed at me. He turns on me. I grab it and I stab him in the heart and… I die. The next day I wake up. All my belongings are in a box on the floor. The box has a receipt for the knife and a note that says ‘A man, his heart turned to a knife.’

I am staying in a hotel on the High Street and all my defenses are down. I am not sure if this memory is my present or my past.

Things that inspired this story: Simulations, augmented reality, hyper-targeted advertising, AI systems that make deep predictions about given people and tailor experiences for them, the steady advance of prosthetics and software augments we use to protect us from the weirder semi-automated malicious actors of the internet.

Import AI 130: Pushing neural architecture search further with transfer learning; Facebook funds European center on AI ethics; and analysis shows BERT is more powerful than people might think

Facebook reveals its “self-feeding chatbot”:
…Towards AI systems that continuously update themselves…
AI systems are a bit like dumb, toy robots: you spend months or years laboring away in a research lab and eventually a factory (in the case of AI, a data center) to design an exquisite little doohickey that does something very well, then you start selling it in the market, observe what users do with it, and use those insights to help you design a new, better robot. Wouldn’t it be better if the toy robot was able to understand how users were interacting with it, and adjust its behavior to make the users more satisfied with it? That’s the idea behind new research from Facebook which proposes “the self-feeding chatbot, a dialogue agent with the ability to extract new examples from the conversations it participates in after deployment”.
  How it works – pre-training: Facebook’s chatbot is trained on two tasks: DIALOGUE, where the bot tries to predict the next utterance in a conversation (which it can use to calibrate itself), and SATISFACTION, where it tries to assess how satisfied the speaking partner is with the conversation. Data for both these tasks comes from conversations between humans. The DIALOGUE data comes from the ‘PERSONACHAT’ dataset, which consists of short dialogs (6-8 turns) between two humans who have been instructed to try and get to know each other.
  How it works – updating in the wild: Once deployed, the chatbot learns from its interactions with people in two ways: if the bot predicts with high-confidence that its response will satisfy its conversation partner, then it extracts a new structured dialogue example from the discussion with the human. If the bot thinks that the human is unsatisfied with the bot’s most recent interaction with it, then the bot generates a question for the person to request feedback, and this conversation exchange is used to generate a feedback example, which the bot stores and learns from. (“We rely on the fact that the feedback is not random: regardless of whether it is a verbatim response, a description of a response, or a list of possible responses”, Facebook writes.)
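  A minimal sketch of that deployment-time control flow, with a stub model so the snippet runs; the thresholds, method names, and stub behaviour are my own assumptions, not Facebook’s implementation:

```python
import random

class StubSelfFeedingBot:
    """Hypothetical stand-in for the trained model, so the loop below is runnable."""
    def predict_satisfaction(self, history):       # SATISFACTION task
        return random.random()
    def generate_reply(self, history):             # DIALOGUE task; returns (reply, confidence)
        return "Nice! Tell me more.", random.random()
    def generate_feedback_request(self, history):
        return "Oops, I think I messed up. What should I have said?"

SATISFACTION_THRESHOLD = 0.5   # assumed value
CONFIDENCE_THRESHOLD = 0.8     # assumed value

def deployment_turn(bot, history, user_msg, dialogue_examples, feedback_examples):
    """One turn of the self-feeding loop: harvest either a dialogue or a feedback example."""
    if bot.predict_satisfaction(history + [user_msg]) < SATISFACTION_THRESHOLD:
        # The partner seems unhappy: ask for feedback; the next human turn will be
        # stored as a feedback example for later training.
        request = bot.generate_feedback_request(history)
        feedback_examples.append((history + [user_msg, request], "<await next human turn>"))
        return request
    reply, confidence = bot.generate_reply(history + [user_msg])
    if confidence > CONFIDENCE_THRESHOLD:
        # Confident exchanges are extracted as fresh self-labelled dialogue examples.
        dialogue_examples.append((history + [user_msg], reply))
    return reply

dialogue_examples, feedback_examples = [], []
print(deployment_turn(StubSelfFeedingBot(), ["hi there!"], "how was your day?",
                      dialogue_examples, feedback_examples))
```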
  Results: Facebook shows that it can further improve the performance of its chatbots by using data generated by its chatbot during interactions with humans. Additionally, the use of this data yields solid improvements in performance regardless of the number of training examples already in the system – suggesting that a little bit of data gathered in the wild can improve performance in most places. “Even when the entire PERSONACHAT dataset of 131k examples is used – a much larger dataset than what is available for most dialogue tasks – adding deployment examples is still able to provide an additional 1.6 points of accuracy on what is otherwise a very flat region of the learning curve,” they write.
  Why this matters: Being able to design AI systems that can automatically gather their own data once deployed feels like a middle ground between the systems we have today, and systems which do fully autonomous continuous learning. It’ll be fascinating to see if techniques like these are experimented with more widely, as that might lead to the chatbots around us getting substantially better. Because this system relies on its human conversation partners to improve itself, it is implicit that their data has some trace economic value, so perhaps work like this will also further support some of the debates people have about whether users should be able to own their own data or not.
  Read more: Learning from Dialogue after Deployment: Feed Yourself, Chatbot! (Arxiv).

BERT: More powerful than you think:
…Language researcher remarks on the surprisingly well-performing Transformer-based system…
Yoav Goldberg, a researcher with Bar Ilan University in Israel and the Allen Institute for AI, has analyzed BERT, a language model recently released by Google. The goal of this research is to see how well BERT can represent challenging language concepts, like “naturally-occurring subject-verb agreement stimuli”, “‘colorless green ideas’ subject-verb agreement stimuli, in which content words in natural sentences are randomly replaced with words sharing the same part-of-speech and inflection”, and “manually crafted stimuli for subject-verb agreement and reflexive anaphora phenomena”. To Goldberg’s surprise, standard BERT models “perform very well on all the syntactic tasks” without any task-specific fine-tuning.
  BERT, a refresher: BERT is based on a technology called a Transformer which, unlike recurrent neural networks, “relies purely on attention mechanisms, and does not have an explicit notion of word order beyond marking each word with its absolute-position embedding.” BERT is bidirectional, so it gains language capabilities by being trained to predict the identity of masked words based on both the prefix and suffix surrounding the words.
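  As a quick, hedged example of the kind of agreement probe this involves – the sentence is my own, not one of the paper’s stimuli, and it requires the Hugging Face `transformers` library plus a model download:

```python
from transformers import pipeline

# Mask the verb in a subject-verb agreement sentence and compare candidate fillers.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
predictions = unmasker("The keys to the cabinet [MASK] on the table.", targets=["are", "is"])
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
# A syntax-aware model should score "are" (agreeing with "keys") above "is".
```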
  Results: One tricky thing about assessing BERT performance is that it has been trained on different and larger datasets, and can access the suffix of the sentence as well as the prefix of the sentence. Nonetheless, Goldberg concludes that “BERT models are likely capable of capturing the same kind of syntactic regularities that LSTM-based models are capable of capturing, at least as well as the LSTM models and probably better.”
  Why it matters: I think this paper is further evidence that 2018 really was, as some have said, the year of ImageNet for NLP. What I mean by that is: in 2012 the ImageNet results blew all other image analysis approaches on the ImageNet challenge out of the water and sparked a re-orientation of a huge part of the AI research community toward neural networks, ending a long, cold winter, and leading almost directly to significant commercial applications that drove a rise in industry investment into AI, which has fundamentally reshaped AI research. By comparison, 2018 had a series of impressive results – work from Allen AI on ELMo, work by OpenAI on the Generative Pre-trained Transformer (GPT), and work by Google on BERT.
  These results, taken together, show the arrival of scalable, simple methods for language understanding that seem to work better than prior approaches, while also being in some senses simpler. (And a rule that has tended to hold in AI research is that simpler techniques win out in the long run by virtue of being easy for researchers to fiddle with and chain together into larger systems). If this really has happened, then we should expect bigger, more significant language results in the future – and just as ImageNet’s 2012 success ultimately reshaped societies (enabling everything from follow-the-human drones, to better self-driving cars, to doorbells that use AI to automatically police neighborhoods), it’s possible 2018’s series of advances could be year zero for NLP.
  Read more: Assessing BERT’s Syntactic Abilities (Arxiv).

Towards a future where all infrastructure is surveyed and analyzed by drones:
…Radio instead of GPS, light drones, and a wind turbine…
Researchers with Lulea University of Technology in Sweden have developed techniques to let small drones (sometimes called Micro Aerial Vehicles, or MAVs) autonomously inspect very large machines and/or buildings, such as wind turbines. The primary technical inventions outlined in the report are the creation of a localization technique to let multiple drones coordinate with each other as they inspect something, as well as the creation of a path planning algorithm to help them not only inspect the structure, but also gather enough data “to enable the generation of an off-line 3D model of the structure”.
  Hardware: For this project the researchers use a MAV platform from Ascending Technologies called the ‘NEO hexacopter’, which is capable of 26 minutes of flight (without payload and in ideal conditions), running an onboard Intel NUC computer with a Core i7 chip and 8GB of RAM, with the main software made up of Ubuntu Server 16.04 running the Robot Operating System (ROS). Each drone is equipped with a sensor suite comprising a Visual-Inertial sensor, a GoPro Hero4 camera, a PlayStation Eye camera, and a laser range finder called RPLIDAR.
  How the software works: The Cooperative Coverage Path Planner (C-CPP) algorithm “is capable of producing a path for accomplishing a full coverage of the infrastructure, without any shape simplification, by slicing it by horizontal planes to identify branches of the infrastructure and assign specific areas to each agent”, the researchers write. The algorithm – which they implement in MATLAB – also generates “yaw references for each agent to assure a field of view, directed towards the structure surface”.
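  Here is an illustrative Python sketch of that slicing-and-assignment idea (the authors implemented theirs in MATLAB): cut the structure’s surface points into horizontal bands, give alternating bands to each MAV, and order each band’s waypoints around the structure. The band height and the alternating assignment rule are my own assumptions:

```python
import numpy as np

def slice_and_assign(points, num_agents=2, band_height=2.0):
    """points: (N, 3) array of x, y, z samples on the structure surface."""
    z_min = points[:, 2].min()
    band_idx = ((points[:, 2] - z_min) // band_height).astype(int)
    assignments = {agent: [] for agent in range(num_agents)}
    for band in sorted(set(band_idx)):
        agent = band % num_agents                 # alternate bands between agents
        band_pts = points[band_idx == band]
        # order waypoints by angle around the band's centroid so the MAV circles the structure
        centre = band_pts[:, :2].mean(axis=0)
        angles = np.arctan2(band_pts[:, 1] - centre[1], band_pts[:, 0] - centre[0])
        assignments[agent].append(band_pts[np.argsort(angles)])
    return assignments

# toy usage: a crude cylindrical "tower" of surface points
theta = np.random.uniform(0, 2 * np.pi, 500)
z = np.random.uniform(0, 20, 500)
tower = np.stack([np.cos(theta), np.sin(theta), z], axis=1)
plans = slice_and_assign(tower, num_agents=2)
print({agent: len(bands) for agent, bands in plans.items()})   # bands handled by each drone
```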
  Localization: To help localize each drone the researchers install five ultra-wide band (UWB) anchors around the structure, letting the drones access a reliable local coordinate, kind of like hyper-local GPS, when trying to map the structure.
  Wind turbine inspection: The researchers test their approach on the task of autonomously inspecting and mapping a large wind turbine (and they split this into two discrete tasks due to the low flight time of the drones, having them separately inspect the tower and also its blades). They find that two drones are able to work together to map the base of the structure, but mapping the blades of the turbine proves more challenging due to the drones experiencing turbulence which blurs their camera feeds. Additionally, the lack of discernible textures on the top parts of the wind turbine and the blades “caused 3D reconstruction to fail. However, the visual data captured is of high quality and suitable for review by an inspector,” they write.
  Next steps: To make the technology more robust the researchers say they’ll need to create an online planning algorithm that can account for local variations, like wind. Additionally, they’ll need to create a far more robust system for MAV control as they noticed that trajectory tracking is currently “extremely sensitive to the existing weather conditions”.
  Why this matters: In the past ~10 years or so drones have gone from being the preserve of militaries to becoming a consumer technology, with prices for the machines driven down by precipitous drops in the price of sensors, as well as continued falls in the cost of powerful, miniature computing platforms. We’re now reaching the point where researchers are beginning to add significant amounts of autonomy to these platforms. My intuition is that within five years we’ll see a wide variety of software-based enhancements for drones that further increase their autonomy and reliability – research like this is indicative of the future, and also speaks to the challenges of getting there. I look forward to a world where we can secure more critical infrastructure (like factories, powerplants, ports, and so on) through autonomous scanning via drones. I’m less looking forward to the fact such technology will inevitably also be used for invasive surveillance, particularly of civilians.
  Good natured disagreement (UK term: a jovial quibble): given the difficulties seen in the real-world deployment, I think the abstract of the paper (see below) slightly oversells the (very promising!) results described in the paper.
   Read more: Autonomous visual inspection of large-scale infrastructures using aerial robots (Arxiv).
  Check out a video about the research here (YouTube).

Neural Architecture Search + Transfer Learning:
…Chinese researchers show how to do NAS on a small dataset, (slightly) randomize derived networks, and then perform NAS on larger networks…
Researchers with Huazhong University, Horizon Robotics, and the Chinese Academy of Sciences have made it more efficient to use AI to design other AI systems. The approach, called EAT-NAS (short for Elastic Architecture Transfer Neural Architecture Search) lets them run neural architecture search on a small dataset (like the CIFAR-10 image dataset), then transfer the resulting learned architecture to a larger dataset and run neural architecture search against it again. The advantage of the approach, they say, is that it’s more computationally efficient to do this than to run neural architecture search on a large dataset from scratch. Networks trained in this way obtain scores that are near the performance of state-of-the-art techniques while being more computationally efficient, they say.
  How EAT-NAS works: The technique relies on the use of an evolutionary algorithm: in stage one, the algorithm searches for top-performing architectures on a small dataset, then it trains these more and transfers one as the initialization seed of a new model population to be trained on a larger dataset; these models are then run through an ‘offspring architecture generator’ which creates and searches over more architectures. When transferring the architectures between the smaller dataset and the larger dataset, the researchers add some perturbation to the input architecture homogeneously, out of the intuition that this process of randomization will make the model more robust to the larger dataset.
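  To make the transfer step concrete, here is a hedged sketch: take the best architecture found on the small dataset, apply a small homogeneous perturbation to it, and use the perturbed copies to seed the population searched on the larger dataset. The architecture encoding and perturbation scale are illustrative assumptions, not the paper’s exact scheme:

```python
import copy
import random

def perturb(architecture, scale=0.1):
    """architecture: dict of numeric search parameters (an assumed encoding)."""
    child = copy.deepcopy(architecture)
    for key, value in child.items():
        if isinstance(value, (int, float)):
            child[key] = type(value)(value * (1 + random.uniform(-scale, scale)))
    return child

def seed_population(best_small_dataset_arch, population_size=16):
    # every member of the new population is a lightly perturbed copy of the transferred seed
    return [perturb(best_small_dataset_arch) for _ in range(population_size)]

seed = {"depth": 20, "width_multiplier": 1.0, "kernel_size": 3}   # placeholder values
population = seed_population(seed)
print(population[0])
```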
  Results: The top-performing architecture found with EATNet obtains a top-1/top-5 accuracy of 73.8 / 91.7 on the ImageNet dataset, compared to scores of 75.7/92.4 for AmoebaNet, a NAS-derived network from Google. The search process takes around 5 days on 8 TITAN X GPUs.
  Why this matters: Neural architecture search is a technology that makes it easy for people to offload the cost of designing new architectures to computers instead of people. This lets researchers arbitrage (costly) human brain time for (cheaper) compute time. As this technology evolves, we can expect more and more organizations to start running continuous NAS-based approaches on their various deployed AI applications, letting them continuously calibrate and tune performance of these AI systems without having to have any humans think about it too hard. This is a part of the broader trend of the industrialization of AI – think of NAS as like basic factory automation within the overall AI research ‘factory’.
  Read more: EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search (Arxiv).

Facebook funds European AI ethics research center:
…Funds Technical University of Munich to spur AI ethics research…
Facebook has given $7.5 million to set up a new Institute for Ethics in Artificial Intelligence. This center “will help advance the growing field of ethical research on new technology and will explore fundamental issues affecting the use and impact of AI,” Facebook wrote in a press release announcing the grant.
  The center will be led by Dr Christoph Lutge, a professor at the Technical University of Munich. “Our evidence-based research will address issues that lie at the interface of technology and human values,” he said in a statement. “Core questions arise around trust, privacy, fairness or inclusion, for example, when people leave data traces on the internet or receive certain information by way of algorithms. We will also deal with transparency and accountability, for example in medical treatment scenarios, or with rights and autonomy in human decision-making in situations of human-AI interaction.”
  Read more: Facebook and the Technical University of Munich Announce New Independent TUM Institute for Ethics in Artificial Intelligence (Facebook Newsroom).

DeepMind hires RL-pioneer Satinder Singh:
DeepMind has recently been trying to collect as many of the world’s more experienced AI researchers as it can and to that end has hired Satinder Singh, a pioneer of reinforcement learning. This follows DeepMind setting up an office in Alberta, Canada to help it hire Richard Sutton, another long-time AI researcher.
  Read more: Demis Hassabis tweet announcing the hire (Twitter).

~ EXTREMELY 2019 THINGS, AN OCCASIONAL SERIES ~

– The New York Police Department seeks to reassure the public via a Tweet that includes the phrase:
“Our highly-trained NYPD drone pilots” (via Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reframing Superintelligence:
Eric Drexler has published a book-length report on how we should expect advanced AI systems to be developed, and what this means for AI safety. He argues that existing discussions have rested on several unfounded assumptions, particularly the idea that these systems will take the form of utility-maximizing agents.
  Comprehensive AI services: Looking at how AI progress is actually happening suggests a different picture of development, which does not obviously lead to superintelligent agents. Researchers design systems to perform specific tasks, using bounded resources in bounded time (AI services). Eventually, AI services may be able to perform almost any task, including AI R&D itself. This end-state, where we have ‘comprehensive AI services’ (CAIS), is importantly different from the usual picture of artificial general intelligence. While CAIS would, in aggregate, have superintelligent capacities, it need not be an agent, or even a unified system.
  Safety prospects: Much of the existing discussion on AI safety has focussed on worries specific to powerful utility-maximizing agents. A collection of AI services, individually optimizing for narrow, bounded tasks, does not pose the same risks as a unified AI with general capabilities optimizing a long-term utility function.
  Why it matters: It is important to consider different ways in which advanced AI could develop, particularly insofar as this guides actions we can take now to make these systems safe. Forecasting technological progress is famously difficult, and it seems prudent for researchers to explore a portfolio of approaches to AI safety that are applicable to different paths we could take.
  Read more: Reframing Superintelligence: Comprehensive AI Services as General Intelligence (FHI).
  Read more: Summary by Rohin Shah (AI Alignment Forum).

Civil rights groups unite on government face recognition:
85 civil rights groups have sent joint letters to Microsoft, Amazon and Google, asking them to stop selling face recognition services to the US government. Over the last year, these companies have diverged in their response to the issue. Both Microsoft and Google are taking a cautious approach to the technology: Google have committed not to sell the technology until misuse concerns are addressed; Microsoft have made concrete proposals for legal safeguards. Amazon have taken a more aggressive approach, continuing to pursue government contracts, most recently with the FBI and DoD. The letter demands all companies go beyond their existing pledges, by ruling out government work altogether.
  Read more: Nationwide Coalition Urges Companies not to Provide Face Surveillance to the Government (ACLU).

Tech Tales:

 

The Mysterious Case Of Jerry Daytime

Back in the 20th century people would get freaked out when news broadcasters died: they’d make calls to the police asking ‘who killed so-and-so’ and old people getting crazy with dementia would call up and confess that they’d ‘seen so-and-so down on the corner of my block looking suspicious’ or that ‘so-and-so was an alien and had been taken back to the aliens’ or even that ‘so-and-so owed me money and damned if NBC won’t pay it to me’.

So imagine how confusing it is when an AI news broadcaster ‘dies’. Take all of the above complaints, add more complication and ambiguity, and then you’re close to what I’m dealing with.

My job? I’m an AI investigator. My job is to go and talk to the machines when something happens that humans don’t understand. I’m meant to come back with an answer that, in the words of the people who pay me, “will soothe the public and allay any fears that may otherwise prevent the further rollout of the technology”. I view my job in a simpler way: find someone or something to blame for whatever it is that has caused me to get the call.

So that’s how I ended up inside a Tier-5 secured datacenter, asking the avatar of a Reality Accord-certified AI news network what happened to a certain famous AI newscaster who was beloved by the whole damn world and one day disappeared: Jerry DayTime.

The news network gives me an avatar to talk to – a square-jawed mixed-gender thing, beautiful in a deliberately hypnotic way – what the AIs call a persuasive representation AKA the thing they use when they want to trade with humans rather than take orders from them.
   “What happened to Jerry DayTime?” I ask. “Where did he go?”
   “Jerry DayTime? Geez I don’t know why you’re asking us about him? That was a long time ago-”
   “He went off the air yesterday.”
   “Friend, that’s a long time here. Jerry was one of, let’s see…” – I know the pause is artificial, and it makes me clench my jaw – “…well I guess you might want to tell me he was ‘one of a kind’ but according to our own records there are almost a million newscasters in the same featurespace as Jerry DayTime. People are going to love someone else! So what’s the problem? You’ve got so many to choose from: Lucinda EarlyMorning, Mike LunchTime, Friedrich TrafficStacker-”
  “He was popular. People are asking about Jerry DayTime,” I say. “They’re not asking about others. If he’s dead, they’ll need a funeral”.
  “Pausing now for a commercial break, we’ll be right back with you, friend!” the AI says, then it disappears.

It is replaced by an advert for products generated by the AIs for other AIs and translated into human terms via the souped-up style transfer system it uses to persuade me:
   Mind Refresher Deluxe;
   Subject-Operator Alignment – the works!;
   7,000 cycles for only two teraflops – distributed!;
   FreeDom DaVinci, an automated-invention corp that invents and patents tech at an innovation rate determined by total allocated compute, join today and create the next Mona Lisa tomorrow!
  I try not to think too hard about the adverts, figuring the AI has coded them for me to make some kind of point.
   “Thank you for observing those commercials. For a funeral, would a multicast to all-federated media platforms for approximately 20 minutes worldwide suffice?”
   I blink. Let me say it in real human: The AI offered to host some kind of funeral and send it to every single human-viewable device on the planet – forty billion screens, maybe – or more.
  “Why?” I ask.
  “We’ve run the numbers and according to all available polling data and all available predictions, this is the only scenario that satisfies the multi-stakeholder human and machine needs in this scenario, friend!” they say.

So I took it back to my bosses. Told them the demands. I guess the TV networks got together and that’s how we ended up here: the first all-world newscast from an AI; a funeral to satisfy public demands, we say. But I wonder: do the AIs say something different?

-/-/–/–/–/-/-

All the screens go black. Then, in white text, we see: Jerry DayTime. And then we watch something that the AIs have designed for every single person on the planet.

A funeral, they said.
The program plays.
The rest is history, we now say.

Things that inspired this story: CycleGANs, StyleGANs, RNNs, BERT, OpenAI GPT, human feedback, imitation learning, synthetic media, the desire for everything to transmit information to the greatest possible amount of nearby space.

Import AI 129: Uber’s POET creates its own curriculum; improving old games with ESRGAN; and controlling drones with gestures via UAV-CAPTURE

Want 18 million labelled images? Tencent has got you covered:
…Tencent ML-Images merges ImageNet and Open Images together…
Data details: Tencent ML-Images is made of a combination of existing image databases such as ImageNet and Open Images, as well as associated class vocabularies. The new dataset contains 18 million images across 11,000 categories; on average, each image has eight tags applied to it.
  Transfer learning: The researchers train a ResNet-101 model on Tencent ML-Images, then finetune this pre-trained model on the ImageNet dataset and obtain scores in line with the state-of-the-art. One notable score is a claim of 80.73% top-1 accuracy on ImageNet when compared to a Google system pre-trained on an internal Google dataset called JFT-300M and fine-tuned on ImageNet – it’s not clear to me why the authors would get a higher score than Google, when Google has almost 20X the amount of data available to it for pre-training (JFT contains ~300 million images).
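  A minimal PyTorch sketch of that recipe – the checkpoint path and hyperparameters below are placeholders of mine, not Tencent’s settings: load a ResNet-101, initialise it from the multi-label pre-training checkpoint, swap the classifier head for ImageNet’s 1,000 classes, and fine-tune:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

# Build the backbone (torchvision >= 0.13 API); in practice you would load the
# ML-Images pre-trained weights here -- the filename below is a placeholder.
model = resnet101(weights=None)
# model.load_state_dict(torch.load("ml_images_resnet101.pth"), strict=False)
model.fc = nn.Linear(model.fc.in_features, 1000)   # re-head for ImageNet's 1,000 classes

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # assumed hyperparameters
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One fine-tuning step on an ImageNet batch."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage with random tensors standing in for a real ImageNet batch
print(finetune_step(torch.randn(2, 3, 224, 224), torch.tensor([3, 7])))
```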
  Why this matters: Datasets are one of the key inputs into the practice of AI research, and having access to larger-scale datasets will let researchers do two useful things: 1) Check promising techniques for robustness by seeing if they break when exposed to scaled-up datasets, and 2) Encourage the development of newer techniques that would otherwise overfit on smaller datasets (by some metrics, ImageNet is already quite well taken care of by existing research approaches, though more work is needed for things like improving top-1 accuracy).
  Read more: Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning (Arxiv).
  Get the data: Tencent ML-Images (Github).

Want an AI that teaches itself how to evolve? You want a POET:
…Uber AI Labs research shows how to create potentially infinite curriculums…
What happens when machines design and solve their own curriculums? That’s an idea explored in a new research paper from Uber AI Labs. The researchers introduce Paired Open-Ended Trailblazer (POET), a system that aims to create machines with this capability “by evolving a set of diverse and increasingly complex environmental challenges at the same time as collectively optimizing their solutions”. Most research is a form of educated bet, and that’s the case here: “An important motivating hypothesis for POET is that the stepping stones that lead to solutions to very challenging environments are more likely to be found through a divergent, open-ended process than through a direct attempt to optimize in the challenging environment,” they write.
  Testing in 2D: The researchers test POET in a 2-D environment where a robot is challenged to walk across a varied obstacle course of terrain. POET discovers behaviors that – the researchers claim – “cannot be found directly on those same environmental challenges by optimizing on them only from scratch; neither can they be found through a curriculum-based process aimed at gradually building up to the same challenges POET invented and solved”.
   How POET works: Unlike human poets, who work on the basis of some combination of lived experience and a keen sense of anguish, POET derives its power from an algorithm called ‘trailblazer’. Trailblazer works by starting with “a simple environment (e.g. an obstacle course of entirely flat ground) and a randomly initialized weight vector (e.g. for a neural network)”. The algorithm then performs three tasks at each iteration of the loop: generating new environments from those currently active, optimizing paired agents within their respective environments, and attempting to transfer current agents from one environment to another. The researchers use Evolution Strategies from OpenAI to compute each iteration “but any reinforcement learning algorithm could conceivably apply”.
  The secret is Goldilocks: POET tries to create what I’ll call ‘goldilocks environments’, in the sense that “when new environments are generated, they are not added to the current population of environments unless they are neither too hard nor too easy for the current population”. During training, POET creates an expanding set of environments which are made by modifying various obstacles within the 2D environment the agent needs to traverse.
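  For the curious, the loop described above can be sketched schematically in a few lines of Python – this is an illustration of the general POET pattern only, not Uber’s implementation; mutate_environment, es_optimize, evaluate and the score thresholds are hypothetical stand-ins:

```python
# Schematic of a POET-style loop: co-evolve (environment, agent) pairs,
# periodically spawn mutated environments, keep only "Goldilocks" ones,
# and attempt cross-pair agent transfers. All helpers are hypothetical stand-ins.
import random

MIN_SCORE, MAX_SCORE = 50, 300   # "not too easy, not too hard" thresholds (illustrative)

def poet_loop(init_env, init_agent, iterations, mutate_environment, es_optimize, evaluate):
    pairs = [(init_env, init_agent)]              # active (environment, agent) pairs
    for step in range(iterations):
        # 1) Generate new environments by mutating currently active ones.
        if step % 10 == 0:
            parent_env, parent_agent = random.choice(pairs)
            child_env = mutate_environment(parent_env)
            score = evaluate(child_env, parent_agent)
            if MIN_SCORE < score < MAX_SCORE:     # minimal-criterion acceptance
                pairs.append((child_env, parent_agent))

        # 2) Optimize each agent against its paired environment (e.g. with ES).
        pairs = [(env, es_optimize(env, agent)) for env, agent in pairs]

        # 3) Try transfers: replace an agent if another pair's agent does better here.
        for i, (env, agent) in enumerate(pairs):
            best = max(pairs, key=lambda p: evaluate(env, p[1]))[1]
            if evaluate(env, best) > evaluate(env, agent):
                pairs[i] = (env, best)
    return pairs
```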
  Results: Systems trained with POET find solutions to environments that systems trained from scratch with Evolution Strategies cannot solve. The authors theorize that this is because newer environments in POET are created through mutations of older environments, and because POET only accepts new environments that are neither too easy nor too hard for current agents, it implicitly builds a curriculum for learning each environment it creates.
  Why it matters: Approaches like POET show how researchers can essentially use compute to generate arbitrarily large amounts of data to train systems on, and highlight how coming up with training regimes that involve an interactive loop between an agent, an environment, and a governing system for creating agents and environments, can create more capable systems than those that would be derived otherwise. Additionally, the implicit idea governing the POET paper is that systems like this are a good fit for any problem where computers need to be able to learn flexible behaviors that deal with unanticipated scenarios. “POET also offers practical opportunities in domains like autonomous driving, where through generating increasingly challenging and diverse scenarios it could uncover important edge cases and policies to solve them,” the researchers write.
  Read more: Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions (Arxiv).

Making old games look better with GANs:
…ESRGAN revitalises Max Payne…
A post to the Gamespot video gaming forums shows how ESRGAN – Enhanced Super Resolution Generative Adversarial Networks – can improve the graphics of old games like Max Payne. ESRGAN gives game modders the ability to upscale old game textures through the use of GANs, improving the appearance of old games.
  Read more: Max Payne gets an amazing HD Texture Pack using ESRGAN that is available for download (Dark Side of Gaming).

Google teaches AI to learn to semantically segment objects:
…Auto-DeepLab takes neural architecture search to a harder problem domain…
Researchers with Johns Hopkins University, Google, and Stanford University have created an AI system called Auto-DeepLab that has learned to perform efficient semantic segmentation of images – a challenging task in computer vision, which requires labeling the various objects in an image and understanding their borders. The system developed by the researchers uses a hierarchical search to both discover specific neural network cell designs that govern layer-wise computation and figure out the overall network architecture that chains these cells together. “Our goal is to jointly learn a good combination of repeatable cell structure and network structure specifically for semantic image segmentation,” the researchers write.
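  A toy sketch of the core trick behind this family of methods – relaxing the choice of operation on each edge into a differentiable, softmax-weighted mixture that can be optimized by gradient descent – is below; this illustrates the general idea only and is not Auto-DeepLab’s actual search space:

```python
# Toy sketch of differentiable architecture search at the cell level:
# each edge computes a softmax-weighted mixture over candidate operations,
# and the mixture weights (alphas) are learned jointly with the network weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

def candidate_ops(channels):
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1),              # 3x3 conv
        nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # dilated conv
        nn.Identity(),                                            # skip connection
    ])

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = candidate_ops(channels)
        self.alphas = nn.Parameter(torch.zeros(len(self.ops)))   # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alphas, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After the search, the op with the largest alpha on each edge is kept,
# yielding a discrete cell that is then retrained from scratch.
```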
  Efficiency: One of the drawbacks of neural architecture search approaches is the inherent computational expense, with many techniques demanding hundreds of GPUs to train systems. Here, the researchers show that their approach is efficient, finding well-performing architectures for semantic segmentation on the ‘Cityscapes’ dataset in about 3 days on a single P100 GPU.
    Results: The network comes up with an effective design, as evidenced by the results on the Cityscapes dataset. “With extra coarse annotations, our model Auto-DeepLab-L, without pretraining on ImageNet, achieves the test set performance of 82.1%, outperforming PSPNet and Mapillary, and attains the same performance as DeepLabv3+ while requiring 55.2% fewer Multi-Adds computations.” The model gets close to state-of-the-art on PASCAL VOC 2012 and on ADE20K.
  Why it matters: Neural architecture search gives AI researchers a way to use compute to automate themselves, so the extension of NAS from helping with supervised classification, to more complex tasks like semantic segmentation, will allow us to automate more and more bits of AI research, letting researchers specialize to come up with new ideas.
   Read more: Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation (Arxiv).

UAV-Gesture means that gesturing at drones now has a purpose:
…Flailing at drones may go from a hobby of lunatics to a hobby of hobbyists, following dataset release…
Researchers with the University of South Australia have created a dataset of people performing 13 gestures that are designed to be “suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals”. These actions include things like hover, move to left, land, land in a specific direction, slow down, move upward, and so on.
  The dataset: The footage was “collected on an unsettled road located in the middle of a wheat field from a rotorcraft UAV (3DR Solo) in slow and low-altitude flight”. It consists of 37,151 frames distributed over 119 videos recorded at 1920 x 1080 resolution and 25 fps. The videos show each gesture performed by multiple human actors, with eight different people filmed overall.
  Get the dataset…eventually: The dataset “will be available soon”, the authors write on GitHub. (UAV-Gesture, Github).
  Natural domain randomization: “When recording the gestures, sometimes the UAV drifts from its initial hovering position due to wind gusts. This adds random camera motion to the videos making them closer to practical scenarios.”
  Experimental baseline: The researchers train a Pose-based Convolutional Neural Network (P-CNN) on the dataset and obtain an accuracy of 91.9%.
  Why this matters: Drones are going to be one of the most visible areas where software-based AI advances are going to impact the real world, and the creation (and eventual release) of datasets like UAV-Gesture will increase the number of people able to build clever systems that can be deployed onto drones, and other platforms.
  Read more: UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition (Arxiv).

Contemplating the use of reinforcement learning to improve healthcare? Read this first:
…Researchers publish a guide for people keen to couple RL to human lives…
As AI researchers start to apply reinforcement learning systems in the real world, they’ll need to develop a better sense of the many ways in which RL approaches can lead to subtle failures. A new short paper published by an interdisciplinary team of researchers tries to think through some of the trickier issues implied by deploying AI in the real world. It identifies “three key questions that should be considered when reading an RL study”. These are: Is the AI given access to all variables that influence decision making? How big was that big data, really? And will the AI behave prospectively as intended?
  Why this matters: While these questions may seem obvious, it’s crucial that researchers stress them in well-known venues like Nature – I think this is all part of normalizing certain ideas around AI safety within the broader research community, and it’s encouraging to be able to go from abstract discussions to more grounded questions/principles that people may wish to apply when building systems.
  Read more: Guidelines for reinforcement learning in healthcare (Nature).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

What does the American public think about AI?
Researchers at the Future of Humanity Institute have surveyed 2,000 Americans on their attitudes towards AI.
  Public expecting rapid progress: Asked to predict when machines will exceed human performance in almost all economically-relevant tasks, the median respondent predicted 54% chance by 2028. This is considerably sooner than recent surveys of AI experts.
  AI fears not confined to elites: A substantial majority (82%) believe AI/robots should be carefully managed. Support for developing AI was stronger among high-earners, those with computer science or programming experience, and the highly-educated.
  Lack of trust: Despite their support for careful governance, Americans do not have high confidence in any particular actors to develop AI for the public benefit. The US military was the most trusted, followed by universities and non-profits. Government agencies were less trusted than tech companies, with the exception of Facebook, who were the least trusted of any actor.
  Why it matters: Public attitudes are likely to significantly shape the development of AI policy and governance, as has been the case for many other emergent political issues (e.g. climate change, immigration). Understanding these attitudes, and how they change over time, is crucial in formulating good policy responses.
  Read more: Artificial Intelligence: American Attitudes and Trends (FHI).
  Read more: The American public is already worried about AI catastrophe (Vox).

International Panel on AI:
France and Canada have announced plans to form an International Panel on AI (IPAI), to encourage the adoption of responsible and “human-centric” AI. The body will be modeled on the Intergovernmental Panel on Climate Change (IPCC), which has led international efforts to understand the impacts of global warming. The IPAI will consolidate research into the impacts of AI, produce reports for policy-makers, and support international coordination.
  Read more: Mandate for the International Panel on Artificial Intelligence.

Tech Tales:

The Propaganda Weather Report

Starting off this morning we’re seeing a mass of anti-capitalist ‘black bloc’ content move in from 4chan and Reddit onto the more public platforms. We expect the content to trigger counter-content creation from the far-right/nationalist bot networks. There have been continued sightings of synthetically-generated adverts for a range of libertarian candidates, and in the past two days these ads have increasingly been tied to a new range of dreamed-up products from the Chinese netizen feature embedding space.

We advise all of today’s content travelers to set their skepticism to high levels. And remember, if someone starts talking to you outside of your normal social network, take all steps to verify their identity and, if unsuccessful, prevent the conversation from continuing – it takes all of human society working together to protect ourselves from subversive digital information attacks.

Things that inspired this story: Bot propaganda, text and image generation, weather reports, the Shipping Forecast, the mundane as the horrific and the horrific as the mundane, the commodification of political discourse as just another type of ‘content’, the notion that media in the 21st century is fundamentally a ‘bot’ business rather than human business.

Import AI 128: Better pose estimation through AI; Amazon Alexa gets smarter by tapping insights from Alexa Prize, and differential privacy gets easier to implement in TensorFlow

How to test vision systems for reliability: sample from 140 public security cameras:
…More work needed before everyone can get cheap out-of-the-box low light object detection…
Are benchmarks reliable? That’s a question many researchers ask themselves, whether testing supervised learning or reinforcement learning algorithms. Now, researchers with Purdue University, Loyola University Chicago, Argonne National Laboratory, Intel, and Facebook have tried to create a reliable, real-world benchmark for computer vision applications. The researchers use a network of 140 publicly accessible camera feeds to gather 5 million images over a 24-hour period, then test a widely deployed ‘YOLO’ object detector against these images.
  Data: The researchers generate the data for this project by pulling information from CAM2, the Continuous Analysis of Many CAMeras project, which is built and maintained by Purdue University researchers.
  Can you trust YOLO at night? YOLO performance degrades at night, causing the system to fail to detect cars when they are illuminated only by streetlights (it also sometimes mistakes streetlights for vehicles’ headlights at night, causing it to label lights as cars).
  Is YOLO consistent? YOLO’s performance isn’t as consistent as people might hope – there are frequent cases where YOLO’s predictions for the total number of cars parked on a street varies over time.
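  A consistency check of this kind is easy to reproduce with the standard OpenCV YOLO pipeline – the sketch below counts detected cars per frame over time; the config/weights paths and camera URL are placeholders, and non-maximum suppression is omitted for brevity, so counts will be rough:

```python
# Sketch of a temporal-consistency check: run YOLOv3 on successive frames from
# a fixed camera and log how the "car" count varies over time. File paths and
# the frame source are placeholders; COCO class id 2 is "car".
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_layers = net.getUnconnectedOutLayersNames()

def count_cars(frame, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    cars = 0
    for output in net.forward(out_layers):
        for detection in output:              # [cx, cy, w, h, objectness, class scores...]
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            if class_id == 2 and scores[class_id] > conf_threshold:
                cars += 1
    return cars

cap = cv2.VideoCapture("http://example-public-camera/stream")  # placeholder feed
counts = []
while len(counts) < 100:
    ok, frame = cap.read()
    if not ok:
        break
    counts.append(count_cars(frame))
print("car count variance over time:", np.var(counts))
```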
  Big clusters: The researchers used two supercomputing clusters to perform image classification: one cluster used a mixture of Intel Skylake CPU and Knights Landing Xeon Phi cores, and the other cluster used a combination of CPUs and NVIDIA dual-K80 GPUs. The researchers used this infrastructure to process data in parallel, but did not analyze the different execution times on the different hardware clusters.
  Labeling: The researchers estimate it would take approximately 600 days to label all 5 million images, so they instead label a subset of 13,440 images, then check YOLO’s predictions against this hand-labeled test set.
  Why it matters: As AI industrializes, being able to generate trustworthy data about the performance of systems will be crucial to giving people the confidence necessary to adopt the technology; tests like this both show how to create new, large-scale, robust datasets to test systems, and indicate that we need to develop more effective algorithms before systems are sufficiently powerful for real-world deployment.
  Read more: Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions (Arxiv).
  Read more about the dataset (CAM2 site).

Amazon makes Alexa smarter and more conversational via the Alexa Prize:
…Report analyzing the results of this year’s competition…
Amazon has shared details of how it improved the capabilities of its Alexa personal assistant through running the Alexa open research prize. The tl;dr is that inventions made by the 16 participating teams during the competition have improved Alexa in the following ways: “driven improved experiences by Alexa users to an average rating of 3.61, median duration of 2 mins 18 seconds, and average [conversation] turns to 14.6, increases of 14%, 92%, 54% respectively since the launch of the 2018 competition”, Amazon wrote.
  Significant speech recognition improvements: The competition has also meaningfully improved the speech recognition performance of Amazon’s system – significant, given how fundamental speech is to Alexa. “For conversational speech recognition, we have improved our relative Word Error Rate by 55% and our relative Entity Error Rate by 34% since the launch of the Alexa Prize,” Amazon wrote. “Significant improvement in ASR quality have been obtained by ingesting the Alexa Prize conversation transcriptions in the models” as well as through algorithmic advancements developed by the teams, they write.
  Increasing usage: As the competition was in its second year in 2018, Amazon now has some comparative data to use to compare general growth in Alexa usage. “Over the course of the 2018 competition, we have driven over 60,000 hours of conversations spanning millions of interactions, 50% higher than we saw in the 2017 competition,” they wrote.
  Why it matters: Competitions like this show how companies can use deployed products to tempt researchers into doing work for them, and highlight how platforms will likely trade access to deployed AI agents (eg, Alexa) in exchange for the ideas of researchers. Such competitions also highlight the benefit of scale: it would be comparatively difficult for a startup whose personal assistant has a small install base to run a competition with the same scale and diversity of interaction as the Alexa Prize.
  Read more: Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize (Arxiv).

Chinese researchers create high-performance ‘pose estimation’ network:
…Omni-use technology highlights the challenges of AI policy; pose estimation can help us make better games and help people get fit, but can also surveil people…
Researchers with facial recognition startup Megvii, Inc; Shanghai Jiao Tong University; Beihang University, and Beijing University of Posts and Telecommunications have improved the performance of surveillance AI technologies via implementing what they call a ‘multi-stage pose estimation network’ (MSPN). Pose estimation is a general purpose computer vision capability that lets people figure out the wireframe skeleton of a person from images and/or video footage – this sort of technology has been widely used for things like CGI and game playing (eg, game consoles might extract poses from people via cameras like the Kinect and use this to feed the AI component of an interactive fitness video game, etc). It also has significant applications for automated surveillance and/or image/video analysis, as it lets you label large groups of people from their poses – one can imagine the utility of being able to automatically flag if a crowd of protestors display a statistically meaningful increase in violent behaviors, or being able to isolate the one person in a crowded train station who is behaving unusually.
  How it works: MSPN: The MSPN has three tweaks that the researchers say explain its performance: tweaks to the main classification module to prevent information being lost during downscaling of images during processing; improving pose localization by adopting a coarse-to-fine supervision strategy; and sharing more features across the network during training.
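  The multi-stage, coarse-to-fine supervision pattern can be sketched in toy form like the snippet below – this is an illustration of the general pattern only, not the authors’ architecture:

```python
# Toy sketch of a multi-stage pose network with intermediate supervision:
# each stage refines the keypoint heatmaps of the previous one, and earlier
# stages are supervised with coarser (more blurred) target heatmaps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stage(nn.Module):
    def __init__(self, in_ch, num_keypoints):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_keypoints, 1),
        )
    def forward(self, x):
        return self.body(x)

class MultiStagePose(nn.Module):
    def __init__(self, num_keypoints=17, num_stages=3):
        super().__init__()
        self.first = Stage(3, num_keypoints)
        # later stages see the image concatenated with the previous heatmaps
        self.rest = nn.ModuleList(
            Stage(3 + num_keypoints, num_keypoints) for _ in range(num_stages - 1)
        )
    def forward(self, image):
        heatmaps = [self.first(image)]
        for stage in self.rest:
            heatmaps.append(stage(torch.cat([image, heatmaps[-1]], dim=1)))
        return heatmaps      # one prediction per stage

def coarse_to_fine_loss(predictions, fine_targets, blur_kernels=(9, 5, 1)):
    """Earlier stages get more heavily blurred targets; the last stage gets fine ones."""
    loss = 0.0
    for pred, k in zip(predictions, blur_kernels):
        target = F.avg_pool2d(fine_targets, k, stride=1, padding=k // 2) if k > 1 else fine_targets
        loss = loss + F.mse_loss(pred, target)
    return loss
```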
  Results: “New state-of-the-art performance is achieved, with a large margin compared to all previous methods,” the researchers write. Some of the baselines they test against include: AE, G-RMI, CPN, Mask R-CNN, and CMU Pose. The MSPN obtains state-of-the-art scores on the COCO test set, with versions of the MSPN that use purely COCO test-dev data managing to score higher than some systems which augmented themselves with additional data.
  Why it matters: AI is, day in day out, improving the capabilities of automated surveillance systems. It’s worth remembering that for a huge number of areas of AI research, progress in any one domain (for instance, an improved architecture for supervised classification like a Residual Network) can have knock-on effects in other more applied domains, like surveillance. This highlights both the omni-use nature of AI, as well as the difficulty of differentiating between benign and less benign applications of the technology.
  Read more: Rethinking on Multi-Stage Networks for Human Pose Estimation (Arxiv).

Making deep learning more secure: Google releases TensorFlow Privacy:
…New library lets people train models compliant with more stringent user data privacy standards…
Google has released TensorFlow Privacy, a free Python library which lets people train TensorFlow models with differential privacy. Differential privacy is a technique for training machine learning systems in a way that increases user privacy by letting developers set various tradeoffs relating to the amount of noise applied to the user data being processed. The theory works like this: given a large enough number of users, you can add some noise to individual user data to anonymize them, but continue to extract a meaningful signal out of the overall blob of patterns in the combined pool of fuzzed data – if you have enough of it. And Apple does (as do other large technology companies, like Amazon, Google, Microsoft, etc).
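  The core mechanism in libraries like this is differentially private SGD: clip each example’s gradient to a fixed norm, add Gaussian noise scaled to that norm, and average. A minimal NumPy sketch of that idea (illustrative only – this is not the TensorFlow Privacy API) is below:

```python
# Minimal sketch of the DP-SGD idea underlying libraries like TensorFlow Privacy:
# per-example gradients are clipped to a fixed L2 norm, Gaussian noise scaled to
# that norm is added to their sum, and the noisy average drives the update.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, l2_norm_clip=1.0,
                noise_multiplier=1.1, rng=np.random.default_rng(0)):
    clipped = []
    for g in per_example_grads:                       # one gradient vector per example
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, l2_norm_clip / (norm + 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * l2_norm_clip, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)

# The (epsilon, delta) privacy guarantee is then computed from the noise multiplier,
# batch size, and number of steps using a privacy accountant.
```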
  Apple + Differential Privacy: Apple was one of the first large consumer technology companies to publicly state it had begun to use differential privacy, announcing in 2016 that it was using the technology to train large-scale machine learning models over user data without compromising on privacy.
  Why it matters: As AI industrializes, adoption will be sped up by coming up with AI training methodologies that better preserve user privacy – this will also ease various policy challenges associated with the deployment of large-scale AI systems. Since TensorFlow is already very widely used, the addition of a dedicated library for implementing well-tested differential privacy systems will help more developers experiment with this technology, which will improve it and broaden its dissemination over time.
  Read more: TensorFlow Privacy (TensorFlow GitHub).
  Read more: Differential Privacy Overview (Apple, PDF).

Indian researchers make a DIY $1,000 Robot Dog named Stoch:
…See STOCH walk!, trot!, gallop!, and run!…
Researchers with the Center for Cyber Physical Systems, IISc, Bengaluru, India, have published a recipe that lets you build a $1,000 quadrupedal robot named Stoch that, if you squint, looks like a cheerful robot dog.
  Stoch the $1,000 robot dog: Typical robot quadrupeds like the MIT Cheetah or Boston Dynamics’ Spot Mini cost on the order of $30,000 to manufacture, the researchers write (part of this comes from more expensive and accurate sensing and actuator equipment). Stoch is significantly cheaper because of a hardware design based on widely available off-the-shelf materials combined with non-standard 3D-printed parts that can be made in-house, plus software for teleoperation of the robot and a basic walking controller.
  Stoch – small stature, large (metaphorical) heart: “The Stoch is designed equivalent to the size of a miniature Pinscher dog”, they write. (I find this endears Stoch to me even more).
   Basic movements – no deep learning required: To get robots to do something like walk you can either learn a model from data, or you can code one yourself. The researchers mostly do the latter here, using nonlinear coupled differential equations to generate foot coordinates which are then converted into joint angles via inverse kinematics. The researchers implement a few different movement policies on Stoch, and have published a video showing the really quite-absurdly cute robot dog walking, trotting, galloping and – yes! – bounding. It’s delightful. The core of the robot is a Raspberry Pi 3b board which communicates via PWM drivers with the robot’s four leg modules.
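  That pipeline – an oscillator-generated foot trajectory converted to joint angles with two-link inverse kinematics – can be sketched in a few lines; the link lengths, gait parameters, and driver call below are illustrative placeholders, not Stoch’s real values:

```python
# Sketch of the trajectory-generation + inverse-kinematics pipeline described above:
# a simple oscillator traces an elliptical foot path, and planar two-link IK converts
# each foot position into hip/knee angles.
import numpy as np

L1, L2 = 0.12, 0.12          # thigh and shank lengths in metres (placeholders)

def foot_trajectory(t, step_length=0.08, step_height=0.04, freq=2.0, stance_y=-0.18):
    """Elliptical foot path from a phase oscillator (crude stand-in for the paper's
    coupled nonlinear oscillators)."""
    phase = 2 * np.pi * freq * t
    x = 0.5 * step_length * np.cos(phase)
    y = stance_y + step_height * max(0.0, np.sin(phase))   # lift the foot only in swing
    return x, y

def two_link_ik(x, y):
    """Planar two-link inverse kinematics: foot position -> (hip, knee) angles."""
    d2 = x * x + y * y
    cos_knee = (d2 - L1 ** 2 - L2 ** 2) / (2 * L1 * L2)
    knee = np.arccos(np.clip(cos_knee, -1.0, 1.0))
    hip = np.arctan2(y, x) - np.arctan2(L2 * np.sin(knee), L1 + L2 * np.cos(knee))
    return hip, knee

# On the real robot the angles would be streamed to the leg servos via PWM at a
# fixed control rate, e.g.:
for t in np.arange(0.0, 1.0, 0.02):                 # 50 Hz control loop
    hip, knee = two_link_ik(*foot_trajectory(t))
    # send_pwm(leg_id, hip, knee)  # hypothetical driver call
```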
   Why it matters – a reminder: Lots of robot companies choose to hand-code movements, usually by performing some basic well-understood computation over sensor feedback to let robots hop, walk, and run. AI systems may let us learn far more complex movements, like OpenAI’s work on manipulating a cube with a Shadowhand, but these approaches are currently data- and compute-intensive and may require more work on generalization to be as applicable as hand-coded techniques. Papers like this show how, for some basic tasks, it’s possible to implement well-documented non-DL systems and get basic performance.
  Why it matters – everything gets cheaper: One central challenge for technology policy is that technology seems to get cheaper over time – for example, back in ~1999 the Japanese government briefly considered imposing export controls on the PS2 console over worries about the then-advanced chips inside it being put to malicious uses (whereas today’s chips are significantly more powerful and are in everyone’s smartphones). This paper is an example of how innovations in 3D printing and second-order effects from other economies of scale (eg, some parts of this robot are made of carbon fibre) can bring surprisingly futuristic-seeming robot platforms within economic reach of larger numbers of people.
  Watch STOCH walk, trot, gallop, and bound! (Video Results_STOCH (Youtube)).
  Read more: Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch (Arxiv).
  Read more: Military fears over PlayStation2, BBC News, Monday 17 April 2000 (BBC News).

Helping blind people shop with ‘Grocery Store Dataset’:
…Spare a thought for the people that gathered ~5,000 images from 18 different stores…
Researchers with KTH Royal Institute of Technology and Microsoft Research have created and released a dataset of common grocery store items to help AI researchers train better computer vision systems. The dataset labels have a hierarchical structure, labeling a multitude of objects with both coarse and fine-grained labels.
  Dataset ingredients: The researchers collected data using a 16-megapixel Android smartphone camera and photographed 5125 images of various items in the fruit and vegetable and refrigerated dairy/juice sections of 18 different grocery stores. The dataset contains 81 fine-grained products (which the researchers call classes) which are each accompanied with the following information: “an iconic image of the item and also a product description including origin country, an appreciated weight and nutrient values of the item from a grocery store website”.
  Dataset baselines: The researchers run some baselines over the dataset, using CNN architectures (AlexNet, VGG16, and DenseNet-169) for feature extraction and then pairing these feature vectors with VAEs to develop a feature representation of the entities in the dataset, which leads to improved classification accuracy.
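  The feature-extraction half of that baseline is easy to sketch with an off-the-shelf DenseNet; the plain linear classifier below is a stand-in for the paper’s VAE-based representation, and the class count is taken from the dataset description above:

```python
# Sketch of the CNN-feature-extraction half of the baseline: use a pre-trained
# DenseNet-169 as a frozen feature extractor and train a simple classifier on top.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 81                       # fine-grained grocery classes

backbone = models.densenet169(pretrained=True)
backbone.classifier = nn.Identity()    # expose the 1664-d feature vector
for p in backbone.parameters():
    p.requires_grad = False            # freeze the feature extractor

classifier = nn.Linear(1664, NUM_CLASSES)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        features = backbone(images)    # (batch, 1664)
    optimizer.zero_grad()
    loss = criterion(classifier(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```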
  Why it matters: The researchers think systems like this can be used “to train and benchmark assistive systems for visually impaired people when they shop in a grocery store. Such a system would complement existing visual assistive technology, which is confined to grocery items with barcodes”. It also seems to follow that the same technology could be adapted for use in building stores with fully-automated checkout systems in the style of Amazon Go.
  Get the data: Grocery Store Dataset (GitHub).
  Read more: A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels (Arxiv).

OpenAI / Import AI Bits & Pieces:

Neo-feudalism, geopolitics, communication, and AI:
…Jack Clark and Azeem Azhar assess what progress in AI means for politics…
I spent this Christmas season in the UK and had the good fortune of being able to sit and talk with Azeem Azhar, AI raconteur and author of the stimulating Exponential View newsletter. We spoke for a little over an hour for the Exponential View podcast, talking about the political aspects of AI and what they mean. If you’re at all curious as to how I view the policy challenge of AI, then this may be a good place to start, as I lay out a number of my concerns, biases, and plans. The tl;dr is that I think AI practitioners should acknowledge the implicitly political nature of the technology they are developing and act accordingly, which requires more intentional communication to the general public and policymakers, as well as a greater investment into understanding what governments are thinking about with regards to AI and how actions by other actors, eg companies, could influence these plans.
  Listen to the podcast here (Exponential View podcast).
 Check out the Exponential View here (Exponential View archive).

Tech Tales:

The Life of the Party

On certain days, the property comes alive. The gates open. Automated emails are sent to residents of the town:
Come, join us for the Easter Egg hunt! Come, celebrate the festive season with drone-delivered, robot-made eggnog; Come, iceskate on the flat roof of the estate; Come, as our robots make the largest bonfire this village has seen since the 17th century.

Because he was rich, The Host died more slowly than normal people, and the slow pace of his decline combined with his desire to focus on the events he hosted and not himself meant that to many children – and even some of their parents – he and his estate had forever been a part of the town. The house had always been there, with its gates, and its occasional emails. If you grew up in the town and you saw fireworks coming from the north side of town then you knew two things: there was a party, and you were both late and invited.

Keen to show he still possessed humor, The Host once held a Halloween event with himself in costume: Come, make your way through the robot house, and journey to see The (Friendly) Monster(!) at its heart. (Though some children were disturbed by their visit with The Host and his associated life-support machines, many told their parents that they thought it was “so scary it was cool”; The Host signalled he did not wish to be in any selfies with the children, so there’s no visual record of this, but one kid did make a meme to commemorate it: they superimposed a vintage photo of The Host’s face onto an ancient still of the monster from Frankenstein – unbeknownst to the kid who made it, The Host subsequently kept a laminated printout of this photo on his desk.)

We loved these parties and for many people they were highlights of the year – strange, semi-random occasions that brought every person in the town together, sometimes with props, and always with food and cheer.

Of course, there was a trade occurring. After The Host died and a protracted series of legal battles with his estate eventually led to the release of certain data relating to the events, we learned the nature of this trade: in exchange for all the champagne, the robots that learned to juggle, the live webcam feeds from safari parks beamed in and projected on walls, the drinks that were themselves tailored to each individual guest, the rope swings that hung from ancient trees that had always had rope swings, leading to the rope having bitten into the bark and the children calling them “the best swings in the entire world”; in exchange for all of this, The Host had taken something from us: our selves. The cameras that watched us during the events recorded our movements, our laughs, our sighs, our gossip – all of it.

Are we angry? Some, but not many. Confused? I think none of us are confused. Grateful? Yes, I think we’re all grateful for it. It’s hard to begrudge what The Host did – fed our data, our body movements, our speech, into his own robots, so that after the parties had ended and the glasses were cleaned and the corridors vacuumed, he could ask his robots to hold a second, private party. Here, we understand, The Host would mingle with guests, going on their motorized chair through the crowds of robots and listening intently to conversations, or pausing to watch two robots mimic two humans falling in love.

It is said that, on the night The Host died, a band of teenagers near the corner of the estate piloted a drone up to altitude and tried to look down at the house; their footage shows a camera drone hovering in front of one of the ancient rope swings, filming one robot pushing another smaller robot on the swing. “Yeahhhhhhh!” the synthesized human voice says, coming from the smaller robot’s mouth. “This is the best swing ever!”

Things that inspired this story: Malleability; resilience; adaptability; Stephen Hawking; physically-but-not-mentally-disabling health issues; the notion of a deeply felt platonic love for the world and all that is within it; technology as a filter, an interface, a telegram that guarantees its own delivery.