Import AI

Import AI 120: The Winograd test for commonsense reasoning is not as hard as we thought; Tencent learns to spot malware with AiDroid data; and what a million people think about the trolley problem

Want almost ten million images for machine learning? Consider Open Images V4:
…Latest giant dataset release from Google annotates images with bounding boxes, visual relationships, and image-level labels for 20,000 distinct concepts…
Google researchers have released Open Images Dataset V4, a very large image dataset collected from photos from Flickr that had been shared with a Creative Commons Attribution license.
  Scale: Open Images V4 contains 9.2 million heavily-annotated images. Annotations include bounding boxes, visual relationship annotations, and 30 million image-level labels for almost 20,000 distinct concepts. “This [scale] makes it ideal for pushing the limits of the data-hungry methods that dominate the state of the art,” the researchers write. “For object detection in particular, the scale of the annotations is unprecedented”.
  Automated labeling: “Manually labeling a large number of images with the presence or absence of 19,794 different classes is not feasible not only because of the amount of time one would need, but also because of the difficulty for a human to learn and remember that many classes”, they write. Instead, they use a partially-automated method to first predict labels for images, then have humans provide feedback on these predictions. They also implemented various systems to more effectively add the bounding boxes to different images, which required them to train human annotators in a technique called “fast clicking”.
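  A sketch in code: The workflow is essentially “machine proposes, human verifies”. Below is a minimal sketch of that loop; the classifier and human-verification interfaces and the confidence threshold are hypothetical illustrations, not details from the paper.

# Minimal sketch of a predict-then-verify labeling loop. The `classifier` and
# `ask_human_to_verify` interfaces are hypothetical; the threshold is illustrative.
def propose_labels(image, classifier, confidence_threshold=0.5):
    """The machine proposes candidate image-level labels above a confidence cutoff."""
    return [(label, conf) for label, conf in classifier.predict(image)
            if conf >= confidence_threshold]

def verified_labels(image, classifier, ask_human_to_verify):
    """Humans confirm or reject each machine-proposed label."""
    kept = []
    for label, conf in propose_labels(image, classifier):
        if ask_human_to_verify(image, label):   # human feedback on the prediction
            kept.append(label)
    return kept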
  Scale, and Google scale: The 20,000 class names selected for use in Open Images V4 are themselves a subset of all the names used by Google for an internal dataset called JFT, which contains “more than 300 million images”.
  Why it matters: In recent years, the release of new, large datasets has been (loosely) correlated with the emergence of new algorithmic breakthroughs that have measurably improved the efficiency and capability of AI algorithms. The large-scale and dense labels of Open Images V4 may serve to inspire more progress in other work within AI.
  Get the data: Open Images V4 (Official Google website).
  Read more: The Open Images Dataset V4 (Arxiv).

What happens when plane autopilots go bad:
…Incident report from England gives us an idea of how autopilots bug-out and what happens when they do…
A new incident report from the UK about an airplane having a bug with its autopilot gives us a masterclass in the art of writing bureaucratic reports about terrifying subjects.
  The report in full: “After takeoff from Belfast City Airport, shortly after the acceleration altitude and at a height of 1,350 ft, the autopilot was engaged. The aircraft continued to climb but pitched nose-down and then descended rapidly, activating both the “DON’T SINK” and “PULL UP” TAWS (EGPWS) warnings. The commander disconnected the autopilot and recovered the aircraft into the climb from a height of 928 ft. The incorrect autopilot ‘altitude’ mode was active when the autopilot was engaged causing the aircraft to descend toward a target altitude of 0 ft. As a result of this event the operator has taken several safety actions including revisions to simulator training and amendments to the taxi checklist.”
  Read more: AAIB investigation to DHC-8-402 Dash 8, G-ECOE (UK Gov, Air Accidents Investigation Branch).

China’s Xi Jinping: AI is a strategic technology, fundamental to China’s rise:
…Chinese leader participates in Politburo-led AI workshop, comments on its importance to China…
Chinese leader Xi Jinping recently led a Politburo study session focused on AI, as a continuation of the country’s focus on the subject following the publication of its national strategy last year. New America recently translated Chinese-language official media coverage of the event, giving us a chance to get a more detailed sense of how Xi views AI+China.
  AI as a “strategic technology”: Xi described AI as a strategic technology, and said it is already imparting a significant influence on “economic development, social progress, and the structure of international politics and economics”, according to remarks paraphrased by state news service Xinhua. “Accelerating the development of a new generation of AI is an important strategic handhold for China to gain the initiative in global science and technology competition”.
  AI research imperatives: China should invest in fundamental theoretical AI research, while growing its own education system. It should “fully give rein to our country’s advantages of vast quantities of data and its huge scale for market application,” he said.
  AI and safety: “It is necessary to strengthen the analysis and prevention of potential risks in the development of AI, safeguard the interests of the people and national security, and ensure that AI is secure, reliable, and controllable,” he said. “Leading cadres at all levels must assiduously study the leading edge of science and technology, grasp the natural laws of development and characteristics of AI, strengthen overall coordination, increase policy support, and form work synergies.”
  Why it matters: Whatever the United States government does with regard to artificial intelligence will be somewhat conditioned by the actions of other countries, and China’s actions will be of particular influence here given the scale of the country’s economy and its already verifiable state-level adoption of AI technologies. I believe it’s also significant to have such detailed support for the technology emanate from the top of China’s political system, as it indicates that AI may be becoming a positional geopolitical technology – that is, state leaders will increasingly wish to demonstrate superiority in AI to help send a geopolitical message to rivals.
  Read more: Xi Jinping Calls for ‘Healthy Development’ of AI [Translation] (New America).

Manchester turns on SpiNNaker spiking neuron supercomputer:
…Supercomputer to model biological neurons, explore AI…
Manchester University has switched on SpiNNaker, a one-million processor supercomputer designed with a network architecture that helps it better model biological neurons in brains, specifically by implementing spiking networks. SpiNNaker “mimics the massively parallel communication architecture of the brain, sending billions of small amounts of information simultaneously to thousands of different destinations”, according to Manchester University.
  Brain-scale modelling: SpiNNaker’s ultimate goal is to model one billion neurons at once. One billion neurons are about 1% of the total number of neurons in the average human brain. Initially, it should be able to model around a million neurons “with complex structure and internal dynamics”. But SpiNNaker boards can also be scaled down and used for other purposes, like in developing robotics. “A small SpiNNaker board makes it possible to simulate a network of tens of thousands of spiking neurons, process sensory input and generate motor output, all in real time and in a low power system”.
  Why it matters: Many researchers are convinced that if we can figure out the right algorithms, spiking networks are a better approach to AI than today’s neural networks – that’s because a spiking network can propagate messages that are both fuzzier and more complex than those made possible by traditional networks.
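  Spiking neurons, in brief: Unlike a standard artificial neuron, which emits a continuous activation on every forward pass, a spiking neuron integrates its input over time and only emits a discrete spike when its membrane potential crosses a threshold. Below is a minimal leaky integrate-and-fire simulation to illustrate the idea; the constants are arbitrary and this is not SpiNNaker’s actual neuron model.

# Minimal leaky integrate-and-fire neuron; constants are illustrative only.
def simulate_lif(input_currents, threshold=1.0, leak=0.95, reset=0.0):
    """Return the time steps at which the neuron spikes for a sequence of inputs."""
    potential, spikes = 0.0, []
    for t, current in enumerate(input_currents):
        potential = leak * potential + current   # integrate input, with leak
        if potential >= threshold:               # fire when the threshold is crossed
            spikes.append(t)
            potential = reset                    # reset the membrane potential
    return spikes

print(simulate_lif([0.3, 0.3, 0.3, 0.0, 0.6, 0.6]))  # time steps at which the neuron fired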
  Read more: ‘Human brain’ supercomputer with 1 million processors switched on for first time (Manchester).
  Read more: SpiNNaker home page (Manchester University Advanced Processor Technologies Research Group).

Learning to spot malware at China-scale with Tencent AiDroid:
…Tencent research project shows how to use AI to spot malware on phones…
Researchers with West Virginia University and Chinese company Tencent have used deep neural networks to create AiDroid, a system for spotting malware on Android. AiDroid has subsequently “been incorporated into Tencent Mobile Security product that serves millions of users worldwide”.
  How it works: AiDroid works like this: First, the researchers extract the API call sequences from runtime executions of Android apps in users’ smartphones, then they try to model the relationships between different mobile applications, phones, apps, and so on, via a heterogeneous information network (HIN). They then learn a low-dimensional representation of all the different entities within HIN, and use these features as inputs to a DNN model, which learns to classify typical entities and relationships, and therefore can learn to spot erroneous entities or relationships – which typically correspond to malware.
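  A sketch in code: The pipeline structure is graph-of-relationships → low-dimensional entity embeddings → neural classifier. The sketch below captures only that structure: the random-projection “embedding” is a placeholder for the paper’s actual HIN embedding method, and all data and names are toy illustrations.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy relation matrix: rows are apps, columns are other entities (devices,
# API calls, ...); a 1 means app i is linked to entity j in the graph.
rng = np.random.default_rng(0)
relations = rng.integers(0, 2, size=(200, 50))
labels = rng.integers(0, 2, size=200)          # 1 = malicious, 0 = benign (toy labels)

# Placeholder for HIN embedding: project each app's relation pattern into a
# low-dimensional space (the real system learns these embeddings from the graph).
projection = rng.normal(size=(50, 16))
embeddings = relations @ projection

# Feed the low-dimensional representations to a small neural classifier.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(embeddings[:150], labels[:150])
print("held-out accuracy:", clf.score(embeddings[150:], labels[150:]))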
  Data fuel: This research depends on access to a significant amount of data. “We obtain the large-scale real sample collection from Tencent Security Lab, which contains 190,696 training apps (i.e., 83,784 benign and 106,912 malicious)”.
  Results: The researchers measure the effectiveness of their system and show it is better at in-sample embedding than other systems such as DeepWalk, LINE, and metapath2vec, and that systems trained with the researchers’ HIN embedding display superior performance to those trained with others. Additionally, their system is better at predicting malicious applications than other, somewhat weaker, baselines.
  Why it matters: Machine learning approaches are going to augment many existing cybersecurity techniques. AiDroid gives us an example of how large platform operators, like Tencent, can create large-scale data generation systems (like the app underpinning AiDroid) and then use that data to conduct research – bringing to mind the question: if this data has such obvious value, why aren’t the users being paid for its use?
  Read more: AiDroid: When Heterogeneous Information Network Marries Deep Neural Network for Real-time Android Malware Detection (Arxiv).

The Winograd Schema Challenge is not as smart as we hope:
…Researchers question robustness of Winograd Schemas for assessing language AIs after breaking the evaluation method with one tweak…
Researchers with McGill University and Microsoft Research Montreal have shown how the Winograd Schema Challenge (WSC) – thought by many to be a gold standard for evaluating the ability of language systems to perform common sense reasoning – is deeply flawed, and that researchers who want to truly test for general cognitive capabilities need to apply a different evaluation protocol when studying performance on the dataset.
  Whining about Winograd: WSC is a dataset of almost three hundred sentences where the language model is tasked with working out which entity a pronoun refers to in a given sentence. For example, WSC might challenge a computer to figure out which of the entities in the following sentence is the one going fast: “The delivery truck zoomed by the school bus because it was going so fast”. (The correct answer is that the delivery truck is the one going fast). People have therefore assumed WSC might be a good way to test the cognitive abilities of AI systems.
  Breaking Winograd with one trick: The research shows that if you do one simple thing in WSC you can meaningfully damage the success rate of AI techniques when applied to the dataset. The trick? Switching the order of different entities in sentences. What does this look like in practice? An original sentence in Winograd might be “Emma did not pass the ball to Janie although she saw that she was open”, and the authors might change it to “Janie did not pass the ball to Emma although she saw that she was open”.
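  What the switch looks like in code: a minimal sketch of the perturbation, swapping the two candidates in the sentence (and flipping the gold answer accordingly); the field names here are illustrative rather than the dataset’s actual schema.

# Swap the two answer candidates in a schema sentence; the gold answer flips too.
def switch_candidates(example):
    a, b = example["candidates"]
    swapped = example["sentence"].replace(a, "\x00").replace(b, a).replace("\x00", b)
    answer = b if example["answer"] == a else a
    return {"sentence": swapped, "candidates": [b, a], "answer": answer}

example = {
    "sentence": "Emma did not pass the ball to Janie although she saw that she was open.",
    "candidates": ["Emma", "Janie"],
    "answer": "Janie",   # illustrative gold candidate
}
print(switch_candidates(example)["sentence"])
# -> Janie did not pass the ball to Emma although she saw that she was open.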
  Proposed Evaluation Protocol: Models should first be evaluated against their accuracy score on the original WSC set, then researchers should analyze the accuracy on the switchable subset of WSC (before and after switching the candidates), as well as the accuracy on the associative and non-associative subsets of the dataset. Combined, this evaluation technique should help researchers distinguish models that are robust and general from ones which are brittle and narrow.
  Results: The researchers test a language model, an ensemble of ten language models, an ensemble of 14 language models, and a “knowledge hunting method” against the WSC using the new evaluation protocol. “We observe that accuracy is stable across the different subsets for the single LM. However, the performance of the ensembled LMs, which is initially state-of-the-art by a significant margin, falls back to near random on the switched subset.” The tests also show that performance for the language models drops significantly on the non-associative portion of WSC “when information related to the candidates themselves does not give away the answer”, further suggesting a lack of a reasoning capability.
  Why it matters: “Our results indicate that the current state-of-the-art statistical method does not achieve superior performance when the dataset is augmented and subdivided with our switching scheme, and in fact mainly exploits a small subset of highly associative problem instances”. Research like this shows how challenging it is to not just develop machines capable of displaying “common sense”, but how tough it can be to set up the correct sort of measurement schemes to test for this capability in the first place. Ultimately, this research shows that “performing at a state-of-the-art level on the WSC does not necessarily imply strong common-sense reasoning”.
  Read more: On the Evaluation of Common-Sense Reasoning in Natural Language Understanding (Arxiv).
  Read more about the Winograd Schema Challenge here.

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Microsoft president wants rules on face recognition:
Brad Smith, Microsoft’s president, has reiterated his calls for regulation of face recognition technologies at the Web Summit conference in Portugal. In particular, he warned of potential risks to civil liberties from AI-enabled surveillance. He urged societies to decide on the acceptable limits of government intrusion into our privacy ahead of the widespread proliferation of the technology.
  “Before we wake up and find that the year 2024 looks like the book “1984”, let’s figure out what kind of world we want to create, and what are the safeguards and what are the limitations of both companies and governments for the use of this technology”, he said.
  Earlier this year, Smith made similar calls via a Microsoft blogpost.
Read more: Microsoft’s president says we need to regulate facial recognition (Recode).
Read more: Facial recognition technology: The need for public regulation and corporate responsibility (Microsoft blog).

Machine ethics for self-driving cars via survey:
Researchers asked respondents to decide on a range of ‘trolley problem’-style ethical dilemmas for autonomous vehicles, where vehicles must choose between (e.g.) endangering 1 pedestrian and endangering 2 occupants. Several million subjects were drawn from over 200 countries. The strongest preferences were for saving young lives over old, humans over animals, and more lives over fewer.
  Why this matters: Ethical dilemmas in autonomous driving are unlikely to be the most important decisions we delegate to AI systems. Nonetheless, these are important issues, and we should use them to develop solutions that are scalable to a wider range of decisions. I’m not convinced that we should want machine ethics to mirror widely-held views amongst the public, or that this represents a scalable way of aligning AI systems with human values. Equally, other solutions come up against problems of consent and might increase the possibility of a public backlash.
  Read more: The Moral Machine Experiment (Nature).

Tech Tales:

[2020: Excerpt from an internal McGriddle email describing a recent AI-driven marketing initiative.]

Our ‘corporate insanity promotion’ went very well this month. As a refresher, for this activity we had all external point-of-contact people for the entire McGriddle organization talk in a deliberately odd-sounding ‘crazy’ manner for the month of March. We began by calling all our Burgers “Borblers” and when someone asked us why, the official response was “What’s borbling you, pie friend?” And so on. We had a team of 70 copywriters working round the clock on standby generating responses for all our “personalized original sales interactions” (POSIs), augmented by our significant investments in AI to create unique terms at all locations around the world, trained on local slang datasets. Some of the phrase creations are already testing well enough in meme-groups that we’re likely to use them on an ongoing basis. So when you next hear “Borble Topside, I’m Going Loose!” shouted as a catchphrase – you can thank our AIs for that.

Things that inspired this story: the logical next-step in social media marketing, GANs, GAN alchemists like Janelle Shane, the arms race in advertising between normalcy and surprise, conditional text generation systems, Salesforce / CRM systems, memes.   

 

Import AI 119: How to benefit AI research in Africa; German politician calls for billions in spending to prevent country being left behind; and using deep learning to spot thefts

African AI researchers would like better code switching, maps, to accelerate research:
…The research needs of people in Eastern Africa tell us about some of the ways in which AI development will differ in that part of the world…
Shopping lists contain a lot of information about a person, and I suspect the same might be true of scientific shopping lists that come from a particular part of the world. For that reason, a paper from Caltech outlining requests for machine learning research from members of the East African Tech Scene gives us better context for thinking about the global impact of AI.
  Research needs: Some of the requests include:

  • Support for code-switching within language models; many East Africans rapidly code-switch (move between multiple languages during the same sentence) making support for multiple languages within the same model important.
  • Named Entity Recognition with multiple-use words; many English words are used as names in East Africa, eg “Hope, Wednesday, Silver, Editor”, so it’s important to be able to learn to disambiguate them.
  • Working with contextual cues; many locations in Africa don’t have standard addressing schemes so directions are contextual (eg, my house is the yellow one two miles from the town center) and this is combined with numerous misspellings in written text, so models will need to be able to fuse multiple distinct bits of information to make inferences about things like addresses.
  • Creating new maps in response to updated satellite imagery to help augment coverage of the East African region, accompanied by the deliberate collection of frequent ground-level imagery of the area to account for changing businesses, etc.
  • Due to poor internet infrastructure, spotty cellular service, and the fact “electrical power for devices is scarce”, one of the main requests is for more efficient systems, such as models designed to run on low-powered devices, as well as for ways to add adaptive learning to surveying processes so that researchers can integrate new data on-the-fly to make up for its sparsity.

  Reinforcement learning, what reinforcement learning? “No interviewee reported using any reinforcement learning methods”.
  Why it matters: AI is going to be developed and deployed globally, so becoming more sensitive to the specific needs and interests of parts of the world underrepresented in machine learning should further strengthen the AI research community. It’s also a valuable reminder that many problems which don’t generate much media coverage are where the real work is needed (for instance, supporting code-switching in language models).
  Read more: Some Requests for Machine Learning Research from the East African Tech Scene (Arxiv).

DeepMap nets $60 million for self-driving car maps:
…Mapping startup raises money to sell picks and shovels for another resource grab…
A team of mapmakers who previously worked on self-driving-related efforts at Google, Apple, and Baidu, have raised $60 million for DeepMap, in a Series B round. One notable VC participant: Generation Investment Management, a VC firm which includes former vice president Al Gore as a founder. “DeepMap and Generation share the deeply-held belief that autonomous vehicles will lead to environmental and social benefits,” said DeepMap’s CEO, James Wu, in a statement.
  Why it matters: If self-driving cars are, at least initially, not winner-take-all-markets, then there’s significant money to be made for companies able to create and sell technology which enables new entrants into the market. Funding for companies like DeepMap is a sign that VCs think such a market could exist, suggesting that self-driving cars continue to be a competitive market for new entrants.
  Read more: DeepMap, a maker of HD maps for self-driving cars, raised at least $60 million at a $450 million valuation (Techcrunch).

Spotting thefts and suspicious objects with machine learning:
…Applying deep learning to lost object detection: promising, but not yet practical…
New research from the University of Twente, Leibniz University, and Zhejiang University shows both the possibility and limitations of today’s deep learning techniques applied to surveillance. The researchers attempt to train AI systems to detect abandoned objects in public places (eg, offices) and try to work out if these objects have been abandoned, moved by someone who isn’t the owner, or are being stolen.
  How does it work: The system takes in video footage and compares the footage against a continuously learned ‘background model’ so it can identify new objects in a scene as they appear, while automatically tagging these objects with one of three potential states: “if a object presents in the long-term foreground but not in the short-term foreground, it is static. If it presents in both foreground masks, it is moving. If an object has ever presented in the foregrounds but disappears from both of the foregrounds later, it means that it is in static for a very long time.” The system then links these objects with human owners by identifying the people who spend the most time with them, then tracks those people while trying to guess whether the object is being abandoned, has been temporarily left by its owner, or is being stolen.
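  The state logic in code: a minimal sketch of the three-state rule described above, assuming boolean flags for whether an object currently appears in the short-term and long-term foreground masks; in the real system these flags come from the continuously learned background models.

def object_state(in_long_term_fg, in_short_term_fg, seen_before):
    if in_long_term_fg and not in_short_term_fg:
        return "static"                 # present for a while, no longer moving
    if in_long_term_fg and in_short_term_fg:
        return "moving"
    if seen_before and not in_long_term_fg and not in_short_term_fg:
        return "long_term_static"       # absorbed into the background: static for a very long time
    return "unknown"

print(object_state(True, False, seen_before=True))   # -> static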
  Results: They evaluate the system on the PETS2006 benchmark, as well as on the more challenging new SERD dataset which is composed of videos taken from four different scenes of college campuses. The model outlined in the paper gets top scores on PETS2006, but does poorly on the more modern SERD dataset, obtaining accuracies of 50% when assessing if an object is moved by a non-owner, though it does better at detecting objects being stolen or being abandoned. “The algorithm for object detection cannot provide satisfied performance,” they write. “Sometimes it detects objects which don’t exist and cannot detect the objects of interest precisely. A better object detection method would boost the framework’s performance.”  More research will be necessary to develop models that excel here, or potentially to improve performance via accessing large datasets to use during pre-training.
  Why it matters: Papers like this highlight the sorts of environments in which deep learning techniques are likely to be deployed, though also suggest that today’s models are still inefficient for some real-world use cases (my suspicion here is that if the SERD dataset was substantially larger we may have seen performance increase further).
  Read more: Security Event Recognition for Visual Surveillance (Arxiv).

Facebook uses a modified DQN to improve notification sending on FB:
…Here’s another real-world use case for reinforcement learning…
I’ve recently noticed an increase in the number of Facebook recommendations I receive and a related rise in the number of time-relevant suggestions for things like events and parties. Now, research published by Facebook indicates why that might be: the company has recently used an AI platform called ‘Horizon’ to improve and automate aspects of how it uses notifications to tempt people to use its platform.
  Horizon is an internal software platform that Facebook uses to deploy AI onto real-world systems. Horizon’s job is to let people train and validate reinforcement learning models at Facebook, analyze their performance, and run them at large-scale. Horizon also includes a feature called Counterfactual Policy Evaluation, which makes it possible to evaluate the estimated performance of models before deploying them into production. Horizon also incorporates the implementations of the following algorithms: Discrete DQN, Parametric DQN, and DDPG (which is sometimes used for tuning hyperparameters within other domains).
  Scale: “Horizon has functionality to conduct training on many GPUs distributed over numerous machines… even for problems with very high dimensional feature sets (hundreds or thousands of features) and millions of training examples, we are able to learn models in a few hours”, they write.
  RL! What is it good for? Facebook says it recently moved from a supervised learning model that predicted click-through rates on notifications, to “a new policy that uses Horizon to train a Discrete-Action DQN model for sending push notifications”. This system tailors the selection and sending of notifications to individual users based on their implicit preferences, expressed by their interaction with the notifications and learned via incremental RL updates. “We observed a significant improvement in activity and meaningful interactions by deploying an RL based policy for certain types of notifications, replacing the previous system based on supervised learning”, Facebook writes. They also conducted a similar experiment based on giving notifications to administrators of Facebook pages. “After deploying the DQN model, we were able to improve daily, weekly, and monthly metrics without sacrificing notification quality,” they write.
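  A sketch in code: below is a heavily simplified picture of a discrete-action, value-based notification policy with two actions (send or drop), using a linear Q-function as a stand-in for Horizon’s Discrete DQN; the features, reward, and hyperparameters are all illustrative rather than Facebook’s.

import numpy as np

rng = np.random.default_rng(0)
ACTIONS = ["send", "drop"]
weights = np.zeros((len(ACTIONS), 4))   # one weight vector per action, 4 toy user features

def q_values(features):
    return weights @ features           # estimated long-term value of each action

def choose_action(features, epsilon=0.1):
    if rng.random() < epsilon:          # explore occasionally
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values(features)))

def td_update(features, action, reward, next_features, lr=0.01, gamma=0.9):
    # One-step temporal-difference update toward the observed reward plus the
    # discounted value of the next state -- the "incremental RL update".
    target = reward + gamma * np.max(q_values(next_features))
    error = target - q_values(features)[action]
    weights[action] += lr * error * features

# Toy interaction: features might encode recent activity, time since the last
# notification, etc.; the reward might encode a "meaningful interaction".
features = rng.normal(size=4)
action = choose_action(features)
td_update(features, action, reward=1.0, next_features=rng.normal(size=4))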
  Why it matters: This is an example of how a relatively simple RL system (Discrete DQN) can yield significant gains against hard-to-specify business metrics (eg, “meaningful interactions”). It also shows how large web platforms can use AI to iteratively improve their ability to target individual users while increasing their ability to predict user behavior and preferences over longer time horizons – think of it as a sort of ever-increasing ‘data&compute dividend’.
  Read more: Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform (Facebook Research).

German politician calls for billions of dollars for national AI strategy:
…If Germany doesn’t invest boldly enough, it risks falling behind…
Lars Klingbeil, general secretary of the Social Democratic Party in Germany, has called for the country to invest significantly in its own AI efforts. “We need a concrete investment strategy for AI that is backed by a sum in the billions,” wrote Klingbeil in an article for Tagesspiegel. “We have to stop taking it easy”.
  Why it matters: AI has quickly taken on a huge amount of symbolic political power, with politicians typically treating success in AI as being a direct sign of the competitiveness of a country’s technology industry; comments like this from the SPD reinforce that image, and are likely to incentivize other politicians to talk about it in a similar way, further elevating the role AI plays in the discourse.
  Read more: Germany needs to commit billions to artificial intelligence: SPD (Reuters).

Faking faces for fun with AI:
…”If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”…
One of the continuing open questions around the weaponization of fake imagery is how easy such fakery needs to become before it is economically sensible for people to weaponize the technology (eg, by making faked images of politicians in specific politically-sensitive situations). New work by an independent researcher gives us an indication of where things stand today. The good news: it’s still way too hard to do for us to worry about many actors abusing the technology. The bad news: all of this stuff is getting cheaper to build and easier to operate over time.
  How it works: Shaobo Guan’s research shows how to build a conditional image generation system. The way this works is you can ask your computer to synthesize a random face for you, then you can tweak a bunch of dials to let you change latent variables from which the image is composed, allowing you to manipulate, for instance, the spacing apart of a “person’s” eyes, the coloring of their hair, the size of their sideburns, whether they are wearing glasses, and so on. Think of this as like a combination of an etch-a-sketch, a Police facial composite machine, and an insanely powerful Photoshop filter.
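  A sketch in code: the “dials” are directions in the generator’s latent space. The sketch below shows the shape of that interaction only; the generator here is a placeholder rather than a trained GAN, and the attribute directions would in practice be learned from labeled examples.

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 128

def generator(z):
    # Placeholder for a trained image generator; returns a fake 64x64 RGB array.
    return rng.normal(size=(64, 64, 3))

# Hypothetical learned latent directions for individual attributes.
directions = {"glasses": rng.normal(size=latent_dim),
              "hair_darkness": rng.normal(size=latent_dim)}

z = rng.normal(size=latent_dim)                   # a random "face"
z_edited = z + 1.5 * directions["glasses"] - 0.5 * directions["hair_darkness"]
image = generator(z_edited)                       # re-synthesize with the dials adjusted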
  “A word about ethics”: The blog post is notable for its inclusion of a section that specifically considers the ethical aspects of this work in two ways: 1) because the underlying dataset for the generative tool is limited, such a tool put into production wouldn’t be very representative; 2) “If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”? It’s encouraging to see these acknowledgements in a work like this.
  Why it matters: Posts like this give us a valuable point-in-time sense of what a motivated researcher is able to build relying on relatively small amounts of resources (the project was done over three weeks as part of an Insight Data Science ‘AI fellows program’). They also help us understand the general difficulties people face when working with generative models.
  Read more: Generating custom photo-realistic faces using AI (Insight Data Science).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

EU AI ethics chief urges caution on regulation:
The chairman of the EU’s new expert body on AI, Pekka Ala-Pietilä, has cautioned against premature regulation, arguing Europe should be focussed now on developing “broad horizontal principles” for ethical uses of AI. He foresees regulations on AI as taking shape as the technology is deployed, and as courts react to emergent issues, rather than ex ante. The high-level expert group on AI plans to produce a set of draft ethical principles in March, followed by a policy and investment strategy.
  Why this matters:  This provides some initial indications of Europe’s AI strategy, which appears to be focussed partly on establishing leadership in the ethics of AI. The potential risks from premature and ill-judged interventions in such a fast-moving field seem high. This cautious attitude is probably a good thing, particularly given Europe’s proclivity towards regulation. Nonetheless, policy-makers should be prepared to react swiftly to emergent issues.
  (Note from Jack: It also fits a pattern common in Europe of trying to regulate for the effects of technologies developed elsewhere – for example, GDPR was in many ways an attempt to craft rules to apply controls to non-European mega platforms like Google and Facebook).
  Read more: Europe’s AI ethics chief: No rules yet, please.

Microsoft will bid on Pentagon AI contract:
Microsoft has reaffirmed its intention to pursue a major contract with the US Department of Defense. The company’s bid on the $10bn cloud-computing project, codenamed JEDI, had prompted some protest from employees. In a blog post, the company said it would “engage proactively” in the discussion around laws and policies to ensure AI is used ethically, and argued that to withdraw from the market (for example, for US military contracts) would reduce the opportunity to engage in these debates in the future. Google withdrew its bid for the JEDI project earlier this year, after significant backlash from employees (though the real reason for the pull-out could be that Google lacked all the gov-required data security certifications necessary to field a competitive bid).
  Read more: Technology and the US military (Microsoft).
  Read more: Microsoft Will Sell Pentagon AI (NYT).

Assumptions in ML approaches to AI safety:
Most of the recent growth in AI safety has been in ML-based approaches, which look at safety problems in relation to current, ML-based, systems. The usefulness of this work will depend strongly on the type of advanced AI systems we end up with, writes DeepMind AI safety researcher Victoria Krakovna.
  Consider the transition from horse-carts to cars. Some of the important interventions in horse-cart safety, such as designing roads to avoid collisions, scaled up to cars. Others, like systems to dispose of horse-waste, did not. Equally, there are issues in car safety, e.g. air pollution, that someone thinking about horse-cart safety could not have foreseen. In the case of ML safety, we should ask what assumptions we are making about future AI systems, how much we are relying on them, and how likely they are to hold up. The post outlines the author’s opinions on a few of these key assumptions.
  Read more: ML approach to AI safety (Victoria Krakovna).

Baidu joins Partnership on AI:
Chinese tech giant Baidu has become the first Chinese member of the Partnership on AI. The Partnership is a consortium of AI leaders, which includes all the major US players, focussed on developing ethical best practices in AI.
  Read more: Introducing Our First Chinese Member (Partnership on AI).

Tech Tales:

Generative Adversarial Comedy (CAN!)

[2029: The LinePunch, a “robot comedy club” started in 2022 in the South Eastern corner of The Muddy Charles, a pub tucked inside a building near the MIT Media Lab in Boston, Massachusetts]

Two robot comedians are standing on stage at The LinePunch and, as usual, they’re bombing.

“My Face has no nose, how does it smell?” says one of the robots. Then it looks at the crowd, pauses for two seconds, and says: “It smells using its face!”
  The robot opens its hands, as though beckoning for applause.
  “You suck!” jeers one of the humans.
  “Give them a chance,” says someone else.
  The robot that had told the nose joke bows its head and hands the microphone to the robot standing next to it.
  “OK, ladies and germ-till-men,” says the second robot, “why did the Chicken move across the road?”
  “To get uploaded into the matrix!” says one of the spectating humans.
  “Ha-Ha!” says the robot. “That is incorrect. The correct answer is: to follow its friend.”
  A couple of people in the audience chuckle.
  “Warm crowd!” says the robot. “Great joke next joke: three robots walk into a bar. The barman says “Get out, you need to come in sequentially!”
  “Boo,” says one of the humans in the audience.
  The robot tilts its head, as though listening, then prepares to tell another joke…

The above scene will happen on the third Tuesday of every month for as long as MIT lets its students run The LinePunch. I’d like to tell you the jokes have gotten better since its founding, but in truth they’ve only gotten stranger. That’s because robots that tell jokes which seem like human jokes aren’t funny (in fact, they freak people out!), so what the bots end up doing at the LinePunch is a kind of performative robot theater, where the jokes are deliberately different to those a human would tell – learned via a complex array of inverted feature maps, but funny to the humans nonetheless – learned via human feedback techniques. One day I’m sure the robots will learn to tell jokes to amuse each other as well.

Things that inspired this story: Drinks in The Muddy Charles @ MIT; synthetic text generation techniques; recurrent neural networks; GANs; performance art; jokes; learning from human preferences.

Import AI 118: AirBnB splices neural net into its search engine; simulating robots that touch with UnrealROX; and how long it takes to build a quadcopter from scratch

Building a quadcopter from scratch in ten weeks:
…Modeling the drone ecosystem by what it takes to build one…
The University of California at San Diego recently ran a course where students got the chance to design, build, and program their own drones. The accompanying paper outlines how the course is structured and gives us a sense of what it takes to build a drone today.
   Four easy pieces: The course breaks building the drones into four phases: designing the PCB, implementing the flight control software, assembling the PCB, and getting the quadcopter flying. Each of these phases has numerous discrete steps which are detailed in the report. One of the nice things about the curriculum is the focus on the cost of errors: “Students ‘pay’ for design reviews (by course staff or QuadLint) with points deducted from their lab grade,” they write. “This incentivizes them to find and fix problems themselves by inspection rather than relying on QuadLint or the staff”.
  The surprising difficulty of drone software: Building the flight controller software for the drone proves to be one of the most challenging aspects of the course, because there are numerous potential causes for any given bug, which makes root cause analysis difficult.
  Teaching tools: While developing the course the instructors noticed that they were spending a lot of time checking and evaluating PCB designs for correctness, so they designed their own program called ‘QuadLint’ to try to auto-analyze and grade these submissions. “QuadLint is, we believe, the first autograder that checks specific design requirements for PCB designs,” they write.
  Costs: The report includes some interesting details on the cost of these low-powered drones, with the quadcopter itself costing about $35 per PCB plus $40 for the components. Currently, the most expensive component of the course is the remote ($150) and for the next course the teachers are evaluating cheaper options.
  Small scale: The quadcopters all use a PCB to host their electronics and serve as an airframe. They measure less than 10 cm on a side and are suitable for flight indoors over short distances. “The motors are moderately powerful, “brushed” electric motors powered by a small lithium-polymer (LiPo) battery, and we use small, plastic propellers. The quadcopters are easy to operate safely, and a blow from the propeller at full speed is painful but not particularly dangerous. Students wear eye protection around their flying quadcopters.”
  Why it matters: The paper notes that the ‘killer apps’ of the future “will lie at the intersection of hardware, software, sensing, robotics, and/or wireless communications”. This seems true – especially given the major uptake suggested by the success of companies like DJI and the possibility of unit economics driving prices down. Therefore, tracking and measuring the cost and ease with which people can build and assemble drones out of (hard to track, commodity) components gives us better intuitions about this aspect of drones+security. While the hardware and software is under-powered and somewhat pricey today it won’t stay that way for long.
  Read more: Trial by Flyer: Building Quadcopters From Scratch in a Ten-Week Capstone Course (Arxiv).

Amazon tries to make Alexa smarter via richer conversational data:
…Who needs AI breakthroughs when you’ve got a BiLSTM, lots of data, and patience?…
Amazon researchers are trying to give personal assistants like Alexa the ability to have long-term conversations about specific topics. The (rather unsurprising) finding they make in a new research paper is that you can “extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features”, yielding personal assistants capable of longer and more coherent conversations than their forebears, if you can afford to annotate the data.
  Data used: The researchers used data collected during the 2017 ‘Alexa Prize’ competition, which consists of over 100,000 utterances containing interactions between users and chatbots. They augmented this data by classifying the topic for each utterance into one of 12 categories (eg: politics, fashion, science & technology, etc), and also trying to classify the goal of the user or chatbot (eg: clarification, information request, topic switch, etc). They also asked other annotators to rank every single chatbot response with metrics relating to how comprehensible it was, how relevant the response was, how interesting it was, and whether a user might want to continue the conversation with the bot.
  Baselines and BiLSTMs: The researchers implement two baselines (DAN, based on a bag-of-words neural model; ADAN, which is DAN extended with attention), and then develop two versions of a bidirectional LSTM (BiLSTM) system, where one uses context from the annotated dataset and the other doesn’t. They then evaluate all these methods by testing their baselines (which see only the current utterance) against systems which incorporate context, systems which incorporate dialog act features, and systems which incorporate both. The results show that a BiLSTM fed with context in sequence does almost twice as well as a baseline ADAN system that uses context and dialog, and almost 25% better than a DAN fed with both context and dialog.
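  A sketch in code: a minimal context-aware BiLSTM topic classifier, in which previous utterances and the current utterance are concatenated into one token sequence, encoded with a bidirectional LSTM, and mapped to one of the 12 topic classes. The vocabulary size, dimensions, and the way context is injected here are illustrative, not Amazon’s actual configuration.

import torch
import torch.nn as nn

class ContextualBiLSTM(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_topics=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden_dim, num_topics)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.bilstm(embedded)      # final hidden state from each direction
        summary = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classify(summary)               # topic logits

model = ContextualBiLSTM()
context_plus_utterance = torch.randint(0, 5000, (1, 30))   # toy token ids
print(model(context_plus_utterance).shape)                 # torch.Size([1, 12])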
  Why it matters: The results indicate that – if a developer can afford the labeling cost – it’s possible to augment language interaction datasets with additional information about context and topic to create more powerful systems, which seems to imply that in the language space we can expect to see large companies invest in teams of people to not just transcribe and label text at a basic level, but also perform more elaborate meta-classifications as well. The industrialization of deep learning continues!
  Read more: Contextual Topic Modeling For Dialog Systems (Arxiv).

Why AI won’t be efficiently solving a 2D gridworld quest soon:
…Want humans to be able to train AIs? The key is curriculum learning and interactive learning, says BabyAI creators…
Researchers with the Montreal Institute for Learning Algorithms (MILA) have designed a free tool called BabyAI to let them test AI systems’ ability to learn generalizable skills from curriculums of tasks set in an efficient 2D gridworld environment – and the results show that today’s AI algorithms display poor data efficiency and generalization at this sort of task.
  Data efficiency: BabyAI uses gridworlds for its environment, which the researchers have written to be efficient enough that users can work with the platform without needing access to vast pools of compute (the BabyAI environments can be run at up to 3,000 frames per second “on a modern multi-core laptop” and can also be integrated with OpenAI Gym).
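  A sketch in code: driving a BabyAI level through the Gym interface with a random policy. The environment id below is assumed to follow the package’s level naming (here, a “go to the red ball” level) and may differ between versions; treat it as illustrative.

import gym
import babyai  # importing the package registers the BabyAI levels with Gym

env = gym.make("BabyAI-GoToRedBall-v0")     # assumed level id
obs = env.reset()
print(obs["mission"])                        # the Baby Language instruction for this episode
done, steps, reward = False, 0, 0.0
while not done and steps < 100:
    obs, reward, done, info = env.step(env.action_space.sample())
    steps += 1
print("reward:", reward, "steps taken:", steps)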
  A specific language: BabyAI uses “a comparatively small yet combinatorially rich subset of English” called Baby Language. This is meant to help researchers write increasingly sophisticated strings of instructions for agents, while keeping the state space from exploding too quickly.
  Levels as a curriculum: BabyAI ships with 19 levels which increase in the difficulty of both the environment and the complexity of the language required to solve it. The levels test each agent on 13 different competencies, ranging from being able to unlock doors and navigate to locations, to ignoring distractors placed into the environment, navigating mazes, and so on. The researchers also design a bot which can solve any of the levels using a variety of heuristics – this bot serves as an expert whose demonstrations can be used to train models.
  So, are today’s AI techniques sophisticated enough to solve BabyAI? The researchers train an imitation learning-based baseline for each level and assess how well it does. The systems are able to learn to perform basic tasks, but struggle to imitate the expert at tasks that require multiple actions to solve. One of the most intriguing parts of the paper is the analysis of the relative efficiency of systems trained via imitation and via pure reinforcement learning, which shows that today’s algorithms are wildly inefficient at learning pretty much anything: simple tasks like learning to go to a red ball hidden within a map take 40,000-60,000 demos when using imitation learning, and around 453,000 to 470,000 when learning using reinforcement learning without an expert teacher to attempt to mimic. The researchers also show that using pre-training (where you learn on other tasks before attempting certain levels) does not yield particularly impressive performance, with pre-training yielding at most a 3X speedup.
  Why it matters: Platforms like BabyAI give AI researchers fast, efficient tools to use when tackling hard research projects, while also highlighting the deficiencies of many of today’s algorithms. The transfer learning results “suggest that current imitation learning and reinforcement learning methods scale and generalize poorly when it comes to learning tasks with a compositional structure,” they write. “An obvious direction of future research is to find strategies to improve data efficiency of language learning.”
  Get the code for BabyAI (GitHub).
  Read more: BabyAI: First Steps Towards Grounded Language Learning with a Human In the Loop (Arxiv).

Simulating robots that touch and see in AAA-game quality detail:
…The new question AI researchers will ask: But Can It Simulate Crysis?…
Researchers with the 3D Perception Lab at the University of Alicante have designed UnrealROX, a high-fidelity simulator based on Unreal Engine 4, built for simulating and training AI agents embodied in (simulated) touch-sensitive robots.
  Key ingredients: UnrealROX has the following main ingredients: a simulated grasping system that can be applied to a variety of finger configurations; routines for controlling robotic hands and bodies using commercial VR setups like the Oculus Rift and HTC Vive; a recorder to store full sequences from scenes; and customizable camera locations.
  Drawback: The overall simulator can run at 90 frames-per-second, the researchers note. While this may sound impressive it’s not particularly useful for most AI research unless you can run it far faster than that (compare this with BabyAI, which runs at 3,000 FPS).
  Simulated robots with simulated hands: UnrealROX ships with support for two robots: a simulated ‘Pepper’ robot from the company Aldebaran, and a spruced-up version of the mannequin that ships with UE4. Both of these robots have been designed with extensible, customizable grasping systems, letting them reach out and interact with the world around them. “The main idea of our grasping subsystem consists in manipulating and interacting with different objects, regardless of their geometry and pose.”
  Simulators, what are they good for? UnrealROX may be of particular interest to researchers that need to create and record very specific sequences of behaviors on robots, or who wish to test the ability to learn useful policies from a relatively small amount of high-fidelity information. But it seems likely that the relative slowness of the simulator will make it difficult to use for most AI research.
  Why it matters: The current proliferation of simulated environments represents a kind of simulation-boom in AI research that will eventually produce a cool historical archive of the many ways in which we might think robots could interact with each other and the world. Whether UnrealROX is used or not, it will contribute to this historical archive.
  Read more: UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation (Arxiv).

AirBnB augments main search engine with neural net, sees significant performance increase:
…The Industrialization of Deep Learning continues…
Researchers with home/apartment-rental service AirBNB have published details on how they transitioned AirBnB’s main listings search engine to a neural network-based system. The paper highlights how deploying AI systems in production is different to deploying AI systems in research. It also sees AirBnB follow Google, which in 2015 augmented its search engine with ‘RankBrain’, a neural network-based system that almost overnight became one of the most significant factors in selecting which search results to display to a user. “This paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs),” the researchers write.
  Motivation: “The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history,” the researchers write. This performance boost eventually plateaued, prompting them to implement neural network-based approaches to improve search further.
  Keep it simple, (& stupid): One of the secrets about AI research is the gulf between frontier research and production use-cases, where researchers tend to prioritize novel approaches that work on small tasks, and industry and/or large-scale operators prioritize simple techniques that scale well. This fact is reflected in this research, where the researchers started work with a single layer neural net model, moved on to a more sophisticated system, then opted for a scale-up solution as their final product. “We were able to deprecate all that complexity by simply scaling the training data 10x and moving to a DNN with 2 hidden layers.”
  Input features: For typical configurations of the network the researchers gave it 195 distinct input ‘features’ to learn about, which included properties of listings like price, amenities, historical booking count; as well as features from other smaller models.
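  A sketch in code: the shape of the model described above, with 195 listing features in, two hidden layers, and a single relevance score out; the layer widths and scoring setup here are illustrative choices, not taken from the paper.

import torch
import torch.nn as nn

scorer = nn.Sequential(
    nn.Linear(195, 128), nn.ReLU(),   # 195 input features per listing
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),                 # single relevance score used for ranking
)

listings = torch.randn(32, 195)                      # a batch of candidate listings for one search
scores = scorer(listings).squeeze(-1)
ranking = torch.argsort(scores, descending=True)     # order in which to show the listings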
  Failure: The paper includes a quite comprehensive list of some of the ways in which the Airbnb researchers failed when trying to implement new neural network systems. Many of these failures are due to things like overfitting, or trying to architect too much complexity into certain parts of the system.
  Results: AirBNB doesn’t reveal the specific quantitative performance boost as this would leak some proprietary commercial information, but it does include a couple of graphs showing that the use of the 2-layer simple neural network leads to a very meaningful relative gain in the number of bookings made using the system, indicating that the neural net-infused search is presenting people with more relevant listings which they are more likely to book. “Overall, this represents one of the most impactful applications of machine learning at Airbnb,” they write.
  Why it matters: AirBNB’s adoption of deep learning for its main search engine further indicates that deep learning is well into its industrialization phase, where large companies adopt the technology and integrate it into their most important products. Every time we get a paper like this the chance of an ‘AI Winter’ decreases, as it creates another highly motivated commercial actor that will continue to invest in AI research and development, regardless of trends in government and/or defence funding.
  Read more: Applying Deep Learning to AirBNB Search (Arxiv).
  Read more: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg News, 2015).

Refining low-quality web data with CurriculumNet:
…AI startup shows how to turn bad data into good data, with a multi-stage weakly supervised training scheme…
Researchers with Chinese computer vision startup Malong have released code and data for CurriculumNet, a technique to train deep neural networks on large amounts of data with variable annotations, collected from the internet. Approaches like this are useful if researchers don’t have access to a large, perfectly labeled dataset for their specific task. But the tradeoff is that the labels on datasets gathered in this way are far noisier than those from hand-built datasets, presenting researchers with the challenge of extracting enough signal from the noise to be able to train a useful network.
  CurriculumNet: The researchers train their system on the WebVision database, which contains over 2,400,000 images with noisy labels. Their approach works by training an Inception_v2 model over the whole dataset, then studying the feature space into which all the images are mapped; CurriculumNet sorts the images into clusters, then splits each cluster into three subsets according to how similar the images in each subset are to each other in feature space, with the intuition being that subsets containing lots of similar images will be easier to learn from than those which are very diverse. They then train a model over this data, starting with the subsets containing similar image features and then mixing in the noisier subsets. By iteratively learning a classifier from good labels, then adding in noisier ones, the researchers say they are able to increase the generalization of their trained systems.
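  A sketch in code: the curriculum-design step, with random vectors standing in for learned image features. Images are clustered, and each cluster is split into three subsets using distance-to-centre as a crude proxy for label cleanliness; the clustering details and split points here are illustrative, not the paper’s exact density-based procedure.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))        # stand-in for Inception_v2 image features

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)
distances = np.linalg.norm(features - kmeans.cluster_centers_[kmeans.labels_], axis=1)

subsets = {"clean": [], "noisy": [], "very_noisy": []}
for cluster_id in range(10):
    idx = np.where(kmeans.labels_ == cluster_id)[0]
    ordered = idx[np.argsort(distances[idx])]  # closest-to-centre images first
    a, b = len(ordered) // 3, 2 * len(ordered) // 3
    subsets["clean"].extend(ordered[:a])       # train on these first...
    subsets["noisy"].extend(ordered[a:b])      # ...then mix these in...
    subsets["very_noisy"].extend(ordered[b:])  # ...and finally the noisiest subset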
  Testing: They test CurriculumNet on four benchmarks: WebVision, ImageNet, Clothing1M, and Food101. They find that systems trained using the largest amount of noisy data converge to higher accuracies than those trained without, seeing reductions in error of multiple percentage points on WebVision (“these improvements are significant on such a large-scale challenge,” they write). CurriculumNet gets state-of-the-art results for top-1 accuracy on WebVision, with performance increasing even further when they train on more data (such as combining ImageNet and WebVision).
  Why it matters: Systems like CurriculumNet show how researchers can use poorly-labeled data, combined with clever training ideas, to increase the value of lower-quality data. Approaches like this can be viewed as analogous to a clever refinement process applied when extracting a natural resource.
  Read more: CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (Arxiv).
  Get the trained models from Malong’s Github page.

Tech Tales:

[2025: Podcast interview with the inventor of GFY]

Reality Bites, So Change It.
Or: There Can Be Hope For Those of Us Who Were Alone And Those We Left Behind

My Father was struck by a truck and killed while riding his motorbike in the countryside; no cameras, no witnesses; he was alone. There was an investigation but no one was ever caught. So it goes.

At the funeral I told stories about the greatness of my Father and I helped people laugh and I helped people cry. But I could not help myself because I could not see his death. It was as though he opened a door and disappeared before walking through it and the door never closed again; a hole in the world.

I knew many people who had lost friends and parents to cancer or other illnesses and their stories were quite horrifying: black vomit before the end; skeletons with the faces of parents; tales of seeing a dying person when they didn’t know they were being watched and seeing rage and fear and anguish on their face. The retellings of so many bad jokes about not needing to pay electricity bills, wheezed out over hospital food.

I envied these people, because they all had a “goodbye story” – that last moment of connection. They had the moment when they held a hand, or stared at a chest as it heaved in one last breath, or confessed a great secret before the chance was gone. Even if they weren’t there at the last they had known it was coming.

I did not have my goodbye, or the foreshadowing of one. Imagine that.

So that is why I built Goodbye For You(™), or GFY for short. GFY is software that lets you simulate and spend the last few moments with a loved one. It requires data and effort and huge amounts of patience… but it works. And as AI technology improves, so does the ease of use and fidelity of GFY.

Of course, it is not quite real. There are artifacts: improbable flocks of birds, or leaves that don’t fall quite correctly, or bodies that don’t seem entirely correct. But the essence is there: With enough patience and enough of a record of the deceased, GFY can let you reconstruct their last moment, put on a virtual reality haptic-feedback suit, and step into it.

You can speak with them… at the end. You can touch them and they can touch you. We’re adding smell soon.

I believe it has helped people. Let me try to explain how it worked the first time, all those years ago.

I was able to see the truck hit his bike. I saw his body fly through the air. I heard him say “oh no” the second after impact as he was catapulted off his bike and towards the side of the road. I heard his ribs break as he landed. I saw him crying and bleeding. I was able to approach his body. He was still breathing. I got on my knees and bent over him and I cried and the VR-helmet saw my tears in reality and simulated these tears falling onto his chest – and he appeared to see them, then looked up at me and smiled.
   He touched my face and said “my child” and then he died.

Now I have that memory and I carry it in my heart as a candle to warm my soul. After I experienced this first GFY my dreams changed. It felt as though I had found a way to see him open the door – and leave. And then the door shut.

Grief is time times memory times the rejuvenation of closure: of a sense of things that were once so raw being healed and knitted back together. If you make the memory have closure things seem to heal faster.

Yes, I am still so angry. But when I sleep now I sometimes dream of that memory, and in my imagination we say other things, and in this way continue to talk softly through the years.

Things that inspired this story: The as-yet-untapped therapeutic opportunities afforded by synthetic media generation (especially high-fidelity conditional video); GAN progression from 2014 to 2018; compute growth both observed and expected for the next few years; Ander Monson’s story “I am getting comfortable with my grief”.

Import AI 117: Surveillance search engines; harvesting real-world road data with hovering drones; and improving language with unsupervised pre-training

Chinese researchers pursue state-of-the-art lip-reading with massive dataset:
…What do I spy with my camera eyes? Lips moving! Now I can figure out what you are saying…
Researchers with the Chinese Academy of Sciences and Huazhong University of Science and Technology have created a new dataset and benchmark for “lip-reading in the wild” for Mandarin. Lip-reading is another sensory capability researchers can imbue AI systems with. For instance, lip-reading systems can be used for “aids for hearing-impaired persons, analysis of silent movies, liveness verification in video authentication systems, and so on”, the researchers write.
  Dataset details: The lipreading dataset contains 745,187 distinct samples from more than 2,000 speakers, grouped into 1,000 classes, where each class corresponds to the syllable of a Mandarin word composed of one or several Chinese characters. “To the best of our knowledge, this database is currently the largest word-level lipreading dataset and the only public large-scale Mandarin lipreading dataset”, the researchers write. The dataset has also been designed to be diverse, so its footage consists of multiple different people filmed from multiple different camera angles, along with perspectives taken from television broadcasts. This diversity makes the benchmark more closely approximate real world situations, whereas previous work in this domain has involved footage taken from a fixed perspective. They built the dataset by annotating Chinese television using a service provided by iFLYTEK, a Chinese speech recognition company.
  Baseline results: They train three baselines on this dataset – a fully 2D CNN, a fully 3D CNN (modeled on LipNet, research from DeepMind and Google covered in Import AI #104), and a model that mixes 2D and 3D convolutional layers (a rough sketch of such a mixed model appears after the results below). All of these approaches perform poorly on the new dataset, despite having obtained accuracies as high as 90% on other, more restricted datasets. The researchers implement their models in PyTorch and train them on servers containing four Titan X GPUs with 12GB of memory. The resulting top-5 accuracy results for the baselines on the new Chinese dataset LRW-1000 are as follows:
– LSTM-5: 48.74%
– D3D: 59.80%
– 3D+2D: 63.50%
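  For readers who want a concrete picture of the third baseline, here is a toy sketch of a mixed 3D/2D lipreading model in PyTorch; the layer sizes and structure are illustrative assumptions, not the paper’s architecture:

```python
import torch
import torch.nn as nn

class Lip3D2D(nn.Module):
    """Toy mixed 3D/2D lipreading baseline: a 3D conv front-end over the mouth-region
    clip, a 2D conv backbone applied per frame, and a GRU over time."""

    def __init__(self, n_classes=1000):
        super().__init__()
        self.front3d = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.backbone2d = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.gru = nn.GRU(128, 256, batch_first=True, bidirectional=True)
        self.head = nn.Linear(512, n_classes)

    def forward(self, clip):            # clip: (B, 1, T, H, W) grayscale frames
        x = self.front3d(clip)          # (B, 32, T, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.backbone2d(x).reshape(b, t, -1)   # (B, T, 128) per-frame features
        x, _ = self.gru(x)
        return self.head(x.mean(dim=1))            # logits over 1,000 word classes

logits = Lip3D2D()(torch.randn(2, 1, 16, 96, 96))  # e.g. 16-frame mouth crops
```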
  Why it matters: Systems for tasks like lipreading are going to have a significant impact on applications ranging from medicine to surveillance. One of the challenges posed by research like this is its inherently ‘dual use’ nature; as the researchers allude to in the introduction of this paper, this work can be used both for healthcare purposes as well as for surveillance (see: “analysis of silent movies”). How society deals with the arrival of these general AI technologies will have a significant impact on the types of societal architectures that will be built and developed throughout the 21st Century. It is also notable to see the emergence of large-scale datasets built by Chinese researchers in the Chinese language – perhaps one could measure the relative growth in certain language datasets to model AI interest in the associated countries?
  Read more: LRW-1000: A Naturally Distributed Large-Scale Benchmark for Lip Reading in the Wild (Arxiv).

Want to use AI to study the earth? Enter the PROBA-V Super Resolution competition:
…European Space Agency challenges researchers to increase the usefulness of satellite-gathered images…
The European Space Agency has launched the ‘PROBA-V Super Resolution’ competition, which challenges researchers to take in a bunch of photos from a satellite of the same region of the Earth and stitch them together to create a higher-resolution composite.
  Data: The data contains multiple images taken in different spectral bands of 74 locations around the world at each point in time. Images are annotated with a ‘quality map’ to indicate any parts of them that may be occluded or otherwise hard to process. “Each data-point consists of exactly one 100m resolution image and several 300m resolution images from the same scene,” they write.
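  As a deliberately naive baseline for this kind of task, one could bicubic-upsample each 300m image to the 100m grid, mask out unusable pixels with the quality maps, and average. This is a hedged sketch of that idea, not the competition’s intended solution:

```python
import numpy as np
from scipy.ndimage import zoom

def naive_fuse(low_res_images, quality_maps, scale=3):
    """Fuse several 300m-resolution views into one 100m-resolution estimate.

    low_res_images: list of (H, W) float arrays of the same scene.
    quality_maps:   list of (H, W) boolean arrays, True where a pixel is usable.
    """
    acc, weight = None, None
    for img, qm in zip(low_res_images, quality_maps):
        up = zoom(img, scale, order=3)              # cubic upsampling to the fine grid
        w = zoom(qm.astype(float), scale, order=0)  # nearest-neighbour for the mask
        acc = up * w if acc is None else acc + up * w
        weight = w if weight is None else weight + w
    return acc / np.maximum(weight, 1e-6)           # weighted mean, ignoring occluded pixels

fused = naive_fuse([np.random.rand(128, 128) for _ in range(5)],
                   [np.ones((128, 128), dtype=bool) for _ in range(5)])
```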
  Why it matters: Competitions like this provide researchers with novel datasets to experiment with and have a chance of improving the overall usefulness of expensive capital equipment (such as satellites).
Find out more about the competition here at the official website (PROBA-V Super Resolution challenge).

Google releases BERT, obtains state-of-the-art language understanding scores:
…Language modeling enters its ImageNet-boom era…
Google has released BERT, a natural language processing system that uses unsupervised pre-training and task fine-tuning to obtain state-of-the-art scores on a large number of distinct tasks.
  How it works: BERT, which stands for Bidirectional Encoder Representations from Transformers, builds on recent developments in language understanding ranging from techniques like ELMo and ULMFiT to recent work by OpenAI on unsupervised pre-training. BERT’s major performance gains come from a specific structural modification (jointly conditioning on the left and right context in all layers), as well as some other minor tweaks, plus – as is the trend in deep learning these days – training a larger model using more compute. The approach it is most similar to is OpenAI’s work using unsupervised pre-training for language understanding, as well as work from Fast.ai using similar approaches.
  Major tweak: BERT’s use of joint conditioning likely leads to its most significant performance improvement. They implement this by adding an additional pre-training objective called the ‘masked language model’, which involves randomly masking input tokens, then asking the model to predict the contents of each masked token based on context – this constraint encourages the network to learn to use more context when completing tasks, which seems to lead to greater representational capacity and improved performance. They also use Next Sentence Prediction during pre-training to try to train a model that has a concept of relationships between concepts across different sentences. Later, they conduct significant ablation studies of BERT and show that these two pre-training tweaks are likely responsible for the majority of the observed performance increase.
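  A rough sketch of the masked-language-model input corruption is below; the 15%/80%/10%/10% proportions follow the BERT paper, while the tokenization and vocabulary handling are heavily simplified:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15):
    """Corrupt a token sequence for masked-LM pre-training.

    Returns (corrupted_tokens, targets) where targets[i] is the original token
    the model must predict, or None for positions that carry no loss.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            targets.append(tok)                       # loss is computed here
            r = random.random()
            if r < 0.8:
                corrupted.append(mask_token)          # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)                 # 10%: keep the original
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets

tokens = "the cat sat on the mat".split()
print(mask_tokens(tokens, vocab=tokens))
```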
  Results: BERT obtains state-of-the-art performance on the multi-task GLUE benchmark, setting new state-of-the-art scores on a wide range of challenging tasks. It also sets a new state-of-the-art score on the ‘SWAG’ dataset – significant, given that SWAG was released earlier this year and was expressly designed to challenge AI techniques, like DL, which may gather a significant amount of performance by deriving subtle statistical relationships within datasets.
  Scale: The researchers train two models, BERT-Base and BERT-Large. BERT-Base was trained on 4 Cloud TPUs for approximately four days, and BERT-Large was trained on 16 Cloud TPUs, also for four days.
  Why it matters – Big Compute and AI Feudalism: Approaches like this show how powerful today’s deep learning based systems are, especially when combined with large amounts of compute and data. There are legitimate arguments to be made that such approaches are bifurcating research into low-compute and high-compute domains – one of these main BERT models took 16 TPUs (so 64 TPU chips total) trained for four days, putting it out of reach of low-resource researchers. On the plus side, if Google releases things like the pre-trained model then people will be able to use the model themselves and merely pay the (far smaller) cost of fine-tuning it for different domains. Whether we should be content with researchers getting the proverbial crumbs from rich organizations’ tables is another matter, though. Maybe 2018 is the year in which we start to see the emergence of ‘AI Feudalism’.
  Read more: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Arxiv).
Check out this helpful Reddit BERT-explainer from one of the researchers (Reddit).

Using drones to harvest real world driving data:
…Why the future involves lightly-automated aerial robot data collection pipelines…
Researchers with the Automated Driving Department of the Institute for Automotive Engineering at Aachen University have created a new ‘highD’ dataset that captures the behavior of real world vehicles on German highways (technically: autobahns).
  Drones + data: The researchers created the dataset via DJI Phantom 4 Pro Plus drones hovering above roadways, which they used to collect natural trajectories from vehicles driving on German highways around Cologne. The dataset includes post-processed trajectories of 110,000 vehicles, including cars and trucks. It consists of 16.5 hours of video spread across 60 different recordings, which were made at six different locations between 2017 and 2018, with each recording having an average length of 17 minutes.
  Augmented dataset: The researchers provide additional labels in the dataset beyond trajectories, categorizing vehicles’ behavior into distinct detected maneuvers, which include: free driving, vehicle following, critical maneuvers, and lane changes.
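  As a toy illustration of how one such maneuver label could be derived from the post-processed trajectories, consider this lane-change heuristic (an assumption-laden sketch, not the dataset’s actual extraction code):

```python
def label_lane_changes(lane_ids, min_stable_frames=25):
    """Label frames at which a vehicle's lane assignment changes.

    lane_ids: per-frame lane index for a single vehicle trajectory.
    Returns a list of (frame_index, from_lane, to_lane) events, ignoring
    brief flickers shorter than `min_stable_frames`.
    """
    events = []
    current = lane_ids[0]
    for i in range(1, len(lane_ids)):
        if lane_ids[i] != current:
            # Only accept the change if the new lane persists long enough.
            window = lane_ids[i:i + min_stable_frames]
            if len(window) == min_stable_frames and all(l == lane_ids[i] for l in window):
                events.append((i, current, lane_ids[i]))
                current = lane_ids[i]
    return events

print(label_lane_changes([1] * 100 + [2] * 100))  # -> [(100, 1, 2)]
```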
highD vs NGSIM: The dataset most similar to highD is NGSIM, a dataset developed by the US Department of Transport. highD contains a significantly greater diversity of vehicles as well as being significantly larger, but the recorded distances which the vehicles travel along are shorter, and the German roads in highD have fewer lanes than the American ones used in NGSIM.
  Why it matters: Data is likely going to be crucial for the development of real world robot platforms, like self-driving cars. Techniques like those outlined in this paper show how we can use newer technologies, like cheap consumer drones, to automate significant chunks of the data gathering process, potentially making it easier for people to gather and create large datasets. “Our plan is to increase the size of the dataset and enhance it by additional detected maneuvers for the use in safety validation of highly automated driving,” the researchers write.
Get the data from the official website (highD-dataset.com).
You can access the Matlab and Python code used to handle the data, create visualizations, and extract maneuvers from here (Github).
Read more: The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems (Arxiv).

Building Google-like search engines for surveillance, using AI:
…New research lets you search via a person’s height, color, and gender…
Deep learning based techniques are fundamentally changing how surveillance architectures are being built. Case in point: a new paper from Indian researchers gives a flavor of how deep learning can expand the capabilities of security technology for ‘person retrieval’, which is the task of trying to find a particular person within a set of captured CCTV footage.
  The system: The researchers use Mask R-CNN pre-trained on Microsoft COCO to let people search over CCTV footage from the SoftBioSearch dataset for people via specific height, color, and ‘gender’ (for the purpose of this newsletter we won’t go into the numerous complexities and presumed definitions inherent in the use of ‘gender’ here).
  Results: “The algorithm correctly retrieves 28 out of 41 persons,” the researchers write. This isn’t yet quite to the level of performance where I can imagine people implementing it, but it certainly seems ‘good enough’ for many surveillance cases, where you don’t really care about a few false positives as you’re mostly trying to find candidate targets backed up by human analysis.
  Why it matters: The deployment of artificial intelligence systems is going to radically change how governments relate to their citizens by giving them greater abilities than before to surveil and control them. Approaches like this highlight how flexible this technology is and how it can be used for the sorts of surveillance work that people typically associate with large teams of human analysts. Perhaps we’ll soon hear about intelligence analysts complaining about the automation of their own jobs as a consequence of deep learning.
  Read more: Person Retrieval in Surveillance video using Height, Color and Gender (Arxiv).

Tech Tales:

[2028: A climate-protected high-rise in a densely packed ‘creatives’ district of an off-the-charts Gini-coefficient city]

We Noticed You Liked This So Now You Have This And You Shall Have This Forever

The new cereal arrived yesterday. I’m already addicted to it. It is perfect. It is the best cereal I have ever had. I would experience large amounts of pain to have access to this cereal. My cereal is me; it has been personalized and customized. I love it.

I had to invest to get here. Let us not speak of the first cereals. The GAN-generated “Chocolate Rocks” and “Cocoa Crumbles” and “Sweet Black Bean Flakes”. I shudder to think of these. Getting to the good cereal takes time. I gave much feedback to the company, including giving them access to my camera feeds, so their algorithms can watch me eat. Watch me be sick.

One day I got so mad that the cereal had been bad for so long that I threw it across the room and didn’t have anything else for breakfast.

Thank You For Your Feedback, Every Bit of Feedback Gets us Closer to Your Perfect Cereal, they said.
I believed them.

Though I do not have a satisfying job, I now start every morning with pride. Especially now, with the new cereal. This cereal reflects my identity. The taste is ideal. The packaging reminds me of my childhood and also simulates a new kind of childhood for me, filling the hole of no-kids that I have. I am very lonely. The cereal has all of my daily nutrients. It sustains me.

Today, the company sent me a message telling me I am so valuable they want me to work on something else. Why Not Design Your Milk? They said. This makes sense. I have thrown up twice already. One of the milks was made with seaweed. I hated it. But I know because of the cereal we can get there: we can develop the perfect milk. And I am going to help them do it and then it will be mine, all mine.

And they say our generation is less exciting than the previous ones. Answer me this: did any of those old generations who fought in wars design their own cereal in company with a superintelligence? Did any of them know the true struggle of persisting in the training of something that does not understand you and does not care about you, but learns to? No. They had children, who already like you, and partners, who want to be with you. They did not have this kind of hardness.

The challenge of our lifetime is to suffer enough to make the perfect customization. Why not milk? They ask me. Why not my own life, I ask them? Why not customize it all?

And they say religion is going out of fashion!

Things that inspired this story: GANs; ad-targeting; the logical end point of Google and Facebook and all the other stateless multinationals expanding into the ‘biological supply chain’ that makes human life possible; the endless creation of new markets within capitalism; the recent proliferation of various ‘nut milks’ taken to their logical satirical end point; hunger; the shared knowledge among all of us alive that our world is being replaced by simulcras of the world and we are the co-designers of these paper-thin realities.

Import AI 116: Think robots are insecure? Prove it by hacking them; why the UK military loves robots for logistics; Microsoft bids on $10bn US DoD JEDI contract while Google withdraws

‘Are you the government? Want to take advantage of AI in the USA? Here’s how!’ says thinktank:
….R-Street recommends politicians focus on talent, data, hardware, and other key areas to ensure America can benefit from advances in AI…
R-Street, a Washington-based thinktank whose goal is to “promote free markets and limited, effective government” has written a paper recommending how the US can take advantage of AI.
  Key recommendations: R Street says that the scarce talent market for AI disproportionately benefits deep-pocketed incumbents (such as Google) that can outbid other companies. “If there were appropriate policy levers to increase the supply of skilled technical workers available in the United States, it would disproportionately benefit smaller companies and startups,” they write.
  Talent: Boost Immigration: In particular, they highlight immigration as an area where the government may want to consider instituting changes, for instance by creating a new class of technical visa, or expanding H-1Bs.
  Talent: Offset Training Costs: Another approach could be to allow employers to deduct the full costs of training staff in AI, which would further incentivize employers to increase the size of the AI workforce.
  Data: “We can potentially create high-leverage opportunities for startups to compete against established firms if we can increase the supply of high-quality datasets available to the public,” R Street writes. One way to do this can be to analyze data held by the government with “a general presumption in favor of releasing government data, even if the consumer applications do not appear immediately obvious”.
  Figure out (fair use X data X copyright): One of the problems AI is already causing is how it intersects with our existing norms and laws around intellectual property, specifically copyright law. A key question that needs to be resolved is figuring out how to assess data in terms of fair use when looking at AI systems – which will tend to consume vast amounts of data and use this data to create outputs that could, in certain legal lights, be viewed as ‘derivative works’, which would provide disincentives to people looking to develop AI.
   “Given the existing ambiguity around the issue and the large potential benefits to be reaped, further study and clarification of the legal status of training data in copyright law should be a top priority when considering new ways to boost the prospects of competition and innovation in the AI space,” they write.
   Hardware: The US government should be mindful about how the international distribution of semiconductor manufacturing infrastructure could come into conflict with national strategies relating to AI and hardware.
  Why it matters: Analyses like this show how traditional policymakers are beginning to think about AI and highlights the numerous changes needed for the US to fully capitalize on its AI ecosystem. At a meta level, the broadening of discourse around AI to extend to Washington thinktanks seems like a further sign of the ‘industrialization of AI’, in the sense that the technology is now seen as having significant enough economic impacts that policymakers should start to plan and anticipate the changes it will bring.
  Read more: Reducing Entry Barriers in the Development and Application of AI (R Street).
  Get the PDF directly here.

Tired: Killer robots.
Wired: Logistics robots for military re-supply!
…UK military gives update on ‘Autonomous Last Mile Resupply’ robot competition…
The UK military is currently experimenting with new ways to deliver supplies to frontline troops – and it’s looking to robots to help it out. To spur research into this area, a group of UK government organizations is hosting the Autonomous Last Mile ReSupply (ALMRS) competition.
  ALMRS is currently in phase two, in which five consortiums led by Animal Dynamics, Barnard Microsystems, Fleetonomy, Horiba Mira, and QinetiQ will build prototypes of their winning designs for testing and evaluation, receiving funding of around £3.8 million over the next few months.
  Robots are more than just drones: Some of the robots being developed for ALMRS include autonomous powered paragliders, a vertical take-off and land (VTOL) drone, autonomous hoverbikes, and various systems for autonomous logistics resupply and maintenance.
  Why it matters: Research initiatives like this will rapidly mature applications at the intersection of robotics and AI as a consequence of military organizations creating new markets for new capabilities. Many AI researchers expect that contemporary AI techniques will significantly broaden the capabilities of robotic platforms, but so far hardware development has lagged software. With schemes like ALMRS, hardware may get a boost as well.
  Read more: How autonomous delivery drones could revolutionise military logistics (Army Technology news website).

Responsible Computer Science Challenge offers $3.5million in prizes for Ethics + Computer Science courses:
…How much would you pay for a more responsible future?…
Omidyar Network, Mozilla, Schmidt Futures and Craig Newmark Philanthropies are putting up $3.5million to try to spur the development of more socially aware computer scientists. The challenge has two phases:
– Stage 1 (grants up to $150,000 per project): “We will seek concepts for deeply integrating ethics into existing undergraduate computer science courses”. Winners announced April 2019.
– Stage 2 (grants up to $200,000): “We will support the spread and scale of the most promising approaches”.
   Deadline: Applications will be accepted from now through to December 13, 2018.
   Why it matters: Computers are general purpose technologies, so encouraging computer science practitioners to think about the ethical component of their work in a holistic, coupled manner may yield radical new designs for more positive and aware futures.
  Read more: Announcing a Competition for Ethics in Computer Science, with up to $3.5 Million in Prizes (Mozilla blog).

Augmenting human game designers with AI helpers:
…Turn-based co-design system lets an agent learn how you like to design levels…
Researchers with the Georgia Institute of Technology have developed a 2D platform game map editor which is augmented with a deep reinforcement learning agent that learns to suggest level alterations based on the actions of the designer.
  An endearing, frustrating experience: Like most things involving the day-to-day use of AI, the process can be a bit frustrating: after the level designer tries to create a series of platforms with gaps opening onto empty space below, the AI persists in filling these holes in with its suggestions – despite getting a negative RL reward each time. “As you can see this AI loves to fill in gaps, haha,” says Matthew at one point.
  Creative: But it can also come up with interesting ideas. At one point the AI suggests a pipe flanked at the top on each side by single squares. “I don’t hate this. And it’s interesting because we haven’t seen this before,” he says. At another point he builds a mirror image of what the AI suggests, creating an enclosed area.
  Learning with you: The AI learns to transfer some knowledge between levels, as shown in the video. However, I expect it needs greater diversity and potentially larger game spaces to show what it can really do.
  Why it matters: AI tools can give all types of artists new tools with which to augment their own intelligence, and it seems like the adaptive learning capabilities of today’s RL+supervised learning techniques can make for potentially useful allies. I’m particularly interested in these kind of constrained environments like level design where you ultimately want to follow a gradient towards an implicit goal.
  Watch the video of Matthew Guzdial narrating the level editor here (Youtube).
 Check out the research paper here: Co-Creative Level Design with Machine Learning (Arxiv).

Think robots are insecure? Think you can prove it? Enter a new “capture the flag” competition:
…Alias Robotics’ “Robot CTF” gives hackers nine challenges to test their robot-compromising skills…
Alias Robotics, a Spanish robot cybersecurity company, has released the Robotics Capture The Flag (RCTF), a series of nine scenarios designed to challenge wannabe-robot hackers. “The Robotics CTF is designed to be an online game, available 24/7, launchable through any web browser and designed to learn robot hacking step by step,” they write.
  Scenarios: The RCTF consists of nine scenarios that will challenge hackers to exfiltrate information from robots, snoop on robot operating system (ROS) traffic, find hardcoded credentials in ROS source code, and so on. One of the scenarios is listed as “coming soon!” and promises to give wannabe hackers access to “an Alias Robotics’ crafted offensive tool”.
  Free hacks! The researchers have released the scenarios under an open source TK license on GitHub. “We envision that as new scenarios become available, the sources will remain at this repository and only a subset of them will be pushed to our web servers http://rctf.aliasrobotics.com for experimentation. We invite the community of roboticists and security researchers to play online and get a robot hacker rank,” they write.
  Why it matters: Robotics is seen as one of the next frontiers for contemporary AI research and techniques, but as this research shows – and other research on hacking physical robots published in Import AI #109 – the substrates on which many robots are built are still quite insecure.
  Read more: Robotics CTF (RCTF), A Playground for Robot Hacking (Arxiv).
  Check out the competition and sign-up here (Alias Robotics website).

Fighting fires with drones and deep reinforcement learning:
…Forest fire: If you can simulate it, perhaps you can train an AI system to monitor it?…
Stanford University researchers have used reinforcement learning to train drones in simulators to spot wildfires better than supervised baselines. The project highlights how many complex real world tasks, like wildfire monitoring, can be represented as POMDPs (partially observable Markov decision processes), which are tractable for reinforcement learning algorithms.
  The approach works like this: The researchers build a simulator that lets them simulate wildfires in a grid-based way. They then populate this system with some simulated drones and use reinforcement learning to train the drones to effectively survey the fire and, most crucially, stay with the ‘fire front’, which is the expanding frontier of it and therefore the part with the greatest potential safety impact. “Each aircraft will get an observation of the fire relative to its own location and orientation. The observations are modeled as an image obtained from the true wildfire state given the aircraft’s current position and heading direction,” they write.
  Rewards: The reward function is structured as follows: The aircraft gets penalties for distances from fire front, for high bank angles, for closeness to other aircraft, and for being near too many non-burning cells.
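  A hedged sketch of what such a shaped reward could look like is below; the weights and exact terms are illustrative guesses, not the paper’s values:

```python
import numpy as np

def step_reward(aircraft_xy, bank_angle, other_xy, fire_front_xy, non_burning_near):
    """Toy per-step reward for one surveillance aircraft.

    aircraft_xy:      (2,) position of this aircraft.
    bank_angle:       current bank angle in radians.
    other_xy:         (K, 2) positions of the other aircraft.
    fire_front_xy:    (M, 2) cells currently on the expanding fire front.
    non_burning_near: number of non-burning cells inside the sensor footprint.
    """
    r = 0.0
    # Penalize distance to the nearest fire-front cell (stay with the front).
    r -= 0.1 * np.min(np.linalg.norm(fire_front_xy - aircraft_xy, axis=1))
    # Penalize aggressive banking (keep trajectories smooth and flyable).
    r -= 1.0 * abs(bank_angle)
    # Penalize getting too close to other aircraft.
    if len(other_xy) and np.min(np.linalg.norm(other_xy - aircraft_xy, axis=1)) < 5.0:
        r -= 10.0
    # Penalize loitering over large areas of non-burning cells.
    r -= 0.01 * non_burning_near
    return r
```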
  Belief: The researchers also experiment with what they call a “belief-based approach” which involves training the drones to create a shared “belief map”, which is a map of their environment indicating whether they believe particular cells will contain fire or not, and this map is updated with real data taken during the simulated flight. This is different to an observation-based approach, which purely focuses on the observations seen by these drones.
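  And a minimal sketch of the belief-map idea, in which cells decay toward a prior until an aircraft re-observes them (an illustrative simplification, not the paper’s update rule):

```python
import numpy as np

class BeliefMap:
    """Shared grid of P(cell is burning), updated from direct observations."""

    def __init__(self, shape, prior=0.05, decay=0.99):
        self.belief = np.full(shape, prior)
        self.prior, self.decay = prior, decay

    def step(self, observations):
        # Drift un-observed cells back toward the prior as their data goes stale.
        self.belief = self.prior + self.decay * (self.belief - self.prior)
        # observations: iterable of (row, col, burning_bool) from all aircraft.
        for r, c, burning in observations:
            self.belief[r, c] = 1.0 if burning else 0.0

belief = BeliefMap((64, 64))
belief.step([(10, 12, True), (10, 13, False)])
```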
  Results: Two aircraft with nominal wildfire seed: Both the belief-based and observation-based methods obtain significantly higher rewards than a hand-programmed ‘receding horizon’ baseline. There is no comparison to human performance, though. The belief-based technique does eventually obtain a slightly higher final performance than the observation-based version, but it takes longer to converge to a good solution.
  Results: Greater than two aircraft: The system is able to scale to dealing with numbers of aircraft greater than two, but this requires the tweaking of a proximity-based reward to discourage collisions.
  Results: Different wildfires: The researchers test their system on two differently shaped wildfires (a t-shape and an arc) and show that both RL-based methods exceed performance of the baseline, and that the belief-based system in particular does well.
  Why it matters: We’ve already seen states like California use human-piloted drones to help emergency responders deal with wildfires. As we head into a more dangerous future defined by an increase in the rate of extreme weather events driven by global warming I am curious to see how we might use AI techniques to create certain autonomous surveillance and remediation abilities, like those outlined in this study.
  Caveat: Like all studies that show success in simulation, I’ll retain some skepticism till I see such techniques tested on real drones in physical reality.
   Read more: Distributed Wildfire Surveillance With Autonomous Aircraft Using Deep Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Pentagon’s AI ethics review taking shape:
   The Defense Innovation Board met last week to present some initial findings of their review of the ethical issues in military AI deployment. The DIB is the Pentagon’s advisory panel of experts drawn largely from tech and academia. Speakers covered issues ranging from autonomous weapons systems, to the risk posed by incorporating AI into existing nuclear weapons systems.
   The Board plans to present their report to Congress in April 2019.
  Read more: Defense Innovation Board to explore the ethics of AI in war (NextGov)
  Read more: DIB public meeting (DoD)

Google withdraws bid for $10bn Pentagon contract:
Google has withdrawn its bid for the Pentagon’s latest cloud contract, JEDI, citing uncertainty over whether the work would align with its AI principles.
  Read more: Google drops out of Pentagon’s $10bn cloud competition (Bloomberg).

Microsoft employees call for company to not pursue $10bn Pentagon contract:
Following Google’s decision to not bid on JEDI, people identifying themselves as employees at Microsoft published an open letter asking the company to follow suit, and withdraw its own bid on the project. (Microsoft submitted a bid for JEDI following the publication of the letter.)
  Read more: An open letter to Microsoft (Medium).

Future of Humanity Institute receives £13m funding:
FHI, the multidisciplinary institute at the University of Oxford led by Nick Bostrom, has received a £13.3m donation from the Open Philanthropy Project. This represents a material uptick in funding for AI safety research. The field as a whole, including work done in universities, non-profits and industry, spent c.$10m in 2017, c.$6.5m in 2016, and c.$3m in 2015, according to estimates from the Center for Effective Altruism.
  Read more: £13.3m funding boost for FHI (FHI).
  Read more: Changes in funding in the AI safety field (CEA).

Tech Tales:

The Watcher We Nationalized

So every day when you wake up as the head of this government you check The Watcher. It has an official name – a lengthy acronym that expands to list some of the provenance of its powerful technologies – but mostly people just call it The Watcher or sometimes The Watch and very rarely Watcher.

The Watcher is composed of intelligence taps placed on most of the world’s large technology companies. Data gets scraped out of them and combined with various intelligence sources to give the head of state access to their own supercharged search engine. Spook Google! Is what British tabloids first called it. Fedbook! Is what some US press called it. And so on.

All you know is that you start your day with The Watcher and you finish your day with it. When you got into office, several years ago, you were met by a note from your predecessor. Nothing you do will show up in Watcher, unless something terrible happens; get used to it, read the note.

They were right, mostly. Your jobs bill? Out-performed by some viral memes relating to a (now disgraced) celebrity. The climate change investment? Eclipsed by a new revelation about a data breach at one of the technology companies. In fact, the only thing so far that registered on The Watcher from your part of the world was a failed suitcase bombing attempt on a bank.

Now, heading towards the end of your premiership, you hold onto this phrase and say it to yourself every morning, right before you turn on The Watcher and see what the rhythm of the world says about the day to come. “Nothing you do will show up in Watcher, unless something terrible happens; get used to it”, you say to yourself, then you turn it on.

Things that inspired this story: PRISM, intelligence services, governments built on companies like so many houses of cards, small states, Europe, the tedium of even supposedly important jobs, systems.

Import AI 115: What the DoD is planning for its robots over the next 25 years; AI Benchmark identifies 2018’s speediest AI phone; and DeepMind embeds graph networks into AI agents

UK military shows how tricky it’ll be to apply AI to war:
…Numerous AI researchers likely breathe a sigh of relief at new paper from UK’s Defence Science and Technology Laboratory…
Researchers with the UK’s Defence Science and Technology Laboratory, Cranfield Defense and Security Doctoral Training Centre, and IBM, have surveyed contemporary AI and thought about ways it can be integrated with the UK’s defence establishment. The report makes for sobering reading for large military organizations keen to deploy AI, highlighting the difficulties in terms of practical deployment (eg, procurement) and in terms of capability (many military situations require AI systems that can learn and update in response to sparse, critical data).
  Current problems: Today’s AI systems lack some key capabilities that militaries need when deploying systems, like being able to configure systems to always avoid certain “high regret” occurrences (in the case of a military, you can imagine that firing a munition at an incorrect target (hopefully) yields such ‘high regret’); being resilient to adversarial examples being weaponized against systems by another actor (whether a defender or aggressor); being able to operate effectively with very small or sparse data; being able to shard AI systems across multiple partners (eg, other militaries) in such a way that the system can be reverted to sovereign control following the conclusion of an operation; and being able to deploy such systems into the harsh, low-compute operational environments that militaries face.
  High expectations: “If it is to avoid the sins of its past, there is the need to manage stakeholder expectations very carefully, so that their demand signal for AI is pragmatic and achievable”.
  It’s the data, stupid: Militaries, like many large government organizations, have an unfortunate tendency to sub-contract much of their IT systems out to other parties. This tends to lead to systems that are:
a) moneypits
b) brittle
c) extremely hard to subsequently extend.
These factors add a confounding element to any such military deployment of AI. “Legacy contractual decisions place what is effectively a commercial blocker to AI integration and exploitation in the Defence Equipment Program’s near-term activity,” the researchers write.
  Procurement: UK defence will also need to change the way it does procurement so it can maximize the number of small-and-medium-sized enterprises it can buy its AI systems from. But buying from SMEs creates additional complications for militaries, as working out what to do with the SME-supported service if the SME stops providing it, or goes bankrupt, is difficult and imposes a significant burden on the part of the SME.
  Why it matters: Military usage of AI is going to be large-scale, consequential, and influential in terms of geopolitics. It’s also going to invite numerous problems from AI accidents as a consequence of poor theoretical guarantees and uneven performance properties, so it’s encouraging to see representatives from UK defence seek to think these issues through.
  Read more: A Systems Approach to Achieving the Benefits of Artificial Intelligence in UK Defence (Arxiv).

Want to understand the mind of another? Get relational!
…DeepMind research into combining graph networks and relational networks shows potential for smarter, faster agents…
DeepMind researchers have tried to develop smarter AI agents by combining contemporary deep learning techniques with the company’s recent work on graph networks and relational networks. The resulting systems rely on a new module, which DeepMind calls a “Relational Forward Model” (RFM). This model obtains higher performance than pure-DL baselines, suggesting that fusing deep learning with more structured approaches is viable and yields good performance.
  How it works: The RFM module consists of a graph network encoder, a graph network decoder, and a graph-compatible GRU. Combined, these components create a way to represent structured information in a relational manner, and to update this information in response to changes in the environment (or, theoretically, the inputs of other larger structured systems).
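  A loose structural sketch of one relational update of this kind is below; this is generic graph-network plumbing with assumed shapes and aggregation, not DeepMind’s implementation. An RFM-style model would stack a block like this as an encoder, run a recurrent core over the node and edge states through time, and decode predictions (e.g. teammates’ next actions) with a second block:

```python
import torch
import torch.nn as nn

class GraphBlock(nn.Module):
    """One relational update: edges read their endpoint nodes, then each node
    aggregates its incoming edge messages."""

    def __init__(self, node_dim=16, edge_dim=8, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden),
                                      nn.ReLU(), nn.Linear(hidden, edge_dim))
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + edge_dim, hidden),
                                      nn.ReLU(), nn.Linear(hidden, node_dim))

    def forward(self, nodes, edges, senders, receivers):
        # nodes: (N, node_dim); edges: (E, edge_dim); senders/receivers: (E,) long tensors.
        edges = self.edge_mlp(torch.cat([nodes[senders], nodes[receivers], edges], dim=-1))
        incoming = torch.zeros(nodes.shape[0], edges.shape[-1]).index_add(0, receivers, edges)
        nodes = self.node_mlp(torch.cat([nodes, incoming], dim=-1))
        return nodes, edges

# Two agents and one relation between them, updated for a single step.
nodes, edges = torch.randn(2, 16), torch.randn(1, 8)
nodes, edges = GraphBlock()(nodes, edges, torch.tensor([0]), torch.tensor([1]))
```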
  Testing: The researchers test their approach on three distinct tasks: cooperative navigation, which requires agents to collaborate so that each efficiently navigates to a distinct reward tile in an area; coin game, which requires agents to position themselves above some reward coins and to figure out by observing each other which coins yield a negative reward and thus should be avoided; and stag hunt, where agents inhabit a map containing stags and apples, and need to work with one another to capture stags, which yield a significant reward. “By embedding RFM modules in RL agents, they can learn to coordinate with one another faster than baseline agents, analogous to imagination-augmented agents in single-agent RL settings”, the researchers write.
   The researchers compare the performance of their systems against systems using Neural Relational Inference (NRI) and Vertex Attention Interaction Networks (VAIN) and find that their approach displays significantly better performance than other approaches. They also ablated their system by training versions without the usage of relational networks, and ones using feedforward networks only. These ablations showed that both components have a significant role in the performance of these systems.
  Why it matters: The research is an extension of DeepMind’s work on integrating graph networks with deep learning. This line of research seems promising because it provides a way to integrate structured data representations with differentiable learning systems, which might let AI researchers have their proverbial cake and eat it too by being able to marry the flexibility of learned systems with the desirable problem specification and reasoning properties of more traditional symbolic approaches.
  Read more: Relational Forward Models for Multi-Agent Learning (Arxiv).

Rethink Robotics shuts its doors:
…Collaborative robots pioneer closes…
Rethink Robotics, a robot company founded by MIT robotics legend Rodney Brooks, has closed. The company had developed two robots; Baxter, a two-armed bright red robot with expressive features and the pleasing capability to work around humans without killing them, and Sawyer, a one-armed successor to Baxter.
  Read more: Rethink Robotics Shuts Down (The Verge).

Want to know what the DoD plans for unmanned systems through to 2042? Read the roadmap:
….Drones, robots, planes, oh my! Plus, the challenges of integrating autonomy with military systems…
The Department of Defense has published its (non-classified) roadmap for unmanned systems through to 2042. The report identifies four core focus areas that we can expect DoD to focus on. These are: Interoperability, Autonomy, Network Security, and Human-Machine Collaboration.
  Perspective: US DoD spent ~$4.245 billion on unmanned systems in 2017 (inclusive of procurement and research, with a roughly equal split between them). That’s quite a substantial amount of money to spend and, if we can assume that this will remain the same (adjusted for inflation), then that means DoD can throw quite significant resources towards the capital R parts of unmanned systems research.
  Short-Term Priorities: DoD’s short-term priorities for its unmanned systems include: the use of standardized and/or open architectures; a shift towards modular, interchangeable parts; a greater investment in the evaluation, verification, and validation of systems; the creation of a “data transport” strategy to deal with the huge floods of data coming from such systems; among others.
  Autonomy priorities: DoD’s priorities for adding more autonomy to drones includes increasing private sector collaboration in the short term and then adding in augmented reality and virtual reality systems by the mid-term (2029), before creating platforms capable of persistent sensing with “highly autonomous” capabilities by 2042. As for the thorny issue of weaponizing such systems, DoD says that between the medium-term and long-term it hopes to be able to give humans an “armed wingman/teammate” with fire control remaining with the human.
  Autonomy issues: “Although safety, reliability, and trust of AI-based systems remain areas of active research, AI must overcome crucial perception and trust issues to become accepted,” the report says. “The increased efficiency and effectiveness that will be realized by increased autonomy are currently limited by legal and policy constraints, trust issues, and technical challenges.”
  Why it matters: The maturation of today’s AI techniques means that it’s a matter of “when”, not “if”, for them to be integrated into military systems. Documents like this give us a sense of how large military bureaucracies are reacting to the rise of AI, and it’s notable that certain concerns within the technical community about the robustness/safety of AI systems have made their way into official DoD planning.
  Read the full report here: Pentagon Unmanned Systems Integrated Roadmap 2017-2042 (USNI News).

Should we take deep learning progress as being meaningful?
…UCLA Computer Science chair urges caution…
Adnan Darwiche, chairman of the Computer Science Department at UCLA and someone who studied AI during the mid-1980s AI winter, has tried to lay out some of the reasons to be skeptical about whether deep learning will ever scale to let us build truly intelligent systems. The crux of his objection is: “Mainstream scientific intuition stands in the way of accepting that a method that does not require explicit modeling or sophisticated reasoning is sufficient for reproducing human-level intelligence”.
  Curve-fitting: The second component of the criticism is that people shouldn’t get too excited about neural network techniques because all they really do is curve-fitting, and instead we should be looking at using model-based approaches, or making hybrid systems.
  Time is the problem: “It has not been sustained long enough to allow sufficient visibility into this consequential question: How effective will function-based approaches be when applied to new and broader applications than those already targeted, particularly those that mandate more stringent measures of success?”
  Curve-fitting can’t explain itself: Another problem identified by the author is the lack of explanation inherent to these techniques, which they see as further justifying investment by the deep learning community into model-based approaches which include more assumptions and/or handwritten sections. “Model-based explanations are also important because they give us a sense of “understanding” or “being in control” of a phenomenon. For example, knowing that a certain diet prevents heart disease does not satisfy our desire for understanding unless we know why.”
  Giant and crucial caveat: Let’s be clear that this piece is essentially reacting to a cartoonish representation of the deep learning AI community that can be caricatured as having this opinion: Deep Learning? Yeah! Yeah! Yeah! Deep Learning is the future of AI! I should note that I’ve never met anyone technically sophisticated who has this position, and most researchers when pressed will raise somewhat similar concerns to those identified in this article. I think some of the motivation for this article stems more from dissatisfaction with the current state of (most) media coverage regarding AI which tends to be breathless and credulous – this is a problem, but as far as I can tell it isn’t really a problem being fed intentionally by people within the AI community, but is instead a consequence of the horrific economics of the post-digital news business and associated skill-rot that occurs.
  Why it matters: Critiques like this are valuable as they encourage the AI community to question itself. However, I think that these critiques need to be produced over significantly shorter timescales and should take into account more contemporary research; for instance, some of the objections here seem to be (lightly) rebutted by recent work in NLP which shows that “curve-fitting” systems are capable of feats of reasoning, among other examples. (In the conclusion of this article it says the first draft was written in 2016, then a draft was circulated in the summer of 2017, and now it has been officially published in Autumn 2018, rendering many of its technical references outdated.)
  Read more: Human-level intelligence or animal-like abilities (ACM Digital Library).

Major companies create AI Benchmark and test 10,000+ phones for AI prowess, and a surprising winner emerges:
…Another sign of the industrialization of AI…new benchmarks create standards and standards spur markets…
Researchers with ETH Zurich, Google, Qualcomm, Huawei, MediaTek, and ARM want to be able to better analyze the performance of AI software on different smartphones, so they have created “AI Benchmark” and tested over 10,000 devices against it. AI Benchmark is a batch of nine tests for mobile devices which has been “designed specifically to test the machine learning performance, available hardware AI accelerators, chipset drivers, and memory limitations of the current Android devices”.
  The ingredients of the AI Benchmark: The benchmark consists of nine deep learning tests: Image Recognition tested on ImageNet using a lightweight MobileNet-V1 architecture, and the same test but implementing a larger Inception-V3 network; Face Recognition performance of an Inception-Resnet-V1 on the VGGFace2 dataset; Image Deblurring using the SRCNN network; Image Super-Resolution with a downscaling factor of 3 using a VDSR network, and the same test but with a downscaling factor of 4 and using an SRGAN; Image Semantic Segmentation via an ICNet CNN; a general Image Enhancement problem (encompassing things like “color enhancement, denoising, sharpening, texture synthesis”); and a memory limitations test which uses the same network as in the deblurring task while testing it over larger and larger image sizes to explore RAM limitations.
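  The per-test measurement is essentially an inference-latency benchmark; here is a hedged sketch of the general idea, where the warm-up counts, normalization constants, and score aggregation are illustrative assumptions rather than the benchmark’s published procedure:

```python
import time

def average_latency_ms(run_inference, n_warmup=3, n_runs=20):
    """Time a single inference call, in milliseconds, after a few warm-up runs."""
    for _ in range(n_warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference()
    return 1000 * (time.perf_counter() - start) / n_runs

def ai_score(latencies_ms, reference_ms):
    """Toy aggregation: tests that run faster than a reference device score higher."""
    return sum(ref / max(ms, 1e-3) for ms, ref in zip(latencies_ms, reference_ms))

print(ai_score([12.0, 95.0], reference_ms=[20.0, 150.0]))  # ~3.2 in this made-up case
```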
  Results: The researchers tested “over 10,000 mobile devices” on the benchmark. The core measurement for each of the benchmark’s nine evaluations is the time in milliseconds it takes to run the network. The researchers blend the results of the nine tests together into an overall “AI-Score”. The top results, when measured via AI-Score, are (chipset, score):
#1: Huawei P20 Pro (HiSilicon Kirin 970, 6519)
#2: OnePlus 6 (Snapdragon 845/DSP, 2053)
#3: HTC U12+ (Snapdragon 845, 1708)
#4: Samsung Galaxy S9+ (Exynos 9810 Octa, 1628)
#5: Samsung Galaxy S8 (Exynos 8895 Octa, 1413)
   It’s of particular interest to me that the top-ranking performance seems to come from the special AI accelerator which ships with the HiSilicon chip, especially given that HiSilicon is a Chinese semiconductor company, which provides more evidence of Chinese advancement in this area. It’s also notable to me that Google’s ‘Pixel’ phones didn’t make the top 5 (though they did make the top 10).
  The future: This first version of the benchmark may be slightly skewed due to Huawei managing to ship a device incorporating a custom AI accelerator earlier than many other chipmakers. “The real situation will become clear at the beginning of the next year when the first devices with the Kirin 980, the MediaTek P80 and the next Qualcomm and Samsung Exynos premium SoCs will appear on the market,” the researchers note.
  Full results of this test are available at the official AI Benchmark website.
  Why this matters: I think the emergence of new large-scale benchmarks for applied AI applications represent further evidence for the current era being ‘the Industrialization of AI’. Viewed through this perspective, the creation (and ultimate adoption) of benchmarks gives us a greater ability to model macro progress indicators in the field and use those to better predict not only where hardware & software is today, but also to be able to develop better intuitions about underlying laws that condition the future as well.
  Read more: AI Benchmark: Running Deep Neural Networks on Android Smartphones (Arxiv).
  Check out the full results of the Benchmark here (AI Benchmark).

Toyota researchers propose new monocular depth estimation technique:
…Perhaps a cyclops can estimate depth just as well as a person with two eyes, if deep learning can help?…
Any robot expected to act within the world and around people needs some kind of depth-estimation capability. In a self-driving car, such a capability helps the vehicle estimate the proximity of surrounding objects and provides a valuable input for safety-critical calculations, like modelling other entities in the environment and estimating their velocities. Depth estimation systems can therefore be viewed as a key input technology for any self-driving car.
  But depth estimation systems can be difficult to implement and sometimes expensive: the typical approach is a binocular system – two cameras, like two human eyes – with software that computes the disparity between the views and uses it to estimate depth. But what if you can only afford one sensor? And what if your accuracy threshold can be satisfied by somewhat lower accuracy than binocular vision would give, but still good enough for your use case? Then you might want to estimate depth from a single sensor – if so, new deep learning techniques in monocular upscaling and super-resolution might be able to augment and manipulate the data to perform accurate depth estimation in a self-supervised manner.
  That’s the idea behind a technique from the Toyota Research Institute, which proposes a depth estimation approach that uses encoder and decoder networks to learn a good representation of depth that can be applied to new images. This new technique obtains higher accuracy scores for depths at various ranges, setting state-of-the-art scores on 5 out of 6 benchmarks. It relies on a “sub-pixel convolutional layer based on ESPCN for depth super-resolution”; this component “synthesizes the high-resolution disparities from their corresponding low-resolution multi-scale model outputs”.
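  The sub-pixel trick itself is easy to sketch: predict r×r times as many channels at low resolution, then rearrange them into an r-times-larger map. Below is a minimal PyTorch example using PixelShuffle; the channel counts are placeholders, not the paper’s:

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """ESPCN-style super-resolution head: conv to out_ch * r * r channels, then pixel shuffle."""

    def __init__(self, in_ch=64, out_ch=1, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)   # (B, C*r*r, H, W) -> (B, C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

coarse = torch.randn(1, 64, 48, 160)            # low-resolution disparity features
fine = SubPixelUpsample()(coarse)               # -> (1, 1, 96, 320)
```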
  Qualitative evaluation: Samples generated by the system display greater specificity and smoothness than others. This is in part due to the use of the sub-pixel resolution technique. This technique yields an effect in samples shown in the paper that strikes me as being visually similar to the outcomes of an anti-aliasing process within traditional computer graphics.
  Read more: SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

California considers Turing Test law:
California’s Senate is considering a bill making it unlawful to use bots to mislead individuals about their artificial identities in order to influence their purchases or voting behaviour. The bill appears to be focused on a few specific use-cases, particularly social media bots. The proposed law would come into force in July 2019.
  Why it matters: This law points to an issue that will become increasingly important as AI systems’ ability to mimic humans improves. This received attention earlier this year when Google demonstrated their Duplex voice assistant mimicking a human to book appointments. After significant backlash, Google announced the system would make a verbal disclosure that it was an AI. Technological solutions will be important in addressing issues around AI identification, particularly since bad actors are unlikely to be concerned with lawfulness.
  Read more: California Senate Bill 1001.

OpenAI Bits & Pieces:

Digging into AI safety with Paul Christiano:
Ever wondered about technical solutions to AI alignment, what the long-term policy future looks like when the world contains intelligent machines, and how we expect machine learning to interact with science? Yes? Then check out this 80,000 hours podcast with Paul Christiano of OpenAI’s safety team.
  Read more: Dr Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem’, and his vision of how humanity will progressively hand over decision-making to AI systems.

Tech Tales:

The Day We Saw The Shadow Companies and Ran The Big Excel Calculation That Told Us Something Was Wrong.

A fragment of a report from the ‘Ministry of Industrial Planning and Analysis’, recovered following the Disruptive Event. See case file #892 for further information. Refer to [REDACTED] for additional context.

Aluminium supplier. Welder. Small PCB board manufacturer. Electronics contractor. Solar panel farm. Regional utility supplier. Mid-size drone designer. 3D world architect.

What do these things have in common? They’re all businesses, and they all have, as far as we can work out, zero employees. Sure, they employ some contractors to do some physical work, but mostly these businesses are run on a combination of pre-existing capital investments, robotic process automation, and the occasional short-term set of human hands.

So far, so normal. We get a lot of automated companies these days. What’s different about this is the density of trades between these companies. The more we look at their business records, the more intra-company activity we see.

One example: The PCB boards get passed to an electronics contractor which does… something… to them, then they get passed to a mid-size drone designer which does… something… to them, then a drone makes its way to a welder which does… something… to the drone, then the drone gets shipped to the utility supplier and begins survey flights of the utility field.

Another example: The solar panel gets shipped to the welder. Then the PCB board manufacturer ships something to the welder. Then out comes a solar panel with some boards on it. This gets shipped to the regional utility supplier which sub-contracts with the welder which comes to the site and does some welding at a specific location overseen by a modified drone.

None of these actions are illegal. And none of our automated algorithms pick these kinds of events up. It’s almost like they’re designed to be indistinguishable from normal businesses. But something about it doesn’t register right to us.

We have a tool we use. It’s called the human to capital ratio. Most organizations these days sit somewhere around 1:5. Big, intensive organizations, like oil companies, sit up around 1:25. When we analyze these companies individually we find that they sit right at the edges of normal distributions in terms of capital intensity. But when we perform an aggregate analysis out pops this number: 1:40.

We’ve checked and re-checked and we can’t bring the number down. Who owns these companies? Why do they have so much capital and so few humans? And what is it all driving towards?

Our current best theory, after some conversations with the people in the acronym agencies, is [REDACTED].

Things that inspired this story: Automated capitalism, “the blockchain”, hiding in plain sight, national economic metric measurement and analysis techniques, the reassuring tone of casework files.

Import AI 114: Synthetic images take a big leap forward with BigGANs; US lawmakers call for national AI strategy; researchers probe language reasoning via HotpotQA

Getting hip to multi-hop reasoning with HotpotQA:
…New dataset and benchmark designed to test commonsense reasoning capabilities…
Researchers with Carnegie Mellon University, Stanford University, the Montreal Institute for Learning Algorithms, and Google AI have created a new dataset and associated competition designed to test the capabilities of question answering systems. The new dataset, HotpotQA, is far larger than many prior datasets designed for such tasks, and has been built to require ‘multi-hop’ reasoning, thereby testing the ability of newer NLP systems to perform increasingly complex cognitive tasks.
  HotpotQA consists of around ~113,000 Wikipedia-based question-answer pairs. Answering these questions correctly is designed to test for ‘multi-hop’ reasoning – the ability for systems to look at multiple documents and perform basic iterative problem-solving to come up with correct answers. These questions were “collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents”. These workers also provide the supporting facts they use to answer these questions, providing a strong supervised training set.
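  Sketch (illustrative): To make the ‘multi-hop’ structure concrete, below is a minimal Python sketch of what one such record might look like; the field names and helper function are illustrative assumptions rather than the dataset’s actual schema.
# Illustrative multi-hop QA record in the style described above; field names
# are assumptions for exposition, not HotpotQA's actual schema.
example = {
    "question": "In which country was the author of <Book X> born?",
    "answer": "France",
    # The supporting facts span two documents: answering requires a 'hop'
    # from the book's article to the author's article.
    "supporting_facts": [
        ("<Book X>", 0),      # (document title, sentence index)
        ("<Author Y>", 2),
    ],
    "context": [
        ("<Book X>", ["<Book X> is a novel by <Author Y>.", "..."]),
        ("<Author Y>", ["...", "...", "<Author Y> was born in Paris, France."]),
    ],
}

def is_multi_hop(record):
    """True if the supporting facts come from more than one document."""
    return len({title for title, _ in record["supporting_facts"]}) > 1

assert is_multi_hop(example)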
  It’s the data, stupid: To develop HotpotQA the researchers first had to build a kind of multi-hop pipeline of their own to figure out which documents to give crowd workers to compose questions from. To do this, they mapped the Wikipedia hyperlink graph and used this information to build a directed graph of linked articles, then detected pairs of documents suitable for multi-hop questions (see the sketch below). They also created a hand-made list of categories so that questions can compare entities of the same type (e.g. basketball players). These workers also provide the supporting facts they use to answer these questions, providing a strong supervised training set.
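  Sketch (illustrative): Below is a minimal, plain-Python reconstruction of the document-pairing idea – treat hyperlinks between Wikipedia articles as edges in a directed graph and surface linked article pairs as candidate contexts for multi-hop questions. This is my own sketch of the general approach, not the authors’ pipeline.
# Pair up candidate documents via a hyperlink graph: if article A links to
# article B, then (A, B) is a candidate context for a multi-hop question.
from collections import defaultdict

def build_hyperlink_graph(articles):
    """articles: dict mapping title -> list of titles it links to."""
    graph = defaultdict(set)
    for title, links in articles.items():
        for target in links:
            if target in articles:   # keep only links that resolve to known articles
                graph[title].add(target)
    return graph

def candidate_pairs(graph):
    """Yield (source, target) document pairs connected by a hyperlink."""
    for source, targets in graph.items():
        for target in sorted(targets):
            yield (source, target)

articles = {"Book X": ["Author Y"], "Author Y": ["Paris"], "Paris": []}
print(list(candidate_pairs(build_hyperlink_graph(articles))))
# -> [('Book X', 'Author Y'), ('Author Y', 'Paris')]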
  Testing: HotpotQA can be used to test models’ capabilities in different ways, ranging from information retrieval to question answering. The researchers train a system to provide a baseline, and the results show that this (relatively strong) baseline obtains performance significantly below that of a competent human across all tasks (with the exception of certain ‘supporting fact’ evaluations, in which it performs on par with an average human).
  Why it matters: Natural language processing research is currently going through what some have called an ‘ImageNet moment’ following recent algorithmic developments relating to the usage of memory and attention-based systems, which have demonstrated significantly higher performance across a range of reasoning tasks compared to prior techniques, while also being typically much simpler. As with ImageNet and the supervised classification systems built around it, these new types of NLP approaches require larger datasets to be trained on and evaluated against, and it’s likely that scaling techniques up to take on challenges defined by datasets like HotpotQA will drive further progress in this domain.
  Caveat: As with all datasets with an associated competitive leaderboard it is feasible that HotpotQA could be relatively easy and systems could end up exceeding human performance against it in a relatively short amount of time – this happened over the past year with the Stanford SQuAD dataset. Hopefully the greater sophistication of HotpotQA will protect against this.
  Read more: HotpotQA website with leaderboard and data (HotpotQA Github).
  Read more: HOTPOTQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Arxiv).

Administrative note regarding ICLR papers:
This week was the deadline for submissions for the International Conference on Learning Representations. These papers are published under a blind review process as they are currently under review. This year, there were 1600 submissions to ICLR, up from 1000 in 2017, 500 in 2016, and 250 in 2015. I’ll be going through some of these papers in this issue and others and will try to avoid making predictions about which organizations are behind which papers so as to respect the blind review process.

Computers can now generate (some) fake images that are indistinguishable from real ones:
…BigGANs show significant progression in synthetic image generation capabilities…
The researchers train GAN models with 2-4X the parameters and 8X the batch size of prior work, and also introduce techniques to improve the stability of GAN training.
  Some of the implemented techniques mean that samples generated by such GAN models can be tuned, allowing for “explicit, fine-grained control of the trade-off between sample variety and fidelity”. What this means in practice is that you can ‘tune’ how similar the types of generated images are to specific sets of images within the dataset, so for instance if you wanted to generate an image of a field containing a pond you might pick a few images to prioritize in training that contain ponds, whereas if you wanted to also tune the generated size of the pond you might pick images containing ponds of various sizes. The addition of this kind of semantic dial seems useful to me, particularly for using such systems to generate faked images with specific constraints on what they depict.
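  Sketch (illustrative): In the paper, this fidelity/variety control comes from the ‘truncation trick’ – sampling the generator’s latent vectors from a truncated distribution at generation time. The numpy sketch below shows that idea in miniature; `generator` is a hypothetical stand-in for any trained GAN generator, not an API from the paper.
# Trade sample fidelity against variety by truncating the latent distribution:
# resample any latent value whose magnitude exceeds a threshold. Smaller
# thresholds push samples toward the mode (higher fidelity, less variety).
import numpy as np

def truncated_latents(batch_size, dim, threshold, rng=None):
    """Sample latents from N(0, 1), resampling values with |z| > threshold (threshold > 0)."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal((batch_size, dim))
    while True:
        mask = np.abs(z) > threshold
        if not mask.any():
            return z
        z[mask] = rng.standard_normal(mask.sum())   # resample out-of-range values

# Usage (hypothetical generator): images = generator(truncated_latents(16, 128, 0.5))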
  Image quality: Images generated via these GANs are of far superior quality to those from prior systems, and can be output at relatively large resolutions of 512x512 pixels. I encourage you to take a look at the paper and judge for yourself, but it’s evident from the (cherry-picked) samples that given sufficient patience a determined person can now generate photoreal faked images as long as they have a precise enough set of data from which to train on.
  Problems remain: There are still some drawbacks to the approach; GANs are notorious for instability during training, and developers of such systems need increasingly sophisticated approaches to deal with the instabilities that manifest at larger scales, leading to a certain time-investment tradeoff inherent to the scale-up process. The researchers do devise some tricks to deal with this, but they’re quite elaborate. “We demonstrate that a combination of novel and existing techniques can reduce these instabilities, but complete training stability can only be achieved at a dramatic cost to performance,” they write.
  Why it matters: One of the most interesting aspects of the paper is how simple the approach is: take today’s techniques, try to scale them up, and conduct some targeted research into dealing with some of the rough edges of the problem space. This seems analogous to recent work on scaling up algorithms in RL, where both DeepMind and OpenAI have developed increasingly large-scale training methodologies paired with simple scaled-up algorithms (eg DQN, PPO, A2C, etc).
  “We find that current GAN techniques are sufficient to enable scaling to large models and distributed, large-batch training. We find that we can dramatically improve the state of the art and train models up to 512×512 resolution without need for explicit multiscale methods,” the researchers write.
  Read more: Large Scale GAN Training For High Fidelity Natural Image Synthesis (ICLR 2018 submissions, OpenReview).
  Check out the samples: Memo Akten has pulled together a bunch of interesting and/or weird samples from the model here, which are worth checking out (Memo Akten, Twitter).

Want better RL performance? Try remembering what you’ve been doing recently:
…Recurrent Replay Distributed DQN (R2D2) obtains state-of-the-art on Atari & DMLab by a wide margin…
R2D2 is based on a tweaked version of Ape-X, a large-scale reinforcement learning system developed by DeepMind that displays good performance and sample efficiency when trained at large scale. Ape-X uses prioritized distributed replay, with a single learner learning from the experience of numerous distinct actors (typically 256).
  New tricks for old algos: The researchers implement two relatively simple strategies to help them train the R2D2 algorithm to be smarter about how it uses its memory to learn more complex problem-solving strategies. These tweaks are to store the recurrent state in the replay buffer and use it to initialize the network at training time, and “allow the network a ‘burn-in period’ by using a portion of the replay sequence only for unrolling the network and producing a start state, and update the network only on the remaining part of the sequence.”
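  Sketch (illustrative): Below is a minimal PyTorch sketch of those two tricks for a generic recurrent Q-network; the shapes, names, and surrounding training loop are assumptions for exposition, not the authors’ implementation.
# Stored-state + burn-in replay for a recurrent Q-network: restore the recurrent
# state saved in the replay buffer, unroll a "burn-in" prefix without gradients
# to refresh that state, then compute Q-values (and losses) only on the rest.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
q_head = nn.Linear(64, 4)                      # Q-values for 4 actions

def r2d2_forward(seq, stored_state, burn_in=40):
    """seq: (batch, time, features) replayed sequence; stored_state: (h, c) from the buffer."""
    burn, train = seq[:, :burn_in], seq[:, burn_in:]
    with torch.no_grad():                      # burn-in only produces a start state
        _, state = lstm(burn, stored_state)
    out, _ = lstm(train, state)                # update only on the remaining part of the sequence
    return q_head(out)

batch, time, feat = 8, 80, 32
stored = (torch.zeros(1, batch, 64), torch.zeros(1, batch, 64))
print(r2d2_forward(torch.randn(batch, time, feat), stored).shape)  # torch.Size([8, 40, 4])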
  Results: R2D2 obtains vastly higher scores than any prior system on these tasks, and, via large-scale training, achieves a ~1300% human-normalized score on Atari (a median over 57 games, so it does even better on some, and substantially worse on others). In tests on DMLab-30, a set of 3D environments designed to be more difficult than Atari, the system also displays extremely good performance compared to prior systems.
  It’s all in the memory: The system does well here on some fairly difficult environments, and notably the authors show via some ablation studies that the agent does appear to be using its in-built memory to solve tasks. “We first observe that restricting the agent’s memory gradually decreases its performance, indicating its nontrivial use of memory on both domains. Crucially, while the agent trained with stored state shows higher performance when using the full history, its performance decays much more rapidly than for the agent trained with zero start states. This is evidence that the zero start state strategy, used in past RNN-based agents with replay, limits the agent’s ability to learn to make use of its memory. While this doesn’t necessarily translate into a performance difference (like in MS.PACMAN), it does so whenever the task requires an effective use of memory (like EMSTM WATERMAZE),” they write.
  Read more: Recurrent Experience Replay In Distributed Reinforcement Learning (ICLR 2018 submissions, OpenReview).

US lawmakers call for national AI strategy and more funding:
…The United States cannot maintain its global leadership in AI absent political leadership from Congress and the Executive Branch…
Lawmakers from the US’s Subcommittee on Information Technology of the House Committee on Oversight and Government Reform have called for the creation of a national strategy for artificial intelligence led by the current administration, as well as more funding for basic research.
  The comments from Chairman Will Hurd and Ranking Member Robin Kelly are the result of a series of three hearings held by that committee in 2018 (Note: I testified at one of them). It’s a short paper and worth reading in full to get a sense of what policymakers are thinking with regard to AI.
  Notable quotes: “The United States cannot maintain its global leadership in AI absent political leadership from Congress and the Executive Branch.” + Government should “increase federal spending on research and development to maintain American leadership with respect to AI” + “It is critical the federal government build upon, and increase, its capacity to understand, develop, and manage the risks associated with this technology’s increased use” + “American competitiveness in AI will be critical to ensuring the United States does not lose any decisive cybersecurity advantage to other nationstates”.
  China: China looms large in the report as a symbol that “the United States’ leadership in AI is no longer guaranteed”. One analysis contained within the paper says China is likely “to pass the United States in R&D investments” by the end of 2018 – significant, considering that the US’s annual outlay of approximately $500 billion makes it the biggest spender on the planet.
  Measurement: The report suggests that “at minimum” the government should develop “a widely agreed upon standard for measuring the safety and security of AI products and applications” and notes the existence of initiatives like The AI Index as good starts.
  Money: “There is a need for increased funding for R&D at agencies like the National Science Foundation, National Institutes of Health, Defense Advanced Research Project Agency, Intelligence Advanced Research Project Agency, National Institute of Standards and Technology, Department of Homeland Security, and National Aeronautics and Space Administration. As such, the Subcommittee recommends the federal government provide for a steady increase in federal R&D spending. An additional benefit of increased funding is being able to support more graduate students, which could serve to expand the future workforce in AI.”
  Leadership: “There is also a pressing need for conscious, direct, and spirited leadership from the Trump Administration. The 2016 reports put out by the Obama Administration’s National Science and Technology Council and the recent actions of the Trump Administration are steps in the right direction. However, given the actions taken by other countries—especially China— Congress and the Administration will need to increase the time, attention, and level of resources the federal government devotes to AI research and development, as well as push for agencies to further build their capacities for adapting to advanced technologies.”
  Read more: Rise of the Machines: Artificial Intelligence and its Growing Impact on US Policy (Homeland Security Digital Library).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Open Philanthropy Project opens applications for AI Fellows:
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is accepting applications for its 2019 AI Fellows Program. The program will provide full PhD funding for AI/ML researchers focused on the long-term impacts of advanced AI systems. The first cohort of AI Fellows was announced in June of this year.
  Key details: “Support will include a $40,000 per year stipend, payment of tuition and fees, and an additional $10,000 in annual support for travel, equipment, and other research expenses. Fellows will be funded from Fall 2019 through the end of the 5th year of their PhD, with the possibility of renewal for subsequent years. We do encourage applications from 5th-year students, who will be supported on a year-by-year basis.”
  Read more: Open Philanthropy Project AI Fellows Program (Open Phil).
  Read more: Announcing the 2018 AI Fellows (Open Phil).

Google confirms Project Dragonfly in Senate:
Google have confirmed the existence of Project Dragonfly, an initiative to build a censored search engine within China, as part of Google’s broad overture towards the world’s second largest economy. Google’s chief privacy officer declined to give any details of the project, and denied the company was close to launching a search engine in the country. A former senior research scientist, who publicly resigned over Dragonfly earlier this month, had written to Senators ahead of the hearings, outlining his concerns with the plans.
  Why it matters: Google is increasingly fighting a battle on two fronts with regards to Dragonfly, with critics concerned about the company’s complicity in censorship and human rights abuses, and others suspicious of Google’s willingness to cooperate with the Chinese government so soon after pulling out of a US defense project (Maven).
  Read more: Google confirms Dragonfly in Senate hearing (VentureBeat).
  Read more: Former Google scientist slams ‘unethical’ Chinese search project in letter to senators (The Verge).

DeepMind releases framework for AI safety research:
…AI company also launches new AI safety blog…
DeepMind’s safety team have launched their new blog with a research agenda for technical AI safety research. They divide the field into three areas: specification, robustness, and assurance.
  Specification research is aimed at ensuring an AI system’s behavior aligns with the intentions of its operator. This includes research into how AI systems can infer human preferences, and how to avoid problems of reward hacking and wire-heading.
  Robustness research is aimed at ensuring a system is robust to changes in its environment. This includes designing systems that can safely explore new environments and withstand adversarial inputs.
  Assurance research is aimed at ensuring we can understand and control AI systems during operation. This includes research into the interpretability of algorithms, and the design of systems that can be safely interrupted (e.g. off-switches for advanced AI systems).
  Why it matters: This is a useful taxonomy of research directions that will hopefully contribute to a better understanding of problems in AI safety within the AI/ML community. DeepMind has been an important advocate for safety research since its inception. It is important to remember that AI safety is still dwarfed by AI capabilities research by several orders of magnitude, in terms of both funding and number of researchers.
  Read more: Building Safe Artificial Intelligence (DeepMind via Medium).

OpenAI Bits & Pieces:

OpenAI takes on Dota 2: Short Vice documentary:
As part of our Dota project we experimented with new forms of comms, including having a doc crew from Vice film us in the run-up to our competition at The International.
  Check out the documentary here: This Robot is Beating the World’s Best Video Gamers (Vice).

Tech Tales:

They call the new drones shepherds. We call them prison guards. The truth is somewhere in-between.

You can do the math yourself. Take a population. Get the birth rate. Project over time. That’s the calculus the politicians did that led to them funding what they called the ‘Freedom Research Initiative to Eliminate Negativity with Drones’ (FRIEND).

FRIEND provided scientists with a gigantic bucket of money to fund research into creating more adaptable drones that could, as one grant document stated, ‘interface in a reassuring manner with ageing citizens’. The first FRIEND drones were like pet parrots, and they were deployed into old people’s homes in the hundreds of thousands. Suddenly, when you went for a walk outside, you were accompanied by a personal FRIEND-Shepherd which would quiz you about the things around you to stave off age-based neurological decline. And when you had your meals there was now a drone hovering above you, scanning your plate, and cheerily exclaiming “that’s enough calories for today!” when it had judged you’d eaten enough.

Of course we did not have to do what the FRIEND-Shepherds told us to do. But many people did and for those of us who had distaste for the drones, peer pressure did the rest. I tell myself that I am merely pretending to do what my FRIEND-Shepherd says, as it takes me on my daily walk and suggests the addition or removal of specific ingredients from my daily salad to ‘maintain optimum productivity via effective meal balancing’.

Anyway, as the FRIEND program continued the new Shepherds became more and more advanced. But people kept on getting older and birth rates kept on falling; the government couldn’t afford to buy more drones to keep up with the growing masses of old people, so it directed FRIEND resources towards increasing the autonomy and, later, ‘persuasiveness’ of such systems.

Over the course of a decade the drones went from parrots to pop psychologists with a penchant for nudge economics. Now, we’re still not “forced” to do anything by the Shepherds, but the Shepherds are very intelligent and much of what they spend their time doing is finding out what makes us tick so they can encourage us to do the thing that extends lifespan while preserving quality of life.

The Shepherd assigned to me and my friends has figured out that I don’t like Shepherds. It has started to learn to insult me, so that I chase it. Sometimes it makes me so angry that I run around the home, trying to knock it out of the air with my walking stick. “Well done,” it will say after I am out of breath. “Five miles, not bad for a useless human.” Sometimes I will then run at it again, and I believe I truly am running at it because I hate it and not because it wants me to. But do I care about the difference? I’m not sure anymore.

Things that inspired this story: Drones, elderly care robots, the cruel and inescapable effects of declining fertility in developed economies, JG Ballard, Wall-E, social networks, emotion-based AI analysis systems, NLP engines, fleet learning with individual fine-tuning.