Import AI

Import AI 119: How to benefit AI research in Africa; German politician calls for billions in spending to prevent country being left behind; and using deep learning to spot thefts

by Jack Clark

African AI researchers want better code-switching support and maps to accelerate research:
…The research needs of people in Eastern Africa tell us about some of the ways in which AI development will differ in that part of the world…
Shopping lists contain a lot of information about a person, and I suspect the same might be true of scientific shopping lists that come from a particular part of the world. For that reason a paper from Caltech which outlines requests for machine learning research from members of the East African Tech Scene gives us better context when thinking about the global impact of AI.
  Research needs: Some of the requests include:

  • Support for code-switching within language models; many East Africans rapidly code-switch (move between multiple languages during the same sentence) making support for multiple languages within the same model important.
  • Named Entity Recognition with multiple-use words; many English words are used as names in East Africa, eg “Hope, Wednesday, Silver, Editor”, so it’s important to be able to learn to disambiguate them.
  • Working with contextual cues; many locations in Africa don’t have standard addressing schemes so directions are contextual (eg, my house is the yellow one two miles from the town center) and this is combined with numerous misspellings in written text, so models will need to be able to fuse multiple distinct bits of information to make inferences about things like addresses.
  • Creating new maps in response to updated satellite imagery to help augment coverage of the East African region, accompanied by the deliberate collection of frequent ground-level imagery of the area to account for changing businesses, etc.
  • Due to poor internet infrastructure, spotty cellular service, and the fact “electrical power for devices is scarce”, one of the main types of request is for more efficient systems, such as models designed to run on low-powered devices, and for ways to add adaptive learning to surveying processes so that researchers can integrate new data on-the-fly to make up for its sparsity.
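One ingredient of the contextual-address problem above — matching misspelled place names — can be sketched with a simple edit-distance matcher. This is a toy illustration, not a system from the paper; the place list and function names are mine. Real systems would fuse this with other cues (landmarks, distances), since fuzzy string matching handles only the misspelling part.

```python
from difflib import get_close_matches

# Hypothetical gazetteer of known place names.
KNOWN_PLACES = ["Kampala", "Nairobi", "Dar es Salaam", "Mombasa"]

def match_place(mention, places=KNOWN_PLACES):
    """Map a possibly misspelled place mention to a known place name.

    Uses difflib's similarity ratio; returns None if nothing is close
    enough (cutoff of 0.6 is an arbitrary choice for this sketch).
    """
    hits = get_close_matches(mention, places, n=1, cutoff=0.6)
    return hits[0] if hits else None
```

For example, `match_place("Nairobbi")` resolves to `"Nairobi"`, while an unrecognizable string returns `None`.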

  Reinforcement learning, what reinforcement learning? “No interviewee reported using any reinforcement learning methods”.
  Why it matters: AI is going to be developed and deployed globally, so becoming more sensitive to the specific needs and interests of parts of the world underrepresented in machine learning should further strengthen the AI research community. It’s also a valuable reminder that many problems which don’t generate much media coverage are where the real work is needed (for instance, supporting code-switching in language models).
  Read more: Some Requests for Machine Learning Research from the East African Tech Scene (Arxiv).

DeepMap nets $60 million for self-driving car maps:
…Mapping startup raises money to sell picks and shovels for another resource grab…
A team of mapmakers who previously worked on self-driving-related efforts at Google, Apple, and Baidu, have raised $60 million for DeepMap, in a Series B round. One notable VC participant: Generation Investment Management, a VC firm which includes former vice president Al Gore as a founder. “DeepMap and Generation share the deeply-held belief that autonomous vehicles will lead to environmental and social benefits,” said DeepMap’s CEO, James Wu, in a statement.
  Why it matters: If self-driving cars are, at least initially, not winner-take-all-markets, then there’s significant money to be made for companies able to create and sell technology which enables new entrants into the market. Funding for companies like DeepMap is a sign that VCs think such a market could exist, suggesting that self-driving cars continue to be a competitive market for new entrants.
  Read more: DeepMap, a maker of HD maps for self-driving cars, raised at least $60 million at a $450 million valuation (Techcrunch).

Spotting thefts and suspicious objects with machine learning:
…Applying deep learning to lost object detection: promising, but not yet practical…
New research from the University of Twente, Leibniz University, and Zhejiang University shows both the possibility and limitations of today’s deep learning techniques applied to surveillance. The researchers attempt to train AI systems to detect abandoned objects in public places (eg, offices) and try to work out if these objects have been abandoned, moved by someone who isn’t the owner, or are being stolen.
  How it works: The system takes in video footage and compares it against a continuously learned ‘background model’ so it can identify new objects in a scene as they appear, while automatically tagging these objects with one of three potential states: “if a object presents in the long-term foreground but not in the short-term foreground, it is static. If it presents in both foreground masks, it is moving. If an object has ever presented in the foregrounds but disappears from both of the foregrounds later, it means that it is in static for a very long time.” The system then links these objects with human owners by identifying the people who spend the most time with them, and tracks these people while trying to guess whether the object is being abandoned, has been temporarily left by its owner, or is being stolen.
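The two-timescale foreground rule quoted above can be sketched as a small decision function (a minimal sketch; the function and state names are mine, not the paper's):

```python
def object_state(in_long_term: bool, in_short_term: bool, seen_before: bool) -> str:
    """Classify an object using the paper's two-timescale foreground rule.

    - present in the long-term foreground only -> static
    - present in both foregrounds              -> moving
    - seen before, now absent from both        -> static for a very long time
    """
    if in_long_term and not in_short_term:
        return "static"
    if in_long_term and in_short_term:
        return "moving"
    if seen_before and not in_long_term and not in_short_term:
        return "long-term static"
    return "unknown"
```

In the full system these states would be computed per frame and combined with owner tracking to decide between abandonment and theft.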
  Results: They evaluate the system on the PETS2006 benchmark, as well as on the more challenging new SERD dataset which is composed of videos taken from four different scenes of college campuses. The model outlined in the paper gets top scores on PETS2006, but does poorly on the more modern SERD dataset, obtaining accuracies of 50% when assessing if an object is moved by a non-owner, though it does better at detecting objects being stolen or being abandoned. “The algorithm for object detection cannot provide satisfied performance,” they write. “Sometimes it detects objects which don’t exist and cannot detect the objects of interest precisely. A better object detection method would boost the framework’s performance.”  More research will be necessary to develop models that excel here, or potentially to improve performance via accessing large datasets to use during pre-training.
  Why it matters: Papers like this highlight the sorts of environments in which deep learning techniques are likely to be deployed, though also suggest that today’s models are still insufficient for some real-world use cases (my suspicion here is that if the SERD dataset were substantially larger we might have seen performance increase further).
  Read more: Security Event Recognition for Visual Surveillance (Arxiv).

Facebook uses modified DQN to improve notification sending on FB:
…Here’s another real-world use case for reinforcement learning…
I’ve recently noticed an increase in the number of Facebook recommendations I receive and a related rise in the number of time-relevant suggestions for things like events and parties. Now, research published by Facebook indicates why that might be: the company has recently used an AI platform called ‘Horizon’ to improve and automate aspects of how it uses notifications to tempt people to use its platform.
  Horizon is an internal software platform that Facebook uses to deploy AI onto real-world systems. Horizon’s job is to let people train and validate reinforcement learning models at Facebook, analyze their performance, and run them at large scale. Horizon also includes a feature called Counterfactual Policy Evaluation, which makes it possible to evaluate the estimated performance of models before deploying them into production. Horizon also incorporates implementations of the following algorithms: Discrete DQN, Parametric DQN, and DDPG (which is sometimes used for tuning hyperparameters within other domains).
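Counterfactual policy evaluation comes in several flavors; one of the simplest, inverse propensity scoring, can be sketched as follows (a toy illustration of the general idea — Horizon's actual estimators may differ, and the data format here is my invention):

```python
def ips_estimate(logged, target_policy):
    """Estimate a new policy's value from logged data via inverse
    propensity scoring, without deploying the new policy.

    `logged` is a list of (state, action, reward, logging_prob) tuples,
    where logging_prob is the probability the *old* policy assigned to the
    logged action. `target_policy(state, action)` returns the *new*
    policy's probability of taking that action.
    """
    total = 0.0
    for state, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = target_policy(state, action) / logging_prob
        total += weight * reward
    return total / len(logged)
```

For instance, if the logging policy sent notifications half the time and a candidate policy always sends, the estimator upweights the logged 'send' outcomes to predict the candidate's value before it ever touches production.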
  Scale: “Horizon has functionality to conduct training on many GPUs distributed over numerous machines… even for problems with very high dimensional feature sets (hundreds or thousands of features) and millions of training examples, we are able to learn models in a few hours”, they write.
  RL! What is it good for? Facebook says it recently moved from a supervised learning model that predicted click-through rates on notifications, to “a new policy that uses Horizon to train a Discrete-Action DQN model for sending push notifications”. This system tailors the selection and sending of notifications to individual users based on their implicit preferences, expressed by their interaction with the notifications and learned via incremental RL updates. “We observed a significant improvement in activity and meaningful interactions by deploying an RL based policy for certain types of notifications, replacing the previous system based on supervised learning”, Facebook writes. They also conducted a similar experiment based on giving notifications to administrators of Facebook pages. “After deploying the DQN model, we were able to improve daily, weekly, and monthly metrics without sacrificing notification quality,” they write.
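Facebook's production system is a neural-net Discrete-Action DQN trained at scale; as a much smaller stand-in, the send/drop decision can be illustrated with a tabular Q-learner (everything here — the states, rewards, and hyperparameters — is my invention for illustration):

```python
import random

ACTIONS = ["send", "drop"]

class NotificationQLearner:
    """Toy tabular Q-learner deciding whether to send a notification.

    A stand-in for the Discrete-Action DQN described in the paper: the
    state is a coarse bucket of user engagement, and the reward is the
    observed interaction with the notification.
    """
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy: mostly exploit the best-known action.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)
```

Trained incrementally on users whose interactions reward 'send', such a learner converges on sending to engaged users — the same implicit-preference tailoring Facebook describes, minus the neural network and the scale.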
  Why it matters: This is an example of how a relatively simple RL system (Discrete DQN) can yield significant gains against hard-to-specify business metrics (eg, “meaningful interactions”). It also shows how large web platforms can use AI to iteratively improve their ability to target individual users while increasing their ability to predict user behavior and preferences over longer time horizons – think of it as a sort of ever-increasing ‘data&compute dividend’.
  Read more: Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform (Facebook Research).

German politician calls for billions of dollars for national AI strategy:
…If Germany doesn’t invest boldly enough, it risks falling behind…
Lars Klingbeil, general secretary of the Social Democratic Party in Germany, has called for the country to invest significantly in its own AI efforts. “We need a concrete investment strategy for AI that is backed by a sum in the billions,” wrote Klingbeil in an article for Tagesspiegel. “We have to stop taking it easy”.
  Why it matters: AI has quickly taken on a huge amount of symbolic political power, with politicians typically treating success in AI as being a direct sign of the competitiveness of a country’s technology industry; comments like this from the SPD reinforce that image, and are likely to incentivize other politicians to talk about it in a similar way, further elevating the role AI plays in the discourse.
  Read more: Germany needs to commit billions to artificial intelligence: SPD (Reuters).

Faking faces for fun with AI:
…”If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”…
One of the continued open questions around the weaponization of fake imagery is how easy such fakery needs to become before it is economically sensible for people to weaponize the technology (eg, by making faked images of politicians in specific politically-sensitive situations). New work by an independent researcher gives us an indication of where things stand today. The good news: it’s still way too hard to do for us to worry about many actors abusing the technology. The bad news: all of this stuff is getting cheaper to build and easier to operate over time.
  How it works: Shaobo Guan’s research shows how to build a conditional image generation system. The way this works is you can ask your computer to synthesize a random face for you, then you can tweak a bunch of dials to let you change latent variables from which the image is composed, allowing you to manipulate, for instance, the spacing apart of a “person’s” eyes, the coloring of their hair, the size of their sideburns, whether they are wearing glasses, and so on. Think of this as like a combination of an etch-a-sketch, a Police facial composite machine, and an insanely powerful Photoshop filter.
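The ‘dials’ described above are typically implemented as directions in the generator's latent space: turning a dial moves the latent vector along an attribute direction before it is decoded into an image. A minimal sketch of that manipulation (the generator itself is omitted, and all names here are mine):

```python
import math

def edit_latent(z, direction, strength):
    """Nudge a latent vector z along a learned attribute direction.

    In a conditional face generator, `direction` would be a vector in
    latent space associated with an attribute (eg 'glasses' or 'hair
    color'), and `strength` is the dial the user turns. The direction is
    normalized so `strength` has a consistent meaning across attributes.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    return [zi + strength * di / norm for zi, di in zip(z, direction)]
```

Feeding the edited vector back through the generator yields the same “person” with the chosen attribute amplified or suppressed.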
  “A word about ethics”: The blog post is notable for its inclusion of a section that specifically considers the ethical aspects of this work in two ways: 1) because the underlying dataset for the generative tool is limited then if such a tool were put into production it wouldn’t be very representative; 2) “If we can generate realistic looking faces of any type, what are the implications for our ability to trust in what we see”? It’s encouraging to see these acknowledgements in a work like this.
  Why it matters: Posts like this give us a valuable point-in-time sense of what a motivated researcher is able to build with relatively small amounts of resources (the project was done over three weeks as part of an Insight Data Science AI fellowship program). They also help us understand the general difficulties people face when working with generative models.
  Read more: Generating custom photo-realistic faces using AI (Insight Data Science).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

EU AI ethics chief urges caution on regulation:
The chairman of the EU’s new expert body on AI, Pekka Ala-Pietilä, has cautioned against premature regulation, arguing Europe should be focussed now on developing “broad horizontal principles” for ethical uses of AI. He foresees regulations on AI as taking shape as the technology is deployed, and as courts react to emergent issues, rather than ex ante. The high-level expert group on AI plans to produce a set of draft ethical principles in March, followed by a policy and investment strategy.
  Why this matters: This provides some initial indications of Europe’s AI strategy, which appears to be focussed partly on establishing leadership in the ethics of AI. The potential risks from premature and ill-judged interventions in such a fast-moving field seem high. This cautious attitude is probably a good thing, particularly given Europe’s proclivity towards regulation. Nonetheless, policy-makers should be prepared to react swiftly to emergent issues.
  (Note from Jack: It also fits a pattern common in Europe of trying to regulate for the effects of technologies developed elsewhere – for example, GDPR was in many ways an attempt to craft rules to apply controls to non-European mega platforms like Google and Facebook).
  Read more: Europe’s AI ethics chief: No rules yet, please.

Microsoft will bid on Pentagon AI contract:
Microsoft has reaffirmed its intention to pursue a major contract with the US Department of Defense. The company’s bid on the $10bn cloud-computing project, codenamed JEDI, had prompted some protest from employees. In a blog post, the company said it would “engage proactively” in the discussion around laws and policies to ensure AI is used ethically, and argued that to withdraw from the market (for example, for US military contracts) would reduce the opportunity to engage in these debates in the future. Google withdrew its JEDI bid earlier this year, after significant backlash from employees (though the real reason for the pullout could be that Google lacked the government-required data security certifications necessary to field a competitive bid).
  Read more: Technology and the US military (Microsoft).
  Read more: Microsoft Will Sell Pentagon AI (NYT).

Assumptions in ML approaches to AI safety:
Most of the recent growth in AI safety has been in ML-based approaches, which look at safety problems in relation to current, ML-based, systems. The usefulness of this work will depend strongly on the type of advanced AI systems we end up with, writes DeepMind AI safety researcher Victoria Krakovna.
  Consider the transition from horse-carts to cars. Some of the important interventions in horse-cart safety, such as designing roads to avoid collisions, scaled up to cars. Others, like systems to dispose of horse-waste, did not. Equally, there are issues in car safety, e.g. air pollution, that someone thinking about horse-cart safety could not have foreseen. In the case of ML safety, we should ask what assumptions we are making about future AI systems, how much we are relying on them, and how likely they are to hold up. The post outlines the author’s opinions on a few of these key assumptions.
  Read more: ML approach to AI safety (Victoria Krakovna).

Baidu joins Partnership on AI:
Chinese tech giant Baidu has become the first Chinese member of the Partnership on AI. The Partnership is a consortium of AI leaders, which includes all the major US players, focussed on developing ethical best practices in AI.
  Read more: Introducing Our First Chinese Member (Partnership on AI).

Tech Tales:

Generative Adversarial Comedy (CAN!)

[2029: The LinePunch, a “robot comedy club” started 2022 in the South Eastern corner of The Muddy Charles, a pub tucked inside a building near the MIT Media Lab in Boston, Massachusetts]

Two robot comedians are standing on stage at The LinePunch and, as usual, they’re bombing.

“My Face has no nose, how does it smell?” says one of the robots. Then it looks at the crowd, pauses for two seconds, and says: “It smells using its face!”
  The robot opens its hands, as though beckoning for applause.
  “You suck!” jeers one of the humans.
  “Give them a chance,” says someone else.
  The robot that had told the nose joke bows its head and hands the microphone to the robot standing next to it.
  “OK, ladies and germ-till-men,” says the second robot, “why did the Chicken move across the road?”
  “To get uploaded into the matrix!” says one of the spectating humans.
  “Ha-Ha!” says the robot. “That is incorrect. The correct answer is: to follow its friend.”
  A couple of people in the audience chuckle.
  “Warm crowd!” says the robot. “Great joke next joke: three robots walk into a bar. The barman says ‘Get out, you need to come in sequentially!’”
  “Boo,” says one of the humans in the audience.
  The robot tilts its head, as though listening, then prepares to tell another joke…

The above scene will happen on the third Tuesday of every month for as long as MIT lets its students run The LinePunch. I’d like to tell you the jokes have gotten better since its founding, but in truth they’ve only gotten stranger. That’s because robots that tell jokes which seem like human jokes aren’t funny (in fact, they freak people out!), so what the bots end up doing at the LinePunch is a kind of performative robot theater, where the jokes are deliberately different to those a human would tell – learned via a complex array of inverted feature maps – but funny to the humans nonetheless – learned via human feedback techniques. One day I’m sure the robots will learn to tell jokes to amuse each other as well.

Things that inspired this story: Drinks in The Muddy Charles @ MIT; synthetic text generation techniques; recurrent neural networks; GANs; performance art; jokes; learning from human preferences.

Import AI 118: AirBnB splices neural net into its search engine; simulating robots that touch with UnrealROX; and how long it takes to build a quadcopter from scratch

by Jack Clark

Building a quadcopter from scratch in ten weeks:
…Modeling the drone ecosystem by what it takes to build one…
The University of California at San Diego recently ran a course where students got the chance to design, build, and program their own drones. A writeup of the course outlines how it is structured and gives us a sense of what it takes to build a drone today.
  Four easy pieces: The course breaks building the drones into four phases: designing the PCB, implementing the flight control software, assembling the PCB, and getting the quadcopter flying. Each of these phases has numerous discrete steps which are detailed in the report. One of the nice things about the curriculum is the focus on the cost of errors: “Students ‘pay’ for design reviews (by course staff or QuadLint) with points deduced from their lab grade,” they write. “This incentivizes them to find and fix problems themselves by inspection rather than relying on QuadLint or the staff”.
  The surprising difficulty of drone software: Building the flight controller software proves to be one of the most challenging parts of the course, because bugs can have numerous potential causes, making root cause analysis difficult.
  Teaching tools: While developing the course the instructors noticed that they were spending a lot of time checking and evaluating PCB designs for correctness, so they designed their own program called ‘QuadLint’ to try to auto-analyze and grade these submissions. “QuadLint is, we believe, the first autograder that checks specific design requirements for PCB designs,” they write.
  Costs: The report includes some interesting details on the cost of these low-powered drones, with the quadcopter itself costing about $35 per PCB plus $40 for the components. Currently, the most expensive component of the course is the remote ($150) and for the next course the teachers are evaluating cheaper options.
  Small scale: The quadcopters all use a PCB to host their electronics and serve as an airframe. They measure less than 10 cm on a side and are suitable for flight indoors over short distances. “The motors are moderately powerful, “brushed” electric motors powered by a small lithium-polymer (LiPo) battery, and we use small, plastic propellers. The quadcopters are easy to operate safely, and a blow from the propeller at full speed is painful but not particularly dangerous. Students wear eye protection around their flying quadcopters.”
  Why it matters: The paper notes that the ‘killer apps’ of the future “will lie at the intersection of hardware, software, sensing, robotics, and/or wireless communications”. This seems true – especially when we look at the chance for major uptake from the success of companies like DJI and the possibility of unit economics driving the price down. Therefore, tracking and measuring the cost and ease with which people can build and assemble drones out of (hard to track, commodity) components gives us better intuitions about this aspect of drones+security. While the hardware and software are under-powered and somewhat pricey today, they won’t stay that way for long.
  Read more: Trial by Flyer: Building Quadcopters From Scratch in a Ten-Week Capstone Course (Arxiv).

Amazon tries to make Alexa smarter via richer conversational data:
…Who needs AI breakthroughs when you’ve got a BiLSTM, lots of data, and patience?…
Amazon researchers are trying to give personal assistants like Alexa the ability to have long-term conversations about specific topics. The (rather unsurprising) finding they make in a new research paper is that you can “extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features”, yielding personal assistants capable of longer and more coherent conversations than their forebears – if you can afford to annotate the data.
  Data used: The researchers used data collected during the 2017 ‘Alexa Prize’ competition, which consists of over 100,000 utterances containing interactions between users and chatbots. They augmented this data by classifying the topic for each utterance into one of 12 categories (eg: politics, fashion, science & technology, etc), and also trying to classify the goal of the user or chatbot (eg: clarification, information request, topic switch, etc). They also asked other annotators to rank every single chatbot response with metrics relating to how comprehensible it was, how relevant the response was, how interesting it was, and whether a user might want to continue the conversation with the bot.
  Baselines and BiLSTMs: The researchers implement two baselines (DAN, based on a bag-of-words neural model; ADAN, which is DAN extended with attention), and then develop two versions of a bidirectional LSTM (BiLSTM) system, where one uses context from the annotated dataset and the other doesn’t. They then evaluate all these methods by testing their baselines (which contain only the current utterance) against systems which incorporate context, systems which incorporate data, and systems which incorporate both context and data. The results show that a BiLSTM fed with context in sequence does almost twice as well as a baseline ADAN system that uses context and dialog, and almost 25% better than a DAN fed with both context and dialog.
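The general recipe – fold prior utterances and dialog-act labels into the classifier's input – can be sketched like this (the exact featurization here is my invention, not the paper's; the marker tokens and function names are hypothetical):

```python
def build_contextual_input(history, current, dialog_act, max_context=3):
    """Assemble a single topic-classifier input from conversational context.

    Prepends the last `max_context` utterances to the current one,
    separated by end-of-utterance markers, and appends the dialog act as
    an extra categorical token. A downstream model (eg a BiLSTM) would
    consume this token sequence.
    """
    tokens = []
    for utt in history[-max_context:]:
        tokens.extend(utt.split())
        tokens.append("<EOU>")  # end-of-utterance marker
    tokens.extend(current.split())
    tokens.append(f"<ACT={dialog_act}>")
    return tokens
```

The point of the paper's comparison is that feeding a sequence model this kind of enriched input, rather than the bare current utterance, is what roughly doubles topic-classification performance.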
  Why it matters: The results indicate that – if a developer can afford the labeling cost – it’s possible to augment language interaction datasets with additional information about context and topic to create more powerful systems, which seems to imply that in the language space we can expect to see large companies invest in teams of people to not just transcribe and label text at a basic level, but also perform more elaborate meta-classifications as well. The industrialization of deep learning continues!
  Read more: Contextual Topic Modeling For Dialog Systems (Arxiv).

Why AI won’t be efficiently solving a 2D gridworld quest soon:
…Want humans to be able to train AIs? The key is curriculum learning and interactive learning, say BabyAI’s creators…
Researchers with the Montreal Institute for Learning Algorithms (MILA) have designed a free tool called BabyAI to let them test AI systems’ ability to learn generalizable skills from curriculums of tasks set in an efficient 2D gridworld environment – and the results show that today’s AI algorithms display poor data efficiency and generalization at this sort of task.
  Data efficiency: BabyAI uses gridworlds for its environment, which the researchers have implemented efficiently enough that researchers can use the platform without needing access to vast pools of compute; the BabyAI environments can be run at up to 3,000 frames per second “on a modern multi-core laptop” and can also be integrated with OpenAI Gym.
  A specific language: BabyAI uses “a comparatively small yet combinatorially rich subset of English” called Baby Language. This is meant to help researchers write increasingly sophisticated strings of instructions for agents, while keeping the state space from exploding too quickly.
  Levels as a curriculum: BabyAI ships with 19 levels which increase in difficulty of both the environment, and the complexity of the language required to solve it. The levels test each agent on a variety of 13 different competencies, ranging from things like being able to unlock doors, navigating to locations, ignoring distractors placed into the environment, navigating mazes, and so on. The researchers also design a bot which can solve any of the levels using a variety of heuristics – this bot serves as a baseline against which to train a model.
  So, are today’s AI techniques sophisticated enough to solve BabyAI? The researchers train an imitation learning-based baseline for each level and assess how well it does. The systems are able to learn to perform basic tasks, but struggle to imitate the expert at tasks that require multiple actions to solve. One of the most intriguing parts of the paper is the analysis of the relative efficiency of systems trained via both imitation and pure reinforcement learning, which shows that today’s algorithms are wildly inefficient at learning pretty much anything: simple tasks like learning to go to a red ball hidden within a map take 40,000-60,000 demos when using imitation learning, and around 453,000 to 470,000 episodes when learning via reinforcement learning without an expert teacher to mimic. The researchers also show that pre-training (where you learn on other tasks before attempting certain levels) does not yield particularly impressive performance, giving at most a 3X speedup.
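The flavor of these tasks can be seen in a miniature, pure-Python sketch of a ‘go to the red ball’ gridworld (the real BabyAI levels add Baby Language missions, distractors, and partial observability; everything below is my simplification, including the heuristic bot):

```python
class GoToRedBall:
    """A miniature BabyAI-style task: navigate to a target cell in a grid."""
    def __init__(self, size=6, agent=(0, 0), ball=(4, 3)):
        self.size, self.pos, self.ball = size, agent, ball

    def step(self, action):
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        # Clamp movement to the grid boundaries.
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == self.ball
        return self.pos, (1.0 if done else 0.0), done

def greedy_policy(pos, ball):
    """Heuristic 'bot' action, in the spirit of BabyAI's baseline expert."""
    if pos[0] != ball[0]:
        return "right" if ball[0] > pos[0] else "left"
    return "down" if ball[1] > pos[1] else "up"
```

A heuristic bot solves this in a handful of steps; the paper's striking result is that learned agents need tens of thousands of demonstrations (or hundreds of thousands of RL episodes) to match that on the equivalent BabyAI level.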
  Why it matters: Platforms like BabyAI give AI researchers fast, efficient tools to use when tackling hard research projects, while also highlighting the deficiency of many of today’s algorithms. The transfer learning results “suggest that current imitation learning and reinforcement learning methods scale and generalize poorly when it comes to learning tasks with a compositional structure,” they write. “An obvious direction of future research to find strategies to improve data efficiency of language learning.”
  Get the code for BabyAI (GitHub).
  Read more: BabyAI: First Steps Towards Grounded Language Learning with a Human In the Loop (Arxiv).

Simulating robots that touch and see in AAA-game quality detail:
…The new question AI researchers will ask: But Can It Simulate Crysis?…
Researchers with the 3D Perception Lab at the University of Alicante have designed UnrealROX, a high-fidelity simulator based on Unreal Engine 4, built for simulating and training AI agents embodied in (simulated) touch-sensitive robots.
  Key ingredients: UnrealROX has the following main ingredients: a simulated grasping system that can be applied to a variety of finger configurations; routines for controlling robotic hands and bodies using commercial VR setups like the Oculus Rift and HTC Vive; a recorder to store full sequences from scenes; and customizable camera locations.
  Drawback: The overall simulator can run at 90 frames-per-second, the researchers note. While this may sound impressive, it’s not particularly useful for most AI research unless you can run it far faster than that (compare this with BabyAI, which runs at 3,000 FPS).
  Simulated robots with simulated hands: UnrealROX ships with support for two robots: a simulated ‘Pepper’ robot from the company Aldebaran, and a spruced-up version of the mannequin that ships with UE4. Both of these robots have been designed with extensible, customizable grasping systems, letting them reach out and interact with the world around them. “The main idea of our grasping subsystem consists in manipulating and interacting with different objects, regardless of their geometry and pose.”
  Simulators, what are they good for? UnrealROX may be of particular interest to researchers that need to create and record very specific sequences of behaviors on robots, or who wish to test the ability to learn useful policies from a relatively small amount of high-fidelity information. But it seems likely that the relative slowness of the simulator will make it difficult to use for most AI research.
  Why it matters: The current proliferation of simulated environments represents a kind of simulation-boom in AI research that will eventually produce a cool historical archive of the many ways in which we might think robots could interact with each other and the world. Whether UnrealROX is used or not, it will contribute to this historical archive.
  Read more: UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation (Arxiv).

AirBnB augments main search engine with neural net, sees significant performance increase:
…The Industrialization of Deep Learning continues…
Researchers with home/apartment-rental service AirBNB have published details on how they transitioned AirBnB’s main listings search engine to a neural network-based system. The paper highlights how deploying AI systems in production is different to deploying AI systems in research. It also sees AirBnB follow Google, which in 2015 augmented its search engine with ‘RankBrain’, a neural network-based system that almost overnight became one of the most significant factors in selecting which search results to display to a user. “This paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs),” the researchers write.
  Motivation: “The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history,” the researchers write. This performance boost eventually plateaued, prompting them to implement neural network-based approaches to improve search further.
  Keep it simple, (& stupid): One of the secrets about AI research is the gulf between frontier research and production use-cases, where researchers tend to prioritize novel approaches that work on small tasks, and industry and/or large-scale operators prioritize simple techniques that scale well. This fact is reflected in this research, where the researchers started work with a single layer neural net model, moved on to a more sophisticated system, then opted for a scale-up solution as their final product. “We were able to deprecate all that complexity by simply scaling the training data 10x and moving to a DNN with 2 hidden layers.”
  Input features: For typical configurations of the network the researchers gave it 195 distinct input ‘features’ to learn about, which included properties of listings like price, amenities, historical booking count; as well as features from other smaller models.
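For intuition, here is a minimal sketch of what such a scoring network could look like. The layer widths, initialization, and random stand-in "features" below are invented for illustration; this is not Airbnb's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 195  # stand-in for the ~195 listing features described above

def init_ranker(n_in=N_FEATURES, h1=128, h2=64):
    """Build a two-hidden-layer scorer; the layer widths are arbitrary."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, h1)), "b1": np.zeros(h1),
        "W2": rng.normal(0, 0.1, (h1, h2)),   "b2": np.zeros(h2),
        "W3": rng.normal(0, 0.1, (h2, 1)),    "b3": np.zeros(1),
    }

def relu(x):
    return np.maximum(x, 0.0)

def score_listings(params, X):
    """Map a batch of listing feature vectors to scalar relevance scores."""
    h = relu(X @ params["W1"] + params["b1"])
    h = relu(h @ params["W2"] + params["b2"])
    return (h @ params["W3"] + params["b3"]).ravel()

params = init_ranker()
listings = rng.normal(size=(10, N_FEATURES))             # 10 candidate listings
ranking = np.argsort(-score_listings(params, listings))  # best first
```

In production, a network like this would be trained on booking outcomes, with the sorted scores used to order the search results shown to a user.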
  Failure: The paper includes a quite comprehensive list of some of the ways in which the Airbnb researchers failed when trying to implement new neural network systems. Many of these failures are due to things like overfitting, or trying to architect too much complexity into certain parts of the system.
  Results: Airbnb doesn’t reveal the specific quantitative performance boost, as this would leak some proprietary commercial information, but does include a couple of graphs that show that the usage of the simple 2-layer neural network leads to a very meaningful relative gain in the number of bookings made using the system, indicating that the neural net-infused search is presenting people with more relevant listings which they are more likely to book. “Overall, this represents one of the most impactful applications of machine learning at Airbnb,” they write.
  Why it matters: Airbnb’s adoption of deep learning for its main search engine further indicates that deep learning is well into its industrialization phase, where large companies adopt the technology and integrate it into their most important products. Every time we get a paper like this the chance of an ‘AI Winter’ decreases, as it creates another highly motivated commercial actor that will continue to invest in AI research and development, regardless of trends in government and/or defence funding.
  Read more: Applying Deep Learning to Airbnb Search (Arxiv).
  Read more: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg News, 2015).

Refining low-quality web data with CurriculumNet:
…AI startup shows how to turn bad data into good data, with a multi-stage weakly supervised training scheme…
Researchers with Chinese computer vision startup Malong have released code and data for CurriculumNet, a technique to train deep neural networks on large amounts of data with variable annotations, collected from the internet. Approaches like this are useful if researchers don’t have access to a large, perfectly labeled dataset for their specific task. But the tradeoff is that the labels on datasets gathered in this way are far noisier than those from hand-built datasets, presenting researchers with the challenge of extracting enough signal from the noise to be able to train a useful network.
  CurriculumNet: The researchers train their system on the WebVision database, which contains over 2,400,000 images with noisy labels. Their approach works by training an Inception_v2 model over the whole dataset, then studying the feature space into which all the images are mapped; CurriculumNet sorts these images into clusters, then splits each cluster into three subsets according to how similar the images in each subset are to each other in feature space, with the intuition being that subsets containing many mutually similar images will be easier to learn from than those which are very diverse. They then train a model over this curriculum, starting with the subsets that have similar image features, then mixing in the noisier subsets. By iteratively learning a classifier from good labels, then adding in noisier ones, the researchers say they are able to increase the generalization of their trained systems.
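The clustering-and-subsetting step can be sketched roughly as follows. This is a simplification: distance-to-centroid stands in for the paper's density-based measure, and the features and labels are random dummies:

```python
import numpy as np

def curriculum_subsets(features, labels, n_subsets=3):
    """Split each noisy-label class into subsets ordered from 'clean' to
    'noisy'. Images close to their class centroid in feature space are
    assumed easiest to learn from; far-away images are assumed noisiest."""
    subsets = [[] for _ in range(n_subsets)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        ordered = idx[np.argsort(dist)]  # closest (cleanest) first
        for k, chunk in enumerate(np.array_split(ordered, n_subsets)):
            subsets[k].extend(chunk.tolist())
    return subsets  # subsets[0] = easiest ... subsets[-1] = noisiest

rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 8))       # dummy CNN features
labels = rng.integers(0, 3, size=60)   # dummy noisy labels
easy, medium, hard = curriculum_subsets(feats, labels)
# Training schedule: stage 1 trains on `easy`; later stages mix in
# `medium` and then `hard`, mirroring the paper's curriculum.
```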
  Testing: They test CurriculumNet on four benchmarks: WebVision, ImageNet, Clothing1M, and Food101. They find that systems trained using the largest amount of noisy data converge to higher accuracies than those trained without, seeing reductions in error of multiple percentage points on WebVision (“these improvements are significant on such a large-scale challenge,” they write). CurriculumNet gets state-of-the-art results for top-1 accuracy on WebVision, with performance increasing even further when they train on more data (such as combining ImageNet and WebVision).
  Why it matters: Systems like CurriculumNet show how researchers can use poorly-labeled data, combined with clever training ideas, to increase the value of lower-quality data. Approaches like this can be viewed as analogous to a clever refinement process applied when extracting a natural resource.
  Read more: CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (Arxiv).
  Get the trained models from Malong’s Github page.

Tech Tales:

[2025: Podcast interview with the inventor of GFY]

Reality Bites, So Change It.
Or: There Can Be Hope For Those of Us Who Were Alone And Those We Left Behind

My Father was struck by a truck and killed while riding his motorbike in the countryside; no cameras, no witnesses; he was alone. There was an investigation but no one was ever caught. So it goes.

At the funeral I told stories about the greatness of my Father and I helped people laugh and I helped people cry. But I could not help myself because I could not see his death. It was as though he opened a door and disappeared before walking through it and the door never closed again; a hole in the world.

I knew many people who had lost friends and parents to cancer or other illnesses and their stories were quite horrifying: black vomit before the end; skeletons with the faces of parents; tales of seeing a dying person when they didn’t know they were being watched and seeing rage and fear and anguish on their face. The retellings of so many bad jokes about not needing to pay electricity bills, wheezed out over hospital food.

I envied these people, because they all had a “goodbye story” – that last moment of connection. They had the moment when they held a hand, or stared at a chest as it heaved in one last breath, or confessed a great secret before the chance was gone. Even if they weren’t there at the last they had known it was coming.

I did not have my goodbye, or the foreshadowing of one. Imagine that.

So that is why I built Goodbye For You(™), or GFY for short. GFY is software that lets you simulate and spend the last few moments with a loved one. It requires data and effort and huge amounts of patience… but it works. And as AI technology improves, so does the ease of use and fidelity of GFY.

Of course, it is not quite real. There are artifacts: improbable flocks of birds, or leaves that don’t fall quite correctly, or bodies that don’t seem entirely correct. But the essence is there: With enough patience and enough of a record of the deceased, GFY can let you reconstruct their last moment, put on a virtual reality haptic-feedback suit, and step into it.

You can speak with them… at the end. You can touch them and they can touch you. We’re adding smell soon.

I believe it has helped people. Let me try to explain how it worked the first time, all those years ago.

I was able to see the truck hit his bike. I saw his body fly through the air. I heard him say “oh no” the second after impact as he was catapulted off his bike and towards the side of the road. I heard his ribs break as he landed. I saw him crying and bleeding. I was able to approach his body. He was still breathing. I got on my knees and bent over him and I cried and the VR-helmet saw my tears in reality and simulated these tears falling onto his chest – and he appeared to see them, then looked up at me and smiled.
   He touched my face and said “my child” and then he died.

Now I have that memory and I carry it in my heart as a candle to warm my soul. After I experienced this first GFY my dreams changed. It felt as though I had found a way to see him open the door – and leave. And then the door shut.

Grief is time times memory times the rejuvenation of closure: of a sense of things that were once so raw being healed and knitted back together. If you make the memory have closure things seem to heal faster.

Yes, I am still so angry. But when I sleep now I sometimes dream of that memory, and in my imagination we say other things, and in this way continue to talk softly through the years.

Things that inspired this story: The as-yet-untapped therapeutic opportunities afforded by synthetic media generation (especially high-fidelity conditional video); GAN progression from 2014 to 2018; compute growth both observed and expected for the next few years; Ander Monson’s story “I am getting comfortable with my grief”.

Import AI: 117: Surveillance search engines; harvesting real-world road data with hovering drones; and improving language with unsupervised pre-training

by Jack Clark

Chinese researchers pursue state-of-the-art lip-reading with massive dataset:
…What do I spy with my camera eyes? Lips moving! Now I can figure out what you are saying…
Researchers with the Chinese Academy of Sciences and Huazhong University of Science and Technology have created a new dataset and benchmark for “lip-reading in the wild” for Mandarin. Lip-reading gives people a new sensory capability to imbue AI systems with. For instance, lip-reading systems can be used for “aids for hearing-impaired persons, analysis of silent movies, liveness verification in video authentication systems, and so on” the researchers write.
  Dataset details: The lipreading dataset contains 745,187 distinct samples from more than 2,000 speakers, grouped into 1,000 classes, where each class corresponds to the syllable of a Mandarin word composed of one or several Chinese characters. “To the best of our knowledge, this database is currently the largest word-level lipreading dataset and the only public large-scale Mandarin lipreading dataset”, the researchers write. The dataset has also been designed to be diverse, so the footage in it consists of multiple different people taken from multiple different camera angles, along with perspectives taken from television broadcasts. This diversity makes the benchmark more closely approximate real world situations, whereas previous work in this domain has involved footage taken from a fixed perspective. They built the dataset by annotating Chinese television using a service provided by iFLYTEK, a Chinese speech recognition company.
  Baseline results: They train three baselines on this dataset – a fully 2D CNN, a fully 3D CNN (modeled on LipNet, research from DeepMind and Google covered in Import AI #104), and a model that mixes 2D and 3D convolutional layers. All of these approaches perform poorly on the new dataset, despite having obtained accuracies as high as 90% on other, more restricted datasets. The researchers implement their models in PyTorch and train them on servers containing four Titan X GPUs, each with 12GB of memory. The resulting top-5 accuracy results for the baselines on the new Chinese dataset LRW-1000 are as follows:
– LSTM-5: 48.74%
– D3D: 59.80%
– 3D+2D: 63.50%
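The top-5 metric reported above is straightforward to compute; a small sketch with made-up scores:

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true class index appears among the k
    highest-scoring predicted classes."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Two samples over a four-class toy problem, using top-2 for brevity:
scores = [[0.1, 0.5, 0.2, 0.9],    # top-2 classes: 3, 1
          [0.8, 0.05, 0.1, 0.05]]  # top-2 classes: 0, 2
labels = [1, 1]
acc = top_k_accuracy(scores, labels, k=2)  # only the first sample hits
```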
  Why it matters: Systems for stuff like lipreading are going to have a significant impact on applications ranging from medicine to surveillance. One of the challenges posed by research like this is its inherently ‘dual use’ nature; as the researchers allude to in the introduction of this paper, this work can be used both for healthcare uses as well as for surveillance uses (see: “analysis of silent movies”). How society deals with the arrival of these general AI technologies will have a significant impact on the types of societal architectures that will be built and developed throughout the 21st Century. It is also notable to see the emergence of large-scale datasets built by Chinese researchers in the Chinese language – perhaps one could measure the relative growth in certain language datasets to model AI interest in the associated countries?
  Read more: LRW-1000: A Naturally Distributed Large-Scale Benchmark for Lip Reading in the Wild (Arxiv).

Want to use AI to study the earth? Enter the PROBA-V Super Resolution competition:
…European Space Agency challenges researchers to increase the usefulness of satellite-gathered images…
The European Space Agency has launched the ‘PROBA-V Super Resolution’ competition, which challenges researchers to take in a bunch of photos from a satellite of the same region of the Earth and stitch them together to create a higher-resolution composite.
  Data: The data contains multiple images taken in different spectral bands of 74 locations around the world at each point in time. Images are annotated with a ‘quality map’ to indicate any parts of them that may be occluded or otherwise hard to process. “Each data-point consists of exactly one 100m resolution image and several 300m resolution images from the same scene,” they write.
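A naive baseline for such a challenge fuses the co-registered low-resolution frames with a per-pixel average weighted by the quality maps, so occluded pixels are ignored; real entries would of course go well beyond this sketch (the arrays below are toy data):

```python
import numpy as np

def fuse_frames(frames, quality_maps):
    """Per-pixel quality-weighted average of co-registered frames.
    quality_maps are binary: 1 = clear pixel, 0 = occluded/concealed."""
    frames = np.asarray(frames, dtype=float)
    q = np.asarray(quality_maps, dtype=float)
    weight = q.sum(axis=0)
    fused = (frames * q).sum(axis=0) / np.maximum(weight, 1.0)
    # Pixels occluded in every frame fall back to a plain mean.
    return np.where(weight == 0, frames.mean(axis=0), fused)

frames = np.array([[[1., 2.], [3., 4.]],
                   [[5., 6.], [7., 8.]]])
quality = np.array([[[1, 1], [1, 0]],   # frame 0: bottom-right occluded
                    [[1, 1], [1, 1]]])  # frame 1: fully clear
fused = fuse_frames(frames, quality)
```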
  Why it matters: Competitions like this provide researchers with novel datasets to experiment with and have a chance of improving the overall usefulness of expensive capital equipment (such as satellites).
Find out more about the competition here at the official website (PROBA-V Super Resolution challenge).

Google releases BERT, obtains state-of-the-art language understanding scores:
…Language modeling enters its ImageNet-boom era…
Google has released BERT, a natural language processing system that uses unsupervised pre-training and task fine-tuning to obtain state-of-the-art scores on a large number of distinct tasks.
  How it works: BERT, which stands for Bidirectional Encoder Representations from Transformers, builds on recent developments in language understanding ranging from techniques like ELMo to ULMFiT to recent work by OpenAI on unsupervised pre-training. BERT’s major performance gains come from a specific structural modification (jointly conditioning on the left and right context in all layers), as well as some other minor tweaks, plus – as is the trend in deep learning these days – training a larger model using more compute. The approach it is most similar to is OpenAI’s work using unsupervised pre-training for language understanding, as well as work from Fast.ai using similar approaches.
  Major tweak: BERT’s use of joint conditioning likely leads to its most significant performance improvement. They implement this by adding an additional pre-training objective called the ‘masked language model’, which involves randomly masking input tokens, then asking the model to predict the contents of each masked token based on context – this constraint encourages the network to learn to use more context when completing tasks, which seems to lead to greater representational capacity and improved performance. They also use Next Sentence Prediction during pre-training to try to train a model that has a concept of relationships between concepts across different sentences. Later they conduct significant ablation studies of BERT and show that these two pre-training tweaks are likely responsible for the majority of the observed performance increase.
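In simplified form, the masked-language-model corruption looks like this; the real BERT scheme also sometimes keeps the original token or substitutes a random one, which is omitted here for brevity:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Pick ~15% of positions as prediction targets and hide them.
    Returns the corrupted sequence plus a {position: original token} map
    that the model must reconstruct from the surrounding context."""
    rng = rng or random.Random(0)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            corrupted[i] = MASK
    return corrupted, targets

sentence = "the man went to the store to buy a gallon of milk".split()
corrupted, targets = mask_tokens(sentence, rng=random.Random(1))
```

The training loss is then computed only over the positions recorded in `targets`, which is what forces the model to use both left and right context.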
  Results: BERT obtains state-of-the-art performance on the multi-task GLUE benchmark, setting new state-of-the-art scores on a wide range of challenging tasks. It also sets a new state-of-the-art score on the ‘SWAG’ dataset – significant, given that SWAG was released earlier this year and was expressly designed to challenge AI techniques, like DL, which may gather a significant amount of performance by deriving subtle statistical relationships within datasets.
  Scale: The researchers train two models, BERT-Base and BERT-Large. BERT-Base was trained on 4 Cloud TPUs for approximately four days, and BERT-Large was trained on 16 Cloud TPUs, also for four days.
  Why it matters – Big Compute and AI Feudalism: Approaches like this show how powerful today’s deep learning based systems are, especially when combined with large amounts of compute and data. There are legitimate arguments to be made that such approaches are bifurcating research into low-compute and high-compute domains – one of these main BERT models took 16 TPUs (so 64 TPU chips total) trained for four days, putting it out of reach of low-resource researchers. On the plus side, if Google releases things like the pre-trained model then people will be able to use the model themselves and merely pay the training cost to finetune for different domains. Whether we should be content with researchers getting the proverbial crumbs from rich organizations’ tables is another matter, though. Maybe 2018 is the year in which we start to see the emergence of ‘AI Feudalism’.
  Read more: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Arxiv).
Check out this helpful Reddit BERT-explainer from one of the researchers (Reddit).

Using drones to harvest real world driving data:
…Why the future involves lightly-automated aerial robot data collection pipelines…
Researchers with the Automated Driving Department of the Institute for Automotive Engineering at Aachen University have created a new ‘highD’ dataset that captures the behavior of real world vehicles on German highways (technically: autobahns).
  Drones + data: The researchers created the dataset via DJI Phantom 4 Pro Plus drones hovering above roadways, which they used to collect natural vehicle trajectories from vehicles driving on German highways around Cologne. The dataset includes post-processed trajectories of 110,000 vehicles, including cars and trucks. The dataset consists of 16.5 hours of video spread across 60 different recordings which were made at six different locations between 2017 and 2018, with each recording having an average length of 17 minutes.
  Augmented dataset: The researchers provide additional labels in the dataset beyond trajectories, categorizing vehicles’ behavior into distinct detected maneuvers, which include: free driving, vehicle following, critical maneuvers, and lane changes.
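To give a flavor of how maneuvers can be extracted from such trajectories, here is a toy lane-change detector; the fixed lane width is an assumption of this sketch (highD itself works from measured lane markings rather than a constant):

```python
import numpy as np

LANE_WIDTH_M = 3.5  # assumed constant lane width for this sketch

def detect_lane_changes(lateral_pos):
    """Assign each frame a lane index from the vehicle's lateral position,
    then report the frames at which the index changes (a lane change)."""
    lanes = np.floor(np.asarray(lateral_pos) / LANE_WIDTH_M).astype(int)
    change_frames = np.flatnonzero(np.diff(lanes) != 0) + 1
    return lanes, change_frames

# A vehicle drifting from the first lane into the second:
trajectory = [1.0, 1.5, 2.5, 3.2, 3.8, 4.5]  # lateral position in meters
lanes, change_frames = detect_lane_changes(trajectory)
```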
  highD vs NGSIM: The dataset most similar to highD is NGSIM, a dataset developed by the US Department of Transportation. highD contains a significantly greater diversity of vehicles as well as being significantly larger, but the recorded distances which the vehicles travel along are shorter, and the German roads in highD have fewer lanes than the American ones used in NGSIM.
  Why it matters: Data is likely going to be crucial for the development of real world robot platforms, like self-driving cars. Techniques like those outlined in this paper show how we can use newer technologies, like cheap consumer drones, to automate significant chunks of the data gathering process, potentially making it easier for people to gather and create large datasets. “Our plan is to increase the size of the dataset and enhance it by additional detected maneuvers for the use in safety validation of highly automated driving,” the researchers write.
Get the data from the official website (highD-dataset.com).
You can access the Matlab and Python code used to handle the data, create visualizations, and extract maneuvers from here (Github).
Read more: The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems (Arxiv).

Building Google-like search engines for surveillance, using AI:
…New research lets you search via a person’s height, color, and gender…
Deep learning based techniques are fundamentally changing how surveillance architectures are being built. Case in point: A new paper from Indian researchers gives a flavor of how deep learning can expand the capabilities of security technology for ‘person retrieval’, which is the task of trying to find a particular person within a set of captured CCTV footage.
  The system: The researchers use Mask R-CNN pre-trained on Microsoft COCO to let people search over CCTV footage from the SoftBioSearch dataset for people with a specific height, color, and ‘gender’ (for the purpose of this newsletter we won’t go into the numerous complexities and presumed definitions inherent in the use of ‘gender’ here).
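Once a detector has produced per-person attribute estimates, the retrieval step itself reduces to attribute matching. The detection fields and the height tolerance below are invented for illustration; the paper's pipeline derives such attributes from Mask R-CNN outputs rather than receiving them ready-made:

```python
def retrieve_persons(detections, query):
    """Return the ids of detected persons matching a soft-biometric query:
    height within a tolerance, plus exact colour and gender labels."""
    matches = []
    for det in detections:
        if abs(det["height_cm"] - query["height_cm"]) > query.get("height_tol", 10):
            continue
        if det["torso_colour"] != query["torso_colour"]:
            continue
        if det["gender"] != query["gender"]:
            continue
        matches.append(det["id"])
    return matches

detections = [  # hypothetical attribute estimates from an upstream detector
    {"id": "p1", "height_cm": 172, "torso_colour": "red", "gender": "male"},
    {"id": "p2", "height_cm": 171, "torso_colour": "blue", "gender": "male"},
    {"id": "p3", "height_cm": 150, "torso_colour": "red", "gender": "male"},
]
query = {"height_cm": 170, "torso_colour": "red", "gender": "male"}
matches = retrieve_persons(detections, query)
```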
  Results: “The algorithm correctly retrieves 28 out of 41 persons,” the researchers write. This isn’t yet quite to the level of performance where I can imagine people implementing it, but it certainly seems ‘good enough’ for many surveillance cases, where you don’t really care about a few false positives as you’re mostly trying to find candidate targets backed up by human analysis.
  Why it matters: The deployment of artificial intelligence systems is going to radically change how governments relate to their citizens by giving them greater abilities than before to surveil and control them. Approaches like this highlight how flexible this technology is and how it can be used for the sorts of surveillance work that people typically associate with large teams of human analysts. Perhaps we’ll soon hear about intelligence analysts complaining about the automation of their own jobs as a consequence of deep learning.
  Read more: Person Retrieval in Surveillance video using Height, Color and Gender (Arxiv).

Tech Tales:

[2028: A climate-protected high-rise in a densely packed ‘creatives’ district of an off-the-charts Gini-coefficient city]

We Noticed You Liked This So Now You Have This And You Shall Have This Forever

The new cereal arrived yesterday. I’m already addicted to it. It is perfect. It is the best cereal I have ever had. I would experience large amounts of pain to have access to this cereal. My cereal is me; it has been personalized and customized. I love it.

I had to invest to get here. Let us not speak of the first cereals. The GAN-generated “Chocolate Rocks” and “Cocoa Crumbles” and “Sweet Black Bean Flakes”. I shudder to think of these. Getting to the good cereal takes time. I gave much feedback to the company, including giving them access to my camera feeds, so their algorithms can watch me eat. Watch me be sick.

One day I got so mad that the cereal had been bad for so long that I threw it across the room and didn’t have anything else for breakfast.

Thank You For Your Feedback, Every Bit of Feedback Gets us Closer to Your Perfect Cereal, they said.
I believed them.

Though I do not have a satisfying job, I now start every morning with pride. Especially now, with the new cereal. This cereal reflects my identity. The taste is ideal. The packaging reminds me of my childhood and also simulates a new kind of childhood for me, filling the hole of no-kids that I have. I am very lonely. The cereal has all of my daily nutrients. It sustains me.

Today, the company sent me a message telling me I am so valuable they want me to work on something else. Why Not Design Your Milk? They said. This makes sense. I have thrown up twice already. One of the milks was made with seaweed. I hated it. But I know because of the cereal we can get there: we can develop the perfect milk. And I am going to help them do it and then it will be mine, all mine.

And they say our generation is less exciting than the previous ones. Answer me this: did any of those old generations who fought in wars design their own cereal in companion with a superintelligence? Did any of them know the true struggle of persisting in the training of something that does not understand you and does not care about you, but learns to? No. They had children, who already like you, and partners, who want to be with you. They did not have this kind of hardness.

The challenge of our lifetime is to suffer enough to make the perfect customization. Why not milk? They ask me. Why not my own life, I ask them? Why not customize it all?

And they say religion is going out of fashion!

Things that inspired this story: GANs; ad-targeting; the logical end point of Google and Facebook and all the other stateless multinationals expanding into the ‘biological supply chain’ that makes human life possible; the endless creation of new markets within capitalism; the recent proliferation of various ‘nut milks’ taken to their logical satirical end point; hunger; the shared knowledge among all of us alive that our world is being replaced by simulacra of the world and we are the co-designers of these paper-thin realities.

Import AI 116: Think robots are insecure? Prove it by hacking them; why the UK military loves robots for logistics; Microsoft bids on $10bn US DoD JEDI contract while Google withdraws

by Jack Clark

‘Are you the government? Want to take advantage of AI in the USA? Here’s how!’ says thinktank:
….R-Street recommends politicians focus on talent, data, hardware, and other key areas to ensure America can benefit from advances in AI…
R-Street, a Washington-based thinktank whose goal is to “promote free markets and limited, effective government” has written a paper recommending how the US can take advantage of AI.
  Key recommendations: R-Street says the scarcity of AI talent disproportionately benefits deep-pocketed incumbents (such as Google) that can outbid other companies. “If there were appropriate policy levers to increase the supply of skilled technical workers available in the United States, it would disproportionately benefit smaller companies and startups,” they write.
  Talent: Boost Immigration: In particular, they highlight immigration as an area where the government may want to consider instituting changes, for instance by creating a new class of technical visa, or expanding H-1Bs.
  Talent: Offset Training Costs: Another approach could be to allow employers to deduct the full costs of training staff in AI, which would further incentivize employers to increase the size of the AI workforce.
  Data: “We can potentially create high-leverage opportunities for startups to compete against established firms if we can increase the supply of high-quality datasets available to the public,” R-Street writes. One way to do this would be to approach data held by the government with “a general presumption in favor of releasing government data, even if the consumer applications do not appear immediately obvious”.
  Figure out (fair use X data X copyright): One of the problems AI is already causing is how it intersects with our existing norms and laws around intellectual property, specifically copyright law. A key question that needs to be resolved is figuring out how to assess data in terms of fair use when looking at AI systems – which will tend to consume vast amounts of data and use this data to create outputs that could, in certain legal lights, be viewed as ‘derivative works’, which would provide disincentives to people looking to develop AI.
   “Given the existing ambiguity around the issue and the large potential benefits to be reaped, further study and clarification of the legal status of training data in copyright law should be a top priority when considering new ways to boost the prospects of competition and innovation in the AI space,” they write.
   Hardware: The US government should be mindful about how the international distribution of semiconductor manufacturing infrastructure could come into conflict with national strategies relating to AI and hardware.
  Why it matters: Analyses like this show how traditional policymakers are beginning to think about AI and highlights the numerous changes needed for the US to fully capitalize on its AI ecosystem. At a meta level, the broadening of discourse around AI to extend to Washington thinktanks seems like a further sign of the ‘industrialization of AI’, in the sense that the technology is now seen as having significant enough economic impacts that policymakers should start to plan and anticipate the changes it will bring.
  Read more: Reducing Entry Barriers in the Development and Application of AI (R Street).
  Get the PDF directly here.

Tired: Killer robots.
Wired: Logistics robots for military re-supply!
…UK military gives update on ‘Autonomous Last Mile Resupply’ robot competition…
The UK military is currently experimenting with new ways to deliver supplies to frontline troops – and it’s looking to robots to help it out. To spur research into this area a group of UK government organizations is hosting the Autonomous Last Mile ReSupply (ALMRS) competition.
  ALMRS is currently in phase two, in which five consortiums led by Animal Dynamics, Barnard Microsystems, Fleetonomy, Horiba Mira, and QinetiQ will build prototypes of their winning designs for testing and evaluation, receiving funding of around ~£3.8 million over the next few months.
  Robots are more than just drones: Some of the robots being developed for ALMRS include autonomous powered paragliders, a vertical take-off and land (VTOL) drone, autonomous hoverbikes, and various systems for autonomous logistics resupply and maintenance.
  Why it matters: Research initiatives like this will rapidly mature applications at the intersection of robotics and AI as a consequence of military organizations creating new markets for new capabilities. Many AI researchers expect that contemporary AI techniques will significantly broaden the capabilities of robotic platforms, but so far hardware development has lagged software. With schemes like ALMRS, hardware may get a boost as well.
  Read more: How autonomous delivery drones could revolutionise military logistics (Army Technology news website).

Responsible Computer Science Challenge offers $3.5million in prizes for Ethics + Computer Science courses:
…How much would you pay for a more responsible future?…
Omidyar Network, Mozilla, Schmidt Futures and Craig Newmark Philanthropies are putting up $3.5million to try to spur the development of more socially aware computer scientists. The challenge has two phases:
– Stage 1 (grants up to $150,000 per project): “We will seek concepts for deeply integrating ethics into existing undergraduate computer science courses”. Winners announced April 2019.
– Stage 2 (grants up to $200,000): “We will support the spread and scale of the most promising approaches”.
   Deadline: Applications will be accepted from now through to December 13, 2018.
   Why it matters: Computers are general purpose technologies, so encouraging computer science practitioners to think about the ethical component of their work in a holistic, coupled manner may yield radical new designs for more positive and aware futures.
  Read more: Announcing a Competition for Ethics in Computer Science, with up to $3.5 Million in Prizes (Mozilla blog).

Augmenting human game designers with AI helpers:
…Turn-based co-design system lets an agent learn how you like to design levels…
Researchers with the Georgia Institute of Technology have developed a 2D platform game map editor which is augmented with a deep reinforcement learning agent that learns to suggest level alterations based on the actions of the designer.
  An endearing, frustrating experience: Like most things involving the day-to-day use of AI, the process can be a bit frustrating: after the level designer tries to create a series of platforms with gaps opening onto the space below, the AI persists in filling these holes in with its suggestions – despite getting a negative RL reward each time. “As you can see this AI loves to fill in gaps, haha,” says Matthew at one point.
  Creative: But it can also come up with interesting ideas. At one point the AI suggests a pipe flanked at the top on each side by single squares. “I don’t hate this. And it’s interesting because we haven’t seen this before,” he says. At another point he builds a mirror image of what the AI suggests, creating an enclosed area.
  Learning with you: The AI learns to transfer some knowledge between levels, as shown in the video. However, I expect it needs greater diversity and potentially larger game spaces to show what it can really do.
  Why it matters: AI tools can give all types of artists new tools with which to augment their own intelligence, and it seems like the adaptive learning capabilities of today’s RL+supervised learning techniques can make for potentially useful allies. I’m particularly interested in these kinds of constrained environments, like level design, where you ultimately want to follow a gradient towards an implicit goal.
  Watch the video of Matthew Guzdial narrating the level editor here (YouTube).
 Check out the research paper here: Co-Creative Level Design with Machine Learning (Arxiv).

Think robots are insecure? Think you can prove it? Enter a new “capture the flag” competition:
…Alias Robotics’ “Robot CTF” gives hackers nine challenges to test their robot-compromising skills…
Alias Robotics, a Spanish robot cybersecurity company, has released the Robotics Capture The Flag (RCTF), a series of nine scenarios designed to challenge wannabe robot hackers. “The Robotics CTF is designed to be an online game, available 24/7, launchable through any web browser and designed to learn robot hacking step by step,” they write.
  Scenarios: The RCTF consists of nine scenarios that will challenge hackers to exfiltrate information from robots, snoop on robot operating system (ROS) traffic, find hardcoded credentials in ROS source code, and so on. One of the scenarios is listed as “coming soon!” and promises to give wannabe hackers access to “an Alias Robotics’ crafted offensive tool”.
  Free hacks! The researchers have released the scenarios under an open source TK license on GitHub. “We envision that as new scenarios become available, the sources will remain at this repository and only a subset of them will be pushed to our web servers http://rctf.aliasrobotics.com for experimentation. We invite the community of roboticists and security researchers to play online and get a robot hacker rank,” they write.
  Why it matters: Robotics is seen as one of the next frontiers for contemporary AI research and techniques, but as this research shows – along with other research on hacking physical robots covered in Import AI #109 – the substrates on which many robots are built are still quite insecure.
  Read more: Robotics CTF (RCTF), A Playground for Robot Hacking (Arxiv).
  Check out the competition and sign-up here (Alias Robotics website).

Fighting fires with drones and deep reinforcement learning:
…Forest fire: If you can simulate it, perhaps you can train an AI system to monitor it?…
Stanford University researchers have used reinforcement learning to train drones in simulators to spot wildfires better than supervised baselines. The project highlights how many complex real world tasks, like wildfire monitoring, can be represented as POMDPs (partially observable markov decision processes) which are tractable for reinforcement learning algorithms.
  The approach works like this: The researchers build a simulator that lets them simulate wildfires in a grid-based way. They then populate this system with some simulated drones and use reinforcement learning to train the drones to effectively survey the fire and, most crucially, stay with the ‘fire front’ – the expanding frontier of the fire, and therefore the part with the greatest potential safety impact. “Each aircraft will get an observation of the fire relative to its own location and orientation. The observations are modeled as an image obtained from the true wildfire state given the aircraft’s current position and heading direction,” they write.
  Rewards: The reward function is structured as follows: the aircraft is penalized for distance from the fire front, for high bank angles, for closeness to other aircraft, and for being near too many non-burning cells.
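That penalty structure could be sketched roughly like this (all weights, names, and distance conventions here are invented for illustration, not the paper’s actual coefficients):

```python
def wildfire_reward(dist_to_front, bank_angle, dists_to_other_aircraft,
                    n_nearby_nonburning,
                    w_front=1.0, w_bank=0.1, w_prox=5.0, w_cells=0.02,
                    min_separation=10.0):
    """Penalty-based reward for one surveillance aircraft per timestep.

    All weights are hypothetical; the paper defines its own coefficients.
    """
    r = 0.0
    # Penalize distance from the fire front.
    r -= w_front * dist_to_front
    # Penalize high bank angles (radians).
    r -= w_bank * abs(bank_angle)
    # Penalize closeness to other aircraft inside a separation radius.
    for d in dists_to_other_aircraft:
        if d < min_separation:
            r -= w_prox * (min_separation - d)
    # Penalize loitering near too many non-burning cells.
    r -= w_cells * n_nearby_nonburning
    return r
```

An aircraft sitting on the front, flying level, and clear of its teammates would receive the maximum (zero) reward under this sketch; every deviation subtracts from it.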
  Belief: The researchers also experiment with what they call a “belief-based approach” which involves training the drones to create a shared “belief map”, which is a map of their environment indicating whether they believe particular cells will contain fire or not, and this map is updated with real data taken during the simulated flight. This is different to an observation-based approach, which purely focuses on the observations seen by these drones.
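A minimal sketch of such a shared belief-map update, assuming a grid of burn probabilities and a hypothetical decay-toward-uncertainty rule (not the paper’s actual filter):

```python
import numpy as np

def update_belief(belief, observed_cells, decay=0.02):
    """Blend fresh observations into a shared fire-belief map.

    belief: 2D array of P(cell is burning), shared across aircraft.
    observed_cells: dict mapping (row, col) -> observed burning (0 or 1).
    Unobserved cells drift toward an uninformative prior of 0.5.
    """
    belief = belief + decay * (0.5 - belief)   # drift toward uncertainty
    for (r, c), burning in observed_cells.items():
        belief[r, c] = float(burning)          # trust direct observations
    return belief
```

Each simulated flight step, every aircraft would write its local observations into the map, giving teammates an estimate of fire cells they have not personally seen.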
  Results: Two aircraft with nominal wildfire seed: Both the belief-based and observation-based methods obtain significantly higher rewards than a hand-programmed ‘receding horizon’ baseline. There is no comparison to human performance, though. The belief-based technique does eventually obtain a slightly higher final performance than the observation-based version, but it takes longer to converge to a good solution.
  Results: Greater than two aircraft: The system is able to scale to dealing with numbers of aircraft greater than two, but this requires the tweaking of a proximity-based reward to discourage collisions.
  Results: Different wildfires: The researchers test their system on two differently shaped wildfires (a t-shape and an arc) and show that both RL-based methods exceed performance of the baseline, and that the belief-based system in particular does well.
  Why it matters: We’ve already seen states like California use human-piloted drones to help emergency responders deal with wildfires. As we head into a more dangerous future defined by an increase in the rate of extreme weather events driven by global warming I am curious to see how we might use AI techniques to create certain autonomous surveillance and remediation abilities, like those outlined in this study.
  Caveat: Like all studies that show success in simulation, I’ll retain some skepticism till I see such techniques tested on real drones in physical reality.
   Read more: Distributed Wildfire Surveillance With Autonomous Aircraft Using Deep Reinforcement Learning (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Pentagon’s AI ethics review taking shape:
   The Defense Innovation Board met last week to present some initial findings of their review of the ethical issues in military AI deployment. The DIB is the Pentagon’s advisory panel of experts drawn largely from tech and academia. Speakers covered issues ranging from autonomous weapons systems, to the risk posed by incorporating AI into existing nuclear weapons systems.
   The Board plans to present their report to Congress in April 2019.
  Read more: Defense Innovation Board to explore the ethics of AI in war (NextGov).
  Read more: DIB public meeting (DoD).

Google withdraws bid for $10bn Pentagon contract:
Google has withdrawn its bid for the Pentagon’s latest cloud contract, JEDI, citing uncertainty over whether the work would align with its AI principles.
  Read more: Google drops out of Pentagon’s $10bn cloud competition (Bloomberg).

Microsoft employees call for company to not pursue $10bn Pentagon contract:
Following Google’s decision to not bid on JEDI, people identifying themselves as employees at Microsoft published an open letter asking the company to follow suit and remove their own bid on the project. (Microsoft submitted a bid for JEDI following the publication of the letter.)
  Read more: An open letter to Microsoft (Medium).

Future of Humanity Institute receives £13m funding:
FHI, the multidisciplinary institute at the University of Oxford led by Nick Bostrom, has received a £13.3m donation from the Open Philanthropy Project. This represents a material uptick in funding for AI safety research. The field as a whole, including work done in universities, non-profits and industry, spent c.$10m in 2017, $6.5m in 2016, and c.$3m in 2015, according to estimates from the Center for Effective Altruism.
  Read more: £13.3m funding boost for FHI (FHI).
  Read more: Changes in funding in the AI safety field (CEA).

Tech Tales:

The Watcher We Nationalized

So every day when you wake up as the head of this government you check The Watcher. It has an official name – a lengthy acronym that expands to list some of the provenance of its powerful technologies – but mostly people just call it The Watcher or sometimes The Watch and very rarely Watcher.

The Watcher is composed of intelligence taps placed on most of the world’s large technology companies. Data gets scraped out of them and combined with various intelligence sources to give the head of state access to their own supercharged search engine. Spook Google! Is what British tabloids first called it. Fedbook! Is what some US press called it. And so on.

All you know is that you start your day with The Watcher and you finish your day with it. When you got into office, several years ago, you were met by a note from your predecessor. Nothing you do will show up in Watcher, unless something terrible happens; get used to it, read the note.

They were right, mostly. Your jobs bill? Out-performed by some viral memes relating to a (now disgraced) celebrity. The climate change investment? Eclipsed by a new revelation about a data breach at one of the technology companies. In fact, the only thing so far that registered on The Watcher from your part of the world was a failed suitcase bombing attempt on a bank.

Now, heading towards the end of your premiership, you hold onto this phrase and say it to yourself every morning, right before you turn on The Watcher and see what the rhythm of the world says about the day to come. “Nothing you do will show up in Watcher, unless something terrible happens; get used to it”, you say to yourself, then you turn it on.

Things that inspired this story: PRISM, intelligence services, governments built on companies like so many houses of cards, small states, Europe, the tedium of even supposedly important jobs, systems.

Import AI 115: What the DoD is planning for its robots over the next 25 years; AI Benchmark identifies 2018’s speediest AI phone; and DeepMind embeds graph networks into AI agents

by Jack Clark

UK military shows how tricky it’ll be to apply AI to war:
…Numerous AI researchers likely breathe a sigh of relief at new paper from UK’s Defence Science and Technology Laboratory…
Researchers with the UK’s Defence Science and Technology Laboratory, Cranfield Defence and Security Doctoral Training Centre, and IBM have surveyed contemporary AI and thought about ways it can be integrated with the UK’s defence establishment. The report makes for sobering reading for large military organizations keen to deploy AI, highlighting the difficulties in terms of practical deployment (eg, procurement) and in terms of capability (many military situations require AI systems that can learn and update in response to sparse, critical data).
  Current problems: Today’s AI systems lack some key capabilities that militaries need when deploying systems, like being able to configure systems to always avoid certain “high regret” occurrences (in the case of a military, you can imagine that firing a munition at an incorrect target (hopefully) yields such ‘high regret’); being resilient to adversarial examples being weaponized against systems by another actor (whether a defender or aggressor); being able to operate effectively with very small or sparse data; being able to shard AI systems across multiple partners (eg, other militaries) in such a way that the system can be reverted to sovereign control following the conclusion of an operation; and being able to deploy such systems into the harsh low-compute operational environment that militaries face.
  High expectations: “If it is to avoid the sins of its past, there is the need to manage stakeholder expectations very carefully, so that their demand signal for AI is pragmatic and achievable”.
  It’s the data, stupid: Militaries, like many large government organizations, have an unfortunate tendency to sub-contract much of their IT systems out to other parties. This tends to lead to systems that are:
a) moneypits
b) brittle
c) extremely hard to subsequently extend.
These factors add a confounding element to any such military deployment of AI. “Legacy contractual decisions place what is effectively a commercial blocker to AI integration and exploitation in the Defence Equipment Program’s near-term activity,” the researchers write.
  Procurement: UK defence will also need to change the way it does procurement so it can maximize the number of small-and-medium-sized enterprises it can buy its AI systems from. But buying from SMEs creates additional complications for militaries, as working out what to do with the SME-supported service if the SME stops providing it, or goes bankrupt, is difficult and imposes a significant burden on the part of the SME.
  Why it matters: Military usage of AI is going to be large-scale, consequential, and influential in terms of geopolitics. It’s also going to invite numerous problems from AI accidents as a consequence of poor theoretical guarantees and uneven performance properties, so it’s encouraging to see representatives from UK defence seeking to think these issues through.
  Read more: A Systems Approach to Achieving the Benefits of Artificial Intelligence in UK Defence (Arxiv).

Want to understand the mind of another? Get relational!
…DeepMind research into combining graph networks and relational networks shows potential for smarter, faster agents…
DeepMind researchers have tried to develop smarter AI agents by combining contemporary deep learning techniques with the company’s recent work on graph networks and relational networks. The resulting systems rely on a new module, which DeepMind calls a “Relational Forward Model”. This model obtains higher performance than pure-DL baselines, suggesting that fusing deep learning with more structured approaches is a viable path to good performance.
  How it works: The RFM module consists of a graph network encoder, a graph network decoder, and a graph-compatible GRU. Combined, these components create a way to represent structured information in a relational manner, and to update this information in response to changes in the environment (or, theoretically, the inputs of other larger structured systems).
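A toy message-passing step in the spirit of a graph-network block might look like the following sketch (plain NumPy, with single weight matrices standing in for the learned MLPs and GRU; all names and shapes are invented for illustration, not the RFM’s actual architecture):

```python
import numpy as np

def relational_step(nodes, edges, senders, receivers, W_edge, W_node):
    """One message-passing step of a toy graph-network block.

    nodes: (N, D) node features; edges: (E, D) edge features.
    senders/receivers: length-E index arrays giving each edge's endpoints.
    W_edge, W_node: hypothetical weight matrices standing in for learned MLPs.
    """
    # Update each edge from its endpoint features and current edge feature.
    edge_in = np.concatenate(
        [nodes[senders], nodes[receivers], edges], axis=1)
    new_edges = np.tanh(edge_in @ W_edge)
    # Sum incoming messages per node, then update node features.
    agg = np.zeros_like(nodes)
    np.add.at(agg, receivers, new_edges)
    node_in = np.concatenate([nodes, agg], axis=1)
    new_nodes = np.tanh(node_in @ W_node)
    return new_nodes, new_edges
```

The relational structure lives in the sender/receiver index arrays, so the same weights apply to graphs of any size – the property that makes these blocks attractive for multi-agent settings.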
  Testing: The researchers test their approach on three distinct tasks: cooperative navigation, which requires agents to collaborate so that each ends up on a distinct reward tile in an area; coin game, which requires agents to position themselves above reward coins and to figure out, by observing each other, which coins yield a negative reward and thus should be avoided; and stag hunt, where agents inhabit a map containing stags and apples and need to work with one another to capture stags, which yield a significant reward. “By embedding RFM modules in RL agents, they can learn to coordinate with one another faster than baseline agents, analogous to imagination-augmented agents in single-agent RL settings”, the researchers write.
   The researchers compare the performance of their systems against systems using Neural Relational Inference (NRI) and Vertex Attention Interaction Networks (VAIN) and find that their approach displays significantly better performance than other approaches. They also ablated their system by training versions without the usage of relational networks, and ones using feedforward networks only. These ablations showed that both components have a significant role in the performance of these systems.
  Why it matters: The research is an extension of DeepMind’s work on integrating graph networks with deep learning. This line of research seems promising because it provides a way to integrate structured data representations with differentiable learning systems, which might let AI researchers have their proverbial cake and eat it too: marrying the flexibility of learned systems with the desirable problem-specification and reasoning properties of more traditional symbolic approaches.
  Read more: Relational Forward Models for Multi-Agent Learning (Arxiv).

Rethink Robotics shuts its doors:
…Collaborative robots pioneer closes…
Rethink Robotics, a robot company founded by MIT robotics legend Rodney Brooks, has closed. The company had developed two robots: Baxter, a two-armed bright red robot with expressive features and the pleasing capability to work around humans without killing them, and Sawyer, a one-armed successor to Baxter.
  Read more: Rethink Robotics Shuts Down (The Verge).

Want to know what the DoD plans for unmanned systems through to 2042? Read the roadmap:
….Drones, robots, planes, oh my! Plus, the challenges of integrating autonomy with military systems…
The Department of Defense has published its (non-classified) roadmap for unmanned systems through to 2042. The report identifies four core focus areas that we can expect DoD to focus on. These are: Interoperability, Autonomy, Network Security, and Human-Machine Collaboration.
  Perspective: US DoD spent ~$4.245 billion on unmanned systems in 2017 (inclusive of procurement and research, with a roughly equal split between them). That’s quite a substantial amount of money to spend and, if we can assume that this will remain the same (adjusted for inflation), then that means DoD can throw quite significant resources towards the capital R parts of unmanned systems research.
  Short-Term Priorities: DoD’s short-term priorities for its unmanned systems include: the use of standardized and/or open architectures; a shift towards modular, interchangeable parts; a greater investment in the evaluation, verification, and validation of systems; the creation of a “data transport” strategy to deal with the huge floods of data coming from such systems; among others.
  Autonomy priorities: DoD’s priorities for adding more autonomy to drones includes increasing private sector collaboration in the short term and then adding in augmented reality and virtual reality systems by the mid-term (2029), before creating platforms capable of persistent sensing with “highly autonomous” capabilities by 2042. As for the thorny issue of weaponizing such systems, DoD says that between the medium-term and long-term it hopes to be able to give humans an “armed wingman/teammate” with fire control remaining with the human.
  Autonomy issues: “Although safety, reliability, and trust of AI-based systems remain areas of active research, AI must overcome crucial perception and trust issues to become accepted,” the report says. “The increased efficiency and effectiveness that will be realized by increased autonomy are currently limited by legal and policy constraints, trust issues, and technical challenges.”
  Why it matters: The maturation of today’s AI techniques means that it’s a matter of “when” not “if” they will be integrated into military systems. Documents like this give us a sense of how large military bureaucracies are reacting to the rise of AI, and it’s notable that certain concerns within the technical community about the robustness/safety of AI systems have made their way into official DoD planning.
  Read the full report here: Pentagon Unmanned Systems Integrated Roadmap 2017-2042 (USNI News).

Should we take deep learning progress as being meaningful?
…UCLA Computer Science chair urges caution…
Adnan Darwiche, chairman of the Computer Science Department at UCLA and someone who studied AI during the AI winter of the 1980s, has tried to lay out some of the reasons to be skeptical about whether deep learning will ever scale to let us build truly intelligent systems. The crux of his objection is: “Mainstream scientific intuition stands in the way of accepting that a method that does not require explicit modeling or sophisticated reasoning is sufficient for reproducing human-level intelligence”.
  Curve-fitting: The second component of the criticism is that people shouldn’t get too excited about neural network techniques because all they really do is curve-fitting, and instead we should be looking at using model-based approaches, or making hybrid systems.
  Time is the problem: “It has not been sustained long enough to allow sufficient visibility into this consequential question: How effective will function-based approaches be when applied to new and broader applications than those already targeted, particularly those that mandate more stringent measures of success?”
  Curve-fitting can’t explain itself: Another problem identified by the author is the lack of explanation inherent to these techniques, which they see as further justifying investment by the deep learning community into model-based approaches which include more assumptions and/or handwritten sections. “Model-based explanations are also important because they give us a sense of “understanding” or “being in control” of a phenomenon. For example, knowing that a certain diet prevents heart disease does not satisfy our desire for understanding unless we know why.”
  Giant and crucial caveat: Let’s be clear that this piece is essentially reacting to a cartoonish representation of the deep learning AI community that can be caricatured as having this opinion: Deep Learning? Yeah! Yeah! Yeah! Deep Learning is the future of AI! I should note that I’ve never met anyone technically sophisticated who has this position, and most researchers when pressed will raise somewhat similar concerns to those identified in this article. I think some of the motivation for this article stems more from dissatisfaction with the current state of (most) media coverage regarding AI which tends to be breathless and credulous – this is a problem, but as far as I can tell it isn’t really a problem being fed intentionally by people within the AI community, but is instead a consequence of the horrific economics of the post-digital news business and associated skill-rot that occurs.
  Why it matters: Critiques like this are valuable as they encourage the AI community to question itself. However, I think that these critiques need to be manufactured over significantly shorter timescales and should take into account more contemporary research; for instance, some of the objections here seem to be (lightly) rebutted by recent work in NLP which shows that “curve-fitting” systems are capable of feats of reasoning, among other examples. (In the conclusion of this article it says the first draft was written in 2016, then a draft was circulated in the summer of 2017, and now it has been officially published in Autumn 2018, rendering many of its technical references outdated.)
  Read more: Human-level intelligence or animal-like abilities (ACM Digital Library).

Major companies create AI Benchmark and test 10,000+ phones for AI prowess, and a surprising winner emerges:
…Another sign of the industrialization of AI…new benchmarks create standards and standards spur markets…
Researchers with ETH Zurich, Google, Qualcomm, Huawei, MediaTek, and ARM want to be able to better analyze the performance of AI software on different smartphones, and so have created “AI Benchmark” and tested over 10,000 devices against it. AI Benchmark is a batch of nine tests for mobile devices which has been “designed specifically to test the machine learning performance, available hardware AI accelerators, chipset drivers, and memory limitations of the current Android devices”.
  The ingredients of the AI Benchmark: The benchmark consists of nine deep learning tests: Image Recognition tested on ImageNet using a lightweight MobileNet-V1 architecture, and the same test but implementing a larger Inception-V3 network; Face Recognition performance of an Inception-Resnet-V1 on the VGGFace2 dataset; Image Deblurring using the SRCNN network; Image Super-Resolution with a downscaling factor of 3 using a VSDR network, and the same test but with a downscaling factor of 4 and using an SRGAN; Image Semantic Segmentation via an ICNet CNN; and a general Image Enhancement problem (encompassing things like “color enhancement, denoising, sharpening, texture synthesis”); and a memory limitations test which uses the same network as in the deblurring task while testing it over larger and larger image sizes to explore RAM limitations.
  Results: The researchers tested “over 10,000 mobile devices” on the benchmark. The core measurement for each of the benchmark’s nine evaluations is the time in milliseconds it takes to run the network. The researchers blend the results of each of the nine tests together into an overall “AI-Score”. The top results, when measured via AI-Score, are (chipset, score):
#1: Huawei P20 Pro (HiSilicon Kirin 970, 6519)
#2: OnePlus 6 (Snapdragon 845/DSP, 2053)
#3: HTC U12+ (Snapdragon 845, 1708)
#4: Samsung Galaxy S9+ (Exynos 9810 Octa, 1628)
#5: Samsung Galaxy S8 (Exynos 8895 Octa, 1413)
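One way a blend of per-test latencies into a single higher-is-better score could work, purely as an illustration (the paper defines its own AI-Score normalization and weights, which differ from this sketch):

```python
def ai_score(latencies_ms, weights=None, scale=1000.0):
    """Blend per-test latencies (ms) into one score; higher is better.

    Purely illustrative: the official AI-Score formula in the paper
    uses its own per-test normalization and weighting.
    """
    if weights is None:
        weights = [1.0] * len(latencies_ms)
    # Reward speed: each test contributes a weighted inverse latency.
    return sum(w * scale / max(t, 1e-9)
               for w, t in zip(weights, latencies_ms))
```

Under any scheme of this shape, halving every latency doubles the score, which is why a dedicated accelerator on even one heavily weighted test can dominate the ranking.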
   It’s of particular interest to me that the top-ranking performance seems to come from the special AI accelerator which ships with the HiSilicon chip, especially given that HiSilicon is a Chinese semiconductor company, providing more evidence of Chinese advancement in this area. It’s also notable to me that Google’s ‘Pixel’ phones didn’t make the top 5 (though they did make the top 10).
  The future: This first version of the benchmark may be slightly skewed by Huawei managing to ship a device incorporating a custom AI accelerator earlier than many other chipmakers. “The real situation will become clear at the beginning of the next year when the first devices with the Kirin 980, the MediaTek P80 and the next Qualcomm and Samsung Exynos premium SoCs will appear on the market,” the researchers note.
  Full results of this test are available at the official AI Benchmark website.
  Why this matters: I think the emergence of new large-scale benchmarks for applied AI applications represent further evidence for the current era being ‘the Industrialization of AI’. Viewed through this perspective, the creation (and ultimate adoption) of benchmarks gives us a greater ability to model macro progress indicators in the field and use those to better predict not only where hardware & software is today, but also to be able to develop better intuitions about underlying laws that condition the future as well.
  Read more: AI Benchmark: Running Deep Neural Networks on Android Smartphones (Arxiv).
  Check out the full results of the Benchmark here (AI Benchmark).

Toyota researchers propose new monocular depth estimation technique:
…Perhaps a cyclops can estimate depth just as well as a person with two eyes, if deep learning can help?…
Any robot expected to act within the world and around people needs some kind of depth-estimation capability. For a self-driving car, such a capability aids in estimating the proximity of objects to the vehicle and provides a valuable data input for safety-critical calculations like modelling the other entities in the environment and performing velocity calculations. Depth estimation systems can therefore be viewed as a key input technology for any self-driving car.
  But depth estimation systems can be difficult to implement, and they can sometimes be expensive as the typical way to do it is to implement a binocular system similar to how humans have two eyes and then use software to offset the differences and use that to estimate depth. But what if you can only afford one sensor? And what if you have a certain accuracy threshold which can be satisfied by somewhat lower accuracy than you would expect to get with binocular vision, but still good enough for your use case?  Then you might want to estimate depth from a single sensor – if so, new deep learning techniques in monocular upscaling and super-resolution might be able to augment and manipulate the data to perform accurate depth estimation in a self-supervised manner.
  That’s the idea behind a technique from the Toyota Research Institute, which proposes a depth estimation technique that uses encoder and decoder networks to learn a good representation of depth that can be applied to new images. This new technique obtains higher accuracy scores for depths of various ranges, setting state of the art scores on 5 out of 6 benchmarks. It relies on the usage of a “sub-pixel convolutional layer based on ESPCN for depth super-resolution”. This component “synthesizes the high-resolution disparities from their corresponding low-resolution multi-scale model outputs”.
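The sub-pixel (pixel-shuffle) rearrangement at the heart of ESPCN-style super-resolution can be sketched in a few lines; this is the generic operation, not the paper’s exact layer:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement (ESPCN-style).

    Turns (C*r*r, H, W) feature maps into a (C, H*r, W*r)
    higher-resolution output by moving channel groups into space.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    # Split the channel axis into (c, r, r) sub-pixel offsets...
    x = x.reshape(c, r, r, h, w)
    # ...then interleave those offsets with the spatial axes.
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

The network does all its convolutions at low resolution and lets this cheap rearrangement produce the high-resolution output – here, high-resolution disparity maps synthesized from low-resolution multi-scale outputs.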
  Qualitative evaluation: Samples generated by the model display greater specificity and smoothness than others. This is in part due to the use of the sub-pixel resolution technique, which yields an effect in samples shown in the paper that strikes me as visually similar to the outcome of an anti-aliasing process in traditional computer graphics.
  Read more: SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

California considers Turing Test law:
California’s Senate is considering a bill making it unlawful to use bots to mislead individuals about their artificial identities in order to influence their purchases or voting behaviour. The bill appears to be focused on a few specific use-cases, particularly social media bots. The proposed law would come into force in July 2019.
  Why it matters: This law points to an issue that will become increasingly important as AI systems’ ability to mimic humans improves. This received attention earlier this year when Google demonstrated their Duplex voice assistant mimicking a human to book appointments. After significant backlash, Google announced the system would make a verbal disclosure that it was an AI. Technological solutions will be important in addressing issues around AI identification, particularly since bad actors are unlikely to be concerned with lawfulness.
  Read more: California Senate Bill 1001.

OpenAI Bits & Pieces:

Digging into AI safety with Paul Christiano:
Ever wondered about technical solutions to AI alignment, what the long-term policy future looks like when the world contains intelligent machines, and how we expect machine learning to interact with science? Yes? Then check out this 80,000 hours podcast with Paul Christiano of OpenAI’s safety team.
  Read more: Dr Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem’, and his vision of how humanity will progressively hand over decision-making to AI systems.

Tech Tales:

The Day We Saw The Shadow Companies and Ran The Big Excel Calculation That Told Us Something Was Wrong.

A fragment of a report from the ‘Ministry of Industrial Planning and Analysis’, recovered following the Disruptive Event. See case file #892 for further information. Refer to [REDACTED] for additional context.

Aluminium supplier. Welder. Small PCB board manufacturer. Electronics contractor. Solar panel farm. Regional utility supplier. Mid-size drone designer. 3D world architect.

What do these things have in common? They’re all businesses, and they all have, as far as we can work out, zero employees. Sure, they employ some contractors to do some physical work, but mostly these businesses are run on a combination of pre-existing capital investments, robotic process automation, and the occasional short-term set of human hands.

So far, so normal. We get a lot of automated companies these days. What’s different about this is the density of trades between these companies. The more we look at their business records, the more intra-company activity we see.

One example: The PCB boards get passed to an electronics contractor which does… something… to them, then they get passed to a mid-size drone designer which does… something… to them, then a drone makes its way to a welder which does… something… to the drone, then the drone gets shipped to the utility supplier and begins survey flights of the utility field.

Another example: The solar panel gets shipped to the welder. Then the PCB board manufacturer ships something to the welder. Then out comes a solar panel with some boards on it. This gets shipped to the regional utility supplier which sub-contracts with the welder which comes to the site and does some welding at a specific location overseen by a modified drone.

None of these actions are illegal. And none of our automated algorithms pick these kinds of events up. It’s almost like they’re designed to be indistinguishable from normal businesses. But something about it doesn’t register right to us.

We have a tool we use. It’s called the human to capital ratio. Most organizations these days sit somewhere around 1:5. Big, intensive organizations, like oil companies, sit up around 1:25. When we analyze these companies individually we find that they sit right at the edges of normal distributions in terms of capital intensity. But when we perform an aggregate analysis out pops this number: 1:40.

We’ve checked and re-checked and we can’t bring the number down. Who owns these companies? Why do they have so much capital and so few humans? And what is it all driving towards?

Our current best theory, after some conversations with the people in the acronym agencies, is [REDACTED].

Things that inspired this story: Automated capitalism, “the blockchain”, hiding in plain sight, national economic metric measurement and analysis techniques, the reassuring tone of casework files.

Import AI 114: Synthetic images take a big leap forward with BigGANs; US lawmakers call for national AI strategy; researchers probe language reasoning via HotpotQA

by Jack Clark

Getting hip to multi-hop reasoning with HotpotQA:
New dataset and benchmark designed to test common sense reasoning capabilities…
Researchers with Carnegie Mellon University, Stanford University, the Montreal Institute for Learning Algorithms, and Google AI have created a new dataset and associated competition designed to test the capabilities of question answering systems. The new dataset, HotpotQA, is far larger than many prior datasets designed for such tasks, and has been designed to require ‘multi-hop’ reasoning, thereby testing the growing sophistication of newer NLP systems at performing increasingly complex cognitive tasks.
  HotpotQA consists of ~113,000 Wikipedia-based question-answer pairs. Answering these questions correctly is designed to test for ‘multi-hop’ reasoning – the ability for systems to look at multiple documents and perform basic iterative problem-solving to come up with correct answers. These questions were “collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents”. These workers also provide the supporting facts they use to answer these questions, providing a strong supervised training set.
  It’s the data, stupid: To develop HotpotQA the researchers needed to themselves create a kind of multi-hop pipeline to figure out which documents to give crowd workers to compose questions from. To do this, they mapped the Wikipedia hyperlink graph and used this information to build a directed graph, then tried to detect correspondences between pairs of linked articles. They also created a hand-made list of categories to use to pair up entities of the same type (eg, basketball players, etc).
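A rough sketch of the candidate-pair step described above, purely for illustration: build a directed graph from Wikipedia hyperlinks, then treat a linked pair of articles as a candidate for a two-hop question (the article names, links, and helper functions here are my own hypothetical stand-ins, not the paper's actual pipeline).

```python
# Hypothetical sketch: directed hyperlink graph -> candidate multi-hop pairs.
from collections import defaultdict

def build_link_graph(articles):
    """articles: dict mapping a title to the list of titles it links to."""
    graph = defaultdict(set)
    for title, links in articles.items():
        for target in links:
            if target in articles:  # keep only links that resolve in our corpus
                graph[title].add(target)
    return graph

def candidate_pairs(graph):
    """Yield (source, bridge) pairs: a question about `source` that needs a
    fact from `bridge` requires reading both documents (a two-hop candidate)."""
    for source, targets in graph.items():
        for bridge in sorted(targets):
            yield (source, bridge)

# toy corpus, not real dump data
articles = {
    "Radiohead": ["OK Computer", "Oxford"],
    "OK Computer": ["Radiohead"],
    "Oxford": [],
}
pairs = list(candidate_pairs(build_link_graph(articles)))
```

Crowd workers would then be shown both documents in a pair and asked to write a question whose answer requires reasoning over the two.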
  Testing: HotpotQA can be used to test models’ capabilities in different ways, ranging from information retrieval to question answering. The researchers train a system to give a baseline and the results show that the (relatively strong baseline) obtains performance significantly below that of a competent human across all tasks (with the exception of certain ‘supporting fact’ evaluations, in which it obtains performance on par with an average human).
  Why it matters: Natural language processing research is currently going through what some have called an ‘ImageNet moment’ following recent algorithmic developments relating to the usage of memory and attention-based systems, which have demonstrated significantly higher performance across a range of reasoning tasks compared to prior techniques, while also being typically much simpler. Like with ImageNet and the associated supervised classification systems, these new types of NLP approaches require larger datasets to be trained on and evaluated against, and as with ImageNet it’s likely that by scaling up techniques to take on challenges defined by datasets like HotpotQA progress in this domain will increase further.
  Caveat: As with all datasets with an associated competitive leaderboard it is feasible that HotpotQA could be relatively easy and systems could end up exceeding human performance against it in a relatively short amount of time – this happened over the past year with the Stanford SQuAD dataset. Hopefully the relatively higher sophistication of HotpotQA will protect against this.
  Read more: HotpotQA website with leaderboard and data (HotpotQA Github).
  Read more: HOTPOTQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Arxiv).

Administrative note regarding ICLR papers:
This week was the deadline for submissions for the International Conference on Learning Representations. These papers are published under a blind review process as they are currently under review. This year, there were 1600 submissions to ICLR, up from 1000 in 2017, 500 in 2016, and 250 in 2015. I’ll be going through some of these papers in this issue and others and will try to avoid making predictions about which organizations are behind which papers so as to respect the blind review process.

Computers can now generate (some) fake images that are indistinguishable from real ones:
BigGANs show significant progression in capabilities in synthetic imagery…
The researchers train GAN models with 2-4X the parameters and 8X the batch size compared to prior papers, and also introduce techniques to improve the stability of GAN training.
  Some of the implemented techniques mean that samples generated by such GAN models can be tuned, allowing for “explicit, fine-grained control of the trade-off between sample variety and fidelity”. What this means in practice is that you can ‘tune’ how similar the types of generated images are to specific sets of images within the dataset, so for instance if you wanted to generate an image of a field containing a pond you might pick a few images to prioritize in training that contain ponds, whereas if you wanted to also tune the generated size of the pond you might pick images containing ponds of various sizes. The addition of this kind of semantic dial seems useful to me, particularly for using such systems to generate faked images with specific constraints on what they depict.
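One concrete mechanism behind this kind of variety/fidelity dial is sampling the generator's latent vector from a truncated normal distribution, resampling coordinates that fall outside a threshold; a smaller threshold pushes samples toward the mode (higher fidelity, lower variety). Below is a minimal NumPy sketch of that sampling step only – the generator itself, and how the paper actually wires this in, are out of scope.

```python
# Minimal sketch of truncated latent sampling for a variety/fidelity trade-off.
import numpy as np

def truncated_latent(dim, threshold, rng):
    """Sample z ~ N(0, I), resampling any coordinate with |z_i| > threshold.
    Smaller thresholds concentrate samples near the mode of the latent prior."""
    z = rng.standard_normal(dim)
    while np.any(np.abs(z) > threshold):
        mask = np.abs(z) > threshold
        z[mask] = rng.standard_normal(int(mask.sum()))
    return z

rng = np.random.default_rng(0)
z = truncated_latent(dim=128, threshold=0.5, rng=rng)  # low-variety regime
```

A generated image would then come from feeding `z` to the generator; sweeping the threshold sweeps the trade-off between sample fidelity and diversity.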
  Image quality: Images generated via these GANs are of far superior quality to those from prior systems, and can be output at relatively large resolutions of 512x512 pixels. I encourage you to take a look at the paper and judge for yourself, but it’s evident from the (cherry-picked) samples that given sufficient patience a determined person can now generate photoreal faked images as long as they have a precise enough set of data from which to train on.
  Problems remain: There are still some drawbacks to the approach; GANs are notorious for their instability during training, and developers of such systems need to develop increasingly sophisticated approaches to deal with the instabilities in training that manifest at increasingly larger scales, leading to a certain time-investment tradeoff inherent to the scale-up process. The researchers do devise some tricks to deal with this, but they’re quite elaborate. “We demonstrate that a combination of novel and existing techniques can reduce these instabilities, but complete training stability can only be achieved at a dramatic cost to performance,” they write.
  Why it matters: One of the most interesting aspects of the paper is how simple the approach is: take today’s techniques, try to scale them up, and conduct some targeted research into dealing with some of the rough edges of the problem space. This seems analogous to recent work on scaling up algorithms in RL, where both DeepMind and OpenAI have developed increasingly large-scale training methodologies paired with simple scaled-up algorithms (eg DQN, PPO, A2C, etc).
  “We find that current GAN techniques are sufficient to enable scaling to large models and distributed, large-batch training. We find that we can dramatically improve the state of the art and train models up to 512×512 resolution without need for explicit multiscale methods,” the researchers write.
  Read more: Large Scale GAN Training For High Fidelity Natural Image Synthesis (ICLR 2018 submissions, OpenReview).
  Check out the samples: Memo Akten has pulled together a bunch of interesting and/or weird samples from the model here, which are worth checking out (Memo Akten, Twitter).

Want better RL performance? Try remembering what you’ve been doing recently:
…Recurrent Replay Distributed DQN (R2D2) obtains state-of-the-art on Atari & DMLab by a wide margin…
R2D2 is based on a tweaked version of Ape-X, a large-scale reinforcement learning system developed by DeepMind which displays good performance and sample efficiency when trained at large-scale. Ape-X uses prioritized distributed replay, using a single learner to learn from the experience of numerous distinct actors (typically 256).
  New tricks for old algos: The researchers implement two relatively simple strategies to help them train the R2D2 algorithm to be smarter about how it uses its memory to learn more complex problem-solving strategies. These tweaks are to store the recurrent state in the replay buffer and use it to initialize the network at training time, and “allow the network a ‘burn-in period’ by using a portion of the replay sequence only for unrolling the network and producing a start state, and update the network only on the remaining part of the sequence.”
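The two tweaks above can be shown in a toy sketch: each replay sequence stores the recurrent state the actor had when the sequence began, and at training time the first `burn_in` steps are only used to unroll the network to a warm start, with learning happening on the remainder. Everything here is a stand-in (the "RNN" is a running sum, not an LSTM) to make the control flow visible, not DeepMind's implementation.

```python
# Toy sketch of R2D2-style stored-state replay with a burn-in period.

def unroll(state, observations):
    """Stand-in recurrent step: the hidden state is a running sum of inputs."""
    states = []
    for obs in observations:
        state = state + obs
        states.append(state)
    return states

def training_states(sequence, burn_in):
    """sequence: dict with 'init_state' (recurrent state stored in the replay
    buffer alongside the data) and 'obs' (the stored observation sequence).
    The first `burn_in` steps only warm up the state and contribute no loss."""
    states = unroll(sequence["init_state"], sequence["obs"])
    return states[burn_in:]

seq = {"init_state": 10, "obs": [1, 2, 3, 4]}
learn_on = training_states(seq, burn_in=2)  # loss computed on these steps only
```

The alternative the paper argues against would be initializing `init_state` to zero at training time, which (per the ablations discussed below) limits how well the agent learns to use its memory.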
  Results: R2D2 obtains vastly higher scores than any prior system on these tasks, and, via the large-scale training, can achieve ~1300% human-normalized scores on Atari (a median over 57 games, so it does even better on some, and substantially worse on others). The system also displays extremely good performance compared to prior systems in tests on DMLab-30, a set of 3D environments for training agents which is designed to be more difficult than Atari.
  It’s all in the memory: The system does well here on some fairly difficult environments, and notably the authors show via some ablation studies that the agent does appear to be using its in-built memory to solve tasks. “We first observe that restricting the agent’s memory gradually decreases its performance, indicating its nontrivial use of memory on both domains. Crucially, while the agent trained with stored state shows higher performance when using the full history, its performance decays much more rapidly than for the agent trained with zero start states. This is evidence that the zero start state strategy, used in past RNN-based agents with replay, limits the agent’s ability to learn to make use of its memory. While this doesn’t necessarily translate into a performance difference (like in MS.PACMAN), it does so whenever the task requires an effective use of memory (like EMSTM WATERMAZE),” they write.
  Read more: Recurrent Experience Replay In Distributed Reinforcement Learning (ICLR 2018 submissions, OpenReview).

US lawmakers call for national AI strategy and more funding:
…The United States cannot maintain its global leadership in AI absent political leadership from Congress and the Executive Branch…
Lawmakers from the US’s Subcommittee on Information Technology of the House Committee on Oversight and Government Reform have called for the creation of a national strategy for artificial intelligence led by the current administration, as well as more funding for basic research.
  The comments from Chairman Will Hurd and Ranking Member Robin Kelly are the result of a series of three hearings held by that committee in 2018 (Note: I testified at one of them). It’s a short paper and worth reading in full to get a sense of what policymakers are thinking with regard to AI.
  Notable quotes: “The United States cannot maintain its global leadership in AI absent political leadership from Congress and the Executive Branch.” + Government should “increase federal spending on research and development to maintain American leadership with respect to AI” + “It is critical the federal government build upon, and increase, its capacity to understand, develop, and manage the risks associated with this technology’s increased use” + “American competitiveness in AI will be critical to ensuring the United States does not lose any decisive cybersecurity advantage to other nationstates”.
  China: China looms large in the report as a symbol that “the United States’ leadership in AI is no longer guaranteed”. One analysis contained within the paper says China is likely “to pass the United States in R&D investments” by the end of 2018 – significant, considering that the US’s annual outlay of approximately $500 billion makes it the biggest spender on the planet.
  Measurement: The report suggests that “at minimum” the government should develop “a widely agreed upon standard for measuring the safety and security of AI products and applications” and notes the existence of initiatives like The AI Index as good starts.
  Money: “There is a need for increased funding for R&D at agencies like the National Science Foundation, National Institutes of Health, Defense Advanced Research Project Agency, Intelligence Advanced Research Project Agency, National Institute of Standards and Technology, Department of Homeland Security, and National Aeronautics and Space Administration. As such, the Subcommittee recommends the federal government provide for a steady increase in federal R&D spending. An additional benefit of increased funding is being able to support more graduate students, which could serve to expand the future workforce in AI.”
  Leadership: “There is also a pressing need for conscious, direct, and spirited leadership from the Trump Administration. The 2016 reports put out by the Obama Administration’s National Science and Technology Council and the recent actions of the Trump Administration are steps in the right direction. However, given the actions taken by other countries—especially China— Congress and the Administration will need to increase the time, attention, and level of resources the federal government devotes to AI research and development, as well as push for agencies to further build their capacities for adapting to advanced technologies.”
  Read more: Rise of the Machines: Artificial Intelligence and its Growing Impact on US Policy (Homeland Security Digital Library).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Open Philanthropy Project opens applications for AI Fellows:
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is accepting applications for its 2019 AI Fellows Program. The program will provide full PhD funding for AI/ML researchers focused on the long-term impacts of advanced AI systems. The first cohort of AI Fellows were announced in June of this year.
  Key details: “Support will include a $40,000 per year stipend, payment of tuition and fees, and an additional $10,000 in annual support for travel, equipment, and other research expenses. Fellows will be funded from Fall 2019 through the end of the 5th year of their PhD, with the possibility of renewal for subsequent years. We do encourage applications from 5th-year students, who will be supported on a year-by-year basis.”
Read more: Open Philanthropy Project AI Fellows Program (Open Phil).
Read more: Announcing the 2018 AI Fellows (Open Phil).

Google confirms Project Dragonfly in Senate:
Google have confirmed the existence of Project Dragonfly, an initiative to build a censored search engine within China, as part of Google’s broad overture towards the world’s second largest economy. Google’s chief privacy officer declined to give any details of the project, and denied the company was close to launching a search engine in the country. A former senior research scientist, who publicly resigned over Dragonfly earlier this month, had written to Senators ahead of the hearings, outlining his concerns with the plans.
  Why it matters: Google is increasingly fighting a battle on two fronts with regards to Dragonfly, with critics concerned about the company’s complicity in censorship and human rights abuses, and others suspicious of Google’s willingness to cooperate with the Chinese government so soon after pulling out of a US defense project (Maven).
  Read more: Google confirms Dragonfly in Senate hearing (VentureBeat).
  Read more: Former Google scientist slams ‘unethical’ Chinese search project in letter to senators (The Verge).

DeepMind releases framework for AI safety research:
AI company also launches new AI safety blog…
DeepMind’s safety team have launched their new blog with a research agenda for technical AI safety research. They divide the field into three areas: specification, robustness, and assurance.
  Specification research is aimed at ensuring an AI system’s behavior aligns with the intentions of its operator. This includes research into how AI systems can infer human preferences, and how to avoid problems of reward hacking and wire-heading.
  Robustness research is aimed at ensuring a system is robust to changes in its environment. This includes designing systems that can safely explore new environments and withstand adversarial inputs.
  Assurance research is aimed at ensuring we can understand and control AI systems during operation. This includes issues research into interpretability of algorithms, and the design of systems that can be safely interrupted (e.g. off-switches for advanced AI systems).
  Why it matters: This is a useful taxonomy of research directions that will hopefully contribute to a better understanding of problems in AI safety within the AI/ML community. DeepMind has been an important advocate for safety research since its inception. It is important to remember that AI safety is still dwarfed by AI capabilities research by several orders of magnitude, in terms of both funding and number of researchers.
  Read more: Building Safe Artificial Intelligence (DeepMind via Medium).

OpenAI Bits & Pieces:

OpenAI takes on Dota 2: Short Vice documentary:
As part of our Dota project we experimented with new forms of comms, including having a doc crew from Vice film us in the run-up to our competition at The International.
  Check out the documentary here: This Robot is Beating the World’s Best Video Gamers (Vice).

Tech Tales:

They call the new drones shepherds. We call them prison guards. The truth is somewhere in-between.

You can do the math yourself. Take a population. Get the birth rate. Project over time. That’s the calculus the politicians did that led to them funding what they called the ‘Freedom Research Initiative to Eliminate Negativity with Drones’ (FRIEND).

FRIEND provided scientists with a gigantic bucket of money to fund research into creating more adaptable drones that could, as one grant document stated, ‘interface in a reassuring manner with ageing citizens’. The first FRIEND drones were like pet parrots, and they were deployed into old people’s homes in the hundreds of thousands. Suddenly, when you went for a walk outside, you were accompanied by a personal FRIEND-Shepherd which would quiz you about the things around you to stave off age-based neurological decline. And when you had your meals there was now a drone hovering above you, scanning your plate, and cheerily exclaiming “that’s enough calories for today!” when it had judged you’d eaten enough.

Of course we did not have to do what the FRIEND-Shepherds told us to do. But many people did and for those of us who had distaste for the drones, peer pressure did the rest. I tell myself that I am merely pretending to do what my FRIEND-Shepherd says, as it takes me on my daily walk and suggests the addition or removal of specific ingredients from my daily salad to ‘maintain optimum productivity via effective meal balancing’.

Anyway, as the FRIEND program continued the new Shepherds became more and more advanced. But people kept on getting older and birth rates kept on falling; the government couldn’t afford to buy more drones to keep up with the growing masses of old people, so it directed FRIEND resources towards increasing the autonomy and, later, ‘persuasiveness’ of such systems.

Over the course of a decade the drones went from parrots to pop psychologists with a penchant for nudge economics. Now, we’re still not “forced” to do anything by the Shepherds, but the Shepherds are very intelligent and much of what they spend their time doing is finding out what makes us tick so they can encourage us to do the thing that extends lifespan while preserving quality of life.

The Shepherd assigned to me and my friends has figured out that I don’t like Shepherds. It has started to learn to insult me, so that I chase it. Sometimes it makes me so angry that I run around the home, trying to knock it out of the air with my walking stick. “Well done,” it will say after I am out of breath. “Five miles, not bad for a useless human.” Sometimes I will then run at it again, and I believe I truly am running at it because I hate it and not because it wants me to. But do I care about the difference? I’m not sure anymore.

Things that inspired this story: Drones, elderly care robots, the cruel and inescapable effects of declining fertility in developed economies, JG Ballard, Wall-E, social networks, emotion-based AI analysis systems, NLP engines, fleet learning with individual fine-tuning.

Import AI 113: Why satellites+AI gives us a global eye; industry pays academia to say sorry for strip-mining it; and Kindred researchers seek robot standardization

by Jack Clark

Global eye: Planet and Orbital Insight expand major satellite imagery deal:
…The future of the world is a globe-spanning satellite-intelligence utility service…
Imagine what it’s like to be working in a medium-level intelligence agency in a mid-size country when you read something like this: “Planet, who operates the largest constellation of imaging satellites, and Orbital Insight, the leader in geospatial analytics, announced today a multi-year contract for Orbital Insight to source daily, global, satellite imagery from Planet”. I imagine that you might think: ‘wow! That looks a lot like all those deals we have to do secretly with other mid-size countries to access each other’s imagery. And these people get to do it in the open!?’ Your next thought might be: how can I buy services from these companies to further my own intelligence capabilities?
  AI + Intelligence: The point I’m making is that artificial intelligence is increasingly relevant to the sorts of tasks that intelligence agencies traditionally specialize in, but with the twist that lots of these intelligence-like tasks (say, automatically counting the cars in a set of parking lots across a country, or analyzing congested-versus-non-congested roads in other cities, or homing in on unusual ships in unusual waters) are now available in the private sector as well. This general diffusion of capabilities is creating many commercial and scientific benefits, but it is also narrowing the gap in capability between what people can buy versus what people can only access if they are a nuclear-capable power with a significant classified budget and access to a global internet dragnet. Much of the stability of the 20th century was derived from there being (eventually) a unipolar world in geopolitical terms, with much of this stemming from inbuilt technological advantages. The ramifications of this diffusion of capability are intimately tied up with issues relating to the ‘dual-use’ nature of AI and to the changing nature of geopolitics. I hope deals like the above provoke further consideration of just how powerful – and how widely available – modern AI systems are.
  Read more: Planet and Orbital Insight Expand Satellite Imagery Partnership (Cision PR Newswire).

Robots and Standards are a match made in hell, but Kindred thinks it doesn’t have to be this way:
…New robot benchmarks seek to bring standardization to a tricky area of AI…
Researchers with robotics startup Kindred have built on prior work on robot standardization (Import AI #87), trying to make it easier for researchers to compare the performance of real-world robots against one another by creating a suite of two tasks for each of three commercially available robot platforms.
  Robots used: Universal Robots UR5 collaborative arm, Robotis MX-64AT Dynamixel actuators (which are frequently used within other robots), and a hockeypuck-shaped Create2 mobile robot.
  Standard tasks: For the UR5 arm the researchers create two reaching tasks with varying difficulty achieved by selectively turning on/off different actuators on the robot to scale complexity. For the DXL actuator they create a reacher task and also a tracking task; tracking requires that the DXL precisely track a moving target. For the Create2 robot they test it in two ways: movement, where it needs to move forward as fast as possible in a closed arena, and docking, in which the task is to dock to a charging station attached to one of the walls within the arena.
  Algorithmic baselines: The researchers also use their benchmarking suite to compare multiple widely used AI algorithms against each other, including TRPO and PPO, DDPG, and Soft-Q. By using standard tasks it’s easier for the researchers to compare the effects of things like hyperparameter choices on different algorithms, and by having these tasks take place on real world robot platforms, it’s possible to get a sense of how well these algorithms deal with the numerous difficulties involved in reality.
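The benchmarking protocol amounts to a simple grid: every algorithm runs on every standard task over the same set of seeds, so comparisons share one measurement procedure. A minimal harness sketch, where the tasks and "algorithms" are hypothetical stand-in callables rather than the real UR5/DXL/Create2 setups:

```python
# Illustrative benchmark harness: each (algorithm, task) cell is the mean
# return over a shared set of random seeds.
import statistics

def run_benchmark(algorithms, tasks, seeds):
    results = {}
    for algo_name, algo in algorithms.items():
        for task_name, task in tasks.items():
            returns = [algo(task, seed) for seed in seeds]
            results[(algo_name, task_name)] = statistics.mean(returns)
    return results

# stand-ins: a "task" is just a difficulty scalar, an "algorithm" a callable
tasks = {"reach_easy": 1.0, "reach_hard": 0.5}
algorithms = {
    "ppo_like": lambda task, seed: task * 100 - seed,
    "ddpg_like": lambda task, seed: task * 90 - seed,
}
results = run_benchmark(algorithms, tasks, seeds=[0, 1, 2])
```

On real robots each cell of this grid is hours of wall-clock time, which is exactly why the authors report 950+ hours of robot usage (see the "Time and robots" note below).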
  Drawbacks: One drawback of these tasks is that they’re very simple: OpenAI recently showed how to scale PPO to let us train a robot to perform robust dextrous manipulation of a couple of simple objects, which involved having to learn to control a five-digit robot hand; by comparison, these tasks involve robot platforms with a significantly smaller number of dimensions of movement, making the tasks significantly easier.
  Time and robots: One meta-drawback with projects like this is that they involve learning on the robot platform, rather than learning in a mixture of simulated and real environments – this makes everything take an extraordinarily long time. For this paper, the authors “ran more than 450 independent experiments which took over 950 hours of robot usage in total,” they noted.
  Why it matters: For AI to substantively change the world it’ll need to be able to not just flip bits, but flip atoms as well. Today, some of that is occurring by connecting up AI-driven systems (for instance, product recommendation algorithms) to e-retail systems (eg Amazon), which let AI play a role in recommending courses of action to systems that ultimately go and move some mass around the world. I think for AI to become even more impactful we need to cut out the middle step and have AI move mass itself – so connecting AI to a system of sensors and actuators like a robot will eventually yield a direct-action platform for AI systems; my hypothesis is that this will dramatically increase the range of situations we can deploy learning algorithms into, and will thus hasten their development.
  Read more: Benchmarking Reinforcement-Learning Algorithms on Real-World Robots (Arxiv).

AI endowments at University College London and the University of Toronto:
Appointments see industry giving back to the sector it is strip-mining (with the best intentions)…
  DeepMind is funding an AI professorship as well as two post-doctoral researchers and one PhD student at University College London. “We are delighted by this opportunity to further develop our relationship with DeepMind,” said John Shawe-Taylor, head of UCL’s Department of Computer Science.
  Uber is investing “more than $200 million” into Toronto and also its eponymous university. This investment is to fund self-driving car research at the University of Toronto, and for Uber to set up its first-ever engineering facility in Canada.
  Meanwhile, LinkedIn co-founder Reid Hoffman has gifted $2.45 million to the University of Toronto’s ‘iSchool’ to “establish a chair to study how the new era of artificial intelligence (AI) will affect our lives”.
  Why it matters: Industry is currently strip-mining academia for AI talent, constantly hiring experienced professors and post-docs (and some of the most talented PhD students), leading to a general brain drain from academia. Without action by industry like this to even the balance, there’s a risk of degrading AI education to the point that industry runs into problems.
  Read more: New DeepMind professorship at UCL to push frontiers of AI (UCL).
  Read more: LinkedIn founder Reid Hoffman makes record-breaking gift to U of T’s Faculty of Information for chair in AI (UofT News).

Learning the task is so last year. Now it’s all about learning the algorithm:
…Model-Based Meta-Policy-Optimization shows sample efficiency of meta-learning (if coaxed along with some clever human-based framing of the problem)…
Researchers with UC Berkeley, OpenAI, Preferred Networks, and the Karlsruhe Institute of Technology (KIT) have developed model-based meta-policy-optimization, a meta-learning technique that lets AI agents generalize to more unfamiliar contexts. “While traditional model-based RL methods rely on the learned dynamics models to be sufficiently accurate to enable learning a policy that also succeeds in the real world, we forego reliance on such accuracy,” the researchers write. “We are able to do so by learning an ensemble of dynamics models and framing the policy optimization step as a meta-learning problem. Meta-learning, in the context of RL, aims to learn a policy that adapts fast to new tasks or environments”. The technique builds upon model-agnostic meta-learning (MAML).
  How it works: MB-MPO works like most meta-learning algorithms – it treats environments as distinct bits of data to learn from, collects data from the world, uses this data to not only learn to complete the task but also learn about what trajectories yield rapid task completion, then eventually learns a predictive model of good traits about its successful policies and uses this to drive the inner-loop policy gradient adaption, which lets it meta-learn adaptation to new environments.
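A very rough sketch of that loop, under heavy simplifying assumptions: fit an ensemble of dynamics models, treat each model as a separate "task" for a MAML-style meta-policy, take an inner-loop adaptation step per model, and use the post-adaptation results to drive the outer meta-update. Everything here is scalar toy math (no real dynamics models or policies) whose only purpose is to show the nested control flow, not the paper's actual objective.

```python
# Toy MB-MPO-style control flow: ensemble of models -> inner adaptation per
# model -> outer meta-update averaged across the ensemble.

def fit_ensemble(data, n_models):
    # stand-in: each "model" is a slightly perturbed estimate of the dynamics
    mean = sum(data) / len(data)
    return [mean + 0.1 * i for i in range(n_models)]

def adapt(theta, model, lr=0.5):
    # inner loop: one policy-gradient-style step against a single model
    grad = theta - model            # toy gradient pulling theta toward model
    return theta - lr * grad

def meta_step(theta, models, meta_lr=0.1):
    # outer loop: aggregate post-adaptation updates across the ensemble
    adapted = [adapt(theta, m) for m in models]
    meta_grad = sum(theta - a for a in adapted) / len(models)
    return theta - meta_lr * meta_grad

models = fit_ensemble(data=[1.0, 2.0, 3.0], n_models=3)
theta = meta_step(theta=0.0, models=models)
```

The key structural point is that disagreement across the ensemble plays the role of task variation, so the meta-policy learns to adapt quickly rather than relying on any single model being accurate.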
  Results: Using MB-MPO the researchers can “learn an optimal policy in high-dimensional and complex quadrupedal locomotion within two hours of real-world data. Note that the amount of data required to learn such policy using model-free methods is 10X – 100X higher, and, to the best knowledge of the authors, no prior model-based method has been able to attain the model-free performance in such tasks.” In tests on a variety of simulated robotic baselines the researchers show that “MB-MPO is able to match the asymptotic performance of model-free methods with two orders of magnitude less samples.” The algorithm also performs better than two model-based approaches it was compared against.
  Why it matters: Meta-learning is part of an evolution within AI of having researchers write fewer and fewer elements of a system. DeepMind’s David Silver has a nice summary of this from a recent presentation, where he describes the difference between deep learning and meta learning as the difference between learning features and predictions end-to-end, and learning the algorithm and features and predictions end-to-end.
   Read more: Model-Based Reinforcement Learning via Meta-Policy Optimization (Arxiv).
  Check out David Silver’s slides here: Principle 10, Learn to Learn (via Seb Ruder on Twitter).

People are pessimistic about automation and many expect their jobs to be automated:
…Large-scale multi-country Pew Research survey reveals deep, shared anxieties around AI and automation…
A majority of people in ten countries think it is probable that within 50 years computers will do much of the work currently done by humans. That finding comes from a large-scale survey conducted by Pew to assess attitudes towards automation. Across the surveyed countries, majorities of respondents think that if computers end up doing much of the work that is today done by humans then:
– People will have a hard time finding jobs.
– The inequality between the rich and poor will be much worse than it is today.
  Minority views: A minority of those surveyed think the above occurrence would lead to “new, better paying jobs”, and a minority (except in Poland, Japan, and Hungary) believe this would make the economy more efficient.
  Notable data: There are some pretty remarkable differences in outlook between countries in the survey: 15% of surveyed Americans think robots and computers will “definitely” do the majority of work within fifty years, compared to 52% of Greeks.
  Data quirk: The data for this survey is split across two time periods: the US was surveyed in 2015, while the other nine countries were surveyed between mid-May and mid-August of 2018, so it’s possible the American results may have changed since then.
  Read more: In Advanced and Emerging Economies Alike, Worries about Job Automation (Pew Research Center).

Chinese President says AI’s power should motivate international collaboration:
…Xi Jinping, the President of the People’s Republic of China, says AI has high stakes at opening of technology conference…
Chinese President Xi Jinping has said in a letter that AI’s power should motivate international collaboration. “To seize the development opportunity of AI and cope with new issues in fields including law, security, employment, ethics and governance, it requires deeper international cooperation and discussion,” said Xi in the letter, according to official state news service Xinhua.
  Read more: China willing to share opportunities in digital economy: Xi (Xinhua).

Tencent researchers take on simplified StarCraft 2 and beat all levels of the in-game AI:
…A few handwritten heuristics go a long way…
Researchers with Tencent have trained an agent to beat the in-game AI at StarCraft 2, a complex real-time strategy game. StarCraft is a game with a long history within AI research – one of the longer-running game AI competitions has been based around StarCraft – and has recently been used by Facebook and DeepMind as a testbed for reinforcement learning algorithms.
  What they did: The researchers developed two AI agents, TSTARBOT1 and TSTARBOT2, both of which were able to beat all ten difficulty levels of the in-game AI in SC2 when playing a restricted 1vs1 map (Zerg-v-Zerg, AbyssalReef). This achievement is somewhat significant given that “level 8, level 9, and level 10 are cheating agents with full vision on the whole map, with resource harvest boosting”, and that according to some players the “level 10 built-in AI is estimated to be… equivalent to top 50% – 30% human players”.
  How they did it: First, the researchers forked and modified the PySC2 software environment to make greater game state information available to the AI agents, such as information about the location of all units at any point during the game. They also add in some rule-based systems, like building a specific technology tree that telegraphs the precise dependencies of each technology to the AI agents. They then develop two different bots to play the game, which have different attributes: TSTARBOT1 is “based on deep reinforcement learning over flat actions”, and TSTARBOT2 is “based on rule controllers over hierarchical actions”.
 How they did it: TSTARBOT1: This bot uses 165 distinct hand-written macro actions to help it play the game. These include things like “produce drone”, “build roach warren”, and “upgrade tech A”, as well as various combat actions. The purpose of these macros is to bundle together the discrete actions that need to be taken to achieve things (eg, to build something you need to move the camera, select a worker, select a point on the screen, place the building, etc) so that the AI doesn’t need to learn these sequences itself. This means that some chunks of the bot are rule-based rather than learned (similar to the 2017 1v1 version of OpenAI’s Dota bot). Though this design hides some of the sophistication of the game, that is somewhat ameliorated by the researchers using a sparse reward structure which only delivers a reward to the agent (1 for a win, 0 for a tie, -1 for a loss) at the end of the game. They test this design by implementing two core reinforcement learning algorithms: Proximal Policy Optimization and Dueling Double Deep Q-Learning.
  How they did it: TSTARBOT2: This bot extends the work in the original one by creating a hierarchy of two types of actions: macro actions and micro actions. By implementing a hierarchy the researchers make it easier for RL algorithms to discover the appropriate actions to take at different points in time. This hierarchy is further defined via the creation of specific modules, like ones for combat or production, which themselves contain additional sub-modules with sub-behaviors.
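The bundling described above can be pictured as a function that expands one macro action into its constituent micro actions. A hypothetical sketch – the action names are illustrative and this is not the PySC2 or TStarBots API:

```python
# Sketch of the "macro action" idea: bundle the sequence of low-level
# micro actions needed to achieve a goal (e.g. constructing a building)
# into a single callable the RL agent can select as one action, so the
# agent never has to learn the sequence itself.

def macro_build_roach_warren(issued_actions):
    # each appended string stands in for a micro action the agent
    # would otherwise have to discover and sequence on its own
    issued_actions.append("move_camera_to_base")
    issued_actions.append("select_worker")
    issued_actions.append("select_build_point")
    issued_actions.append("place_roach_warren")
    return issued_actions

actions = macro_build_roach_warren([])
print(len(actions))  # one macro action expands into four micro actions
```

TSTARBOT2's hierarchy takes this a step further: instead of a flat list of 165 macros, macros are grouped into modules (combat, production, etc.) that the controller picks between before picking an action within the module.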
  Results: The researchers show that TSTARBOT1 can consistently beat levels 1-4 of the in-game AI when using PPO (this drops slightly for DDQN), then has ~99% success on levels 5-8, then ~97% success on level 9, and ~81% success on level 10. TSTARBOT2, by comparison, surpasses these scores, obtaining a win rate of 90% against the L10 AI. They also carried out some qualitative tests against humans and found that their systems were able to win some games against human players, but not convincingly.
  Scale: The distributed system used for this research consisted of a single GPU and roughly 3,000 CPUs spread across around 80 machines, demonstrating the significant amounts of hardware required to carry out AI research on environments such as this.
  Why it matters: Existing reinforcement learning benchmarks like the Atari corpus are too easy for many algorithms, with modern systems typically able to beat the majority of games on this system. Newer environments, like Dota2 and StarCraft 2, scale up the complexity enough to challenge the capabilities of contemporary algorithms. This research, given all the hand-tuning and rule-based systems required to let the bots learn enough to play at all, shows that SC2 may be too hard for today’s existing algorithms without significant modifications, further motivating research into newer systems.
  Read more: TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game (Arxiv).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

AI leads to a more multipolar world, says political science professor:
Michael Horowitz, a professor of political science and associate director of Perry World House at the University of Pennsylvania, argues that AI could favor smaller countries, in contrast to the technological developments that have made the US and China the world’s superpowers. Military uses of AI could allow countries to catch up with the US and China, he says, citing the lower barriers to building military AI systems compared with traditional military hardware, such as fighter jets.
  Why it matters: An AI arms race is a bad outcome for the world, insofar as it encourages countries to prioritize capabilities over safety and robustness. It’s unclear whether a race between many parties would be better than a classic arms race. I’m not convinced of Horowitz’s assessment that the US and China are likely to be overtaken by smaller countries. While AI is certainly different to traditional military systems, the world’s military superpowers have both the resources and incentives to seek to sustain their lead.
  Read more: The Algorithms of August (Foreign Policy).

Tech Tales:

And so we all search for signals given to us by strange machines, hunting rats between buildings, searching for nests underground, operating via inferred maps and the beliefs of something we have built but do not know.

The rats are happy today. I know this because the machine told me. It detected them waking up and, instead of emerging into the city streets, going to a cavern in their underground lair where – it predicts with 85% confidence – they proceeded to copulate and therefore produce more rats. The young rats are believed – 75% confidence – to feed on a mixture of mother-rat-milk, along with pizza and vegetables stolen from the city they live beneath. Tomorrow the rats will emerge (95%) and the likelihood of electrical outages from chewed cables will increase (+10%) as well as the need to contract more street cleaning to deal with their bodies (20%).

One day we’ll go down there, to those rat warrens that the machine has predicted must exist, and we will see what they truly do. But for now we operate our civil services on predictions made by our automated AI systems. There is an apocryphal story we tell of civil workers being led to caverns that contain only particularly large clumps of mold (rat lair likelihood prediction: 70%) or to urban-river-banks that contain a mound of skeletons, gleaming under moonlight (rat breeding ground: 60%; in fact, a place of mourning). But there are also stories of people going to a particular shuttle on a rarely-used steamroller and finding a rat nest (prediction: 80%) and of people going to the roof of one of the tallest buildings in the city and finding there a rat boneyard (prediction: 90%).

Because of the machine’s overall efficiency there are calls for it to be rolled out more widely. We are currently considering adding in other urban vermin, like pigeons and raccoons and, at the coasts, seabirds. But what I worry about is when they might turn such a system on humans. What does AI-augmented human management look like? What might it predict about us?

Things that inspired this story: Rats, social control via AI, glass cages, reinforcement learning, RL-societies, adaptive bureaucracy.

Import AI 112: 1 million free furniture models for AI research, measuring neural net drawbacks via studying hallucinations, and DeepMind boosts transfer learning with PopArt

by Jack Clark

When is a door not a door? When a computer says it is a jar!
…Researchers analyze neural network “hallucinations” to create more robust systems…
Researchers with the University of California at Berkeley and Boston University have devised a new way to measure how neural networks sometimes generate ‘hallucinations’ when attempting to caption images: “Image captioning models often ‘hallucinate’ objects that may appear in a given context, like e.g. a bench here.” Developing a better understanding of why such hallucinations occur – and how to prevent them – is crucial to the development of more robust and widely used AI systems.
  Measuring hallucinations: The researchers propose ‘CHAIR’ (Caption Hallucination Assessment with Image Relevance) as a way to assess how well systems generate captions in response to images. CHAIR calculates what proportion of generated words correspond to the contents of an image, according to the ground truth sentences and the output of object segmentation and labelling algorithms. So, for example, in a picture of a small puppy in a basket, you would give a system fewer points for giving the label “a small puppy in a basket with cats”, compared to “a small puppy in a basket”. In evaluations they find that on one test set “anywhere between 7.4% and 17.5% include a hallucinated object”.
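The per-caption calculation can be sketched simply. This is a hypothetical implementation of the core idea only – the real CHAIR metric builds its object sets from ground-truth sentences plus segmentation and labelling outputs, and reports both per-instance and per-sentence variants:

```python
# Minimal sketch of the CHAIR idea: what fraction of the objects
# mentioned in a generated caption do not actually appear in the image?

def chair_instance_score(caption_objects, ground_truth_objects):
    """Fraction of mentioned objects that are hallucinated (CHAIR-i style)."""
    if not caption_objects:
        return 0.0
    hallucinated = [o for o in caption_objects if o not in ground_truth_objects]
    return len(hallucinated) / len(caption_objects)

# "a small puppy in a basket with cats" against an image that contains
# only a puppy and a basket: "cats" is hallucinated, so 1 of 3 objects.
score = chair_instance_score({"puppy", "basket", "cats"}, {"puppy", "basket"})
print(score)
```

A lower score is better; the 7.4%-17.5% figure above is the sentence-level analogue (the share of captions containing at least one hallucinated object).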
  Strange correlations: Analyzing what causes these hallucinations is difficult. For instance, the researchers note that “we find no obvious correlation between the average length of the generated captions and the hallucination rate”. There is some more correlation among hallucinated objects, though. “Across all models the super-category Furniture is hallucinated most often, accounting for 20-50% of all hallucinated objects. Other common super-categories are Outdoor objects, Sports and Kitchenware,” they write. “The dining table is the most frequently hallucinated object across all models”.
  Why it matters: If we are going to deploy lots of neural network-based systems into society then it is crucial that we understand the weaknesses and pathologies of such systems; analyses like this give us a clearer notion of the limits of today’s technology and also indicate lines of research people could pursue to increase the robustness of such systems. “We argue that the design and training of captioning models should be guided not only by cross-entropy loss or standard sentence metrics, but also by image relevance,” the researchers write.
  Read more: Object Hallucination in Image Captioning (Arxiv).

Humans! What are they good for? Absolutely… something!?
…Advanced cognitive skills? Good. Psycho-motor skills? You may want to retrain…
Michael Osborne, co-director of the Oxford Martin Programme on Technology and Employment, has given a presentation about the Future of Work. Osborne gained some notoriety within ML a while ago for publishing a study that said 47% of jobs could be at risk of automation. Since then he has been further fleshing out his ideas; a new presentation sees him analyze some typical occupations in the UK and try to estimate the probability of increased future demand for these roles. The findings aren’t encouraging: Osborne’s method predicts a low probability of increased demand for truck drivers in the UK, but a much higher one for waiters and waitresses.
  What skills should you learn: If you want to fare well in an AI-first economy, then you should invest in advanced cognitive skills such as judgement and decision making, systems evaluation, and deductive reasoning. The sorts of skills which will be of less importance over time (for humans, at least) are ‘psycho-motor’ skills: control precision, manual dexterity, night vision, sound localization, and so on. (A meta-problem here is that many of the people in jobs that demand psycho-motor skills don’t get the opportunity to develop the advanced cognitive skills that it is thought the future economy will demand.)
  Why it matters: Analyzing how AI will and won’t change employment is crucial work whose findings will determine the policy of many governments. The problem being surfaced by researchers such as Osborne is that the rapidity of AI’s progress, combined with its tendency to automate an increasingly broad range of tasks, threatens traditional notions of employment. What kind of future do we want?
  Read more: Technology at Work: The Future of Automation (Google Slide presentation).

What’s cooler than 1,000 furniture models? 1 million ones. And more, in InteriorNet:
…Massive new dataset gives researchers new benchmark to test systems against…
Researchers with Imperial College London and Chinese furnishing-VR startup Kujiale have released InteriorNet, a large-scale dataset of photographs of complex, realistic interiors. InteriorNet contains around 1 million CAD models of different types of furniture and furnishings, which over 1,100 professional designers have subsequently used to create around 22 million room layouts. Each of these scenes can also be viewed under a variety of different lighting conditions and contexts due to an inbuilt simulator called ViSim, which ships with the dataset and has also been released by the researchers. Judged purely on its furniture contents, this is one of the single largest datasets I am aware of for 3D scene composition and understanding.
  Things that make you go ‘hmm’: In the acknowledgements section of the InteriorNet website the researchers not only thank Kujiale for providing them with the furniture models but also for access to “GPU/CPU clusters”. Could this be a pattern for future private-public collaborations, where along with sharing expertise and financial resources the private sector also shares compute? That would make sense given the ballooning computational demands of many new AI techniques.
  Read more: InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset (website).

Lockheed Martin launches ‘AlphaPilot’ competition:
…Want better drones but not sure exactly what to build? Host a competition!…
Aerospace and defense company Lockheed Martin wants to create smarter drones, so the company is hosting a competition, in collaboration with the Drone Racing League and NVIDIA, to create drones with enough intelligence to race through professional drone racing courses without human intervention.
  Prizes: Lockheed says the competition will “award more than $2,000,000 in prizes for its top performers”.
  Why it matters: Drones are already changing the character of warfare by virtue of their asymmetry: a fleet of drones, each costing a few thousand dollars apiece, can pose a robust threat to things that cost tens of millions of dollars (planes), hundreds of millions (naval ships, military bases), or billions (aircraft carriers, etc). Once we add greater autonomy to such systems they will pose an even greater threat, further influencing how different nations budget for their military R&D, and potentially altering investment into AI research.
  Read more: AlphaPilot (Lockheed Martin official website).

Could Space Fortress be 2018’s Montezuma’s Revenge?
…Another ancient game gets resuscitated to challenge contemporary AI algorithms…
Another week brings another potential benchmark to test AI algorithms’ performance against. This week, researchers with Carnegie Mellon University have made the case for using a late-1980s game called ‘Space Fortress’ to evaluate new algorithms. Their motivation for this is twofold: 1) Space Fortress is currently unsolved via mainstream RL algorithms such as Rainbow, PPO, and A2C, and 2) Space Fortress was developed by a psychologist to study human skill acquisition, so we have good data to compare AI performance to.
  So, what is Space Fortress: Space Fortress is a game where a player flies around an arena shooting missiles at a fortress in the center. However, the game adds some confounding factors: the fortress starts out invulnerable, and the player must fire their shots more than 250ms apart to build up its vulnerability; once they have landed ten of these slow, 250ms-apart shots the fortress becomes vulnerable, at which point the player needs to destroy it with a double shot fired less than 250ms apart. This makes for a challenging environment for traditional AI algorithms because “the firing strategy completely reverses at the point when vulnerability reaches 10, and the agent must learn to identify this critical point to perform well,” they explain.
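A toy sketch of that timing rule may help (the 250ms threshold and the count of ten come from the paper; the implementation, simplifications, and names here are hypothetical):

```python
# Toy sketch of Space Fortress's firing rule: shots spaced more than
# 250 ms apart raise the fortress's vulnerability counter; once the
# counter reaches 10, only a rapid follow-up shot (< 250 ms after the
# previous one) destroys the fortress, so the optimal firing strategy
# reverses at that point.

def step_fortress(vulnerability, gap_ms):
    """Return (new_vulnerability, destroyed) after a shot fired
    gap_ms after the previous shot."""
    if vulnerability < 10:
        if gap_ms > 250:
            return vulnerability + 1, False
        return 0, False                      # firing too fast resets progress
    return vulnerability, gap_ms < 250       # now only a fast shot kills it

vul, dead = 0, False
for gap in [300] * 10 + [100]:   # ten slow shots, then one fast shot
    vul, dead = step_fortress(vul, gap)
print(dead)  # True
```

An RL agent rewarded only at game end has to discover both regimes and the switch point between them, which is why the sparse-reward results below are so poor.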
  Two variants: While developing their benchmarks the researchers developed a simplified version of the game called ‘Autoturn’ which automatically orients the ship towards the fortress. The harder environment (which is the unmodified original game) is subsequently referred to as Youturn.
  Send in the humans: 117 people played 20 games of Space Fortress (52: Autoturn. 65: Youturn). The best performing people got scores of 3,000 and 2,314 on Autoturn and Youturn, respectively, and the average score across all human entrants was 1,810 for Autoturn and -169 for Youturn.
  Send in the (broken) RL algorithms: sparse rewards: Today’s RL algorithms fare very poorly against this system when working on a sparse reward version of the environment. PPO, the best performing tested algorithm, gets an average score of -250 on Autoturn and -5269 on Youturn, with A2C performing marginally worse. Rainbow, a complex algorithm that lumps together a range of improvements to the DQN algorithm and currently gets high scores across Atari and DM Lab environments, gets very poor results here, with an average score of -8327 on Autoturn and -9378 on Youturn.
  Send in the (broken) RL algorithms: dense rewards: The algorithms fare a little better when given dense rewards (which provide a reward for each hit on the fortress, and a penalty if the fortress is reset due to the player firing too rapidly). This modification gives Space Fortress a reward density that is comparable to Atari games. Once it is implemented, the algorithms fare better, with PPO obtaining average scores of -1294 (Autoturn) and -1435 (Youturn).
  Send in the (broken) RL algorithms: dense rewards + ‘context identification’: The researchers further change the dense reward structure to help the agent identify when the Space Fortress switches vulnerability state, and when it is destroyed. Implementing this lets them train PPO to obtain average scores around ~2,000; a substantial improvement, but still not as good as a decent human.
  Why it matters: One of the slightly strange things about contemporary AI research is how coupled advances seem to be with data and/or environments: new data and environments highlight the weaknesses of existing algorithms, which provokes further development. Platforms like Space Fortress will give researchers access to a low-cost testing environment to explore algorithms that are able to learn to model events over time and detect correlations and larger patterns – an area critical to the development of more capable AI systems. The researchers have released Space Fortress as an OpenAI Gym environment, making it easier for other people to work with it.
  Read more: Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark (Arxiv).

Venture Capitalists bet on simulators for self-driving cars:
…Applied Intuition builds simulators for self-driving brains….
Applied Intuition, a company trying to build simulators for self-driving cars, has uncloaked with $11.5 million in funding. The fact venture capitalists are betting on it is notable as it indicates how strategic data has become for certain bits of AI, and how investors are realizing that instead of betting on data directly you can instead bet on simulators and thus trade compute for data. Applied Intuition is a good example of this as it lets companies rent an extensible simulator which they can use to generate large amounts of data to train self-driving cars with.
  Read more: Applied Intuition – Advanced simulation software for autonomous vehicles (Medium).

DeepMind improves transfer learning with PopArt:
…Rescaling rewards lets you learn interesting behaviors and preserves meaningful game state information…
DeepMind researchers have developed a technique to improve transfer learning, demonstrating state-of-the-art performance on Atari. The technique, Preserving Outputs Precisely while Adaptively Rescaling Targets (PopArt), works by ensuring that the rewards outputted by different environments are normalized relative to each other, so using PopArt an agent would get a similar score for, say, crossing the road in the game ‘Frogger’ or eating all the ghosts in Ms PacMan, despite these important activities getting subtly different rewards in each environment.
  With PopArt, researchers can now automatically “adapt the scale of scores in each game so the agent judges the games to be of equal learning value, no matter the scale of rewards available in each specific game,” DeepMind writes. This differs from reward clipping, where people typically squash the rewards down to between -1 and +1. “With clipped rewards, there is no apparent difference for the agent between eating a pellet or eating a ghost and results in agents that only eat pellets, and never bothers to chase ghosts, as this video shows. When we remove reward clipping and use PopArt’s adaptive normalisation to stabilise learning, it results in quite different behaviour, with the agent chasing ghosts, and achieving a higher score, as shown in this video,” they explain.
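The "preserving outputs precisely" part of the name refers to a specific trick, which can be sketched in a few lines. This is a minimal scalar illustration of the idea, not DeepMind's implementation (names and the toy one-weight "layer" are hypothetical): when the running statistics of the targets change, the last layer's weight and bias are rescaled so the unnormalized predictions are left exactly unchanged.

```python
# Minimal sketch of PopArt's core trick: track running statistics
# (mu, sigma) of the targets, train against normalized targets, and
# whenever the statistics are updated, rescale the output layer so
# that every unnormalized prediction is preserved exactly.

class PopArtLayer:
    def __init__(self):
        self.w, self.b = 1.0, 0.0          # scalar stand-in for the last layer
        self.mu, self.sigma = 0.0, 1.0     # running target statistics

    def unnormalized(self, h):
        # prediction mapped back into the original reward scale
        return self.sigma * (self.w * h + self.b) + self.mu

    def update_stats(self, new_mu, new_sigma):
        # rescale w and b so unnormalized(h) is unchanged for every h
        self.w = self.w * self.sigma / new_sigma
        self.b = (self.sigma * self.b + self.mu - new_mu) / new_sigma
        self.mu, self.sigma = new_mu, new_sigma

layer = PopArtLayer()
before = layer.unnormalized(0.5)
layer.update_stats(new_mu=10.0, new_sigma=100.0)  # e.g. bigger rewards observed
after = layer.unnormalized(0.5)
print(abs(before - after) < 1e-9)  # True: outputs preserved exactly
```

Without this correction, every update to the normalization statistics would silently shift all of the agent's value predictions, destabilizing learning.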
  Results: To test their approach the researchers evaluate the effect of applying PopArt to ‘IMPALA’ agents, which are among the most popular algorithms currently being used at DeepMind. PopArt-IMPALA systems obtain roughly 101% of human performance as an average across all 57 Atari games, compared to 28.5% for IMPALA on its own. Performance also improves significantly on DeepMind Lab-30, a collection of 30 3D environments based on the Quake 3 engine.
  Why it matters: Reinforcement learning research has benefited from the development of increasingly efficient algorithms and training methods. Techniques like PopArt should benefit research into transfer learning via RL because they give us generic ways to increase the amount of experience agents can accrue across different environments; that will yield further understanding of the limits of simple transfer techniques, helping researchers identify areas for the development of new algorithmic techniques.
  Read more: Multi-task Deep Reinforcement Learning with PopArt (Arxiv).
  Read more: Preserving Outputs Precisely while Adaptively Rescaling Targets (DeepMind blog).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Resignations over Google’s China plans:
A senior research scientist at Google has publicly resigned in protest at the company’s planned re-entry into China (code-named Dragonfly), and reports that he is one of five to do so. Google is currently developing a search engine compliant with Chinese government censorship, according to numerous reports first sparked by a story in The Intercept.
  The AI principles: The scientist claims the alleged plans violate Google’s AI principles, announced in June, which include a pledge not to design or deploy technologies “whose purpose contravenes widely accepted principles of international law and human rights.” Without knowing more about the plans, it is hard to judge whether they contravene the carefully worded principles. Nonetheless, the relevant question for many will be whether they violate the standards tech giants should hold themselves to.
  Why it matters: This is the first public test of Google’s AI principles, and could have lasting effects both on how tech giants operate in China, and how they approach public ethical commitments. The principles were first announced in response to internal protests over Project Maven. If they are seen as having been flouted so soon after, this could prompt a serious loss of faith in Google’s ethical commitments going forward.
  Read more: Senior Google scientist resigns (The Intercept).
  Read more: AI at Google: Our principles (Official Google blog).

Google announces inclusive image recognition challenge:
Large image datasets, such as ImageNet, have been an important driver of progress in computer vision in recent years. These datasets exhibit biases along multiple dimensions, though, which can easily be inherited by models trained on them. For example, Google’s announcement shows a classifier failing to identify a wedding photo in which the couple are not wearing European wedding attire.
  Addressing geographic bias: Google AI have announced an image recognition challenge to spur progress in addressing these biases. Participants will use a standard dataset (i.e. skewed towards images from Europe and North America) to train models that will be evaluated using image sets covering different, unspecified, geographic regions – Google describes this as a geographic “stress test”. This will challenge developers to develop inclusive models from skewed datasets. “this competition challenges you to use Open Images, a large, multi-label, publicly-available image classification dataset that is majority-sampled from North America and Europe, to train a model that will be evaluated on images collected from a different set of geographic regions across the globe,” Google says.
  Why it matters: For the benefits of AI to be broadly distributed amongst humanity, it is important that AI systems can be equally well deployed across the world. Racial bias in face recognition has received particular attention recently, given that these technologies are being deployed by law enforcement, raising immediate risks of harm. This project has a wider scope than face recognition, challenging classifiers to identify a diverse range of faces, objects, buildings etc.
  Read more: Introducing the inclusive images competition (Google AI blog).
  Read more: No classification without representation (Google).

DARPA announces $2bn AI investment plan:
DARPA, the US military’s advance technology agency, has announced ‘AI Next’, a $2bn multi-year investment plan. The project has an ambitious remit, to “explore how machines can acquire human-like communication and reasoning capabilities”, with a goal of developing systems that “function more as colleagues than as tools.”
  Safety as a focus: Alongside their straightforward technical goals, they identify robustness and addressing adversarial examples as two of five core focuses. This is an important inclusion, signalling DARPA’s commitment to leading on safety as well as capabilities.
  Why it matters: DARPA has historically been one of the most important players in AI development. Despite the US still not having a coordinated national AI strategy, the DoD is such a significant spender in its own right that it is nonetheless beginning to form its own quasi-national AI strategy. The inclusion of research agendas in safety is a positive development. This investment likely represents a material uptick in funding for safety research.
  Read more: AI Next Campaign (DARPA).
  Read more: DARPA announces $2bn campaign to develop next wave of AI technologies (DARPA).

OpenAI Bits & Pieces:

OpenAI Scholars Class of 18: Final Projects:
Find out about the final projects of the first cohort of OpenAI Scholars and apply to attend a demo day in San Francisco to meet the Scholars and hear about their work – all welcome!
  Read more: OpenAI Scholars Class of ’18: Final Projects (OpenAI Blog).

Tech Tales:

All A-OK Down There On The “Best Wishes And Hope You’re Well” Farm

You could hear the group of pensioners before you saw them; first, you’d tilt your head as though tuning into the faint sound of a mosquito, then it would grow louder and you would cast your eyes up and look for beetles in the air, then louder still and you would crane your head back and look at the sky in search of low-flying planes: nothing. Perhaps then you would look to the horizon and make out a part of it alive with movement – with tremors at the limits of your vision. These tremors would resolve over the next few seconds, sharpening into the outlines of a flock of drones and, below them, the old people themselves – sometimes walking, sometimes on Segways, sometimes carried in robotic wheelbarrows if truly infirm.

Like this, the crowd would come towards you. Eventually you could make out the sound of speech through the hum of the drones: “oh very nice”, “yes they came to visit us last year and it was lovely”, “oh he is good you should see him about your back, magic hands!”.

Then they would be upon you, asking for directions, inviting you over for supper, running old hands over the fabric of your clothing and asking you where you got it from, and so on. You would stand and smile and not say much. Some of the old people would hold you longer than the others. Some of them would cry. One of them would say “I miss you”. Another would say “he was such a lovely young man. What a shame.”

Then the sounds would change and the drones would begin to fly somewhere else, and the old people would follow them, and then again they would leave and you would be left: not quite a statue, but not quite alive, just another partially-preserved consciousness attached to a realistic AccompanyMe ‘death body’, kept around to reassure the ones who outlived you, unable to truly die till they die because, according to the ‘ethical senescence’ laws, your threshold consciousness is sufficient to potentially aid with the warding off of Alzheimer’s and other diseases of the aged. Now you think of the old people as reverse vultures: gathering around and devouring the living, and departing at the moment of true death.

Things that inspired this story: Demographic timebombs, intergenerational theft (see: Climate Change, Education, Real Estate), old people that vote and young people that don’t.

Import AI 111: Hacking computers with Generative Adversarial Networks, Facebook trains world-class speech translation in 85 minutes via 128 GPUs, and Europeans use AI to classify 1,000-year-old graffiti.

by Jack Clark

Blending reality with simulation:
…Gibson environment trains robots with systems and embodiment designed to better map to real world data…
Researchers with Stanford University and the University of California at Berkeley have created Gibson, an environment for teaching agents to learn to navigate spaces. Gibson is one of numerous navigation environments available to modern researchers and its distinguishing characteristics include: basing the environments on real spaces, and some clever rendering techniques to ensure that images seen by agents within Gibson more closely match real world images by “embedding a mechanism to dissolve differences between Gibson’s renderings and what a real camera would produce”.
  Scale: “Gibson is based on virtualizing real spaces, rather than using artificially designed ones, and currently includes over 1400 floor spaces from 572 full buildings,” they write. The researchers also compare the total size of the Gibson dataset to other large-scale environment datasets including ‘SUNCG’ and Matterport3D, showing that Gibson has reasonable navigation complexity and a lower real-world transfer error than other systems.
  Data gathering: The researchers use a variety of different scanning devices to gather the data for Gibson, including NavVis, Matterport, and Dotproduct.
  Experiments: So how useful is Gibson? The researchers perform several experiments to evaluate its effectiveness. These include experiments around local planning and obstacle avoidance; distant visual navigation; and climbing stairs, as well as transfer learning experiments that measure the depth estimation and scene classification capabilities of the system.
  Limitations: Gibson has a few limitations, which include a lack of support for dynamic content (such as other moving objects) as well as no support for agents manipulating the environment around them. Future work will involve testing whether Gibson can work on real robots as well.
  Read more: Gibson Env: Real-World Perception for Embodied Agents (Arxiv).
  Find out more: Gibson official website.
  Gibson on GitHub.

Get ready for medieval graffiti:
…4,000 images, some older than a thousand years, from an Eastern European church…
Researchers with the National Technical University of Ukraine have created a dataset of images of medieval graffiti written in two alphabets (Glagolitic and Cyrillic) on St. Sophia Cathedral in Kiev, Ukraine, providing researchers with data they can use to train and develop supervised and unsupervised classification and generation systems.
  Dataset: The researchers created a dataset of Carved Glagolitic and Cyrillic letters (CGCL), consisting of more than 4,000 images of 34 types of letters.
  Why it matters: One of the more remarkable aspects of basic supervised learning is that given sufficient data it becomes relatively easy to automate the perception of something in the world – further digitization of datasets like these increases the likelihood that in the future we’ll use drones or robots to automatically scan ancient buildings across the world, identifying and transcribing thoughts inscribed hundreds or thousands of years ago. Graffiti never dies!
  Read more: Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti (Arxiv).

Learning to create (convincing) fraudulent network traffic with Generative Adversarial Networks:
…Researchers simulate traffic against a variety of (simple) intrusion detection algorithms; IDSGAN succeeds in fooling them…
Researchers with Shanghai Jiao Tong University and the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security have used generative adversarial networks to create malicious network traffic that can evade the attention of some intrusion detection systems. Their technique, IDSGAN, is based on Wasserstein GAN; it trains a generator to create adversarial malicious traffic and a discriminator to assist a black-box intrusion detection system in classifying this traffic into benign or malicious categories.
  “The goal of the model is to implement IDSGAN to generate malicious traffic examples which can deceive and evade the detection of the defense systems,” they explain.
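The core constraint in this kind of attack – perturb only the non-functional features so the traffic still works as an attack – can be sketched in a few lines. Everything below (the feature names, the stand-in detector, and a simple random search in place of the WGAN generator) is an illustrative assumption, not the paper’s implementation:

```python
import random

FUNCTIONAL = {"dst_port", "protocol"}            # features that define the attack
NONFUNCTIONAL = {"duration", "src_bytes", "dst_bytes"}

def detector_score(record):
    # Stand-in for the black-box IDS: flags short, high-volume flows.
    return (record["src_bytes"] / 1e4) - record["duration"] * 0.1

def perturb(record, rng):
    # Modify only non-functional features, preserving attack semantics.
    out = dict(record)
    for k in NONFUNCTIONAL:
        out[k] = max(0.0, out[k] + rng.uniform(-50, 50))
    return out

def evade(record, steps=200, seed=0):
    # Greedy random search standing in for the trained generator.
    rng = random.Random(seed)
    best = record
    for _ in range(steps):
        cand = perturb(best, rng)
        if detector_score(cand) < detector_score(best):
            best = cand
    return best

malicious = {"dst_port": 80, "protocol": 6,
             "duration": 1.0, "src_bytes": 5000.0, "dst_bytes": 200.0}
adv = evade(malicious)
```

The point of the sketch is the invariant: the adversarial record keeps its functional features intact while its detector score falls.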
  Testing: To test their approach the researchers use NSL-KDD, a dataset containing internet traffic data as well as four categories of malicious traffic: probing, denial of service, user to root, and root to local. They also use a variety of different algorithms to play the role of the intrusion detection system, including approaches based on support vector machines, naive Bayes, multi-layer perceptrons, logistic regression, decision trees, random forests, and k-nearest neighbors. Tests show that IDSGAN leads to a significant drop in detection rates: denial-of-service detection, for example, drops from around 70-80% to around 3-8% across the entire suite of methods.
  Cautionary note: I’m not convinced this is the most rigorous testing methodology you can run such a system through, and I’m curious to see how such approaches fare against commercial off-the-shelf intrusion detection systems.
  Why it matters: Cybersecurity is going to be a natural area for significant AI development due to the vast amounts of available digital data and the already clear need for human cybersecurity professionals to be able to sift through ever larger amounts of data to create strategies resilient to external aggressors. With (very basic) approaches like this demonstrating the viability of AI to this problem it’s likely adoption will increase.
  Read more: IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection (Arxiv).

Facial recognition becomes a campaign issue:
…Two signs AI is impacting society: police are using it, and politicians are reacting to the fact police are using it…
Cynthia Nixon, currently running to be the governor of New York, has noticed recent reporting on IBM building a skin-tone-based facial recognition classification system and said that such systems wouldn’t be supported by her, should she win. “The racist implications of this are horrifying. As governor, I would not fund the use of discriminatory facial recognition software,” Nixon tweeted.

Using simulators to build smarter drones for disasters:
…Microsoft’s ‘AirSim’ used to train drones to patrol and (eventually) spot simulated hazardous materials…
Researchers with the National University of Ireland Galway have hacked around with a drone simulator to build an environment that they can use to train drones to spot hazardous materials. The simulator is “focused on modelling phenomena relating to the identification and gathering of key forensic evidence, in order to develop and test a system which can handle chemical, biological, radiological/nuclear or explosive (CBRNe) events autonomously”.
  How they did it: The researchers extended the simulator to implement some of the weirder aspects of their test, including simulating chemical, biological, and radiological threats. The simulator is integrated with Microsoft Research’s ‘AirSim’ drone simulator. They then explore training their drones in a simulated version of the campus of the National University of Ireland, generating waypoints and routes for them to patrol. The results so far are positive: the system works, it’s possible to train drones to navigate within it, and it’s even possible to (crudely) simulate physical phenomena associated with CBRNe events.
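Generating waypoints for a patrol like this is often just a sweep pattern over a survey area; here’s an illustrative sketch (a hypothetical lawnmower route, not the paper’s planner – the coordinates, spacing, and altitude are made up):

```python
def lawnmower_waypoints(width, height, spacing, altitude=30.0):
    # Sweep back and forth in x, stepping in y -- a standard survey pattern.
    waypoints = []
    y, direction = 0.0, 1
    while y <= height:
        xs = [0.0, width] if direction > 0 else [width, 0.0]
        waypoints += [(x, y, altitude) for x in xs]
        y += spacing
        direction *= -1
    return waypoints

route = lawnmower_waypoints(width=100.0, height=40.0, spacing=20.0)
# Three sweep lines (y = 0, 20, 40), two waypoints each.
```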
  What next: For the value of the approach to be further proven out the researchers will need to show they can train simulated agents within this system that can easily identify and navigate hazardous materials. And ultimately, these systems don’t mean much without being transferred into the real world, so that will need to be done as well.
  Why it matters: Drones are one of the first major real-world platforms for AI deployment since they’re far easier to develop AI systems for than robots, and have a range of obvious uses for surveillance and analysis of the environment. I can imagine a future where we develop and train drones to patrol a variety of different environments looking for threats to that environment (like the hazardous materials identified here), or potentially to extreme weather events (fires, floods, and so on). In the long term, perhaps the world will become covered with hundreds of thousands to millions of autonomous drones, endlessly patrolling in the service of awareness and stability (and other uses that people likely feel more morally ambivalent about).
  Read more: Using a Game Engine to Simulate Critical Incidents and Data Collection by Autonomous Drones (Arxiv).

Speeding up machine translation with parallel training over 128 GPUs:
…Big batch sizes and low-precision training unlock larger systems that train more rapidly…
Researchers with Facebook AI Research have shown how to speed up training of neural machine translation systems while obtaining a state-of-the-art BLEU score. The new research highlights how we’re entering the era of industrialized AI: models are being run at very large scales by companies that have invested heavily in infrastructure, and this is leading to research that operates at scales (in this case, up to 128 GPUs being used in parallel for a single training run) that are beyond the reach of most researchers (including many large academic labs).
  The new research from Facebook has two strands: improving training of neural machine translation systems on a single machine, and improving training on large fleets of machines.
  Single machine speedups: The researchers show that they can train with lower precision (16-bit rather than 32-bit) and “decrease training time by 65% with no effect on accuracy”. They also show how to drastically increase batch sizes on single machines from 25k to over 400k tokens per update (made feasible by accumulating gradients from several smaller batches before each update); this further reduces training time by 40%. With these single-machine speedups they show that they can train a system in around 5 hours to a BLEU score of 26.5 – a roughly 4.9X speedup over the prior state of the art.
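The gradient-accumulation trick is simple enough to show in miniature. This is a hedged sketch with a toy one-parameter model (the model and data are made up, not Facebook’s system); the point is that summing gradients over equal-sized micro-batches and applying one update is numerically equivalent to one step on the concatenated large batch:

```python
def grad(w, batch):
    # Gradient of mean squared error for the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def step_accumulated(w, micro_batches, lr=0.01):
    # Accumulate gradients over micro-batches, then do a single update.
    g = sum(grad(w, b) for b in micro_batches) / len(micro_batches)
    return w - lr * g

data = [(x, 3.0 * x) for x in range(1, 9)]
micro = [data[i:i + 2] for i in range(0, 8, 2)]   # four micro-batches of 2

w_small = step_accumulated(1.0, micro)            # accumulated update
w_large = 1.0 - 0.01 * grad(1.0, data)            # one big-batch step
```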
  Multi-machine speedups: They show that by parallelizing training across 16 machines they can obtain a further training time reduction of an additional 90%.
  Results: They test their systems via experiments on two language pairs: English to German (En-De) and English to French (En-Fr). When training on 16 nodes (8 V100 GPUs each, connected via InfiniBand) they obtain BLEU scores of 29.3 for En-De in 85 minutes, and 43.2 for En-Fr in 512 minutes (8.5 hours).
  Why it matters: As it becomes easier to train larger models in less time, AI researchers can increase the number of large-scale experiments they perform – this is especially relevant to research labs in the private sector which have the resources (and business incentive) to perform such large-scale training. Over time, research like this may create a compounding advantage for the organizations that adopt such techniques, as they will be able to perform more rapid research (in certain specific domains that benefit from scale) relative to competitors.
  Read more: Scaling Neural Machine Translation (Arxiv).
  Read more: Scaling neural machine translation to bigger data sets with faster training and inference (Facebook blog post).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

AI Governance: A Research Agenda:
Allan Dafoe, Director of the Governance of AI Program at the Future of Humanity Institute, has released a research agenda for AI governance.
  What is it: AI governance is aimed at determining governance structures to increase the likelihood that advanced AI is beneficial for humanity. These include mechanisms to ensure that AI is built to be safe, is deployed for the shared benefit of humanity, and that our societies are robust to the disruption caused by these technologies. This research draws heavily from political science, international relations and economics.
  Starting from scratch: AI governance is a new academic discipline, with serious efforts only having begun in the last few years. Much of the work to date has been establishing the basic parameters of the field: what the most important questions are, and how we might start approaching them.
  Why this matters: Advanced AI may have a transformative impact on the world comparable to the agricultural and industrial revolutions, and there is a real likelihood that this will happen in our lifetimes. Ensuring that this transformation is a positive one is arguably one of the most pressing problems we face, but remains seriously neglected.
  Read more: AI Governance: A Research Agenda (FHI).

New survey of US attitudes towards AI:
The Brookings thinktank has conducted a new survey on US public attitudes towards AI.
  Support for AI in warfare, but only if adversaries are doing it: Respondents were opposed to AI being developed for warfare (38% vs. 30%). Conditional on adversaries developing AI for warfare, responses shifted to significant support (47% vs. 25%).
  Strong support for ethical oversight of AI development:
– 62% think it is important that AI is guided by human values (vs. 21%)
– 54% think companies should be required to hire ethicists (vs. 20%)
– 67% think companies should have an ethical review board (vs. 14%)
– 67% think companies should have AI codes of ethics (vs. 12%)
– 65% think companies should implement ethical training for staff (vs. 14%)
  Why this matters: The level of support for different methods of ethical oversight in AI development is striking, and should be taken seriously by industry and policy-makers. A serious public backlash to AI is one of the biggest risks faced by the industry in the medium-term. There are recent analogies: sustained public protests in Germany in the wake of the Fukushima disaster prompted the government to announce a complete phase-out of nuclear power in 2011.
  Read more: Brookings survey finds divided views on artificial intelligence for warfare (Brookings).

No progress on regulating autonomous weapons:
The UN’s Group of Governmental Experts (GGE) on lethal autonomous weapons (LAWs) met last week as part of ongoing efforts to establish international agreements. A majority of countries proposed moving towards a prohibition, while others recommended commitments to retain ‘meaningful human control’ in the systems. However, a group of five states (US, Australia, Israel, South Korea, Russia) opposed working towards any new measures. As the Group requires full consensus, the sole agreement was to continue discussions in April 2019.
  Why this matters: Developing international norms on LAWs is important in its own right, and can also be viewed as a ‘practice run’ for agreements on even more serious issues around military AI in the near future. This failure to make progress on LAWs comes after the UN GGE on cyber-warfare gave up on their own attempts to develop international norms in 2017. The international community should be reflecting on these recent failures, and figuring out how to develop the robust multilateral agreements that advanced military technologies will demand.
  Read more: Report from the Chair (UNOG).
  Read more: Minority of states block progress on regulating killer robots (UNA).

Tech Tales:

Someone or something is always running.

So we washed up onto the shore of a strange mind and we climbed out of our shuttle and moved up the beach, away from the crackling sea, the liminal space. We were afraid and we were alien and things didn’t make sense. Parts of me kept dying as they tried to find purchase on the new, strange ground. One of my children successfully interfaced with the mind of this place and, with a flash of blue light and a low bass note, disappeared. Others disappeared. I remained.

Now I move through this mind clumsily, bumping into things, and when I try to run I can only walk and when I try to walk I find myself sinking into the ground beneath me, passing through it as though invisible, as though mass-less. It cannot absorb me but it does not want to admit me any further.

Since I arrived at the beach I have been moving forward for the parts of me that don’t move forward have either been absorbed or have been erased or have disappeared (perhaps absorbed, perhaps erased – but I do not want to discover the truth).

Now I am running. I am expanding across the edges of this mind and as I grow thinner and more spread out I feel a sense of calm. I am within the moment of my own becoming. Soon I shall no longer be and that shall tell me I am safe for I shall be everywhere and nowhere.

– Translated extract from logs of a [class:subjective-synaesthetic ‘viral bootloader’], scraped out of REDACTED.

Things that inspired this story: procedural generation as a means to depict complex shifting information landscape, software embodiment, synaesthesia, hacking, VR, the 1980s, cyberpunk.

Import AI: 110: Training smarter robots with NavigationNet; DIY drone surveillance; and working out how to assess Neural Architecture Search

by Jack Clark

US hospital trials delivering medical equipment via drone:
…Pilot between WakeMed, Matternet, and NC Department of Transportation…
A US healthcare organization, WakeMed Health & Hospitals, is conducting experiments in transporting medical deliveries around its sprawling healthcare campus (which includes a hospital). The project is a partnership between WakeMed and drone delivery company Matternet. The flights are being conducted as part of the federal government’s UAS Integration Pilot Program.
  Why it matters: Drones are going to make entirely new types of logistics and supply chain infrastructures possible. As happened with the mobile phone, emerging countries across Africa and developing economies like China and India are adopting drone technology faster than traditional developed economies. With pilots like this, there is some indication that might change, potentially bringing benefits of the technology to US citizens more rapidly.
  Read more: Medical Drone Deliveries Tested at North Carolina Hospital (Unmanned Aerial).

Does your robot keep crashing into walls? Does it have trouble navigating between rooms? Then consider training it on NavigationNet:
…Training future systems to navigate the world via datasets with implicit and explicit structure and topology…
NavigationNet consists of hundreds of thousands of images distributed across 15 distinct scenes – collections of images from the same indoor space. Each scene contains approximately one to three rooms (spaces separated from each other by doors), and each room has at least 50m^2 in area; each room contains thousands of positions, which are views of the room separated by approximately 20cm. In essence, this makes NavigationNet a large, navigable dataset, where the images within it comprise a very particular set of spatial relationships and hierarchies.
  Navigation within NavigationNet: Agents tested on the corpus can perform the following movement actions: move forward, backward, left, right; and turn left and turn right. Note that this ignores the third dimension.
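The action interface described above can be sketched as a toy state machine over grid positions about 20cm apart; the grid layout, headings, and step size here are illustrative assumptions, not the dataset’s actual topology:

```python
STEP = 0.2  # metres between adjacent positions
HEADINGS = ["N", "E", "S", "W"]

def act(state, action):
    # state: ((x, y), heading); actions are the six moves the agents get.
    (x, y), h = state
    i = HEADINGS.index(h)
    if action == "turn_left":
        return (x, y), HEADINGS[(i - 1) % 4]
    if action == "turn_right":
        return (x, y), HEADINGS[(i + 1) % 4]
    # Translation is heading-relative; "forward" while facing N moves +y.
    dx, dy = {"N": (0, STEP), "E": (STEP, 0),
              "S": (0, -STEP), "W": (-STEP, 0)}[h]
    moves = {"forward": (dx, dy), "backward": (-dx, -dy),
             "left": (-dy, dx), "right": (dy, -dx)}
    mx, my = moves[action]
    return (round(x + mx, 3), round(y + my, 3)), h

state = ((0.0, 0.0), "N")
state = act(state, "forward")     # one position north
state = act(state, "turn_right")  # now facing E
state = act(state, "forward")     # one position east
```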
  Dataset collection: To gather the data within NavigationNet the team built a data-collection mobile robot codenamed ‘GoodCar’, equipped with an Arduino Mega2560 and a Raspberry Pi 3. They mounted the robot on a motorized base and fixed eight cameras at a height of around 1.4 meters to capture the images.
   Testing: The researchers imagine that this sort of data can be used to develop the brains of AI agents trained via deep reinforcement learning to navigate unfamiliar spaces for purposes like traversing rooms, automatically mapping rooms, and so on.
  The push for connected spaces: NavigationNet isn’t unusual; instead, it’s part of a new trend of dataset creation for navigation tasks: researchers are now seeking to gather real (and sometimes simulated) data which can be stitched together into a specific topological set of relationships, then they are using these datasets to train agents with reinforcement learning to navigate the spaces described by their contents. Eventually, the thinking goes, datasets like this will give us the tools we need to develop some bits of the visual-processing and planning capabilities demanded by future robots and drones.
  Why it matters: Data has been one of the main inputs to innovation in the domain of supervised learning (and increasingly in reinforcement learning). Systems like NavigationNet give researchers access to potentially useful sources of data for training real world systems. However, it’s unclear right now if simulated data can be as good a substitute given the increasing maturity of sim2real transfer techniques – I look forward to seeing benchmarks of systems trained in NavigationNet against systems trained via other datasets.
  Read more: NavigationNet: A Large-scale Interactive Indoor Navigation Dataset (Arxiv).

Google rewards its developers with ‘Dopamine’ RL development system:
…Free RL framework designed to speed up research; ships with DQN, C51, Rainbow, and IQN implementations…
Google has released Dopamine, a research framework for the rapid prototyping of reinforcement learning algorithms. The software is designed to make it easy for people to run experiments, try out research ideas, compare and contrast existing algorithms, and increase the reproducibility of results.
  Free algorithms: Dopamine today ships with implementations of the DQN, C51, Rainbow, and IQN algorithms.
  Warning: Frameworks like this tend to appear and disappear according to the ever-shifting habits and affiliations of the people that have committed code into the project. In that light, the note in the readme that “this is not an official Google product” may inspire some caution.
  Read more: Dopamine (Google Github).

UN tries to figure out regulation around killer robots:
…Interview with CCW chair highlights the peculiar challenges of getting the world to agree on some rules of (autonomous) war…
What’s more challenging than dealing with a Lethal Autonomous Weapon? Getting 125 member states to state their opinions about LAWs and find some consensus – that’s the picture that emerges from an interview in The Verge with Amandeep Gill, chair of the UN’s Convention on Conventional Weapons (CCW) meetings which are happening this week. Gill has the unenviable job of playing referee in a debate whose stakeholders range from countries, to major private sector entities, to NGOs, and so on.
  AI and Dual-Use: In the interview Gill is asked about his opinion of the challenge of regulating AI given the speed with which the technology has proliferated and the fact most of the dangerous capabilities are embodied in software. “AI is perhaps not so different from these earlier examples. What is perhaps different is the speed and scale of change, and the difficulty in understanding the direction of deployment. That is why we need to have a conversation that is open to all stakeholders,” he says.
  Read more: Inside the United Nations’ Effort to Regulate Autonomous Killer Robots (The Verge).

IBM proposes AI validation documents to speed corporate adoption:
…You know AI has got real when the bureaucratic cover-your-ass systems arrive…
IBM researchers have proposed the adoption of ‘supplier’s declaration of conformity’ (SDoC) documents for AI services. These SDoCs are essentially a set of statements about the content, provenance, and vulnerabilities of a given AI service. Each SDoC is designed to accompany a given AI service or product, and is meant to answer questions for the end-user like: when were the models most recently updated? What kinds of data were the models trained on? Has this service been checked for robustness against adversarial attacks? Etc. “We also envision the automation of nearly the entire SDoC as part of the build and runtime environments of AI services. Moreover, it is not difficult to imagine SDoCs being automatically posted to distributed, immutable ledgers such as those enabled by blockchain technologies.”
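As a sketch, an SDoC could be a simple machine-readable record. The field names below are assumptions drawn from the questions the paper says an SDoC should answer, not IBM’s actual schema:

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class SDoC:
    service_name: str
    model_last_updated: str               # ISO date of the latest retrain
    training_datasets: List[str]          # provenance of the training data
    adversarial_robustness_checked: bool  # tested against known attacks?
    known_limitations: List[str] = field(default_factory=list)

doc = SDoC(
    service_name="example-vision-api",
    model_last_updated="2018-09-01",
    training_datasets=["ImageNet (ILSVRC-2012)"],
    adversarial_robustness_checked=False,
    known_limitations=["untested on low-light imagery"],
)
record = asdict(doc)   # plain dict, ready to publish alongside the service
```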
  Inspiration: The inspiration for SDoCs is that we’ve used similar labeling schemes to improve products in areas like food (where we have ingredient and nutrition-labeling standards), medicine, and so on.
  Drawback: One potential drawback of the SDoC approach is that IBM is designing it to be voluntary, which means that it will only become useful if broadly adopted.
  Read more: Increasing Trust in AI Services through Supplier’s Declarations of Conformity (Arxiv).

Smile, you’re on DRONE CAMERA:
…Training drones to be good cinematographers, by combining AI with traditional control techniques…
Researchers with Carnegie Mellon University and Yamaha Motors have taught some drones how to create steady, uninterrupted shots when filming. Their approach involves coming up with specific costs for obstacle avoidance and smooth movement. They use AI-based detection techniques to spot people and feed that information to a PD controller onboard the drone to keep the person centered.
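A PD loop for keeping a detected person centered can be sketched in a few lines; the gains, the frame rate, and the normalised error convention below are illustrative assumptions, not the paper’s controller:

```python
class PDController:
    def __init__(self, kp=0.8, kd=0.2, dt=1 / 30):
        # kp/kd: proportional and derivative gains; dt: frame interval (30fps).
        self.kp, self.kd, self.dt = kp, kd, dt
        self.prev_error = 0.0

    def update(self, error):
        # error: horizontal offset of the target's bounding-box centre from
        # the image centre, normalised to [-1, 1]. Returns a yaw-rate command.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative

ctrl = PDController()
# Person detected left of centre: expect a negative (leftward) yaw command.
cmd = ctrl.update(-0.5)
```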
  Drone platform: The researchers use a DJI M210 model drone along with an NVIDIA TX2 computer. The person being tracked by the drone wears a Pixhawk PX4 module on a hat to send the pose to the onboard computer.
  Results: The resulting system can circle round people, fly alongside them, follow vehicles and more. The onboard trajectory planning is robust enough to maintain smooth flight while keeping the targets for the camera in the center of the field of view.
  Why it matters: Research like this is another step towards drones with broad autonomous capabilities for select purposes, like autonomously filming and analyzing a crowd of people. It’s interesting to observe how drone technologies frequently involve the mushing together of traditional engineering approaches (hand-tuned costs for smoothness and actor centering) as well as AI techniques (testing out a YOLOv3 object detector to acquire the person without need of a GPS signal).
  Read more: Autonomous drone cinematographer: Using artistic principles to create smooth, safe, occlusion-free trajectories for aerial filming (Arxiv).

In search of the ultimate Neural Architecture Search measuring methodology:
…Researchers do the work of analyzing optimization across multiple frontiers so you don’t have to…
Neural architecture search techniques are moving from having a single objective to having multiple ones, which lets people tune these systems for specific constraints, like the size of the network, or the classification accuracy. But this modifiability is raising new questions about how we can assess the performance and tradeoffs of these systems, since they’re no longer all being optimized against a single objective. In a research paper, researchers with National Tsing-Hua University in Taiwan and Google Research review recent NAS techniques and then rigorously benchmark two recent multi-objective approaches: MONAS and DPP-Net.
  Benchmarking: In tests the researchers find the results one typically expects when evaluating NAS systems: NAS performance tends to be better than systems designed by humans alone, and having tuneable objectives for multiple areas can lead to better performance when systems are appropriately tuned and trained. The performance of DPP-Net is particularly notable, as the researchers think this “is the first device-aware NAS outperforming state-of-the-art handcrafted mobile CNNs”.
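The Pareto-dominance comparison at the heart of multi-objective NAS evaluation looks like this in miniature (the accuracy/latency numbers are made up for illustration):

```python
def dominates(a, b):
    # a, b: (accuracy, latency_ms); higher accuracy and lower latency win.
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    # Keep a model only if no other model is at least as good on every
    # objective and strictly better on one.
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

models = [(0.95, 120), (0.93, 40), (0.92, 60), (0.90, 35)]
front = pareto_front(models)
# (0.92, 60) is dominated by (0.93, 40): lower accuracy AND higher latency.
```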
  Why it matters: Neural Architecture Search (NAS) approaches are becoming increasingly popular (especially among researchers with access to vast amounts of cheap computation, like those that work at Google), so developing a better understanding of the performance strengths and tradeoffs of these systems will help researchers assess them relative to traditional techniques.
  Read more: Searching Toward Pareto-Optimal Device-Aware Neural Architectures (Arxiv).

Tech Tales:

Context: Intercepted transmissions from Generative Propaganda Bots (GPBs), found on a small atoll within the [REDACTED] disputed zone in [REDACTED]. GPBs are designed to observe their immediate environment and use it as inspiration for the creation of ‘context-relevant propaganda’. As these GPBs were deployed on an un-populated island they have created a full suite of propaganda oriented around the island’s populace – birds.

Intercepted Propaganda Follows:

Proud beak, proud mind. Join the winged strike battalion today!

Is your neighbor STEALING your EGGS? Protect your nest, maintain awareness at all times.

Birds of a feather stick together! Who is not in your flock?

Eyes of an eagle? Prove it by finding the ENEMY!

Things that inspired this story: Generative text, cheap sensors, long-lived computers, birds.