Import AI 157: How weather can break self-driving car AI; modelling traffic via deep learning and satellites; and Chinese scientists make a smarter, smaller YOLOv3

by Jack Clark

Want to break an image classifier? Add some weather:
…Don’t use AI on a snow day…
Many of today’s object recognition systems are less robust and repeatable than people might assume – new research from the University of Tubingen and the International Max Planck Research School for Intelligent Systems shows just how fragile these systems are, with a trio of datasets that help people test the resilience of their AI systems. 

Three datasets to frustrate AI systems: The three datasets are called Pascal-C, Coco-C, and Cityscapes-C; these are ‘corrupted’ versions of existing datasets, and for each dataset the images within are corrupted with any of 15 distortions, each with five levels of severity. Some of the distortions that can be applied to the images include the addition of snow, frost, or fog to an image, as well as other distortions like the addition of noise, or the use of certain types of transforms. 

Just how bad is it: Out-of-the-box algorithms (typically based on the widely-used R-CNN family of models) see relative performance drops of between 30 and 50% on the corrupted versions of the datasts, highlighting the brittleness of many of today’s algorithms. 

Saving algorithms with messy data: One simple trick people can use to improve the robustness of models is to train them on stylized data – here, they basically take the underlying dataset and for each image create a variant stylized with a texture. These images are combined with the clean data, then trained on; models trained against datasets that incorporate the stylized data are more robust than those trained purely on clean data – this makes sense, as we’ve basically algorithmically expanded the dataset to encourage a certain type of generalization. 

Why this matters: Datasets like this make it easier for people to investigate the robustness of trained AI models, which can help us understand how contemporary models may fail and provide data to calibrate against when designing more robust ones. And the authors hope that other researchers will expand the benchmark further:
   “We encourage readers to expand the benchmark with novel corruption types. In order to achieve robust models, testing against a wide variety of different image corruptions is necessary, there is no ‘too much’. Since our benchmark is open source, we welcome new corruption types and look forward to your pull requests to https://github.com/bethgelab/imagecorruptions“.
   Read more: Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming (Arxiv).
   Get the code, data, and benchmarking leaderboard here (‘Robust Detection Benchmark’ official GitHub)

####################################################

The future of AI is… *checks notes* an AI assistant for the procedural building game Minecraft:
…Facebooks ‘CraftAssist’ project tries to build smarter AI systems by having them work alongside humans…
Facebook AI Research wants to study increasingly advanced AI systems by studying humans work alongside smart computers, so it has developed a bot to assist human players in the procedural building game Minecraft. “The ultimate goal of the bot is to be a useful and fun assistant in a wide variety of tasks specified and evaluated by human players”, they write. 

How the bot works: The robot works by taking in written prompts in natural language, then maps those to sequences of actions, like moving around the world or interacting with objects.
   For instance, in response to the query “go to the blue house”, the agent would try and map ‘blue’ and ‘house’ to entities it had stored in its memory, and if it found them would try to create a ‘move’ task that could let the agent navigate to that part of the world. The team achieves this via a neural semantic parser they call the Text-to-Action-Dictionary (TTAD) model, which converts natural language commands to specific actions. The agent also ships with systems to help out process the world around it, crudely analyzing the terrain and also heuristics for referring to objects based on their positions. 

Future extensions: Facebook has designed its agent to be extended in the future with more advanced AI capabilities. To that end, any CraftAssist agent can take in images in the form of a 64X64 ‘block’ resolution view (so viewing in terms of blocks in minecraft, rather than individual pixels). The agents can also access a 3D map of the space they’re in, so can locate any block within the world around them. 

Datasets: Facebook is releasing a dataset consisting of 800000 pairs of (algorithmically generated) actions and written instructions, 25402 human-written sentences that map to some of these actions pairs; 2513 suggested/imagined commands from humans that interacted with the bot, and 708 dialogue-action pairs from in-game chat. They’ve also release a ‘House’ dataset, which consists of 2050 human-built houses from Minecraft.  

Why this matters: Embedding AI systems into games will likely be one of the ways that we see people take AI research and port it into production – the use of Minecraft here is interesting given its playerbase numbering in the tens of millions, many of them children. Could we eventually see AI systems trained via the conversations with kids talking in broken English, training more robust policies through childish lingo? I think so! Next up: a generative Fortnite dance machine!
   Read more: CraftAssist: A Framework for Dialogue-enabled Interactive Agents (Arxiv).
   Get the code for CraftAssist here (official GitHub repository).

####################################################

Counting cars with deep learning and satellite imagery:
How can you count cars in countries that don’t have sensors wired into roads and traffic lights to gather the required data? Researchers with CMU and ETH Zurich think the use of deep learning and satellite imagery could be a viable supplement, and could help countries easily get measures for the Average Annual Daily Truck Traffic (AADTT) in a given region. 

In new research, they develop “a remote sensing approach to monitor freight vehicles through the use of high-resolution satellite images,” they write. “As satellite images become both cheaper and are taken at a higher resolution over time, we anticipate that our approach will become scalable at an affordable cost within the next few years to much larger geographic regions”.

The data: To train their system, the researchers hand-annotated vehicles seen in satellite images with around 2,000 bounding boxes from the Northeastern USA. “We used the predicted vehicle count from the detection model, the time stamp of the images, time-varying factors, and speed to make a probabilistic prediction of the AADTT”.

Testing generalization: The researchers gathered the data in America, and tested it also on data gathered from Brazil to explore the generalization properties of their system. “We found that distinct truck types (rather than geography) can impact the prediction accuracy of the detection model, and additional training seems necessary to transfer the model between countries,” they write. Additionally, “information on local driving patterns and labor laws could reduce the estimation error from the traffic monitoring model.” They trained a single-shot detection model to detect vehicles, and found that the model could provide reasonable predictions for locations from the United States, but struggled to provide as accurate predictions for Brazil, even once finetuned. 

Why this matters: Medium- and heavy-duty trucking accounts for about 7% of global CO2 emissions, and more than half of the world’s countries lack the infrastructure needed to accurately monitor traffic in their countries. Therefore, if we can develop AI-based classifiers to provide crude, cheap assessment capabilities, we can gather more data to help inform people about the world.
   Read more: Truck Traffic Monitoring with Satellite Images (Arxiv)

####################################################

How should AI researchers broadcast their insights to the world, and what do they need to be careful about?
…Publication in AI isn’t a binary choice between ‘release’ or ‘don’t release’, there are other tools available…
How can researchers maximize their contribution to scientific discourse while minimizing downsides (dual-use, malicious use, abuse, etc) of their research? That’s a question researchers from The Thoughtful Technology Project and Cambridge University’s Leverhulme Center for the Future of Intelligence, set out to provide some answers to in a blog post and action-oriented paper. 

   The core of their argument is that when researchers think they may have cause to question the release of their research, they should view their choice as being one of many, rather than a binary decision: “We particularly want to emphasize that when thinking about release practices, the choice is not a binary one between ‘release’ or ‘don’t release’. There are several different dimensions to consider and many different options within each of these dimensions, including: (1) content — what is released (options ranging from a fully runnable system all the way to a simple use case idea or concept); (2) timing — when it is released (options include immediate release, release at a specific predetermined time period or external event, staged release of increasingly powerful systems); and (3) distribution — where/to whom it is released to (options ranging from full public access to having release safety levels with auditing and approval processes for determining who has access).”

What should people do? They suggest three things the AI community should do to increase the chance of accruing the maximum possible social benefit from AI while minimizing certain downsides.

  1. Understand the potential risks of research via collaboration with experts, and develop mitigation strategies
  2. Build a community devoted to mitigating malicious use impacts of AI research and work to establish collective norms. 
  3. Create institutions to manage research practices in ML, potentially including techniques for expert vetting of certain research, as well as the development of sophisticated release procedures for research. 

Read more: Reducing malicious use of synthetic media research (Medium).

####################################################

Chinese scientists make a smarter, smaller drone vision system:
…What happens when drones become really, really smart?…
I have a confession to make: I’m afraid of drones. Specifically, I’m afraid of what happens when in a few years drones gain significant autonomous capabilities as a dividend of the AI revolution, and bad people do awful shit with these capabilities. I’m concerned about this because while drones have a vast range of good uses (which massively outnumber the negative ones!), they are also fundamentally mobile robots, and mobile robots are, to some people, great weapons (e.g. ISIS use of modified DIY military drones in recent years). 

What am I doing about this worry? I’m closely tracking developments in drone sensing and moving capabilities to try and develop my intuitions about this sub-field of AI development, and whenever I speak to policymakers I advocate for large-scale investments into the ongoing measurement, analysis, forecasting, and benchmarking of various AI capabilities so as to direct public money towards positive uses and generate the data that can unlock funding for dealing with (potential) negative uses. One of the things that motivates me here is a belief that if we just develop decent intuitions about the shape of progress at intersection of AI and drones, we’ll be able to get ahead of 95% of the bad stuff, and maximize our ability to benefit as a society from the technology. 

Now, researchers with the Beijing Institute of Technology have published (and released code for) ‘SlimYOLOv3’, a miniaturized version of the widely-used, very popular ‘YOLO’ (You Only Look Once) object recognition model. The difference between SlimYOLOv3 and YOLOv3 is simple: the Slim version is much, much smaller than the other, making it easier to deploy it on small computational devices, like the chips that can fit onto most drones. Specifically, they use sparsity training to guide a subsequent pruning process which helps them chop out unneeded bits of the neural network, then they fine-tune the model, and iteratively repeat the process until they obtain a satisfactory loss. 

So, how well does it work? “SlimYOLOv3 achieves compelling results compared with its unpruned counterpart: ~90.8% decrease of FLOPs, ~92% decline of parameter size, running ~2 faster and comparable detection accuracy as YOLOv3,” the authors write.
   They test out the system on the ‘VisDrone2018-Det’ dataset, which consists of ~7,000 drone-captured images containing any of ten predefined labelled objects (eg, pedestrian, car, bicycle, etc). They test out their SlimYOLOv3 system against an efficient YOLOv3 baseline, as well as a version of YOLOv3 augmented with spatial pyramid pooling (YOLOv3-SPP3). Variants of SlimYOLOv3 obtain scores that are around 10 absolute percentage points higher on evaluation criteria like Precision, Recall, and F1-score when compared against YOLOv3-tiny, while fitting in roughly the same computational envelop (8 million parameters, ~30mb model size). However, SlimYOLOv3 has a somewhat higher inference time than the less accurate YOLOv3-tiny. 

Be careful what you wish for: It’s notable that in March 2018 (Import AI #88), when YOLOv3 got released, the author anticipated its rapid diffusion, modification, and use: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to…. wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait…”.

Why this matters: Technologies like SlimYOLOv3 will give drones better, more efficient perceptive capabilities, which will make it easier for researchers to deploy increasingly sophisticated autonomous and semi-autonomous systems onto drones. This is going to change the world massively and rapidly – we should pay attention to what is happening.
   Read more: SlimYOLOv3: Narrower, Faster, and Better for Real-Time UAV Applications (Arxiv).  

####################################################

Want to test and develop better commonsense AI systems? Try WINOGRANDE:
…From 273 Winograd questions to 40,000 WINOGRANDE ones. Plus, pre-training for commonsense!…
Researchers with the Allen Institute for AI and the University of Washington have released ‘WINOGRANDE’, a scaled-up version of the iconic Winograd Schema Challenge (WSC) test for AI systems. For those not familiar, the WSC is a challenge and dataset consisting of 273 problems that AI systems need to try and solve. 

   So, what’s a Winograd problem? Here’s an example:
   Problem: “Pete envies Martin because he is successful.”
   Question: Is ‘he’ Pete or Martin?
   Answer: Martin. 

These problems are difficult for computers because they typically require a combination of context, world knowledge, and symbolic reasoning to solve. Some people have used progress on WSC as a litmus test for broader progress in AI research. But one thing that has held WSC back has been the lack of data – however you slice it, 273 just isn’t very large. Another has been that though the WSC questions were designed by experts to be challenging for AI systems, they still exhibit some language-based and data-based biases that can be exploited by AI systems, which can solve them by uncovering some of these underlying (unintentional) statistical regularities. 

Enter WINOGRANDE: The new WINOGRANDE dataset has been designed to be free of these biases, while also being much larger; the dataset contains around 44k questions, developed through crowdsourcing. The researchers hope researchers will test systems against WINOGRANDE to develop smarter systems, and will also use the dataset as a pre-training resource for applying to subsequent tasks (in tests, they show they can pre-train on WINOGRANDE to improve the state of the art on a range of other commonsense reasoning benchmarks in AI, including WSC, PDP, DPR, and COPA). 

Why this matters: Datasets like WINOGRANDE help define the frontier of difficulty for some AI systems and can also serve as training inputs for other, larger models. Commonsense reasoning is one of the main examples people use when discussing the limitations of contemporary AI techniques, so WINOGRANDE could define a new challenge which, if solved, could tell us something important about the future of genuinely intelligent AI.
   Read more: WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale (Arxiv). 

####################################################

Positive uses of AI – A crowd-sourced list:
…AI isn’t all doom and gloom – it’s also changing the world for the better…
This year, I’ve been giving an occasional lecture to congressional staff at the Woodrow Wilson Center in Washington DC on AI, measurement, and geopolitics. The lecture is basically a Cliff’s Notes version of a lot of the central concerns of Import AI: the relationship between AI and compute; the geopolitical shifts caused by AI advances; what AI tells us about the (complicated!) future of 2019-era-capitalism; how to view AI as an ecosystem of differently resourced parties rather than simply as blobs of resources linked to specific nation states, and so on. 

Recently, I asked some of my wonderful friends on Twitter for recent examples of positive uses of AI that I could highlight in a lecture, in part to show the pace of development here, and the breadth of opportunity. I got a great response to my Tweet, so I’m including some of the responses here as breadcrumbs for others:

Read the original tweet and the rest of the responses here (@jackclarksf twitter)

####################################################

Tech Tales:

The Rich Vessel 

Every week, someone else gets to be the richest person in the world. It’ll never be me because I don’t have the implant, so they can’t port my brain over into The Vessel. But for 95% of the rest of the planet, it could be them. 

So what happens when you’re the richest person in the world? Pretty predictable things:

  • Lots of people choose to feed people.
  • Lots of people choose to house people.
  • Lots of people choose to donate wealth. 
  • Few people choose to flaunt wealth. 
  • Few people choose to use wealth to hurt others. 
  • Very few people try to use wealth to influence politics (and they fail, as policy takes years, and getting stuff done in a week requires an act of god combined with a one-in-a-million chance). 
  • Basically no one rejects the offer. 

Now here is the scary thing. Would it surprise you if I told you that, despite this experiment running for over a year now, the richest person in the world is still the richest person in the world – and getting richer. 

You see, it turns out when the richest person in the world announced they were ‘taking a step back’ and created The Vessel initiative there was an ulterior purpose. They weren’t trying to ‘share their wealth and life experience’, they were trying to make sure that their own estate was resilient to them changing their own ideology. The whole purpose of The Vessel project isn’t to enhance our understanding of eachother, but is instead to give the Family Office and Lawyers and Consultants of the richest person in the world an ever-growing set of examples of all the decisions they need to be resilient to. 

After all, even if the richest person in the world woke up one day and wanted to give all their money away at once, that wouldn’t be the smartest move for them. They’d need to slow it down. Think more. Bring in the lawyers and consultants. Thanks to The Vessel, the collective intelligence of the world is discovering all the ways the world’s richest person could subvert the architectures of control they had built around themselves. 

Things that inspired this story: The habits of billionaires; Baudrillard; carceral architectures of bureaucracy and capital; brain-implants; societal stability; Gini coefficient; recipes to avoid revolution, recipes for trapping the world in amber until the sun melts it.