Import AI 156: The 7,500 images that break image recognition systems; open source software for deleting objects from videos; and what it takes to do multilingual translation

by Jack Clark

Want 7,500 images designed to trick your object recognition system? Check out the ‘Natural Adversarial Examples’ dataset:
…Can your AI system deal with these naturally occurring optical illusions?…
Have you ever been fiddling in the kitchen and dropped an orange-colored ceramic knife into a pile of orange peels and temporarily lost it? I have! These kinds of visual puzzles can be confusing for humans, and are even trickier for machines to deal with. Therefore, researchers with the University of California, Berkeley, the University of Washington, and the University of Chicago have developed and released a dataset full of these ‘natural adversarial examples’, which should help researchers test the robustness of AI systems and develop more powerful ones.

ImageNet-A: You can get the data as an ImageNet classifier test set called ImageNet-A, which consists of around 7,500 images designed to confuse and frustrate modern image recognition systems.

How hard are ‘natural adversarial examples’? Extremely hard!
The researchers test DenseNet-121 and ResNeXt-50 models on the dataset and show that both obtain an accuracy of less than 3% on ImageNet-A (compared to accuracies of 97%+ on standard ImageNet). Things don’t improve much when they train their AI systems with techniques designed to increase the robustness of classifiers, finding that approaches like adversarial training, Stylized-ImageNet augmentation, and uncertainty metrics don’t work particularly well.
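For a sense of what such an evaluation looks like in practice, here is a minimal PyTorch sketch of testing a pretrained classifier on ImageNet-A. The directory path and the class-mapping JSON file are assumptions for illustration (ImageNet-A covers only 200 of ImageNet’s 1,000 classes, so the model’s outputs have to be restricted to that subset); this is a sketch, not the authors’ evaluation code.

    # Minimal sketch: top-1 accuracy of a pretrained classifier on ImageNet-A.
    # Assumes an ImageFolder-style layout (one sub-folder per WordNet ID) and a
    # hypothetical JSON file mapping each WordNet ID to the model's 0-999 index.
    import json
    import torch
    from torch.utils.data import DataLoader
    from torchvision import models, transforms
    from torchvision.datasets import ImageFolder

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    dataset = ImageFolder("imagenet-a/", transform=preprocess)  # hypothetical path
    loader = DataLoader(dataset, batch_size=64, num_workers=4)

    wnid_to_idx = json.load(open("wnid_to_imagenet_idx.json"))  # assumed mapping file
    subset = [wnid_to_idx[c] for c in dataset.classes]  # ImageNet-A's 200 classes

    model = models.densenet121(pretrained=True).eval()

    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images)[:, subset]  # restrict predictions to the subset
            correct += (logits.argmax(1) == labels).sum().item()
            total += labels.size(0)
    print(f"ImageNet-A top-1 accuracy: {correct / total:.2%}")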

Why this matters: Being able to measure all the ways in which AI systems fail is a superpower, because such measurements can highlight the ways existing systems break and point researchers towards problems that can be worked on. I hope we’ll see more competitions that use datasets like this to test how resilient algorithms are to confounding examples.
   Read more: Natural Adversarial Examples (Arxiv).
   Get the code and the ‘IMAGENET-A’ dataset here (Natural Adversarial Examples GitHub).

####################################################

Computer, delete! Open source software for editing videos:
…AI is making video-editing much cheaper and more effective…
Ever wanted to pick a person, an animal, or some other object in a video and make it disappear? I’m sure the thought has struck some of you at some point. Now, open source AI projects let you do just this: Video Object Removal is a new GitHub project that does what it says. The technology lets you draw a bounding box around an object in a video, and then the AI system will try to remove the object and inpaint the scene behind it. The software is based on two distinct technologies: Deep Video Inpainting, and Fast Online Object Tracking and Segmentation: A Unifying Approach.
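To give a feel for the shape of the pipeline, here is a toy stand-in that swaps the project’s deep models (SiamMask-style tracking plus deep video inpainting) for classical OpenCV inpainting applied to a fixed box; the filenames and box coordinates are placeholders, and unlike the real system there is no tracking or temporal consistency here.

    # Toy 'remove and inpaint' pipeline: mask a user-drawn box in each frame and
    # fill it from the surrounding pixels. The real project tracks the object and
    # uses learned video inpainting instead of per-frame Telea inpainting.
    import cv2
    import numpy as np

    x, y, w, h = 100, 50, 80, 120  # user-drawn bounding box (placeholder values)
    cap = cv2.VideoCapture("input.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    writer = None

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if writer is None:
            fh, fw = frame.shape[:2]
            writer = cv2.VideoWriter("output.mp4",
                                     cv2.VideoWriter_fourcc(*"mp4v"), fps, (fw, fh))
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        mask[y:y + h, x:x + w] = 255  # mark the boxed region as 'missing'
        writer.write(cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA))

    cap.release()
    writer.release()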

Why this matters: Media is going to change radically as a consequence of the proliferation of AI tools like this – get ready for a world where images and video are so easy to manipulate that they become just another paintbrush, and be prepared to disbelieve everything you see online.
   Get the code from the GitHub page here (GitHub).

####################################################

Breaking drones to let others make smarter drones:
…‘ALFA’ dataset gives researchers flight data for when things go wrong…
Researchers with the Robotics Institute at Carnegie Mellon University have released ALFA, a dataset containing flight data and telemetry from a model plane, including data when the plane breaks. ALFA will make it easier for people to assess how well fault-spotting and fault-remediation algorithms work when exposed to real world failures. 

ALFA consists of data for 47 autonomous flights with scenarios for eight different types of faults, including engine, rudder, and elevator errors. The data represents 66 minutes of normal flight and 13 minutes of post-fault flight time taking place over a mixture of fields and woodland near Pittsburgh, and there’s also a larger unprocessed dataset representing “several hours of raw autonomous, autopilot-assisted, and manual flight data with tens of different faults scenarios”. 
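As an illustration of how you might slice one of these logs into normal and post-fault segments, here is a small pandas sketch; the filename and the ‘timestamp’ / ‘fault_engaged’ columns are assumptions made for illustration, so consult the ALFA documentation for the dataset’s actual file formats and field names.

    # Hypothetical: split a flight log into pre- and post-fault segments,
    # assuming a CSV export with a timestamp column and a fault flag.
    import pandas as pd

    log = pd.read_csv("carbonZ_flight_01.csv")  # assumed filename
    fault_start = log.loc[log["fault_engaged"] == 1, "timestamp"].min()

    normal = log[log["timestamp"] < fault_start]
    post_fault = log[log["timestamp"] >= fault_start]
    print(f"normal: {len(normal)} rows, post-fault: {len(post_fault)} rows")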

Hardware: To collect the dataset, the researchers used a modified Carbon Z T-28 model plane, equipped with an onboard Nvidia Jetson TX2 computer and running ‘Pixhawk’ autopilot software modified so that the researchers could remotely break the plane, generating the failure data.

Why this matters: Science tends to spend more time and resources inventing things and making forward progress on problems than it does breaking things and casting a skeptical eye over what already exists; datasets like ALFA make it easier for people to study failures, which will ultimately make it easier to develop more robust systems.
   Read more: ALFA: A Dataset for UAV Fault and Anomaly Detection (Arxiv).
   Get the ALFA data here (AIR Lab Failure and Anomaly (ALFA) Dataset website).

####################################################

How far are we from training a single AI system to translate between all languages?
…Study involves 25 billion parallel sentences across 103 languages…
How good are modern multilingual machine learning-based translation systems – that is, systems which can translate between a multitude of different languages, typically using a single massive trained model? A new study from Google – which it says may be the largest of its kind ever conducted – analyzes the performance of these systems in the wild.

Data: For the study, the researchers train and evaluate a large-scale machine translation system on “a massive open-domain dataset containing over 25 billion parallel sentences in 103 languages”. They believe that “this is the largest multilingual NMT system to date, in terms of the amount of training data and number of languages considered at the same time”. The data is distributed somewhat unevenly across languages, though, reflecting the differing levels of documentation available for different languages. “The number of parallel sentences per language in our corpus ranges from around tens of thousands to almost 2 billion”, they write; that’s a discrepancy of almost 5 orders of magnitude between the languages with the greatest and smallest amounts of data in the corpus. Google generated this data by crawling and extracting parallel sentences from the web, it writes.

Desirable features: An excellent multilingual translation system should have the following properties, according to the researchers:

  • Maximum throughput in terms of number of languages considered within a single model. 
  • Positive transfer towards low-resource languages. 
  • Minimum interference (negative transfer) for high-resource languages. 
  • Models that perform well in “realistic, open-domain settings”.

When More Data Does Not Equal Better Data: One of the main findings of the study is the difficulty of training large models on such varied datasets, the researchers write. “In a large multi-task setting, high resource tasks are starved for capacity while low resource tasks benefit significantly from transfer, and the extent of interference and transfer are strongly related.” They develop some sampling techniques to train models to be more resilient to this, but find these involve their own tradeoffs between large-data and small-data languages. In many ways, the complexity of large-scale machine translation seems to hide subtle difficulties: “Performance degrades for all language pairs, especially the high and medium resource ones, as the number of tasks grows”, they write.
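One common rebalancing trick in this setting is temperature-based sampling, which flattens the data distribution across languages; here is a minimal sketch with made-up sentence counts (the paper explores sampling strategies along these lines, though its exact scheme may differ).

    # Temperature-based sampling: raise each language's data share to the power
    # 1/T, so low-resource languages are sampled more often than their raw
    # counts suggest. T=1 is proportional sampling; T -> infinity is uniform.
    import numpy as np

    counts = np.array([2e9, 1e7, 5e4])  # made-up high/medium/low-resource sizes
    T = 5.0

    probs = counts ** (1.0 / T)
    probs /= probs.sum()
    print(probs)  # per-language sampling probabilities for each training batch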

Scale: To improve performance, the researchers test out three variants of the ‘Transformer’ component: a small model (400 million parameters), a wide 12-layer model (1.3 billion parameters), and a deep 24-layer model (1.3 billion parameters); the deep model demonstrates superior performance and “does not overfit in low resource languages”, suggesting that both model capacity and depth have a significant impact on performance.
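As a back-of-envelope illustration of the wide-versus-deep tradeoff, the sketch below counts the parameters in the Transformer body for two configurations; the dimensions are illustrative guesses, not the paper’s actual hyperparameters, and embeddings and the decoder’s cross-attention are excluded.

    # Rough per-layer parameter arithmetic for 'wide' vs 'deep' Transformers:
    # self-attention uses 4 * d_model^2 weights (Q, K, V, output projections),
    # the feed-forward block uses 2 * d_model * d_ff.
    def body_params(layers: int, d_model: int, d_ff: int) -> int:
        per_layer = 4 * d_model ** 2 + 2 * d_model * d_ff
        return layers * per_layer

    wide = body_params(layers=12, d_model=1024, d_ff=16384)  # illustrative dims
    deep = body_params(layers=24, d_model=1024, d_ff=8192)
    print(f"wide 12-layer: ~{wide / 1e6:.0f}M, deep 24-layer: ~{deep / 1e6:.0f}M")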

Why this matters: Studies like this point us to a world where we train a translation model so large and so capable that it seems equivalent to the Babel fish from The Hitchhiker’s Guide to the Galaxy – a universal translation system, capable of taking concepts expressed in one language and re-rendering them in any other. It’s also fascinating to think about what kinds of cognitive capabilities such models might develop – translation is hard, and to do a good job you need to be able to port concepts between languages, not just carefully translate individual words.
   Read more: Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges (Arxiv).

####################################################

Tech Tales:

[Underground mixed-use AI-Human living complex, Earth, 2050] 

Leaving Breakdown City 

It was a citywide bug. A bad one. Came in off of some off-zoned industrial code cleaning facilities. I guess something leaked. It made its way in through some utility exchange data centers and it spread from there. The first sign was the smell – suddenly, all the retail-zoned streets got thick with the smell of bacon and of perfume and of citrus – that was the tell, the chemical synthesizers going haywire. Things spread after that. 

I went and got Sandy while this was going on. She was working in the hospital and when I got there she was wheeling out some spiderweb-covered analog medical systems, putting patients onto equipment without chips, and pulling the smart equipment out of the most at-risk patients. The floor was slick with hand sanitizer, from all the machines on the walls deciding to void themselves at once.

C’mon, I said to her.
Just help me hook this up, she said.
We plugged a machine into a patient and unplugged the electronics. The patient whispered ‘thank you’.
You’re so welcome, Sandy said. I’ll be back, don’t forget your pills. 

We left the hospital and we headed for one of the elevators. We could smell flowers and meat and smoke and it was a real rush to run together, noses thick with scent, as the world ended behind us. We made it to an elevator and turned its electronics over to analog, then used the chunky, mechanical controls to set it on an upward trajectory. We’d come out topside and head to the next town over, hope that the isolation systems had kicked in to halt the spread. 

We watched the madness spread from robot to robot, floor to floor, system to sub-system, and so on. We held hands and looked at the city as we rose up, and we saw:

  • Garbage trucks backing into hospitals. 
  • One street cleaning robot chasing a couple of smaller robots. 
  • Main roads that were completely empty and small roads that were completely jammed. 
  • Factories producing products which flow out of the factory and into the street on delivery robots, which then take them to over-stocked stores, leaving the boxes outside.
  • Other elevators shivering up and down shafts, way too fast, taking in products and robots and spitting them out elsewhere for other reasons.

Things that inspired this story: Brain damage; brain surgery; a 50/50 chance of being able to speak properly after aforementioned surgery (not mine!); the Internet of Things; the Internet of Shit; computer viruses; noir novels; an ambition to write dollar-store AI fiction.