Import AI 194: DIY AI drones; Audi releases its self-driving dataset; plus, Eurovision-style AI pop.

Want to see if AI can write a pop song? Cast your vote in this contest:
…VPRO competition challenges teams to write a half-decent song using AI tools…
Dutch broadcaster VPRO wants to see if songs created via AI tools can be compelling, enjoyable pieces of music. Contestants need to use AI to help them compose a song of no more than three minutes long, and need to document their creative process. Entries will be judged by a panel of AI experts, as well as an international audience who can cast votes on the competition website (yes, that includes you, the readers of Import AI).

What are they building in there? One French group has used GPT-2, Char-RNN, and Magenta Studio for Ableton to write their song, and an Australian act has used audio samples of Australian animals including koalas, kookaburras and Tasmanian devils as samples for their music (along with a generative system trained on Eurovision pop contest songs).

  When do we get a winner? Winners will be announced on May 12, 2020.
  Listen to the songs: You can listen to the songs and find out more about the teams here.
Read more here: FAQ about the AI Song Contest (vpro website).

####################################################

Audi releases a semantic segmentation self-driving car dataset:
…Audi sees Waymo’s data release, raises with vehicle bus data…
Audi has released A2D2, a self-driving car dataset. This is part of a recent trend where large companies have started releasing expensive datasets, collected by proprietary means.

What is A2D2 and what can you do with it? The dataset consists of simultaneously recorded images and 3D point clouds, along with 3D bounding boxes, semantic segmentation, instance segmentation, and data from the vehicle’s automotive bus. This means it’s a good dataset for imitation learning research, as well as various visual processing problems. The inclusion of the vehicle’s automotive bus data is interesting, as it means you can also use this dataset for reinforcement learning research, where you can learn from both the visual scenes and also the action instructions from the bus.
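Using the bus data this way means pairing each camera frame with the bus message recorded closest to it in time, since the bus is typically logged at a much higher rate than the cameras. A minimal sketch of that alignment step (this is a generic nearest-timestamp match, not A2D2's actual file format or tooling, and the 100Hz/30Hz rates below are illustrative assumptions):

```python
from bisect import bisect_left

def nearest_bus_record(bus_timestamps, frame_timestamp):
    """Return the index of the bus record closest in time to a camera frame.

    bus_timestamps must be sorted ascending (bus signals are usually logged
    at a much higher rate than camera frames).
    """
    i = bisect_left(bus_timestamps, frame_timestamp)
    if i == 0:
        return 0
    if i == len(bus_timestamps):
        return len(bus_timestamps) - 1
    # pick whichever neighbour is closer in time
    before, after = bus_timestamps[i - 1], bus_timestamps[i]
    return i if after - frame_timestamp < frame_timestamp - before else i - 1

# toy example: a 100Hz bus log vs ~30Hz camera frames (timestamps in ms)
bus_ts = list(range(0, 1000, 10))     # 0, 10, ..., 990
frame_ts = [0, 33, 66, 99]
pairs = [(t, bus_ts[nearest_bus_record(bus_ts, t)]) for t in frame_ts]
```

Each (observation, action) pair produced this way is the raw material for imitation learning: the image is the state, the time-matched bus signals (steering angle, throttle, and so on) are the expert's action.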

How much data? A2D2 consists of around 400,000 images in total, including data recorded on highways, country roads, and cities in the south of Germany, under cloudy, rainy, and sunny weather conditions. Some of the data is labelled: 41,277 images are accompanied by semantic and instance segmentation labels for 38 categories, and 12,497 images are also annotated with 3D bounding boxes within the field of view of the front-center camera.

How does it compare? The A2D2 dataset is relatively large compared to other self-driving datasets, but smaller than the Waymo Open Dataset (Import AI 161), which has 1.2 million 2D bounding boxes and 12 million 3D bounding boxes across hundreds of thousands of annotated frames. However, Audi’s dataset includes a richer set of types of data, including the vehicle’s bus.

GDPR & Privacy: The researchers blur faces and vehicle number plates in all the images to comply with GDPR, they say.

Who gets to build autonomous cars? One motivation for the dataset is to “contribute to startups and other commercial entities by freely releasing data which is expensive to generate”, the researchers write. This highlights an awkward truth of today’s autonomous driving developments – gathering real-world data is a punishingly expensive exercise, and because for a long time companies kept data private, there aren’t many real-world benchmarks. Dataset releases like A2D2 will hopefully make it easier for more people to conduct research into autonomous cars.
  Read more: A2D2: Audi Autonomous Driving Dataset (arXiv).
  Download the 2.3TB dataset here (official A2D2 website).

####################################################

The DIY AI drone future gets closer:
…Software prototype shows how to load homebrew models onto consumer drones…
Researchers with the University of Udine in Italy and the Mongolian University of Science and Technology have created a software system that lets them load various AI capabilities onto a drone, then remotely pilot it. The system is worth viewing as a prototype for how we might see AI capabilities get integrated into more sophisticated, future systems, and it hints at a future full of cheap consumer drones being used for various surveillance tasks.

The software: The main work here is in developing software that pairs a user-friendly desktop interface (showing a drone video feed, a map, and a control panel) with backend systems that interface with a DJI drone and execute AI capabilities on it. For this work, they implement a system that combines a YOLOv3 object detection model with a Discriminative Correlation Filter network (DCFNet) to track objects. In tests, the system is able to track an object of interest at 29.94fps, and detect multiple objects at processing speeds of around 20fps.
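The core loop in this kind of system is detect-then-track: run the (slow) detector occasionally, and hand its boxes to a (fast) tracker between detections. The paper uses a learned correlation-filter tracker (DCFNet); the sketch below substitutes a much simpler greedy IoU association purely to illustrate the loop, so none of this is the authors' code:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def update_track(track_box, detections, min_iou=0.3):
    """Greedy association: follow whichever new detection overlaps the track most."""
    best = max(detections, key=lambda d: iou(track_box, d), default=None)
    if best is not None and iou(track_box, best) >= min_iou:
        return best        # track snaps to the overlapping detection
    return track_box       # no match this frame: coast on the previous box
```

In a real pipeline `detections` would come from YOLOv3 on the latest frame; a correlation-filter tracker replaces the coasting step with an actual appearance-based search around the previous box.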

Where this research is going: Interfaces are hard – but they always get built given enough interest. I think in the future we’ll see open source software packages emerge that let us easily load homebrew AI models onto off-the-shelf consumer drones. I think the implications of this kind of capability are hard to fathom, and I’d guess we’re less than three years away from seeing scaled-up versions of the research discussed here.
  Read more: An Efficient UAV-based Artificial Intelligence Framework for Real-Time Visual Tasks (arXiv).

####################################################

Can AI help us automate satellite surveillance? (Hint: Yes, it can):
…Where we’re going, clouds don’t matter…
A group of defense-adjacent organizations has released SpaceNet6, a high-resolution synthetic aperture radar (SAR) dataset. “No other open datasets exist that feature near-concurrent collection of SAR and optical at this scale with sub-meter resolution,” they write. The authors of the dataset and associated research paper come from In-Q-Tel, Capella Space, Maxar Technologies, the German Aerospace Center, and the Intel AI Lab. They’re also launching a challenge for researchers to train deep learning systems to infer building dimensions from SAR data.

What’s in the data? The SpaceNet6 Multi-Sensor All Weather Mapping (MSAW) dataset consists of SAR and optical data of the port of Rotterdam, the Netherlands, and contains 48,000 annotated building footprints across 120 square kilometers of sensory data. “The dataset covers heterogeneous geographies, including high-density urban environments, rural farming areas, suburbs, industrial areas and ports resulting in various building size, density, context and appearance”.

Who cares about SAR? SAR is an interesting data format – it’s radar, so it is made up of reflections from the earth, which means SAR data has different visual traits to optical data (e.g., one phenomenon called layover distorts things like skyscrapers, ‘where the object is so tall that the radar signal reaches the top of an object before it reaches the bottom of it’, which causes alignment problems). This “presents unique challenges for both computer vision algorithms and human comprehension,” the researchers write. But SAR also has massive benefits – it intuitively maps out 3D structures, can see through clouds, and as we develop better SAR systems we’ll be able to extract more and more information from the world. The challenge is building automated systems that can decode it and harmonize it with optical data – which is some of what SpaceNet6 helps with.

Interesting progress: “Although SAR has existed since the 1950s [22] and studies with neural nets date back at least to the 1990s [3], the first application of deep neural nets to SAR was less than five years ago [23]. Progress has been rapid, with accuracy on the MSTAR dataset rising from 92.3% to 99.6% in just three years [23, 12]. The specific problem of building footprint extraction from SAR imagery has been only recently approached with deep-learning [29, 37]”

Can you solve the MSAW challenge? “The goal of the challenge is to extract building footprints from SAR imagery, assuming that coextensive optical imagery is available for training data but not for inference,” they write. The nature of the challenge relates to how people (cough intelligence agencies cough) might want to use this capability in the wild; “concurrent collection [of optical data] is often not possible due to inconsistent orbits of the sensors or cloud cover that will render the optical data unusable”.
  Read more: SpaceNet6: Multi-Sensor All Weather Mapping Dataset (arXiv).
  Get the SpaceNet6 data here (official website).

####################################################

How deep learning can enforce social distancing:
…COVID means dystopias can become desirable…
An AI startup founded by Andrew Ng has built a tool that can monitor people in videos and work out if they’re standing too close together. This system is meant to help customers of the startup, Landing AI, automatically monitor their employees and be better able to enforce social distancing norms to reduce transmission of the coronavirus.
  “The detector could highlight people whose distance is below the minimum acceptable distance in red, and draw a line between them to emphasize this,” they write. “The system will also be able to issue an alert to remind people to keep a safe distance if the protocol is violated.”
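Once people have been detected and their positions projected onto the ground plane, the distance check itself is simple: flag every pair of people closer than some threshold. A toy sketch (this assumes camera calibration has already converted detections to metric ground-plane coordinates, which is where the real engineering effort in a system like Landing AI's lives):

```python
import math
from itertools import combinations

def too_close_pairs(centroids, min_distance):
    """Return index pairs of people standing closer than min_distance apart."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(centroids), 2)
        if math.dist(a, b) < min_distance
    ]

# toy example: positions in metres on the ground plane
people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
violations = too_close_pairs(people, min_distance=2.0)  # → [(0, 1)]
```

Each returned pair is a candidate for the red highlight and connecting line the quote describes; the alerting logic would sit on top of this.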

Why this matters: AI is really just shorthand for ‘a computer analog of a squishy cognitive capability’, like being able to perceive certain things or make certain correlations. Tools like this social distancing prototype highlight how powerful it can be to bottle up a given cognitive capability and apply it to a narrowly defined task, like figuring out if people are walking too close together. It’s also the sort of thing that makes people intuitively uncomfortable – we know that this kind of thing can be useful for helping to fight a coronavirus, but we also know that the same technology can be a boon to tyrants. How does our world change as technologies like this become ever easier to produce for ever-more specific purposes?
  Check out a video of the system in action here (Landing AI YouTube).
  Read more: Landing AI Creates an AI Tool to Help Customers Monitor Social Distancing in the Workplace (Landing AI blog).

####################################################

Want to test out a multi-task model in your web browser? Now you can!
…Think you can flummox a cutting-edge model? Try the ViLBERT demo…
In the past few years, we’ve moved from developing machine learning models that can do single tasks to ones that can do multiple tasks. One of the most exciting areas of research has been in the development of models that can perform tasks in both the visual and written domains, like being able to caption pictures, or answer written questions about them. Now, researchers with Facebook, Oregon State University, and Georgia Tech, have put a model on the internet so people can test it themselves.

How good is this model? Let’s see: I decided to test the model by seeing how well it did at challenges relating to an image that contained a picture of a cellphone. After uploading my picture, I was able to test out the model on tasks like visual question answering, spatial reasoning (e.g., what is to the right of the phone?), visual entailment, and more. Try it out yourself!
  Play with the demo yourself: CloudCV: ViLBERT Multi-Task Demo
  Read more about the underlying research: 12-in-1: Multi-Task Vision and Language Representation Learning (arXiv).

####################################################


AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Concrete mechanisms for trustworthy AI:
As AI systems become increasingly powerful, it becomes increasingly important to ensure that they are designed and deployed responsibly. Fostering trust between AI developers and society at large is an important aspect of achieving this shared goal. Mechanisms for making and assessing verifiable claims are an important next step in building and maintaining this trust.

Principles: Over the last few years, companies and researchers have been adopting ethics principles. These are a step in the right direction, but can only get us so far — they are generally non-binding and hard to verify. We need concrete mechanisms to allow AI developers to demonstrate responsible behavior, grounded in verifiable claims. Such mechanisms are commonplace in other industries — e.g. we have well-defined standards for vehicle safety that are subject to testing.

Mechanisms: The report recommends several mechanisms that operate on different parts of the AI development process.
– Institutional mechanisms are designed to shape the incentives of people developing AI — e.g. bias and safety bounties to incentivize external parties to discover and report flaws in AI systems; red teaming exercises to encourage developers to discover and fix such flaws in their own systems.
– Software mechanisms can enable better oversight of AI systems’ properties to support verifiable claims — e.g. audit trails capturing relevant information about the development and deployment process to make parties more accountable; better interpretability of AI systems to allow all parties to better understand and scrutinize them.
– Hardware mechanisms can help verify claims about privacy and security, and the use and distribution of computational resources — e.g. standards for secure hardware to support assurances about privacy and security; standards for measuring the use of computational resources to make it easier to verify claims about what exactly organizations are doing.

Jack’s view: I helped out with some of this report and I’m excited to see what kinds of suggestions and feedback we get about the proposed mechanisms. I think the biggest thing is what happens in the next year or so – can we get different people and organizations to experiment with these mechanisms and thereby create evidence for how effective (or ineffective) they are? Watch this space!
Matthew’s view: This is a great report, and I’m excited to see a collaborative effort between developers and other stakeholders in designing and implementing these sorts of mechanisms. As the authors point out, there are important challenges in responsible AI development that are unlikely to be solved through easier verification of claims — e.g. ensuring safety writ large (a goal that is too general to be formulated into an easily verifiable claim).
  Read more: Toward Trustworthy AI: Mechanisms for Supporting Verifiable Claims (arXiv).

####################################################

Tech Tales:

The danger of a thousand faces
2022

Don’t start with your friends. That’s a mistake. Start with strangers. It’s easy enough to find them. There are lots of sites that let you chat with random strangers. Go and talk to them. While they talk to you, record them. Feed that data into the system. Get your system to search for them on the web. If they seem to know interesting people – maybe people with money, or people who work at a company you’re interested in – then you get your system to learn how to make you look like them. Deepfaking – that’s what people used to call it before it went everywhere. Then you put on their face and use an audio transform to make your voice sound like theirs, and you try and talk to their friends, or colleagues, or family members. You use the face to find out more information. Maybe gather other people’s faces.

You could go to prison for this. That was the point of the Authenticity Accords. But to go to prison, someone has to catch you. So pick your targets. Not too technical. Not too young. Never go for teenagers – too suspicious of anything digital. Find your targets and pretend. The better you are at pretending, the better you’ll do.

See for yourself how people react to you. But don’t let it change you. If you spend enough time wearing someone else’s face, you’ll either slip up or get absorbed. Some people think it gets easier as you put on more faces. These people are wrong. You just get more used to changing yourself. One day you’ll look in the mirror and your own face won’t seem right. You’ll turn on your machine and show yourself a webcam view and warp your face to someone else. Then you’ll look into your eyes that are not your eyes and you’ll whisper “don’t you see” and think this is me.

Things that inspired this story: Deepfakes; illusion; this project prototyping deepfake avatars for Skype/Zoom; Chatroulette; endless pandemic e-friendships via video.

Some technical assumptions: Since this story is set relatively near in the future, I’m going to lay out some additional thinking behind it: I’m assuming that we figure out small-data fine-tuning for audio synthesis systems, which I’m betting will come from large pre-trained models (similar to what we’ve seen in vision and text); I’m also assuming this technology will go ‘consumer-grade’, so we’ll see joint video-audio ‘deepfake’ software suites get developed and open-sourced (either illicitly or otherwise). I’m also presuming we won’t sort out better authentication of digital media, and it will be sufficiently expensive to run full-scale audio/video detector models on certain low-margin services (e.g., some social media sites) that enforcement will be thin.