Import AI 273: Corruption VS Surveillance; Baidu makes better object detection; understanding the legal risk of datasets

Sure, you can track pedestrians using Re-ID, but what if your camera is corrupted?
…Testing out Re-ID on corrupted images…
Pedestrian re-identification is the task of looking at a picture of someone in a CCTV camera feed, then looking at a picture from a different CCTV feed and working out they’re the same person. Now, researchers with the Southern University of Science and Technology in China have created a benchmark for ‘corruption invariant person re-identification’; in other words, a benchmark for assessing how robust re-ID systems are to perturbations in the images they’re looking at.

What they did: The authors take five widely-used Re-ID datasets (CUHK03, Market-1501, MSMT17, RegDB, SYSU-MM01) and then apply ~20 image corruptions to the images, altering them with things like rain, snow, frost, blurring, brightness variation, frosted glass, and so on. They then look at popular Re-ID algorithms and how well they perform on these corrupted datasets. Their findings are both unsurprising and concerning: “In general, performance on the clean test set is not positively correlated with performance on the corrupted test set,” they write.
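The general pattern here can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: it applies two stand-in corruptions (Gaussian noise and a brightness shift, as proxies for the ~20 types used in the benchmark) to dummy Re-ID image crops, producing the clean/corrupted test-set pair you would then evaluate a model on.

```python
import numpy as np

def gaussian_noise(image, severity=1):
    """Corrupt an image (H, W, C floats in [0, 1]) with additive noise."""
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)

def adjust_brightness(image, severity=1):
    """Corrupt an image by shifting its brightness upward."""
    shift = [0.1, 0.2, 0.3, 0.4, 0.5][severity - 1]
    return np.clip(image + shift, 0.0, 1.0)

# Build a corrupted test set from a clean one (dummy 64x32 pedestrian crops).
clean_images = [np.random.rand(64, 32, 3) for _ in range(4)]
corrupted_images = [gaussian_noise(img, severity=3) for img in clean_images]
```

A real robustness evaluation would run the same Re-ID model over both lists and compare retrieval accuracy, which is where the clean-vs-corrupted gap shows up.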

Things that make you go ‘hmmm’: It’s quite typical for papers involved in surveillance to make almost no mention of, you know, the impact of surveillance. This is especially true of papers coming from Chinese institutions. Well, here’s an exception! This paper has a few paragraphs on broader impacts that name some real ReID issues, e.g., that lots of ReID data is collected without consent and that these datasets have some inherent fairness issues. (There isn’t a structural critique of surveillance here, but it’s nice to see people name some specific issues.)

Why this matters: Re-ID is the pointy-end of the proverbial surveillance sphere – it’s a fundamental capability that is already widely-used by governments. Understanding how ‘real’ performance improvements are here is of importance for thinking about the social impacts of large-scale AI.
  Read more: Benchmarks for Corruption Invariant Person Re-identification (arXiv).

####################################################

What’s been going on in NLP and what does it mean?
…Survey paper gives a good overview of what has been going on in NLP…
Here’s a lengthy survey paper from researchers with Raytheon, Harvard, the University of Pennsylvania, University of Oregon, and University of the Basque Country, which looks at the recent emergence of large-scale pre-trained language models (e.g., GPT-3), and tries to work out which parts of this trend are significant. The survey paper concludes with some interesting questions that researchers in the field might want to focus on. These include:

How much unlabeled data is needed? It’s not yet clear what the tradeoffs are between having 10 million and a billion words in a training set – some skills might require billions of words, while others may require millions. Figuring out which capabilities require which amounts of data would be helpful.

Can we make this stuff more efficient? Some of the initial large-scale models consume a lot of compute (e.g., GPT-3). What techniques might we hope to use to make these things substantially more efficient?

How important are prompts? Prompts – aka filling up the context window of a pre-trained language model with a load of examples – are useful. But how useful are they? This is an area where more research could shed a lot of light on the more mysterious properties of these systems.
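To make the prompting idea above concrete, here's a minimal, hypothetical sketch of few-shot prompt construction – the task (sentiment labeling) and the formatting are illustrative choices, not anything prescribed by the survey. The idea is simply that labeled examples placed in the context window let the model infer the task in-context.

```python
def build_few_shot_prompt(examples, query):
    """examples: list of (input, label) pairs; query: a new input to label.

    Returns a prompt string ending mid-pattern, so a language model
    completing it would emit a label for the query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    [("Great film, loved it.", "positive"),
     ("A total waste of time.", "negative")],
    "Surprisingly moving.",
)
print(prompt)
```

How many such examples to include, and how sensitive performance is to their wording and order, are exactly the open questions the survey points at.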
  Read more: Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (arXiv).

####################################################

What does it take to make a really efficient object detection system? Baidu has some ideas:
…PP-PicoDet: What industrial AI looks like…
Baidu researchers have built PicoDet, software for doing object detection on lightweight mobile devices, like phones. PicoDet tries to satisfy the tradeoff between performance and efficiency, with an emphasis on miniaturizing the model so you can run object detection at the greatest number of frames per second on your device. This is very much a ‘nuts and bolts’ paper – there isn’t some grand theoretical innovation, but there is a lot of productive tweaking and engineering to crank out as much performance as possible.

How well do these things work: Baidu’s models outperform earlier Baidu systems (e.g., PP-YOLO), as well as the widely used ‘YOLO’ family of object detection models. The best systems are able to crank out latencies on the order of single-digit milliseconds (compared to tens of milliseconds for prior systems).
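For readers curious how per-frame latency figures like these are typically produced: a generic (and hypothetical – not Baidu's benchmark harness) recipe is to run a few warm-up passes, then time repeated forward passes and report the mean in milliseconds. `detect` here is a dummy stand-in for an actual mobile detector.

```python
import time

def detect(frame):
    """Placeholder detector; returns one fake (x, y, w, h, class) box."""
    return [(0, 0, 10, 10, "person")]

def mean_latency_ms(fn, frame, warmup=5, runs=50):
    """Time `fn(frame)` over `runs` calls, after `warmup` untimed calls
    to exclude one-time setup costs, and return mean milliseconds."""
    for _ in range(warmup):
        fn(frame)
    start = time.perf_counter()
    for _ in range(runs):
        fn(frame)
    return (time.perf_counter() - start) / runs * 1000.0

frame = object()  # a real harness would pass an image tensor here
print(f"{mean_latency_ms(detect, frame):.3f} ms/frame")
```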

Neural architecture search: For many years, neural architecture search (NAS) techniques were presented as a way to use computers to search for better variants of networks than those designed by humans. But NAS approaches haven’t actually shown up that much in terms of applications. Here, the Baidu authors use NAS techniques to figure out a better detection system – and it works well enough that they use it.
Read more: PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices (arXiv).

####################################################

Want to use web-scraped data without being sued into oblivion?
…Huawei researchers lay out the messy aspects of AI + licensing…
Today, many of the AI systems used around us are built on datasets that were themselves composed of other datasets, some of which were indiscriminately scraped from the web. While much of this data is likely covered under a ‘fair use’ provision due to the transformative nature of training models on it, there are still complicated licensing questions that companies need to tackle before using the data. This is where new work from Huawei, York University, and the University of Victoria tries to help, by providing a set of actions an organization might take to assure itself it is on good legal ground when using data.

So, you want to use web-scraped datasets for your AI model? The researchers suggest a multi-step process, which looks like this:
– Phase one: Your AI engineers need to extract the license from your overall dataset (e.g., CIFAR-10), then identify the provenance of the dataset (e.g., CIFAR-10 is a subset of the 80 Million Tiny Images dataset), then go and look at the data sources that compose your foundational dataset and extract their licenses as well.
– Phase two: Your lawyers need to read the license associated with the dataset and its underlying sources, then analyze the license(s) with regard to the product being considered and work out if deployment works.
– Phase three: If the licenses and source-licenses support the use case, then you should deploy. If a sub-component of the system (e.g., a subsidiary license) doesn’t support your use case, then you should flag this somewhere.
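The three phases above amount to walking a dataset's provenance tree and checking every license along the way. Here's a minimal, hypothetical sketch of that check – the dataset names mirror the paper's CIFAR-10 example, but the boolean "license allows commercial use" field is a deliberate simplification of what is really a lawyer's judgment call (phase two).

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    license_allows_commercial: bool  # stand-in for a lawyer's verdict
    sources: list = field(default_factory=list)  # upstream datasets

def can_deploy_commercially(ds):
    """Phase three: deployment is supported only if this dataset's license
    AND every upstream source's license support the use case; otherwise
    return the offending component so it can be flagged."""
    if not ds.license_allows_commercial:
        return False, ds.name
    for src in ds.sources:
        ok, culprit = can_deploy_commercially(src)
        if not ok:
            return False, culprit
    return True, None

tiny_images = Dataset("80 Million Tiny Images", False)
cifar10 = Dataset("CIFAR-10", True, sources=[tiny_images])
print(can_deploy_commercially(cifar10))  # blocked by the upstream source
```

The point of the recursion is the paper's core warning: a permissive-looking top-level license means little if an upstream source's terms don't support your use case.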

Case study of 6 datasets: The authors applied their method to six widely-used datasets (FFHQ, MS COCO, VGGFace2, ImageNet, Cityscapes, and CIFAR-10) and found the following:
– 3/6 have a standard dataset license (FFHQ, MS COCO, VGGFace2 have standard licenses, ImageNet and Cityscapes have a custom license, CIFAR-10 doesn’t mention one)
– 5/6 datasets contain data from other datasets as well (exception: Cityscapes)
– 5/6 datasets could result in license compliance violation if used to build commercial AI (exception: MS COCO)

Is this horrendously complicated to implement? The authors polled some Huawei product teams about the method and got the feedback that people worried “over the amount of manual effort involved in our approach”, and “wished for some automated tools that would help them”.
Read more: Can I use this publicly available dataset to build commercial AI software? Most likely not (arXiv).

####################################################

Tech tales:

Attention Trap
[A robot-on-robot battlefield, 2040, somewhere in Africa]

The fireworks were beautiful and designed to kill machines. They went up into the sky and exploded in a variety of colors, sending sparklers pinwheeling out from their central explosions, and emitting other, smaller rockets, which exploded in turn, in amazing, captivating colors.
Don’t look don’t look don’t look thought the robot to itself, and it was able to resist the urge to try to categorize the shapes in the sky.
But one of its peers wasn’t so strong – and out of the sky came a missile which destroyed the robot that had looked at the fireworks.

This was how wars were fought now. Robots, trying to spin spectacles for each other, drawing the attention of their foes. The robots were multiple generations into the era of AI-on-AI warfare, so they’d become stealthy, smart, and deadly. But they all suffered the same essential flaw – they thought. And, specifically, their thinking was noisy. So many electrical charges percolating through them whenever they processed something. So much current when they lit themselves up within to compute more, or store more, or attempt to learn more.
  And they had grown so very good at spotting the telltale signs of thinking, that now they did this – launched fireworks into the sky, or other distractors, hoping to draw the attention and therefore the thinking of their opponents.

Don’t look don’t look don’t look had become a mantra for one of the robots.
Unfortunately, it overfit on the phrase – repeating it to itself with enough frequency that its thought showed up as a distinguishable pattern to the exquisite sensors of its enemies.
Another missile, and then Don’tlookdon’tlo-SHRAPNEL. And that was that.

The robots were always evolving. Now, one of the peers tried something. Don’t think, it thought. And then it attempted to not repeat the phrase. To just hold itself still, passively looking at the ground in front of it, but attempting-without-attempting to not think of anything – to resist the urge to categorize and to perceive.

Things that inspired this story: Thinking deeply about meditation and what meditation would look like in an inhuman mind; adversarial examples; attention-based methods for intelligence; the fact that everything in this world costs something and it’s really about what level of specificity people can detect costs; grand strategy for robot wars.