Import AI 271: The PLA and adversarial examples; why CCTV surveillance has got so good; and human versus computer biases

Just how good has CCTV surveillance got? This paper gives us a clue:
…One of the scariest AI technologies just keeps getting better and better…
Researchers with Sichuan University have written a paper summarizing recent progress in pedestrian Re-ID. Re-ID is the task of looking at a picture of a person, then at a different picture taken by another camera and/or from another angle, and figuring out that both images show the same person. It’s one of the scarier applications of AI, given that it enables low-cost surveillance via the CCTV cameras that have proliferated worldwide in recent years. This paper provides a summary of some of the key trends and open challenges in this AI capability.

Datasets: We’ve seen the emergence of both image- and video-based datasets that, in recent years, have been distinguished by their growing complexity, the usage of multiple different cameras, and more variety in the types of angles people are viewed from.

Deep learning + human expertise: Re-ID is a heavily applied area, and recent years have seen deep learning methods set new state-of-the-art results, usually by pairing basic deep learning methods with other conceptual innovations (e.g., using graph convolutional networks and attention-based mechanisms instead of things like RNNs, LSTMs, or optical flow techniques).

What are the open challenges in Re-ID? “Although existing deep learning-based methods have achieved good results… they still face many challenges,” the authors write. Specifically, for the technology to improve further, researchers will need to:
– Incorporate temporal and spatial relationship models to analyze how things happen over time.
– Build larger and more complicated datasets
– Improve the performance of semi-supervised and unsupervised learning methods so they’re less dependent on labels (and therefore, reduce the cost of dataset acquisition)
– Improve the robustness of Re-ID systems by making them more resilient to significant changes in image quality
– Create ‘end-to-end person Re-ID’ systems; most Re-ID systems perform person identification and Re-ID via separate systems, so combining these into a single system is a logical next step.
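The core matching step in most Re-ID pipelines is simple once you have a feature extractor: embed each detection, then rank the gallery of known sightings by similarity to the query. Here’s a minimal sketch of that step – the embeddings are invented placeholders; in a real system they’d come from a trained network applied to person crops from different cameras:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query_embedding, gallery):
    """Rank gallery entries (person_id, embedding) by similarity to
    the query embedding, most similar first."""
    scored = [(pid, cosine_similarity(query_embedding, emb))
              for pid, emb in gallery]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy example: person 'A' seen on camera 1 (query) vs. a gallery from camera 2.
query = [0.9, 0.1, 0.3]
gallery = [('A', [0.85, 0.15, 0.35]), ('B', [0.1, 0.9, 0.2])]
print(rank_gallery(query, gallery)[0][0])  # top-ranked identity: 'A'
```

The open challenges above mostly live outside this snippet – in making the embeddings robust to camera changes, lighting, and occlusion.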
  Read more: Deep learning-based person re-identification methods: A survey and outlook of recent works (arXiv).

####################################################

Do computers have the same biases as humans? Yes. Are they more accurate? Yes:
…Confounding result highlights the challenges of AI ethics…
Bias in facial recognition is one of the most controversial issues of the current moment in AI. Now, a new study by researchers from multiple US universities has found something surprising – computers are far more accurate than non-expert humans at facial recognition, and they display similar (though no worse) biases.

What the study found: The study tried to assess three types of facial recognition system against one another – humans, academically developed neural nets, and commercially available facial recognition services. The key findings are somewhat surprising, and can be summed up as “The performance difference between machines and humans is highly significant”. The specific findings are:
– Humans and academic models both perform better on questions with male subjects
– Humans and academic models both perform better on questions with light-skinned subjects
– Humans perform better on questions where the subject looks like they do
– Commercial APIs are phenomenally accurate at facial recognition, and the researchers could not identify any major disparities in their performance across racial or gender lines

What systems they tested on: They tested their systems against academic models trained on a corpus of 10,000 faces built from the CelebA dataset, as well as commercial services from Amazon (AWS Rekognition), Megvii (Megvii Face++), and Microsoft (Microsoft Azure). AWS and Megvii showed very strong performance, while Azure had slightly worse performance and a more pronounced bias towards males.
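Disparities like the ones this study reports are typically measured by splitting the evaluation set along a demographic attribute and comparing per-group accuracy. Here’s a minimal sketch of that bookkeeping – the group labels and predictions below are invented purely for illustration, not taken from the paper:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per demographic group.

    records: iterable of (group, predicted_match, true_match) tuples.
    Returns a {group: accuracy} dict."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, pred, truth in records:
        total[group] += 1
        if pred == truth:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Invented toy data: (group, model said "same person", ground truth).
records = [
    ('light', True, True), ('light', False, False), ('light', True, True),
    ('dark', True, False), ('dark', True, True), ('dark', False, False),
]
acc = accuracy_by_group(records)
print(acc['light'] - acc['dark'])  # the accuracy gap between groups
```

The hard part, as the study shows, isn’t computing the gap – it’s deciding what gap (if any) is acceptable, and building datasets clean enough that the measurement means something.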

Why this matters: If computers are recapitulating the same biases as humans, but with higher accuracies, then what is the ideal form of bias these computers should have? My assumption is people want them to have no bias at all – this poses an interesting challenge, since these systems are trained on datasets that themselves have labeling errors that therefore encode human biases.
  Read more: Comparing Human and Machine Bias in Face Recognition (arXiv).

####################################################

NVIDIA releases StyleGAN3 – generated images just got a lot better:
…Up next – using generative models for videos and animation…
NVIDIA and Aalto University have built and released StyleGAN3, a powerful and flexible system for generating realistic synthetic images. StyleGAN3 is a sequel to StyleGAN2 and features “a comprehensive overhaul of all [its] signal processing aspects”. The result is “an architecture that exhibits a more natural transformation hierarchy, where the exact sub-pixel position of each feature is exclusively inherited from the underlying coarse features.”

Finally, a company acknowledges the potential downsides: NVIDIA gets some points here for explicitly calling out some of the potential downsides of its research, putting it in contrast with companies (e.g., Google) that tend to bury or erase negative statements. “Potential negative societal impacts of (image-producing) GANs include many forms of disinformation, from fake portraits in social media to propaganda videos of world leaders,” the authors write. “Our contribution eliminates certain characteristic artifacts from videos, potentially making them more convincing or deceiving, depending on the application.”
    Detection: More importantly, “in collaboration with digital forensic researchers participating in DARPA’s SemaFor program, [NVIDIA] curated a synthetic image dataset that allowed the researchers to test and validate the performance of their image detectors in advance of the public release”.
  Read more: Alias-Free Generative Adversarial Networks (arXiv).
  Get the StyleGAN3 models from here (GitHub, NVIDIA Labs).

####################################################

China’s People’s Liberation Army (and others) try to break and fix image classifiers:
…Adversarial examples competition breaks things to (eventually) fix them…
An interdisciplinary group of academics and military organizations have spent most of 2021 running a competition to try and outwit image classifiers using a technology called adversarial examples. Adversarial examples are kind of like ‘magic eye’ images for machines – they look unremarkable, but encode a different image inside them, tricking the classifier. In other words, if you wanted to come up with a technology to outwit image classification systems, you’d try and get really good at building adversarial examples. This brings me to the author list of the research paper accompanying this competition:

Those authors, in full: The authors are listed as researchers from Alibaba Group, Tsinghua University, RealAI, Shanghai Jiao Tong University, Peking University, University of Waterloo, Beijing University of Technology, Guangzhou University, Beihang University, KAIST, and the Army Engineering University of the People’s Liberation Army (emphasis mine). It’s pretty rare to see the PLA show up on papers, and I think that indicates the PLA has a strong interest in breaking image classifiers, and also building resilient ones. Makes you think!

What the competition did: The competition had three stages, where teams tried to build systems that could defeat an image classifier, then build systems that could defeat an unknown image classifier, then finally build systems that could defeat an unknown classifier while also producing images that were ranked as high quality (aka, hard to tell they’d been messed with) by humans. Ten teams competed in the final round, and the winning team (‘AdvRandom’) came from Peking University and TTIC.

Best result: 82.76% – that’s the ‘attack success rate’ for AdvRandom’s system. In other words, more than four out of five of its images got through the filters and successfully flummoxed the target systems (uh oh!).
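The classic recipe behind gradient-based attacks of this kind is FGSM-style perturbation: nudge each input feature a tiny amount in whichever direction increases the classifier’s loss. The competition entries were far more sophisticated (and had to transfer to unknown classifiers), but the basic mechanism can be sketched on a toy logistic-regression ‘classifier’ – everything below is an illustrative stand-in, not the competition’s actual attack:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Probability that x belongs to class 1 under a linear classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def fgsm_perturb(weights, x, label, epsilon):
    """FGSM-style attack: step each feature by +/- epsilon in the direction
    that increases the loss for the true label. For logistic regression,
    d(loss)/dx_i = (p - label) * w_i, so we only need the gradient's sign."""
    p = predict(weights, x)
    return [xi + epsilon * math.copysign(1.0, (p - label) * w)
            for xi, w in zip(x, weights)]

weights = [2.0, -1.0]
x = [1.0, 0.5]  # correctly classified as class 1 (p > 0.5)
adv = fgsm_perturb(weights, x, label=1.0, epsilon=1.0)
print(predict(weights, x), predict(weights, adv))  # confidence collapses
```

Against a deep net you’d backpropagate to get the input gradient instead of writing it out by hand, and real ‘unrestricted’ attacks also optimize for the image still looking natural to humans.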

What’s next? Because the competition yielded a bunch of effective systems for generating adversarial examples, the next competition will be about building classifiers that are robust to these attack systems. That’s a neat approach, because you can theoretically run these competitions a bunch of times, iteratively creating stronger defenses and attacks – though who knows how public future competitions may be. 

Why this matters: The intersection of AI and security is going to change the balance of power in the world. Therefore, competitions like this both tell us who is interested in this intersection (unsurprisingly, militaries – as shown here), as well as giving us a sense of what the frontier looks like.
  Read more: Unrestricted Adversarial Attacks on ImageNet Competition (arXiv).

####################################################

DeepMind makes MuJoCo FREE, making research much cheaper for everyone:
…What’s the sound of a thousand simulated robot hands clapping?…
DeepMind has bought MuJoCo, a widely-used physics simulator that underpins a lot of robotics research. The strange thing is DeepMind has bought MuJoCo to make it free. You can download MuJoCo for free now, and DeepMind says in the future it’s going to develop the software as an open source project “under a permissive license”.

Why this matters: Physics is really important for robot development, because the better your physics engine, the higher the chance you can build robots in simulators then transfer them over to reality. MuJoCo has always been a widely-used tool for this purpose, but in the past its adoption was held back by the fact it was quite expensive. By making it free, DeepMind will boost the overall productivity of the AI research community.
  Read more: Opening up a physics simulator for robotics (DeepMind blog).

####################################################

Stanford builds a scalpel for editing language models:
…MEND lets you make precise changes on 10b-parameter systems…
Today’s large language models are big and hard to work with, what with their tens to hundreds of billions of parameters. They also sometimes make mistakes. Fixing these mistakes is a challenge, with approaches varying from stapling on expert code, to retraining on different datasets, to fine-tuning. Now, researchers with Stanford University have come up with the AI-editing equivalent of a scalpel – an approach called ‘MEND’ that lets them make very precise changes to tiny bits of knowledge within large language models.

What they did: “The primary contribution of this work is a scalable algorithm for fast model editing that can edit very large pre-trained language models by leveraging the low-rank structure of fine-tuning gradients”, they write. “MEND is a method for learning to transform the raw fine-tuning gradient into a more targeted parameter update that successfully edits a model in a single step”.
  They tested out MEND on GPT-Neo (2.7B parameters), GPT-J (6B), T5-XL (2.8B), and T5-XXL (11B), and found it “consistently produces more effective edits (higher success, lower drawdown) than existing editors”.
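The structure MEND exploits is that the fine-tuning gradient of a linear layer on a single example is rank-1: the outer product of the backpropagated error and the layer’s input. An editor can therefore work on two small vectors rather than the full weight matrix. Here’s a toy illustration of a rank-1 weight edit – note there’s no learned editor here; actual MEND trains a hypernetwork to transform these gradient factors into a better-targeted update:

```python
def rank_one_update(weight, delta, x, lr):
    """Apply W <- W - lr * (delta outer x): the rank-1 gradient step for a
    linear layer y = W x, where delta = dLoss/dy and x is the layer input."""
    return [[w - lr * d * xj for w, xj in zip(row, x)]
            for row, d in zip(weight, delta)]

def matvec(weight, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy "model": one linear layer mapping a 2-d input to a 1-d output.
W = [[1.0, 0.0]]
x = [1.0, 1.0]
target = 2.0
y = matvec(W, x)[0]   # current (wrong) output: 1.0
delta = [y - target]  # dLoss/dy for squared error (up to a factor of 2)
W_edited = rank_one_update(W, delta, x, lr=0.5)
print(matvec(W_edited, x)[0])  # edited output now hits the target
```

The hard problem – the one MEND actually addresses – is making such an edit stick for the fact you care about without disturbing the model’s behavior on everything else.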

Not fixed… yet: Just like with human surgery, even if you have a scalpel, you might still cut in more places than you intend to. MEND is the same. Sometimes, changes enforced by MEND can lead the model to change its output “for distinct but related inputs” (though MEND seems to be less destructive and prone to errors than other systems).

Why this matters: It seems like the next few years will involve a lot of people poking and prodding increasingly massive language models (see Microsoft’s 530-billion parameter model covered in Import AI #270), so we’re going to need tools like MEND to make it easier to get more of the good things out of our models, and to make it easier to improve them on-the-fly.
  Read more: Fast Model Editing at Scale (arXiv).
  Find out more at the MEND: Fast Model Editing at Scale paper website.

####################################################

AI Ethics, with Abhishek Gupta

…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

What are some fundamental properties for explainable AI systems?

… explainable AI, when done well, spans many different domains like computer science, engineering, and psychology … 

Researchers from the Information Technology Laboratory at the National Institute of Standards and Technology (NIST) propose four principles that good explainable AI systems should have: explanation, meaningfulness, explanation accuracy, and knowledge limits.

Explanation: A system that delivers accompanying evidence or reasons for outcomes and processes. The degree of detail (sparse to extensive), the degree of interaction between the human and the machine (declarative, one-way, and two-way), and the format of the explanation (visual, audio, verbal, etc.) are all important considerations in the efficacy of explainable AI systems.

Meaningfulness: A system that provides explanations that are understandable to the intended consumers. The document points out how meaningfulness itself can change as consumers gain experience with the system over time.

Explanation Accuracy: This requires staying true to the reason for generating a particular output or accurately reflecting the process of the system. 

Knowledge Limits: A system that only operates under conditions for which it was designed and when it has sufficient confidence in its output. “This principle can increase trust in a system by preventing misleading, dangerous, or unjust outputs.”
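The knowledge-limits principle has a direct operational analogue: a model that abstains when its confidence falls below a threshold instead of emitting a low-quality answer. A minimal sketch – the classifier outputs and threshold here are illustrative, and real systems need calibrated confidences for this to be meaningful:

```python
def classify_with_knowledge_limits(probabilities, labels, threshold=0.8):
    """Return the predicted label, or None (abstain) when the model's top
    confidence falls below the threshold -- a simple mechanization of the
    'knowledge limits' principle.

    probabilities: per-class probabilities summing to ~1.0."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None  # outside the system's knowledge limits: abstain
    return labels[best]

labels = ['cat', 'dog']
print(classify_with_knowledge_limits([0.95, 0.05], labels))  # 'cat'
print(classify_with_knowledge_limits([0.55, 0.45], labels))  # None (abstains)
```

The abstention path is the point: routing low-confidence cases to a human (or to “I don’t know”) is what prevents the misleading or unjust outputs the NIST document warns about.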

Why it matters: There are increased calls for explainable AI systems, either because of domain-specific regulatory requirements, such as in finance, or through broader incoming legislation that mandates trustworthy AI systems, part of which is explainability. There are many different techniques that can help achieve explainability, but having a solid framework to assess various approaches and ensure comprehensiveness is going to be important for getting users to trust these systems. More importantly, in cases where little guidance is provided by regulations and other requirements, such a framework provides adequate scaffolding to build confidence in one’s approach to designing, developing, and deploying explainable AI systems that achieve their goal of evoking trust in their users.
  Read more: Draft NISTIR 8312 – Four Principles of Explainable Artificial Intelligence (NIST).

####################################################

Tech Tales:

Generative Fear
[America, 2028]

It started with ten movie theatres, a captive audience, and a pile of money. That’s how the seeds of the Fear Model (FM) were sown.

Each member of the audience was paid about double the minimum wage and, in exchange, was wired up with pulse sensors; the cinema screen was ringed by cameras, all trained on the pupils of the audience members. In this way, the Fear Model developers could build a dataset that linked indications of mental and psychological distress in the audience with moments transpiring onscreen in a variety of different films.

Ten movie theatres were rented, and they screened films for around 20 hours a day, every day, for a year. This generated a little over 70,000 hours of data over the course of the year – data which consisted of footage from films, paired with indications of when people were afraid, aroused, surprised, shocked, and so on. They then sub-sampled the ‘fear’ moments from this dataset, isolating the parts of the films which prompted the greatest degree of fear/horror/anxiety/shock.

With this dataset, they trained the Fear Model. It was a multimodal model, trained on audio, imagery, and also the aligned scripts from the films. Then, they used this model to ‘finetune’ other media they were producing, warping footage into more frightening directions, dosing sounds with additional screams, and adding little flourishes to scripts that seemed to help actors and directors wring more drama out of their material.

The Fear Model was subsequently licensed to a major media conglomerate, which is reported to be using it to adjust various sound, vision, and text installations throughout its theme parks.

Things that inspired this story: Generative adversarial networks; distillation; learning from human preferences; crowdwork; the ever-richer intersection of AI and entertainment.