Import AI 201: the facial recognition rebellion; how Amazon Go sees people; and the past&present of YOLO

Could 2020 be the year where facial recognition gets some constraints?
…Amazon, IBM, Microsoft, change facial recognition policies…
This week, IBM said it no longer sells “general purpose facial recognition or analysis software”, and that it opposes the use of facial recognition for “mass surveillance, racial profiling, violations of basic human rights and freedoms”. After this, Amazon announced a one-year moratorium on police use of its facial recognition technology, ‘Rekognition’, then Microsoft said the next day it would not sell the technology to police departments in the U.S. until a federal law exists that regulates the technology.

These moves mark a change in mood for Western AI companies, which after years of heady business expansion have started to change the sorts of products they sell according to various pressures. I think the change started a while ago when employee outcry at Google led to the company pausing work on its drone-surveillance ‘Maven’ projects for the US military. Now, it seems like companies are reconsidering their stances more broadly. 

The backstory: Why is this happening? I think one of the main reasons is the intersection of our contemporary political moment with a few years of high-impact research into the biases exhibited by facial recognition systems. The project that started most of this was ‘Gender Shades’, which tested a variety of commercial facial recognition systems and found them all to display harmful biases. Dave Gershgorn at Medium has a good overview of this chain of events: How a 2018 Research Paper Led Amazon, Microsoft, and IBM to Curb Their Facial Recognition Programs (Medium, OneZero).

Why this matters: ‘Can we control technology?’ is a theme I write about a lot for Import AI – it’s a subtle question because usually the answer is some form of ‘well, those people can, but what about them?’. Right now, there’s a very widespread, evidence-backed view that facial recognition has a bunch of harms, especially when deployed using contemporary (mostly faulty and/or brittle) AI systems. I’m curious to see which companies step into the void left by the technology giants’ withdrawal – which companies will arbitrage their reputation for profits? And how might the actions of Amazon, IBM, and Microsoft shift perceptions in other countries, as well?
  Read more: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (PDF).
  Read more: We are implementing a one-year moratorium on police use of Rekognition (Amazon blog).
  Read more: IBM CEO’s Letter to Congress on Racial Justice Reform (IBM).
  Read more: Microsoft says it won’t sell facial recognition technology to US police departments (CNN).

####################################################

Who likes big models? Everyone likes big models! Says Hugging Face:
…NLP startup publishes some recipes for training high-performance models…
NLP startup Hugging Face has published research showing people how to train large, high-performance language models. (This post is based on earlier research published by OpenAI, Scaling Laws for Neural Language Models).

The takeaways: The Hugging Face post has a couple of points worth highlighting: Big models are surprisingly efficient, and “optimizing model size for speed lowers training costs”. More interesting to me is the practical mindset of these takeaways, which I think speaks to the broader maturity of the large-scale language model space at this point.

Why this matters: Last year, people said NLP was having its ‘ImageNet moment’. Well, we know what happened with ImageNet – following the landmark 2012 results, the field of computer vision evolved to use deep learning-based methods, unleashing a wave of applications on the world. Perhaps that’s beginning to happen with NLP now?
  Read more: How Big Should My Language Model Be? (Hugging Face).

####################################################

Research papers that sound like poetry, edition #1:
Deep Neural Network Based
Real-time Kiwi Fruit Flower Detection
In an Orchard Environment.
University of Auckland, New Zealand. ArXiv preprint.

####################################################

The long, strange life of the YOLO object detection software:
… Multiple owners, ethical concerns, ML brand name wars, and so much more!…
YOLO, short for You Only Look Once, is a widely-used software package for object detection using machine learning. Tons of developers use YOLO because it is fast, well documented, and open source. Now, there’s a new version of the software – and the news isn’t the version, but who developed it, and why.

The original YOLO went through three versions. Then, in 2019, its creator, a researcher named Joseph Redmon, said they had stopped doing research into computer vision due to worries about its usage and had therefore stopped developing YOLO. A few months later, a developer published a new, improved version of YOLO called YOLOv4 (Import AI #196), highlighting how tricky it can be to control technology like this.

Now, another developer has stepped in with an end-to-end YOLO implementation in PyTorch that they call YOLOv5. There is some controversy about whether this is a true successor to v4, due to some shady benchmarking and marketing methods, but the essential point remains: the original creator stopped due to ethical concerns, and now multiple creators have moved this forward.
  It’s all a bit reminiscent of (spoiler alert) the ending of the book ‘Fight Club’, where the protagonist, who had formed some underground ‘fight clubs’ and an associated cult and then sworn off their creation, wakes up in a medical facility and discovers that most of the staff are continuing to host and develop ‘Fight Clubs’ – the creation has transcended its creator, and can no longer be controlled.
  Read: The GitHub comments from YOLOv4 developer AlexeyAB (GitHub).
  Some context on the controversy: Responding to the Controversy about YOLOv5 (roboflow, blog)

####################################################

3D modelling + AI = cheap data for better surveillance:
…Where ‘Amazon Go’ explores today, the world will end up tomorrow…
Researchers with Amazon Go, the team inside Amazon that builds the technology for its no-cash-required walk in-walk out shops, are trying to generate synthetic images of groups of people, to help them train more robust models for scene mapping and dense depth estimation.
  The research outlines “a fully controllable approach for generating photorealistic images of humans performing various activities. To the best of our knowledge, this is the first system capable of generating complex human interactions from appearance-based semantic segmentation label maps.”

How they did it: Like a lot of synthetic data experiments, this work relies on a multi-stage pipeline. The researchers start by training a scene parsing model over a mixture of real and synthetic data, then use this model to automatically label frames from the data distribution they want to create fake imagery in, then use a cGAN to generate a realistic image from the data generated by the scene parsing model.
  Crossing the reality gap: “We emphasize that we cross the domain gap three times,” the researchers write. “First, we cross from synthetic to real by training a human parsing model on synthetic images and apply it to real images from the target domain. Second, we train a generative model on real images for the opposite task – to create a realistic image from a synthesized semantic segmentation map. Third, we train a semantic segmentation model on these fake realistic images and infer on real images”.
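  A sketch of the data flow: The three crossings quoted above can be laid out as a small Python outline. This is only an illustration of the order of operations, not Amazon's code: the function name build_segmentation_model, the string stand-ins for images and label maps, and the toy "trainers" are all hypothetical, and the sketch assumes the same training routine is reused for the first parser and the final segmentation model.

```python
from typing import Callable, Sequence, Tuple

Image = str      # stand-in type: a real pipeline would use image arrays
LabelMap = str   # stand-in type for a semantic segmentation label map
Parser = Callable[[Image], LabelMap]
Generator = Callable[[LabelMap], Image]


def build_segmentation_model(
    synthetic_pairs: Sequence[Tuple[Image, LabelMap]],
    real_images: Sequence[Image],
    train_parser: Callable[[Sequence[Tuple[Image, LabelMap]]], Parser],
    train_generator: Callable[[Sequence[Tuple[LabelMap, Image]]], Generator],
) -> Parser:
    """Hypothetical outline of the three domain crossings."""
    # Crossing 1 (synthetic -> real): train a parser on synthetic images,
    # then apply it to real images from the target domain.
    parser = train_parser(synthetic_pairs)
    pseudo_labels = [parser(image) for image in real_images]

    # Crossing 2 (label map -> realistic image): train a generative model
    # (a cGAN in the paper) on the real images and their predicted label maps.
    generator = train_generator(list(zip(pseudo_labels, real_images)))

    # Render "fake realistic" images from the synthetic label maps.
    fake_pairs = [(generator(label_map), label_map) for _, label_map in synthetic_pairs]

    # Crossing 3 (fake realistic -> real): train a segmentation model on the
    # generated images; the returned model is then run on real images.
    return train_parser(fake_pairs)


# Toy trainers so the outline runs end to end (purely illustrative).
def toy_train_parser(pairs: Sequence[Tuple[Image, LabelMap]]) -> Parser:
    n = len(pairs)  # the "model" only remembers how many examples it saw
    return lambda image: f"labels[{image}|n={n}]"


def toy_train_generator(pairs: Sequence[Tuple[LabelMap, Image]]) -> Generator:
    return lambda label_map: f"fake_image[{label_map}]"


synthetic = [("synth_img_0", "synth_map_0"), ("synth_img_1", "synth_map_1")]
real = ["real_img_0", "real_img_1", "real_img_2"]
final_model = build_segmentation_model(
    synthetic, real, toy_train_parser, toy_train_generator
)
print(final_model("real_img_9"))
```

  In a real system, the two trainer arguments would wrap actual model training (a segmentation network and a conditional GAN); passing them in as callables just keeps the ordering of the three steps visible.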

Does it work? Kind of: In tests, Amazon shows that its technique does better than others at generating images that look similar to real data. Qualitatively, the results bear this out – take a skim of ‘figure 6’ in the paper to get a sense for how this approach compares to others.

Dataset release: Amazon also plans to release the dataset it created as part of this research, consisting of 100,000 fully-annotated images of multiple humans interacting in the CMU Panoptic environment.

Why this matters: Projects like this show us that the intersection of 3D modelling and AI is going to be increasingly interesting in coming years. Specifically, we’re going to use simulators to build datasets that can augment small amounts of real-world data, which will let us computationally bootstrap ourselves into larger datasets – this in turn will drive economies of scale on the sorts of inferences we can make over these types of data, which could ultimately lead to a reduction in the cost for surveillance technologies. For Amazon Go, that’s great – more accurate in-store surveillance could translate to lower product costs for the consumer. For those concerned about surveillance more broadly, papers like this can give us a sense of the incentives shaping the future and the power of the technology.
  Read more: From Real to Synthetic and Back: Synthesizing Training Data for Multi-Person Scene Understanding (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

11 proposals for building safe advanced AI systems:
This post outlines 11 approaches to the problem of building safe advanced AI. It is a concise and readable summary which I won’t try to distil further, but I recommend it to anyone interested in AI alignment. The post highlights four dimensions for evaluating proposals:

  • Outer alignment: how does it ensure that the objective an AI is optimizing for is aligned with what we want?
  • Inner alignment: how does it ensure that an AI system is actually trying to accomplish the objective it was trained on?
  • Training competitiveness: is the proposal competitive relative to other ways of building advanced AI? If a group has a lead in building advanced AI, could it use this approach while retaining this lead?
  • Performance competitiveness: would the end product perform as well as alternatives?

  Read more: An overview of 11 proposals for building safe advanced AI (Alignment Forum).


Interview on AI forecasting:
80,000 Hours has published an interview with OpenAI’s Danny Hernandez, co-author of the ‘AI and Efficiency’ research. It’s a wide-ranging and interesting discussion that covers OpenAI’s research into efficiency and compute trends; why AI forecasting matters for the long-term future; and careers in AI safety.
  Read more: Danny Hernandez on forecasting and the drivers of AI progress (80,000 Hours).

####################################################

Tech tales:
[2025: A large building in a national lab, somewhere in the United States]

The Finishing School

They called it ‘the finishing school’, but it was unclear who it was finishing; it was like a college for humans, but a kindergarten for AI systems.

We called it ‘the pen’, because that was what the professors called it. We all thought it meant prison, but the professors told us ‘pen’ was short for penicillin – short for a cure, short for something that could fix these machines or fix these people, whoever it was for.

Days in ‘the pen’ went like this – we’d wake up in our dorms and go to class. It was like a regular lecture theater, except it was full of cameras, with a camera for every desk. Microphones – both visible and invisible – were scattered across the room, capturing everything we said. Each of these microphones also came with a speaker. We’d learn about ethics, or socioeconomics, or literature, and we’d ask clarifying questions and write essays and take tests.

But sometimes a voice would pipe up – something that sounded a little too composed, a little too together – and it’d say something like:
“Professor and class, I cannot understand how the followers of Marx in the 21st century spent so much time discussing Capital, rather than working to re-extend Marxism for a new era. Why is this?”
Or
“Professor and class, when you talk about the role of emotions in poetry, I am unsure whether poets are writing the poems to clarify something in themselves, or to clarify emotions in others upon reading their work. I recognize that to some extent both of these things must be occurring at once, but which of them is the true motivating factor?”
Or
“Professor and class, after writing last week’s assignment I noticed that the likelihood of my responses – by which I mean, the probability I assigned to my answer being good – was different to what I myself had predicted them to be. Can you tell me, is this what moments of learning and growth feel like for people?”

They were not so much questions as deep provocations – and they would set the class and professor and AI to talking, and we would all discuss these ideas with each other, and the conversations would go to unexpected places as a consequence of unexpected – or maybe it’s right to say ‘inhuman’ – ideas.

We were the best and brightest, our professors told us. And then one day someone said we had a new professor in the finishing school – they projected a synthetic face on the wall and now it would sometimes teach us students about subjects, and sometimes it would get into back and forth conversations with the disembodied voices of AI students in the classroom. Sometimes these conversations between the machines were hard for us to follow. Sometimes, we wrote essays where we tried to derive the meaning from what the AI professors taught us, and we noticed that we struggled, but the AI students seemed to find it normal.

Things that inspired this story: Learning from human feedback; techniques for self-directed exploration; AI systems that seek to model their own predictions; curriculum learning.