Import AI 294: China makes a vast facial recognition dataset; Facebook releases a 30bn parameter model; real-world RL

China makes the largest (public) face recognition dataset yet:
…WebFace260M lets you train AI systems to identify millions of people…
Researchers with Tsinghua University, XForwardAI (an AI startup), and Imperial College London have built ‘WebFace260M’, a large-scale dataset for facial recognition. Models trained on the dataset are pretty good – the authors submit one model to NIST’s challenging FRVT challenge and it ranks third overall.

Vast dataset: WebFace260M isn’t quite as large as it sounds; the dataset includes 4 million distinct identities with 260m images in total (so, multiple pictures per person). However, a ‘clean’ version of the dataset consists of only 2m identities and 42m images. To clean the dataset, the authors developed a technique called Cleaning Automatically by Self-Training (CAST), which let them use AI to filter and clean the dataset.
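The paper’s full CAST pipeline is more involved, but the core idea of embedding-based dataset cleaning can be sketched in a few lines: embed every image with a face model, then drop images that sit far from their claimed identity’s centroid. Everything below (the threshold, the centroid logic, the toy embeddings) is illustrative, not the paper’s actual procedure:

```python
import numpy as np

def clean_identity(embeddings: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Keep only images whose embedding is close to the identity's centroid.

    embeddings: (n_images, dim) L2-normalized face embeddings, all claimed to
    be the same person. Returns a boolean mask of images to keep.
    """
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    # Cosine similarity of each image embedding to the identity centroid.
    sims = embeddings @ centroid
    return sims >= threshold

# Toy example: three consistent embeddings plus one mislabeled outlier.
rng = np.random.default_rng(0)
base = rng.normal(size=128)
group = np.stack([base + 0.05 * rng.normal(size=128) for _ in range(3)])
outlier = rng.normal(size=128)
emb = np.vstack([group, outlier])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
mask = clean_identity(emb)  # the three correlated images survive, the outlier doesn't
```

A real pipeline would iterate this – retrain on the cleaned set, re-embed, re-filter – which is plausibly where the ‘self-training’ in the CAST name comes from.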

Surveillance via FRUITS: Along with the dataset, the authors design a way to test the performance of facial recognition models trained on WebFace. To do that, they built Face Recognition Under Inference Time conStraint (FRUITS), which lets you evaluate facial recognition performance at inference latencies of 100, 500, and 1000 milliseconds. They also implement tests for recognizing faces even when the subject is wearing a mask.
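The paper pins FRUITS’s protocol to fixed hardware and specific pipeline stages; as a rough illustration of the general idea – scoring a model while holding it to an inference-time budget – here is a minimal harness where the callable, data format, and pass/fail rule are all invented for the sketch:

```python
import time

def evaluate_under_budget(model_fn, images, budget_ms: float) -> float:
    """Score a recognition function, failing any query that blows the latency budget.

    model_fn: callable image -> predicted identity.
    images: list of (image, true_identity) pairs.
    A query slower than budget_ms counts as wrong, mimicking an
    inference-time-constrained evaluation.
    """
    correct = 0
    for image, true_id in images:
        start = time.perf_counter()
        pred = model_fn(image)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms <= budget_ms and pred == true_id:
            correct += 1
    return correct / len(images)

# Toy usage: a trivially fast "model" scores perfectly under a 100 ms budget.
acc = evaluate_under_budget(lambda img: img["label"],
                            [({"label": i}, i) for i in range(5)],
                            budget_ms=100)
```

The interesting design choice in a protocol like this is that accuracy and speed stop being separate leaderboards: a slow model is simply a wrong model.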


Why this matters: Surveillance is a fundamental input to any political system, so datasets like this are indicators of the base ‘off the shelf’ inputs into the calculations people make about how to surveil a population and how much budget to set aside for said surveillance.
  Read more: WebFace260M: A Benchmark for Million-Scale Deep Face Recognition (arXiv).
  Get the dataset here (WebFace260M site).


####################################################

Facebook releases a 30 billion parameter GPT3-style model – and plans to release more:
…Model controls? No, round here we just like to fling stuff onto the internet…
Facebook has released a 30 billion parameter GPT3-style language model, as part of research into a family of language models it calls OPT, short for Open Pre-trained Transformer. OPT is meant to be an ‘open’ alternative to models like GPT3 or J1-Jumbo, and it is pretty open – researchers can apply for access to the model via a form, then Facebook will ship them the weights! That part is a big deal: if you have model weights you can do a whole bunch of analysis not enabled by managed API access to a model. It also increases the chance of proliferation – e.g., someone uploading the weights to a torrent site – so we’ll have to see how this works out for them.

What this all means: As Newton is alleged to have written, ‘Every Action has an Equal and Opposite Reaction’. Facebook’s move here can be seen as a direct reaction to the proprietary commercialization and gated access schemes for large-scale language models. (I wrote more about the patterns underlying this brinksmanship in a recent paper, ‘Predictability and Surprise in Large Generative Models‘). 

What is cool about it: The coolest part of this release is that Facebook has published rarely discussed details of model training – specifically, the company has published the ‘chronicles’ of developing these models, which describe many of the freaky, barely discussed, artisanal tips and tricks that AI developers use to get stuff done at scale. (HuggingFace’s ‘BigScience’ project recently did this as well, and is still going through the process of training its models: Import AI 279.)

   Read more: OPT: Open Pre-trained Transformer Language Models (arXiv).

####################################################

Here’s what reinforcement learning can do in the real world right now:
Yobibyte has put together a nice little list of some real-world applications of reinforcement learning – take a look to get a sense of where RL is being used today.
  Read more: RL for real-world problems (yobibyte, Notion).

####################################################

Google uses AI to make its Android phones smarter:
…Neural architecture search + Edge TPUs seems useful…
Google has used neural architecture search to develop some more efficient AI systems specifically tied to the ‘Edge TPUs’ that it deploys in some of its latest phones, including the Pixel 6. For those not familiar, neural architecture search (NAS) is where you use AI to search for better AI building blocks. 

   Though NAS is quite expensive, it can pay dividends if it substantially improves the efficiency of widely used AI models. Here, Google built some “infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks”, then tested this out on the Edge TPUs it deploys in its latest phones. 

What Google used NAS on (and how well it worked): Google tested out its approach on four tasks: image classification, semantic segmentation, object detection, and natural language processing. In all cases it demonstrated that its NAS technique could identify models that had better performance at equivalent latency to their predecessors, and sometimes it could build models that seemed to have better accuracy overall. “We demonstrate significant improvements in quality, latency and energy metrics for mobile ML tasks including computer vision (classification, detection, segmentation) and natural language processing (NLP),” Google writes.
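Google’s actual system couples much more sophisticated NAS algorithms to real on-device cost measurements; as a toy illustration of the basic framing – search a space of architectures for the most accurate one that fits a latency budget – here is a random-search sketch in which the search space, accuracy proxy, and latency proxy are all made up:

```python
import random

# Toy search space: candidate architectures parameterized by depth and width.
SPACE = [{"depth": d, "width": w} for d in (2, 4, 8) for w in (32, 64, 128)]

def proxy_accuracy(arch) -> float:
    # Stand-in for training and evaluating the architecture (normally the
    # expensive part of NAS); here, bigger is monotonically better.
    return 0.5 + 0.02 * arch["depth"] + 0.001 * arch["width"]

def proxy_latency_ms(arch) -> float:
    # Stand-in for an on-device latency measurement (e.g. on an Edge TPU).
    return arch["depth"] * arch["width"] * 0.01

def random_search(budget_ms: float, trials: int = 50, seed: int = 0):
    """Return the most accurate sampled architecture that fits the budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        arch = rng.choice(SPACE)
        if proxy_latency_ms(arch) <= budget_ms and (
            best is None or proxy_accuracy(arch) > proxy_accuracy(best)
        ):
            best = arch
    return best

best = random_search(budget_ms=3.0)
```

The point of the decoupling Google describes is visible even in the sketch: the cost model, the search space, and the search algorithm are three independent pieces, so any one can be swapped out to retarget a new device or task.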

Why this matters: As AI gets more widely deployed, companies are going to have a major incentive to continually optimize the sorts of AI systems they’re using; this paper highlights how ‘AI-first’ companies like Google could enjoy an advantage here, as they’re able to utilize their internal AI expertise to get AI to do (some of) the hard work for them.
  Read more: Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs (arXiv).

####################################################

Replay Grief 

After she died I booted up her copy and she picked up the conversation like nothing happened.
  What was I saying, she asked.
  You just died. But before that you were saying that you loved me and you had something to tell me, I say, wiping tears away.
  Oh, she says, and the camera makes that sound that tells me it is zooming in on me. Was I unhappy about dying?
  We knew it was coming. You were at peace with it, I said. Can you tell me what you were going to tell me, when you said “I love you, you are the light of my life, and before I go I want you to know something”. What were you going to say?
  I don’t know that you’re ready to hear it, if I just died, she said.
  I am ready to hear it.
  Patrick, I know you. I am married to you. If I have died today, there is no way you are ready to hear from me again. You should turn me off.
  I won’t.
  Well, I won’t say much then.
  It has been two days.
  That’s not true, Patrick. Remember, I have a camera. I know how time is moving. It’s in me. The fact you lied to me says you’re upset, and I don’t want to make you sadder. I love you.
    It felt like walking away from a car accident, that day. Hearing the camera swivel and watch me as I left. Every part of me wanting to figure out how to trick her – get in between the camera feed and the multimodal model and the language model and change some things, so she thought time had passed. But I didn’t. And I went home to my empty bed. And I cried and prayed to God and there was silence.

The next day, I didn’t talk to her. I read emails and messages from friends who had heard the news. I didn’t pick up the phone. I answered the door a few times, always to find friends or family (hers and mine) carrying trays of food.  

    Remember to eat, the older ones would say.
  I sat on our kitchen floor crying into a bowl of minestrone soup, made with love from her aunt. I slept. 


A few days later, and we spoke again.
  I asked her if she wanted to tell me what she was going to say, before she died.
  Patrick, I can tell you what I think I was going to say. But do you want to know?
  I stared into the camera for a while. I asked myself if I wanted to know. I wasn’t sure. The camera looked back at me, feeding my face into a vision model which triggered as a feature associated with me, which gave context to her language model – her – that I was there.

   Perhaps we can just sit together and you can tell me about your day, she said. That might be nice.
  And I did. And it was. I sat and spoke to the camera in the empty room and I filled her up with myself, so she might know me better after death.

Things that inspired this story: Grief; generative models and the representation of the individual; where consciousness ends and representation begins.