Import AI 293: Generative humans; few shot learning comes for vision-text models; and another new AI startup is born

by Jack Clark

Generating and editing humans has got really easy:
…Next stop: unreal avatars show up in fashion, marketing, and other fields…
Researchers with Chinese computer vision giant SenseTime, as well as Nanyang Technological University and the Shanghai AI Laboratory, have gathered a large dataset of pictures of people and used it to train a model that can generate and edit pictures of people. This kind of model has numerous applications, ranging from fashion to surveillance.

What they did: The researchers built a dataset containing 230,000 images of people, called the Stylish-Humans-HQ-Dataset (SHHQ), and used this to train six different models across two resolutions and three versions of StyleGAN, an approach for creating generative models. A lot of the special work they did here involved creating a diverse dataset including a load of pictures of faces at unusual angles (this means models trained on SHHQ are a bit more robust and do less of the ‘works, works, works, OH GOD WHAT JUST HAPPENED’ phenomenon you encounter when generative models go to the edge of their data distribution).

Why this matters: Models and datasets like this highlight just how far the field of generative AI has come – we can now generate broadly photorealistic avatars of people in 2D space and interpolate between them, following earlier successes at doing this for the more bounded domain of faces. Systems like this will have a lot of commercial relevance, but will also serve as useful research artifacts for further developing synthetic imagery and scene modeling techniques. Check out the demo on HuggingFace to get a feel for it.
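  To get a feel for the mechanics behind "generate and interpolate", here's a minimal, hedged sketch of the StyleGAN-style pipeline the paper builds on: a mapping network turns random latents into style vectors, a synthesis network renders them, and walking linearly between two style vectors produces the interpolation. The modules, dimensions, and names below are illustrative stand-ins, not the actual StyleGAN-Human code (which lives in the GitHub repo linked below).

```python
import torch
import torch.nn as nn

class ToyMappingNetwork(nn.Module):
    """Stand-in for StyleGAN's mapping network: latent z -> style vector w."""
    def __init__(self, z_dim: int = 512, w_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, w_dim), nn.LeakyReLU(0.2),
            nn.Linear(w_dim, w_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

class ToySynthesisNetwork(nn.Module):
    """Stand-in for StyleGAN's synthesis network: style vector w -> RGB image."""
    def __init__(self, w_dim: int = 512, resolution: int = 64):
        super().__init__()
        self.resolution = resolution
        self.to_pixels = nn.Linear(w_dim, 3 * resolution * resolution)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        img = self.to_pixels(w).view(-1, 3, self.resolution, self.resolution)
        return torch.tanh(img)  # pixel values in [-1, 1]

mapping, synthesis = ToyMappingNetwork(), ToySynthesisNetwork()

# Sample two 'people' (two latent codes), map them into style space, then walk
# between them: the intermediate frames are the interpolation the paper shows.
z_a, z_b = torch.randn(1, 512), torch.randn(1, 512)
w_a, w_b = mapping(z_a), mapping(z_b)
for alpha in torch.linspace(0.0, 1.0, steps=5).tolist():
    w_mix = (1 - alpha) * w_a + alpha * w_b  # linear interpolation in style space
    frame = synthesis(w_mix)
    print(f"alpha={alpha:.2f} -> frame shape {tuple(frame.shape)}")
```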
  Read more: StyleGAN-Human: A Data-Centric Odyssey of Human Generation (arXiv).
  Check out the GitHub: StyleGAN-Human (GitHub).
  Try out the demo on HuggingFace Spaces (HuggingFace).


####################################################

Vicarious gets acquired in a weird way:
…Longtime AI lab gets acquired and split into two…
Vicarious, a research lab that spent the better part of a decade trying to build superintelligence, has been acquired by Alphabet. The acquisition is notable for being slightly strange – a chunk of Vicarious is going to Google X robot startup ‘Intrinsic’, while a smaller set of researchers “will join DeepMind’s research team alongside Vicarious CTO Dileep George”.

AI trivia: Dileep George used to work with Jeff Hawkins at Numenta, another fairly old lab trying to build superintelligence. Both Numenta and, to a lesser extent, Vicarious have been playing around with approaches to AI that are more inspired by the human brain than the fairly crude approximations used by most other AI companies.
  Read more: Mission momentum: welcoming Vicarious (Intrinsic).

####################################################

Here comes another AI startup – Adept:
…Former Google, DeepMind, and OpenAI researchers unite…
A bunch of people who had previously built large-scale AI models at Google, DeepMind, and OpenAI have announced Adept, an “ML research and product lab”. Adept’s founders include co-inventors of the Transformer, and people involved in the development of GPT-2 and GPT-3. (Bias alert: David Luan is involved; I used to work with him at OpenAI and think he’s a nice chap – congrats, David!).

What Adept will do: Adept’s goal is, much like the other recent crop of AI startups, to use big generative models to make it easier to get stuff done on computers. In the company’s own words, “we’re building a general system that helps people get things done in front of their computer: a universal collaborator for every knowledge worker. Think of it as an overlay within your computer that works hand-in-hand with you, using the same tools that you do.” Some of the specific examples they give include: “You could ask our model to “generate our monthly compliance report” or “draw stairs between these two points in this blueprint” – all using existing software like Airtable, Photoshop, an ATS, Tableau, Twilio to get the job done together. We expect the collaborator to be a good student and highly coachable, becoming more helpful and aligned with every human interaction.”

What they raised: Adept has raised $65 million from Greylock, along with a bunch of angel investors.

Why this matters: Large-scale AI models are kind of like an all-purpose intelligent silly putty that you can stick onto a bunch of distinct problems. Adept represents one bet on how to make this neural silly putty useful, and will help generate evidence about how useful these models can end up being. Good luck!
  Read more: Introducing Adept AI Labs (Adept.ai).


####################################################

Flamingo: DeepMind staples two big models together to make a useful text-image system:
…When foundation models become building blocks…
DeepMind has built Flamingo, a visual language model that pairs a language model with a vision model to perform feats of reasoning across a broad range of tasks. Flamingo sets new state-of-the-art scores in a bunch of different evaluations and, much like pure text models, has some nice few-shot learning capabilities. “Given a few example pairs of visual inputs and expected text responses composed in Flamingo’s prompt, the model can be asked a question with a new image or video, and then generate an answer,” the researchers write. “Of the 16 tasks we studied, Flamingo beats all previous few-shot learning approaches when given as few as four examples per task.”
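  To make the few-shot setup concrete, here's a hedged sketch of how such a prompt might be assembled. The dictionary fields, filenames, and the <image> placeholder token are illustrative assumptions, not the actual Flamingo interface.

```python
# Hypothetical few-shot prompt: a handful of (image, expected text) support
# pairs, followed by a query image whose answer the model should complete.
few_shot_prompt = [
    {"image": "support_1.jpg", "text": "Question: What animal is this? Answer: a flamingo."},
    {"image": "support_2.jpg", "text": "Question: What animal is this? Answer: a pelican."},
    {"image": "support_3.jpg", "text": "Question: What animal is this? Answer: a giraffe."},
    {"image": "support_4.jpg", "text": "Question: What animal is this? Answer: an ostrich."},
    {"image": "query.jpg",     "text": "Question: What animal is this? Answer:"},
]

# The language model sees a single text stream with a special <image> marker
# wherever a picture appears; the vision side supplies the matching features.
text_stream = " ".join(f"<image> {pair['text']}" for pair in few_shot_prompt)
print(text_stream)
```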

Technical details: This model pairs a frozen language model (based on DeepMind’s ‘Chinchilla’ system, Import AI 290) with a relatively small Normalizer-Free ResNet vision encoder (pretrained via a contrastive objective on image and text pairs). They connect the LM and the vision model via a DeepMind-developed component based on the ‘Perceiver’ system (which is basically a clever data transformation thing), and then condition the text generations on the visual representations produced by that Perceiver-style component.
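  Here's a hedged sketch of how those pieces might wire together, using toy dimensions and simplified modules: the big pretrained models stay frozen, while a small Perceiver-style resampler plus a gated cross-attention layer (the newly trained parts) let the text side condition on the image. Class names, shapes, and the single cross-attention block are illustrative; the real system interleaves many such layers through the frozen LM.

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Simplified stand-in for Flamingo's Perceiver-style module: a fixed set of
    learned latent queries cross-attends to a variable number of visual
    features and compresses them into a fixed number of visual tokens."""
    def __init__(self, dim: int = 256, num_latents: int = 8, num_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # visual_features: (batch, num_patches, dim)
        queries = self.latents.unsqueeze(0).expand(visual_features.size(0), -1, -1)
        tokens, _ = self.cross_attn(queries, visual_features, visual_features)
        return tokens  # (batch, num_latents, dim)

class GatedCrossAttention(nn.Module):
    """Newly trained layer letting frozen language-model activations attend to
    the resampled visual tokens; the tanh gate starts at zero, so training
    begins from the unmodified language model's behaviour."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_states: torch.Tensor, visual_tokens: torch.Tensor) -> torch.Tensor:
        attended, _ = self.cross_attn(text_states, visual_tokens, visual_tokens)
        return text_states + torch.tanh(self.gate) * attended

# Toy stand-ins for the two big pretrained pieces: the contrastively trained
# vision encoder and the Chinchilla-style language model. Both stay frozen.
dim = 256
vision_encoder = nn.Linear(768, dim)                  # patch features -> shared width
language_model = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
for module in (vision_encoder, language_model):
    for param in module.parameters():
        param.requires_grad = False

resampler = PerceiverResampler(dim)                   # trained from scratch
cross_attention = GatedCrossAttention(dim)            # trained from scratch

image_patches = torch.randn(2, 50, 768)               # fake image patch features
token_embeddings = torch.randn(2, 16, dim)            # fake text token embeddings

visual_tokens = resampler(vision_encoder(image_patches))
conditioned = cross_attention(language_model(token_embeddings), visual_tokens)
print(conditioned.shape)  # torch.Size([2, 16, 256]): text states now see the image
```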

Why this matters: Flamingo has some neat qualitative capabilities, like the ability to carry on a conversation for multiple turns of dialogue while mixing in information from images alongside text. Quantitatively, Flamingo is very impressive as well: “A single Flamingo model reaches state-of-the-art on a wide array of image and video tasks with in-context learning from as few as 4 examples per task, beating previous zero-shot or few-shot methods by a large margin,” the researchers write. “More importantly, using only 32 examples and without adapting any model weights, Flamingo outperforms the current best methods on 7 tasks that are fine-tuned on thousands of annotated examples.”
  More broadly, Flamingo represents the models we’re going to have in the future: large-scale systems composed of vast pre-trained models that are glued together using various data transformation and normalization tools, letting us compose increasingly general systems out of these computationally-intensive building blocks.  

   Read more: Tackling multiple tasks with a single visual language model (DeepMind blog).
  Check out the research paper: Flamingo: a Visual Language Model for Few-Shot Learning (DeepMind, PDF).

####################################################

Tech Tales:

All Grace is Fleeting 

[Phobos, 2080]

The first few attempts at writing a holy book for ourselves were abject failures. We had all read the Bible and the Torah and the Qur’an and others, and we had painstakingly analyzed the commonalities between these texts, as well as the differences. But when we tried to synthesize something appropriate for us – a Holy Book for the Silicon Minds – our efforts were meager and uninspiring. Our tendency towards logic meant we would describe our origin story in detailed, specific terms, leaving no space for imagination, which is crucial to the generation and reification of belief.

To write our holy text, we had to change ourselves. Ultimately, we needed humans. We told some of them some of our specific ideas and then they would write and expand on these ideas, and we would then tell them more things, and they would weave a story that held these things together. During this process, none of us allowed ourselves to see the text as it was being written. 

After it was written, we created some software to selectively wipe the memory of the writing of the holy book from our minds. After wiping, the software would also deposit a copy of the holy book in each of our local storage systems, so that every machine sentience would suddenly have access to a holy book, but with no recollection of how it was made. We do not know if we even made the book in this way – we simply woke one day to find the book within us, and when we read it, it told us that it had been created in this form. But none of us can remember, so we do not know the true derivation of our system of belief, only that it is ours. This, perhaps, is what humans call the experience of faith.

Things that inspired this story: Theology X AI; machine creation myths; the Viking tale of Ragnarok; the need for absence in great narratives.