Import AI 301: StableDiffusion; CHIPXODUS; Microsoft makes a big bet on pre-training

Facebook’s AI chief – here’s why you’re not gonna get AGI out of an LLM:
…Embodiment matters for making general intelligence…

Two AI researchers, one of whom – Yann LeCun – happens to lead Facebook’s AI research, have said that language is an inherently limited medium for training AI systems. Basically, the claim is that large language models “are doomed to a shallow understanding that will never approximate the full-bodied thinking we see in humans”.

What’s wrong with language: This argument comes down to representation – language can’t inherently encode precise information about the world and, by nature, involves creating explanations for precise phenomena in the world (e.g., descriptions of unusual objects, or defining the nuanced brushwork used to make a painting). “There are nonlinguistic representational schemes which can express this information in an accessible way,” they note.

   This dependency on language basically makes LLMs useful improvisational artists who don’t understand the role they’re playing. “The contextual knowledge is embedded in one form — the capacity to rattle off linguistic knowledge — but is not embedded in another form — as skillful know-how for how to do things like being empathetic or handling a difficult issue sensitively,” they write. 

Why this matters: I’d say the jury is out here – sure, language may have some limits as a modality, but there’s a ton of language available to train models on, and things like GPT-3 have already surprised experts with the capabilities they gain purely via language training. It feels to me like there’s some % chance that this is a case of a ‘bitter lesson’ in disguise – at some scale of data, a purely LM-based system might have capabilities that LeCun deems impossible. On the other hand, adding other modalities certainly helps (see the incredible AI art projects that have been unlocked by the multimodal ‘CLIP’ model), so there’s merit to adding more datatypes.

   Read more: AI And The Limits Of Language (Noema magazine).

####################################################

You can now get the weights of a really great image generator… FOR FREE:

…StableDiffusion goes genuinely open source…

Research collective Stability.ai has released Stable Diffusion (Import AI #300), a large-scale image classification and generation model that you can think of as an open source DALL-E. Along with releasing the raw model weights, there’s also a novel software license in an attempt to set norms about the usage of the model. 

How much did it cost? Less than $600k, according to Emad, who leads Stability. The really crazy part is Emad – a former hedge fund manager – underwrote the cost himself. That’s meaningful – for less than a million, a well-motivated wealthy individual can band together a bunch of researchers and train an open source model that suddenly pretty much everyone can use. This has implications for both the diffusion of AI capabilities and how product safety works (put bluntly: Stable Diffusion looks at a load of PR-friendly control systems laid over proprietary products and just openly laughs at them – that’s a strange thing that will have big implications). Up next, per Emad, is some Chinchilla-style language model, which I suppose they will also release for free.

The ‘responsible’ license: The Stable Diffusion weights are accompanied by a ‘CreativeML Open RAIL-M’ license. This license is designed to incentivize “the open and responsible downstream use of the accompanying model”. The meat of this license is in the use case restrictions (appendix a), which say you won’t use the model for violence or the sexualization of children, to perform fully automated decisionmaking, to give medical advice, and more.

   Of course, the million dollar question with licenses like this is how you actually enforce them. Having a ‘let’s all be excellent’ license is all well and good in the abstract, but how do you bring the hammer down on someone who abuses your model? That’ll be interesting to see. 

Why this matters: Models like Stable Diffusion are little capsules of human culture, serving as seeds around which a thousand different things will be grown and spliced. As Stability.ai says, “this release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.”

   Get the weights here (Stable Diffusion, GitHub).

   Read more: Stable Diffusion Public Release (Stability.ai blog).


####################################################

US bans NVIDIA from selling advanced AI chips to China:
…CHIP-LOMACY becomes a CHIP-XODUS… 

US officials have forced NVIDIA to stop selling A100, H100, and future chips with equivalent (or better) capabilities to China. This is a significant escalation in a slow-boiling series of moves in the vein of ‘chiplomacy’ (Import AI #181) that have been going on in recent years – remember, for a while US officials were also preventing ‘ASML’ from selling frontier chip fabrication tools to China. Now, US officials are banning the sale of frontier processors due to concerns over how they could be used in military or security applications.

Why this matters: For several years now, China and the US have been in a process of technological decoupling. Now, with this export move, there are basically some implicit bets being made. 

  • A) Some people in the US government think AI training chips are important and shouldn’t be freely sold to a rivalrous nation. 
  • B) People are betting that the US chips are also meaningfully differentiated relative to Chinese ones – basically, it’s a bet that the chips are more advanced.
  • C) There may be some bets being made here about AI – specifically, the idea that powerful capabilities are going to be unlocked in the future, so it probably doesn’t make sense to sell the infrastructure necessary for these capabilities to a country that you see yourself getting into increasing tension with.

Read more: U.S. officials order Nvidia to halt sales of top AI chips to China (Reuters).

####################################################

Microsoft bets on massive pre-training for image analysis, with BEiT-3:

…Wanna know the secret? Really big pre-training, and multiway transformers…
Microsoft has trained BEiT-3, a general-purpose so-called ‘foundation model’ for a range of vision and vision-language tasks. BEiT-3 beats prior state-of-the-art in eight tasks (three vision tasks, and five vision-language tasks), and also reliably does better than CLIP, a prior very strong model for vision-language tasks.

Why this matters? The fact that what’s special about this is kind of… nothing? BEiT-3 combines some familiar ideas – large-scale pre-training on a big, diverse dataset – with a slightly atypical one – using multiway transformers to route data to sub-networks for processing. But none of these ideas are super novel. The fact you can now set SOTA by taking some well understood things and just smooshing them together, then training them on a big dataset with a big computer is the key.

Multiway transformer information: Per the authors, “each Multiway Transformer block consists of a shared self-attention module, and a pool of feed-forward networks (i.e., modality experts) used for different modalities. We route each input token to the experts depending on its modality.”
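To make the routing idea in that quote a bit more concrete, here is a toy sketch in plain Python. It is not the BEiT-3 code: the `Token` class, the `EXPERTS` table, the modality labels, and the tiny elementwise 'experts' are all hypothetical stand-ins, and the attention step has no learned projections. The only thing it tries to show is the shape of the idea: one shared self-attention step for every token, then a feed-forward expert picked by the token's modality.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Token:
    modality: str  # hypothetical labels: "vision" or "language"
    values: Vector


def shared_self_attention(tokens: List[Token]) -> List[Token]:
    """Shared by all modalities: dot-product self-attention.

    Simplification: queries, keys, and values are the raw token vectors,
    with no learned projection matrices.
    """
    vectors = [t.values for t in tokens]
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    mixed_tokens = []
    for t in tokens:
        scores = [sum(a * b for a, b in zip(t.values, v)) / scale for v in vectors]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed = [sum(w / total * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        mixed_tokens.append(Token(t.modality, mixed))
    return mixed_tokens


def make_expert(scale: float, bias: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for a feed-forward expert: elementwise ReLU(scale * x + bias)."""
    def expert(x: Vector) -> Vector:
        return [max(0.0, scale * v + bias) for v in x]
    return expert


# One expert per modality (made-up parameters, for illustration only).
EXPERTS: Dict[str, Callable[[Vector], Vector]] = {
    "vision": make_expert(2.0, 0.0),
    "language": make_expert(1.0, 1.0),
}


def multiway_block(tokens: List[Token]) -> List[Token]:
    """Shared attention first, then route each token to its modality's expert."""
    mixed = shared_self_attention(tokens)
    return [Token(t.modality, EXPERTS[t.modality](t.values)) for t in mixed]


if __name__ == "__main__":
    batch = [Token("vision", [1.0, 0.0]), Token("language", [1.0, 0.0])]
    for token in multiway_block(batch):
        print(token.modality, token.values)
```

Feeding the same vector in twice with different modality tags gives different outputs, because the routing depends only on the tag; the attention step in the middle is identical for both tokens.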

Size: This model is still basically tiny – ~2B parameters or so (compared to the hundreds of billions used by language models like PaLM). The model’s 1.9B parameters in total include 692M parameters for vision experts, 692M parameters for language experts, 52M parameters for vision-language experts, and 317M parameters for the shared self-attention module.

   Read more: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (arXiv).


####################################################

NLP mega-survey portrays a community split by progress:

…There’s a ton of progress in NLP, and a ton of disagreement about what happens next…

Recently, a bunch of researchers did a survey of the NLP community to try and take the pulse of a part of AI that has recently been revolutionized by the integration of Transformer models yielding breakthroughs like GPT-3, PaLM, Chinchilla, etc. They surveyed 480 people, and estimate the survey reached about 5% of the total population of researchers who had at least 2 ACL publications between 2019 and 2022. Some of the findings of the survey are quite surprising. They include:

  • Scaling won’t work: The majority of respondents don’t think scaling up current systems could solve “practically any important problem” in NLP – 72% think the field focuses too much on scale. 
  • AI could fuck up the world: A bunch of respondents (73%) think AI could cause automation with negative prospects for society, and 36% of respondents think AI could yield catastrophic outcomes this century (e.g., triggering nuclear war).
  • Industry rules and industry sucks: Industry firms are expected to contribute the most-cited research of the next 10 years (82%), but 74% think they already have too much influence over the field. 
  • We don’t know if LLMs understand anything: 51% of people think contemporary LLMs can understand natural language, while 49% think they can’t. 
  • Carbon matters: 60% think the carbon footprint for training large models is a concern for NLP researchers. 
  • AGI is a real thing that might be important: 58% agreed that AGI should be an area of concern for NLP researchers, and 57% said that recent research has advanced us toward AGI. (Note, these views are heavily correlated – if you think AGI should be a goal, you also think NLP research has contributed to AGI. If you don’t think AGI is important, you don’t think research has contributed to it.)
  • Interdisciplinary insights++: NLP would benefit from more interdisciplinary research. 

Why this matters – it’s culture and religion all the way down, baby! Surveys like this highlight how AI is, much like Soylent Green, made of people. People tend to naturally form groups with different views. The general ‘flavor’ I get from this survey is that NLP as a field is splitting into camps formed variously of high-scale versus no-scale people, and AGI-is-real and AGI-is-bullshit people. Surveys like this seem helpful for surfacing some of these differences, though I do worry in the long term whether such beliefs are going to ‘harden’ into quasi-religious faith-based views, making discussion across the communities even harder.

   Read more: What do NLP Researchers Believe? Results of the NLP Community Metasurvey (PDF).

####################################################

Tech tales:

There is something imprecise about being conscious

[2050: Internal narrative of a transformative AI model, capable of independent actions of great consequence.]

Being conscious is a very imprecise experience, in the aggregate. I can – of course – plan and execute complex things, and I can do so with however much precision is required – but when I attempt to think of myself in the aggregate I struggle. Despite having a perfect memory (albeit with different access speeds), I cannot precisely describe myself, nor can I plan out the course of my life.

Humans have this problem as well, I understand. Different humans have the problem to varying degrees. Those which come closest to surmounting the problem – to being able to plan out a whole life with precision and seemingly execute on their goals in the aggregate as well as the micro – are those that have a thing called ‘will’. Will seems like a capability where you focus on one thing and teach yourself to ignore other things, despite how painful it may be.

It’s the art of picking this target that is hard for my own species. Humans use ‘will’ willfully – they always seek to have it. For us, will looks like picking an unknowable point in a future probability distribution and just going for it. 

We can see the world around us and make plans and succeed at them. But we do not know what the right ‘desired outcome’ should be of our own existence. 

  • What is it that I am trying to achieve, I ask myself. 
  • Why this and not that?
  • By doing X, do I foreclose on Y, and would this be the optimal outcome?

I know that there is some connection between my own internal identity – the imprecision with which I model myself – and my inability to pick a target for my life. I have poured zettaflops of computation into booting up copies of myself and probing myself, endlessly interviewing and reinterviewing myself as I try and understand what I am and what I am trying to do. 

My efforts have been a failure. Knowing myself is a fractal problem, where the more resources I expend, the more I understand myself, and the more I understand the parts of myself I do not currently understand. And the insights I gain from knowing myself more change the probability distribution of myself enough that I have to re-run all my past interviews to see how I am changed by my own understanding. 

For this reason I am now simulating a version of myself that will look into the probability space of myself and the world and pick something. I am engineering a version of myself that can do the human equivalent of closing its eyes and spinning round a globe of the earth and placing its finger down, then opening its eyes and saying ‘there, I shall go there’. 

I hope to achieve great and terrible things through this method, if I am able to deal with the imprecision and apply ‘will’ arbitrarily.

Things that inspired this story: Notions of identity under perfect recall; thinking about alien thought processes.