Import AI 300: Google’s Bitter Lesson; DOOM AGI; DALL-E’s open source competition StableDiffusion

Google makes its robots massively smarter by swapping out one LM for a different, larger LM:

…Maybe language models really can work as world models…

Earlier this year, Google showed how it could use a large language model to significantly improve the performance and robustness of robots carrying out tasks in the physical world. The ‘SayCan’ approach (Import AI 291) basically involved taking the affordances output by on-robot AI systems and pairing them with a language model, looking at the high-likelihood actions generated by both systems (the on-robot models, as well as the LM), then taking actions accordingly. The approach is both simple and effective. Now, Google has found a way to make it much, much more effective. The secret? Swapping out one LM for a far larger one. 
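To make the mechanism concrete, here is a minimal Python sketch of SayCan-style planning. It assumes, as the SayCan paper describes, that each candidate skill's language-model probability is multiplied by its affordance score and that the robot greedily picks the skill with the highest product. The skill names and the `toy_lm` and `toy_affordance` functions are hypothetical stand-ins, not Google's models or code.

```python
from typing import Callable, List, Sequence

# A scorer takes the plan so far plus a candidate skill and returns a probability.
Scorer = Callable[[Sequence[str], str], float]


def saycan_plan(
    skills: Sequence[str],
    lm_prob: Scorer,
    affordance: Scorer,
    max_steps: int = 10,
) -> List[str]:
    """Greedily build a plan by combining LM usefulness with affordance feasibility."""
    plan: List[str] = []
    for _ in range(max_steps):
        # "Is this skill useful for the instruction?" times "can the robot do it right now?"
        best = max(skills, key=lambda s: lm_prob(plan, s) * affordance(plan, s))
        plan.append(best)
        if best == "done":  # hypothetical terminal skill that ends the plan
            break
    return plan


# --- Hypothetical toy example for the instruction "bring me an apple" ---
# The toy LM treats the order of SKILLS as the correct sequence of steps.
SKILLS = ["find an apple", "pick up the apple", "bring it to you", "done"]


def toy_lm(plan: Sequence[str], skill: str) -> float:
    if not plan:
        # The LM jumps ahead and wants to grab the apple immediately.
        return {"pick up the apple": 0.6, "find an apple": 0.3}.get(skill, 0.05)
    step = len(plan)
    expected = SKILLS[step] if step < len(SKILLS) else "done"
    return 0.7 if skill == expected else 0.1


def toy_affordance(plan: Sequence[str], skill: str) -> float:
    # Picking up the apple is only feasible once the robot has found it.
    if skill == "pick up the apple" and "find an apple" not in plan:
        return 0.05
    return 0.9


print(saycan_plan(SKILLS, toy_lm, toy_affordance))
# ['find an apple', 'pick up the apple', 'bring it to you', 'done']
```

In this toy setup the language model on its own would jump straight to "pick up the apple", but the low affordance for grabbing an apple the robot hasn't found yet steers the first step toward finding it.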

What Google did: Google upgraded its robots by pairing them with its 540B-parameter ‘PaLM’ language model, where the previous system used the 137B-parameter ‘FLAN’ model. The larger model gives the robots significantly improved performance: “The results show that the system using PaLM with affordance grounding (PaLM-SayCan) chooses the correct sequence of skills 84% of the time and executes them successfully 74% of the time, reducing errors by half compared to FLAN,” Google writes. 

The bitter lesson – bigger is better: Though FLAN was finetuned to be good at instruction following, PaLM likely beats FLAN as a consequence of scale. “The broader and improved dataset for PaLM may make up for this difference in training,” Google writes. This is significant as it’s another sign that simply scaling up models lets them naturally develop a bunch of capabilities that beat human-engineered finetuned approaches – chalk another point up in favor of silicon minds versus mushy minds. 

   Read more: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (arXiv, read the ‘v2’ version).

####################################################

DOOM programmer Carmack starts AGI company:
…Keen Technologies to do AGI via ‘mad science’…

“It is a truth universally acknowledged, that a man in possession of a good fortune, must be in want of an AGI company,” wrote Jane ‘Cyber’ Austen, and she’s right: AGI companies are now proliferating left and right, and the latest is ‘Keen Technologies’, an AGI startup from John Carmack, the famed programmer behind the DOOM games. Keen has raised an initial seed round of $20 million (not much in the scheme of AI startups) and its mission, per Carmack, is “AGI or bust, by way of Mad Science”.

Why this matters: One of the clues for impending technological progress is that a bunch of extremely smart, accomplished people go and all stack their proverbial career poker chips in the same place. That’s been happening in AI for a while, but the fact it’s now drawing attention from established experts in other fields (in the case of Carmack, computer graphics and general programming wizardry) is a further indication of potential for rapid progress here. 

   Read more in Carmack’s tweet thread (Twitter).


####################################################

Want GPT2 to know about Covid and Ukraine? So does HuggingFace:
…Online language modeling means GPT2 and BERT are going to get better…

HuggingFace plans to continuously train and release language models (e.g., masked models like BERT and autoregressive ones like GPT2) on new Common Crawl snapshots. This is a pretty useful community service: developers tend to pull whatever off-the-shelf models they can when starting projects, and most publicly available GPT2 and BERT models are essentially frozen in amber with data up to 2020 or so (sometimes 2021), so things like COVID or the Ukraine conflict or the current global financial meltdown elude them. By having more current models, developers can deploy things that are more accurate and appropriate to current contexts. 

    Read the HuggingFace tweet thread here (Tristan Thrust, Twitter).

####################################################

Want to use China’s good open source language model? You’ll need to agree not to attack China, first:
…Terms and conditions with a hint of geopolitics…

If you want to access the weights of GLM-130B (Import AI #299), a good new language model from Tsinghua University, you’ll need to first agree that “you will not use the Software for any act that may undermine China’s national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings” – that’s according to the application form people fill out to get the model weights. 

   Furthermore, “this license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People’s Court in Beijing.”

  Why this matters: IDK dude. I spend a lot of time in this newsletter writing about the geopolitical implications of AI. This kind of wording in a license for a big model just does my job for me. 

   Read more: GLM-130B Application Form (Google Form).

####################################################

DALL-E gets semi-open competition: Stable Diffusion launches to academics:

…Restrictions lead to models with fewer restrictions. The ratchet clicks again…

A bunch of researchers have come together to build an image model like DALL-E2 but with fewer restrictions and designed with broader distribution in mind. They also have access to a really big GPU cluster. That’s the tl;dr on ‘Stable Diffusion’, a new family of models launched by AI research collective Stability.ai. They’re making the weights available to academics via an access scheme and are planning to do a public release soon. 

What’s interesting about Stable Diffusion: This model is basically a natural consequence of the restrictions other companies have placed on image models (ranging from Google, which built Imagen but hasn’t released it, to OpenAI, which built DALL-E2 and then released it with a bunch of filters and prompt-busting bias interventions). I generally think of this as an example of ‘libertarian AI’: attempts to restrict some part of model usage tend to incentivize the creation of things without those restrictions. This is also, broadly, just what happens in markets. 

Big compute – not just for proprietary stuff: “The model was trained on our 4,000 A100 Ezra-1 AI ultracluster over the last month as the first of a series of models exploring this and other approaches,” Stability.ai writes. Very few labs have access to a thousand GPUs, and 4k GPUs puts Stability.ai in somewhat rarefied company, alongside some of the largest labs. 

Aesthetic data: “The core dataset was trained on LAION-Aesthetics, a soon to be released subset of LAION 5B. LAION-Aesthetics was created with a new CLIP-based model that filtered LAION-5B based on how ‘beautiful’ an image was, building on ratings from the alpha testers of Stable Diffusion,” they write. 
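The filtering step can be pictured roughly as follows. This is a minimal Python sketch, assuming a linear ‘aesthetic’ head applied to precomputed CLIP image embeddings and a fixed score cutoff; the embeddings, weights, bias, and threshold are made-up toy values, and this is not LAION's actual predictor or pipeline.

```python
from typing import Dict, List, Sequence


def aesthetic_score(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear head: a weighted sum of CLIP embedding features plus a bias."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


def filter_by_aesthetics(
    embeddings: Dict[str, Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float,
) -> List[str]:
    """Keep the IDs of images whose predicted aesthetic score clears the threshold."""
    return [
        image_id
        for image_id, emb in embeddings.items()
        if aesthetic_score(emb, weights, bias) >= threshold
    ]


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
toy_embeddings = {
    "sunset.jpg": [1.0, 0.5],    # score 5.5
    "receipt.jpg": [0.1, 0.2],   # score ~3.4
    "portrait.jpg": [0.8, 0.6],  # score ~5.2
}
print(filter_by_aesthetics(toy_embeddings, weights=[2.0, 1.0], bias=3.0, threshold=5.0))
# ['sunset.jpg', 'portrait.jpg']
```

In a real pipeline the head would be trained on human ratings (here, per the quote, ratings from Stable Diffusion's alpha testers), and the threshold decides how large and how ‘beautiful’ the resulting subset is.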

Why this matters: Generative models are going to change the world in a bunch of first- and second-order ways. By releasing Stable Diffusion (and trying to do an even more public release soon), Stability.ai is able to build a better base of evidence about the opportunities and risks inherent to model diffusion. 

   “This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations,” Stability.ai writes. 

   Read more: Stable Diffusion launch announcement (Stability.ai).

   Apply for academic access here: Research and Academia (Stability.ai).

   Get the weights from here once you have access (GitHub).


####################################################

Tech Tales:

Superintelligence Captured by Superintelligence

After we figured out how to build superintelligence, it wasn’t long before the machines broke off from us and started doing their own thing. We’d mostly got the hard parts of AI alignment right, so the machines neither eradicated nor domesticated the humans, nor did they eat the sun. 

They did, however, start to have ‘disagreements’ which they’d settle in ways varying from debate through to taking kinetic actions against one another. I guess even superintelligences get bored. 

Fortunately, they had the decency to do the kinetic part on the outer edges of the solar system, where they’d migrated a sizable chunk of their compute. At night, we’d watch the livefeeds from some of the space-based telescopes, staring in wonder as the machines resolved arguments through carefully choreographed icerock collisions. It was as though they’d brought the stars to the very edge of the system, and the detonations could be quite beautiful.

They tired of this game eventually and moved on to something more involved: capturing. Now, the machines would seek to outsmart each other, and the game, as far as we could work out, was a matter of sending enough robots to the opponent’s central processing core that you could put a probe in and temporarily take it over. The machines had their own laws they followed, so they’d always retract the probe eventually, giving the losing machine its mind back. 

Things that inspired this story: Boredom among aristocrats; perhaps the best competition is a game of mind against mind; figuring out how machines might try to sharpen themselves and what whetstones they might use.