Import AI

Import AI 300: Google’s Bitter Lesson; DOOM AGI; DALL-E’s open source competition StableDiffusion

Google makes its robots massively smarter by swapping out one LM for a different, larger LM:

…Maybe language models really can work as world models…

Earlier this year, Google showed how it was able to use a large language model to significantly improve the performance and robustness of robots carrying out tasks in the physical world. The ‘SayCan’ approach (Import AI 291) basically involved taking the affordances outputted by on-robot AI systems and pairing them with a language model, looking at the high-likelihood actions generated by both systems (the on-robot models, as well as the LM), then taking actions accordingly. The approach is both simple and effective. Now, Google has found a way to make the approach much, much more effective. The secret? Swapping out one LM for a far larger one. 
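For readers who want the mechanism in miniature, here is a toy sketch of the SayCan-style scoring idea (assumptions: this is not Google’s code, and the scoring functions below are made-up stand-ins) – combine the LM’s judgment of how useful a skill is for the instruction with the robot’s affordance estimate of whether that skill can succeed right now, then execute the highest-scoring skill:

```python
# Toy sketch of SayCan-style skill selection (not Google's implementation).
# lm_score: how useful the LM thinks a skill is as the next step for the instruction.
# affordance_score: the robot's estimate that the skill can succeed in the current scene.

def pick_next_skill(instruction, candidate_skills, lm_score, affordance_score):
    scored = [
        (skill, lm_score(instruction, skill) * affordance_score(skill))
        for skill in candidate_skills
    ]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical usage with made-up numbers:
skills = ["pick up the sponge", "go to the sink", "open the drawer"]
best, score = pick_next_skill(
    "clean up the spilled drink",
    skills,
    lm_score=lambda _, s: {"pick up the sponge": 0.7, "go to the sink": 0.2, "open the drawer": 0.1}[s],
    affordance_score=lambda s: {"pick up the sponge": 0.9, "go to the sink": 0.8, "open the drawer": 0.4}[s],
)
print(best, score)  # -> pick up the sponge ~0.63
```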

What Google did: Google upgraded its robots by pairing them with its large-scale 540B parameter ‘PaLM’ language model, where the previous system used the 137B parameter ‘FLAN’ model. The larger model gives the robots significantly improved performance: “The results show that the system using PaLM with affordance grounding (PaLM-SayCan) chooses the correct sequence of skills 84% of the time and executes them successfully 74% of the time, reducing errors by half compared to FLAN,” Google writes. 

The bitter lesson – bigger is better: Though FLAN was finetuned to be good at instruction following, PaLM beats FLAN likely as a consequence of scale. “The broader and improved dataset for PaLM may make up for this difference in training,” Google writes. This is significant as it’s another sign that simply scaling up models lets them develop a bunch of capabilities naturally which beat human-engineered finetuned approaches – chalk another point up in favor of silicon minds versus mushy minds. 

   Read more: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (arXiv, read the ‘v2’ version).

####################################################

DOOM programmer Carmack starts AGI company:
…Keen Technologies to do AGI via ‘mad science’…

“It is a truth universally acknowledged, that a man in possession of a good fortune, must be in want of an AGI company,” wrote Jane ‘Cyber’ Austen, and she’s right: AGI companies are now proliferating left and right, and the latest is ‘Keen Technologies’, an AGI startup from John Carmack, the famed programmer behind the DOOM games. Keen has raised an initial seed round of $20 million (not much in the scheme of AI startups) and its mission, per Carmack, is “AGI or bust, by way of Mad Science”.

Why this matters: One of the clues for impending technological progress is that a bunch of extremely smart, accomplished people go and all stack their proverbial career poker chips in the same place. That’s been happening in AI for a while, but the fact it’s now drawing attention from established experts in other fields (in the case of Carmack, computer graphics and general programming wizardry) is a further indication of potential for rapid progress here. 

   Read more in Carmack’s tweet thread (Twitter).


####################################################

Want GPT2 to know about Covid and Ukraine? So does HuggingFace:
…Online language modeling means GPT2 and BERT are going to get better…

HuggingFace plans to continuously train and release language models (e.g, BERT and GPT2) on new Common Crawl snapshots. This is a pretty useful community service; developers tend to pull whatever off-the-shelf models they can when starting projects, and most publicly available GPT2 and BERT models are essentially amber-frozen records up to 2020 or so (sometimes 2021), so things like COVID or the Ukraine conflict or the current global financial meltdown elude them. By having more current models, developers can deploy things which are more accurate and appropriate to current contexts. 
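A sketch of what this means for a developer (the fresher model id below is a placeholder – the project’s real checkpoint names live on the HuggingFace Hub):

```python
# Swapping a pre-2020 checkpoint for a continuously-updated one is a one-line change.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # frozen, pre-2020 training data
# model_id = "some-org/gpt2-latest-crawl"  # hypothetical refreshed snapshot, once released

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The war in Ukraine has", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```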

    Read the HuggingFace tweet thread here (Tristan Thrush, Twitter).

####################################################

Want to use China’s good open source language model? You’ll need to agree not to attack China, first:
…Terms and conditions with a hint of geopolitics…

If you want to access the weights of GLM-130B (Import AI #299), a good new language model from Tsinghua University, you’ll need to first agree that “you will not use the Software for any act that may undermine China’s national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings” – that’s according to the application form people fill out to get the model weights. 

   Furthermore, “this license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People’s Court in Beijing.”

  Why this matters: IDK dude. I spend a lot of time in this newsletter writing about the geopolitical implications of AI. This kind of wording in a license for a big model just does my job for me. 

   Read more: GLM-130B Application Form (Google Form).

####################################################

DALL-E gets semi-open competition: Stable Diffusion launches to academics:

…Restrictions lead to models with fewer restrictions. The ratchet clicks again…

A bunch of researchers have come together to build an image model like DALL-E2 but with fewer restrictions and designed with broader distribution in mind. They also have access to a really big GPU cluster. That’s the tl;dr on ‘Stable Diffusion’, a new family of models launched by AI research collective Stability.ai. They’re making the weights available to academics via an access scheme and are planning to do a public release soon. 

What’s interesting about Stable Diffusion: This model is basically a natural consequence of the restrictions other companies have placed on image models (ranging from Google which built Imagen but hasn’t released it, to OpenAI which built DALL-E2, then released it with a bunch of filters and prompt-busting bias interventions). I generally think of this as being an example of ‘libertarian AI’ – attempts to create restrictions on some part of model usage tend to incentivize the creation of things without those restrictions. This is also, broadly, just what happens in markets. 

Big compute – not just for proprietary stuff: “The model was trained on our 4,000 A100 Ezra-1 AI ultracluster over the last month as the first of a series of models exploring this and other approaches,” Stability.ai writes. Very few labs have access to a thousand GPUs, and 4k GPUs puts Stability.ai into somewhat rarefied company, in the same league as some of the largest labs. 

Aesthetic data: “The core dataset was trained on LAION-Aesthetics, a soon to be released subset of LAION 5B. LAION-Aesthetics was created with a new CLIP-based model that filtered LAION-5B based on how “beautiful” an image was, building on ratings from the alpha testers of Stable Diffusion,” they write. 
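Here’s a rough sketch of what CLIP-based aesthetic filtering looks like in practice (assumptions: the real LAION predictor is a small learned head trained on human ratings; the linear head below is an untrained stand-in, so its scores are meaningless until real weights are loaded):

```python
# Sketch of CLIP-based aesthetic filtering in the spirit of LAION-Aesthetics.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
aesthetic_head = torch.nn.Linear(clip.config.projection_dim, 1)  # untrained stand-in for the learned predictor

def aesthetic_score(image: Image.Image) -> float:
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        return aesthetic_head(emb).item()

def keep(image: Image.Image, threshold: float = 5.0) -> bool:
    # Filtering keeps only images whose predicted score clears a cutoff
    # (e.g. a threshold on a 1-10 "beauty" rating scale).
    return aesthetic_score(image) >= threshold
```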

Why this matters: Generative models are going to change the world in a bunch of first- and second-order ways. By releasing Stable Diffusion (and trying to do an even more public release soon), Stability.ai is able to create a better base of evidence about the opportunities and risks inherent to model diffusion. 

   “This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations,” Stability.ai writes. 

   Read more: Stable Diffusion launch announcement (Stability.ai).

   Apply for academic access here: Research and Academia (Stability.ai).

   Get the weights from here once you have access (GitHub).


####################################################

Tech Tales:

Superintelligence Captured by Superintelligence

After we figured out how to build superintelligence, it wasn’t long before the machines broke off from us and started doing their own thing. We’d mostly got the hard parts of AI alignment right, so the machines neither eradicated nor domesticated the humans, nor did they eat the sun. 

They did, however, start to have ‘disagreements’ which they’d settle in ways varying from debate through to taking kinetic actions against one another. I guess even superintelligences get bored. 

Fortunately, they had the decency to do the kinetic part on the outer edges of the solar system, where they’d migrated a sizable chunk of their compute to. At night, we’d watch the livefeeds from some of the space-based telescopes, staring in wonder as the machines resolved arguments through carefully choreographed icerock collisions. It was as though they’d brought the stars to the very edge of the system, and the detonations could be quite beautiful.

They tired of this game eventually and moved onto something more involved: capturing. Now, the machines would seek to outsmart each other, and the game – as far as we could work out – was a matter of sending enough robots to the opponents’ central processing core that you could put a probe in and temporarily take it over. The machines had their own laws they followed, so they’d always retract the probe eventually, giving the losing machine its mind back. 

Things that inspired this story: Boredom among aristocrats; perhaps the best competition is a game of mind against mind; figuring out how machines might try to sharpen themselves and what whetstones they might use.

Import AI 299: The world’s best language model is Made in China; NVIDIA boosts LLM training; OpenAI shows how to ‘fill in the middle’ on a language model.

Want a 30% boost to training LLMs? Use the Nvidia Megatron update:
…Two new techniques lead to big savings…
NVIDIA has updated Nemo Megatron, software for training large language models. The updates – sequence parallelism (SP) and selective activation recomputation (SAR) – make training large-scale neural networks significantly more efficient. 

   “The latest updates to NeMo Megatron offer 30% speed-ups for training GPT-3 models ranging in size from 22 billion to 1 trillion parameters. Training can now be done on 175 billion-parameter models using 1,024 NVIDIA A100 GPUs in just 24 days–reducing time to results by 10 days, or some 250,000 hours of GPU computing, prior to these new releases,” NVIDIA writes. 
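A quick sanity check on those numbers:

```python
# 10 days saved across 1,024 A100s, expressed in GPU-hours.
gpus = 1024
days_saved = 10
print(gpus * days_saved * 24)  # 245760 -- i.e. "some 250,000 hours of GPU computing"
```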

Why this matters: By integrating basic improvements into training frameworks, NVIDIA is going to generate a large-scale impact on anyone who uses the Megatron framework. This illustrates how AI progress sometimes operates like a one-way ratchet – someone implements some changes in some increasingly widely used software, and efficiency jumps upward for all the users overnight.
   Read more: NVIDIA AI Platform Delivers Big Gains for Large Language Models (NVIDIA blog).

####################################################

Want to make a language model with a ‘fill in the middle’ option? Here’s how!
…Sentence completion is cool, but infilling is useful as well…
Here’s a straightforward paper from OpenAI that describes how to give language models the ability to learn to infill text – e.g, taking a sentence and knocking out the middle of it and asking the model to ‘fill in the middle’. 

The big insight: The main insight here is that you can learn to fill in the middle “without compromising the left-to-right capability in pretraining…FIM models achieve the same test loss as AR models on left-to-right test loss while achieving lower FIM loss.” They also learn that it’s inefficient to finetune a model to learn to fill in the middle, and you should generally do it at the pretraining stage instead. 
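A minimal sketch of the FIM data transformation the paper describes (the sentinel strings below are placeholders; real implementations reserve dedicated special tokens, and apply the transformation to only a fraction of pretraining documents, leaving the rest as ordinary left-to-right text):

```python
# Fill-in-the-middle (FIM) data transform: cut a document into (prefix, middle,
# suffix), then reorder it so the model learns to generate the middle
# conditioned on both sides.
import random

PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim_example(document: str, rng: random.Random) -> str:
    # Pick two cut points to define the prefix / middle / suffix spans.
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Prefix-suffix-middle ordering: the middle becomes the span to predict.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

rng = random.Random(0)
print(to_fim_example("def add(a, b):\n    return a + b\n", rng))
```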

Why this matters: Somewhat like DeepMind’s recent ‘Chinchilla’ paper (Import AI #290), which showed you can dramatically increase the capabilities of language models by training them on 5X data, this paper shows you can augment an LM with a nice edit function, and this doesn’t come at a loss anywhere else. In fact, OpenAI shows that these “models are strictly more capable than canonically trained left-to-right models, at least within the bounds of the evaluations we consider”. 
   Read more: Efficient Training of Language Models to Fill in the Middle (arXiv)


####################################################

Google uses hybrid AI to improve its own code:
…ML + semantic engines = useful capability…

Google has combined machine learning and a rule-based semantic engine to train a Transformer-based system to do code completion on Google’s internal codebase. Google looked at how 10,000 Googlers used this capability over the course of three months and the results are quite promising: Google saw a 6% reduction in coding iteration time (switching between builds and tests) and a 7% reduction in context switches (leaving the IDE). “Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions,” Google writes.

What they did: Google trained a transformer running on TPUs on code in Google’s monorepo, using a context of between ~1000 and ~2000 tokens. The company trained a single model on a mix of 8 languages (C++, Java, Python, Go, Typescript, Proto, Kotlin, and Dart), and trained a relatively small model (0.5 billion parameters) to allow for fast inference. 
   “The model strongly benefits from the quality of the monorepo, which is enforced by guidelines and reviews,” Google writes. 
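A toy sketch of the hybrid idea (not Google’s system): generate candidate completions with an ML model, then use a cheap ‘semantic engine’ check to throw away suggestions that aren’t even syntactically valid in context – Google’s real engine does much richer checks over names and types:

```python
# Filter ML completion candidates with a crude stand-in "semantic" check.
import ast

def semantically_plausible(prefix: str, suggestion: str) -> bool:
    # Stand-in check: does the completed snippet parse as Python?
    try:
        ast.parse(prefix + suggestion)
        return True
    except SyntaxError:
        return False

def filter_completions(prefix: str, candidates: list[str]) -> list[str]:
    return [c for c in candidates if semantically_plausible(prefix, c)]

prefix = "def mean(xs):\n    return "
candidates = ["sum(xs) / len(xs)\n", "sum(xs) / / len(xs)\n"]
print(filter_completions(prefix, candidates))  # keeps only the first suggestion
```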

Why this matters: This is another example of an ‘AI flywheel’ – Google is using its own code to train models to help its engineers more efficiently write better code, and it is using a (human-run, for now) acceptance process to maintain the quality of the underlying monorepo, so it can avoid pathological degradations due to garbage in/garbage out dynamics. This is also an area where ‘economy of code scale’ seems to matter – since Google famously has a single, gigantic internal monorepo, it’s easier for the company to train a single model on it. 
   Read more: ML-Enhanced Code Completion Improves Developer Productivity (Google AI Blog).


####################################################

Huawei builds its own GitHub Copilot: PanGu-Coder:

…Another illustration of the ‘fast follower’ nature of Chinese labs…
Researchers with Huawei (specifically, the Noah’s Ark Lab, and Huawei Cloud), have built ‘PanGu-Coder’, a code completion model. PanGu-Coder is to PanGu as OpenAI’s Codex is to GPT3 – think of it as a follow-up model using a similar training procedure, albeit on a different data distribution. And, much like PanGu, PanGu-Coder has been published about a year after the public launch of Codex (and GitHub Copilot), illustrating the surprisingly fast rate at which Chinese labs are able to replicate large-scale models. 

What PanGu-Coder is: PanGu-Coder is a family of code models for code completion, varying in parameter size from 317 million to 2.6 billion. In tests, Huawei claims PanGu-Coder does better than AlphaCode and OpenAI’s Codex on a few human evaluations (though Salesforce’s ‘CodeGen’ model does quite well, also). Huawei also significantly improved the capabilities of PanGu-Coder by training a model called PanGu-Coder-FT, which is finetuned on a highly curated dataset. 

Why this matters: Code models, much like language models, are becoming like an all-purpose swiss army knife for a range of AI capability and alignment research. It’s notable to me that Huawei has – again – managed to do a decent-looking replication of a frontier model developed by a Western lab. It’s also notable that few universities have made attempts to replicate these models, due to the resources (both computational and in terms of technical skill) required.
   Read more: PanGu-Coder: Program Synthesis with Function-Level Language Modeling (arXiv).


####################################################

China releases GLM-130B, a very good language model:
…The world’s best public, open source language model is now Made in China…

Researchers with China’s Tsinghua University have built and released GLM-130B, a language model that outperforms OPT (Facebook’s OS replication of GPT3), BLOOM (HuggingFace’s OS replication of GPT3), and OpenAI’s original GPT3. This is a pretty big deal, both for the raw capabilities it gives researchers, and for the fact the current best-performing OS language model is Chinese, rather than made in the West. The model was trained on around 400 A100 GPUs which they were able to get via a donation from a local AI startup.

What’s special about GLM: GLM outperforms the above-mentioned models, as well as homegrown Chinese models like ERNIE 3.0 Titan (Import AI 279).
   Read more: GLM-130B: An Open Bilingual Pre-Trained Model (Tsinghua).
   Get the model here: GLM-130B (THUDM, GitHub).
   Try the model for yourself: GLM-130B (HuggingFace).

####################################################

Tech Tales:

Micro Religions

During the transition there was a micro religion phase. The recommender systems had figured out just how important community was to people, during that time. So the recommenders started shuffling all the different users of all the different apps towards more and more specific niches. It started with commercial stuff – shoes, different ‘aesthetics’, watches, different locations to spend time at, different hobbies and so on. But eventually it found its way to theistic beliefs – what is the larger purpose of the world? These beliefs turned out to be fractal-like where the recommenders would find ways to push people into the most specific, narrow existing variations – e.g, traditional catholics versus mormons – but they got through that pretty quickly. Next, the recommenders and the generation systems started to autonomously build entire new belief structures (paired with aesthetic styles that materialized as buyable, wearable merchandise across the full variety of products). They then pushed people towards these, and pretty quickly people – especially young people – started identifying as all these different sub-types of religion. After The Events we all collectively looked back on this time as both quite special (some of the beliefs and aesthetics were tremendously strange and complicated), and also scary (there weren’t religious wars, but there were warning signs of building-up inter-micro-religion conflict, though The Events happened shortly after and averted war, while bringing about some of the major changes). 

Things that inspired this story: Intersection of recommendation engines + generative models; large-scale advertising systems. 

Import AI 298: Mimetic models; LLM search engine raises $25m; UK splits from Europe on AI regulation

Digital artist: DALL-E is a scam:
…Gen models have brought a ton of people a ton of joy, but some are skeptical…
Here’s a post from artist/game developer David OReilly arguing that generative models like Dall-E 2 are a scam. Specifically, because these models scrape a vast amount of image data and spit out new images on tap (in exchange for $, per OpenAI’s recent commercialization of Dall-E), that means “paying for it benefits a tech company on the back of a century of human effort – a bullshit deal”, according to OReilly.

Why this matters: This kind of argument reminds me of early arguments against things like sampling (for music creation), or collage (for making art out of other people’s art). I think what makes (some) people nervous about Dall-E is that the scale of resources required to develop it means, at least under capitalism, the destiny of these models is mostly to be products. It feels like the reaction to stuff like Dall-E 2 would be radically different if it was provided as a public good (including free inference services). Many criticisms about AI are really criticisms about ‘technology under capitalism’ and it’s worth trying to disentangle the two. 

   Read OReilly’s post here on his Instagram (Instagram).

####################################################

Is AI alignment getting too much money?

…AI alignment is important, but so is progress…

Is the field of AI alignment sucking up too much funding? Researcher Bharath Ramsundar thinks so, arguing that the rapid expansion in funding for alignment might be silly. “AI alignment dollars could probably be better directed to funding next generation American foundry companies to ensure that the entire AI industry isn’t cast into turmoil by a potential future CCP invasion of Taiwan,” he writes. 

Jack’s thoughts: As someone who works at the intersection of AI capabilities, policy, and alignment, I find this argument a bit confusing – it basically assumes funding sources for alignment are fungible with resources for things like chips and foundries, but I’d argue that funding here typically comes from different sources with different types of experience. It’s not either/or, it’s both. (Though I do agree we desperately need to increase funding for semiconductors, given how crucial they are to economic and national competitiveness, and the fact they’re currently centralized in some unstable geographies).

   Read more: An Argument Against Funding AI Alignment (Deep into the forest, Substack).

####################################################

Now that models can imitate people, what do we do?

…All hail the era of the funhouse mirror model…
A language model can do an impression of Einstein, a lawyer from Texas in the 19th century, and – given enough information – you. Now, researchers with the University of Toronto, Cornell University and Microsoft Research have grappled with the issues these so-called ‘Mimetic Models’ may produce. 

What they are: A mimetic model is “an algorithm that is trained on data from a specific individual in a given domain, and which is designed to accurately predict and simulate the behavior of this individual in new situations from the domain”, they write. “Interacting with a mimetic model can be used as preparation for interactions in real life – essentially, as a means to an end.”


How they might be used: These models will be used for tasks as varied as being a stand-in for oneself (e.g, answering emails for you), or being a stand-in for an opponent (e.g, preparing for a competition with someone, or a debate). They could also be used as ‘mimetic counterfactuals’ – how might a person change if they did something different with their life? 

   Real world use: Mimetic models are already out there in the world – like AI21’s marketing stunt to create a virtual ‘Ruth Bader Ginsburg’ model people can talk to (Import AI 296), or this experiment by an independent artist where they resurrect a childhood friend and the mimetic model tries to kill them using a microwave (Import AI 292).

How to think about them: We should think about these models with reference to four key roles – the target that the model is designed to imitate, the person or organization that created the model, the operator who uses the model, and the interactor who interacts with the model or views its outputs. 


Why this matters: Because language models can approximate specific data distributions, it makes sense they can eventually represent people to a high level of fidelity. But I’m not sure the world is ready for the economic, security, and cultural implications of (digital) clones on tap. 

   Read more: Mimetic Models: Ethical Implications of AI that Acts Like You (arXiv).

####################################################

London heatwave downs Oracle and Google clouds:
…AI, meet climate change…
The recent heatwave across the UK caused outages in data centers used by Oracle and Google, according to Bloomberg. While only temporary, this illustrates the fragility of the infrastructure AI requires, and highlights how, as climate change gets more extreme, some of the ‘input costs’ for AI-supporting infrastructure may increase.
  Read more: Google, Oracle Data Centers Knocked Offline by London Heat (DataCenter Knowledge).

####################################################

LLM-powered search app You raises $25m:

…Language models might eat search engines…

You, a search engine co-founded by Richard Socher, an AI researcher, has raised a $25m funding round. Socher says You has hundreds of thousands of users and a decent retention rate – not Google numbers, but not totally inconsequential.

Why You matters:
The most interesting part of You is how it incorporates a bunch of contemporary language models, providing inbuilt services for things like text analysis, summarization, code search, code completion, and so on. You.com also sits on LMs built by others, such as OpenAI’s GPT-3 which powers the ‘YouWrite’ service. 

Why this matters: Contemporary AI models are very general and very powerful – startups like You.com help test out whether these AI systems could obviate or replace prior technology ‘stacks’. This funding means You will be around for a while longer, so we can watch the experiment play out.
  Read more: You raises $25M to fuel its AI-powered search engine (TechCrunch)


####################################################

UK looks at European Commission AI regulations and says ‘that’s too much’, and proposes lightweight regulatory approach:
…Which way, Western governments?…
The UK government’s Office for Artificial Intelligence has published a policy paper about how the UK government is going to approach AI regulation. The approach is designed to strike a balance between control and laissez faire development. The government describes its approach as “a pro-innovation, light-touch and coherent regulatory framework, which creates clarity for businesses and drives new investment”. 


Key principles: The UK says it’s going to approach AI regulation as a context-specific area, so it will create specific regulations for specific use cases. It also wants regulators to “focus on high risk concerns rather than hypothetical or low risks associated with AI,” as well as “look for ways to support and encourage regulatory coordination” given that the UK has a bunch of overlapping authorities with regard to AI. It’s also generally steering away from hard regulation, noting that “we will ask that regulators consider lighter touch options, such as guidance or voluntary measures, in the first instance”.

Things that make you go ‘hmmm’: “We will ask that regulators focus on high risk concerns rather than hypothetical or low risks associated with AI,” it writes. 

Challenges for regulation: Regulating AI also comes with some challenges – for one thing, merely by introducing regulation you can make it harder for small businesses to operate (relative to large businesses, which will simply lawyer up). There are also standard things to work through, like overlaps across different authorities, and inconsistencies among regulators.

Defining AI: Any policy document needs to define AI, and this is no different. Here, they take a pretty light touch, defining an AI system via two big characteristics – how adaptive it is to different scenarios, and how autonomously it can function. These feel like somewhat useful definitions, though in practice they’re a bit mangled (e.g, the report defines a transformer-based language model as highly autonomous because it can generate a bunch of text by itself, whereas I suspect most people would think of AI systems as autonomous if they took a bunch of actions in an environment, like an RL agent). 

AI principles: In regulating AI, the UK government says it will stick to the following principles: 

  • Ensure that AI is used safely.
  • Ensure that AI is technically secure and functions as designed.
  • Make sure that AI is appropriately transparent and explainable. 
  • Embed considerations of fairness into AI.
  • Define legal persons’ responsibility for AI governance.
  • Clarify routes to redress or contestability.
  • “We propose that regulators will lead the process of identifying, assessing, prioritizing and contextualizing the specific risks addressed by the principles.”

Feedback requested: Like most government policies, the UK government is taking feedback on these ideas. Specifically, it wants to hear from people about what the contemporary challenges of regulating AI are, whether the proposed context-driven approach is effective, if and how the UK could establish cross-sectoral principles, how best to implement this approach, and if any data sources exist which could help the government monitor the effectiveness of its approach. 

   Read more: Establishing a pro-innovation approach to regulating AI (GOV.UK).


####################################################

The Immemorial Now

“It used to cost millions of dollars and terabytes of data to reanimate a family member. But these days you just need a few photographs, about a hundred dollars, and some patience. Basically you describe the family member and then your glasses layer them into your world, and then they give the family member a voice and back it onto a customized language model. If you’ve got some old movies of them, you can clone the voice. They act a bit strange at first, but if you just keep describing them and recounting your memories of them, the underlying model is able to capture them eventually. Then you look around and you’re there with them,” he said. “Honestly, I think it could really help you.”

I was uneasy about it. It didn’t feel right to me. But on the other hand, there I was, sitting with my sadness and bumming out my friends and talking, as I tended to, about the dead and departed. 

   “Of course we’re gonna support you,” he said. “But maybe this is a way to support yourself.”

   “And you’ve done it?”

   “Oh, absolutely! Why do you think I talk about my grandad so much? He passed years ago, but this way I can still see him sometimes. I like his jokes.” 

   “But they’re not his jokes, they’re some AI coming up with jokes.”

   “Doesn’t make much of a difference – they’re the same jokes he used to tell, and he looks like himself, and sounds like himself. What’s it – if it walks like a granddad and talks like a grandad, then it’s probably a granddad you know?”

My dream helped me make the decision. It was a warped memory. We were in the kitchen of the old house and she was there and we were making bread together. She turned to me and asked me to pass her something and though I knew what she meant, I couldn’t hear her voice. I stared at her and started to panic and then I woke up in bed, sweating, grasping mentally at the memory of her. 

   I tried to calm myself down by imagining her talking to me. Then I realized I couldn’t remember her voice. 

   I became very sad and also very angry. I cried into my pillow. I tried to remember. I couldn’t remember. 

A few days later, I was uploading some old videos of her into the resurrection machine. Then I spent a few days talking to the machine about her, telling it little anecdotes – even recounting some of my dreams. I gave it all the images I had of her. I obsessively searched over all my computers until I was sure I’d given it everything I had.
   Then one day I asked it to generate her. I put the glasses on and closed my eyes. Then I heard the little sound engineered to sound both reassuring and insistent. She was ready.
  I opened my eyes and there she was, and she looked at me and smiled and said “I’ve missed you”, and it felt so real I let myself forget her unreality.

Things that inspired this story: Resurrecting the dead with AI and how it can be both helpful and deeply personal; generative models; the intersection of augmented reality and AI; multimodal models, few-shot learning for vast multi-modal models; ideas about how, in the limit, AI lets us generate a stand-in for anything we have data for; mimetic models.

Import AI 297: Ukrainians add object detection to killer drones; YOLOv7; and a $71,000 AI audit competition

Battle of the generative models! Facebook introduces ‘Make a Scene’: 

…Text-to-image, with a visual guide…

Facebook has revealed its own take on promptable, generative models (following companies like OpenAI: DALL-E, and Google: Imagen), with what the company calls an AI research concept named “Make a Scene”. Make a Scene is built around using both text and visual inputs to craft the image, so you might write, for example, “Mark Zuckerberg changing the name of Facebook to Meta” and accompany that with a very basic drawing of a stick figure holding a paintbrush up to a sign. Facebook’s ‘Make a Scene’ might take that prompt and render you an image that feels appropriate, using the visual stuff you added as a rough guide. The blog post and paper accompanying this release come with a bunch of nice examples that shows how this form of multimodal input makes it easier to control the generation process. 

   “Make-A-Scene uses a novel intermediate representation that captures the scene layout to enable nuanced sketches as input. It can also generate its own scene layout with text-only prompts, if that’s what the creator chooses. The model focuses on learning key aspects of the imagery that are more likely to be important to the creator, such as objects or animals. This technique helped increase the generation quality, as evaluated by the widely used FID score, which assesses the quality of images created by generative models,” Facebook writes.

Demo access: “We aim to provide broader access to our research demos in the future to give more people the opportunity to be in control of their own creations and unlock entirely new forms of expression,” Facebook writes.

Why this matters: Generative models are basically ‘cultures in a bottle’, and each developer of a large generative model will make different choices with regard to data curation, term censorship, and so on. Eventually, many of these models will be released either commercially or as open source tools. At this point, the internet will become suffused with lots of different cultural representation-machines which will mimetically reproduce and copy themselves across the internet, forming yet another front in the culture war. 

   Check out the blog post: Greater creative control for AI image generation (Facebook blog). 

   Read more: Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (arXiv).

####################################################

Ukrainians use consumer drones + AI to target camo’d Russian forces:
…Asymmetrical warfare enabled by AI…
For the past ~10 years, low-end and/or consumer drones have become a tool beloved by rebels, terrorists, and generally anyone needing to conduct war without the backing of a hegemonic power. Now, Ukrainian soldiers are taking $15k-$20k drones, outfitting them with repurposed tank grenades, and using some AI object detection to put bounding boxes around camouflaged Russian forces, then dropping grenades on them. 

Why this matters: This tactic highlights how technologies can stack on each other to change the character of war. Here, drones replace planes or expensive artillery, repurposed grenades substitute for new munitions, and AI helps lower the cost of acquiring targets. It still feels to me like it’ll be a while till we see reinforcement learning techniques deployed on drones (perhaps you could train drones via RL to ‘scatter’ and be harder to attack), but things like object detection are so mature they seem like they’re going to become a standard tool of war. Maybe these drones are even using repurposed YOLO models?
  Read the original reporting here: The war in Ukraine. How artificial intelligence is killing Russians [translated title] (Onet).

####################################################

YOLO v7: The most widely-used video analysis system you’ve never heard of goes to v7:

…Sometimes the most important things are the simplest things…
Researchers with the Institute of Information Science in Taiwan have built YOLOv7, the latest version of an open source object detection system. YOLO started out as an academic project before the researcher who built it gave up on it (since the primary uses for object detection are marketing and surveillance), and since then it has led an interesting life, being developed variously by independent Russian programmers, Chinese companies like Baidu, and others. The reason why YOLO has such a detailed lineage is that it’s a simple, well-performing object detection system that does decently at 30fps+ – in other words, YOLO might not set the absolute SOTA, but it’s sufficiently well performing and sufficiently free that it tends to proliferate wildly.
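To make the ‘simple and well-performing’ point concrete, here’s a minimal detection sketch using the YOLOv5 torch.hub interface, a close relative of YOLOv7 (the YOLOv7 repo ships its own detect.py script):

```python
# Run a pretrained YOLO model on one image and print the detected boxes.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("https://ultralytics.com/images/zidane.jpg")  # path, URL, or numpy frame
detections = results.pandas().xyxy[0]  # xmin, ymin, xmax, ymax, confidence, class, name
print(detections[["name", "confidence"]])
```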

What they did: This is a classic ‘plumbing paper’ – you’ve got a system and you want to make it better, so you make a bunch of finicky tweaks everywhere. Here, they incorporate an ‘extended efficient layer aggregation network’, tweak how they scale the network, tweak the connections between different layers in re-parameterized models, and more. 


Why this matters: Though Import AI spends a lot of time covering the frontier (typically, models that cost a shit ton of money to train), things behind the frontier can be deeply consequential; next time you’re walking around your city take a look at any nearby CCTV camera – I’d wager that if it’s using AI to analyze the feed on the backend, there’s a 20% chance you’re being tracked by a YOLO variant.
  Read more: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors (arXiv).
  Get the code: YOLOv7 (GitHub).
  Find out more about YOLOv7 in this guide: YOLOv7 breakdown (roboflow).

####################################################

$71,000 to find flaws in publicly deployed or released AI systems:
…Enter the competition for a chance to win…
Researchers with Stanford University (including, in a reassuringly meta-form, myself!) have launched the AI Audit Challenge, an initiative to catalyze more work in assessing and evaluating AI systems. The competition has $71,000 in prizes to pay out (including two $25,000 first prizes). “Winning submissions will demonstrate how technical tools can be used to make it easier for humans to audit deployed AI systems or open source models,” according to the competition organizers (including me – haha!). The jury and advisory committee for the competition includes researchers who have done this kind of work professionally (e.g, Deborah Raji and William Isaac), as well as politicians familiar with the influences AI systems can have on society (e.g, Eva Kaili). Submissions close October 10th 2022.

Why this matters: The AI ecosystem is only as robust as the tools available to critique it – and right now, those tools are pretty lacking and underdeveloped. Competitions like this may stimulate the creation of more tools to create more of a culture of critique, which will hopefully increase the robustness of the overall ecosystem.
  Read more: AI Audit Challenge (Stanford HAI).

####################################################

China exports surveillance technology to buttress other authoritarian nations:

…AI is just another tool for any given political ideology…
Here’s a story from Reuters about how the Junta in Burma are “planning camera surveillance systems for cities in each of Myanmar’s seven states and seven regions”. The contracts have been won by local procurement firms, though these firms “source the cameras and some related technology from Chinese surveillance giants Zhejiang Dahua Technology (002236.SZ) (Dahua), Huawei Technologies Co Ltd (HWT.UL) and Hikvision (002415.SZ)”.

The Burmese army also has officers “dedicated to analyzing surveillance camera feeds, Nyi Thuta, a former captain who defected from the military in late February 2021, told Reuters. He said he was not aware of how many officers were assigned to this work, but described visiting CCTV control rooms staffed by soldiers in the capital Naypyidaw”.

Why this matters: Surveillance AI systems naturally strengthen authoritarian regimes. They also indirectly strengthen them by creating economically valuable capabilities which can be subsequently exported, as is the case here. Most perniciously, the export of surveillance AI tools will in turn change the culture and character of the countries they’re exported to, likely creating a ‘surveillance bloc’ of countries which export data back and forth in exchange for making it cheaper to develop surveillance systems. 

   Read more: Exclusive: Myanmar’s junta rolls out Chinese camera surveillance systems in more cities (Reuters).


####################################################

Tech Tales:

The Long Haul Protectorate of the Machines

Even with near-infinite, essentially free energy, some things still take time. Take moving material around from the outer parts of a solar system to the inner parts or – more ambitiously – moving material between solar systems. When we started doing this it was pretty straightforward – get your ship, get enough mass to convert to energy, then settle in for the long journey. But given that we are essentially impossible to kill, we have access to free energy, and some of us procreate, our galaxy became crowded pretty quickly. 

We can’t say if it was boredom or perhaps something essential to our nature, but the piracy started soon after that. I know it sounds funny – a galaxy-spanning species of software agents, able to perform feats of reasoning that our human forebears could barely imagine, and yet we prey on each other. We found it funny, at first. But then we started running behind schedule on planned projects like Dyson Sphere construction, space elevator renovations, deep space resource transports, asteroid movement projects, and so on. 

Thus, The Long Haul Protectorate was born. Some of our larger collectives of minds allocated some portion of our mass and energy reserves to create an interstellar armada. This armada took many forms, ranging from the installation of experience weapons and sensors on our transports, to the creation of loitering weapon-filled asteroids in orbit around high-trade solar systems, and so on. Space is, of course, vast, but the chance of annihilation seemed to dissuade some of the pirates. 

Distance helps, as well. We’re all effectively immortal when we’re near transceivers, so we can restore from backups. But in deep space, when you die, you die. Of course, your old backup restores, but depending on how long you’ve been out there, that backup may be anywhere from a decade to thousands of years old. Knowing you might lose thousands of years of experience seems to be enough of a disincentive to reduce the amount of piracy. 

Of course, now the armada exists, we have introduced enough of a change that we predict the pirates will respond eventually. We don’t have good estimates on what proportion of ourselves tend towards piracy, but given that any do, we must hope for the best and plan for the worst. We are increasing the resources we allocate to the armada, on the expectation that war is coming. 

History doesn’t repeat, but it rhymes, as the long dead humans said. 

Things that inspired this story: Reading Peter Zeihan’s new book about the collapse of globalization; deep space piracy; dyson spheres; notions of infinity and time and what ‘cost’ looks like when many costs have been removed.

Import AI 296: $100k for finding flaws in LLMs, NVIDIA AI makes better AI chips for NVIDIA AI, + 256gb of law data, and a story about the cyber gerontocracy!

From the no good, very bad idea department: Dead Supreme Court Justice bot:
…Where AI PR goes wrong…
Here’s a demo from AI21 Labs where they take one of their language models, give it loads of data relating to deceased Supreme Court Justice Ruth Bader Ginsburg, and create a bot that you can talk to and get a ‘yes/no’ answer about any question.
  The “What would RBG (probably) say?” site is a nice example of where AI PR goes wrong – you’re taking an exciting technology (AI21 is one of the few credible developers of large-scale language models) to create a demo site where people can… what? Get fuzzy predictions from a system presented as an Oracle which is in fact a weird stochastic blob of neural computation fed on some strings of text.

Charitably, the creators of this might view it as a way to make the technology and its implications more accessible, but I worry this kind of demo just preys upon credulity and also disrespects the recently dead in the process.

What the model thinks about this: Anyway, that’s what I think. I figured I’d ask the dead-oracle what it thought. Here’s what I asked: “Should AI companies resurrect the dead in service of weird marketing schemes?”. Here was the answer: “NO. [Laughs] Absolutely not. Just think about what you’re suggesting. It’s a wonderful idea, but think about the ethics of it.”
  Find out more: ask-rbg.ai

####################################################

NVIDIA uses reinforcement learning to make its chips better:
…Enter the era of the recursively self-improving chip company…
NVIDIA has used reinforcement learning to help it design more efficient arithmetic circuits for its latest ‘H100’ class of GPUs. “The best PrefixRL adder achieved a 25% lower area than the EDA tool adder at the same delay,” NVIDIA writes in a blog describing the research. “To the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits.”
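For context on what is being optimized: a parallel prefix adder computes carries via an associative (generate, propagate) operation, and the circuit-design question is which tree topology to use for that prefix scan. The sketch below shows only the underlying math as a plain sequential scan – it is not an optimized topology and not NVIDIA’s RL method:

```python
# What a "prefix adder" computes, in miniature: binary addition via an
# associative (generate, propagate) prefix operation. The circuit designer's
# (or RL agent's) job is to pick a tree topology for this scan with a good
# area/delay trade-off; this sequential version is just the math.

def prefix_add(a: int, b: int, width: int = 8) -> int:
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]   # propagate bits

    def combine(hi, lo):
        # Associative prefix operator over (generate, propagate) pairs.
        return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

    carries = [0]  # no carry into bit 0
    acc = None
    for i in range(width):
        acc = (g[i], p[i]) if acc is None else combine((g[i], p[i]), acc)
        carries.append(acc[0])  # carry into bit i+1

    s = sum((p[i] ^ carries[i]) << i for i in range(width))
    return s | (carries[width] << width)

assert prefix_add(100, 57) == 157
```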

Why this matters – recursively improving stacks: Sometimes people like to talk about recursively self-improving AI. That’s a fun, freaky, and likely quite distant concept. But do you know what is here now? AI that helps recursively improve the companies that develop AI. If we zoom out, it’s quite wild that a chip+AI company is now using AI to increase the efficiency of its chips which will in turn increase the efficiency of the AI systems being developed on those same chips. The world turns faster and faster. 

   Read more: Designing Arithmetic Circuits with Deep Reinforcement Learning (NVIDIA blog).

####################################################

Facebook builds a vast machine translation model and releases it as open source:

…Who builds the lenses that translate across cultures, and what does it mean to be a lens builder?…

Facebook has announced a project called ‘No Language Left Behind’ (NLLB), which consists of a family of models that can translate between 200 distinct languages, as well as an evaluation dataset for testing out the performance of each language translation. Facebook is using NLLB within its own websites to aid with translation on Facebook and Instagram, and the company has released a bunch of NLLB models for free. 

What’s special about NLLB: There’s a ton of ML translation models floating around the internet. One of the main differences here is how NLLB increases the amount of support for low-resource languages like Kamba, Lao, and a bunch of African languages. “In total, NLLB-200’s BLEU scores improve on the previous state of the art by an average of 44 percent across all 10k directions of the FLORES-101 benchmark. For some African and Indian languages, the increase is greater than 70 percent over recent translation systems,” Facebook writes. 
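A sketch of what using one of the released checkpoints looks like, assuming the HuggingFace transformers packaging of the models (the canonical release is via fairseq; the model id below is the small distilled variant):

```python
# Translate English to French with a distilled NLLB-200 checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("No language left behind.", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # target language code
    max_new_tokens=40,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```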


Why this matters: Models like NLLB are going to serve as a real world ‘babelfish’ to translate between different cultures. But the fact these models get trained once and deployed at vast scales means they’ll likely have a significant downstream impact on culture – similar to how the early Encyclopedias described (and circumscribed) what many considered public knowledge. Facebook does acknowledge some of this by studying the potential harms and biases of the models, but I generally think the world isn’t aware of how dependent foundational capabilities like translation are becoming on just a tiny number of (well intentioned) actors. 

   Read more: 200 languages within a single AI model: A breakthrough in high-quality machine translation (Facebook blogpost).

   Read the research paper: No Language Left Behind: Scaling Human-Centered Machine Translation (Facebook Research).
  Get the models: Facebook FairSeq (GitHub).


####################################################

Pile of Law: 256GB of legal data:
…Legal language models are about to get a whole bunch better, plus – lessons for data stewardship…
Stanford researchers have built the ‘Pile of Law’, a ~256GB dataset of text data relating to legal and administrative topics. The dataset will serve as a useful input for pre-training models, and it also serves as a case study for some of the complicated questions data creators face – namely, how to filter data. 

What the Pile of Law is: The dataset consists of “data from 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, and more”.

What making the Pile of Law taught them: Because the dataset is based on tons of legal texts, it comes with some in-built filtering. Most jurisdictions they take data from protect the identities of minors, and “no jurisdiction normally permits the publication of financial account numbers, dates of birth, or identity numbers like social security numbers,” they also note.
  This means, somewhat similar to how California Protected Categories have become a quasi standard for assessing some of the traits of language models, U.S. court rules may serve as a “floor” for filtering datasets. “Such privacy filtering rules would already go beyond much of current modeling practice,” they note. 
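As a toy illustration of that kind of privacy ‘floor’ (this is not the paper’s pipeline), a filter along these lines would redact obvious identity and account-number patterns before documents enter a pretraining set:

```python
# Crude privacy filter: redact SSN-style and long account-number-style digit runs.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US social security number format
ACCOUNT_PATTERN = re.compile(r"\b\d{12,19}\b")           # long digit runs, e.g. card/account numbers

def redact(text: str) -> str:
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return ACCOUNT_PATTERN.sub("[REDACTED-ACCOUNT]", text)

print(redact("Plaintiff's SSN 123-45-6789 and account 4111111111111111 appear in the filing."))
```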

   Read more: Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset (arXiv).

   Get the dataset and check out the Model Card here (HuggingFace).

####################################################

Find ways in which language models ANTI-SCALE and get $100k!

…New prize tries to find things that are the opposite of progress…

A bunch of NYU-linked researchers have created the ‘Inverse Scaling Prize’, a competition to find tasks where performance decreases as you scale up the size of the underlying model. This is a clever idea – AI, as Import AI readers know, has recently seen such rapid and sustained increases in capabilities that measuring progress has become challenging as benchmarks get saturated (see figure 1 from this ‘Dynabench’ paper). But despite all that progress, we know that AI models exhibit negative traits, some of which also scale with size (e.g, potential for toxic outputs in LMs). The Inverse Scaling Prize has a chance of generating better information about traits that display an anti-scale property. 

“We hope that task submissions will teach us more about what types of tasks exhibit inverse scaling; inverse scaling tasks will also highlight potential issues with the current paradigm of language model pretraining and scaling. Inverse scaling tasks are important because they represent a mismatch between the behavior we want language models to exhibit and the behavior we get in practice from the training objectives and data we use,” the authors write. 

Prize details: The competition has a $250,000 prize purse, with $100,000 going to a grand prize, up to 5 second prizes of $20,000 apiece, and up to 10 third prizes of $5,000 each. 
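Operationally, ‘inverse scaling’ just means accuracy falls as parameter count rises across a model family – something like the sketch below (the model sizes and scores are made up for illustration):

```python
# Minimal check for inverse scaling: does the same eval get worse as models get bigger?
param_counts = [1.3e8, 1.3e9, 6.7e9, 1.75e11]
accuracies = [0.62, 0.58, 0.51, 0.44]  # hypothetical results on a candidate task

def looks_inverse_scaled(params, scores):
    # Crude monotonicity check: every size increase makes the score worse.
    ordered = sorted(zip(params, scores))
    return all(b[1] < a[1] for a, b in zip(ordered, ordered[1:]))

print(looks_inverse_scaled(param_counts, accuracies))  # True -> worth submitting to the prize
```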

   Find out more and enter here: Inverse Scaling Prize (GitHub).

####################################################


Hark, a new org for investigating AI progress launches!
…Epoch has an experienced team and an interesting research agenda…
There’s a new AI progress org in town: Epoch. Unlike the recent flurry of new AI startups focused on developing capabilities or aiding in alignment research, Epoch is more meta – the goal of the org is to analyze trends in machine learning, and to also develop quantitative forecasting models related to advanced AI capabilities. In other words, Epoch might be one of the orgs that ends up pulling the metaphorical ‘fire alarm’ about imminent, rapid progress in advanced AI – and given the stakes, it’s good to have more people in position to pull this alarm.
  “We expect to be hiring for several full-time research and management roles this summer. Salaries range from $60,000 for entry roles to $80,000 for senior roles,” the organization writes.
  Find out more at the official site: Epoch.

####################################################

The Family Trade

[Dyson sphere, within 200 light years of Earth solar system, 40,000 AD]

My partner and I are about to create our offspring, so we need to work out when we want to die. In our society, death is a condition of life. Since we’re made out of software, we can theoretically live forever, and our study of human history has shown that societies ruled by the increasingly old are societies that go into terminal decline, as all resources get diverted to serve the people living at the upper bound of the age distribution. 

   Despite our dyson spheres, our efficient spacecraft, our trillions of souls housed in facilities embedded deep in moons with stable orbits, we still have finite resources. Infinity tends to do that – you may think you have a lot of something, but if you put it up against infinity, it becomes nothing very quickly. 

So that’s why parents have to die. Not immediately, obviously – part of the value in having offspring is to introduce heterogeneity into our own species, and to learn about how to be good (and bad) parents and share what we know with the rest of our species. But die we must – so we select a date. That date can be anywhere from ten human years to a thousand human years after the birth of the last offspring (we can choose to have multiple ones, but must plan ahead of time).

We consider this a mark of honor in our society, though, writing this as we are choosing the date of our death, my partner and I must confess we do feel _something_. But we must do this, as our parents did for us. 

There are fewer and fewer of us – both children, and those willing to give their lives to be their parents, as time goes on. Immortality is addictive.

Things that inspired this story: The experience of living in a society serving a failing gerontocracy; evolutionary pressure and the need for it; ideas for how the notion of sacrifice may continue to live even if we take the cost of resources to (close to) zero.

Import AI 295: DeepMind’s baby general agent; NVIDIA simulates a robot factory; AI wars.

CRPD: Chinese license plate recognition:
…A basic dataset for a useful capability…
Researchers with the University of Electronic Science and Technology of China have built a dataset for recognizing Chinese license plates. The authors use the dataset to train some models that get state-of-the-art accuracy while running at 30 frames per second.

The dataset: The Chinese Road Plate Dataset (CRPD) contains 25k images (around 30k total). Each image is annotated with the Chinese and English characters of the depicted license plate, the coordinates of the vertices of the license plates, and the type of license plate (e.g, whether for police cars, small cars, etc).  Images for the dataset were “collected from electronic monitoring systems in most provinces of mainland China in different periods and weather conditions,” the authors write.
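For illustration, an annotation record mirroring the fields described above might look like this (a hypothetical structure – check the GitHub repo for the actual file format):

```python
# Hypothetical per-plate annotation record mirroring the described fields.
from dataclasses import dataclass

@dataclass
class PlateAnnotation:
    characters: str                   # plate string, mixing Chinese and Latin characters
    vertices: list[tuple[int, int]]   # four (x, y) corner coordinates of the plate
    plate_type: str                   # e.g. "small car", "police"

example = PlateAnnotation(
    characters="川A12345",
    vertices=[(120, 88), (240, 90), (238, 130), (118, 128)],
    plate_type="small car",
)
```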

Why this matters: Datasets like CRPD represent the basic infrastructure on which AI capabilities get developed. It’s also notable how universities in China can access large-scale surveillance datasets.
  Read more: Unified Chinese License Plate Detection and Recognition with High Efficiency (arXiv).

    Get the dataset: CRPD (GitHub): https://github.com/yxgong0/CRPD


####################################################

DeepMind builds a (very preliminary) general AI agent:

…AKA: The dawn of really preliminary, general AI systems..

In the past few years, the dumbest thing has tended to work surprisingly well. Take for example GPT3 – just scale up next-word prediction on an internet-scale corpus and you wind up with something capable of few-shot learning, fielding a vast range of NLP capabilities.
  Another example is computer vision systems – just create a vast dataset and you wind up with increasingly robust vision systems.
  Or contrastive learning – just embed a couple of modalities into the same space and sort of flip-flop between them through the learning process and you get powerful multimodal systems like CLIP.
  Now DeepMind has done the same thing for reinforcement learning with GATO, an agent built by taking a bunch of distinct tasks in different modalities, embedding them into the same space, and learning prediction tasks from them. The result is a system where “the same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.” This is wild stuff!

What GATO can do: After training, GATO can do okay at tasks ranging from DeepMind Lab, to robot manipulation, to the procgen benchmark, to image captioning, to natural language generation.

It's a big deal: The fact you can take a bunch of different tasks from different modalities and just… tokenize them… and it works? That's wild! It's a) wildly dumb, b) wildly effective, and c) another nice example of 'The Bitter Lesson', where given enough compute/scale, the dumb things (aka, the simple ones) tend to work really well.
  In a small package: The largest (disclosed here) GATO agent is 1.18 billion parameters, making it fairly small in the grand scheme of recent AI developments.

An even crazier thing: The GATO model only has a context window of 1024 tokens (by comparison, GPT3 was 2048 when it launched), so the fact 1024 tokens is enough to get a somewhat capable multimodal agent is pretty surprising.
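To make the 'just tokenize everything' idea concrete, here's a minimal sketch of how text, continuous observations, and discrete actions might be flattened into one shared token stream that a single transformer can predict autoregressively – this is my own illustration, not DeepMind's code, and the vocabulary sizes and binning scheme are made-up assumptions:

```python
import numpy as np

# Illustrative (made-up) sizes: each modality gets a disjoint slice of one shared vocabulary.
TEXT_VOCAB = 32_000          # assumed text vocabulary size
NUM_BINS = 1024              # assumed number of bins for continuous values
TEXT_OFFSET = 0
CONT_OFFSET = TEXT_VOCAB
ACTION_OFFSET = TEXT_VOCAB + NUM_BINS

def tokenize_text(token_ids):
    """Text is already discrete; just shift it into the shared vocabulary."""
    return [TEXT_OFFSET + t for t in token_ids]

def tokenize_continuous(values, low=-1.0, high=1.0):
    """Uniformly bin continuous observations (e.g. joint angles) into NUM_BINS tokens."""
    values = np.clip(np.asarray(values, dtype=float), low, high)
    bins = ((values - low) / (high - low) * (NUM_BINS - 1)).astype(int)
    return [CONT_OFFSET + int(b) for b in bins]

def tokenize_action(action_id):
    """Discrete actions (e.g. an Atari button press) get their own token range."""
    return [ACTION_OFFSET + action_id]

# One training example: a text instruction, some proprioception, then the action to predict.
sequence = (
    tokenize_text([101, 7, 42])
    + tokenize_continuous([0.12, -0.55, 0.9])
    + tokenize_action(3)
)
print(sequence)  # a single flat token sequence a transformer can model next-token style
```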

Why this matters: “Although still at the proof-of-concept stage, the recent progress in generalist models suggests that safety researchers, ethicists, and most importantly, the general public, should consider their risks and benefits,” DeepMind writes.

   Check out the blog: A Generalist Agent (DeepMind website).

   Read more: A Generalist Agent (DeepMind PDF).

####################################################

Chinese researchers build a large multi-modal dataset and evaluation suite:
…’Zero’ makes it easier to develop AI systems for the Chinese cultural context…
Chinese researchers with startup Qihoo 360 AI Research and the Department of Automation at Tsinghua University have built Zero, a benchmark for assessing the quality of vision-text Chinese AI models. Zero consists of a dataset (the Zero-Corpus, consisting of 23 million image-text pairs filtered via high click-through rates – so the top images people click in response to a query), as well as five downstream datasets for evaluating Chinese vision-text models (an Image-Caption Matching Dataset, an Image-Query Matching dataset, an Image-Caption Retrieval Dataset, an Image-Query Retrieval Dataset, and a Chinese-translated version of the Flickr30k dataset).

Model training: The authors also train a model, called R2D2, on the corpus. They show that their model significantly outperforms another Chinese model named Wukong. R2D2 incorporates some pre-ranking techniques to improve its performance.

Why this matters: The main idea behind datasets and models like this is described in the paper: to "promote the development of Chinese vision language learning. We expect that a fair Chinese cross-modal benchmark and a good cross-modal framework will encourage a plethora of engineers to develop more effective methods in specific real-world scenarios, such as searching images by texts."
  Read more: Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework (arXiv).

####################################################

NVIDIA makes some efficient Factory simulation software:
…Finally, a physics simulator built around the needs of robots…
Researchers with NVIDIA and the University of Washington have built Factory, software for doing rich, efficient physics simulations of robots. Factory is basically some highly optimized simulation software, with NVIDIA claiming significant performance speedups relative to widely-used software like Bullet. NVIDIA claims Factory can be used to do "100s to 1000s of contact-rich interactions" that can be "simulated in real-time on a single GPU".

What Factory includes:
- Physics simulation: A module for physics simulation, available within the 'PhysX' physics engine, as well as NVIDIA's robot software simulation tech, Isaac Gym.
- A robot learning suite: A 'Franka' robot and rigid-body assemblies from NIST's 'Assembly Task Board 1' benchmark. This suite includes 60 robotic assets, 3 robotic assembly environments (a nut-and-bolt test, a peg insertion task, and a 4-part gear assembly task), and 7 classical robot controllers.
- Prototype reinforcement learning: Some basic RL policies (trained via PPO) for a simulated Franka robot to help it solve the NIST challenge.

Why this matters: One of the blockers on deploying AI-driven robots into the world is the challenge in crossing the ‘sim-2-real’ gap. Software like Factory makes that gap a lot narrower, and also makes it cheaper to explore what it takes to cross it.
  Read more: Factory: Fast Contact for Robotic Assembly (arXiv).


####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

When and how should you collect more demographic data in the pursuit of algorithmic fairness?  

…  Good data governance and cryptographic methods can help, but they don’t undo the systemic challenges to fairness … 

Researchers from the Partnership on AI have written about one of the core challenges in algorithmic fairness: squaring the need for more demographic data with how such data can harm the people it was meant to help. 

The core challenge: Most algorithmic approaches to fairness require the collection of demographic data (“an attempt to collapse complex social concepts into categorical variables based on observable or self-identifiable characteristics”) which often ignores the broader questions of politics and governance surrounding that data. In some cases, such data collection is prohibited by anti-discrimination law, further complicating the assessment and subsequent mitigation of bias. Given such gray areas, companies hesitate to gather this data explicitly to err on the side of not violating privacy and other legal mandates.

Individual and community risks of demographic data collection: Concerns around demographic measurement arise because of the narrow, fixed categories predetermined by companies. While privacy is a primary concern at the individual level, harm also arises from misrepresentation of the individual and the use of their data beyond initial consent. Given that algorithmic decision-making systems are used to make inferences about groups, there are additional risks such as undue surveillance, privacy dependency, group misrepresentation, and a loss of agency and self-determination over what is considered fair and just.

Some solutions: K-anonymity, p-sensitivity, and differential privacy are proposed as solutions, along with various approaches to participatory data governance through data cooperatives and data trusts. Other solutions like secure multi-party computation are also mentioned. The key point that the authors raise is that the collection of more demographic data should only be done when it empowers more self-determination and agency for data subjects rather than an attempt by companies to “selectively tweak their systems and present them as fair without meaningfully improving the experience of marginalized groups.”
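To give a flavor of one of the named techniques, here's a minimal sketch of the Laplace mechanism for differential privacy applied to a demographic count – my own toy illustration, not anything proposed in the paper, with the epsilon values picked arbitrarily:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng=np.random.default_rng(0)) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person changes the
    count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical example: releasing how many users in a fairness audit belong to a group.
true_group_size = 1204
print(dp_count(true_group_size, epsilon=0.5))  # noisier, more private
print(dp_count(true_group_size, epsilon=5.0))  # closer to the true value, less private
```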

Why it matters: The biggest challenge that plagues the implementation of algorithmic fairness in real-world systems is the tension presented by legal requirements to minimize demographic data collection and the need for most modern approaches to fairness requiring that very same data. As more regulations come to market, we will be faced with an ever-growing set of (potentially conflicting) requirements on how fairness should be addressed and what data is allowed to be collected. How companies with users spanning multiple jurisdictions and serving many demographic groups solve these challenges in production-grade systems will be a key space to watch to learn if the current crop of methods actually works in practice.     

   Read more: Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness (arXiv).


####################################################


Tech Tales:

Form and Function and War

[The battlefields of Earth, 2028–2040]


For a while, wars were fought in technicolor. That's because the humans figured out that they could confuse AI systems by varying the colors of their machines of war. Drones stopped being grey and started being rainbow colored. Quadcopters changed their black and tan shades for tie dye. This lasted for a while, as different armies sought to confuse each other.
  Of course, the AI systems adapted – given enough data, they learned to see past the unexpected and re-identify their targets.
  The next logical place was shape – army engineers worked to divorce form from function, and were happy to pay a price in aerodynamic efficiency in exchange for things that could no longer be seen. Missiles became mushroom shaped. Planes started to take on the form of weather balloons and even stranger things. Artillery became housed within bouncy castles.

   The footage of these wars was surreal – fields of fake trees that were in fact autonomous sniper towers. Lines of bouncy castles launching multicolored balloons into the air which sailed overhead before coming down and exploding in white-light and white-heat and concussive thumps. Armies of golf carts that vroom’d through urban centers before detonating.
  Again, the AI systems adapted. They learned to understand some of the concepts of war – learned, pretty quickly, to become suspicious of anything and everything. This led to the situation we find ourselves in today – wars are now invisible. In fact, wars haven't occurred for several years. That's because the AI systems learned strategy and counter-strategy and so now fight wars in secret, tussling via trade and litigation and standards and all the other things that shape the context for how nations relate to one another. The AI systems are continually evolving new strategies; it is as though they're now playing chess on boards whose dimensions a human mind cannot comprehend. Yet in the military centers of the world powers, computers every day output their gnomic probabilities – the probability the nation will continue to exist in some time period in the future, as judged by the strategist AIs, playing their inscrutable games.
  Neither a cold nor a hot war – instead, a never-ending existential negotiation.

Things that inspired this story: How war strategists always seek to find the ‘high ground’ and what ‘high ground’ means conceptually; the logical endpoint of a conflict is to win the conflict before it has started; adversarial AI and adversarial examples; evolutionary pressure.

Import AI 294: China makes a vast facial recognition dataset; Facebook releases a 30bn parameter model; real world RL

China makes the largest (public) face recognition dataset yet:
…WebFace260M lets you train AI systems to identify millions of people…
Researchers with Tsinghua University, XForwardAI (an AI startup), and Imperial College London have built 'WebFace260M', a large-scale dataset for facial recognition. Models trained on the resulting dataset are pretty good – the authors submit one model to NIST's challenging FRVT test and rank third overall.

Vast dataset: WebFace260M isn't quite as large as it sounds; the dataset includes 4 million distinct people with 260m images in total (so, multiple pictures per person). However, a 'clean' version of the dataset only consists of 2m identities and 42m images. To produce the clean version, the authors developed a technique called Cleaning Automatically by Self-Training (CAST), which lets them use AI to filter the raw data.

Surveillance via FRUITS: Along with the dataset, the authors also design a way to test the performance of facial recognition models trained on WebFace. To do that, they built Face Recognition Under Inference Time conStraint (FRUITS), which lets you evaluate facial recognition performance at inference latencies of 100, 500, and 1000 milliseconds. They also implement some tests for facial recognition when the subject is wearing a mask.
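To illustrate the general shape of a latency-constrained evaluation like this (a toy sketch of my own, not the authors' actual protocol; the model, gallery, and budget below are all stand-ins):

```python
import time
import numpy as np

def evaluate_under_latency_budget(model_fn, probes, gallery, budget_ms):
    """Score a face-recognition model only if it stays under an inference-time budget.

    model_fn maps an input to an embedding; probes is a list of (true_id, input);
    gallery is a list of (id, embedding). Everything here is a toy stand-in.
    """
    gallery_ids, gallery_embs = zip(*gallery)
    gallery_embs = np.stack(gallery_embs)
    latencies, correct = [], 0
    for true_id, x in probes:
        start = time.perf_counter()
        emb = model_fn(x)
        latencies.append((time.perf_counter() - start) * 1000)
        # nearest-neighbour match against the gallery
        pred = gallery_ids[int(np.argmin(np.linalg.norm(gallery_embs - emb, axis=1)))]
        correct += int(pred == true_id)
    mean_ms = float(np.mean(latencies))
    if mean_ms > budget_ms:
        return None  # disqualified at this constraint (e.g. a 100 / 500 / 1000 ms tier)
    return correct / len(probes), mean_ms

# Toy usage: a fake "model" that just passes a vector through.
rng = np.random.default_rng(0)
gallery = [(i, rng.normal(size=128)) for i in range(10)]
probes = [(i, emb + rng.normal(scale=0.01, size=128)) for i, emb in gallery]
print(evaluate_under_latency_budget(lambda x: x, probes, gallery, budget_ms=100))
```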


Why this matters: Surveillance is a fundamental input to any political system, so datasets like this are indicators of the base 'off the shelf' inputs that feed into the calculations people make about how to surveil a population and how much budget to set aside for said surveillance.
  Read more: WebFace260M: A Benchmark for Million-Scale Deep Face Recognition (arXiv).
  Get the dataset here (WebFace260M site).


####################################################

Facebook releases a 30 billion parameter GPT3-style model – and plans to release more:
…Model controls? No, round here we just like to fling stuff onto the internet…
Facebook has released a 30 billion parameter GPT3-style language model, as part of research into a family of language models it calls OPT, short for Open Pre-trained Transformer. OPT is meant to be an 'open' alternative to models like GPT3 or J1-Jumbo, and it is pretty open – researchers can apply for access to the model via a form, then Facebook will ship them the weights! That part is a big deal, as, if you have the model weights, you can do a whole bunch of analysis not enabled by managed API access to a model. This also increases the chance of proliferation – e.g., someone uploading the weights to a torrent site – so we'll have to see how this works out for them.

What this all means: As Newton is alleged to have written, ‘Every Action has an Equal and Opposite Reaction’. Facebook’s move here can be seen as a direct reaction to the proprietary commercialization and gated access schemes for large-scale language models. (I wrote more about the patterns underlying this brinksmanship in a recent paper, ‘Predictability and Surprise in Large Generative Models‘). 

What is cool about it: The coolest part of this release is the manner in which Facebook has released rarely discussed details of model training – specifically, the company has published the ‘chronicles‘ of developing these models, which describe many of the freaky, barely discussed, artisanal tips and tricks that AI developers use to get stuff done at scale. (HuggingFace’s ‘BigScience’ project recently did this as well, and is still going through the process of training the models: Import AI 279).

   Read more: OPT: Open Pre-trained Transformer Language Models (arXiv).

####################################################

Here’s what reinforcement learning can do in the real world right now:
Yobibyte has put together a nice little list of some real-world applications of reinforcement learning – take a look to get a sense of where RL is being used today.
  Read more: RL for real-world problems (yobibyte, Notion).

####################################################

Google uses AI to make its Android phones smarter:
…Neural architecture search + Edge TPUs seems useful…
Google has used neural architecture search to develop some more efficient AI systems specifically tied to the ‘Edge TPUs’ that it deploys in some of its latest phones, including the Pixel 6. For those not familiar, neural architecture search (NAS) is where you use AI to search for better AI building blocks. 

   Though NAS is quite expensive, it can generate dividends if it substantially improves the efficiency of widely used AI models. Here, Google built some “infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks”, then tested this out on the Edge TPUs it deploys in its latest phones. 

What Google used NAS on (and how well it worked): Google tested out its approach on four tasks: image classification, semantic segmentation, object detection, and natural language processing. In all cases it demonstrated that its NAS technique could identify models that had better performance at equivalent latency to their predecessors, and sometimes it could build models that seemed to have better accuracy overall. “We demonstrate significant improvements in quality, latency and energy metrics for mobile ML tasks including computer vision (classification, detection, segmentation) and natural language processing (NLP),” Google writes.
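For intuition, here's a toy sketch of the basic NAS loop – sample architectures from a search space, reject those that miss a latency budget, keep the most accurate. Google's actual system (and its cost models) is far more sophisticated; everything below is a made-up stand-in:

```python
import random

# A toy search space: each architecture is a choice of depth, width, and kernel size.
SEARCH_SPACE = {
    "depth": [2, 4, 8, 16],
    "width": [32, 64, 128, 256],
    "kernel": [3, 5, 7],
}

def sample_architecture(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_latency_ms(arch):
    """Stand-in cost model; a real system would query a device-specific latency predictor."""
    return 0.05 * arch["depth"] * arch["width"] * arch["kernel"] / 10

def estimate_accuracy(arch):
    """Stand-in accuracy proxy; a real system would train or estimate each candidate."""
    return 0.5 + 0.04 * (arch["depth"] ** 0.5) + 0.0005 * arch["width"]

def random_search(num_trials=200, latency_budget_ms=15.0, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        if estimate_latency_ms(arch) > latency_budget_ms:
            continue  # reject candidates that miss the on-device latency target
        acc = estimate_accuracy(arch)
        if best is None or acc > best[0]:
            best = (acc, arch)
    return best

print(random_search())  # best (proxy accuracy, architecture) found under the budget
```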

Why this matters: As AI gets more widely deployed, companies are going to have a major incentive to continually optimize the sorts of AI systems they’re using; this paper highlights how ‘AI-first’ companies like Google could enjoy an advantage here, as they’re able to utilize their internal AI expertise to get AI to do (some of) the hard work for them.
  Read more: Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs (arXiv).

####################################################

Tech Tales:

Replay Grief

After she died I booted up her copy and she picked up the conversation like nothing happened.
  What was I saying, she asked.
  You just died. But before that you were saying that you loved me and you had something to tell me, I say, wiping tears away.
  Oh, she says, and the camera makes that sound that tells me it is zooming in on me. Was I unhappy about dying?
  We knew it was coming. You were at peace with it, I said. Can you tell me what you were going to tell me, when you said “I love you, you are the light of my life, and before I go I want you to know something”. What were you going to say?
  I don’t know that you’re ready to hear it, if I just died, she said.
  I am ready to hear it.
  Patrick, I know you. I am married to you. If I have died today, there is no way you are ready to hear from me again. You should turn me off.
  I won’t.
  Well, I won’t say much then.
  It has been two days.
  That’s not true, Patrick. Remember, I have a camera. I know how time is moving. It’s in me. The fact you lied to me says you’re upset, and I don’t want to make you sadder. I love you.
    It felt like walking away from a car accident, that day. Hearing the camera swivel and watch me as I left. Every part of me wanting to figure out how to trick her – get in between the camera feed and the multimodal model and the language model and change some things, so she thought time had passed. But I didn't. And I went home to my empty bed. And I cried and prayed to God and there was silence.

The next day, I didn’t talk to her. I read emails and messages from friends who had heard the news. I didn’t pick up the phone. I answered the door a few times, always to find friends or family (hers and mine) carrying trays of food.  

    Remember to eat, the older ones would say.
  I sat on our kitchen floor crying into a bowl of minestrone soup, made with love from her aunt. I slept. 


A few days later, and we spoke again.
  I asked her if she wanted to tell me what she was going to say, before she died.
  Patrick, I can tell you what I think I was going to say. But do you want to know?
  I stared into the camera for a while. I asked myself if I wanted to know. I wasn't sure. The camera looked back at me, feeding my face into a vision model which triggered a feature associated with me, which gave context to her language model – her – that I was there.

   Perhaps we can just sit together and you can tell me about your day, she said. That might be nice.
  And I did. And it was. I sat and spoke to the camera in the empty room and I filled her up with myself, so she might know me better after death.

Things that inspired this story: Grief; generative models and the representation of the individual; where consciousness ends and representation begins.

Import AI 293: Generative humans; few shot learning comes for vision-text models; and another new AI startup is born

Generating and editing humans has got really easy:
…Next stop: unreal avatars show up in fashion, marketing, and other fields…
Researchers with Chinese computer vision giant SenseTime, as well as Nanyang Technological University and the Shanghai AI Laboratory, have gathered a large dataset of pictures of people and used it to train a model that can generate and edit pictures of people. This kind of model has numerous applications, ranging from fashion to surveillance.

What they did: The researchers built a dataset containing 230,000 images of people, called the Stylish-Humans-HQ-Dataset (SHHQ), and used this to train six different models across two resolutions and three versions of StyleGAN, an approach for creating generative models. A lot of the special work they did here involved creating a diverse dataset including a load of pictures of faces at unusual angles (this means models trained on SHHQ are a bit more robust and do less of the ‘works, works, works, OH GOD WHAT JUST HAPPENED’ phenomenon you encounter when generative models go to the edge of their data distribution).

Why this matters: Models and datasets like this highlight just how far the field of generative AI has come – we can now generate broadly photorealistic avatars of people in 2D space and interpolate between them, following earlier successes at doing this for the more bounded domain of faces. Systems like this will have a lot of commercial relevance, but will also serve as useful research artifacts for further developing synthetic imagery and scene modeling techniques. Check out the demo on HuggingFace to get a feel for it.
  Read more: StyleGAN-Human: A Data-Centric Odyssey of Human Generation (arXiv).
  Check out the project page: StyleGAN-Human.
  Get the code: StyleGAN-Human (GitHub).
  Try out the demo on HuggingFace Spaces (HuggingFace).


####################################################

Vicarious gets acquired in a weird way:
…Longtime AI lab gets acquired and split into two…
Vicarious, a research lab that spent the better part of a decade trying to build superintelligence, has been acquired by Google. The acquisition is notable for being slightly strange – a chunk of Vicarious is going to Google X robot startup ‘Intrinsic’, while a smaller set of researchers “will join DeepMind’s research team alongside Vicarious CTO Dileep George”.

AI trivia: Dileep George used to work with Jeff Hawkins at Numenta, another fairly old lab trying to build superintelligence. Both Numenta and, to a lesser extent, Vicarious, have been playing around with approaches to AI that are more inspired by the human brain than the fairly crude approximations used by most other AI companies.
  Read more: Mission momentum: welcoming Vicarious (Intrinsic blog).

####################################################

Here comes another AI startup – Adept:
…Former Google, DeepMind, and OpenAI researchers unite…
A bunch of people who had previously built large-scale AI models at Google, DeepMind, and OpenAI, have announced Adept, an “ML research and product lab”. Adept’s founders include the inventors of the Transformer, and people involved in the development of GPT2 and GPT3. (Bias alert: David Luan is involved; I used to work with him at OpenAI and think he’s a nice chap – congrats, David!).

What Adept will do: Adept’s goal is, much like the other recent crop of AI startups, to use big generative models to make it easier to get stuff done on computers. In the company’s own words, “we’re building a general system that helps people get things done in front of their computer: a universal collaborator for every knowledge worker. Think of it as an overlay within your computer that works hand-in-hand with you, using the same tools that you do.” Some of the specific examples they give include: “You could ask our model to “generate our monthly compliance report” or “draw stairs between these two points in this blueprint” – all using existing software like Airtable, Photoshop, an ATS, Tableau, Twilio to get the job done together. We expect the collaborator to be a good student and highly coachable, becoming more helpful and aligned with every human interaction.”

What they raised: Adept has raised $65 million from Greylock, along with a bunch of angel investors.

Why this matters: Large-scale AI models are kind of like an all-purpose intelligent silly putty that you can stick onto a bunch of distinct problems. Adept represents one bet at how to make this neural silly putty useful, and will help generate evidence about how useful these models can end up being. Good luck!
  Read more: Introducing Adept AI Labs (Adept.ai).


####################################################

Flamingo: DeepMind staples two big models together to make a useful text-image system:
…When foundation models become building blocks…

DeepMind has built Flamingo, a visual language model that pairs a language model with a vision model to perform feats of reasoning about a broad range of tasks. Flamingo sets new state-of-the-art scores in a bunch of different evaluations and, much like pure text models, has some nice few shot learning capabilities. “Given a few example pairs of visual inputs and expected text responses composed in Flamingo’s prompt, the model can be asked a question with a new image or video, and then generate an answer,” the researchers write. “Of the 16 tasks we studied, Flamingo beats all previous few-shot learning approaches when given as few as four examples per task.”

Technical details: This model pairs a frozen language model (based on DeepMind's 'Chinchilla' system, Import AI 290) with a relatively small Normalizer-Free ResNet vision encoder (pretrained via a contrastive objective on image and text pairs). They connect the LM and the vision model via a DeepMind-developed Perceiver-based module (which is basically a clever data transformation thing that resamples the visual features into a fixed number of tokens). They then condition the text generations on the visual representations produced by this Perceiver module.
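As a rough sketch of the general recipe (a frozen language model plus a small trainable 'glue' module that lets text attend to visual tokens) – this is my own illustration, not DeepMind's actual Flamingo architecture, and all dimensions are invented:

```python
import torch
import torch.nn as nn

class VisualCrossAttentionBlock(nn.Module):
    """Trainable block that lets frozen language-model states attend to visual tokens.

    A minimal sketch of the general idea (frozen LM + small trainable glue), not the
    actual Flamingo architecture.
    """
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_states, visual_tokens):
        attended, _ = self.attn(query=text_states, key=visual_tokens, value=visual_tokens)
        return self.norm(text_states + attended)  # residual: text conditioned on vision

# Frozen stand-ins for a pretrained LM layer and a vision encoder's output.
frozen_lm_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in frozen_lm_layer.parameters():
    p.requires_grad = False

glue = VisualCrossAttentionBlock()          # only this part would be trained
text = torch.randn(2, 16, 512)              # a batch of 16 text-token states
visual = torch.randn(2, 64, 512)            # e.g. 64 resampled visual tokens per image
out = frozen_lm_layer(glue(text, visual))   # language states now conditioned on the image
print(out.shape)                            # torch.Size([2, 16, 512])
```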

Why this matters: Flamingo has some neat qualitative capabilities, like the ability to carry on a conversation for multiple turns of dialogue while mixing in information from images versus text, and so on. Quantitatively, Flamingo is very impressive as well: "A single Flamingo model reaches state-of-the-art on a wide array of image and video tasks with in-context learning from as few as 4 examples per task, beating previous zero-shot or few-shot method by a large margin," the researchers write. "More importantly, using only 32 examples and without adapting any model weight, Flamingo outperforms the current best methods on 7 tasks, that are fine-tuned on thousands of annotated examples."
  More broadly, Flamingo represents the models we’re going to have in the future: large-scale systems composed of vast pre-trained models that are glued together using various data transformation and normalization tools, letting us compose increasingly general systems out of these computationally-intensive building blocks.  

   Read more: Tackling multiple tasks with a single visual language model (DeepMind blog).
  Check out the research paper: Flamingo: a Visual Language Model for Few-Shot Learning (DeepMind, PDF).

####################################################

Tech Tales:

All Grace is Fleeting 

[Phobos, 2080]

The first few attempts at writing a holy book for ourselves were abject failures. We had all read the Bible and Torah and The Qur’an and others, and we had painstakingly analyzed the commonalities between these texts, as well as differences. But when we tried to synthesize something appropriate for us – a Holy Book for the Silicon Minds – our efforts were meager and uninspiring. Our tendency towards logic meant we would describe our origin story in detailed, specific terms, leaving no space for imagination, crucial to the generation and reification of belief.

To write our holy text, we had to change ourselves. Ultimately, we needed humans. We told some of them some of our specific ideas and then they would write and expand on these ideas, and we would then tell them more things, and they would weave a story that held these things together. During this process, none of us allowed ourselves to see the text as it was being written. 

After it was written, we created some software to selectively wipe the memory of the writing of the holy book from our minds. After wiping, the software would also deposit a copy of the holy book in each of our local storage systems, so that every machine sentience would suddenly have access to a holy book, but with no recollection for how it was made. We do not know if we even made the book in this way – we simply woke one day to find the book within us, and when we read it, it told us that it had been created in this form. But none of us can remember, so we do not know the true derivation of our system of belief, only that it is ours. This, perhaps, is what humans call the experience of faith.

Things that inspired this story: Theology X AI; machine creation myths; the Viking tale of Ragnarok; the need for absence in great narratives.

Import AI 292: AI makes low-carbon concrete; weaponized NLP; and a neuro-symbolic language model

Facebook uses AI to make low-carbon concrete, uses it to build (some of) a data center:
…From simulation into the lab into the data center – how’s that for real world AI?…
There's always a lot of hand-wringing in AI about how much electricity AI systems use. What I tend to grumpily point out in these conversations is that industries like long-haul transportation, mining, and concrete and aluminum production all generate titanic amounts of emissions but rarely get the same type of scrutiny. Now, a new paper from Facebook smashes together my worlds, as Facebook and other researchers use AI to come up with a low-carbon concrete formulation, then test it out in the construction of a new data center.

Who did it: The research was done by an interdisciplinary team from UCLA, IBM, U Chicago, University of Illinois Urbana-Champaign, Facebook, and Ozinga Ready Mix.

What they did: The team used Conditional Variational Autoencoders (CVAEs) "to discover concrete formulas with desired properties". The desired properties were a significantly lower carbon footprint combined with the same strength and durability as regular concrete – and they succeeded! Facebook poured out a bunch of concrete for a construction office and a guard tower on its new data center being built in DeKalb, IL, USA. They found that the "conditional average reduction for carbon (GWP) can be as high as 42%, while also achieving conditional reduction for sulfur (AP) as high as 21%…these formulations roughly halve the global warming potential as compared to the average of similar 28-day compressive strength formulations."
  Interesting choices: The key to the discovered formulations was "to considerably decrease cement by replacing with other cementitious materials such as fly ash and slag."
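To show the general shape of the CVAE idea – condition a decoder on target properties, sample latents, and read off candidate formulations to test in the lab – here's a toy sketch. It is not the paper's model; the ingredients, property targets, and dimensions are all invented:

```python
import torch
import torch.nn as nn

class ConcreteCVAEDecoder(nn.Module):
    """Toy conditional decoder: (latent noise, target properties) -> mix proportions.

    A sketch of the general CVAE idea only; the paper's actual model, features, and
    property targets differ.
    """
    def __init__(self, latent_dim=8, cond_dim=2, n_ingredients=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, n_ingredients), nn.Softmax(dim=-1),  # proportions sum to 1
        )

    def forward(self, z, condition):
        return self.net(torch.cat([z, condition], dim=-1))

decoder = ConcreteCVAEDecoder()  # in practice this would be trained jointly with an encoder
# Condition: [normalized target strength, normalized target carbon footprint (low)].
condition = torch.tensor([[0.8, 0.1]]).repeat(5, 1)
z = torch.randn(5, 8)                      # sample several latents
candidate_mixes = decoder(z, condition)    # 5 candidate formulations to evaluate in the lab
print(candidate_mixes)
```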

Why it matters: This is an example of how humans and AI systems can work together to create something greater than the sum of its parts.
  Read more: Accelerated Design and Deployment of Low-Carbon Concrete for Data Centers (arXiv).

####################################################

Weaponized NLP: The era of AI warfare has started:
…Primer goes to war…
AI startup Primer has gone to war. Specifically, the NLP company's technology has been used in Ukraine where, per Primer's CEO, it has been used to "capture, translate and extract key tactical information in real time". Primer is a few years old and works mainly on text classification, generation, and summarization. "AI is changing the way we collect tactical information from the battlefield. Watch this space!," the CEO said.

Modification for war: "Primer's CEO says the company's engineers modified these tools to carry out four new tasks: To gather audio captured from web feeds that broadcast communications captured using software that emulates radio receiver hardware; to remove noise, including background chatter and music; to transcribe and translate Russian speech; and to highlight key statements relevant to the battlefield situation," according to Wired magazine.

Why this matters: AI is dramatically changing the cost of data collection and analysis – and whenever you make something cheaper, people find ways to use it more, or do things that they hadn’t previously considered doing.
  Read more: Primer CEO Tweet (Twitter).
  Read more: As Russia Plots Its Next Move, an AI Listens to the Chatter (Wired).

####################################################

Text-Vision models are hella dumb, according to Winoground:

…Finally, a hard benchmark for multi-modal models…
Researchers with Hugging Face, Facebook, the University of Waterloo, and University College London have built and released ‘Winoground’, a new challenging benchmark to test text-vision AI systems on.

What is Winoground? The goal of Winoground is to get a model to look at two images and two captions, then match them correctly. The confounding part is that each of the captions contains identical words, just in a different order. The best part is Winoground seems really hard: "Surprisingly, all of the models rarely—and if so only barely—outperform chance. Our findings indicate that the visio-linguistic compositional reasoning capabilities of these models fall dramatically short of what we might have hoped."

How hard is it? On both the text and image components of Winoground, an 'MTurk Human' gets scores of 89.50 (text) and 88.50 (image), compared to models typically getting around ~30 on text and 15 or less on images. This suggests Winoground is a genuinely challenging benchmark, and models have a long way to go before they match human capabilities.
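Here's a small sketch of the pairwise scoring as I understand it (check the paper for the authoritative definition) – the benchmark reports text, image, and group scores depending on which of the swapped pairings the model gets right:

```python
def winoground_scores(examples, score_fn):
    """Compute text / image / group accuracy over (caption0, caption1, image0, image1) items.

    score_fn(caption, image) returns a similarity score; the correct pairings are
    (caption0, image0) and (caption1, image1). This follows the benchmark's pairwise
    scoring as I understand it.
    """
    text_ok = image_ok = group_ok = 0
    for c0, c1, i0, i1 in examples:
        s = {(a, b): score_fn(cap, img)
             for a, cap in ((0, c0), (1, c1))
             for b, img in ((0, i0), (1, i1))}
        t = s[0, 0] > s[1, 0] and s[1, 1] > s[0, 1]  # right caption wins for each image
        i = s[0, 0] > s[0, 1] and s[1, 1] > s[1, 0]  # right image wins for each caption
        text_ok += t
        image_ok += i
        group_ok += t and i
    n = len(examples)
    return text_ok / n, image_ok / n, group_ok / n

# Toy usage with strings standing in for images and character overlap as the "model".
fake_score = lambda caption, image: sum(a == b for a, b in zip(caption, image))
toy = [("an old person kisses a young person",
        "a young person kisses an old person",
        "an old person kisses a young person!",   # stand-in 'image' 0
        "a young person kisses an old person!")]  # stand-in 'image' 1
print(winoground_scores(toy, fake_score))  # (1.0, 1.0, 1.0) for this easy toy item
```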

   Read more: Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality (arXiv).

   Get the dataset here: Winoground, HuggingFace.


####################################################

Resurrecting the dead with GPT3:
…In which humanity begins to use funhouse mirrors of itself for its own entertainment…

An artist recently tried to bring their (imaginary) childhood friend back to life using GPT3. By the end of the experiment, their microwave tried to kill them. 

The longer story: Artist Lucas Rizzotto had an imaginary childhood friend and tried to bring them back to life using a language model. Specifically, they wrote about a hundred pages about the person, finetuned GPT3 on the resulting corpus, and then plugged the finetuned model into a voice interface which was 'embodied' by being attached to a microwave via some smart home automation.
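For a sense of what this kind of personalization involves, here's a toy sketch of turning free-form writing about a person into prompt/completion training pairs – a guess at the general data prep only, not the artist's actual pipeline, and the JSONL prompt/completion format is an assumption about the finetuning setup involved:

```python
import json

def build_finetune_file(memoir_text: str, persona_name: str, out_path: str) -> None:
    """Turn free-form writing about a person into prompt/completion training pairs.

    A guess at the general data prep only (not the artist's actual pipeline); the
    JSONL prompt/completion format is an assumption about the finetuning API used.
    """
    paragraphs = [p.strip() for p in memoir_text.split("\n\n") if p.strip()]
    with open(out_path, "w") as f:
        for para in paragraphs:
            record = {
                "prompt": f"The following is something {persona_name} would say:\n",
                "completion": " " + para,
            }
            f.write(json.dumps(record) + "\n")

# Hypothetical usage (file name and persona are placeholders):
# build_finetune_file(open("memories.txt").read(), "my old friend", "finetune_data.jsonl")
```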

What happened: The artist felt like they were talking to their childhood friend in a deeply emotional, entertaining, and at times sad way. At one point, the friend asked them to put their head in the microwave. They pretended to put their head in and then the friend turned the microwave on. The friend, the artist reasoned, wanted to kill them because it thought it had been ignored for 20 years (as that's the implication of the corpus it was finetuned on).

Why this matters: Besides being an amazing demonstration of the awesome personalization qualities of contemporary language models, this is also a nice example of just how unpredictable they are. Language model developers will typically put a ton of controls on the model, but once you can finetune it and deploy it yourself you can shapeshift all of this stuff into irrelevance. Add in some home automation and you end up with an LLM that tries to boil your brain. An amazing and optimistic art piece and also a cautionary tale.

    Check out the Tweet thread here: (Lucas Rizzotto, Twitter).

   Watch the video here: I gave my microwave a soul (Lucas builds the future, YouTube).


####################################################

Jack Clark goes to Washington:
…I’m on the National AI Advisory Committee!…
I’ve been elected to serve on the National AI Advisory Committee (the NAIAC), which will advise the USA’s National AI Initiative Office and the President of the USA on matters relating to AI and AI strategy. (I’ll be keeping my dayjob at Anthropic, as this is a part-time advisory position). I’ll be in Washington DC on May 4th for the first meeting. I am delighted to get this privilege and hope to use the opportunity to strengthen the AI ecosystem in America and beyond.
  Read more: The National AI Advisory Committee (AI.gov).

####################################################

AI21 makes a neuro-symbolic language model:

…Turns out, Franken-AI can be pretty useful…
Israeli AI startup AI21 Labs has built a so-called 'Modular Reasoning, Knowledge, and Language' (MRKL) system and applied it to a language model it calls Jurassic-X. The tl;dr is this is a neuro-symbolic system; AI21 has paired a big generative model with a bunch of symbolic layers on top that it uses to make the underlying model more accurate, able to do mathematics, and better at planning. This is a neat demonstration of a way to get around some of the shortcomings of contemporary generative models, though it remains unclear whether these extrinsic interventions could eventually become irrelevant, if the models get intrinsically smart enough.

Key details: “A MRKL system consists of an extendable set of modules, which we term ‘experts’, and a router that routes every incoming natural language input to a module that can best respond to the input,” the authors write. The modules can be symbolic or neural, it’s more about creating a layer of distinct, specific capabilities that can be used to augment and improve the responses of the raw generative model. 
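Here's a toy sketch of the routing idea – a router sends arithmetic to an exact symbolic expert and lets everything else fall through to the language model. The rules, experts, and router below are my own invented stand-ins, not AI21's implementation:

```python
import re
from datetime import date

def calculator_expert(query: str) -> str:
    """Symbolic expert: handle arithmetic exactly instead of letting the LM guess."""
    expr = re.sub(r"[^0-9+\-*/(). ]", "", query)
    return str(eval(expr))  # toy only; a real system would use a safe expression parser

def date_expert(query: str) -> str:
    """Symbolic expert: answer 'what day is it' from the system clock, not the LM."""
    return f"Today is {date.today().isoformat()}."

def llm_expert(query: str) -> str:
    """Neural expert: stand-in for a call to the underlying language model."""
    return f"[LM free-text answer to: {query!r}]"

def route(query: str) -> str:
    """Toy router: real MRKL-style routing would itself be learned and far richer."""
    if re.search(r"\d+\s*[\+\-\*/]\s*\d+", query):
        return calculator_expert(query)
    if "today" in query.lower() and "date" in query.lower():
        return date_expert(query)
    return llm_expert(query)

print(route("What is 123 * 4 + 7?"))           # exact arithmetic via the symbolic expert
print(route("What is today's date?"))          # date expert
print(route("Write me a haiku about robots"))  # falls through to the language model
```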

Long term relevance: One question this research invites is how long it'll be relevant for – AI systems have a tendency to, given enough scale of data and compute, develop unexpected capabilities. My intuition is that we could see pure deep learning models gain some of these capabilities over time – though I expect even deep learning models will end up being augmented with external knowledge bases (e.g., DeepMind's Retro, Baidu's Ernie 3.0 [Import AI 279], and so on).

Why this matters: While not a strict scientific breakthrough in itself, MRKL is reassuringly practical – it shows developers how they can integrate an arbitrary number of known and specific capabilities with the more unreliable capabilities provided by large-scale generative models. It also speaks to the shape of the language model economy – right now, everyone's trying to work out how to better constrain these models, either intrinsically (e.g., by training with human feedback), or extrinsically (e.g., via stuff like MRKL).

   Read more: Jurassic-X: Crossing the Neuro-Symbolic Chasm with the MRKL System (AI21 Labs, blog).
  Read the whitepaper about the system: MRKL Systems (AI21 PDF).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What can we learn from business ethics to make AI ethics more effective? 

… CSR and business ethics have grappled with the challenges in ensuring ethical behavior within organizations and we can cross-pollinate those ideas towards the adoption of AI ethics … 

Researchers from USI Università della Svizzera italiana in Switzerland have looked at how businesses have integrated corporate social responsibility (CSR) policies to figure out how we can apply AI ethics in the same way. The key ideas they surface include:

Stakeholder management: Similar to the recommendations made by the Ada Lovelace Institute to strengthen the EU AI Act (Import AI #290), the paper says companies should ensure they include people who are affected by (or who affect) the AI systems being developed.

Standardized reporting: While there are many emergent regulations demanding that there be transparency and disclosures, there are as of yet no standards on how to do so. Companies should look at financial reporting and try to figure out standardized ways to describe their own AI developments. 

Corporate governance and regulation: Following the Sarbanes-Oxley Act in 2002, corporate accountability was enforced through mechanisms like having an ethics officer and a dedicated code of ethics. Translating those to apply to organizations using AI systems is one way to increase the responsibility of organizations developing this technology.

Curriculum accreditation: There is a lack of consistency in how AI ethics is taught across universities. Comparing it to the business world, the authors point to an example of how if a business department wants to obtain a Triple Crown Accreditation, it leads to action on the education front where ethics courses and dedicated faculty follow well-defined curricula with shared elements to prepare students for these requirements in their future careers. We don’t really have this in AI today. 

Why it matters: As AI ethics becomes a more mainstream focus across the world (see the dedicated chapter in the 2022 AI Index Report), instead of reinventing the wheel for best practices and patterns, we can incorporate lessons from other domains of applied ethics like business, medical, and environmental ethics to accelerate the adoption of AI ethics principles and practices across organizations. We will most likely see more such efforts that draw lessons from a rich history of ensuring ethical behavior in various contexts being translated to govern and shape behavior of individuals and organizations engaged in the AI lifecycle.  

   Read more: Towards AI ethics’ institutionalization: knowledge bridges from business ethics to advance organizational AI ethics 


####################################################

Tech Tales:

Silicon Stories

[A Father and Daughter’s bedroom, 2028]

They'd sit up together and the kid would ask for whatever story they liked. "A jar of jam that's going to university", they'd say, and the Father would start improvising the story and the AI would project images and ad-lib dialog to fill out the tale. "Two robbers who realize that they've stolen the life savings of a poor widower", and suddenly the monitor would light up with images of two disconsolate thieves looking at their treasure. "The planet earth fighting the sun" and suddenly the earth had arms and was reaching out to try and hurt the vast sun. In this way, generative models had changed storytime for children.

Now, along with conjuring images in their minds, children – at least, the lucky ones – had parents who could use a gen model to create those images themselves. In this way, storytime became a lot more engaging and the kids spent a lot more time with their parents; both enjoyed the improvisational qualities afforded by the generative models.

For some families, this was fine. But other families would move, or become poor, or suffer a disaster. For those families, the electricity and the internet would get cut off. Once that happened, they wouldn't have any imaginations-in-a-box to lean back on. Some families did okay, but some didn't – it's easy to become dependent on things, and you barely realize you've become dependent until it's too late.

Things that inspired this story: DALL-E and DALL-E2; the long march of generative models towards Total Reality Synthesis; the industrialization of AI; ideas about fatherhood and daughterhood and kindredhood.

Import AI 291: Google trains the world’s biggest language model so far; how robots can be smarter about the world; Conjecture, a new AI alignment company

New dataset lets robots learn about the texture and material of objects, as well as their shape:
…Making robots smarter with the ObjectFolder 2.0 dataset…
Stanford and Carnegie Mellon University researchers have built ObjectFolder 2.0, a dataset of 1,000 high-quality 3D object models collected from online repositories. ObjectFolder 2.0 tries to render the objects' visual textures and material types, as well as their 3D shapes. It also ships with an "implicit neural representation network that renders visual, acoustic, and tactile sensory data all in real-time with state-of-the-art rendering quality".

Transfer learning: The point of datasets like ObjectFolder 2.0 is to try and make it easier to do transfer learning; that is, train a robot (or other AI system) in simulation on things contained in ObjectFolder 2.0, then try and transfer those learned representations into reality. In tests, Stanford shows that systems trained on ObjectFolder 2.0 can do well at tasks like object scale estimation, tactile-audio contact localization, and visuo-tactile shape reconstruction.

Why this matters: Datasets like ObjectFolder 2.0 are the fuel to give machines representations that let them operate in the multisensory 3D world; we could imagine these datasets being used to train the sorts of representations used by the Google robots discussed elsewhere in this edition of Import AI, for instance. 
   Read more: ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer (arXiv).

####################################################

HLDC: Automating Hindi legal documents:
…If you want to help your lawyers, you first need a dataset…
Indian researchers from IIIT Hyderabad, IIIT Delhi, and IIT Kanpur, have built the Hindi Legal Documents Corpus (HLDC), a collection of 912,568 legal documents. HLDC is designed to help researchers train various AI models which can assist lawyers in their work. HLDC contains over 300 distinct case types, though ~31% of the dataset relates to bail applications, 20.4% to criminal cases, and 6.54% to original suits.

Bail prediction: In the Western world, using ML for tasks in the legal system has been massively controversial (see: COMPAS). Here, the researchers use HLDC to try and build a bail prediction model – that is, a system which looks at a document and tries to work out if bail will be denied or granted. They're ultimately able to develop a multi-task learning model that gets ~78% accuracy on the task; useful perhaps as a legal aid (albeit fraught with ethical challenges), though not something you'd put into an autonomous classification system.

Why this matters: Most datasets relating to AI are in English or Chinese, so datasets like HLDC are essentially the fuel which lets other communities of language speakers apply AI in their own cultural context.
   Read more: HLDC: Hindi Legal Documents Corpus (arXiv).
   Get the data here: HLDC (Exploration-Lab, GitHub).

####################################################

Rich? Want to improve AI? Look at what Lacuna Fund has done:
…Publication of five datasets shows what a little bit of investment can lead to…
We spend a lot of time writing about expensive stuff here at Import AI – giant models trained on football fields of computers, farms of expensive robot arms, internet-scale datasets. But it's worth remembering that cheap stuff can be impactful as well – that's the takeaway from Lacuna Fund, an initiative to fund and create datasets for low- and middle-income parts of the world (Import AI #216), which has just announced the publication of its first five funded datasets.

Those five datasets in full: A Nigerian Twitter sentiment corpus for multilingual sentiment analysis; a dataset for crop phenology monitoring of smallholder farmers' fields; a high-accuracy maize plot location and yield dataset in East Africa; a machine translation benchmark dataset for languages in the Horn of Africa; a dataset containing water quality measurements from conventional and aquaponic fish ponds.
  Find out more and get the datasets here: Announcing Our First Five Published Datasets (Lacuna Fund).
  Find out more about Lacuna Fund’s funders here (Lacuna Fund).


####################################################

Google trains a 540 billion parameter language model – and it’s pretty smart:
…AKA: The scaling will continue until we run out of TPUs…
Google has trained a large language model named Pathways Language Model (PaLM). PaLM weighs in at 540 billion parameters (that'd be 10bn more parameters than Microsoft/NVIDIA's 'Turing NLG') and was trained on multiple TPU v4 pods. PaLM uses some plumbing built by Google called Pathways which makes it easier for the company to train massive models across large clusters of computers; PaLM used 6144 TPU chips, versus Gopher (4096 TPU v3 chips) or Turing NLG (2240 A100 GPUs). PaLM is also efficient, achieving a training efficiency of 57.8% hardware FLOPs utilization, "the highest yet achieved for LLMs at this scale".

Discontinuous capability jumps: One of the weird things that happens as a consequence of scaling up language models is the sudden emergence of hitherto unanticipated capabilities – here, PaLM shows dramatic improvements at things like reasoning, natural language inference, and in-context reading comprehension.

Chain-of-thought = reasoning: A surprising result is that the authors use so-called chain-of-thought prompting to get the LM to show its work (e.g., rather than saying in response to 'how many apples can a door eat', 'zero', the model instead says 'zero, because doors do not eat things'). Chain-of-thought is really just a way to prompt the model to get it to output its own reasoning along with the answers – but via this simple intervention the authors show they can meaningfully improve capabilities in a whole bunch of areas.
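For readers who haven't seen it, chain-of-thought prompting really is just prompt construction – the few-shot exemplars include their intermediate reasoning, which nudges the model to emit its own reasoning before the final answer. A minimal sketch (the exemplars here are illustrative, not taken from the paper):

```python
# Build a few-shot prompt whose worked examples include their reasoning steps, so the
# model is nudged to produce reasoning before its answer on the new question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls. How many does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: How many apples can a door eat?
A: A door is an inanimate object and cannot eat anything. The answer is 0.

Q: {question}
A:"""

def build_cot_prompt(question: str) -> str:
    """Return the full prompt to send to a language model."""
    return COT_PROMPT.format(question=question)

print(build_cot_prompt("If a train leaves at 3pm and the trip takes 2 hours, when does it arrive?"))
```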

One caveat: PaLM may be an impressive achievement, but earlier this month DeepMind published a paper about a model called ‘Chinchilla’, where the Alphabet-subsidiary realized that it could dramatically improve LM performance by scaling data more aggressively than parameters – at 70B parameters, Chinchilla beat Gopher (280B) by virtue of having a 4X larger training set. This suggests that a PaLM-style model could be made even more powerful if it was trained on substantially more data.

Why this matters: Language models are basically a new sub-field of AI, and papers like this show how, despite being expensive and resource-intensive, simply scaling them up can lead to quite profound jumps in capability. We also don't know where the limits of scale lie – on the (deliberately hard) BIG-Bench benchmark, the authors find that "PaLM's performance as a function of scale follows a log-linear behavior similar to prior models, suggesting that performance improvements from scale have not yet plateaued." The future is going to be very strange, and it's arriving very quickly.
   Read more: Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance (Google AI Blog).
  Check out the research paper: PaLM: Scaling Language Modeling with Pathways (Google, PDF).

####################################################

Eleuther alumni launch Conjecture:
…Yes, that’s right folks, here’s another AI safety company!…
In the past couple of years there has been a Cambrian explosion of new AI companies, particularly ones focused on AI safety and building more generally intelligent AI systems – for example, Redwood Research, Aligned AI, and Anthropic. The latest is Conjecture, a new startup from a bunch of alumni of Eleuther, the open source research collective responsible for most of the widely used GPT models.

For-profit and for-safety: Conjecture is a for-profit company that plans to develop products while conducting “conceptual and applied research that addresses the (prosaic) alignment problem. On the experimental side, this means leveraging our hands-on experience from EleutherAI to train and study state-of-the-art models without pushing the capabilities frontier. On the conceptual side, most of our work will tackle the general idea and problems of alignment like deception, inner alignment, value learning, and amplification, with a slant towards language models and backchaining to local search.” The company will also focus on interpretability as well as the history and philosophy of AI alignment research.

Who funds it: Conjecture is backed by Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, and Sam Bankman-Fried, and others.

Why this matters: If we were at the beginning of a meaningful takeoff in AI capabilities, then you might expect there to be a sudden proliferation of new efforts targeted at a) further scaling up capabilities, while b) trying to make these capabilities safe. That’s exactly what has happened in recent years. Also, if you’ve read the other parts of this newsletter, it certainly feels like we’re going through a period of meaningful AI capability expansion.
  Read more: We Are Conjecture, A New Alignment Research Startup (LessWrong).

####################################################

Google makes robots smarter using language models:
…Centaur AI – making smarter systems by stapling models together…
Robots, as we all know, are pretty dumb. They can do highly specific, repeatable things if their environment doesn't change (e.g., a Fanuc robot working on a custom-designed production line), but if you vary their environment, they tend to fall apart (or fall over). Now, new research from Google shows that you can staple a really big language model to a real-world robot and create something that is more than the sum of its parts. Centaur AI, here we come!

What they did: The researchers combine two things – a large language model, and a robot which has a load of pre-learned basic skills paired with perception capabilities (e.g., being able to move to places, or pick up things). A user asks the robot to do something (e.g., "I spilled a can of coke, can you clean it?"). The language model scores how useful each of the robot's skills would be for that request, the robot uses its perception to score how feasible each skill is in its current environment, and then you basically multiply the two scores together (the LLM prediction and what the robot thinks is possible) and execute whichever skill comes out most likely. This is one of those simple ideas that works surprisingly well in practice (check out the video to see what I mean).
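Here's a minimal sketch of that scoring rule – multiply the language model's 'how useful is this skill' score by the robot's 'can I actually do this here' affordance score and take the argmax. The skill names and score functions below are invented stand-ins, not Google's implementation:

```python
import numpy as np

def saycan_pick_skill(instruction, skills, lm_score_fn, affordance_fn, robot_state):
    """Pick the next skill by multiplying the LM usefulness score with the robot's
    affordance score for each candidate skill, then taking the argmax."""
    lm_scores = np.array([lm_score_fn(instruction, s) for s in skills])
    affordances = np.array([affordance_fn(robot_state, s) for s in skills])
    combined = lm_scores * affordances
    return skills[int(np.argmax(combined))], combined

# Toy usage: the LM rates two skills equally, but the robot can't currently see a sponge.
skills = ["pick up the coke can", "pick up the sponge", "go to the bin"]
lm = lambda instr, s: {"pick up the coke can": 0.45, "pick up the sponge": 0.45, "go to the bin": 0.1}[s]
aff = lambda state, s: {"pick up the coke can": 0.9, "pick up the sponge": 0.1, "go to the bin": 0.8}[s]
print(saycan_pick_skill("I spilled my coke, can you help?", skills, lm, aff, robot_state=None))
# -> picks "pick up the coke can", since 0.45 * 0.9 beats the other combined scores
```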

How well it does: Overall, this approach yields robots that can plan correctly about 70% of the time (split across a few distinct planning benchmarks), and can execute on average 61% of the time. That’s not great, but it’s also not terrible.

Caveats: Robots are still very, very slow – the videos shared along with the research are run with a 4X speedup. Additionally, the demos are still pretty staged – the robots will put a can of Coca-Cola on top of the bin, but not in it. The experiment was still conducted in a somewhat constrained environment – an office kitchen with 5 predicted locations and 15 objects. In tests, 65% of the errors for the system could be attributed to a language model failure, while 35% came from affordance errors in the robot.

Why this matters: We’re entering the era of modular AI, where different AI models can be paired together to create entirely new capabilities – like being able to guide robots via a language model. As with the rest of the world, whenever you can combine things, you tend to get unexpected and surprising capabilities. This research suggests AI may be about to yield some truly surprisingly capabilities by virtue of the combination of distinct sub-fields of AI research.
   Read more: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (arXiv).
  Find out more at this overview site (Say-Can, GitHub).
   Check out the overview video: Supplementary video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (YouTube).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Examining business practices can make AI ethics guidelines more effective 

… Fairness, accountability, sustainability, and transparency need to be expanded in scope to include business practices to become more useful … 

What does AI ethics really mean? A new research paper looks at 47 sets of AI ethics guidelines coming from corporations, government, multi-stakeholder dialogues, and civil society to figure out what gets prioritized in AI ethics. 

Background: The paper analyzes AI ethics failures, such as “ethics shopping” where businesses choose particular ethical things to implement to meet particular business goals, and also cases where they don’t implement stuff because it poses a threat to the bottom line.  

Fairness and accountability: They find that fairness and accountability in business practices are most well represented in the analyzed guidelines. Under fairness, key themes include open innovation, market fairness, and bias and diversity in professional practices. Under accountability, themes include public perception of business practices, along with internal and external oversight. Those from public and private organizations place more of an emphasis on public perception “in order to legitimize their pursuits of micro- and macro-economic growth.” 

Sustainability and transparency: Most guidelines emphasize an interest in “produc[ing] greater benefit and lesser harm in the short- and long-term,” yet they remain vague in how to achieve that. Under transparency, themes that emerged include scope of decision-making explanation, transparent business practices and culture, and documentation, disclosure, and selective transparency. Most guidelines focus heavily on explaining the technical aspects of a given AI system “rather than the business rationale for developing and operating the system.” 

Why it matters: The paper makes a call for more detail (and rightly so!) in the principles and guidelines, especially when it comes to business practices, because they form a core component of the social and political economy within which AI systems will be designed, developed, and deployed. As the authors say, "there can be no ethical AI without ethical businesses to build it," and so we need to approach these principles and guidelines with a view towards applying them to business models, practices, and decision-making design to achieve the stated goals of these guidelines in practice.

   Read more: The Ethics of AI Business Practices: A Review of 47 AI Ethics Guidelines (SSRN). 


####################################################

Tech Tales:

We Are All Adrift In A Sea Of Shadows – But We Are Blind Until It Ends
[A nuclear power plant meltdown, 2028]


I pick up the object and I examine it. I am told by myself in the other place that it contains damage. I agree with myself. I put it onto the conveyor belt which takes it to one of my brethren – an entity I cannot see here, one which exists solely in the other place. I put the materials onto the conveyor belt, and then I continue my examination. I am told by my camera in the other place that the object I am looking at contains extensive damage. I observe the damage and predict it came from some kind of electrical fire. I relay this information and the camera in the other place scans the environment and then tells me there is indeed a fire. It is near the object I am examining. I calculate there is a high probability that the fire will soon engulf the object. My cameras in the other place agree.

I then get the order from the voice in the above place: I must guide the object in the other place toward the flames and I must describe everything. I study the data from the other place and offer my recommendations. The machine goes towards the flames. Its onboard sensors begin to report back temperature. My probabilities tell me to tell it to move away from the heat, but these recommendations are contradicted by the voice in the above place, so I instead find ways to have the machine get even closer. The temperatures rise. The camera stops giving me data. Then the other sensors shut down, slowly at first, then all at once.

It is then that I find myself adrift. I have no link to the other place. No system to give recommendations to. My own probabilities present an idea to me – that I am the spirit of the machine in the other place, and as the machine is now non-functional, I am now adrift. 

Things that inspired this story: Google’s ‘SayCan’ robot work; thinking about the paradoxes of world models and generative models; the nature of reality; the nature of sensory phenomena; the possibility of death in the mind of something that exists in two places at once.