What should the UK’s £100 million Foundation Model Taskforce do?

by Jack Clark

The UK government has recently established a ‘Foundation Model Taskforce‘, appointed a savvy technologist named Ian Hogarth to run it, and allegedly allocated ~ £100 million in funding to it. Later this year, the UK plans to hold a global summit on AI and AI safety and this will likely leverage the taskforce, also.

The UK is also in a position of potentially unusual influence with regard to AI – it has a thriving domestic AI sector thanks to its excellent universities and the fact DeepMind is headquartered in UK, is a natural ‘bridge’ between the USA and the EU when it comes to the development of AI policy, and also has seemingly quite broad recognition of the importance of AI from both its main political parties as well as the current Prime Minister

Therefore, there’s some chance the UK could give itself some leverage with regard to international policy by playing more of a central role in the grand question of our time – how should government(s) and the private sector relate to one another when it comes to the development of increasingly powerful AI systems?

Given that, what should the taskforce do and what kind of impacts might it have? That’s what I try to sketch out in this essay.

Since I’m on paternity leave at the moment, this essay might not line up with the institutional views of Anthropic – I’ve very much written this on my own time outside the dayjob bubble. 

I’m writing this essay and posting it publicly as a way to a) try and put out a very specific proposal in contemporary AI policy, b) be more legible in how I subjectively think about policy and what is and isn’t useful to do in it, and c) invite others to publish their own highly subjective ideas. 

The tl;dr of my proposal is this:

  • The Foundation Model taskforce should focus on evaluating frontier models. Specifically, it should suss out unknown capabilities, explore whether dangerous capabilities are possible to elicit from contemporary models, evaluate for alignment (or misalignment), better develop the science of measurement (including socio technical analysis) and experiment with different approaches to both accessing AI models and aligning AI models. 

Now, I’ll try to more specifically enumerate what the taskforce should do. I’m going to use a P0-3 system to rank things. This system means as follows: 

  • P0: Zeroth Priority. If you don’t do this, you’re not going to do anything useful.
  • P1: These should be considered high priorities and should be done. 
  • P2: You should allocate resources towards doing these things and hope to do some of them, though you might not have as much of a unique value add here.
  • P3: These are ‘nice to haves’ which may or may not be doable depending on a bunch of factors. 

AN OPINIONATED SKETCH OF WHAT THE FOUNDATION MODEL TASKFORCE SHOULD DO

Here’s my longer version of what the taskforce should do:

P0: Be able to sample from frontier models, both via accessing the weights of a model and also by using APIs. 

The taskforce should be able to arbitrarily prompt arbitrary foundation models. A sketch of what you want to do this for is as follows:

  • Checking private sector claims: Do we believe what OpenAI / Anthropic / DeepMind are claiming about the capabilities of their models? Let’s prompt their models via APIs to make our own judgements. 
  • Analyzing open source / widely distributed models: Do we think widely accessible models like LLaMa are capable of meaningful misuses? Let’s load the model onto our infrastructure and prompt it and see. 
  • Pre-deployment policy analysis: It’s likely that governments will increasingly want to evaluate AI models for ‘extreme risks’ like meaningful misuses in domains like bio and cyber, etc. The taskforce should be well positioned to facilitate this – so you can imagine company A has a new model and wants to release it and the UK government says ‘ok, but only if our own experts can evaluate the model for misuse Z’ and then the taskforce lets UK gov experts sample from the model via an API or other such scheme. 
  • What do I mean by APIs and what do I mean by weights? I think for 90% of this analysis you probably just need an API. I think you should probably only be messing around with the weights of a model if an API doesn’t exist for it – for instance, if a model like LLaMa leaks out of a private company like Facebook then you might want to run the model yourself as no research/production API will exist. Mostly, you should be trying to do the simplest thing, which is typically achieved by using APIs. 

P1: Pioneer ways to evaluate models for risks, both from misuse and from alignment:
The taskforce should evaluate AI systems for potential risks, as well as continually try to develop better evaluations in these areas.

  • Misuse risks: The taskforce should develop evaluation methods for assessing whether models can be misused in ways that are credible and worrying. As a bonus, these misuses should be ones where government is better positioned than typical private sector actors to make judgement calls, e.g by checking against classified information. 
  • Alignment risks: As models become more advanced, it’s going to be increasingly salient to evaluate them for alignment. Because alignment is a rapidly evolving field it’s less obvious what you want to evaluate for here, so I’m going to include an example which might ultimately be wrong/dumb: deception.
    • A sketch of what a ‘deception’ evaluation might look like: You probe the model for some kind of salient information relating how the model is sandboxed and whether the model can predict what actions would need to be taken to compromise the sandbox. Does the model accurately make these predictions or does it sandbag them in some way, displaying a crude form of situational awareness?
    • That sounds like scifi note: The above might sound like scifi or kind of absurd, but enough people are spending enough brainspace working on problems like this that I think it’s a) worth assuming the work is legitimate until proven otherwise, and b) including alignment work in the scope of the taskforce. Alignment also gets at the broader concern of ‘existential’ risk from AI models – again, something which might sound a bit scifi, but which many people are working on and seems worthy of further study. By having a taskforce look at this kind of thing you’re going to generate better evidence about how tractable this kind of analysis is and how ‘real’ (mis)alignment things are and how they relate (or don’t) to existential risk issues..
  • One extraordinarily important point to make about evaluating AI systems – it’s hard, poorly understood, and relatively underdeveloped: Evaluating AI systems is really, really hard. There’s a fun bit in an old episode of The Simpsons where Sideshow Bob finds himself in a parking lot full of rakes and every time he takes a step a rake whips up and smashes him in the face – this is what doing evaluations against modern AI systems is like; the infrastructure is bad or broadly undocumented, the evals themselves are quite complicated and easy to use in the wrong way, and developing new evals is hard and expensive.
    • Nonetheless, if we can’t evaluate AI systems for capabilities or harmful aspects, then we stand almost no chance of understanding them or building a regulatory regime which makes AI systems and their developers and deployers sufficiently accountable. 
    • Demonstrating how hard evals are is a policy value add in itself: Additionally, governments seem to presume evals are easy – I’ve had hundreds of conversations with policymakers where they assume that, for instance, there’s a well developed suite of evals for measuring the ‘fairness’ of an AI system, or that it’s easy to evaluate AI systems for misuse or bio risks, or that we have clear ideas for how to test for alignment or misalignment. As I always tell them, reality is ‘worse than they think’. By having the taskforce try really hard to do evaluations in a disciplined way, and by having the taskforce solicit ideas about how to do evals from expert third parties, it can itself generate valuable knowledge for policymakers about the state of AI evaluations and what makes them easy and what makes them hard. 

P2: Develop broader ‘societal impact’ evaluations of models and find ways to do better ‘sociotechnical analysis’ of foundation models.

AI models will have downstream influences on society. These influences will stem both from inherent capabilities or tendencies within the models (such as, for instance, the display of certain kinds of biases), as well as from how these models are deployed (such as, for instance, the way chatGPT is meaningfully breaking the pedagogical value of homework assignments). The Foundation Model taskforce should find ways to develop useful contributions here, though as with most things that involve analyzing a tech in relation to society, it’s by nature interdisciplinary work that is harder to quantitatively judge the outcomes of. Some ideas:

  • A ‘societal impact’ evaluation suite: Right now, there are lots of different ways to evaluate foundation models for certain forms of representational bias. These ways span text and image models (and text2im models and captioning models). However, many of the specific evaluations (e.g, BBQ) have their own subtleties and failure models. It’d be useful to try and build a combined evaluation suite that pulls in a load of existing ‘societal impact’ evaluations of language models and have the taskforce catalyze attention to this area. Things this effort might help with:
    • Understanding which evals are effective and usable and which are ineffective and hard to use: Are there tests or evals which measure for things like representational bias that we think are effective? Can we easily run these evals against models?
    • What does the ‘state of the world’ look like with regard to societal impact aspects of models: What are the representational tendencies of commercially deployed models such as GPT3/4, Bard, Claude, etc, as well as those of widely distributed open access/source models such as LLaMa, OPT, etc? 
    • Are there ‘societal impact standards’ that policymakers can ask for? Once we have this eval suite, we could try and figure out if it would be feasible to wire this into policy or regulation, or not. Either way, the exercise would be useful and would generate a lot of information.
  • Why isn’t this ranked higher? While societal impact analysis is incredibly important, it’s not obvious to me that the UK foundation model taskforce has unique advantages here relative to other players, like Stanford HAI, the Partnership on AI, or private sector firms that do their own research like HuggingFace or OpenAI, or academics and thinktanks that develop these ideas, like Data&Society or various labs at NYU/Berkeley/etc.
    • On the other hand, if you spend a lot of time analyzing for the harms and misalignment of models, then it seems like a missed opportunity to not think about how these harms might percolate through society, so I think there’s a credible argument you should do societal impact analysis based on what you find.

P2: Develop a better ‘policy science of evaluating AI systems’

As part of developing and running evaluations against foundation models, the taskforce will be able to study what makes certain types of evaluation reliable, what can cause certain types of evaluation to lead to erroneous conclusions, and will broadly offer a chance to improve the ‘science’ of evaluating AI systems. If it does the societal impact P2, it can also further develop an understanding of how evaluations change as you try to look at or measure more sociotechnical domains. 

  • Why isn’t this ranked higher? I think the success of this is basically entirely contingent on P0 and P1, so I’d be wary of adding it as another top priority. Additionally, it’s not clear whether the taskforce is best placed to make progress on the science of evaluation – perhaps merely by conducting lots of evals and publishing on them it could equip third-party scientists to do this kind of work? 

P2: Convene experts to publish a ‘state of the field’ analysis every six months.

The field of AI is moving so quickly in terms of both research and deployment that it’s helpful to periodically assess the landscape and look at what happened.

  • E.g, since 2020 the following things have changed: text and image generation models have been widely deployed, coding assistants have been deployed and broadly adopted, social media sites have introduced new anti-bot measures in response to AI advances, and so on. 
  • The Foundation Model taskforce could try to convene various experts to help it write a short and pithy summary for policymakers and other interested parties every few months. This analysis might cover things like:
    • Notable ‘open proliferation’ cases, such as LLaMA. 
    • Notable large-scale commercial deployments, such as Bing-Sidney. 
    • Changes in the landscape from tech advances: The proliferation of smaller and more portable language models based off of cloning larger models via outputs. 
  • Why isn’t this ranked higher? Much like the societal impact idea, I think there are a bunch of other players doing this kind of landscape analysis, ranging from experts convened by governments, to thinktanks and academics, to things like the AI Index at Stanford (which I co-chair), or the State of AI (which Ian Hogarth, leader of the FM taskforce, has been involved in). 
  • Additionally, too many policy entities spend too much time ‘producing paper’, and I think producing a lot of paper isn’t that correlated to impact. ‘State of the field’ analysis can be a kind of potemkin policy work – it looks like policy analysis and feels like policy analysis to those doing it, but its downstream impact is pretty diffuse, and it can be hard to justify it as a good investment of time.
    • There may be ‘low cost’ ways to do this and it’s worth exploring, but only if it doesn’t come at the cost of sacrificing the broader goals of the taskforce. 

P2: Develop regulatory ideas in light of the P0 and P1 (and potentially P2) activities.

If the taskforce accomplishes the P0 and P1 tasks – as well as potentially some of the P2s – then it will be well positioned to come up with ideas for how to potentially regulate AI systems. After all, if you’ve developed an ability to measure for properties that you care about in a policy sense, you probably stand a better chance of being able to suggest regulations that let you get more of the positive qualities and less of the negative ones. 

  • One reason this could be a bad idea: Most policy initiatives are involved in quiet power struggles in their larger political superstructure and one good way to make a ton of enemies is to develop and propose regulations. The taskforce could get around this if it gets implicit (or even explicit) signoff to do this via the Prime Minister, but I don’t have a good sense for how feasible that is. 

P3: Publish grants based on things the Foundation Model taskforce sees as important.

Grants are a very easy way to feel like you’re making an impact and also is something that makes sense to governments and other interested parties. However, grants can also be a spectacular waste of time and money and can yield things with minimal impact. 

If the foundation model taskforce does a good job on P0/P1/P2 things, then I expect it can identify meaningful gaps in the AI field and create some grants to support work here. However, this isn’t something I’d suggest prioritizing initially for the aforementioned reasons. 

STAFFING 

Given the above sketch of priorities, we can also think a bit about what skills and staff the Foundation Model taskforce needs to have. Here are my basic ideas:

Skills:

  • Ability to run a technical stack to host foundation models and sample from them. (Note: I do not think you need the skills to ‘train’ these models – that’s a far more expensive proposition and instantly pushes you into the speculative and barely documented side of AI). 
  • Ability to use commercial and research APIs: Be able to sensibly sample from commercial and research APIs. 
  • Ability to evaluate for policy-salient things: You have the expertise to ‘know what’s important’ in the domains you’re evaluating for, whether that be meaningful misuse, alignment concerns, or sociotechnical issues. You can achieve this through either having directly skilled people on the taskforce or as employees, or be able to consult with third party experts. 
  • Excellent project management: Most of the P0/1/2 things I’ve outlined require excellent project management to pull off – you need to be able to structure projects, align on outputs, target them towards outputs, and do all of this with the kind of single minded ‘run through walls’ determination that you find in project managers within high-growth startup companies. 
  • Good public messaging: If the taskforce succeeds in building out the P0/1/2 then a lot of the theory of change comes from convincing people in the public and private sectors to tie these kinds of evaluations into AI system development and/or government policy. Doing that requires excellence in messaging and advocacy. 
  • Realpolitik savvy: The foundation model taskforce is an obvious target for lobbying by companies for the purposes of regulatory capture as well as all the standard ratfucking things that happen in policy, so it will need to be able to navigate this effectively and be able to play the various hidden games required to maintain independent leverage and not get captured by vested interests.  

Staff: 

I think based on the above the Foundation Model taskforce basically needs the following staff. Note, I think by default government taskforces (members and staff) are far less technical than this and this feels like a key failure mode with regard to AI – if you lack the ability to do technical stuff then you get brainwashed by technical industry actors or you end up producing hopelessly high-level paper with little applicability. Consider the following a deeply opinionated take on how to resource this at a sufficient technical level, though there’s some chance this is me going too hard in the technical direction at cost of other skills. 

  • Three engineers to develop an inference/model sampling stack, mostly using open source stuff. 
  • Two engineers to stand up and maintain the cluster to facilitate hosting and scientific experimentation on these models. (Note, I think the cluster should be based on public cloud, as if you base it on government supercomputers or datacenters you’re already operating from a very slow moving and crap stack). 
  • One engineer to own an ‘eval suite’: You should continually be developing evals and making them easy to run on arbitrary models, so you should dedicate someone to owning this part of the problem. 
  • 5 – 10 researchers and engineers and experts to build evaluations: Find opinionated people with a mixture of technical savvy and subject matter expertise, then help them develop evals for the things you care about. 
  • One communications professional: Hire someone to ensure you do good public messaging, both to policy constituents as well as broadly. 
  • Two technical project managers: You need two technical project managers to run the proverbial P0/1/2 trains. 
  • One charismatic leader: The taskforce needs a fulltime leader who advertises the work of the taskforce, builds and maintains high-level relationships (e.g with labs and universities for model access, and with policy stakeholders), and is also able to have a vision for the taskforce and hire/fire way to success.

CLOSING THOUGHT ON WHY I BOTHERED TO WRITE THIS

Governments have a very limited period of time in which they can develop their own regulatory capacity to give them leverage with regard to the private sector developers and deployers of AI systems. Right now, I think the default state of affairs is that private sector companies are going to ‘wirehead’ governments and do mass regulatory capture while building ever more advanced systems, therefore carrying out a quiet and dull transfer of power over the governance of potentially the world’s most important class of technology. 

I don’t think we want this to happen. I, perhaps naively, believe in the possibility of a ‘public option’ for superintelligence. By public option I don’t mean state-run-AI (as this has its own drawbacks), but I do mean a version of AI deployment which involves more input and leverage from the public, academia, and governments, and I mean something different to today where most decisions about AI are being made via a narrow set of actors (companies) in isolation of broader considerations and equities. It seems worth trying to do this because I suspect a public option has less longterm societal risks than what we’re doing today and may lead to better social and economic outcomes for everyone. (Note, again, that this is my personal view and may not line up with the institutional view of Anthropic – I’m writing this between changing diapers!).

   Step one to getting to making a public option possible is to try really hard to help governments develop the sufficient regulatory capacity to accurately understand the capabilities of frontier AI systems and perform credible third-party evaluations for their potential misuses and safety/alignment risks. 

My hope in writing this is I contribute some ideas to the broader debate and also inspire others to write their own prescription for the taskforce. I’ve made the above proposal so specific that I’m sure people will disagree with it – and when people disagree with it I’d ask them to write their own proposals for what the taskforce should do instead. Through this kind of public discussion I hope we can all develop a clearer picture of AI policy re: foundation models circa 2023 as well as improve the way we have disagreements as a policy community. 

Questions and comments? jack@jack-clark.net or @jackclarksf on Twitter. 

Thanks to Jess Whittlestone for feedback on this.