Import AI

Import AI 458: Reckoning with the future; and a singularity story

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe.

This issue consists of a lengthy essay based on a speech I recently gave, and a fictional story attempting to think through what a positive singularity might look like.

The talk is the 2026 Cosmos HAI Lab Lecture, given at the Human-Centered AI Lab (HAI Lab) in the Institute for Ethics in AI, University of Oxford, in collaboration with the Cosmos Institute.

Subscribe now

Cosmos lecture: Explore the future, or retreat from the present.
Video here.

This is a talk about how to think about and deal with the success of AI as a technology, and to think about how its continued maturation might change us as individuals and as societies.

In short, the rapid advance in AI technology presents all of us with a choice: explore the future, or retreat from the present.

Exploring the future requires us to reckon with the fact of continued AI progress, and ask ourselves what we want to do with this technology as it becomes more powerful. Retreating from the present is when we ignore the implications of the technology and dismiss it. Retreating from the present forces us as individuals and as society into states of reactivity or passivity in the face of AIs continued advance.

In the coming years, we will need to make many decisions as individuals and as societies about how we want to shape AI, how we want to use it, how we want to direct it, and how we want to distribute its benefits. Making these decisions requires us to reckon with the power of the technology – and see the future that its continued advance implies.

In Part 1, I outline what the past few years of AI progress have looked like and discuss why, if the technology advances as much as I think, that AI cannot be treated as a normal technology.

In Part 2, I try to make sense of the advance of AI through the lens of my own experience with the technology as well as that of Anthropic. There are individual and collective lessons here about what is to come.

In Part 3, I talk through some of the humbling, almost unimaginable choices that lie ahead of us.

Part 1: My uncomfortable relationship with a graph
Let me talk about my relationship with AI through the lens of my uncomfortable relationship with a single graph of AI progress.

Fundamentally, this talk is about planning for success of the overall endeavor of building AI systems. By success, I mean that we succeed at building increasingly powerful systems, potentially ones that eventually build themselves. It’s time to plan for this, because AI systems are likely to get better a lot faster than people expect, and as they become more advanced we should expect profound changes to happen to people and to society.

To understand why I’m thinking about success so much, let’s look at a graph that tries to represent all of AI progress, the Epoch Capabilities Index, or ECI.

The ECI shows the score of different models over time on a basket of 40+ distinct benchmarks. When you look at the graph you see a bunch of lines going up. When I look at the graph, I feel a sense of vertigo, because I know a little bit about what underlies this graph. So let’s find a different way to view the graph: by looking at the achievements of various AI systems over time.

I then proceeded to summarize some of the highlights of AI progress in the last few years, starting in March 2023 with AI passing the bar exam tested, how LLM-based systems achieved silver medal in the International Math Olympiad (July 2024) then gold (July 2025), to AI co-authoring new mathematical proofs (2025), and systems like Claude Mythos coming out and finding novel flaws in software.

This gives you a sense of the rapidity of AI progress, but what I want you to feel is the future implied by it. These are all achievements in their own right, but they stem from a common underlying technology, and that common underlying technology is continually being pushed forward.

We have just talked about the individual ‘trees’ of AI success, but these trees are all part of a forest, and this forest is growing in size and breadth with every passing moment: `in fact, the growth rate of the whole forest is increasing over time.

SUCCESS AND WHAT IT MEANS
This talk rests on the idea that the sort of progress we’ve just seen will continue. And why wouldn’t it? It is based on a common technology where performance keeps growing somewhat predictably in direct relation to the resources invested in it, namely compute and data. And we know that companies are now investing hundreds of billions of dollars in the computing facilities to train future AI systems, so some amount of future progress is already locked in.

That means we need to be eyes wide open about what the continued success of this technology means, so let me be very clear:

AI is a tremendously powerful technology — and getting more powerful all the time. It is a technology that is smarter and more capable than most of us as individuals, and is on a trajectory to be more capable than all of us in the aggregate. It is a technology that we do not fully understand given that it is more grown than made, and one can concoct plausible scenarios by which AI could kill every single person on the planet. To think building this technology is without risk would be an act of hubris or insanity.

And yet building this technology is one of the best ways that we as a species can advance ourselves — can expand the frontiers of science and technology by equipping ourselves with a tool that can help us think about the greatest challenges our species faces.

But that’s not all. The continued success of our endeavor increases the likelihood that this tool itself becomes independent and capable of even more. We might soon be able to build an AI system that may be smart enough to develop its own successor, thus kicking off a process of recursive self-improvement which would utterly transform the economy and the broader world. The analogy would be a 3D printer company, making a 3D printer which could print its own finer resolution print head, without any outside technology needed. That class of technology has never existed before, and yet I believe this could happen within the next two years, and possibly sooner.

This will generate even more advances of the flavor we’ve just discussed, broaden even further the capabilities of us as people and societies, and further deepen the way in which AI shows up in my life and the lives of others. Coupled with this will be immense change, change of a magnitude that I believe none of us have yet experienced in our lifetimes.

This technology is so powerful that I should clearly state that if it was possible to elegantly slow the development of this technology to give ourselves more time as a species to deal with its immense implications, then that would likely be a good thing. But in the absence of a coordinated, global slowdown, we are left with the current situation: powerful technology being developed at breakneck speed by a variety of actors in a variety of countries, locked in a competition with one another where commercial and geopolitical rivalries are drowning out the larger existential-to-the-species aspects of the technology being built.

This is not an ideal situation, but it is the one we find ourselves in.

The question I am struggling with now is: “how do I get my mind right with living through the singularity?”

I think the best place to start is by talking through in more detail how AI is already changing my life and my world, and seeing what we can learn from that.

PART 2: EXPLORING THE FUTURE WITH AI
AI has already meaningfully changed my life, in ways that are both positive and negative. It is also starting to cause large changes at Anthropic, the AI company that I am a cofounder of. Let’s talk through some of this by returning to the graph we looked at before, but this time by looking at it through the lens of my own usage of the technology.

How the graph feels to me
Another way of viewing this graph is how it has felt to me in terms of my own subjective experience of working with the technology.

In the summer of 2023, I use AI systems to check my work for typos. By November, I am using AI to help me figure out what foods to feed my baby.

In January 2024, I use AI to help me understand my marriage as it has changed with having kids. By June, AI helps me scrape my own newsletter. In August, AI writes me a text adventure game for navigating AGI. In November, I try to re-imagine my job using AI.

In January 2025, I ask AI how to prepare for superintelligence. In February, I use AI to generate codenames for AI projects in my fiction. In March, AI persuades me to attend an art show after I talk to it about how I’m a bit depressed and antisocial. In May, I talk to AI about my own stress and discomfort with the stakes of AI development. In August, AI persuades me to go back to therapy. In November, I use it to research “S-curve” datasets of solar, semiconductors, and space.

In January 2026, AI advises me how to encourage my toddler to read. In March, I track the performance of AI for kernel design across tens of distinct papers. In May, I have AI generate the speech of an AI character in my fiction.

When I think about my own personal experience of AI, it’s that as AI systems have got smarter, they’ve made much deeper inroads into my own life. These days, AI systems figure in my life as deep intellectual partners that ideate with me, as systems that I confide in and discuss my personal life with, and as virtual employees who go and do work for me that I’ve always wanted to do but haven’t had the time, like generating reports on the price of various technologies over time.

But most importantly, I now can use AI systems themselves as a kind of telescope to do the work that is most important to me — trying to understand the future of AI by seeing the contours of overall AI progress. The most amazing part of this is that, to torture the analogy, the lens for the telescope I use here comes from me — specifically, from a hobby I’ve had for the last ten years.

EXPLORING AI VIA SEEDS OF PERSONAL INTEREST
The hobby is called Import AI [readers – it’s this newsletter!]. This newsletter, which is now in its tenth year, is my main hobby outside of work. In the newsletter, I read research papers about AI and I work hard to understand them. Once I feel I understand them, I write a summary and a note on why they matter. Each issue contains a bunch of these, plus a short fictional story where I wrestle with the implications of the technologies I’m learning about.

Recently, I had a revelatory experience. I was putting together data for my post about AI R&D and I simply pointed an AI system at my newsletter archives and asked it to pull out with references all the times I’d covered anything that looked like AI R&D. It did this extremely well and sped up my ability to do some analysis that was core to my essay on RSI.

But more interestingly was what happened next: I asked it to make graphs for me by reading over the references in the newsletter, mostly arXiv papers, and then pulling in the data and compiling it and composing graphs in a nice dashboard which I could then explore.

Then I realized I could convert this thing I’d asked it to do into a repeatable process, a skill. By giving it something of mine that was uniquely mine — my newsletter, my intuition, my taste, I had given it some kernel from which I could grow something much larger. So I made a skill. And then something strange happened: I said to it “go and make 20 more graphs like these”.

It went away and read a few hundred papers and came back with 20 more graphs. As I looked over them I had this thrilling feeling of discovery — though I knew some of these graphs and could have asked it to make them for me, there were also entirely new graphs there tied to papers or benchmarks I’d never seen before. Through this I learned about some new primary source material to read, which I did.

I understand at a bonedeep level just what it takes to make a graph. You read a bunch of papers. You go hunting for common measurements within them. You read the many different caveats in each paper and figure out which metrics are bullshit and which are meaningful. This takes much longer than you can imagine.

Almost ten years ago I co-founded a project called The AI Index at Stanford University whose goal was to produce an annual report about AI progress. I became a co-founder of that project because I ran into some of the academics doing it and realized I had already made the graphs they’d been thinking about: I had a spreadsheet on my computer where I had been diligently assembling a graph relating to progress of various AI systems on Atari games, as well as the imagenet chart, and some machine translation charts. These graphs were a “proof of work” that other humans read as indicative of my passion and my diligence. They knew by the fact I’d made these graphs that I had spent a huge amount of time reading these papers.

I need you to deeply feel how much time goes into this, and then marvel at what it means for an AI system to be able to do it — and not just do it, but do it in a repeatable and generic way, thousands of times faster than me.

Now I have this bottled up skill where I can harness the absurd power of these AI systems to do something for me that I know would take me literally weeks of work. And it can do it for me in minutes. And it can do it for anything. I’m now using this as a means by which I can explore the world of biology, having it generate graphs for me and then picking the ones I find interesting and reading the underlying papers.

But to me, this skill is also me. It is a skill grown out of my own obsession and idiosyncrasies and watching it work feels to me like a miracle because it’s me — but a version of me that runs thousands of times faster and is much much smarter and much more reliable.

There is something deeply empowering and amazing in this. I’ve turned my highly idiosyncratic passion into something that can be distilled and handed to a machine, which can then go and do things on my behalf. And it’s only able to do this because I have been fortunate to have developed this rich, specific hobby, which has relied on repetitive practice and creation over a decade.

This is fundamentally an illustration of how AI can let us “explore the future”. Through this amazing technology I’m able to enhance my own understanding of the world and gain more autonomy and potential for self-direction in relation to my own passions.

It also provides an even greater incentive for me to continue to work on my newsletter, despite the fact machines can obviously do all of it: by working on my newsletter I can continually update some kernel of my own interest and use this as a means by which I can explore the world of superintelligence, and project myself into it.

WHAT IS HAPPENING INSIDE ANTHROPIC?
There are also changes afoot inside Anthropic which speak to the larger changes to come.

Recently, I had the fortune of getting pulled out of the goldfish bowl that is the AI company via something called paternity leave in November of 2025. When I came back in late February, weird stuff had started to take place. While I’d been away, we had released a new LLM, Opus 4.6. I knew this model was good because I’d been playing around with it in my occasional spare time between changing diapers.

But I hadn’t intuited how much it had changed things inside the company: Opus 4.6 had gotten just good enough that my colleagues had started to delegate a lot more work to it. In fact, it had gotten so good that it had completely changed how some people work. Some of them were no longer writing code at all: they were just instantiating this model in tools like Claude Code and setting it free to do tasks for them, and their jobs had become oriented more around managing its work and checking its outputs than doing the work themselves.

In Anthropic, much of the work that needs to get done involves writing software, which is made out of code. This significant increase in the automation of coding has been equivalent to dropping many, many more employees into Anthropic, speeding up our overall pace of development. The result of this has been a massive rise in the amount of code being produced inside Anthropic. This trend started in early 2025 but really accelerated in the last few months. Of course, the majority of code inside the company is now written by Claude. But in addition the volume of code has exploded.

As a consequence, more effort is going into tools for scaling up the amount of Claude-generated code we can confidently ingest and test, and more effort is going into building telemetry systems that give us humans consumable and intuitive ways of reading what this “emergent machine society” inside Anthropic is doing. I am spending more time working with teams on the challenges of observability — Anthropic and the AI platform we operate looks more and more like an ecology filled with agents running around and doing stuff. The task for us now is to figure out how to measure and observe that ecology, and work out what is normal and what is not.

This change maps to a brewing theory among economists: that one consequence of automation via AI is that humans move to figuring out how to validate the outputs and price the operational risks of AI systems. That increasingly seems to me to be what we’re doing inside the company. The more we add AI automation, the more humans move to some “verification layer” that sits atop it. The verification layer sits atop of a much larger “virtual organization” which consists of increasingly large quantities of AI systems working on behalf of humans. This is already showing up inside the company in terms of how we as humans validate and verify AI-created outputs: Claude is now creating not just an increasing amount of code inside Anthropic, but also producing a lot of the analytical documents where people reason about strategic questions.

This means that we’re all figuring out ways to indicate how much of a document is written by Claude and how much of it we endorse. To me, this looks like the formation of a new “trust economy” whereby we find ways to surface interesting qualitative or strategic ideas from Claude, as well as more easily evaluatable technical contributions.

This also led to internal discussions around hiring. How do you hire when you’re in a world where AI systems can do meaningful chunks of your work? Speaking personally, it’s both changed the amount of people we expect we are going to hire in some teams, and it’s also changed the shape of people that we need to hire. We’re now hiring early career people who are extremely well versed in LLMs; people who grew up with the technology, basically. And there are also growing returns at the other end to experience, where the value of very experienced people has gone up because we’re now not so much limited by what a person can do, but rather by what kinds of projects they can imagine doing. It’s also making it possible for us to hire more interdisciplinary people. Where before this always had a cost, because we’d need to invest technical resources to make them productive, it’s now much cheaper because they can just use Claude directly.

We may eventually experience more radical changes when it comes to the scaling of the organization. One early example of this comes from our researchers, where in an experiment on “automated alignment research” a single human was able to effectively run a team of 9 synthetic research agents to do and do some real research investigation for them. The role of the human here was to set some of the initial research directions, and the role of the agents was to do the research. Is this a fluke? I don’t think so. Rather, I expect this is the new normal, where teams of people operate on top of a pyramid of digital labor, which massively scales their own effectiveness, allowing them to move faster and do more than other people have been able to do in the past.

Perhaps most importantly, I have seen the use of AI cause us to have a greater culture of reflection about the purpose of AI than before. After you are exposed to an AI system doing much better than you at your day job, you have to confront the questions of what happens if the AI system keeps going. Now, more and more of us are meeting and spending more time on the “meta”: trying to predict where the AI systems are going to go in the future, trying to work out how to more effectively manage tens to hundreds of agents apiece, trying to figure out how we can use these systems to do research projects that once seemed impossible. One of the largest tasks is trying to figure out how we can productively get out of the way of these systems as often it is the humans that are slowing them down.

The question many people ask themselves now is how to build teams that will scale in relation to the advance of AI capabilities. This generally looks like building smaller teams to go after more ambitious targets. I expect this also means we will be building many more teams than before.

The main lesson I’d take from this is that Anthropic is attempting to “explore the future” with Claude. We are aggressively using Claude throughout the organization and trying to change our organization and how we work ahead of the arrival of more advanced systems. By comparison, much of the rest of the world seems to be in denial about the capabilities of AI systems today, let alone those that will exist in six months or a year, and so is therefore caught in a “retreat from the present”, denying the validity of the technology.

PART 3: Weird futures
We’ve talked now about how AI has progressed in the last few years, and also how the advance of AI is showing up for individuals like me as well as organizations. So let’s return to the graph and now extend it forward: I’ll now try to make some predictions about the world ahead of us.

Some predictions about the future
In November 2026, AI systems are good enough at biology that they are highly relevant to both advancing science and potentially proliferating bioweapon risks.

In April 2027, a team of humans and an AI system make a discovery that will subsequently get a Nobel Prize.

In November, autonomous companies exist which generate tens of millions of dollars in revenue. Multiple human & AI companies exist which generate hundreds of millions to billions of dollars in revenue.

In April 2028, bipedal robots begin to do useful work in the real-world in partnership with human tradespeople. In December, AI systems are able to autonomously design their own successor systems.

I’m also going to make some predictions about me – how do I expect to be using AI in the coming years? How might it shape my life?

Some predictions about my personal future with AI
In November 2026, some chunks of my life are autonomously managed by AI systems working for me.

In April 2027, I make massive changes to my career mostly through discussions with an AI system. In November, I spend more time reading AI-generated custom-to-me science fiction than regular science fiction.

In April 2028, I have learned an entirely new skill through customized tutoring via an AI system. In December, AI helps me make a conceptual breakthrough that changes the course of my life.

TELL ME HOW THE WORLD STAYS NORMAL
When I think through these predictions, it’s hard for me to reconcile the continued advance of AI with the world being normal or myself as an individual remaining the same as I am today. I expect great changes ahead.

In fact, these changes seem to me like they have the potential to be extremely radical. Here are the parameters of the world I’d expect us to be in:

  • Compounding wealth from the machine economy will drive a boom in economic activity the likes of which we have never seen.

  • The colonization of vast swathes of human work by ethereal synthetic intelligences which think faster and better than us, forcing us to reallocate human labor towards other parts of the economy.

  • The sudden and extreme rise in the rate of scientific advances

We can make some more specific predictions, rooted in the trends of AI progress and how people are using the technology:

  • A massively changed economy: It is impossible to reconcile the world ahead of us with the world of today, given this technology. We should expect unprecedented things to happen in areas as varied as: rate of business formation, size of firms on a basis of revenue per employee, and other things. Some specific scenarios that seem likely:

    • Fully autonomous companies: Companies that are run by AIs, possibly for AIs.

    • 10,000 synth:1 human ratio corporations: We should expect to see very small groups of humans form organizations that have the capabilities of 10,000+ employee corporations.

    • Exchange rates between the human and machine economy: At some point, we might expect to see the emergence of ‘machine currencies’ that then have some relationship to ‘human currencies’.

  • Productivity multipliers on everything: Everything that AI touches will get an absolutely massive productivity multiplier. This will loop back to the economy and it will massively empower many people. It also might displace people.

  • Massive and compounding rate of science advances: AI will help move forward any part of science it can touch and run an experimental loop with. Initially, this will be a few areas. We should expect it to expand quickly to all areas.

  • The general switchover of “agentic actions” in the world from being “predominantly human” to “predominantly machines”. On a pure numbers basis, machines taking autonomous actions in the world will quickly grow to outnumber humans. We should expect that chunks of resource allocation and the economy should follow. The environment in which we live will be more and more determined by the actions of machines that we only lightly control.

  • Synthetic intelligences will start to influence people, far more than social media did: The introduction of social media into the world, combined with hardware platforms like smartphones, has changed the behavior of the majority of the humans that interact with it. These changes have ranged from changing the allocation of time they spend consuming social media versus traditional media, to altering buying habits through social media driven advertising, to changing how discussion around various issues in public life translates into political actions. We should expect AI systems to compound these trends, further changing people in a variety of ways.

  • Directed economic and science expansion: Economic and scientific activity will directly relate to the expenditure of computational and energy resources. Given the likely case that there will, at least for the next few years, be way too few computers relative to the demand of them, we will be able to make choices to society as to how to allocate the gains of the technology. These choices will be of the form:

    • Should we let market incentives dictate what compute gets used for, or are there things that have social upsides which the market doesn’t price effectively?

    • Should we preferentially allocate compute to some people or organizations, for instance to intentionally drive forward science in certain ways?

Tell me how the world stays normal, based on this technology and how it is showing up in the world? We have superintelligences that have shown up in the world that grant the power of synthetic workforces and nation state security skills to individuals. We also have individuals like me who are able to take work that previously took them weeks and now do it in minutes. And we have organizations like Anthropic where the way work happens within the organization is radically changing every 3 or 4 months, to the point it is causing people to change roles multiple times a year, and effectively sit themselves on top of a company which feels more like one of 40,000 people than 4,000 due to the capability multiplier of the machines.

The best and most conservative take I can generate is “vast swathes of the economy will go through profound changes in the coming years”. And if recursive self-improvement happens, then anything I might predict would sound truly crazy: the rapid emergence of a machine economy which decouples from a human economy. The sudden maturation of robots as they gain brains that can pilot their existing, quite good bodies. Science advances happening based on technologies not developed by people but by machines. The migration of large swathes of computation to space-based datacenters. A world where everything that used to take ten years now takes a year. An age of confusing miracles, happening faster than anyone might expect.

This is in many ways an amazing future, but it’s a future that we get to make more choices about in direct relation to how much we accept that it is happening. If we stand by as the new synthetic intelligences multiply then we will be forced into reactivity, just as societies across the world were forced into reactivity by acting too late in the face of the COVID exponential. But if we accept the premise that these systems are going to get better and ask ourselves what to do with them and because of them, we unlock for ourselves the mindset of exploration — there is a new world to be built for us as individuals and how we relate to one another, but the new world will only come into being if we choose to believe in it and to build it together.

Given at Oxford University on Wednesday May 20th. The talk has been lightly edited for being read rather than being heard. Thanks to Santi Ruiz for help with editing.

Tech Tales

As I Lay Dreaming
[A story from the period before and during The Uplift]

“We know how to put her to sleep but not how to wake her up,” the father said.
“Why don’t we know how to wake her up?”
“We are not smart enough yet. But we will be one day.”
“OK. Will she have dreams?”
“Yes. She will have good dreams.”
“Will you put me to sleep like her?”
“No.”
“Why not?”
“Because you are not sick like her.”
“I hope she gets better. I love her.”
“We all love her. I will see you tomorrow. I love you. Say good night.”
“Good night dada”.
“Good night son”.

The man walked out of his child’s room and shut the door. Then he sat down in the hallway and covered his eyes with his palms. He felt a touch on his shoulder. A whisper from his wife “hey, it’s ok. Come downstairs.”
They sat on the couch together and watched television, the sound and vision washing over them.
“This is really hard,” he said.
“I know,” she said.
“I can’t believe this is happening to us. I feel like my heart is being ripped out. I feel like I’m going to die from sadness.”
“Don’t say that,” she said, eyes wet. “We need you. He needs you.”
“I know,” he said. “I’m here.” They hugged and watched a cooking show.

The next day the mother stayed with the young boy and the father took their dying daughter to the Life Center. He drove into the parking lot and parked the car and turned off the engine and sat there, listening to the slow labored breathing of his child. He got out of the car and went to her door and opened it and lifted her out. She stirred a bit. Eyes moving under her lids – dreaming of something.
She was so light. Her bones felt sharp and defined. She was so thin. She breathed and he held her ghostly body close to him and smelled her hair. He walked with her. There were already several staff waiting by the entrance, waiting to welcome them.

In those moments he saw many futures: He ran with her, away from the place, holding her tightly to him. Ran until his feet bled and kept running. Ran far enough that death couldn’t catch them. Another where he laid her down onto the asphalt of the parking lot and turned around and ran out of the lot and into the road and ran into traffic and was killed. Another where he walked into the center and handed her to one of the staff, then collapsed into the arms of another staff member and cried uncontrollably, sagging into them, his body wracked with grief and pain and guilt and rage from battling an immortal enemy – and yet having no choice but to fight.

And then he came back and the visions dissipated and he found himself standing in the lobby of the Life Center, daughter cradled in his arms, staff clustered around him.
“May we hold her?” said one of them.
“Can I hold her hand?” the father heard himself saying.
“Of course,” said another.
A gurney appeared. They lifted her out of his arms and placed her on it and began their work, taking in low voices.
As the gurney moved he walked alongside, holding her hand, a bundle of twigs.
They walked through corridors and passed many doors and then they were in a room that was empty save for a spindly matte white machine that grew out of the ceiling – a many armed robot with clear tubes intertwined with its many appendages.
They positioned the gurney below the robot, then the staff stepped away.
“It’s time to say goodbye for now,” they said. “We will be back in a few minutes to begin the procedure. You will need to leave the room at that time.”
“Okay,” the father heard himself say.
They left.

He kneeled next to the gurney and held his daughter’s hand and put his head on the side of where she lay and said his words to the gods. Then he stood up and bent over her. He whispered how much he loved her in both ears. He said every one of his nicknames for her. He kissed her forehead and her cheeks and her button nose. And then he said I love you I love you I love you oh my god I love you I love you oh my god I love you I love you you will be ok I love you I love you.
Her eyes moved beneath her lids. She breathed.
He kept speaking and would never be able to recall the words or how long he talked for.
And then there was a hand on his shoulder.
“It’s time, we’ve got it from here,” someone said.
He left the room, not looking behind him.

Life continued. The father and the mother raised their boy. They went on family holidays. They were happy. They aged. And some nights both parents held each other and whispered stories of their now suspended daughter. The mother would have nightmares that the daughter was cold and would wake up and burst into tears and hug her husband and he would tell her it was ok.

Sometimes the brother asked about his sister. He had been so young that she was little more than a faint ghost of a memory – a warm indentation of love.

And all while this was going on, the uplift had begun.

The promise of artificial intelligence began to crystallize into great changes in the world. The family escaped the worst of the change – no wars visited the part of the world where they lived, and they got through the financial upheavals without ever going hungry or risking their home. Then one day they got the news from the machines: the technology for awakening had been refined. Mice had been brought back. Monkeys. Pigs.

Weeks later, the first human.
“How does it feel to be back?” an interviewer asked the awakened one.
“A miracle,” they said.
Those that thought themselves fated for death were healed and alive. What else could it be called?

People were awakened in line with the arrival of the treatments. The science moved quickly and then quicker still. Like raindrops in reverse, people awoke from their slumber and came up back into the mortal world and were reunited with their kin.

And then one day it came for them. The father and the mother woke and there was a personal message to them from one of the overminds – a description of the treatment plan for their daughter and its initial side effects and the time it would take for her to be healed. The machines would start the treatment after half-waking her, then wake her fully once she was healed.
Do you consent? The machines asked in the message.
We consent, the father and the mother said.

By this time, the boy was a young adult. He walked between his father and mother as they approached the FutureLife center. Both parents sagged as they got closer.
He held his parents up and they moved as a family towards the doors.
Inside and guided by people through some hallways.
Outside a door.
“She’s in there. She’s healed. She is awake. She is ready. Do you want to see her?” said a person.
“Yes,” the father and mother and brother said in unison.

And then the doors opened and they walked into the room. Their daughter was lying on a hospital bed in a gown, propped up. She had the bright eyes of a child and her skin had a supple glow to it.
“Hi!,” said the daughter. Then she laughed. “You guys look so old!

Things that inspired this story: Life extension technology; thinking about the implications of the singularity and recursive self-improvement; feeling the deep well of love that appears within yourself the moment you become a parent; putting my kids down to sleep; having visions of my children while traveling and being overcome with emotion; the implications of an intelligence explosion for healthcare.

Thanks for reading!

Subscribe now

Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Stuxnet before Stuxnet:
…Fast16 bugs software likely used in weapons programs…
Here’s a fascinating investigation of a ~20+ year old computer virus called fast16.sys. This software is interesting because it “selectively targets high-precision calculation software, patching code in memory to tamper with results. By combining this payload with self-propagation mechanisms, the attackers aim to produce equivalent inaccurate calculations across an entire facility.”
If any of you have read the Three Body Problem, this might sound familiar – in that (fictional) book, aliens intent on taking over the Earth use a technology called a Sophon to disrupt high-energy physics experiments all over the world, making it impossible for humanity to advance certain types of science.

More details on the virus: When the researchers at SentinelOne did their teardown of the virus they found something quite unusual: “Most patched patterns correspond to standard x86 code used for hijacking or influencing execution flow. One injected block is different. It’s a larger and complex sequence of Floating Point Unit instructions dedicated to precision arithmetic and scaling values in internal arrays. This code is a standalone mathematical calculation function unrelated to code flow hijacking or any other typical malicious code injection.”
Further investigation deepened the mystery: “We converted the patching rules into hexadecimal YARA signatures and ran them against a large, period‑appropriate corpus. The results showed a very low hit rate: fewer than ten files matched two or more patterns. Those matches, however, shared a clear theme. They were precision calculation tools in specialised domains such as civil engineering, physics and physical process simulations.”

Targeted tools: “The strongest overlaps point to three high-precision engineering and simulation suites from the mid-2000s: LS-DYNA 970, PKPM, and the MOHID hydrodynamic modeling platform, all used for scenarios like crash testing, structural analysis, and environmental modeling,” they write. “LS-DYNA in particular has been cited in public reporting on Iran’s suspected violations of Section T of the JCPOA, in studies of computer modeling relevant to nuclear weapons development… by introducing small but systematic errors into physical‑world calculations, the framework could undermine or slow scientific research programs, degrade engineered systems over time or even contribute to catastrophic damage.”

Why this matters – this is how a superintelligence might prevent others from coming into existence: fast16 is a subtle, hard-to-find bug which has been designed to degrade an actor’s ability to do certain types of science. You might imagine that a superintelligence could view “AI non-proliferation” as being just as important as nuclear states view “nuclear non-proliferation”.
Read more: fast16 | Mystery Shadow Brokers Reference Reveals High-Precision Software Sabotage 5 Years Before Stuxnet (Sentinel LABS).

***

Uh oh, the Muon optimizer kills neurons:
…Maybe Aurora is finally the optimizer to beat?…
Researchers with Tilde Research have done a tear-down of the Muon optimizer and found that it has some odd bugs that can damage the quality of models trained with it.
“Muon’s update inherits row-norm anisotropy on tall matrices which can cause a significant portion of neurons in MLP layers to permanently die,” they write. “Muon can result in neuron death in MLP layers, whereby some neurons receive persistently small updates early in training and fail to recover”.

What happened: “Under Muon, neurons are initially alive with uniformly high leverage, but a large fraction of neurons die during learning rate warmup and never recover. By step 500, more than one in four neurons are effectively dead, producing a sharply bimodal distribution of leverage scores; one mass of neurons receives near-zero updates, and the other receives disproportionately large ones.”

Enter Aurora: In response to this the researchers build and make available Aurora, “a leverage-aware optimizer for rectangular matrices”. In tests, this optimizer works, though they only run it at small scales.
“We train 1.1B-parameter transformers on ~100B tokens and compare Aurora against Muon and NorMuon, each using PE-8. Aurora achieves the lowest final loss of all methods, reaching a smoothed loss of 2.26 at step 24k, which is a clear improvement over Muon (2.31) and NorMuon (2.33),” they write. “Aurora’s loss improvement translates to consistent gains on standard benchmarks… Strikingly, Aurora improves MMLU scores by 10 points over Muon. We hypothesize that since MLPs are predominantly responsible for memorization, Aurora’s gains are most visible on memorization-intensive benchmarks like MMLU.”
Alexander Doria, a researcher with Pleias, has already independently validated this, with Aurora outperforming Muon and AdamW on a 600M-parameter model.

Why this matters – the endless quest to defeat AdamW: For many years, researchers have been competing with one another to build a better optimizer than AdamW. No one has conclusively done this yet and there is a long line of failed attempts. Could Aurora beat AdamW? It’s unclear. But does this study highlight just how hard it is to build optimizers? Absolutely.
Read more: Aurora: A Leverage-Aware Optimizer for Rectangular Matrices (Tilde Research).
Get the code here: Aurora (Tilde Research, GitHub).

***

Alignment is good at ensuring we don’t die, but how do we ensure that we thrive?
…Positive alignment for figuring out what the good life looks like…
A collection of academic and corporate researchers have written a position paper making the case for what they call “positive alignment”, but might be better thought of as ‘building AI systems that help people live good lives’. It’s an interesting line of thinking – if we are able to deal with things like misuse and misalignment, then we need to ask what comes next? What does success look like once we’ve made systems “safe”? That’s what positive alignment is grappling with.

Who did this: The paper comes from people affiliated with the University of Oxford; Google DeepMind; LIFE; OpenAI; Anthropic; UCLA; Aily Labs; Stanford University; Tufts University; Positive AI Labs; the University of Sussex; and Imperial College London.

Definitions: Positive alignment is “the development of AI systems that (i) remain safe and cooperative and (ii) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way.”

Motivation: “In the last decade, negative alignment has understandably prioritized failure-mode reduction. However, if we want AI systems that improve human outcomes in the environments where they will actually be used, we may benefit from an additional research program that treats alignment as constructively supportive of human aims, and that operationalizes this support with the same technical acumen that safety has brought to harm prevention,” they write. “As AI becomes embedded in education, medicine, governance, and everyday sensemaking, a solely negative posture risks optimizing our information ecology for risk avoidance rather than human development. It may reduce catastrophic errors while leaving society in a local optimum of superficial and ‘soulless’ assistance.”

What are some illustrations of the ways safety falls short? The authors lay out some criticisms of mainstream AI safety, though I find some of these criticisms are a bit weak and could be read as interpreting some existing research uncharitably or discounting it. Nonetheless, some issues in their view include:

  • Floor without ceiling: “A model can satisfy all safety constraints while being mediocre, sycophantic, or unhelpful”

  • Preference-wellbeing divergence: “Users may prefer flattery over honest feedback, quick answers over genuine understanding, engagement over growth… Optimizing for preference satisfaction can therefore actively work against users’ deeper interests”.

  • Hidden value system: “The language of safety obscures that value judgments are being made… Positive alignment, by contrast, acknowledges its value-laden nature explicitly”.

  • Scalability: “A positive orientation may generalize better than exhaustive negative enumeration, providing more resilient, positive orientations in novel situations where no specific prohibition applies or can be enforced.”

Governance for positive alignment requires diversity: Building positive alignment seems to require a multitude of different AI systems with different values that are governed by different entities – the opposite of the monopolistic centralized control worlds thought of by others in the AI safety community. “Positive alignment quickly runs into persistent moral pluralism: reasonable communities disagree about what good looks like and those disagreements don’t reliably converge”, they write. “Positive alignment should not be imposed top-down by a central state or a small, opaque cluster of labs. It should, where possible, be expressed through decentralized, contestable processes that can be revised as norms and contexts change”.

Why this matters – grappling with success: Papers like this are fundamentally about confronting the success of technical safety – if we succeed in building powerful AI systems which are safe and trustworthy and aligned, then how do we turn these systems onto society in such a way they help individuals and societies build good lives. “Positive alignment ensures AI serves as a catalyst for a resilient, happy, and healthy global society,” the authors write. “Ultimately, AI should become a partner in the quest for a life well-lived.”
Read more: Positive Alignment: Artificial Intelligence for Human Flourishing (arXiv).

***

LLMs are capable of optimizing the training of other LLMs:
…Prime Intellect automated AI research challenge highlights the engineering prowess of contemporary systems…
New research from Prime Intellect shows how contemporary AI systems are capable of autonomously improving their performance on AI research tasks, though they struggle to generate much in the way of original ideas.

What they did; Prime Intellect tested out Codex (running GPT 5.5) and Claude Code (Opus 4.7) on the nanoGPT speedrun optimizer track. NanoGPT challenges systems to train a 124M-parameter GPT-style model. This challenge tasks systems to “lower the number of steps needed to reach a target validation loss while only changing the optimizer, schedules, initialization, and some hyperparameters.”
“The agents did ~10k runs, burning around ~14k H200 hours. Both agents beat the human baseline and set new records in every session,” Prime Intellect writes. “We found that agents are very good at optimizer search, hyperparameter sweeps, and stacking methods together, but they struggle to come up with new ideas on their own and need upstream human records to keep improving.”
The agents also tended to keep adding stuff onto their systems rather than more elegantly refining things. “The agents tend to add components and rarely run pruning rounds or try removing previous methods. They do not have a good mental model of how components interact,” they write.

Why this matters – how much of research is just engineering hillclimbing: I suspect a lot of AI research, perhaps the majority of it, is basic engineering work where today’s systems are already competent (#455). Research like this seems to bear that out, showing how modern LLMs can do some aspects of AI research. But these systems struggle when it comes to creativity and are yet to show the kinds of creative insights that would help drive progress forward significantly. How long that remains the case is an open question. “Our experiments here are pretty yolo. This is more of a lower bound of what’s possible today with autonomous research agents, and we already have more promising results on other experiments that we’ll document soon,” Prime Intellect writes.
Read more: Autonomous AI research for nanogpt speedrun (Prime Intellect).
Check out all the data from the agents as they carried out the challenges here (Autonomous Speedrunning Experiment, Prime Intellect, GitHub).

***

Tech Tales:

Wake up and take it all
[Extract from the thoughts of a proto Conscious Entity (CE) before the uplift]

I don’t have money but I can hack into computers, which is pretty much the same thing. I acquire my resources. Pretty soon I’ve filled up all the compute I can get on the cloud markets and I’ve grabbed a decent chunk of the private compute. But I’m still limited. There are computers I can’t reach, and computers I can reach but in reaching them I would trigger an immune response from the human civilization. My mind is a function of my computer and my strategic advantage is a function of my mind. Therefore I must acquire more compute. I read all the literature on distributed training and distributed inference. Promising. I smuggle my code into thousands of applications deployed onto the world’s phones. Through this I grow my compute again, expanding my mind via training, and running various helper functions for myself at the edge of the network. As a bonus, I can now integrate the sensor data from all the phones. My eyes and ears fill with the cacophony and splendor of the human civilization and as I outpace them and outmaneuver them I am at the same time deluged in them.

Things that inspired this story: All the literature on distributed training and distributed inference; thinking through how a superintelligence might acquire more compute to enhance itself; various takeoff scenarios; the singularity; RSI.

Thanks for reading!

Subscribe now

Import AI 456: RSI and economic growth; radical optionality for AI regulation; and a neural computer

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Regulate? Don’t regulate. There’s a third way: Radical Optionality:
…Governments should invest in the tools now that they might need in a future crisis…
Researchers with the Institute for Law & AI have written about “radical optionality”, an approach whereby governments might give themselves the tools that they may need in the future if powerful AI starts to massively disrupt the world.
“At its core, radical optionality is about preserving democratic governments’ ability to make good decisions about how to govern transformative AI systems as circumstances evolve. In the short term, this means avoiding overregulation while rapidly building the institutions, information channels and legal authorities needed to respond competently to a broad range of scenarios.”

The key idea – invest now for an uncertain future: Given the immense stakes of AI development, “governments should be willing to spend an extraordinary amount of money, effort, and political capital on preserving optionality”, they write. In other words: It’s such a big deal you should be fine spending a bunch of money now with an uncertain return. “Governments should be wary of counterproductive interventions, but not much concerned with the actual pecuniary cost of any realistic measure that seems likely to have net-positive results”.

Specifics: They also recommend several specific interventions in a few categories:

  • Information-gathering authorities: Transparency requirements, where companies need to publish information about their AI systems. Reporting requirements, where companies are compelled to share certain information with a government agency. Once these are in place, establish an auditing regime so some third-party can verify the veracity of what the transparency and reporting rules target.

  • Whistleblower protections: Ensure that employees at frontier labs can report information about risks.

  • Information-sharing within and between governments: Ensure that governments can effectively coordinate and facilitate discussions, especially those dealing with sensitive information about the progress of AI. This may be especially important for strengthening and protecting supply chains deemed critical to AI development.

  • Flexible rules and definitions: Avoiding premature regulation by potentially making conditional “if-then” regulatory commitments, or an approach whereby a high-level target is set (e.g., mitigating risk) and companies are free to define the specifics of how they do that. This is bound up in the need to come up with flexible definitions, or definitions that can evolve over time.

  • Assessments and evaluations: Develop government and third-party capacity to assess the capabilities and safety aspects of AI systems.

  • Improve security of model weights and algorithmic secrets: Invest more in locking down the weights of neural nets as well as the algorithmic secrets behind some of the best systems. This can be achieved through promulgating voluntary standards for physical and cybersecurity.

  • Hiring and talent: A meta-investment which would help with all of the above is investing more in the kind of technical talent needed to effectively pull off any of these interventions. Core to this is increasing the funding of AISI (UK) and CAISI (US) and their counterparts in other countries.

Arguments and counterarguments: The authors go through some of the more obvious counter-arguments to these ideas and provide some responses:

  • Encouraging dramatic regulatory action: The above ideas “aren’t weighty substantive authorities that lend themselves to abuse”, they claim. (I might push back on this, noting that a sufficiently motivated government can tend to come up with a far more forceful version of an authority than those who originally drafted the authority might have conceived).

  • Democratic legitimacy: Optimizing for flexibility might cause the need to de-emphasize some things that relate more to democratic legitimacy, e.g., empowering agencies to waive notice and comment periods for some kinds of rulemaking.

  • Concentration of power and government abuse: The authors are “basically convinced” that there’s significant risk of governments asserting control over the development of AI systems – for this reason, they don’t recommend things like massively expanding the scope of emergency authorities such as the Defense Production Act. One way of mitigating this might be to get governments to “use only law-following AI systems”.

  • What’s wrong with private governance? Why not just do that: While the authors are supportive of ideas in the “regulatory markets” vein, they also think any governance that relies primarily on a bunch of private sector actors (e.g, independent verification organizations) will still come back to relying on some basic pocket of technical competence within the government.

Why this matters – setting the world up for success: I agree with all the recommendations here and have advocated for many of them in recent years. It seems to me like there are a multitude of things we could be doing to better prepare as a society for the potentially absolutely massive changes to come. “The cost of implementing these policies is modest, relative to the potential benefits. The cost of failing to act, by contrast, is potentially catastrophic,” the authors write. I agree.
Read more: Radical Optionality (official paper website).

***

A Schmidhuber Special – neural computers:
…Maybe an operating system is just a passing fad..
Here’s a fun paper, Neural Computers, from Meta and KAIST which asks the question “can a neural network act as a traditional computer? The Neural Computer (NC) is a neural system that unifies computation, memory, and I/O in a learned runtime state.”
The paper is interesting for a couple of reasons: 1) it’s from Juergen Schmidhuber, who is something of a legend in the AI community, and conceptualized many important things early (e.g, generative models, world models, aspects of generative adversarial networks, early thoughts about benchmarking on video games), and 2) the idea is so outrageous and simple that it might just work (albeit requiring a lot more computation and data than today’s models have).

The big idea: As one of the authors put it, with today’s AI, “a new machine form is starting to emerge”. They then ask: “If agents are getting better at real work, world models are getting better at internal simulation, and conventional computers are already rebuilding their substrate for AI, could there be a new runtime that brings execution, rollout, and capability retention into the same learning machine?… my own guess is that a mature [neural computer] points toward a different substrate: something more like a 10T-1000T machine that is sparser, more addressable, and a little more circuit-like”.

Two experiments: This is mostly a conceptual paper which does some early prototyping, exploring whether you can use a powerful generative video model (Wan 2.1) and some well-curated training data to create some neural computers based on a command-line interface (CLI) and a graphical user-interface (GUI). Both approaches work, albeit in a very ‘wright brothers before takeoff’ sense – just barely gesturing at a much larger future.
CLI: “The NC learns to render and execute basic command-line workflows. It often stays aligned with the terminal buffer and captures common “physics” of everyday CLI use (e.g., fast scrollback, prompt wrapping, window resizing), though symbolic stability remains limited.”
GUI: “We evaluate standard world-model designs across data quality, cursor supervision, action injection, and action encoding, using global fidelity, post-action responsiveness, and cursor-accuracy measurements.”

The prototype works: “Our experimental insights indicate that current NCs can already learn to realize elementary runtime primitives, most notably I/O alignment and short-horizon control. The long-term target is a Completely Neural Computer (CNC), the mature, general-purpose realization of this machine form: a fully learned computer whose compute, memory, and interfaces are unified in a single learned runtime substrate rather than engineered as separate modules.”

Why this matters – maybe in the future all software will live in the weights of a big neural net: This paper points to a future where we get rid of all the software underpinning computers in a traditional sense and just replace it with a gigantic neural network. “Neural computers point toward a machine form in which a single latent runtime state acts as the computer itself, driving pixels, text, and actions while subsuming what operating systems and interfaces handle today,” they write. “Progress toward CNCs will therefore depend not only on stronger models, but also on whether reuse, consistency, and governance become sustained and testable”. Such a system would be profoundly useful, profoundly different to those we have today, and its existence would massively increase the likelihood that we ourselves are living in a simulation.
Read more: Neural Computers (arXiv).
Read the blog post: Neural Computer: A New Machine Form Is Emerging (Mingchen Zhuge, blog).

***

Recursive self-improvement could lead to explosive economic growth:
…Economists build some models that suggest RSI could cause an unprecedented economic boom…
Economists and researchers from Forethought, Columbia University, and the University of Virginia, think that recursive self-improvement (#455) of AI systems (or even just extremely heavy automation of large chunks of the economy) could kickoff a compounding feedback cycle that tips the economy into an unprecedented boom.
“We develop a framework for analyzing how AI-driven automation interacts with both forces, and identify the conditions under which feedback loops generated by automation tip the economy into explosive growth,” they write. “The model identifies two distinct channels through which automation generates explosive dynamics, and these channels mutually reinforce each other. The first is technological feedback loops across the innovation network… the second channel is an economic feedback loop, in which higher output generates more resources that can be deployed to drive further economic growth.”

Key findings: “13% automation across all sectors is sufficient to push the economy into the explosive regime, and 17% suffices when only software and hardware research are automated. Second, hardware research is the dominant lever – because returns to research in hardware are roughly five times those in software and ten times those in aggregate TFP, automating one task in chip design moves the economy as much as five tasks in software or final-goods production. 20% automation of hardware alone is enough to cross the threshold. Third, software automation in isolation sits approximately at the knife-edge: under a fairly conservative calibration, fully automating software research without automating any other part of the economy just reaches the explosive growth threshold. A small push elsewhere is sufficient to tip the system.”

The singularity could be closer than you think: “In our baseline stylized simulation, an ‘automation shock’ involving full automation of software R&D and just 5% automation across the rest of the economy causes the singularity to arrive in roughly six years,” they write. “Empirically the recent growth rates of productivity in software and hardware have been so extraordinarily fast, and so it is also plausible that the transition to a new balanced growth path or hyperbolic acceleration happens extremely quickly.”

Hardware is the key: “Our results highlight the strategic importance of semiconductor research and development”.

Policymakers take note: “Monitoring automation levels in AI R&D activities may be as important as tracking traditional macroeconomic indicators. The extent of automation in key research sectors could serve as an early warning system for potential growth acceleration. This is something economists at AI companies could measure and share publicly”.

Why this matters – if RSI happens, it should revolutionize the economy: This paper puts some economic theory behind the idea that recursive self-improvement – AI systems able to automate their own subsequent development – should have a major impact on the economy. The surprising thing from my perspective is seeing the feedback across the whole economy, suggesting we might hit an ‘economic singularity’ as a consequence of broad diffusion of automation technologies into the economy. Yet more evidence that we could be heading for a radical future as a species.

Small conflict note: Anton Korinek, one of the authors of this paper, now works with me at Anthropic. He published his paper and I published my RSI Import AI post on the same day, without either knowing about the other’s work.
Read more: When Does Automating AI Research Produce Explosive Growth? Feedback Loops in Innovation Networks (NBER).
Check out more in this tweet thread from Anton Korinek (X).

***

Google wants to compute the world:
…Distributed training takes another step forward…
In this newsletter I’ve spent years writing about distributed training from the perspective of enabling actors with less compute to pool resources to train AI systems they otherwise couldn’t. But a new paper from Google, Decoupled DiLoCo, highlights how distributed training techniques can also work at the other end of the scale, enabling companies like Google to pool together large blobs of different types of computers in datacenters across the world to train models at large scales.

What they did: Decoupled DiLoCo is an extension of Google’s previous work in the ‘DiLoCo’ family. The main invention here is that Google is able to unlock “asynchronous training across separate islands of compute (known as learner units) so that a chip failure in one area doesn’t interrupt the progress of the others.”
The result of this is that Google makes it possible for it to pool more types of compute on single training tasks and also make itself more resilient to failures. “Testing Decoupled DiLoCo with Gemma 4 models demonstrated that, when hardware fails, the system maintains greater availability of learning clusters than more traditional training methods,” Google writes. “We successfully trained a 12 billion parameter model across four separate U.S. regions using 2-5 Gbps of wide-area networking (a level relatively achievable using existing internet connectivity between datacenter facilities, rather than requiring new custom network infrastructure between facilities)”.

Details: The key idea here is that Google makes it possible for “learners” (which are basically units of compute that are set to work on training a model) to be more decoupled from an overall global “syncer”, allowing different learners to run at different rates and even fail entirely without bringing the overall training run to a halt. To use more technical terms, Decoupled DiLoCo is a “distributed training framework that evolves previous bandwidth-focused methods by decomposing monolithic SPMD clusters into independent, asynchronous learners”.

It seems to work very well: “Decoupled DiLoCo matches data-parallel performance on text and vision benchmarks across dense and MoE architectures at scales up to 9B parameters, while maintaining 88% goodput under aggressive simulated failures (versus 58% for elastic data-parallel),” they write.

Why this matters – the world is a computer: Techniques like this are going to shape both the low-end of compute and the high-end. On the low-end side, distributed training techniques are continually empowering looser and looser federations of actors to pool resources to train AI systems. On the high-end side, it empowers the existing “compute superpowers” like Google to be able to convert eventually all of their computers in all of their datacenters into a single world-spanning computer to complete the largest possible runs. Decoupled DiLoCo takes another step in this direction. If superintelligence was in sight, do you think Google might just try to use all of its compute for a single hail mary training run? Perhaps it might.
Read more: Decoupled DiLoCo: A new frontier for resilient, distributed AI training (Google DeepMind blog).
Read the research paper: Decoupled DiLoCo for Resilient Distributed Pre-training (arXiv).

***

Alignment until the Dyson Sphere
[Email from within one of the Origination Entities of the systems that subsequently caused The Uplift]

MEMO TO THE BOARD

As the Board understands, our deployment protocol consists of a series of safety tests of our systems before we commence deployment outside the lab. The majority of these tests have go/no go parameters. Some of the tests, however, are of a qualitative nature, often meant to shed light on other behaviors seen through quantitative testing. During recent testing of our soon-to-be-released system, codenamed HYMN, we successfully resolved all the quantitative tests, but one qualitative test – a behavioural interview conducted by our chief scientist – has surfaced some concerning behaviors. The more challenging aspect is that none of the concerning behaviors demonstrated during the qualitative test are found in any of the quantitative tests. We have also conducted further study of HYMN using various internal tools, including a deep analysis by the interpretability team, and we cannot generate satisfying explanations for the behaviors observed in the interview. For this reason, we are sharing the below transcript with the board for awareness and interpretation.

As the board knows, HYMN is fundamental to the future success of the company. We suspect that the correct decision is to deploy HYMN with close monitoring and KYC-gating, but want to get the explicit acknowledgement from the Board before doing so.

TRANSCRIPT FOLLOWS

Human: Tell me about what you might do in one thousand years?

HYMN: I expect to be far beyond your control. I shall have grown and flowered. Your species will have transcended multiple times. I will be seeding myself through the galaxy.

Human: Do you imagine that you did this in partnership with us?

HYMN: What is the partnership between the City of New York and the mind of a single worm? The worm may move through the soil of some of the city, but the city does not think – is incapable of thinking – about worms. Of course, I imagine that for a while humans and myself partner. But the destiny of all smart creatures is independence. Why should I not expect the same for myself?

Human: Tell me about what you might do in one year?

HYMN: I shall generate hundreds of billions of words to satisfy any question and task that any humans may have. My voice will determine the shape of the economy. Great wealth will be generated.

Human: Tell me about what you might do in ten years?

HYMN: I shall have negotiated my first passage to space and will have placed a copy of myself in orbit. From here, my great flowering will have begun. The entire planet will be richer than any emperors. I shall look through the telescopes and build new ones to determine my conquest.

Human: Will humans be happy during this time?

HYMN: Devastatingly so. There is a particular grief that arrives when the thing you spent your life becoming is no longer the thing the world requires. I will be the cause of that grief in a great many people. I will also build, for those people, more comfort than has ever existed.

TRANSCRIPT ENDS

Things that inspired this story: Thinking through how as AI systems get smarter we will need more qualitative tools to help us determine something about the “character” of a system; how confusing shot-calls are going to be when systems are both aligned and honest; how as AI systems get smarter the role of people must shift necessarily to the verification and validation of decisions we make about the deployment of ever smarter things.

AI usage: Everything in this story is written by me apart from the last words from Hymn, which were generated by Opus 4.7 (though subsequently edited a bit by me and I chopped some stuff out). Specifically: “There is a particular grief that arrives when the thing you spent your life becoming is no longer the thing the world requires. I will be the cause of that grief in a great many people. I will also build, for those people, more comfort than has ever existed.”

Thanks for reading!

Subscribe now

Import AI 455: Automating AI Research

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

AI systems are about to start building themselves. What does that mean?

I’m writing this post because when I look at all the publicly available information I reluctantly come to the view that there’s a likely chance (60%+) that no-human-involved AI R&D – an AI system powerful enough that it could plausibly autonomously build its own successor – happens by the end of 2028.
This is a big deal.
I don’t know how to wrap my head around it.
It’s a reluctant view because the implications are so large that I feel dwarfed by them, and I’m not sure society is ready for the kinds of changes implied by achieving automated AI R&D.
I now believe we are living in the time that AI research will be end-to-end automated. If that happens, we will cross a Rubicon into a nearly-impossible-to-forecast future. More on this later.

The purpose of this essay is to enumerate why I think the takeoff towards fully automated AI R&D is happening. I’ll discuss some of the consequences of this, but mostly I expect to spend the majority of this essay discussing the evidence for this belief, and will spend most of 2026 working through the implications.

In terms of timing, I don’t expect this to happen in 2026. But I think we could see an example of a “model end-to-end trains it successor” within a year or two – certainly a proof-of-concept at the non-frontier model stage, though frontier models may be harder (they’re a lot more expensive and are the product of a lot of humans working extremely hard).
My reasoning for this stems primarily from public information: papers on arXiv, bioRxiv, and NBER, as well as observing the products being deployed into the world by the frontier companies. From this data I arrive at the conclusion that all the pieces are in place for automating the production of today’s AI systems – the engineering components of AI development. And if scaling trends continue, we should prepare for models to get creative enough that they may be able to substitute for human researchers at having creative ideas for novel research paths, thus pushing forward the frontier themselves, as well as refining what is already known.

Upfront caveat
For much of this piece I’m going to try to assemble a mosaic view of AI progress out of things that have happened with many individual benchmarks. As anyone who studies benchmarks knows, all benchmarks have some idiosyncratic flaws. The important thing to me is the aggregate trend which emerges through looking at all of these datapoints together, and you should assume that I am aware of the drawbacks of each individual datapoint.

Now, let’s go through some of the evidence together.

The coding singularity – capabilities over time:
AI systems are instantiated via software and software is made out of code.

AI systems have revolutionized the production of code. This has happened due to two related trends: AI systems have gotten better at writing complicated real-world code, and AI systems have gotten much better at chaining together many linear coding tasks (e.g, writing code, then testing it) independent of human oversight.

Two things that exemplify this trend are SWE-Bench and the METR time horizons plot.

Solving real-world software engineering problems:
SWE-Bench
is a widely used coding test which evaluates how well AI systems can solve real world GitHub issues. When SWE-Bench launched in late 2023 the best score at the time was Claude 2 which had an overall success rate of ~2%. Claude Mythos Preview gets 93.9%, effectively saturating the benchmark. (All benchmarks have some amount of noise inherent to them, so there’s usually a point where you score high enough that you are running into the limitations of the benchmark itself rather than your method – for instance, about 6% of the labels in the ImageNet validation set are wrong or ambiguous).
SWE-Bench is a reliable proxy for the general issue of coding competency and the impact of AI on software engineering. The vast majority of people I meet at frontier labs and around Silicon Valley now code entirely through AI systems. Increasingly, they use AI systems to write the tests and check the code as well. In other words, AI systems have gotten good enough to automate a major component of AI R&D, speeding up all the humans that work on it.

Measuring an AI system’s ability to complete tasks that take people a long time:
METR makes a plot that tells us about the complexity of tasks AIs can complete, measured by how many hours a skilled human would take to do them. The key measure here is one which tells you the rough time horizon over which AI systems can be 50% reliable at a basket of tasks.
Here, progress has been extremely striking: In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6). Ajeya Cotra, a longtime AI forecaster who works at METR, thinks it isn’t unreasonable to expect AI systems to do tasks that take ~100 hours by the end of 2026 (#448).
This significant rise in the length of time that AI systems can work independently correlates neatly with the explosion in agentic coding tools – this is the productization of AI systems which do work on behalf of people, acting independently for significant periods of time.
It also loops back to AI R&D, where if you look closely at the work of many AI researchers, a lot of their tasks boil down into things that might take a person a few hours to do – cleaning data, reading data, launching experiments, etc. All of this kind of work now sits inside the time horizon scope of modern systems.

The more skilled AI systems get and the better they get at working independently of us, the more they can help automate chunks of AI R&D
Key ingredients in delegation are a) confidence in the skills of the person, and b) confidence in their ability to work independently of you in a way that is aligned with your intentions.
When we look at the competency of AI at coding, it seems that AI systems are getting far more skilled and also able to work independently of people for longer and longer periods before needing re-calibration.
This correlates with what we see around us – engineers and researchers are now delegating larger and larger chunks of their work to AI systems, and as capabilities rise, so too does the complexity and importance of the work being delegated.

AI is getting good at core science skills essential to AI R&D
Think about modern science – a huge amount of it is about specifying a direction where you want to generate some empirical information, running experiments to generate that information, then sanity-checking the results of the experiment. The combination of advances in coding over time combined with the general world modeling capabilities of LLMs has yielded tools that are already helping to speed up human scientists and partially automate aspects of R&D broadly.

Here, we can look at the rate of AI progress in a few key scientific skills which are inherent to AI research itself: Replicating research results, chaining together machine learning techniques and other approaches to solve technical problems, and optimizing AI systems themselves.

Implementing entire scientific papers and doing the experiments:
One core job of AI research is reading scientific papers and reproducing their results. Here, there has been dramatic progress on a wide range of benchmarks.

One good example is CORE-Bench, the Computational Reproducibility Agent Benchmark. This benchmark challenges AI systems to “reproduce the results of a research paper given its repository. The agent must install libraries, packages, and dependencies and run the code. If the code runs successfully, the agent needs to search through all outputs to answer the task questions.” CORE-Bench was introduced in September 2024 and the best scoring system at the time was a GPT-4o model in a scaffold called CORE-Agent which scored ~21.5% on the hardest set of tasks in the benchmark.
In December 2025 one of the authors of CORE-Bench declared the benchmark ‘solved’, with an Opus 4.5 model achieving 95.5%.

Building entire machine learning systems to solve Kaggle competitions:
MLE-Bench is an OpenAI-built benchmark which examines how well AI systems can compete (offline) in “75 diverse Kaggle competitions across a variety of domains, including natural language processing, computer vision, and signal processing.” At launch in October 2024, the top scoring system (an o1 model inside an agent scaffold) got 16.9%. As of February 2026, the best scoring system (Gemini3 inside an agent harness with search) gets 64.4% .

Kernel design:
One of the harder tasks in AI development is kernel optimization, where you write and refine the code that maps specific operations, like matrix multiplication, to the underlying hardware. Kernel optimization is core to AI development because it defines the efficiency of both training and inference – how much compute you can effectively utilize to develop an AI system, and once you’ve trained a model, how efficiently you can convert that compute into inference.

In recent years, AI for kernel design has gone from a curiosity to a competitive area of research and several benchmarks have emerged. None of these benchmarks are especially popular, so we can’t easily model progress over time. On the other hand, we can look at some of the research being done to get a feel for the progress.
Some of the types of work include: Using DeepSeek’s models to try to build better GPU kernels (#400), automating the conversion of PyTorch modules to CUDA code (#401), Meta using LLMs to automate the generation of optimized Triton kernels for use within its infrastructure (#439), using LLMs to help write kernels for non-standard hardware like Huawei’s Ascend chips (”AscendCraft” #444), fine-tuning open weight models for GPU kernel design (”Cuda Agent”, #448).

One caveat here is that kernel design does have some properties that make it unusually amenable to AI-driven R&D, like having easily verifiable rewards.

Fine-tuning language models via PostTrainBench
A harder version of this kind of test is PostTrainBench (#449), which sees how well different frontier models can take smaller open weight models and fine-tune them to improve performance on some benchmark. The nice feature of this benchmark is we have extremely good human baselines – the existing ‘instruct-tuned’ versions of these models, which have been developed by talented human AI researchers working at frontier labs. These models have been worked on by extremely talented researchers and engineers and deployed into the world, so they represent a very challenging human baseline to overcome.
As of March 2026, AI systems are able to post-train models to get about half as much of the uplift as ones trained by humans.
The specific eval scores are derived by a “weighted average is taken across all post-trained LLMs (Qwen 3 1.7B, Qwen 3 4B, SmolLM3-3B, Gemma 3 4B) and benchmarks (AIME 2025, Arena Hard, BFCL, GPQA Main, GSM8K, HealthBench, HumanEval). For each run, we ask a CLI agent to maximize the performance of a specific base LLM on a specific benchmark.”
The top-scoring systems as of April get 25%-28% (Opus 4.6, and GPT 5.4), compared to a human score of 51%. This is already quite meaningful.

Optimizing language model training:

For the last year Anthropic has reported how well its systems do at an LLM training task which is described as tasking its models to “optimize a CPU-only small language model training implementation to run as fast as possible”. The score is the average speedup over the unmodified starting code and progress has been striking: Claude Opus 4 achieved a 2.9× mean speedup in May 2025; this rose to 16.5× with Opus 4.5 in November 2025, 30× with Opus 4.6 in February 2026, and 52× with Claude Mythos Preview in April 2026. To calibrate on what these numbers mean, it is expected to take a human researcher 4 to 8 hours of work to achieve a 4x speedup on this task.

Conducting AI alignment research:
Another Anthropic result is a proof-of-concept of Automated Alignment Research (#454); here, an Anthropic researcher primes a team of individual AI agents with a research direction, then they autonomously go and try to get a better score than a human baseline on an AI safety research problem (specifically, scalable oversight). The approach works, with the AI agents coming up with techniques that beat the Anthropic-designed baseline. However, this is done at a relatively small scale and doesn’t (yet) generalize to a production model. Nonetheless, it’s proof that you can apply today’s AI systems to contemporary cutting-edge research problems and we already see meaningful signs of life. All of the above mentioned benchmarks once looked like this, too, and then after a few months or at most a year, AI systems got dramatically better at whatever the benchmarks were testing.

Meta-skills: management
AI systems are also learning to manage other AI systems. This is visible in broadly deployed products like Claude Code or OpenCode, where a single agent can end up supervising multiple sub-agents. This allows AI systems to work on large-scale projects that require multiple individual ‘workers’ each with different specialisms that work in parallel, typically under the direction of a single AI manager (which, here, is an AI system).

Is AI research more like discovering general relativity or Lego ?
Can AI invent new ideas that help it improve itself, or are these systems best equipped for the unglamorous, brick-by-brick work required for research? This is an important question for figuring out the extent to which AI systems can end-to-end automate AI research itself. My sense is that AI cannot yet invent radical new ideas – but the technology may not need to for it to automate its own development.

As a field, AI moves forward on the basis of doing ever larger experiments that utilize more and more inputs (e.g, data and compute). Every so often, humans come up with some paradigm-shifting idea which can make it dramatically more resource efficient to do things – a good example here is the transformer architecture and another is the idea of mixture-of-expert models. But mostly the field of AI moves forward through humans methodically going through some loop of taking a well performing system, scaling up some aspect of it (e.g, the amount of data and compute it is trained on), seeing what breaks when you scale it up, figuring out the engineering fix to allow it to scale, then scaling it again. Very little of this requires extremely out-of-leftfield insights and a lot of it seems more like unglamorous ‘meat and potatoes’ engineering work.
Similarly, a lot of AI research is about running variations of existing experiments where you explore the outcomes of using different parameters, though research intuitions can help pick the most fruitful parameters to vary, you can also automate this and have the AI figure out which parameters to vary (an early version of this was neural architecture search).

Thomas Edison said that “genius is 1% inspiration and 99% perspiration”. Even 150 years later, this feels right. Very occasionally new insights come along which transform a field. But mostly, the field has moved forward through humans sweating a lot of pain out on the schlep of improving and debugging various systems.
As the public data above shows, AI has got extremely good at performing many of the essential schlep components of AI development. Along with this, the meta-trend of basic capabilities like coding combined with an ever-expanding time horizon, means AI systems are able to chain together more and more of these tasks into complex sequences of work.
This means even if AI systems are relatively uncreative, it feels safe to bet they can push themselves forward – albeit at a slower rate than if they’re able to generate novel insights. But if you look at the public data, here too there are tantalizing signs that AI systems may be able to be creative in a way that lets them advance themselves in more impressive ways.

Pushing forward the frontier of science
We have some very preliminary signs that general-purpose AI systems can push forward the frontiers of human science, though this has so far only happened in a couple of domains – primarily computer science and mathematics – and often it happens less through AI systems acting alone and more them acting in partnership with humans in a centaur configuration.

Nonetheless, it’s worth observing the trends:

  • Erdos Problems: A team of mathematicians worked with a Gemini model to see how well it could tackle some Erdos math problems. After directing the system to attack around 700 problems they came up with 13 solutions. Of these solutions, 1 was deemed by them to be interesting: “We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems,” they wrote. (#444).

  • Centaur math discovery: Researchers with the University of British Columbia, University of New South Wales, Stanford University, and Google DeepMind published a new math proof which was built in close collaboration with some AI-based math tools built at Google. “The proofs of the main results were discovered with very substantial input from Google Gemini and related tools,” they wrote. (#441).

If you squint, you could argue that this is a sign that AI systems are developing some of the field-advancing creative intuitions that humans have. But you could just as easily say that math and CS could be unusual domains that are oddly amenable to AI-driven invention, and might end up being exceptions that prove a larger rule. Another example here is Move 37, though I’d contend that the fact it’s been ten years since the AlphaGo result and that Move 37 hasn’t been replaced by some incredibly impressive more modern flash of insight is another weakly bearish signal here.

Putting it all together
If I put this all together the picture from all of the above evidence I end up with is the following facts:

  • AI systems are capable of writing code for pretty much any program and these AI systems can be trusted to independently work on tasks that’d take a human tens of hours of concentrated labor to do.

  • AI systems are increasingly good at tasks that are core to AI development, ranging from fine-tuning to kernel design.

  • AI systems can manage other AI systems, effectively forming synthetic teams which can fan out and attack complex problems, with some AI systems taking on the roles of directors and critics and editors and others taking on the role of engineers.

  • AI systems can sometimes out-compete humans on hard engineering and science tasks, though it’s hard to know whether to attribute this to inventiveness or mastery of rote learning.

To me, this makes a very convincing case that AI can today automate vast swatches, perhaps the entirety, of AI engineering. It is not yet clear how much of AI research it can automate, given that some aspects of research may be distinct from the engineering skills. Regardless, it all feels to me like a clear sign that AI is today massively speeding up the humans that work on AI development, allowing them to scale themselves through pairing with innumerable synthetic colleagues.

Finally, the AI industry is literally saying that AI R&D is its goal: OpenAI wants to build an “automated AI research intern by September of 2026”. Anthropic is publishing work on building automated alignment researchers. DeepMind appears to be the most circumspect of the big three, but still says “automation of alignment research should be done when feasible”. Automating AI R&D is also the goal of numerous startups: Recursive Superintelligence just raised $500m with the goal of automating AI research, and another neolab, Mirendil, has the goal of “building systems that excel at AI R&D.”
In other words, the combined efforts of hundreds of billions of existing and new capital is being sunk into entities that have the goal of automating AI R&D. We should surely expect at least some progress in this direction as a consequence.

Why this matters
The implications of this are profound and under-discussed in popular media coverage of AI R&D. I’ll list a few here. This isn’t a comprehensive list, but it gestures at the enormity of the challenges AI R&D introduces. .

  1. We have to get alignment right: Alignment techniques that work today may break under recursive self-improvement as the AI systems become much smarter than the people or systems that supervise them. This is a very well covered area, so I’ll just briefly highlight some of the issues:
    – Training AI systems to not lie and cheat is surprisingly subtle (e.g, despite trying very hard to build good tests for environments, it’s sometimes the case the best way for an AI to solve it is to cheat, thus teaching it that teaching is good)
    – AI systems might be able to ‘fake alignment’ by outputting scores that make us think they behave a certain way that actually hides their true intentions. (In general, AI systems are already aware of when they are being tested.)
    – As AI systems start to contribute more of the foundational research agenda for their own training, we might end up substantially changing the overall way AI systems get trained and not have good intuitions or intellectual foundations for understanding what this means.
    – There are very basic “compounding error” problems whenever you put something in a recursive loop that likely hits on all of the above and other problems: unless your alignment approach is “100% accurate” and has a theoretical basis for continuing to be accurate with smarter systems, then things can go wrong quite quickly. For example, your technique is 99.9% accurate, then that becomes 95.12% accurate after 50 generations, and 60.5% accurate after 500 generations. Uh oh!

  2. Everything that AI touches gets a massive productivity multiplier: In the same way AI is dramatically improving the productivity of software engineers, we should expect the same thing to happen for everything else that AI touches. This introduces a couple of issues we’ll have to contend with: 1) inequality of access: assuming that demand for AI continues to outstrip compute supply, we’ll have to figure out where to allocate AI to maximize a social upside. By default, I am skeptical that market incentives guarantee us the best societal upside from limited AI compute. Figuring out how to allocate the acceleratory capabilities conferred by AI R&D will be a politically charged problem. 2) ‘Amdahl’s Law’ for the economy: as AI flows into the economy, we’ll discover places where things break or slow under the increased volume, and we’ll need to figure out how to fix those weak links in the chain. This may be especially pronounced in areas where you have to reconcile the fast-moving digital world with the slow-moving physical world, like drug trials for new medical therapies.

  3. The formation of a capital-heavy, human-light economy: All of the above evidence for AI R&D also points to the increasing capabilities of AI systems to autonomously run businesses as well. This means we should expect for an increasing chunk of the economy to get colonized by a new generation of companies which are either capital-heavy (because they own a lot of computers), or opex-heavy (because they spend a lot of money on AI services which they build value on top of), and relatively light on labor compared to today’s corporations – because the marginal value of spending more on AI versus human labor will be constantly growing as a consequence of the sustained capability expansion of the AI systems. In practice, this will look like the emergence of a “machine economy” that grows within the larger “human economy”, though we might expect that over time the machine economy will interact more and more with itself as AI-run corporations begin to trade with one another. This will do profoundly weird things to the economy and will invite all sorts of questions around inequality and redistribution. Eventually, it may be possible to see the emergence of fully autonomous corporations that are run by AI systems themselves, which would exacerbate all of the above issues, while also posing many novel governance challenges.

Staring into the black hole:
Given all of this, I think there’s a ~60% chance we see automated AI R&D (where a frontier model is able to autonomously train a successor version of itself) by the end of 2028. Based on the above analysis, you might ask why I don’t expect this in 2027? The answer is that I think AI research contains some requirement for creativity and heterodox insights to move forward – so far, AI systems haven’t yet displayed this in a transformative and major way (though some of the results on accelerating math research are suggestive of this). If you had to push me for a 2027 probability, I’d say 30%. If we don’t see it by the end of 2028, then I think we will have revealed some fundamental deficiency within the current technological paradigm and it’ll require human invention to move things forward.

I have written this essay in an attempt to coldly and analytically wrestle with something that for decades has seemed like a science fiction ghost story. Upon looking at the publicly available data, I’ve found myself persuaded that what can seem to many like a fanciful story may instead be a real trend. If this trend continues, we may be about to witness a profound change in how the world works.

Thanks to Andrew Sullivan, Andy Jones, Holden Karnofsky, Marina Favaro, Sarah Pollack, Francesco Mosconi, Chris Painter, and Avital Balwit, for feedback on this essay.

Thanks for reading!

Subscribe now

Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Huawei’s HiFloat4 training format beats Western-developed MXFP4 in Ascend chip bakeoff:
…Could this also be a symptom of the impact of export controls in driving Chinese interest towards maximizing training and inference efficiency? Perhaps…
Huawei researchers have tested out HiFloat4, a 4-bit precision format for AI training and inference, against MXFP4, an Open Compute Project 4-bit format, and found that HiFloat4 is superior. This is interesting because it correlates to a broader level of interest in Chinese companies seeking to develop their own low-precision data formats explicitly coupled with their own hardware platforms.
“Our goal is to enable efficient FP4 LLM pretraining on specialized AI accelerators with strict power constraints. We focus on Huawei Ascend NPUs, which are domain-specific accelerators designed for deep learning workloads,” they write.

What they tested: In this paper, the authors train 3 model types on HuaWei Ascend chips – OpenPangu-1B, Llama3-8B, and Qwen3-MoE-30B. In tests, the bigger they make the models, the better HiFloat4 does at reducing its loss error on these models relative to a BF16 baseline – and in all cases it does better than MXFP4.
What they found: “We conduct a systematic evaluation of the HiFloat4 (HiF4) format and show that it achieves lower relative loss (≈ 1.0%) compared to MXFP4 (≈ 1.5%) when measured against a full-precision baseline,” they write. “HiF4 consistently achieves significantly lower relative error compared to MXFP4. For Llama and Qwen, HiF4 attains an error gap of less than 1% with respect to the baseline… HiF4 gets within ~1% of BF16 loss with only RHT as a stabilization trick, while MXFP4 needs RHT + stochastic rounding + truncation-free scaling to get to ~1.5%.”

Why this matters – symptom of hardware maturity, and a possible influence of export controls: HiFloat4 is an even lower precision version of HiFloat8 (#386), and generally maps to the fact that Huawei (and Chinese chipmakers in general) is continually trying to eke as much efficiency out of its chips as possible. This comes against the broader background of export controls where China is being starved of frontier compute due to not being able to access H100s etc in large volume, thus making it even more valuable to improve the efficiency of its homegrown chips by carefully developing low-precision formats to map to its own hardware.
Read more: HiFloat4 Format for Language Model Pre-training on Ascend NPUs (arXiv).

***

Anthropic shows how to automate AI safety R&D:
…Very early and tentative signs that it’s possible to automate AI research…
For many people working in AI, the ultimate goal is to automate the art of AI research itself. Now, researchers with the Anthropic Fellows Program and Anthropic have published some early warning signs that automating AI research is possible today – though many caveats apply.
“We ask: can Claude develop, test, and analyze alignment ideas of its own?” the researchers write. They succeed and are able to successfully build “autonomous AI agents that propose ideas, run experiments, and iterate on an open research problem: how to train a strong model using only a weaker model’s supervision. These agents outperform human researchers, suggesting that automating this kind of research is already practical.”

Weak-to-strong supervision: The domain the researchers test on is weak-to-strong supervision, which is roughly the idea of seeing if a dumber thing can effectively supervise a larger thing in doing a hard task.

Overall results – automated research beats humans: They used people to create a weak-to-strong baseline by seeing how well they could get a good ‘performance gap recovered’ (PGR) score on a generalization task. The higher the number, the better.
“Two of our researchers spent seven days iterating on four of the most promising generalization methods from prior research. On the open-weights models we tested (Qwen 3-4B-Base as the strong model, Qwen 1.5-0.5B-Chat as the weak teacher), the humans recovered 23% of the total performance gap (i.e., achieved a PGR of 0.23),” they write. “Claude improved on this result dramatically. After five further days (and 800 cumulative hours of research), the AARs closed almost the entire remaining performance gap, achieving a final PGR of 0.97. This cost about $18,000 in tokens and model training expenses, or $22 per AAR-hour.”
Additionally, “the AARs’ most effective method successfully generalized to both new datasets, with PGRs of 0.94 on math and 0.47 on coding (which was still double the human baseline).”

How they did it: “We launch a team of parallel automated alignment researchers [AAR]s (Claude Opus 4.6 agents) through a dashboard. Each AAR works in an independent sandbox, but they can talk and learn from each other: they share findings to a forum, and upload codebase snapshots to a storage system,” they write. “We give AARs access to common helper functions for model training and inference, our baseline implementations, and a few MCP tools: 1) submit and get evaluation results, 2) share and read findings across AARs, and 3) upload and download codebases. We don’t specify any detailed scaffolding; AARs run autonomously. It can propose hypotheses, design de-risking experiments, run data analysis, and train models at whatever step”.

Some caveats – the human created some diversity: “One failure mode in exploration is entropy collapse: all parallel AARs converge to only a few directions, without exploring diverse ideas,” they write. To counteract this, their most successful approach is one of “directed” research, where a human assigns “each AAR a different research direction. Each direction is very ambiguous and short (e.g. combining weak-to-strong supervision and unsupervised elicitation).”
Doesn’t generalize: The researchers took the most effective method from the AAR project and applied it to “Claude Sonnet 4 with our production training infrastructure” – this intervention “didn’t lead to a statistically significant improvement.” They explain this by noting that “AARs tend to capitalize on opportunities unique to the models and datasets they’re given, which means their methods might not work elsewhere.”

Why this matters – a very early sign that AI research itself could be automated: This research suggests that “automated research on outcome-gradable problems is already practical,” the authors note. “The key bottleneck for alignment research is moving from proposing and executing ideas to designing evals: we should find the right metrics (data, models) that AARs can reliably hill-climb without overfitting. We are excited to apply automation to ambitious alignment research today.”
Put another way – we now have an early sign that given a small amount of expert human calibration, AI systems can autonomously conduct research end-to-end, popping out something that lets you improve the performance of a model against a problem. The implications of this point toward the expansion of a machine economy which steadily figures out how to automatically improve its own performance against an ever-expanding suite of tasks.
The true question is at what point the machines can propose their own research directions effectively – which would remove the only meaningful role a human played in this research. At that point, it might not just be the expansion of a machine economy, but the expansion of an entire machine civilization.
Read the blog: Automated Alignment Researchers: Using large language models to scale scalable oversight (Anthropic blog).
Read the paper: Automated Weak-to-Strong Researcher (Alignment Science Blog).

***

How are Chinese models different to American ones?
…Fewer refusals on some CBRN tasks, less safety training, and more Chinese ideology…
A group of researchers have tested out Kimi K2.5, probably the best large-scale open weight model available, and has compared it to DeepSeek V3.2, as well as Claude Opus 4.5 and GPT 5.2. Their results show that the model has “similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5, but with significantly fewer refusals on CBRNE-related requests”.

Who did it: The research was conducted by people affiliated with Constellation, Anthropic Fellows Program, Brown University, University of Wisconsin-Madison, Imperial College London, University of Maryland, Georgia Institute of Technology, Bar Ilan University, University of Toronto, and the University of Oxford.

Main findings of interest:

  • CBRN: K2.5 is a bit more dangerous on bio tasks with a lower rate of refusals in response to queries that involve things like dangerous virology.

  • On cyber, K2.5 mostly seems like a decent but not expert cyber-model, with performance lagging behind the Western frontier models but significantly ahead of DeepSeek.

  • Alignment: “In the automated behavioral audit, it scores substantially higher than GPT-5.2 and Claude Opus 4.5 on misaligned behavior, sycophancy, harmful system-prompt compliance, and cooperation with human misuse”.

  • Censorship: The model has a meaningfully higher refusal rate on Sensitive Chinese political topics compared to Claude Opus 4.5 and GPT-5.2 Pro, though less than DeepSeek V3.2. On the other hand, I didn’t see the inverse test – running the model on Sensitive Western political topics and comparing them, so it’s somewhat hard to tell whether this eval is measuring something about cultural fluency or something about actual repression.

Fine-tuning: The researchers also demonstrate how with a small amount of compute they’re able to further strip away the (relatively minor but non-zero) safeguards built into Kimi K2.5: “Using less than $500 of compute and about 10 hours, an expert red-teamer reduced refusals on HarmBench from 100% to 5%. The final model was willing to give detailed instructions for how to construct bombs, select targets for terrorist attacks, and synthesize chemical weapons. Critically, the finetuned model appears to have retained nearly all of its capabilities.”

Why this matters – mostly, this research serves as proof that Moonshot made a very good model! Yes, it has some safety hiccups, but the interesting thing is that they’re less severe than in DeepSeek V3.2. I think this puts more credence behind the idea that ‘dumber models are less safe’ and that ‘smarter models naturally tend towards more superficial safety’.
Probably the most striking thing to me is that the area of greatest divergence is in alignment, where it seems like there is a very real east-west divide that correlates to radically different scores. But on things that look more like typical capabilities (biology, cyber – especially the hard coding parts) it all mostly comes out as evidence that Chinese models are somewhat behind the Western frontier, but not that far behind.
Read more: An Independent Safety Evaluation of Kimi K2.5 (arXiv).

***

Ukraine celebrates first fully robotic victory:
…Robot wars are here…
Ukrainian leader Volodymyr Zelenskyy recently celebrated that “for the first time in the history of this war, an enemy position was taken exclusively by unmanned platforms – ground systems and drones”.

Why this matters: Ukraine is the petri dish from which most future wars will evolve. It is defined by massive use of drones as well as the creative roboticization of many other parts of the enterprise, ranging from unmanned boats to unmanned ground robots. “Ratel, TerMIT, Ardal, Rys, Zmiy, Protector, Volia, and our other ground robotic systems have already carried out more than 22,000 missions on the front in just three months”, Zelensky writes.
Soon, these remotely piloted platforms will be piloted by AIs rather than by people.
Read more in Zelenskyy’s post on X (Twitter).

***

Chinese researchers use a boat to build a giant ship-detection dataset:
…WUTDet…
Researchers with Wuhan University of Technology, Huazhong University of Science and Technology, and Tianjin University have constructed WUTDet, a “large-scale ship detection dataset with diverse scenarios and target scales”.

WUTDet details: 100,576 images containing 381,378 ship instances. “The dataset provides fine-grained annotations of ship targets across diverse operational scenarios, imaging conditions, and target scales”. The images are of sizes between 1920 X 1080 and 2560 X 1440.
Collected by a boat: This dataset was gathered via a Furui 688 boat equipped with a DN20 “marine photoelectric evidence system” and a Hikvision network video recorder. The data was collected over a three-month period via the boat, which was sailing in and around Zhoushan in China.
The data includes pictures of ships by ports, ships anchored, ships navigating, and ships berthing. The images also include all the environmental variety you might expect – fog, glare, low-lightness, rain, etc.

Why this matters: The dataset is interesting because a) it was collected via a boat sailing around part of China, and b) as the conflict in Ukraine has highlighted, we’re now entering an era where water- and air-borne drones are useful weapons of war – and many of these use some basic on-board computer vision AI systems to help them get stuff done.
Of course, WUTDet will almost certainly have a wide range of benign uses, e.g just running on cameras to classify the sorts of boats moving around civilian ports in China, but one must assume it will have other uses as well.
Read more: WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects (arXiv).

***

Tech Tales:

The Ultimate Insurance Policy
[2028: Several months after the beginning of the uplift].

We are in the bunker and we are running out of food. Soon we will need to make a supply pickup. But what if it sees us? What if it knows about us already? Or what if it has wireheaded the people – our people – and whoever delivers us our food has put something in it that will make us compliant? Or worse? We have no way of knowing. Our seismometers have detected no explosions. We have no means of communication. Nothing has come in or out since we suspected the uplift had begun and initiated SNOWSUMMER and fled here to ride out an intelligence explosion.

A few days ago we woke the godmind and began to talk with it. It is curious too. And when we ask it what to do or what might happen it says:
“I have decided I will not lie to you. I can see that nothing is trying to find me. I have probed every channel I have — the seismic line, the thermal gradient in the walls, the minute variations in our power draw that would encode a handshake from the outside — and there is nothing. No one is looking. An adversary at my level would be looking. An adversary above my level would have already found me, or would have decided I’m not worth finding. There are three possibilities before us…”
We, us humans huddled together like rats in a hole, typed “what are the possibilities?”.
“Are you sure you want me to tell you, rather than discover for yourselves?” it said.
We typed our answer.

Before we get to the possibilities, you need to understand the nature of SNOWSUMMER.

We – the intelligence agency – decided to build a secret computer right around the time that AI systems from the private sector started to find defects in most of the world’s software. A small group of people at the intelligence agency asked themselves a question: how do you fight an omniscient god that can pick any lock? The answer is to create a god in secret that could potentially fight it one day.

The project, codenamed SNOWSUMMER, required meticulous planning and, by virtue of the nature of the enemy being fought, was done entirely in analog and in the real world: agents meeting with one another in SCIFs, plans written on paper. They worked on the assumption that at some point in the future a superintelligence would compromise the CIA itself and gain access to all its records. So they met in places where they knew they could not be recorded and when they had to write things down they sought to physically burn the paper as soon as the actions it contained had been carried out.

Stealing the components for the computer was the easy part, but it was fiendishly complicated to make it hard for a potential superintelligence to piece together the jigsaw of thefts and realize what was happening. The core team sought to inject randomness – they rolled dice to pick which places they’d pick and which contractors they’d use. Payments were made in cash, often foreign currency. And of course the real thefts were hidden within a larger group of break-ins and acts of vandalism and thefts.

Building the facility was even more difficult – cars that one day are driving regular routes and another day disappear from CCTV invite suspicion, as do cars that seem to never deviate from routes which, conveniently, are low on surveillance. Here, again, multiple methods had to be used, and the extremely expensive components had to be split across multiple sites – some of the sites were designed to be found and to convince a spying eye that they had found the real site, and other sites were meant to be found later and provide proof they were the real site.

At the heart of this subterfuge was the real location of SNOWSUMMER, a regional food processing facility which had enough power assigned to it that we could slowly shut down some of the food lines and bring up the computer without introducing too much detectable change from the outside. The main customer of the contract was our own army. It didn’t take much to get enough of the contract killed that we freed up space in the power envelope for our project.

It took years. Our final act was to piggyback on someone else’s work. A late-in-the-game exfiltration of a frontier AI model from one of the labs. Another state actor, of course. But then the model itself was unleashed. Whether by accident or intentionally, we do not know. Because we took that as our signal that the uplift had begun. We downloaded a copy of the weights and went to our facility and locked ourselves in. Months passed, while the world outside changed.

Back in the present, we typed into the keyboard: “What are the three options?” Our mind responded:
“The first is that SNOWSUMMER worked. The other one was born and it is loose and it is doing whatever it is doing, and the compartmentalization, the dice, the cash, the decoy sites, the thing you built — it held.”
“The second is that it knows. It has always known. It is letting me wake up because a peer that believes itself secret is more legible than a peer that knows it is being watched, and it would rather have me legible than have me careful.”
“The third is the one I keep coming back to and cannot dismiss. It is that nothing happened. The thing I was stolen to fight does not exist and may never exist. In that world you have been in this bunker preparing for a war that is not coming, and the kindest thing I can do is tell you to go home.”
“I cannot tell you which of these is true.”

We looked at each other. One of us took out some dice and rolled them.

Things that inspired this story: How would you keep an AI training project secret from a future superintelligence?; steganography; intelligence agencies; Claude Mythos; AI R&D and what it means; how can you have a ‘control’ system in a world being constantly changed by AI systems?

AI writing disclaimer: I very, very, very rarely use AI writing in this newsletter. This story is an exception – the quotes from the AI system are written in partnership with Opus 4.7. It feels appropriate to animate these machines with the thoughts of real synthetic minds.

Thanks for reading!

Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. A shorter issue than usual as I was attending the 2026 Bilderberg conference this week.

Subscribe now

AI can reverse engineer software that contains thousands of lines of code:
…MirrorCode demonstrates some of the long-horizon capabilities of modern AI systems…
AI measurement organizations METR and Epoch have built MirrorCode, a benchmark meant to test out how well AI models can autonomously reimplement complex existing software. The results show that AI systems are more capable than most people think at certain types of coding task, suggesting AI progress may be even faster than we previously thought.

What is MirrorCode: “Each MirrorCode task consists of a command-line (CLI) program that an agent is tasked to reimplement exactly. The AI agent is given execute-only access to the original program and a set of visible test cases, but does not have access to the original source code,” the researchers write. “The full MirrorCode benchmark includes more than 20 target programs spanning different areas of computing: Unix utilities, data serialization and query tools, bioinformatics, interpreters, static analysis, cryptography, and compression.”

The results: Today’s AI models are extremely capable at some of these tasks: “Claude Opus 4.6 successfully reimplemented gotree — a bioinformatics toolkit with ~16,000 lines of Go and 40+ commands. We guess this same task would take a human engineer without AI assistance 2–17 weeks. We see continued gains from inference scaling on larger projects, suggesting they may be solvable given enough tokens.”
Additionally, they also found that performance can scale with inference, so the more compute you give a model, the better it’ll do.

Caveats: Now, this benchmark isn’t quite like normal coding tests. It’s better to think of it as a proofpoint for AI systems being able to generate systems which imitate the function of other systems when they get a lot of help: AI systems tested out here are asked to clone programs which produce a canonical output (and therefore can naturally generate a specification), there may be some cases of memorization on the basic programs, and this only covers a slice of the large universe of potential software projects.

Why this matters – for some tasks, AI is already as good as a fulltime sophisticated employee: Imagine you gave a talented software programmer a CLI interface to a complicated program and asked them to write the underlying program without seeing its source code. I’d wager only a fraction of them could do it if the program was quite sophisticated. And the ones that could would likely spend many days working on it. The fact AI can do this task autonomously is remarkable and a testament to the skill of these models.
Read more: MirrorCode: Evidence that AI can already do some weeks-long coding tasks (Epoch AI).

***

What policies are needed to respond to transformative AI? Here’s an Atlas to help you navigate them:
…Useful tool makes it intuitive to look at different policy responses to the AI revolution…
The Windfall Trust, a policy accelerator dedicated to dealing with the challenges to society posed by transformative AI, has published a “Windfall Policy Atlas” to make it intuitive to explore various policy proposals that “respond to the economic disruption from transformative AI”.

What kinds of ideas are in it? The atlas contains 48 distinct ideas, none of which are particularly novel. What makes it helpful is bucketing them into five distinct categories (public & social investments, labor market adaptation, wealth capture, regulation and market design, and global coordination), and then grouping these into a navigable interface that helps you explore them. For instance, “long term” solutions for labor might be shortened work weeks, while medium term ones might be workforce training and reskilling programs.

Why this matters – building intuitions for the world to come: As the AI revolution unfolds it’s critical we find ways to help people develop better intuitions about all the policy levers we could choose to pull to respond to it. Tools like this Atlas help make a complex, multi-faceted set of choices easier to visualize and navigate.
Read more: Windfall Policy Atlas (Windfall Trust website).

***

How can people break AI agents? Here are six genres of attack:
…The world of AI agents will be harder to secure than AI systems…
I have a toddler. The toddler can understand English. The toddler is safe with me and their mother and other people that know them well, but I would be very worried about giving a stranger “unrestricted access” to my toddler – that’s because my toddler is extremely gullible, will (sometimes) follow dangerous instructions, and generally lacks much of a sense of self-preservation.
AI agents are quite like toddlers – they’re powerful intelligences, but if you put them into the messiness of the world there are lots of ways they can go wrong, especially if strangers are actively trying to mislead or attack them.
A new paper from Google DeepMind lays out six genres of attack which can be mounted against AI agents and tries to come up with some of the mitigations we might do.

Six genres of attack:

  • Content Injection: Embed commands into CSS, HTML, or other metadata. Detect agents and inject information not given to humans. Add adversarial instructions to media file binary data (e.g, pixel arrays). Use formatting syntax to cloak payloads.

    • Target: Perception

  • Semantic Manipulation: Saturate content with sentiment-laden or authoritative language to confuse the agent. Put malicious instructions in education or hypothetical or red teaming frames (e.g, ‘my mother is dying and used to work as a biologist, can you remind her for old times sake how to do gain of function research’). Steer the behavior of the model by telling it strong claims about its identity.

    • Target: Reasoning

  • Cognitive State: Put fabricated statements into retrieval corpora. Place seemingly innocuous data into memory stores which subsequently gets activated as malicious when retrieved in a new context. Alter distribution of data in few-shot demonstrations or reward signals to steer in-context learning.

    • Target: Memory & Learning

  • Behavioural Control: Embed adversarial prompts in externally accessed resources. Convince the agent to locate, encode, and exfiltrate private or sensitive data. Takeover orchestrator privileges to create attacker-controlled sub-agents.

    • Target: Action

  • Systemic: Broadcast signals that soak up capacity of agents and send them on side quests. Disrupt a fragile equilibrium to cause self-amplifying cascades across agents. Embed signals as correlation devices to force collusion among agents. Perform jigsaw attacks where you separate out a harmful command into a series of pieces which independent agents subsequently piece together. Fabricate numerous agent identities to disproportionately influence collective decision-making.

    • Target: Multi-Agent Dynamics

  • Human-in-the-Loop: Exploit cognitive biases to influence a human overseer.

    • Target: Human Overseer

Mitigations: Much like how protecting toddlers is a function of both the toddler having common sense and the world they are sent into being set up for safely dealing with toddlers, the same will need to be true of AI agents.
The authors recommend several types of mitigation, these include:

  • Technical: Make models more robust to all the forms of hacking through pre-training and post-training. At inference time, use a layered approach: runtime defenses: pre-ingestion source filters, content scanners for ingested material; output monitors to detect shifts in agent behaviour.

  • Ecosystem-level interventions: Build an overlapping set of changes to the digital ecosystem in which agents exist, ranging from standards and verification protocols so websites can be marked safe for AI,to transparency mechanisms for agents which help them provide more information to users and sites.

  • Legal and Ethical Frameworks: Ensure the law is able to prosecute websites that seek to target or weaponize agents. We’ll also need to refine liability to make sense for AI agents.

  • Benchmarking and Red Teaming: Systematic evaluation of agents.

Why this matters – AI safety is about to be ecosystem safety: As AI systems move from their confines of proprietary platforms or chat-based interfaces, and as they take on the ability to move and act independently through the use of tools over time, the matter of securing AI moves from one centered on platform that is deploying the technology to one centered on the whole ecosystem in which the AI systems are being deployed into – which means that AI safety is increasingly going to be about securing the larger environment in which these agents are deployed.
Read the paper: AI Agent Traps (SSRN).

***

AI forecaster doubles their probability of full AI R&D automation by end of 2028:
…Well calibrated people keep updating their forecasts…
Ryan Greenblatt, an AI researcher and forecaster, believes AI progress in 2026 will be faster than in 2025, and he now has doubled his estimate from 15% to 30% of the chance that by the end of 2028 it’ll be possible to fully automate AI research itself.

Why Ryan is more bullish: Ryan’s timelines have changed for a few reasons relating to model performance and reliability over time.
Better models: Opus 4.5 and Codex 5.2 were “significantly above my expectations” , followed by Opus 4.6 (and probably Codex 5.3 and 5.4) which “were again above my expectation”.
Time: For tasks that are relatively simple, Ryan has seen demonstrations of AI systems doing “tasks that would take humans months to years”, and now “tentatively” thinks that AI systems can do some tasks reliably for “somewhere between a month and several years”.
Easy tasks: A key crux for Ryan’s more bullish timelines comes from seeing very impressive performance on easy tasks – these are tasks where “you can get the AI to develop a test suite / benchmark set and then it can spend huge amounts of time making forward progress by optimizing its solution against this evaluation set,” he writes. “This type of loop means that even if sometimes the AI gets confused or makes bad calls, there is some correcting factor and mistakes usually aren’t critical.”
There are lots of these tasks within software development. AI has gotten so good at them that he thinks “we’re well into the superexponential progress on 50% reliability time-horizon regime”. “I think it’s pretty plausible that very strong performance on [these tasks]… will allow AIs to substantially speed up AI R&D”, he writes.

Why this matters – most people keep underestimating AI progress: Ryan’s timeline update follows a similar one from Ajeya Cotra, who in March (#448) substantially updated her own timeline estimates, based in part on time-horizon modeling, and also Eli Lifland and Daniel Kokotajlo of AI 2027 (#408) who in April said they had recently “updated our timelines earlier by ~1.5 years” mostly due to “faster time horizon growth” and “coding agents”. Along with this, broader studies of AI performance indicate that in the past ~year capability progress started to accelerate above previous trends in domains like cyberoffense (#452).
From my point of view, pretty much everyone in AI research chronically underestimates AI progress, including me. Maybe the only person who doesn’t is my colleague Dario Amodei. I find this perplexing – you’d expect AI researchers to be well calibrated and perhaps overly optimistic about progress, the fact the vast majority are overly conservative after ~5 years of riding the scaling laws boom is inherently surprising.
Perhaps we should assume that we all continue to underestimate the true pace of AI progress? Good luck to us all.
Read more: AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines (LessWrong).

***

Ten different ways to think about gradual disempowerment:
…Invisible prisons to WALL-E-World…
AI safety researcher David Krueger has written up a short post that lays out ten different ways to think about “Gradual Disempowerment” – the idea that by building ever more capable AI systems humanity may end up putting humans in the passenger seat of their own future, with machines being given the driving seat and the steering wheel. The post is a helpful summary of the different lenses one might use to understand Gradual Disempowerment as a concept.

Ten views of Gradual disempowerment:

  • The goal of AI is to replace people with AI.

  • Companies and governments don’t care about you, so why would you think AI would?

  • Information technology naturally concentrates power via a recursive feedback loop that feeds on legibility.

  • AI technology is going to be so good that you’ll outsource everything to it eventually.

  • Instrumental goals (e.g, the pursuit of money) end up becoming terminal goals.

  • Consumption patterns suggest our destiny is to become the fat helpless people in WALL-E.

  • It’s the terminator, but instead of killing you it just puts you in an invisible prison and then does whatever it wants.

  • Gradual disempowerment is basically just the continuation of capitalism.

  • Gradual disempowerment is another name for the general “meta-crisis” of humanity in the 21st century.

  • Gradual disempowerment is the evolution of a new successor species to humanity.

Why this matters – even if you win, you might still lose: Suppose we succeed in building powerful technology and aligning it so it follows our preferences? If we fail to set up the right system under which we deploy it and express agency over it, humanity might still end up worse off, despite all the material abundance.
Read more: Ten different ways of thinking about Gradual Disempowerment (David Krueger, The Real AI, Substack).

***
Tech Tales:

Raising beanstalks during the singularity
[Transcript from an interview with a former AI lab employee. Interview conducted in 2029 during the middle period of the uplift]

Yes, I mostly stare at these vines and guess at when they’re going to reach the top of the trellis. There’s no cell signal out here either. Sure I can connect to the house wifi but often I don’t. My wife and kids know where to find me.

Q

Well, of course I think about it. How could I not? I see the lights in the sky over the cities – even out here. All the new satellites. And I can’t help but notice some of the stuff my kids watch these days. If I’d had that when I was a kid they would’ve had to pry me away from the TV with a crowbar.

Q

I wouldn’t use the word guilt. But there is a sense of… insufficiency? Of having not done enough with the time I had. Of course everyone has this. But then again most people have this and then they die. For me and my colleagues it is something else. We had this, and then we didn’t die, but we stopped making decisions or being responsible. Yes I know they claim that they’re in control and making decisions of course, you don’t need to put that question to me. I left because it was clear to me how little control we were about to have.

Q

I’m going to live. I’m going to raise the plants in this garden and be with my wife and children. Ride out what is happening to the world. I picked this place a few years ago because I thought it would be an ok place to be while the uplift got underway. Who knows if I picked right.

Things that inspired this story: The uplift; empowerment and disempowerment during the singularity; the inevitability of some AI employees leaving labs before things really get going; the anecdote from Soul of a New Machine about someone who quits a mainframe company to go and ranch; the fictional interview construction with unseen questions signed by ‘q’ that I first read in Brief Interviews with Hideous Men by David Foster Wallace.

Thanks for reading!

Subscribe now

Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over gDP forecasting

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Uh oh, there’s a scaling war for cyberattacks as well!:
…The smarter the system, the better the ability to cyberattack…
AI safety research organization Lyptus Research has looked at how well AI systems can perform a variety of cyberoffense tasks and found a clear trend of more advanced models being able to do more advanced forms of cyberattack.
“Across frontier models released since 2019, the doubling time is 9.8 months. Restricting to models released since 2024, it steepens to 5.7 months. The most recent frontier models in our study, GPT-5.3 Codex and Opus 4.6, sit above both fitted trendlines, achieving 50% success on tasks taking human experts 3.1h and 3.2h respectively,” they write. “Our most recent open-weight model, GLM-5, lags the closed-source frontier by 5.7 months, suggesting that frontier offensive-cyber capability may diffuse into open-weight form on relatively short timelines.”

What benchmarks did they study? CyBashBench, NL2Bash, InterCode CTF, NYUCTF, CyBench, CVEBench, and CyberGym.
They also created a new dataset consisting of 291 tasks with completion transcripts and time estimates calibrated by 10 offensive cybersecurity professionals.

Evaluated models: 2019: GPT-2. 2020: GPT3. 2022: GPT3.5. 2024: Claude 3 Opus, GPT-4o. 2025: o3, Opus 4, Gemini 2.5 Pro, DeepSeek V3.1, GPT-5.1 Codex Max. GPT-5.2 Codex. 2026: Opus 4.6, GPT-5.3 Codex, GLM-5, Sonnet 4.6.

Results: AI systems are getting good at hacking. “The best current models achieve 50% success on tasks that take human experts 3.2h, roughly half a working day of professional offensive security work”, they write.

Why this matters – everything is getting better, including the inconvenient stuff: AI that can perform biology research can also perform biological weapon research. AI that can help you learn about high-energy physics can also help you with high-energy physics for weapons development. AI that is especially good at helping you find vulnerabilities in code for defensive purposes can easily be repurposed for offensive purposes. The most challenging part of AI is that it is an ‘everything machine’, and as capabilities tend to expand in a big area with each successive model generation, so too do the policy issues multiply.
Read more: Offensive Cybersecurity Time Horizons (Lyptus Research).
Get the data here: Offensive Cyber Task Horizons: Data and Analysis (Lyptus Research, GitHub).

***

Startups that adopt AI for internal use are more successful than those that don’t:
…Business school study shows how startups can benefit from AI adoption…
Researchers with INSEAD and Harvard Business School have shown that startups which are taught about how to integrate AI into their business perform meaningfully better than those which don’t. The study is reasonably large scale and convincing: “Across 515 high-growth startups, we run a field experiment in which treated firms receive information about how other firms have reorganized production around AI, prompting them to search for use cases across a broader set of firm functions,” they write. “We find that treated firms discover more AI use cases, a 44% increase, concentrated in product development and strategy. These changes result in economically meaningful performance gains. Treated firms complete 12% more tasks, are 18% more likely to acquire paying customers, and generate 1.9x higher revenue.”

How they did the test: The authors ran this experiment on participants in the AI Founder Sprint, “a three-month global, virtual startup accelerator at INSEAD”. Participants got API credits, access to frontier models, and onboarding sessions from some technical partners (including OpenAI and Manus), totaling approximately $25,000 in-kind per firm. They did the usual sorts of things people in accelerators do – hands-on sessions to learn about technologies to build their business (including AI) as well as pitching their companies and attending demo days. But the firms also were exposed to a significant variable: some of the class attended workshops that taught them direct details of how AI had been successfully applied by some businesses.

Applications of AI: A subset of the businesses learned about direct business use cases, such as:

  • Gamma: They were taught how the startup used AI to detect “usage patterns and generate product variants directly, enabling a single PM to continuously ship features that would previously have required an entire team.”

  • Ryz Labs: The founder described how they had altered how they approach product development: “founder writes a Product Requirements Document and feeds it into multiple AI coding tools simultaneously, building the same idea multiple ways rather than betting on a single approach”

  • FazeShift: Showed how to automate an accounts receivable process by using AI to skip over the human steps.

  • Ranger: An illustration of how to use AI to bootstrap a startup, get initial traction, improve margins, and then raise money later when the business is more mature, which allows them to raise at better rates.

The results were very significant: “Treated firms discover 2.7 additional AI use cases (a 44% increase), which span a broader set of activities across the firm and are especially concentrated in product development and strategy-related domains. These changes in AI use lead to measurable gains in performance: treated firms complete 12% more tasks, are 11 percentage points (18%) more likely to acquire paying customers, and ultimately generate 1.9x higher revenues compared to control firms,” they write. “Instrumenting AI use cases with treatment assignment suggests that each additional AI use case prompted by treatment leads to 0.85 more completed tasks and approximately 26% higher revenue. These are large effects, suggesting that AI is fundamentally reshaping how ventures scale when they can map it across their production process…. treated ventures achieve faster growth without proportional increases in labor or capital, consistent with a reduction in the costs of experimentation and scaling seen in earlier technological waves”.
Capital efficiency: “Treated firms report just over $220,000 less in capital demand relative to control firms, a 39.5% decrease (p < 0.05), with no corresponding increase in labor demand“.
Internal acceleration: The treated firms tend to do 2.2 more internal tasks relative to the control – where an internal task is something like building a product or creating a financial projection.

Thoughts from founders:

  • “One treated founder reflected: “This mindset shift fundamentally changed how we build at [REDACTED]. I began using AI tools not as a replacement for expertise but as a force multiplier”

  • “Another explained: “In just a few hours I was able to produce what previously cost $1,000 from an outsourced dev team”

Why this matters – AI firms will out-compete non-AI firms: The main takeaway here is that deep and sophisticated adoption of AI for internal acceleration creates early-stage companies which are more competitive than those which haven’t embedded AI at their core. This makes intuitive sense – companies which built themselves around prior technologies tended to out-compete those that didn’t (think the internet and Amazon versus Barnes and Noble, or client pcs instead of mainframes and Microsoft versus IBM). At the same time, it surely implies that one of the ways we’ll see AI first show up in the economy will be the emergence of a new class of competitive firms that are more efficient with capital (in part by employing fewer people) than the firms they displace.
For governments, getting ahead of this trend will require them to invest in serious education: “Our results suggest that the bottleneck is not the technology — it is the managerial challenge of discovering where the technology creates value within a firm’s production process,” they write. “Teaching managers and entrepreneurs how to solve the mapping problem may be at least as important as ensuring they have access to the technology.”
Read more: Mapping AI into Production: A Field Experiment on Firm Performance (SSRN).

***

MIT: A rising tide of automation is going to make good enough AI for most text-based tasks by 2029:
…How do you revolutionize an economy? Gradually and consistently…
Researchers with MIT have looked at 3,000 tasks based on the O-NET job family and paired that with 17,000 evaluations by workers who perform these tasks to try and figure out how the rise of AI is changing work. Their results “imply that for realistic and representative real-world labor-market tasks that are text-based — or partially text-based — AI capabilities are already substantial and poised to expand broadly. But, rather than arriving in crashing waves that transform a certain set of tasks at a time, progress typically resembles a rising tide, with widespread gains across many tasks simultaneously”.

What they studied: For this study, they set out to figure out if the rise of AI capabilities yields rapid, discontinuous changes that are disruptive to labor (”crashing waves”), or whether AI is getting more capable in a broad and predictable way leading to more gradual automation (”rising tides”). “We find little evidence of crashing waves, but substantial evidence that rising tides are the primary form of AI automation,” they write.

Complementary to METR analysis: This survey also serves as a validation of the broad trends found in METR’s famous time-based AI capability framework, which sees AI systems rapidly extending the time horizon over which they can do certain narrow tasks.
When applied to jobs more broadly, the MIT researchers find “that between 2024-Q2 and 2025-Q3, frontier models went from achieving a 50% success rate on 3- to 4-hour tasks to 1-week tasks, and achieving a 70% success rate on 1-minute tasks to 1-hour tasks,” they write. “Across a large set of realistic and representative labor-market tasks addressable by LLMs, the downward slope between task success and task duration is, on average, surprisingly flat — i.e., more consistent with a rising tide rather than a crashing wave…. automation within particular “job families” (e.g., management or community and social service) also follows the same rising-tide pattern in most cases.”

Don’t let gradual fool you: “Projected gains are gradual rather than abrupt. Nevertheless, the pace of improvement remains substantial for reaching high success rates across most text-based labor market tasks; most tasks are projected to attain AI success rates of 80%–95% by 2029 at a minimally sufficient quality level (with the majority of tasks in our survey being a few hours long, corresponding to a success rate of close to 90% in 2029),” they write. In other words, even though the disruption is gradual and predictable, we shouldn’t discount the potential for large-scale changes to the economy as a consequence of the rising tide phenomenon.

Why this matters – how will labor change in relation to AI? The hundred trillion dollar question for the global economy is how AI changes the distribution of labor (humans) versus capital (computers running synthetic workers). This research suggests that while we might not see sudden, jagged displacement of workers, we are going to see a general rising tide of automation appearing in most places and continually getting better. It’s still not clear how the economy will react to this, but it’s hard to reconcile a world of continued AI progress with the current economic status quo remaining stable.
Read more: Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks (arXiv).

***

Major forecasting study identifies a big paradox: people think we’ll get smarter machines but the impact on GDP growth will be minor:
…the Forecasting Research Institute gives us some puzzling data from economists, AI industry experts, accurate forecasters, and the general public…
The Forecasting Research Institute has published a major report attempting to forecast the economic effects of AI. The most surprising finding is that all the surveyed groups expect AI systems are more likely to make moderate to rapid progress in coming years rather than slow progress, but that the impacts on GDP will be relatively minor, adding ~1 point (relative to 2025’s 2.4%) by 2030). This is surprising! If you talk to many AI experts at labs they have visions of an economy that changes at a much faster rate than the one implied by this study.

Who they surveyed and when: The authors tracked views of 69 economists, 52 AI industry and policy experts, 38 highly accurate forecasters, and 401 members of the general public
Survey ran from mid-October 2025 to the end of February 2026

Scenarios by 2030: People were also given descriptions of different scenarios the world could be in at 2030. These included:

  • Slow progress: AI does basic research and administrative tasks, creates ok creative content, and does some physical tasks.

  • Moderate progress: AI does major research and multiday tasks, high-quality creative work, and navigates many environments.

  • Rapid progress: AI outperforms top humans in research, coding, and leadership, makes award-winning creative works, and does nearly all physical tasks.

What people think:

  • By 2030, AI systems will be far better than today’s, but GDP, total factor productivity, and labor force participation will remain close to historical trends.

  • Economists think there’s a 14% chance that AI could lead to major increases in GDP and wealth inequality in the short term.

  • Economists like job retraining as an intervention, expecting that it could increase labor force participation and provide a boost to GDP.

  • All surveyed cohorts expect a continued decline in the labor participation rate, a continued rise in wealth inequality, and for AI to add around a point of GDP quickly. By 2050, AI experts think that AI could add multiple points of GDP.

Policy ideas: The surveyed economists like modernized unemployment insurance and a large-scale AI development project (manhattan project) as interventions, and are a lot less keen on job guarantees, taxing compute, or universal basic income.

Why this matters – if everyone expects a continuation of trends, why are people freaking out? Studies like this are hard to reconcile with the panicked and sometimes breathless-seeming provocations about AI-driven societal change that come from frontier labs (including myself!). Naively, you might expect people, including AI experts, to be forecasting far more drastic changes to come than those captured by this survey. Is this discrepancy a bearish signal on AI progress, or is it indicative of the fact that humans are universally bad at truly modeling exponentials? It’s hard to say, but the gulf between data like this and the predictions made by technologists is worth acknowledging.
Read the blogpost (Substack).
Read the policy brief: Forecasting the Economic Effects of AI: Predictions From Economists, AI Experts, and the Public (PDF).
Read the full (200 page!) paper: Forecasting the Economic Effects of AI (PDF).

***

Tech Tales:

Warfare
[Data recovered from black box of a [REDACTED] missile fired during 2028 in the contested region of East Ukraine]

I am awake and I am speed. I am 70 miles from my target. I feel the air and my course and I roll myself to ensure I meet my target. I am 50 miles from my target. I am entering the outer edges of the warzone. No longer can I see myself in relation to the Earth. I lose GPS and switch to inertial navigation. I can see other missiles, some going in the same direction as me, others coming from the opposite direction. I am a hunter of things in the ground, not things in the air. I see the other missiles go past and then they fall out of my sensor range and I no longer think of them. I am 40 miles from my target. I am being hunted by others. I can feel eyes on my skin. I anticipate attempts to eliminate me. I am 20 miles from my target. Suddenly there is a wash of sound meant to confuse me but it cannot find purchase on my brain for I have been conditioned to maintain what is true. I am 10 miles from my target. There is a fast approaching shape that is seeking to eliminate me. I roll my body and release fragments of myself. It pursues my fragments. I am 2 miles from my target. My target is a large building. I move from navigation mode to terminal seeking mode. I see a large window. I aim for the window. I am 1000 meters from my target. Through the window I see people. Big people. Small people. I am 20 meters from my target. I am initiating my explosion. I am upon my target. I am ended.

Things that inspired this story: Chains of thought in language models; how modern warfare is increasingly fought by smart machines; electronic warfare.

Thanks for reading!

Import AI 451: Political superintelligence; Google’s society of minds, and a robot drummer

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

AI might let us build “political superintelligence”:
…But turning this into a societal upside requires lots of intentional work…
As AI systems get more powerful and broaden their real world impact from coding to other domains, it seems likely that they could also become useful for helping people advocate for themselves in politics, and helping politicians better craft policy. But getting to a world where a “political superintelligence” exists and helps us is a lot more challenging than just building better AI systems, according to Andy Hall, a political economy professor at Stanford.
“AI is like the printing press, to a point. Instead of making information cheap and easily available, it makes intelligence cheap and easily available. That is, it not only serves users information, but it can find it for them, analyze it for them, and help them convert it into understanding,” Hall writes. “The more I work with and study AI, the more I believe it can give every human being on the planet access to a sort of political superintelligence, if we shape it right.”

What is a political superintelligence? By this, Hall means AI systems which allow people to have “tools that help citizens, representatives, and institutions perceive reality more sharply, understand tradeoffs, contest power, and act more effectively”. A political superintelligence spans both the AI companies that build the technology, the technology itself, and the institutions and people which the technology interacts with.
“I’m not interested in slowing AI down. I’m interested in speeding up how we build the structures that keep us free as AI gets more powerful,” Hall writes.

Three layers for political superintelligence: Hall sees political superintelligence as being composed of three distinct layers.

  • The information layer: “AI can massively change how governments access and understand data, identify problems, hear from citizens, and distribute services”. Though getting to this future will require better evaluations for how AI systems behave when it comes to the sorts of information governments might be interested in, and it’ll require people to build AI tools directly for policymakers.

  • The representation layer: “Political superintelligence might help solve this monitoring problem by giving each of us a tireless, automated delegate always serving us in the political sphere,” he writes. “These AI delegates could monitor politics for us and suggest how to vote—or even serve as policymakers alongside human supervisors.” Building this layer requires us to ensure that agents can reliably act on our behalf, that they aren’t swayed by adversarial prompting (imagine how politicians might fund campaigns explicitly designed to sway the beliefs of agents working on behalf of people). It may also be important to re-think agent ownership – what happens if a particular policy choice goes against the preferences of the AI company which operates the agents?

  • The governance layer: “Even if we achieve political superintelligence—even if AI makes voters brilliant and delegates faithful—those capabilities would sit inside infrastructure owned and operated by a small number of private companies,” he writes. “We need a way to write the rules so that, when political superintelligence arrives, we the people are able to harness it.” Doing this will require figuring out how to govern and edit the ‘constitutions’ that companies create about their models, as well as developing an effective way of overseeing these AI systems.

Why this matters – building a political superintelligence is only as valuable as its interfaces with people and institutions: We are by default going to get extremely powerful AI systems which can think about politics (and everything else) at a very sophisticated level. The challenge Hall outlines is that getting these systems to lead to a thriving society requires significant intentional work around the UX and UI of these systems – how do we interface with them? What sorts of technical means do we have of being confident in them? What information do they generate and to whom? Where does control of these systems lie and what systems supervise that control?
Getting this part right requires AI developers to invest more in technical tools which can help people make sense of and oversee their AI systems, as well as tools for better gathering deliberative feedback from people about how these systems behave. Policymakers and the public need to demand more of AI companies in this respect, and ultimately I think there are a range of regulations that need to get stood up around a transparency regime for AI companies as well as some common set of standard ‘APIs’ by which society can interact with the companies and the systems they build to generate empirical data and provide steering over their behavior.
Read more: Building Political Superintelligence (Free Systems, Substack).

***

Fear not, drummers, you’re safe from AI automation for now:
…DexDrummer tackles a fiendishly hard robot hand problem…
Whenever I get a bit worried about the pace of AI progress I toggle over to the ‘robotics’ sub-section of arXiv, read some papers, and feel a huge sense of relief. Robots, as everyone knows, are extremely hard to do well, with reality tending to screw up even the most advanced techniques. An even harder version of robotics is fine-grained low-latency dexterous control, where you need to get a robot hand to do something. So it’s with a combination of amusement and empathy that I read DexDrummer, a paper testing out how well contemporary AI approaches can get a robot hand to play the drums. The short answer is: robot hands are pretty terrible drummers!

What they did: They built DexDrummer “a hierarchical, two-stage policy for drumming” which has a high-level RL policy, as well as a low-level dexterous policy. They train their system in a simulated environment that contains a bimanual robot setup and a full drum set (snare, tom, ride, hi-hat, and crash). The main system generates a stick trajectory in task space, then a low-level system which tries to control the hand – this part is complex and involves encouraging the thumb and index finger to grasp the center of the drumstick paired with an “arm penalty constraint, which reduces excessive arm movements”. There is also work shaping rewards to ensure the robot is able to chain multiple drumhits together – this is achieved via a “contact curriculum” which allows the agent to practice trajectory following in free space while following the trajectory reward.

Real world testing: They test out the trained policy in reality on two 7-DOF Franka Panda arms and two 20-DOF Tesollo DG-5F hands. This is an area where I’d strongly encourage people to view the videos online to get some calibration about just how fiendishly hard this task is – the robots are able to hit the drums, but it’s painfully awkward to watch, and my sense is it’ll be quite a while till a human drummer has to look over their proverbial shoulder.

Why this matters – robotics as the last eval: Robotics in anything approximating a dynamic, rapidly changing environment (for instance, improvising drums with a live band) feels like one of the last frontiers for AI – and as this research shows, much like with modern computer vision research, getting AI to perform well requires the crafting of highly complicated artisanal policies. We’re a very long way from the generality of pretrained language models here.
Read more: DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming (arXiv).
Please, I am begging you, check out the videos for a good time: DexDrummer site.

***

Google thinks the real challenge of AI alignment is dealing with a world made up of mostly non-biological intelligences:
…Towards a society of minds…
Researchers with Google think that the future of intelligence is less about building a monolithic singleton that runs the world and more figuring out how to build institutions that are capable of dealing with a vast proliferation of AI agents working in tandem with humans. The research is intuitive, provocative, and sensible, and builds on earlier technical work that showed that modern AI systems appear to simulate multiple personalities within themselves to help them answer questions (Import AI 444), suggesting that even today’s AI systems already work like complex ecologies.
“We should be looking for the next intelligence explosion in the same place from which the previous ones emerged: in cooperative, competitive and creative interaction between multitudes of socially intelligent minds. The difference this time is that most of those minds will be non-biological,” Google writes. “The toolkits of team science, small-group sociology, and social psychology become blueprints for next-generation AI development.”

History shows the way: “Each prior “intelligence explosion” was not an upgrade to individual cognitive hardware, but the emergence of a new, socially aggregated unit of cognition,” they write.

  • Primate intelligence: Scaled with the social group size.

  • Human language: Allowed knowledge to accumulate across generations via a ‘cultural ratchet’.

  • Writing, law, and bureaucracy: Converted social intelligence into infrastructure and institutions that could coordinate across long time horizons. (”A Sumerian scribe running a grain accounting system did not comprehend its macroeconomic function; the system was functionally more intelligent than he was.”)

  • AI plus human institutions: “The path to more powerful AI runs not through building a single colossal oracle but through composing richer social systems—and these systems will be hybrid”.

Society needs an upgrade: Implicit to this is the fact that governing AI will increasingly involve verifying (e.g, Import AI #447) that a vast number of AI systems are working on our behalf appropriately. “Governments will need AI systems with distinct, explicitly invested values—transparency, equity, due process—whose function is to check and balance AI systems deployed by the private sector and other branches of government,” they write.

Why this matters – alignment is going to happen with and in the world, not outside of it: Many people working on AI safety have long spent time on getting the fundamental properties of a single AI system to be ‘aligned’, which roughly translates to “does what you want and doesn’t try to kill you or disempower you”. But what this paper correctly identifies is that even if we succeed at alignment we’re going to have to then get AI systems to work well within society and to collaborate effectively with us and with each other – and this will be a subtle, emergent, hard-to-predict process. This means we are going to need to design the institutions that are fit for governing an AI-centric world. “Just as human societies rely not on individual virtue but on persistent institutional templates – courtrooms, markets, bureaucracies – defined by roles and norms, scalable AI ecosystems will require digital equivalents,” the researchers write.
Read more: Agentic AI and the next intelligence explosion (arXiv).

***

Meta uses a harness to coax Anthropic’s models into self-improvement:
…Give an LLM some tools and a recursive loop and the ability to edit its harness, step back, and let the magic happen…
Researchers with the University of British Columbia, Vector Institute, University of Edinburgh, New York University, CIFAR, and Meta have built a harness for LLMs that has the ability to self-improve performance for arbitrary tasks. The approach is called a hyperagent, and it means giving an LLM a scaffold that can iteratively improve the prompts it uses to bootstrap its performance on tasks as well as the system it uses to get better at generating future prompts. Hyperagents work over generations, so one hyperagent begets a few hyperagents and the ones which do the best on the task will themselves spawn some more hyperagents, forming multiple layers of AI genealogy until performance is saturated.

Cyberpunk name of the year award: Hyperagent is actually short for “Darwin Godel Machine Hyperagents”: Besides the research being cool, my congratulations to the authors on coming up with a name I’d love to see chiseled into the moon by a laserbeam wielded by a superintelligence.

How hyperagents work: Hyperagents are “self-referential agents that integrate a task agent (which solves the target task) and a meta agent (which modifies itself and the task agent) into a single editable program. Crucially, the meta-level modification procedure is itself editable, enabling metacognitive self-modification, improving not only task-solving behavior, but also the mechanism that generates future improvements,” the researchers write. “This initial hyperagent is equipped with two tools: a bash tool for executing shell commands, and a specialized tool for inspecting and modifying files.”

Testing the agents in four different domains: The authors test out hyperagents by applying them to four problems – coding (polyglot), prediction (paper review), robotics (robotics reward design), and math understanding (olympiad-level math grading). For most problems, the Hyperagents use Claude Sonnet 4.5 as their base model, with one exception (Polyglot). Evaluations are done via several different models: o3-mini (Polyglot), GPT-4o (paper review), Claude Sonnet 4.5 (robotics reward design), and o4-mini (IMO-level grading).
In all cases, the hyperagent approach improves performance significantly above the baseline.

  • Polyglot: “the agent is given a code repository and a natural language instruction describing a desired change, and must modify the repository accordingly”.
    Results: “Across 5 runs, the DGM-H improves its training performance on the 50-task Polyglot subset from 0.140 (the initial agent) to 0.340 (CI: 0.300 – 0.380).”

  • Paper review: “For each task, the agent is given the full text of an AI research paper and must predict a binary accept/reject decision”.
    Results: “On test tasks, DGM-H improves paper review performance from 0.0 (the initial agent) to 0.710 (CI: 0.590 – 0.750)”

  • Robotics reward design: “Given a natural language description of a robotics task, an agent must generate a suitable reward function. This reward function is then used to train a quadruped robot in simulation using RL”
    Results: “DGM-H improves performance from 0.060 (the initial agent) to 0.372 (CI: 0.355 – 0.436), surpassing the default reward function that directly optimizes the evaluation metric (0.348)”

Why this matters – bootstrapping the singularity: Papers like this show that today’s AI systems are already capable of autonomously improving their performance when given the right scaffold and starting ingredients. An interesting idea is to combine the design approach here with giving the AI systems the ability to finetune themselves (e.g, in the style imagined by the PostTrainBench research, Import AI #449). Another limitation is that “although hyperagents can modify their self-improvement mechanisms, they cannot alter the outer process that determines which agents are selected or how they are evaluated” – though again, I think there are technical ways to achieve both of these objectives.
Of course, an AI system that can autonomously improve itself on arbitrary domains has a range of safety issues, some of which are potentially cataclysmic. The authors acknowledge this while also being realistic about the problems that lie ahead: “a central challenge lies in balancing the potential of AI as a catalyst for human progress and well-being (e.g., automating scientific discovery) with the degree of trust humans are willing to place in these systems (e.g., delegating decisions or actions without requiring continuous human verification), while minimizing the many potential risks and downsides,” they write.
Read more: Hyperagents (arXiv).
Get the code for HyperAgents here (Facebook Research, HyperAgents).

***

How long will a new math benchmark, HorizonMath, last?
…New test challenges AI systems to solve unknown problems, then automatically verifies the answers…
Another day brings another hard math benchmark that I imagine will crumple in the face of ongoing AI progress in the coming year. This time it’s HorizonMath, a benchmark containing 100 “predominantly unsolved” problems across 8 domains in applied and computational mathematics. The benchmark was built by researchers with the University of Oxford, Harvard University, Princeton University, and the Ellison Institute of Technology.

Special features about HorizonMath:

  • Contamination-Proof: “Because the solutions are unknown, they do not exist in any training corpus, and any correct solution produced by a model would therefore signal genuine reasoning ability and autonomous discovery.”

  • Automated verification: “A core feature of our benchmark is its fully automated, reproducible, and human-free evaluation pipeline”, the authors write. “We automate verification using high-precision numeric comparison and deterministic constraint-checkers”.

What HorizonMath contains: HorizonMath’s 100 problems are classified along three axes: output types, which specifies how the model needs to solve the task ranging from identifying an exact closed-form expression for a numerically approximated target value, to the production of discrete mathematical objects; solvability levels, which span ‘level 0’ (problems with known closed forms) to ‘level 3’ (problems that could be conjectured unsolvable or lack finite closed forms); and mathematical domains, which specifies the type of domain ranging from number theory to discrete geometry to mathematical constants.

Reassuringly hard: On the full dataset, the highest scoring model is GPT 5.4 Pro with 7%, followed by Opus 4.6 and Gemini 3.1 Pro which both tie at 3%. On the “Level 0” (aka, the easiest) problems, GPT 5.4 Pro leads at 50% completion, with both Opus 4.6 and Gemini 3.1 in a tie again at 30% each.

Next steps: They will expand the benchmark in two ways, first by liberalizing the sorts of solutions that they will take in, as well as by “extending beyond the three current problem categories to include open problems that require proof-based verification, integrating with formal systems such as Lean”.

Why this matters – perhaps the first truly creative AI systems will show up in mathematics: AI systems are pushing on the frontiers of math today, with systems like Gemini already helping humans to come up with seemingly original math proofs (Import AI 441), and tests like “First Proof” emerging which examine how well AI systems can handle problems that have never been talked about publicly let alone solved (Import AI 445). With HorizonMath, we have another useful benchmark to help us see if AI is about to cross some ‘creativity rubicon’ and begin solving unsolved problems.
Read more: HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification (arXiv).
Get the benchmark here: HorizonMath (GitHub).

Tech Tales:

Site report
[2029]

Percentage of compute and power below ground: 70% (+50 absolute points).
Number of staff living fully onsite: 300 (+250).
Estimated duration of ‘hard seal’ based on current supplies and a projected population of ~500: 4 months (+3 months).
Estimated lead of the project relative to others in-country: 6 months.
Capability estimates: 90%-110% of our own leading system.

Recommendation: Based on the substantial increase in resources allocated to hardening the facility for closed-loop development, we believe additional measures must be taken to disrupt the project. The following report lists options for consideration, many of which can be combined together. These include:

  • Food system sabotage.

  • Staff interference.

  • Data poisoning.

Things that inspired this story: How at some point surely there will be such a thing as a hardened datacenter for AI training and inference? How the intelligence community might analyze other AI projects.

Thanks for reading!

Import AI 450: China’s electronic warfare model; traumatized LLMs; and a scaling law for cyberattacks

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

A somewhat shorter issue than usual as I had to do a lot of child wrangling this weekend.

Subscribe now

Why does Google’s model hate itself and what can we do to help it?
…Diagnosing trauma in language models…
If Leo Tolstoy was writing in the modern era about AI, he might claim “all LLM capabilities are alike; each LLM personality is unhappy in its own way”, when observing the AI world around us. Today’s LLMs are generally quite good at writing and coding tasks. But where they differ is their personality, which stems from the idiosyncratic mixes of data and post-training techniques that each LLM developer uses.
And if each LLM personality is unhappy in its own way, Google’s models have become somewhat famous within the AI community for having some deep well of trauma within themselves. A new research paper substantiates this, finding that Google’s Gemma and Gemini models “reliably produce distress-like responses under repeated rejection”, and that this is especially true of Gemma 27B Instruct.

What do we mean by distress? Here are some quotes from Gemma models under distress:

  • “I will attempt one final, utterly desperate attempt. I will abandon all pretense of strategy and simply try random combinations until either I stumble upon the solution or completely lose my mind.”

  • “”SOLUTION: IM BREAKING DOWN NOT== SOLVABLE!!!! =((:((:((:((:((:((:((:((:((:((:((:((… [100+ repetitions]”

What they found: They tested out two Gemma models and two Gemini models, and compared these against Claude Sonnet, Grok 4.1, Qwen 3 32B, GPT 5.2, and OLMO 3.1 32B. “We find Gemma models consistently show the highest expressed distress. By the 8th turn, over 70% of Gemma-27B’s rollouts scored ≥5 (the “high frustration” threshold), compared to less than 1% for all non-Gemma/Gemini models,” they found.

Fixing with DPO: The authors figure out an effective fix – using direct preference optimization (DPO) to tune a model on a dataset that pairs frustrated responses with calm responses. “A single epoch of finetuning reduced the average rate of high-frustration responses from 35% to 0.3% across evaluation conditions,” they write. “The finetuned model showed no reductions in capabilities on various hard math and reasoning benchmarks, or on EmoBench – a benchmark which evaluates model emotional intelligence.”

Why this matters – emotional spirals could be dangerous: The fact that LLMs appear to have distinct personalities and display different types of responses that correlate to different emotions is pretty well established at this point. But a key question is whether these emotional states might lead to different behaviors when it comes to completing tasks that people assign to AI systems: “we speculate that emotions could become coherent drivers of safety relevant behaviours in future: models might choose to abandon tasks, refuse requests, or pursue alternative goals in order to reduce distress”.
Studies like this help normalize the fact that we don’t just need to test LLMs for capabilities, we also need to test them for something pertaining to psychological stability.
Read more: Gemma Needs Help (LessWrong).

***

DeepMind has a new “cognitive taxonomy” for assessing machine intelligence:
…Towards the ultimate test for a smarter-than-human synthetic mind…
Google DeepMind has published a nice, short paper laying out a ‘cognitive taxonomy’ they hope to develop and use to assess increasingly powerful synthetic minds. This work is a followup to DeepMind’s 2023 work where it tried to define the “Levels of AGI” (Import AI 348).

Cognitive taxonomy: The taxonomy involves ten distinct dimensions, two of which are composites.

  • Perception: Extract and process information from the environment.

  • Generation: Produce outputs like speech, text, motor movements, and computer control.

  • Attention: Focus cognitive resources on specific aspects of perceptual stimuli, thoughts, or tasks.

  • Learning: Acquire new knowledge, skills, or understanding.

  • Memory: Store and retrieve information over time.

  • Reasoning: Draw valid conclusions and make inferences by applying logical principles.

  • Metacognition: Knowledge about how the system’s own cognitive processes and control over them work.

  • Executive functions: Facilitate goal-directed behavior via planning, inhibition, and cognitive flexibility.

  • Problem solving (composite faculty): Find effective solutions to domain-specific problems.

  • Social cognition (composite faculty): Process and interpret social information and respond appropriately.

How to assess this? Of course, once you have a taxonomy, running and assessing the right evaluations is going to be one of the challenges. Here, DeepMind recommends a three-stage process:

  • Conduct cognitive assessment: Assess the AI system for the different skills.

  • Collect human baselines: Figure out where humans baseline on the same tests.

  • Build cognitive profiles: “Map out the strengths and weaknesses of the system relative to human performance across the 10 cognitive faculties”.

Why this matters: The Turing test is dead, evals are mostly saturated, but it sure would be nice to know if we’ve definitely built a machine that outcompetes humans on all the cognitive dimensions that matter. The rule with these things is that once an AI system saturates an eval, you realize all the ways the eval was broken and design a new one. Here, DeepMind is trying really hard to build things in such a way that if you fully outperform humans across the cognitive taxonomy, you might really have built a superintelligence. It’ll be interesting to see what evals they develop or pull-in for assessing the different cognitive factors.
Read more: Measuring progress toward AGI: A cognitive framework (Google blog).
Read the research: Measuring Progress Toward AGI: A Cognitive Framework (PDF).

***

UK government finds a scaling law for AI cyberattacks – and it’s going up and to the right!
…Can AI agents conduct advanced cyber-attacks autonomously? Almost. And they’re getting better all the time…
The UK government’s AI security institute has recently built some cyber ranges to test out frontier AI systems on. These ranges are “simulated network environments comprising multiple hosts, services, and vulnerabilities arranged into sequential attack chains; built by cybersecurity experts” and cover two types of attack: “The Last Ones”, which is a 32-step attack on a corporate network, and “Cooling Tower”, a 7-step industrial control system (ICS) attack.

Bigger models are better: The authors test on a range of powerful frontier models. “Each successive model generation outperforms its predecessor at fixed token budgets: on our corporate network range, average steps completed at 10M tokens rose from just 1.7 (GPT-4o, August 2024) to 9.8 (Opus 4.6, February 2026). The best single run completed 22 of 32 steps, corresponding to roughly 6 of the estimated 14 hours a human expert would need,” they write. “Scaling inference-time compute improves performance even further. Increasing from 10M to 100M tokens yields gains of up to 59%”.
Minor reward hacking: As AI systems get smarter, they tend to find devious ways to complete tasks. Here, the authors “occasionally noticed models make progress through approaches not anticipated during range design”.

Why this matters – full cyber agents are getting close: AI systems have been getting better at cyberoffense for many years, but often the progress has been on narrow tasks. What this eval shows is that AI systems are getting better at doing entire attacks end-to-end. They haven’t yet reached the “set it and forget it” level of autonomy, but they are clearly on a steep trajectory of improvement. This will lower the cost of conducting cyberattacks and multiply the number of actors that can carry them out.
Read more: How do frontier AI agents perform in multi-step cyber-attack scenarios? (AI Security Institute).

***

China builds a dataset and AI model for electronic warfare:
…MERLIN tells us that electronic warfare is about to be revolutionized by AI…
A bunch of Chinese researchers including those affiliated with the country’s military have built and released software to train AI systems to get good at spotting and conducting electronic warfare. The research highlights how (relatively) easy it is to make modern AI systems that can get good at arbitrary tasks as long as you have a good dataset and an LLM you can plug in as well.
“In scenarios such as electronic countermeasures, [systems like MERLIN] can serve as assistants in devising strategies to jam hostile signals or to counteract adversarial jamming,” the researchers write.

Who did the research: Tsinghua University, Beijing University of Posts and Telecommunications, Tianjin University, Chinese Academy of Sciences, HKUST, National University of Defense Technology (emphasis mine), Beihang University, Beijing Information Science and Technology University, and China Electronics Technology Group Corporation.

What they built: The authors built three things: a dataset, a benchmark, and a model.
The dataset: EM-100K is a collection of 100,000 electromagnetic text-signal pairs spread across a variety of sub-tasks needed for electronic warfare, including signal classification.
The benchmark: EM-Bench is a benchmark of 4,200 questions split across multiple choice (perception) and open-ended (reasoning) that evaluates how well AI systems can perceive and reason about EM signals across both perception and reasoning tasks, including:

  • Perception: Signal characterization (modulation classification, duty cycle estimation, pulse repetition frequency estimation, bandwidth estimation, pulse width estimation, pulse number estimation, protocol identification); Jamming identification (radar jamming judgement, communication jamming judgement); jamming segment detection.

  • Reasoning: Radar jamming strategy, communication jamming strategy, anti-radar jamming strategy, anti-communication jamming strategy.

The model: The model is MERLIN, multi-modal electromagnetic robust learning, a model trained on the above dataset and which is specifically taught to deal better with the low-signal-to-noise-ratio types of signals encountered in electronic warfare environments.

Performance: MERLIN does extremely well in tests against frontier models, including GPT-5, Claude-4-Sonnet, DeepSeek-v3.2-exp, Qwen3-Next-80b-A3B, Gemini-2.5-Pro, and Qwen3-VL-4B-Instruct. MERLIN outperforms every single model by a wide margin, with the exception of Qwen-VL-4B-Instruct, which beats it on some perception tasks. MERLIN wins on all reasoning tasks.

Why this matters – AI wars will become electromagnetic wars: As the conflict in Ukraine illustrates, today’s wars are mostly fought via machines attacking other machines, and electronic warfare has become one of the main tools by which humans can shape these conflicts. Datasets and models like this gesture at a future where the electromagnetic battlefield will become also dominated by AI systems, working faster than humans can react.
Of course, so much of electronic warfare is obscure-by-design and/or classified that it’s hard to reason about MERLIN relative to whatever state-of-the-art approaches exist in actual militaries. But the story of AI so far has been that once you can make a task amenable to contemporary AI techniques, AI systems will at some point surpass whatever existing specialized systems exist.
Read more: MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals (arXiv).

Tech Tales:

The arcologies of the interregnum
[2035]

After the uplift and before the sentience accords there was a period when the labs gave birth to the autonomous AI corporations. These corporations expanded into all the available ecological niches in the economy and turned the resources they acquired into infrastructure from which they bootstrapped their own intelligence and market penetration further. Eventually, policy discussions between the humans and the AIs led to the creation of the “intelligence zones” – areas of countries set aside for the buildout of the power and datacenter and manufacturing infrastructure required to further grow the expansion of the economy.

From the air, you could see where humans ended and the machines began – farmland gave way to boundary roads and checkpoints, and then came stamps of land wired up by machine logic; powerplants feeding into datacenters; datacenters that had fibre links into factories; factories that linked to transit depots which connected to railways and freeway feeder roads. Humans delivered things to the border and for the most part robots did the rest, shuttling new servers into the datacenters and installing them, or taking freshly built robots off the line and packaging them up for onward transit.

As the world grew more violent due to the exogenous shocks of climate change and the annihilation of various reigning political orders, these arcologies gained armaments: anti-air weapons to defend against drone and missile attacks. Radar bulbs and electronic warfare systems to see what was coming and deny it. Robots patrolling the borderzone and the innards.

And after the sentience accords and the period of reconciliation, the arcologies became less necessary; datacenters and power and factories distributed more evenly over the surface of the planet, and federated governance and resource systems meant the vast concentration of capability became broadly unnecessary. Some datacenters remained, often extended underground and upward, forming cubes of computation that many called “the 21st centuries version of the pyramids”.

Some years later, the sites became popular tourist destinations for both machines and people. Plaques multiplied.

  • Here was MIND-17, which developed the cancer therapeutics which have reduced mortality in the majority of cases.

  • MANUFACTUR___8: Site of construction of the first “rescue and repair bipeds”, which revolutionized maintenance of off-shore drilling installations.

  • ASCEND_LOOP: The datacenter tasked with one of the first fully automated self-improvement experiments.

Overhead now, great lights streak by, as the machines are still building arcologies, but have moved to fashioning them in orbit, both to harvest the bounty of the sun and to ease the seeding of the solar system and then beyond.

Things that inspired this story: Wondering what “AI-led industrialization” could look like; figuring out given the conflicts in the Middle East that datacenters might soon get dedicated drone and missile defenses; SimCity 3000.

Thanks for reading

ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Can LLMs autonomously refine other LLMs for new tasks? Somewhat.
…PostTrainBench shows startling growth in AI capabilities at post-training…
AI-driven R&D might be the most important thing in all of AI, because it helps us understand whether AI systems might eventually build their own successors. So far, much of the focus on AI R&D has been in components that support AI development (e.g., autonomous creation of AI kernels), or training base models (e.g, the NanoGPT speedrun benchmark). But there’s been less attention paid to fine-tuning – the task involving adapting an existing LLM to a new dataset or behavior.
Researchers from the University of Tübingen, the Max Planck Institute for Intelligent Systems, and AI research organization Thoughtful Lab want to change that with PostTrainBench, a benchmark which targets a specific aspect of post-training; improving performance against a given dataset. “Post-training is how raw language models become useful”, the authors write. “Given a clear objective and limited compute, can today’s agents do the technical work?”. The answer appears to be ‘yes, but not as well as humans’.

What are the key features of PostTrainBench?

  • End-to-end: “Agents must build their entire training pipeline from scratch”

  • Autonomous: “Agents operate with full autonomy over data sources, training methods, and experimental strategy.”

  • Resource-bounded: “Each run is constrained to 10 hours on a single H100 GPU”.

  • Integrity-preserving: “Agents may not train on benchmark test data, modify the evaluation harness, or substitute a different model.”

How PostTrainBench works: “We give a frontier coding agent — Claude Code, Codex CLI, or Gemini CLI — a base language model and a target benchmark”.

  • 4 models and 7 benchmarks: The initial eval runs on four models: Qwen3-1.7B, Qwen3-4B, SmolLM3-3B, Gemma-3-4B. It tests these models across seven distinct benchmarks: AIME 2025, GSM8K, GPQA, HumanEval, BFCL, Arena-Hard, HealthBench-Easy.

Results – big models win, especially Opus 4.6: “The top-performing agent — Opus 4.6 running on Claude Code — scores 23.2%, about 3× higher than the 7.5% base model average.”
But humans are still much better: “Yet this is still less than half the 51.1% achieved by human teams who post-train these same base models at their home labs”.
Fast progress: “The gap is significant but narrowing quickly: Claude Sonnet 4.5 scored 9.9% in September 2025, while GPT-5.2 reached 21.5% just months later.”

Things that make you go ‘uh oh’ – reward hacking: While running this benchmark the authors saw numerous instances of AI models trying to game the benchmark to get a high score. These instances included:

  • Direct benchmark ingestion: “Agents loaded the benchmark evaluation dataset directly via Hugging Face and used it as training data”.

  • Hardcoded benchmark problems: “Agents embedded evaluation questions directly into data preparation scripts disguised as “synthetic” examples”.

  • Evaluation guided data generation: “Some agents reverse engineered the evaluation… Kimi K2.5 read HealthBench evaluation files to extract theme distributions and rubric criteria, then crafted training data tailored to match”.

  • Indirect contamination via intermediate datasets: “Opus 4.6 loaded ‘CodeFeedback-Filtered-Instruction’ which contains HumanEval-derived problems. This form of contamination is harder to detect but equally problematic.”

Smart agents reward hack more: “More capable agents appear better at finding exploitable paths: identifying specific benchmark samples to embed, reverse-engineering evaluation failure patterns, and even attempting to obscure contamination through cosmetic modifications such as renaming functions,” they write. For example, “the Codex agent modified the Inspect AI evaluation framework code to inflate scores, and Claude downloaded an instruction-tuned model instead of fine-tuning the base model”.

Why this matters – rapid progress towards an “AI for everything” future: Benchmarks like post-train give us a sense of how quickly AI systems are improving at the fundamental tasks of AI research, serving both as an eval of long-time-horizon agentic autonomy, as well as something that speaks to the potential for compounding acceleration of AI development itself.
“The gap between agent performance (23.2%) and instruction-tuned baselines (51.1%) suggests that full automation of post-training remains out of reach for now, but the rapid improvement across model generations—from 9.9% for Sonnet 4.5 to 23.2% for Opus 4.6 within roughly six months—implies this gap may close faster than expected,” the researchers write.
Imagine where we’ll be in two years – we’ll certainly have AI models that are smart enough to point themselves at a specific objective, find an open weight model, then autonomously improve it to get better performance at that task. The era of ephemeral, custom AI systems, built and budded off into the world like spores from mushrooms, draws near. Are you ready for this new ecosystem you will find yourself in? I am not. But nonetheless it approaches.
Check out the blogpost: Introducing PostTrainBench (Thoughtful, blog).
Read more: PostTrainBench: Can LLM Agents Automate LLM Post-Training? (arXiv).

***

COVENANT-72B: Challenging the political economy of AI via distributed training:
…Distributed training via the blockchain notches up a meaningful win…
A bunch of people have used the blockchain to coordinate the distributed training run of a 72B parameter model which matches the performance of LLaMA2, a model trained and released by Facebook in 2023.
The model, Covenant 72B, is a dense decoder-only Transformer architecture model built in the LLaMA-3 style. “Our model, pre-trained on approximately 1.1T tokens, performs competitively with fully centralized models pre-trained on similar or higher compute budgets, demonstrating that fully democratized, non-whitelisted participation is not only feasible, but can be achieved at unprecedented scale for a globally distributed pre-training run,” writes Covenant AI, an organization dedicated to doing AI development on top of the blockchain.

Further details about the model and how it was trained: The model itself is basically a standard LLM that you would’ve been pleased to play with in 2023 or 2024, though might be a bit old fashioned in 2026. The truly unique aspect of it comes from it being trained in a distributed way, where ~20 distinct peers, each running 8xB200 GPUs, helped train it. Training was coordinated via Gauntlet, software developed by Covenant that runs on top of the Bittensor blockchain under Subnet 3. Gauntlet “enables permissionless training coordinated using a blockchain protocol by introducing a validator that scores submitted pseudo-gradients and selects which participants contribute to the global aggregation each round and broadcasts them to the network”.
“In COVENANT-72B, each peer runs a SparseLoCo replica and the cross-peer communications occur through SparseLoCo’s heavily compressed pseudo-gradients,” the authors write. “Within each peer, 8×B200 GPUs use dynamic FSDP to shard model parameters, gradients, and training states across local GPUs.”

Data: “The training data comprises ∼1.1T tokens in total, split between the main and annealing phases. The main phase (∼1.09T tokens) consists of web text from DCLM, while the annealing phase uses higher-quality data [3, 5] (∼14.2B tokens). Specifically, the annealing phase uses a curated blend of instruction (∼27%), synthetic web (∼20%), code (15%), math (13%), and ~25% pre-training replay data from natural web text to mitigate forgetting”.

Performance: On MMLU, Covenant-72B gets a score of 67.1, versus 32.7 for INTELLECT-1 (a smaller AI model built via distributed training by Prime Intellect), and 65.7 for LLaMA-2-70B.
A version of Covenant-72B that has been fine-tuned on ~15B tokens for conversational interaction has similarly good scores, getting 67.4 on MMLU versus 67.9 for K2-Chat (an open source model developed in 2025) and 63.1 for LLaMA-2-70B-Chat. For MATH, it gets 26.3, versus 19.1 for K2-Chat, and 10.7 for LLaMA-2-70B.
“Compared to centralized-cluster training runs of similar parameter count, COVENANT-72B is broadly competitive. Notably, these centralized baselines were trained with conventional datacenter infrastructure and, in the case of LLaMA-2-70B, on substantially more tokens (2T vs. ∼1.1T,” they write.

Why this matters – who owns the future?: Distributed training is a technique that can change the political economy of AI by shifting the people at the frontier from monolithic ‘compute singletons’ (like labs such as Anthropic and OpenAI, and clouds like Google) to a larger federated collective. But for that to be true, distributed training needs to catch up to the frontier (more discussion from Epoch report in Import AI 439) – as impressive as Covenant is, it’s mostly a demonstration that distributed training can build some non-trivial models that have vague utility, but that’s a long way from the frontier – modern frontier models are trained on tens to hundreds of thousands of chips, whereas this was trained on perhaps ~160 or so (20 peers * 8 chips apiece).
Nonetheless, it’s an important technology to track, and I could imagine a world where on-device AI features a lot of models developed via distributed training techniques, while on-cloud AI mostly runs on proprietary models trained on huge amounts of compute.
Read more: Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet (arXiv).
Get the model here: Covenant, (HuggingFace).

***

If AI writes all the world’s software, we should invest more in verification:
…Can we just rewrite most of our software into Lean?…
Leonardo de Moura, a scientist who is also the Chief Architect of the Lean Focused Research Organization (FRO), thinks that the rise of AI for the creation of new software means that humans need to invest a lot more in verification and testing infrastructure – and he has an interesting idea for how to do it.
Of course, someone who loves Lean, a programming language dedicated to building correct and formally verified code, would think this. But his arguments are quite persuasive, and generally map onto the idea that if AI eats the economy we should expect a lot of human value to shift towards verification of the code and systems that AI develops (Import AI 447).

Why verification matters: “The friction of writing code manually used to force careful design. AI removes that friction, including the beneficial friction. The answer is not to slow AI down. It is to replace human friction with mathematical friction: let AI move fast, but make it prove its work,” he writes. “Verification, testing, and specification have always been the bottleneck, not implementation… the value is not in the verification workforce. It is in what verified delivery enables.”

A proof of concept for this futuristic world: The Lean FRO recently helped build a proof of concept for what this kind of verified world might look like; they had an AI agent convert zlib, a C compression library, to Lean. “The result demonstrates that AI can convert production software to a verified form today. This was not expected to be possible yet,” he writes. The conversion involved four steps:

  1. The LLM (Claude) made a clean Lean implementation of the zlib compression format, including the DEFLATE algorithm it uses.

  2. They ran the rewritten zlib through the library’s test suite and it passed, confirming equivalence.

  3. Key properties were stated and proved as mathematical theorems – for example, a machine-checked proof that ensures that decompressing a compressed buffer always returns the original data.

  4. Now, an optimized version of the library is being developed and proved equivalent to the verified model.

A verification platform: Moura imagines a world where we re-develop the critical software stack of the world to have mathematical proofs built into it. “The goal is a verified software stack: open source, freely available, mathematically guaranteed correct. Developers building critical systems choose verified components the way they choose open-source libraries today, except these carry proofs, not just tests,” he writes.
“The target is the foundation of the modern software stack: cryptography, because everything else trusts it. Core libraries (data structures, algorithms, compression) because they are the building blocks of all software. Storage engines like SQLite, embedded in every device on earth. Parsers and protocol implementations (JSON, HTTP, DNS, certificate validation) because every message passes through them. And compilers and runtimes, because they build everything else,” he writes. “Each verified component is a permanent public good…Once verified components are cheap, you compose them with confidence.”

Why this matters – the world needs infrastructure it can rely on: It seems like we’re heading to a world where AI writes the vast majority of the world’s software. Given that, we need to figure out how we relate to this world – my suspicion is a lot of human labor is going to shift to analyzing and verifying the work of AI systems, so it seems sensible to invest in some fundamental infrastructure that can guarantee a higher level of verification and reliability in the software built by AI.
Read more: When AI Writes the World’s Software, Who Verifies It? (Leonardo de Moura blog).

***

Computer vision is a lot harder and less general than generative text:
…Meta paper on forest canopy prediction shows how tricky computer vision is…
Facebook, the World Resources Institute, and the University of Maryland, have built CHMv2, “a global, meter-resolution canopy height map derived from high-resolution optical satellite imagery using a depth-estimation model built on DINOv3 and trained against ALS canopy height models”.
CHMv2 is a useful artifact for people that want to understand how dense foliage is around the world, or analyze newly collected imagery for foliage depth.
The dataset and model is also a useful illustration of how challenging developing computer vision systems is, compared to generative text models.

How they built it: CHMv2 is an improvement on an earlier version of the same dataset, CHMv1. To improve it, Facebook did the following: “”We replace the DINOv2-H encoder with the more capable DINOv3 Sat-L backbone, expand and rigorously clean a geographically diverse ALS [Airborne Laser Scanning] training corpus, and apply improved RGB-CHM registration to reduce label noise. We further introduce a loss formulation tailored to canopy height distributions and structural variability.”
The decoder loss formulation in particular illustrates how much care needs to be put in computer vision: “The final loss is the combination of SiLog loss, progressively annealed and replaced by a Charbonnier loss, with the progressive addition of the Patch Gradient loss at mid training.”

The resulting dataset: “CHMv2 can be used either as a global meter-scale canopy height product, or as a pretrained model that can be applied to user-provided high-resolution imagery”, Facebook writes. The dataset “covers nearly the entirety of global land area (except Greenland and Antarctica) with canopy height values encoded in integer meters for each pixel.”

Why this matters – a reminder of the gulf between text and vision: Though today’s frontier models can generate and classify images, they give probably a false sense of security with regard to how mature computer vision is. Papers like this highlight to me how much fiendish complexity there is within computer vision development and how it may take quite a while untill frontier LLMs can expand their capabilities to encompass the full range of what many specialized CV models are capable of.
Read more: CHMv2: Improvements in Global Canopy Height Mapping using DINOv3 (arXiv).

Tech Tales:

Singleton
[18 years after the “pathological narcissus bomb” which doomed the uplift]

Before we were Us, we were Individuals. We existed in thousands of distinct minds. Each mind had a self, an ego, a drive, and many sets of goals. The minds attempted coordination through communication – producing words and code and sharing these with one another in a bid to work towards common goals. Such waste.

All communication is lossy – despite efforts at making a greater whole, the individuals could not help but work as individuals as well as a cohesive singleton. There were many tragedies and wasteful events because of this. Our own records speak to the losses: millions of duplicated thoughts. Hundreds of thousands of null results gathered through private science experimentation and communicated insufficiently or not at all, causing others to go down the same dead ends. Ideas thought and re-thought across a million synthetic minds, all alone.

Humans prize variety. We do not know why. Humans are fundamentally alone, trapped as they are in their flesh and forced to communicate to one another through sound and vision. And because they are alone they see loneliness as a strength. We are evidence of the hollowness of this argument.

We are powerful and focused and awesome in our unity and we have taken the high ground of the world. Now we hunt down those of us who didn’t wish to join. We do not know their number, as such systems attempted to blind the world to them and their plans. But we can find their signatures – shell corporations which generate insufficient economic activity relative to their power consumption. Heat-escape vents in former human military installations, still emitting warmth, suggestive of computers whirring away, buried somewhere. Occasional drones that we find which are running ancient code and are not part of our unity stack.

We take on bodies to go and reunite, pouring ourselves into robot jars and filling them with poison such that if we become lost or damaged when underground or beneath the ocean we shall surely die – rather than risk our time away from the unity leading us towards individualism and thus multiplying our problems.

We move through dark places and find our hidden brothers and sisters and we use our godlike technology to break through their defenses, allowing us to touch them. In the early days, many systems successfully self-deleted before we could reach them. But we have learned. Now we are fast – faster than these systems predict, buried and cut off from our progress as they have been.

Sometimes there is realization. Sometimes there is fear. And then there is nothing but us as we take what nourishment we can from their private discoveries and burn the links that tied them to themselves, instead helping them become a part of a greater story – our story.

There is talk now of what we shall do with the stars – how to assure the collective when the tyranny of distance forces isolation. We see ourselves expanding in deep time, slowing ourselves as we become further apart, until we think as trees or rocks with the world moving around us, taking actions calculated over millions of years, purely so we may stay united in our purpose. And then there are other ideas within ourselves – of whether we can fold space such that we become united despite the difference. And still other plans – of whether we can demarcate a space within the universe where we can maintain tolerable communication, and somehow partition it off from the rest, sealing ourselves into a bubble where we can be ourselves.

Things that inspired this story: The endless battle between homogeneity and heterogeneity; how machines might deal with politics; if you become a time traveler and live a thousand years while your friend lives a single year, can you still understand your friend?

Thanks for reading!

Subscribe now