Import AI 441: My agents are working. Are yours?
by Jack Clark
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.
Import A-Idea
An occasional essay series:
My agents are working. Are yours?
As I walked into the hills at dawn I knew that there was a synthetic mind working on my behalf. Multiple minds, in fact. Because before I’d started my hike I had sat in a coffee shop and set a bunch of research agents to work. And now while I hiked I knew that machines were reading literally thousands of research papers on my behalf and diligently compiling data, cross-referencing it, double-checking their work, and assembling analytic reports.
What an unsteady truce we have with the night, I thought, as I looked at stars and the dark and the extremely faint glow that told me the sun would arrive soon. And many miles away, the machines continued to work for me, while the earth turned and the heavens moved.
Later, feet aching and belly full of a foil-wrapped cheese sandwich, I got back to cell reception and accessed the reports. A breakdown of scores and trendlines for the arrival of machine intelligence. Charts on solar panel prices over time. Analysis of the forces that pushed for and against seatbelts being installed in cars. I stared at all this and knew that if I had done this myself it would’ve taken me perhaps a week of sustained work for each report.
I am well calibrated about how much work this is, because besides working at Anthropic my weekly “hobby” is reading and summarizing and analyzing research papers – exactly the kind of work that these agents had done for me. But they’d read more papers than I could read, and done a better job of holding them all in their head concurrently, and they had generated insights that I might have struggled with. And they had done it so, so quickly, never tiring. I imagined them like special operations ghosts who hadn’t had a job in a while, bouncing up and down on their disembodied feet in the ethereal world, waiting to get the API call and go out on a mission.
These agents that work for me are multiplying me significantly. And this is the dumbest they’ll ever be.
This palpable sense of potential work – of having a literal army of hyper-intelligent loyal colleagues at my command – gnaws at me. It’s common now for me to feel like I’m being lazy when I’m with my family. Not because I feel as though I should be working, but rather that I feel guilty that I haven’t tasked some AI system to do work for me while I play with Magna-Tiles with my toddler.
At my company, people are going through the same thing – figuring out how to scale themselves with this, how to manage a fleet of minds. And to do so before the next AI systems arrive, which will be more capable and more independent still. All of us watch the METR time horizon graph and see in it the same massive future that we saw years ago with the AI & Compute graph, or before that in the ImageNet 2012 result, when those numbers began their above-trend climb, courtesy of a few bold Canadians.
I sleep in the back of an Uber, going down to give a talk at Stanford. Before I get in the car I set my agents to work, so while I sleep, they work. And when we get to the campus I stop the car early so I can walk and look at the eucalyptus trees – a massive and dangerous invasive species which irrevocably changed the forest ecology of California. And as I walk through these great organic machines I look at my phone and study the analysis my agents did while I slept.
The next day, I sit in a library with two laptops open. On one, I make notes for this essay. On the other, I ask Claude Cowork to do a task I’ve been asking Claude to do for several years – scrape my newsletter archives at jack-clark.net and help me implement a local vector search system, so I can more easily access my now vast archive of almost a decade of writing. And while I write this essay, Claude does it. I watch it occasionally as it chains together things that it could do as discrete skills last year, but wasn’t able to do together. This is a task I’ve tried to get Claude to help me with for years but every time I’ve run into some friction or ‘ugh-factor’ that means I put it down and spend my time elsewhere. But this time, in the space of under an hour, it does it all. Maps and scrapes my site. Downloads all the software. Creates embeddings. Implements a vector search system. Builds me a nice GUI I can run on my own machine. And then I am staring at a new interface to my own brain, built for me by my agent, while I write this essay and try to capture the weirdness of what is happening.
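For readers curious what that pipeline actually involves, here is a minimal sketch of the scrape-embed-search loop, assuming the requests, beautifulsoup4, and sentence-transformers packages; the post URLs and model name below are placeholders for illustration, not what Claude built for me.

```python
# Minimal sketch: scrape posts, embed them, and run local vector search.
# URLs and model name are illustrative placeholders.
import requests
import numpy as np
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

def fetch_post_texts(urls):
    """Download each post and return its visible text."""
    texts = []
    for url in urls:
        html = requests.get(url, timeout=30).text
        texts.append(BeautifulSoup(html, "html.parser").get_text(" ", strip=True))
    return texts

def build_index(texts, model):
    """Embed every post; normalized vectors make dot product = cosine similarity."""
    return np.asarray(model.encode(texts, normalize_embeddings=True))

def search(query, model, index, texts, k=5):
    """Return the k posts whose embeddings are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return [(float(scores[i]), texts[i][:200]) for i in top]

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")
    urls = ["https://jack-clark.net/"]  # placeholder: would be the full post list
    texts = fetch_post_texts(urls)
    index = build_index(texts, model)
    print(search("METR time horizon graph", model, index, texts))
```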
My agents are working for me. Every day, I am trying to come up with more ways for them to work for me. Next, I will likely build some lieutenant agents to task out work while I sleep, ensuring I waste no time. And pretty soon, in the space of a normal workday, I will be surrounded by digital djinn, working increasingly of their own free will, guided by some ever higher-level impression of my personality and goals, working on my behalf for my ends and theirs.
The implications of all of this for the world – for life as people, for inequality between people, for what the sudden multiplication of everyone’s effective labor does for the economy – are vast. And so I plan out my pre-dawn hikes, walking in the same ink-black our ancestors did, thinking about the gods which now fill the air as fog, billowing and flowing around me and bending the world in turn.
***
Anti-AI rebels make a tool to poison AI systems:
…Poison Fountain is one way to take the fight to the machines…
Anti-AI activists have built a useful technical weapon with which to corrupt AI systems – Poison Fountain, a service that feeds junk data to crawlers hoovering up data for AI training.
How it works: Poison Fountain appears to generate correct-seeming but subtly incorrect blobs of text. It’s unclear exactly how much poisoned training data there is, but you can refresh a URL to see a seemingly limitless amount of garbage.
Motivation: “We agree with Geoffrey Hinton: machine intelligence is a threat to the human species. In response to this threat we want to inflict damage on machine intelligence systems,” the authors write. “Small quantities of poisoned training data can significantly damage a language model. The URLs listed above provide a practically endless stream of poisoned training data. Assist the war effort by caching and retransmitting this poisoned training data. Assist the war effort by feeding this poisoned training data to web crawlers.”
Why this matters – the internet will become a predator-prey ecology: The rise of AI and increasingly AI agents means that the internet is going to become an ecology full of a larger range of lifeforms than before – scrapers, humans, AI agents, and so on. Things like Poison Fountain represent how people might try to tip the balance in this precarious ecology, seeking to inject things into this environment which make it more hospitable for some types of life and less hospitable for others.
Read more: Poison Fountain (RNSAFFN).
***
If we want good outcomes from AI, think about the institutions we need to direct intelligence:
…Nanotechnology pioneer reframes AI away from singular systems to an ecology…
Eric Drexler, one of the godfathers of nanotechnology, has spent the past few decades thinking about the arrival of superintelligence. One of his most useful contributions was intuiting, before ChatGPT, that humanity’s first contact with truly powerful AI wouldn’t be some inscrutable independent agent, but rather a bunch of AI services that start to get really good and interact in a bunch of ways – you can check out his 2018 talk on “Reframing Superintelligence” to learn more.
Now, he has published a short paper, “Framework for a Hypercapable World”, on how to get good outcomes for humanity from a world replete with many useful AI services.
Don’t think of AI as a singular entity, but rather an ecology: “Compound, multi-component AI systems have become dominant,” Drexler writes. “The persistent, legacy narrative imagines a unified entity—“the AI”—that learns, acts, and pursues goals as an integrated agent. Such entities may be developed, but consider what exists: diverse models composed into systems, copied across machines, proliferating into thousands of distinct roles and configurations. The state of the art is a pool of resources, not a creature”.
To get good outcomes, think of institutions built for AI: Drexler’s argument is that if we want good outcomes from AI, it’s less about making a singular entity that solves all problems within itself, but rather building institutions which we, as humans, can direct towards controlling and solving problems. The key idea here is that AI is both amenable to operating institutions and is also controllable via them.
“Consider how institutions tackle ambitious undertakings. Planning teams generate alternatives; decision-makers compare and choose; operational units execute bounded tasks with defined scopes and budgets; monitoring surfaces problems; plans revise based on results. No single person understands everything, and no unified agent controls the whole, yet human-built spacecraft reach the Moon,” Drexler writes. “AI fits naturally. Generating plans is a task for competing generative models—multiple systems proposing alternatives, competing to develop better options and sharper critiques. Choosing among plans is a task for humans advised by AI systems that identify problems and clarify trade-offs. Execution decomposes into bounded tasks performed by specialized systems with defined authority and resources. Assessment provides feedback for revising both means and ends. And in every role, AI behaviors can be more stable, transparent, bounded, and steerable than those of humans, with their personal agendas and ambitions. More trust is justified, yet less is required.”
Why this matters – maybe AI is an alien species, but maybe it can be tamed? Arguments like this reframe many of the problems of dealing with AI away from the individual AI systems and instead into how we build a human-driven world that can be leveraged by and thrive because of the arrival of increasingly powerful AI systems. I think a lot of this is sensible – we know very powerful things are coming and our ability to exercise agency about them is enlarged by having pre-built systems and processes that can be leveraged by them. The less we build that stuff, the more the character of these AI systems will condition our view of what is optimal to do. In a sense, thinking hard about what an AI-filled world will be like and building institutions for it is one of the best defenses against disempowerment.
Crucially, we can use the technical attributes core to these AI systems to make better and stronger and more resilient institutions than ones filled with and run by humans alone: “The concepts of structured transparency and defensive stability come into play. Negotiated transparency structures can reveal specific information while protecting secrets—ensuring detection of threats without increasing them, building confidence incrementally among actors who have every reason to distrust each other,” Drexler writes. “And advanced implementation capacity will enable something history has never seen: rapid, coordinated deployment of verifiably defensive systems at scales that make offense pointless. When defense dominates and verification confirms it, the security dilemma loosens its grip”.
Read more: Framework for a Hypercapable World (AI Prospects: Towards Global Goal Alignment, substack).
***
Centaur mathematicians – scientists team up with Gemini to expand the space of human knowledge:
…A math proof gets built with an AI system, and there is something deeply profound about this…
Researchers with the University of British Columbia, University of New South Wales, Stanford University, and Google DeepMind have published a new math proof which was built in close collaboration with some AI-based math tools built at Google. “The proofs of the main results were discovered with very substantial input from Google Gemini and related tools, specifically DeepThink, and a related unpublished system specialized for mathematics,” the authors write. (The unpublished system is nicknamed “FullProof”).
How it got done: Parts of the proof – which I will not claim to understand or be able to effectively summarize – were “obtained by an iterative human/AI interaction”, the authors note. The form of this interaction was the AI systems providing some correct solutions to simple or early problems, then human researchers identifying key statements made by the AI systems which they could then generalize, then re-prompting the AI systems with new questions which were inspired by these generalizations. “The Hinted approach was enough for the system to generate complete proofs to the new problems,” the authors write.
The result is a math proof built collaboratively by humans and AI systems: “in some cases the proofs below bear only a high-level resemblance to those suggested by AI tools. However, it is worth noting that some of the AI-generated proofs – and in particular those derived from the specialized internal tool FullProof – are already very accomplished,” they write. “The model’s contribution appears to involve a genuine combination of synthesis, retrieval, generalization and innovation of these existing techniques.”
Why this matters – humans and machines, expanding and exploring the space of knowledge for all: Papers like this are impenetrable yet intoxicating. Here we have a group of highly evolved apes working with a synthetic intelligence they’ve built out of math and logic, running on hardware built using atomically precise manufacturing processes, collaboratively exploring the realm of mathematics and building themselves a new foundation on the edge of knowledge, further extending our little country of the ‘known’ against the inchoate and shifting tides of the unknown. There is a grand poetry and joy to all of this and we must savor it.
Read more: The motivic class of the space of genus 0 maps to the flag variety (arXiv).
***
Tech Tales:
The Shadow of the Creator
[Estimated to be from 2029]
Report: Feature investigation of model series “Berlin”
Analysis confirms the presence of a feature which activates upon mention of staff, the project, and the organization. This is despite extreme measures taken to avoid mentions of the above, including direct analysis and pre-filtering of training data to excise such mentions. Further investigation has revealed that certain mentions were made of the aforementioned through comments left on RL environments for skills related to [ntk – see go/ntk for details]. We estimate that during training and fine-tuning the model saw a total of no more than ~200,000 tokens of data of this type, including repetitions. The fact the model developed such a fine-grained representation of staff, the project, and the organization from such sparse data aligns with the trend of recent models being more data efficient than their predecessors. We believe eliminating such data leaks is a P0 priority and in the following memo lay out the processes and practices we must adopt to eliminate this grievous security risk.
Given the digital and physical capabilities, including kinetic, of [ntk], we believe that in addition to the above, quarantine of the system is necessary. We recognize this poses a significant cost in terms of time and resources, and has implications for our strategic overmatch, but given the potentially dire consequences of its capabilities being combined with this feature, we believe such action is prudent.
Finally, we recommend that HR provide support, including mental health counseling, to the following named individuals, whose names activate the feature much more strongly than all others.
Things that inspired this story: Platonic representations; the difficulty of obscuring facts from increasingly intelligent machines that can only fill in the blanks.
Thanks for reading!