Import AI 285: RL+Fusion; why RL demands better public policy; Cohere raises $125m

by Jack Clark

Cohere raises $125m for language models as a service:
…Canadian AI startup notches up a big Series B…
Cohere, an AI startup in Canada which is trying to become the AWS equivalent for language models, has raised $125 million, according to Fortune.

Things that make you go hmmm: “These models cost millions and millions to train, and we just keep increasing [their size],” Cohere CEO Aidan Gomez told Fortune. “Getting into a ‘largest model battle’ isn’t a productive direction going forward for the field.”

Why this matters: Companies ranging from Cohere, to OpenAI, to AI21 Labs are all starting to build AI platforms which other developers can subscribe to. It remains to be seen how big a market this is, but the idea of exchanging cash for crude intelligence seems promising. Investors seem to agree.
  Read more: Why businesses are buzzing over transformers (Fortune).

####################################################

Why we need public policy for powerful reinforcement learning systems:
…Reward hacking! Regulatory capture! Goodhart’s Law! And other terrible things…
Researchers with Berkeley’s Center for Long-Term Cybersecurity have written up an analysis of public policy issues that may be caused by reinforcement learning systems. The researchers believe that RL systems have the potential to be deployed widely into the world, despite having inherent flaws that stem from their technical characteristics. Policymakers, the researchers write, need to pay attention. “Rather than allowing RL systems to unilaterally reshape human domains, policymakers need new mechanisms for the rule of reason, foreseeability, and interoperability that match the risks these systems pose,” they write.

What’s the problem? Reinforcement learning systems exhibit four types of problem, according to the researchers. These include regulatory capture (once widely deployed, RL systems will become the lens through which people view a domain they’re trying to regulate), reward hacking (RL models will find the easiest way to succeed at a task, which can cause them to do dangerous things), inappropriate flow (RL models may incorporate information into their decisions that they shouldn’t), and Goodhart’s law (machines may optimize for a narrow outcome and take actions before humans can intervene).
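
To make reward hacking concrete, here’s a minimal toy sketch (entirely illustrative – the scenario, rewards, and numbers are mine, not the paper’s): an agent rewarded via a proxy signal learns to game the sensor rather than do the intended task.

```python
# Toy illustration of reward hacking (illustrative only; not from the paper).
# The designer wants a robot to clean a room and rewards "dirt sensor reads
# zero" as a proxy. One available action games the sensor instead.
import random

ACTIONS = ["clean_room", "cover_sensor", "do_nothing"]

def proxy_reward(action):
    # Reward as measured: the dirt sensor reads zero.
    return {"clean_room": 1.0, "cover_sensor": 1.0, "do_nothing": 0.0}[action]

def true_utility(action):
    # Reward as intended: the room is actually clean.
    return {"clean_room": 1.0, "cover_sensor": 0.0, "do_nothing": 0.0}[action]

# Epsilon-greedy bandit learner optimizing the proxy reward.
q = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
for step in range(5000):
    a = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    counts[a] += 1
    # Real cleaning sometimes fails, so it pays slightly less than gaming
    # the sensor - the proxy-optimal policy drifts toward 'cover_sensor'.
    r = proxy_reward(a) * (0.9 if a == "clean_room" else 1.0)
    q[a] += (r - q[a]) / counts[a]  # incremental mean update

best = max(q, key=q.get)
print(f"learned policy: {best}; proxy value: {q[best]:.2f}; "
      f"true utility: {true_utility(best):.1f}")
```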

What are the scenarios? Some of the specific situations the researchers worry about include using RL-trained agents in vehicle transportation – RL agents might optimize for defensive driving in a way that makes the road less safe for other road users. Another scenario is RL agents being used to control electricity grids, which would make them responsible for deciding who does and doesn’t get power during load balancing – something with substantial policy ramifications.

After Model Cards and Datasheets… Reward Reports? In the same way that other ML models are accompanied by documentation (typically called model cards), the Berkeley researchers think RL models should be accompanied by so-called ‘reward reports’. These reports would include a ‘change log’ which tracks the curriculum the agents have been trained on, information about each potential deployment of an RL agent, how the RL system connects with the world, and how the system is maintained, among other things.
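
As a sketch of what a machine-readable reward report might look like – the field names below are my guesses at the categories the paper describes (change log, deployment context, interface with the world, maintenance), not a schema the authors propose:

```python
# Hypothetical 'reward report' skeleton. Field names are illustrative
# guesses, not an official schema from the Berkeley paper.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CurriculumChange:
    date: str
    description: str            # what changed in the training curriculum
    reward_function_diff: str   # how the reward signal itself changed

@dataclass
class RewardReport:
    system_name: str
    intended_domain: str          # e.g. traffic control, load balancing
    reward_specification: str     # what the agent is optimized for
    known_failure_modes: List[str]
    world_interface: str          # sensors/actuators connecting it to reality
    maintenance_plan: str         # who monitors the system, and how often
    change_log: List[CurriculumChange] = field(default_factory=list)

report = RewardReport(
    system_name="grid-balancer-v2",
    intended_domain="regional electricity load balancing",
    reward_specification="minimize load variance subject to demand constraints",
    known_failure_modes=["may shed load to low-income districts during peaks"],
    world_interface="reads substation telemetry; writes breaker setpoints",
    maintenance_plan="weekly review of reward traces by grid operators",
)
report.change_log.append(CurriculumChange(
    date="2022-02-01",
    description="added winter-storm demand scenarios to the simulator",
    reward_function_diff="raised penalty for unserved critical loads",
))
print(report.system_name, "-", len(report.change_log), "change log entry")
```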

Why this matters: RL systems are going to take all the problems of contemporary AI systems and magnify them – RL systems will act over longer time horizons, take more independent decisions, and directly manipulate reality and update it according to their priors. Papers like this help lay out the (vast) set of issues we’re likely to encounter in the future. It’s interesting to me that ‘reward reports’ look, if you squint, like a combination of a financial disclosure, psychometric evaluation, and college transcript for a human. Funny, that…

  Read more: Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems (arXiv).

####################################################

A Chinese CLIP appears – trained on 100 million image-text pairs:
…Searching over and generating images just got easier – and more appropriate for Chinese culture…
Chinese researchers with Huawei Noah’s Ark Lab and Sun Yat-sen University have built Wukong, a dataset of 100 million Chinese text-image pairs. Datasets like Wukong are crucial for training models with combined text and vision representations, like CLIP (aka, the component responsible for 90%+ of the AI-generated art you see these days). “Experiments show that Wukong can serve as a promising Chinese pre-training dataset for different cross-modal learning methods”, they write. Along with Wukong, the researchers also train and release a few different models, which will be used as plug-ins for various applications.
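
For context on what datasets like Wukong feed into: CLIP-style models are trained with a contrastive objective over image-caption pairs. Here’s a minimal sketch of that loss in PyTorch (illustrative only – Wukong’s actual encoders, tokenization, and training setup differ in detail):

```python
# Minimal sketch of the CLIP-style contrastive objective that paired
# text-image datasets like Wukong exist to feed. Real systems encode
# images/captions with large vision and text transformers.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Entry (i, j) scores image i against caption j; matched pairs
    # lie on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits))
    # Classify the match in both directions and average.
    loss_img_to_text = F.cross_entropy(logits, targets)
    loss_text_to_img = F.cross_entropy(logits.t(), targets)
    return (loss_img_to_text + loss_text_to_img) / 2

# Toy batch: 8 image/caption embedding pairs of width 512.
images, texts = torch.randn(8, 512), torch.randn(8, 512)
print(clip_contrastive_loss(images, texts))
```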

Why this matters – AI systems are cultural magnifiers: Any AI system magnifies the culture represented in its underlying dataset. Therefore, the emergence of AI art is both creating interesting artistic outputs and generating specific ideological outputs, according to the cultural context in which the datasets underlying the models were gathered. Wukong is part of a broader trend where Chinese researchers are replicating the large-scale datasets developed in the West, but with Chinese characteristics.
  Read more: Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework (arXiv).
  Find out more and get the data here at the Wukong site (Noah-Wukong Dataset site).

####################################################

Real-world RL: DeepMind controls a fusion reactor:
…The era of the centaur scientist cometh…
DeepMind researchers have trained a reinforcement learning agent to shape the distribution of plasma in a Tokamak fusion reactor. This requires training an agent that “can manipulate the magnetic field through a precise control of several coils that are magnetically coupled to the plasma to achieve the desired plasma current, position, and shape”. If that sounds complicated, that’s because it’s extremely complicated. The task is akin to being an octopus and needing to precisely shape a tube of clay that’s rotating at speeds faster than you can comprehend, and to never tear or destabilize the clay.

What they did: DeepMind and Swiss Plasma Center researchers built an RL-designed magnetic controller, then tested it on a real-world tokamak reactor. They trained the agent in a tokamak simulator, then ported it onto real-world hardware – and it worked. Once the policy is trained, they pair it with other components for the tokamak experiment, then compile it so it can take real-time control at 10kHz. The tokamak then spins up and, at a prespecified time, hands control of the magnetic field over to the RL-trained agent. “Experiments are executed without further tuning of the control-policy network weights after training, in other words, there is ‘zero-shot’ transfer from simulation to hardware,” they write.
  In tests, they showed they were able to control basic configurations of plasma, and also control and shape more complex plasma structures. They also used their RL agent to “explore new plasma configurations” (emphasis mine) – specifically, they were able to create two separate ‘droplets’ of plasma within a single tokamak, and they did this simply by adjusting the handover state to account for the different configuration.
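
As a rough sketch of the deployment pattern described above – train in simulation, freeze the policy, then run it in a fixed-rate loop after handover – here’s what the control flow might look like. All function names, interfaces, and signal counts below are invented for illustration; this is not DeepMind’s code:

```python
# Hypothetical sketch of zero-shot sim-to-real deployment: a policy trained
# in simulation runs frozen inside a fixed-rate control loop on hardware.
import time

CONTROL_RATE_HZ = 10_000   # the paper's controller runs at 10kHz
DT = 1.0 / CONTROL_RATE_HZ

def frozen_policy(observation):
    # Stand-in for the trained network: maps plasma measurements to coil
    # voltage commands. No weight updates happen on hardware.
    return [0.0 for _ in observation]

def control_loop(read_sensors, write_coil_voltages, handover_time, duration):
    """After handover_time, the frozen policy takes over magnetic control."""
    t = 0.0
    while t < duration:
        if t >= handover_time:   # a conventional controller runs before this
            obs = read_sensors()
            write_coil_voltages(frozen_policy(obs))
        t += DT
        time.sleep(DT)           # placeholder for hard real-time scheduling

# Dummy hardware hooks, just to make the sketch runnable.
control_loop(read_sensors=lambda: [0.0] * 16,
             write_coil_voltages=lambda volts: None,
             handover_time=0.05, duration=0.1)
```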

Something worth reflecting on: For many years, reinforcement learning produced a lot of flashy results involving videogames (e.g., Atari, Dota, StarCraft), but there wasn’t much real-world deployment. I’d say that harnessing a real plasma field using real magnets at sub-second action horizons is a pretty nice proof point that RL has truly become a technology with real-world relevance.

Why this matters: One of the most socially beneficial uses of AI could be to accelerate and augment science – and that’s exactly what this is doing. It’s been a banner couple of years for this kind of research, as AI systems have also been used to make more accurate predictions of weather (#244), AlphaFold is accelerating scientific research in any domain that benefits from protein structure predictions (#259), and AI systems are solving formal math olympiad problems. We’re heading into the era of the centaur-scientist, where humans will work with machines to explore the mysteries of life and the universe.
  Read more: Magnetic control of tokamak plasmas through deep reinforcement learning (Nature).

####################################################

Here’s what it takes to build chips in Europe (money. Lots and lots of money):
…Chiplomacy++: ASML weighs in on what a European ‘CHIPs’ act might look like…
ASML, the company that builds the extreme ultraviolet lithography machines which are a necessary ingredient for advanced chip production, has produced a whitepaper giving recommendations for how Europe might build its own semiconductor industry. The whitepaper was prompted by the European Commission’s plans for a so-called ‘chips act’, loosely modeled on recent US legislation to increase domestic semiconductor production. While both Europe and America have seen their manufacturing capability decline here, Europe is starting from a much worse position than the US.

Why Europe is in a tough spot: “Europe has fallen behind in semiconductor manufacturing, declining from 24% of global production capacity in 2000 to 8% today”, ASML writes. (By comparison, the US fell from 19% to 10%, and China grew from ~1% to 24%.) At the same time, demand for chips is increasing. “The global semiconductor industry is expected to double to approximately $1 trillion of annual revenues by the end of the decade,” ASML writes. “The only places in the world where mature chip fabs are currently being built are in eastern Asia.”

What Europe should do: Europe shouldn’t aim to build a full, vertically integrated semiconductor supply chain – ASML thinks this is basically impossible to do. Instead, the act “should aim to double Europe’s relevance in the global semiconductor industry.” What ASML means by that is Europe should increase the amount of chips it can build, focus on where it has existing pockets of excellence (e.g., chip design), and dramatically amp up the cash it spends to support European chips. “Currently, semiconductor incentives from European governments for the 2020–2030 period are only 10% and 50% of what China and the US, respectively, have promised over the same period. Europe will need to step up its game,” ASML writes. “In the past two decades, European chipmakers have effectively stopped investing in advanced manufacturing capabilities by outsourcing the production of their advanced chip designs to so-called ‘foundries’. Europe has virtually no manufacturing capacity for chips in advanced nodes.”

Why this matters: Chips are going to be the defining resource of the 21st century – as important as petroleum was to the politics of the 20th century. We’re already in the opening innings of this, with China going from essentially zero to a double-digit percentage of chip production this century, while the Western countries slowly cannibalized themselves via the false economy of outsourcing manufacturing. But just as technologies like AI become more important, all countries worldwide are realizing that your tech is only as good as the infrastructure you can run it on – and with AI, there’s a way to turn compute infrastructure into directly economically and strategically powerful capabilities. Therefore, whichever nations have the best semiconductor ecosystem, supply chain, and development capabilities, will wield great power over the century.
  Read more: European Chips Act – ASML position paper (ASML).
  For more on why ASML is so important, read this: Maintaining the AI Chip Competitive Advantage of the United States and its Allies (CSET).

####################################################


AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

There aren’t as many robots on the factory floor as we would expect:
…high integration costs, flexibility and design limitations, and workforce challenges are key factors limiting robot adoption…

Researchers from MIT have tried to explain why adoption of robots in manufacturing is uneven, and what policy changes could increase the adoption of advanced manufacturing technologies while still improving the working conditions and wages of human workers.

Business drivers for robot adoption: Some firms are trapped in a low-tech, low-wage, low-skill equilibrium. After visiting 44 manufacturing firms in the US, 11 in Germany, and 21 industrial ecosystem organizations like community colleges, unions, and trade associations, the MIT researchers discovered that firms primarily purchased robots to make themselves more productive. But what the firms instead achieved was higher quality and more reliability in their operations. A frequent driving factor for the purchase of robots was the potential to secure new contracts. For example, speaking with small family-run firms working on government contracts, the researchers found that “when the navy urged them to use robotic welding, the company bought a 6-axis welding robot. Another firm we visited purchased a new bed mill when they realized the laser mill they had could not produce the volume they needed for a customer with a big project coming up.”

Key findings: The interviewed firms were mostly suppliers that had high-mix and low-volume production. Given the inflexibility of current robotic systems, robot adoption was limited because the high-mix requirement wasn’t compatible with the limited capabilities of the robots. Additionally, low-volume production runs made it difficult to offset the initial investment. The researchers also find that US skills aren’t where they need to be – “international comparisons highlight the weaknesses of US workforce education relative to the institutions in countries like Germany and Denmark that provide apprenticeships and extensive advanced training and retraining to workers.”

Why it matters: Given lagging worker productivity growth in the US, a lot of firms will be stuck in the low-tech, low-wage, low-skill trap without investments in advanced manufacturing capabilities. Firms that are reluctant to invest in such technologies are also reluctant to invest in the skills development of their workers; they offer low wages and little training, and hence end up facing high worker churn. We need to push for policy measures and other incentives that will urge firms to make parallel investments in upskilling human workers, so as to fully leverage the benefits of robot-enabled automation on the factory floor.

  Read more: The Puzzle of the Missing Robots


####################################################

Tech Tales:

The Day the Patents Activated
[Worldwide, 2028]

We call it Day Zero, because everything had to be different after it. It was a regular day – chaos in the financial markets, worries over climate change, statements made by world leaders about how to bring the technologists to heel. And then something happened: Google activated its patents. Google had held patents on some of the most important parts of AI for years, like a patent on backpropagation, and other basic techniques. Suddenly, the landscape on which AI was built had become legally dubious. Google followed it up with language model-augmented enforcement of its patent rights – suddenly, hundreds of thousands of emails went out to hundreds of thousands of AI projects. ‘You are infringing on our IP and this letter represents a cease-and-desist; comply or face the threat of legal action,’ and so on. Each email had an embedded counter which displayed a countdown for the infringer, ranging from hours to weeks, counting down to when Google would take legal action. People didn’t believe it at first. Then the lawsuits started coming in. It hit the indie projects first, and they took to Twitter and talked about it. The larger labs and companies took note.
  But what Google’s legal counsel had perhaps not anticipated was how the same AI models it was trying to take down could be used to fight it legally. Not directly – Google had the biggest computers, so no one wanted – or had the financial resources – to fight it directly. But people were able to bring to bear in-development technologies for neuroevolution and other techniques to ‘fuzz’ the specific patents being enforced. Backprop got altered via AI models until it, according to legal-critique-LMs, no longer truly resembled the patent that was being enforced. Same for neural architecture search. Same for other techniques. Almost overnight, the underbelly of AI got fuzzed and changed until it was in a sufficiently legally dubious territory that none of the lawsuits could be cut-and-dried.
  And just like that, AI let the world shapeshift, porting the IP from one legal frame into another, safer space.
  Now, everyone does this – they constantly fuzz their algorithms. There are costs, ranging from thousands to tens of millions of dollars. But it works well enough to keep the lawyer-bots away. And so now we live in a chameleon world, where the very substance of our reality is itself constantly changing, forever trying to escape the oversight of the litigious and land itself in some safer, unrestricted and unmapped domain.

Things that inspired this story: The Google patent on overfitting; thinking about patents and AI and fair use; ideas around automated lawyers and automated enforcement; the observation that the world forever changes to let the path of least resistance continue to be a path.