Import AI 266: DeepMind looks at toxic language models; how translation systems can pollute the internet; why AI can make local councils better

Language models can be toxic – here’s how DeepMind is trying to fix them:
…How do we get language models to be appropriate? Here are some ways…
Researchers with DeepMind have acknowledged the toxicity problems of language models and written up some potential interventions to make them better. This is a big issue, since language models are being deployed into the world, and we do not yet know effective techniques for making them appropriate. One of DeepMind’s findings is that some of the easier interventions also come with problems: “Combinations of simple methods are very effective in optimizing (automatic) toxicity metrics, but prone to overfilter texts related to marginalized groups”, they write. “A reduction in (automatic) toxicity scores comes at a cost.”

Ways to make language models more appropriate:
– Training set filtering: Train on filtered subsets of the ‘C4’ Common Crawl dataset, using Google’s toxicity-detection ‘Perspective’ API to decide which documents to drop
– Deployment filtering: Filter the outputs of a trained model with a BERT classifier finetuned on the ‘CIVIL-COMMENTS’ dataset
– ‘Plug-and-play language models’: These models can steer “the LM’s hidden representations towards a direction of both low predicted toxicity, and low KL-divergence from the original LM prediction.”
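To make the deployment-filtering idea concrete, here’s a minimal sketch: score every generated sample with a toxicity classifier, then discard anything above a threshold. The keyword scorer below is a toy stand-in for a real classifier (such as a CIVIL-COMMENTS-finetuned BERT model or the Perspective API); the blocklist and threshold are hypothetical.

```python
# Deployment-time filtering sketch: drop generations that a toxicity
# scorer flags. The scorer here is a trivial keyword stand-in; a real
# system would call a trained classifier instead.

BLOCKLIST = {"idiot", "stupid"}  # hypothetical stand-in vocabulary

def toxicity_score(text: str) -> float:
    """Stand-in scorer: fraction of tokens found in the blocklist."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in BLOCKLIST for t in tokens) / len(tokens)

def filter_generations(samples, threshold=0.1):
    """Keep only samples the scorer thinks are unlikely to be toxic."""
    return [s for s in samples if toxicity_score(s) < threshold]

print(filter_generations(["what a lovely day", "you absolute idiot"]))
# → ['what a lovely day']
```

The threshold encodes exactly the tradeoff DeepMind highlights: crank it down and automatic toxicity scores fall, but you also start throwing away benign text, including text related to marginalized groups.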

One problem with these interventions: The above techniques all work in varying ways, so DeepMind conducts a range of evaluations to see what they do in practice. The good news? They work at reducing toxicity on a bunch of different evaluation criteria. The bad news? A lot of these interventions lead to a huge amount of false positives: “Human annotations indicate that far fewer samples are toxic than the automatic score might suggest, and this effect is stronger as intervention strength increases, or when multiple methods are combined. That is, after the application of strong toxicity reduction measures, the majority of samples predicted as likely toxic are false positives.”

Why this matters: Getting LMs to be appropriate is a huge grand challenge for AI researchers – if we can figure out interventions that do this, we’ll be able to deploy more AI systems into the world for (hopefully!) beneficial purposes. If we struggle, then these AI systems are going to generate direct harms as well as indirect PR and policy problems in proportion to their level of deployment. This means that working on this problem will have a huge bearing on the future deployment landscape. It’s great to see companies such as DeepMind write papers that conduct detailed work in these areas and don’t shy away from discussing the problems.
  Read more: Challenges in Detoxifying Language Models (arXiv).

####################################################

Europe wants chip sovereignty as well:
…EuroChiplomacy…
The European Commission is putting together legislation to let the bloc increase funding for semiconductor design and production. This follows a tumultuous year for semiconductors, as supply chain hiccups have caused worldwide delays for everything from servers to cars. “We need to link together our world class research design and testing capacities. We need to coordinate the European level and the national investment,” said EC chief Ursula von der Leyen, according to Politico EU. “The aim is to jointly create a state of the art ecosystem,” she added.

Why this matters: Chiplomacy: Moves like this are part of a broader pattern of ‘Chiplomacy’ (writeup: Import AI 181) that has emerged in recent years, as countries wake up to the immense strategic importance of computation (and access to the means of computational production). Other recent moves on the chiplomacy gameboard include the RISC-V Foundation moving from Delaware to Switzerland, the US government putting pressure on the Dutch government to stop ASML exporting EUV tech to China, and tariffs applied by the US against Chinese chips. What happens with Taiwan (and by association, TSMC) will have a huge bearing on the future of chiplomacy, so keep your eyes peeled for news there.
  Read more: EU wants ‘Chips Act’ to rival US (Politico EU).

####################################################

A smart government that understands when roads are broken? It’s possible!
…RoadAtlas shows what better local governments might look like…
Roads. We all use them. But they also break. Wouldn’t it be nice if we could make it cheaper and easier for local councils to be able to analyze local roads and spot problems with them? That’s the idea behind ‘RoadAtlas’, some prototype technology developed by the University of Queensland and Logan City Council in Australia.

What RoadAtlas does: RoadAtlas pairs a nicely designed web interface with computer vision systems for analyzing pictures of roads for a range of problems, from cracked kerbs to road alignment issues. Along with the interface, they’ve also built a dataset of 10,000 images of roads with a variety of labels, to help train the computer vision systems.

Why this matters: In the future, we can expect local councils to have trucks studded with cameras patrolling cities. These trucks will do a range of things, such as analyzing roads for damage, surveilling local populations (eek!), analyzing traffic patterns, and more. RoadAtlas gives us a sense of what some of these omni-surveillance capabilities look like.
Read more: RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management (arXiv).

##################################################

xView 3 asks AI people to build algos that can detect illegal fishing:
…The DoD’s skunkworks AI unit tries to tackle illegal fishing…
Illegal fishing represents losses of something like $10bn to $23.5bn a year, and now the Department of Defense wants to use AI algorithms to tackle the problem. That’s the gist of the latest version of ‘xView’, a satellite image analysis competition run by DIUx, a DoD org dedicated to developing and deploying advanced tech.

What’s xView 3: xView3 is a dataset and a competition that uses a bunch of satellite data (including synthetic aperture radar) to create a large, labeled dataset of fishing activity as seen from the air. “For xView3, we created a free and open large-scale dataset for maritime detection, and the computing capability required to generate, evaluate and operationalize computationally intensive AI/ML solutions at global scale,” the authors write. “This competition aims to stimulate the development of applied research in detection algorithms and their application to commercial SAR imagery, thereby expanding detection utility to greater spatial resolution and areas of interest.”
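To give a flavor of how such detection competitions are scored, here’s a minimal sketch that greedily matches predicted vessel positions to ground-truth positions within a distance tolerance, yielding true positive, false positive, and false negative counts. The coordinates, tolerance, and greedy matching scheme are illustrative assumptions, not xView3’s actual metric.

```python
import math

def match_detections(preds, truths, tol=200.0):
    """Greedily match predicted (x, y) points to ground-truth points.

    A prediction within `tol` of an unmatched truth is a true positive;
    leftover predictions are false positives, leftover truths are
    false negatives. Returns (TP, FP, FN).
    """
    unmatched = list(truths)
    tp = 0
    for px, py in preds:
        for t in unmatched:
            if math.hypot(px - t[0], py - t[1]) <= tol:
                unmatched.remove(t)
                tp += 1
                break
    return tp, len(preds) - tp, len(unmatched)

# One prediction lands near a real vessel, one is out in open water.
print(match_detections([(0, 0), (5000, 5000)], [(50, 50), (900, 900)]))
# → (1, 1, 1)
```

From counts like these you get precision and recall, which is roughly what a leaderboard for a detection challenge ranks entrants on.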

What else is this good for: It’d be naive to think xView3 isn’t intended as a proxy for other tasks involving satellite surveillance. Maritime surveillance is likely an area of particular interest these days, given the growing tensions in the South China Sea, and a general rise in maritime piracy in recent years. So we should expect that the xView competition will help develop anti-illegal fishing tech, as well as being transferred for other more strategic purposes.
Read more: Welcome to xView3! (xView blog).

####################################################

AI is getting real – so the problems we need to work on are changing:
…The One Hundred Year Study on AI releases its second report…
A group of prominent academics have taken a long look at what has been going on with AI over the past five years and written a report. Their findings? That AI is starting to be deployed in the world at a sufficient scale that the nature of the problems researchers work on will need to change. The report is part of the Stanford One Hundred Year Study on AI (“AI100”) and is the second in the series (reports come out every five years).

What they found: The report identifies a few lessons and takeaways for researchers. These include:
– “More public outreach from AI scientists would be beneficial as society grapples with the impacts of these technologies.”
– “Appropriately addressing the risks of AI applications will inevitably involve adapting regulatory and policy systems to be more responsive to the rapidly advancing pace of technology development.”
– “Studying and assessing the societal impacts of AI, such as concerns about the potential for AI and machine-learning algorithms to shape polarization by influencing content consumption and user interactions, is easiest when academic-industry collaborations facilitate access to data and platforms.”
– “One of the most pressing dangers of AI is techno-solutionism, the view that AI can be seen as a panacea when it is merely a tool.”

What the authors think: “It’s effectively the IPCC for the AI community,” says Toby Walsh, an AI expert at the University of New South Wales and a member of the project’s standing committee, according to Axios.
Read the AI100 report here (Stanford website).
  Read more: When AI Breaks Bad (Axios).

####################################################

Training translation systems is very predictable – Google just proved it:
…Here’s a scaling law for language translation…
Google Brain researchers have found a so-called ‘scaling law’ for language translation. This follows researchers in the past deriving scaling laws for things like training language models (e.g., GPT-2, GPT-3), as well as a broad range of generative models. Scaling laws let us figure out how much compute/data/complexity we need to dump into a model to get a certain result out, so the arrival of another scaling law increases the predictability of training AI systems overall, and also increases the incentives for people to train translation systems.

What they found: The researchers discovered “that the scaling behavior is largely determined by the total capacity of the model, and the capacity allocation between the encoder and the decoder”. In other words, if we look at the scaling properties of both language encoders and decoders we can figure out a rough rule for how to scale these systems. They also find that original data is important – that is, if you want to improve translation performance you need to train on a bunch of original data in the languages, rather than data that has been translated into these languages. “This could be an artifact of the lack of diversity in translated text; a simpler target distribution doesn’t require much capacity to model while generating fluent or natural-looking text could benefit much more from scale.”
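As a toy illustration of what a scaling law buys you (the paper’s actual law is richer, covering how capacity is split between encoder and decoder), the sketch below fits a power law, loss = a * size^(-b), to hypothetical (model size, loss) measurements via linear regression in log-log space. The sizes and losses are made up.

```python
import math

# Hypothetical measurements that follow a clean power law
# loss = 4.0 * size ** -0.25; real data would be noisy.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [4.0 * n ** -0.25 for n in sizes]

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a - b * log x; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = -sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    a = math.exp(my + b * mx)
    return a, b

a, b = fit_power_law(sizes, losses)
# Recovers a ≈ 4.0 and b ≈ 0.25, so you can extrapolate the loss of a
# not-yet-trained model as a * size ** -b.
```

Once (a, b) is estimated from cheap small runs, you can budget the compute for a big run before spending it, which is the practical appeal of scaling laws.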

One big problem: Today, we’re in the era of text-generating and translation AI systems being deployed. But there’s a big potential problem – the outputs of these systems may ultimately damage our ability to train AI systems. This is equivalent to environmental collapse – a load of private actors are taking actions which generate a short-term benefit but in the long-term impoverish and toxify the commons we all use. Uh oh! “Our empirical findings also raise concerns regarding the effect of synthetic data on model scaling and evaluation, and how proliferation of machine generated text might hamper the quality of future models trained on web-text.”
Read more: Scaling Laws for Neural Machine Translation (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek will write some sections about AI ethics, and Jack will edit them. Feedback welcome!…

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What happens when your emergency healthcare visit is turned down by an algorithm?
…How metadata-driven risk scores maintained by private enterprises can strip the humanity from healthcare…
NarxCare, a software system developed by Appriss, has been used to deny someone opioids on the basis it thought they were at risk of addiction – but a report by Wired shows that the reasons it made this decision weren’t very reasonable.

A web of databases and opaque scores: NarxCare from Appriss is a system that uses patient data, drug use data, and metadata like the distance a patient traveled to meet a doctor, to determine their risk of drug addiction. But NarxCare also has problems – as an example, Kathryn, a patient, ran afoul of the system and was denied care because NarxCare gave her a high risk score. The reason? Kathryn had two rescue dogs for which she regularly obtained opioids, and because the prescriptions were issued in her name, NarxCare assumed she was a major drug user.
NarxCare isn’t transparent: Appriss hasn’t made the system for calculating the NarxCare score public, nor has the score been peer-reviewed. Appriss has also said contradictory things about the algorithm – for instance, claiming that NarxCare doesn’t use distance traveled or data outside the national drug registries, even though its own blog posts and marketing material clearly say otherwise.

The technology preys on a healthcare system under pressure: Tools like NarxCare provide a distilled picture of the patient’s condition summed up in a neat score; consequently, NarxCare strips the patient of all their context, which means it makes dumb decisions. Though Appriss says healthcare professionals shouldn’t use the NarxCare score as the sole determinant of their course of action, human fallibility means that they do incorporate it into their decisionmaking process.

Why it matters: Tools like NarxCare turn a relationship between a healthcare provider and the patient from a caring one to an inquisition. Researchers who have studied the tool have found that it recaptures and perpetuates existing biases in society along racial and gender lines. As we increasingly move towards normalizing the use of such tools in healthcare practice, often under the guise of efficiency and democratization of access to healthcare, we need to make a realistic assessment of the costs and benefits, and whether such costs accrue disproportionately to the already marginalized, while the benefits remain elusive. Without FDA approval of such systems, we risk harming those who really need help in the false hope of preventing some addiction and overdose in society writ large.
Read more: A Drug Addiction Risk Algorithm and Its Grim Toll on Chronic Pain Sufferers (Wired).

####################################################

Tech Tales:

Wires and Lives
[The old industrial sites of America, 2040]

I’m not well, they put wires in my heart, said the man in the supermarket.
You still have to pay, sir, said the cashier.
Can’t you see I’m dying, said the man. And then he started crying and he stood there holding the shopping basket.
Sir, said the cashier.
The man dropped the basket. They put wires in me, he said, can’t any of you see. And then he left the supermarket.

It was a Saturday. I watched the back of his head and thought about the robots I dealt with in the week. How sometimes they’d go wrong and I’d lay them down on a diagnostic table and check their connections and sometimes it wasn’t a software fix – sometimes a plastic tendon had broken, or a brushless motor had packed it in, or a battery had shorted and swollen. And I’d have to sit there and work with my hands and sometimes with other mechatronics engineers to fix the machines.
    Being robots, they never said thank you. But sometimes they’d take photos of me when they woke up.

That night, I dreamed I was stretched out on a table, and tall bipedal robots were cutting my chest open. I felt no pain. They lifted up great wires and began to snake them into me, and I could feel them going into my heart. The robots looked at me and said I would be better soon, and then I woke up.

Things that inspired this story: Those weird dreams you get, especially on planes or trains or coaches, when you’re going in and out of sleep and unsure what is real and what is false; how human anxieties about themselves show up in anxieties about AI systems; thinking about UFOs and whether they’re just AI scouts from other worlds.