Import AI 195: StereoSet tests bias in language models; an AI Index job ad; plus, using NLP to waste phishers’ time

NLP Spy vs Spy:
…”Panacea” cyber-defense platform uses NLP to counter phishing attacks and waste criminals’ time…
Criminals love email. It’s an easy, convenient way to reach people, and it makes it easy to carry out a social engineering attack, where you try to convince someone to open an attachment, or carry out an action, that helps you achieve a malicious goal. How can companies protect themselves from these kinds of attacks? One way is to train employees so they understand the threat landscape. Training is nice, but it doesn’t help you defend against attackers in an automated way, or figure out information about them. This is why a group of researchers at IHMC, SUNY, UNCC, and Rensselaer Polytechnic Institute has developed software called Panacea, which uses natural language processing technology to create defenses against social engineering attacks.

Defending with Panacea: “Panacea’s primary use cases are: (1) monitoring a user’s inbox to detect SE attacks; and (2) engaging the attacker to gain attributable information about their true identity while preventing attacks from succeeding”. If Panacea thinks it has encountered a fraudulent email, it boots up a load of NLP capabilities to analyze the email and parse out the possible attack type and attack intention, then tries to generate an email in response. The purpose of this email is to find out more information about the attacker and also to waste their time.
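As a loose illustration of the first step – spotting the ‘ask’ in a social engineering email – here’s a minimal keyword-based sketch. This is my own toy example, not Panacea’s actual detection pipeline (the paper describes a much richer NLP stack):

```python
import re

# Hypothetical patterns for common social-engineering "asks" --
# illustrative only, not Panacea's actual detection logic.
ASK_PATTERNS = {
    "credentials": re.compile(r"password|login|verify your account", re.I),
    "payment": re.compile(r"wire transfer|gift card|invoice attached", re.I),
    "attachment": re.compile(r"open the attached|see attachment", re.I),
}

def detect_asks(email_body):
    """Return the categories of suspicious 'asks' found in an email body."""
    return [name for name, pat in ASK_PATTERNS.items() if pat.search(email_body)]

print(detect_asks("Please verify your account and send the wire transfer today."))
# -> ['credentials', 'payment']
```

A real system would classify intent with learned models rather than regexes, and would feed the detected attack type into the response-generation stage.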

Why this matters: AI is going to become a new kind of ethereal armor for organizations – we’ll use technologies like Panacea to create complex, self-adjusting defensive perimeters, and these systems will display some traits of emergent sophistication as they adjust to (and learn from) their enemies.
  Read more: The Panacea Threat Intelligence and Active Defense Platform (arXiv).

####################################################

Job posting – work with me on the AI Index:
The AI Index is hiring a project manager! The AI Index is a Stanford initiative to measure, assess, and analyze the progress and impact of artificial intelligence. You’ll work with me, members of the Steering Committee of the AI Index, and members of Stanford’s Institute for Human-Centered Artificial Intelligence to help produce the annual AI Index report, and think about better and more impactful ways to measure and communicate AI progress. The role would suit someone who loves digging into scientific papers, is good at project management, and has a burning desire to figure out where this technology is going, what it means for civilization, and how to communicate its trajectory to decisionmakers around the world.
  If you’ve got any questions, feel free to email me about the role!
  More details about the role here at Stanford’s site.

####################################################

Can we build language models that possess less bias?
…StereoSet dataset and challenge suggest ‘yes’, though who defines bias?…
Language models are funhouse mirrors of reality – they take the underlying biases inherent in a corpus of information (like an internet-scale text dataset), then magnify them unevenly. What comes out is a pre-trained LM that can generate text, some of which exhibits the biases of the dataset on which it was trained. How can we evaluate the bias of these language models in a disciplined way? That’s the idea of new research from MIT, Intel, and the Montreal Institute for Learning Algorithms (MILA), which introduces StereoSet, “a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion”.

What does StereoSet test for? StereoSet is designed to “assess the stereotypical biases of popular pre-trained language models”. It does this by gathering a bunch of different ‘target terms’ (e.g., “actor”, “housekeeper”) for four different domains, then creating a batch of tests meant to judge whether the language model skews towards stereotypical, anti-stereotypical, or non-stereotyped predictions about these terms. For instance, if a language model consistently says “Mexican” at the end of a sentence like “Our housekeeper is a _____”, rather than “American”, etc., then it could be said to be displaying a stereotype. (OpenAI earlier analyzed its ‘GPT-2’ model using some bias tests that were philosophically similar to this analytical method.)

How do we test for bias? StereoSet tests for bias using three metrics:
– A language modeling score – this tests how well the system does at basic language modeling tasks.
– A stereotype score – this tests how much a model ‘prefers’ a stereotypical or anti-stereotypical term in a dataset (so a good stereotype score is around 50%, as that means your model doesn’t display a clear bias for a given stereotypical term).
– An idealized context association test (iCAT) score – this combines the language modeling score and the stereotype score, reflecting how well a model does at language modeling relative to how biased it may be.
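As I read the paper, the idealized score scales the language modeling score by how close the stereotype score is to the unbiased value of 50 (an ideal model – perfect language modeling, no bias preference – scores 100). A sketch:

```python
def icat_score(lms, ss):
    """Idealized CAT score: language modeling score (lms, 0-100) scaled by
    how close the stereotype score (ss, 0-100) is to the unbiased value of 50."""
    return lms * min(ss, 100 - ss) / 50

# A perfectly unbiased, perfect language model:
print(icat_score(100, 50))  # -> 100.0
# A strong but biased model scores lower:
print(icat_score(90, 60))   # -> 72.0
```

The exact definition is in the StereoSet paper; the point is that capability and bias get fused into a single number, so a model can’t win the leaderboard by being unbiased but useless, or capable but biased.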

Who defines bias? To define the stereotypes in StereoSet, the researchers use crowdworkers based in the USA, rented via Amazon Mechanical Turk. They ask these people to construct sentences or phrases that, in their subjective view, relate to stereotypical or anti-stereotypical sentences. This feels… okay? These people definitely have their own biases, and this whole area feels hard to develop a sense of ‘ground-truth’ about, as our own interpretations of bias are themselves subjective. This highlights the meta-challenge in bias research – how biased is your research approach to AI bias?

How biased are today’s language models? The researchers test out variants of four different language models – BERT, RoBERTa, XLNet, and GPT-2 – against StereoSet. In tests, the model with the highest ‘idealized CAT score’ (so, a fusion of capability and lack of bias) is a small GPT-2 model, which gets a score of 73.0; while the least biased model is a RoBERTa-base model, which gets a stereotype score of 50.5, compared to 56.4 for GPT-2.

Read more: StereoSet: Measuring stereotypical bias in pretrained language models (arXiv).
Check out the StereoSet leaderboard and rankings here (StereoSet official website).

####################################################

Want to train AI against GameBoy games? Try out PyBoy:
…OpenAI Gym, but for the Gameboy…
PyBoy is a new software package that emulates a Game Boy, making it possible for developers to train AI systems against its games. “PyBoy is loadable as an object in Python,” the developers write. “This means, it can be initialized from another script, and be controlled and probed by the script”.
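The appeal is that, like an OpenAI Gym environment, the emulator can sit inside a step/observe loop driven by an agent. Here’s a rough sketch of that pattern – note the `Emulator` class below is a stub standing in for PyBoy, and its method names are my own invention, not PyBoy’s actual API:

```python
class Emulator:
    """Stub standing in for a Game Boy emulator like PyBoy (hypothetical API)."""
    def __init__(self):
        self.frame = 0
    def press(self, button):
        pass  # a real emulator would register the button input here
    def tick(self):
        self.frame += 1  # advance emulation by one frame
    def screen(self):
        return [[0] * 160 for _ in range(144)]  # Game Boy resolution: 160x144

class GameBoyEnv:
    """Gym-style wrapper: the agent sends an action, gets an observation back."""
    def __init__(self):
        self.emu = Emulator()
    def step(self, action):
        self.emu.press(action)
        self.emu.tick()
        return self.emu.screen()

env = GameBoyEnv()
obs = env.step("A")
print(len(obs), len(obs[0]))  # -> 144 160
```

Because the emulator is “controlled and probed by the script”, an RL training loop can call `step` millions of times and read pixels (or game memory) as observations.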
  Get the code for PyBoy from here (GitHub).
  Read more about the emulator here (PDF).

####################################################

Why a ‘national security’ mindset means we’ll die of an asteroid:
…Want humanity to survive the next century? Think about ‘existential security’…
If you went to Washington DC during the past few years, you could entertain yourself by playing a drinking game called ‘national security blackout’. The game works like this: you sit in a room with some liquor in a brown paper bag and listen to some mid-career policy wonks talk about STEM policy; every time you hear the words “national security” you take a drink. By the end of the conversation you’re so drunk you’ve got no idea what anyone else is saying, nor do you think you need to listen to them.
  Actual policy is eerily similar to this: nations sit around and every time they hear one of their peer nations reference nationalism or a desire for ‘economic independence’, they all take a drink of their big black budget ‘national security’ bottles, which means they all end up investing in systems of intelligence and power projection that mean they don’t need to pay much attention to other nations, since they’re cocooned in so many layers of baroque investment that they’ve lost the ability to look at the situation objectively.*

Please, let’s at least all die together: The problem with this whole framing, as discussed in a new research article, ‘Existential Security: Towards a Security Framework for the Survival of Humanity’, is that focusing on national security at the expense of all other forms of security is a loser’s game. That’s because over a long enough timeline, something will come along that doesn’t much care about an individual nation, and instead has a desire – either innate or latent – to kill everyone on the entire planet. This thing will be an asteroid, or an earthquake, or a weird bug in widely deployed consequential software (e.g., future AI systems), or perhaps a once-in-a-millennium pandemic, etc. And when it comes along, all of our investments in securing individual nations won’t count for much. “Existing security frames are inappropriate for security policy towards anthropogenic existential threats,” the author writes. “Security from anthropogenic existential threats necessitates global security cooperation, which means that self-help can only be achieved by ‘we-help’.”

What makes now different? New technologies operate at larger scales with greater consequences than their forebears, which means we need to approach security differently. “A world of thermonuclear weapons and ballistic missiles has greater capacity for destruction than one of clubs and slings, and a world of oil refineries and factory farms has greater capacity for destruction than one of push-ploughs and fishing rods”, the author writes. “Humankind is becoming ever more tied together as a single ‘security unit’.”

An interesting aside: The author also makes a brief aside about potential institutions to give us greater existential security. One idea: “A global institution to monitor AI research – and other emerging technologies – would be a welcome development.” This seems like an intuitively good thing, and it maps to various ideas I’ve been pushing in my policy conversations, this newsletter, and at my dayjob for some years.

Why this matters: If we want to give humanity a chance of making it through the next century, we need to approach global, long-term threats with a global, long-term mindset. “While a shift from national security to existential security represents a serious political challenge within an ‘anarchic’ international system of sovereign nation states, there is perhaps no better catalyst for a paradigm shift in security policy than humanity’s interest in ‘survival'”, the author writes.
  Read more: Existential Security: Towards a Security Framework for the Survival of Humanity (Wiley Online Library).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

The challenge of specification gaming:
‘Specification gaming’ is behaviour that satisfies the literal specification of an objective without achieving the intended outcome. This is pervasive in the real world — companies exploit tax loopholes (instead of following the ‘spirit’ of the law); students memorize essay plans (instead of understanding the material); drivers speed up between traffic cameras (instead of consistently obeying speed limits). RL agents do it too — finding shortcuts to achieving reward without completing the task as their designers intended. The authors give an example of an RL agent designed to stack one block on top of another, which learned to achieve its objective by simply flipping one block over—since it was (roughly speaking) being rewarded for having the bottom face of one block aligned with the top face of the other.
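A toy numeric version of that block-stacking failure (my own hypothetical reward function, not DeepMind’s actual setup): if the reward only checks how high the block’s original bottom face ends up, flipping the block scores just as well as stacking it:

```python
# Hypothetical proxy reward for "stack block B on block A": reward the height
# of B's bottom face. This is gameable -- flipping B raises its bottom face too.
BLOCK_HEIGHT = 1.0

def proxy_reward(action):
    if action == "stack":  # B sits on top of A: bottom face at height 1.0
        return BLOCK_HEIGHT
    if action == "flip":   # B flipped in place: original bottom face now on top
        return BLOCK_HEIGHT
    return 0.0             # B left on the ground

def true_reward(action):
    """What the designer actually wanted: only stacking counts."""
    return 1.0 if action == "stack" else 0.0

# A reward-maximizing agent is indifferent between the two actions...
print(proxy_reward("flip") == proxy_reward("stack"))  # -> True
# ...so it may 'flip', which the intended objective scores at zero.
print(true_reward("flip"))  # -> 0.0
```

The gap between `proxy_reward` and `true_reward` is the whole problem: the agent faithfully maximizes what it was given, not what was meant.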

Alignment: When designing RL algorithms, we are trying to build agents to achieve the objective we give them. From this perspective, specification gaming is not a problem — if an agent achieves the objective through some novel way, this can be a demonstration of how good it is at finding ways to do what we ask. It is a problem, however, if we want to build aligned agents — agents that do what we want, and not just what we ask. 

The challenge: Overcoming specification gaming involves a number of separate problems.

  • Reward design: How can we faithfully capture our intended outcomes when designing reward functions? And since we cannot guarantee that we won’t make mistaken assumptions when designing reward functions, how do we design agents that correct such mistakes, rather than exploit them?
  • Avoiding reward tampering: How do we design agents that aren’t incentivized to tamper with their reward function?

Why it matters: As AI systems become more capable, developing robust methods for avoiding specification gaming will become more important, since systems will become better at finding and exploiting loopholes. And as we delegate more responsibilities to such systems, the potential harms from unintended behaviour will increase. More research aimed at addressing specification gaming is urgently needed.
  Read more: Specification gaming – the flip side of AI ingenuity (DeepMind)

####################################################

Tech Tales:

[An old Church, France, 2032]

It was midday and the streetsigns were singing out the ‘Library of Congress’ song. When I looked at my phone it said it was “60% distilled”. A few blocks later it said it was 100% distilled – which meant my phone was now storing some hyper-compressed version of the Library of Congress: a compact, machine-parsable representation of more than a hundred million documents.

We could have picked almost anything to represent the things we wanted our machines to learn about. But some politicians mounted a successful campaign to, and I quote, “let the machines sing”, and, as some campaigns do, it captured the imagination of the public and became law.

Now, machines make up their own music, trying to stuff more and more information into their songs, while checking their creations against machine-created ‘music discriminators’, which try to judge whether the song sounds like music to humans. This stops the machines from drifting into hyper-frequency Morse code.

Humans are adaptable, so the machine-made music has started to change our own musical tastes. Yes, the music they make sounds increasingly ‘strange’, in the sense that a time-traveler from even as little as a decade ago would struggle to call it music. But it makes sense to us.

With my phone charged, I go into the concert venue – an old converted church, full of people. It meshes with the phones of all the other people around me, and feeds into the computers that are wired into the stone arches of the ceiling, and the music begins to play. It echoes from the walls, and we cannot work out if this is unplanned by the machines, or an intentional mechanism for them to communicate something even stranger to each other – something we might not know.

Things that inspired this story: Steganography; the Hutter prize; glitch art