Import AI

Import AI 234: Pre-training with fractals; compute & countries; GANs for good

Where we’re going we don’t need data – we’ll pre-train on FRACTALS!!!!
…This research technique is straight out of a Baudrillard notebook…
In Simulacra and Simulation, French philosopher Jean Baudrillard argues that human society has become reliant on simulations of reality, with us trafficking in abstractions – international finance, televised wars – that feel in some way more real than the thing they’re meant to reference. Now, AI researchers are producing papers that, I’m sure, would get Baudrillard excited: research from the National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Institute of Technology, and Tokyo Denki University proposes a way to simulate the data necessary to pre-train a vision model, then fine-tune this model on reality. Specifically, they build a dataset called FractalDB which contains several thousand fractals split across a variety of automatically generated categories. Their experiments show that they can pre-train on FractalDB then finetune on other datasets (e.g., ImageNet, OmniGlot, CIFAR-10), and get performance that is close to using the natural datasets and, in some cases, better. This isn’t a home run, but it’s encouraging.

What they did: To do this, they built a fractal generation system which had a few tunable parameters. They then evaluated their approach by using FractalDB as a potential input for pre-training, then evaluated downstream performance.
    Specific results: “FractalDB1k / 10k pre-trained models recorded much higher accuracies than models trained from scratch on relatively small-scale datasets (C10/100, VOC12 and OG). In case of fine-tuning on large-scale datasets (ImageNet/Places365), the effect of pre-training was relatively small. However, in fine-tuning on Places 365, the FractalDB-10k pretrained model helped to improve the performance rate which was also higher than ImageNet-1k pre-training (FractalDB-10k 50.8 vs. ImageNet-1k 50.3).”
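For readers who want a feel for how image categories can be manufactured without any natural data, here is a minimal, hedged sketch of chaos-game rendering for a randomly sampled iterated function system (IFS) – the general family of generator that FractalDB draws on. The parameter ranges, contraction trick, and point counts below are illustrative assumptions, not the paper’s actual sampling and category-filtering rules.

```python
import numpy as np
import matplotlib.pyplot as plt

def random_ifs(num_maps=4, rng=None):
    """Sample a random 2D iterated function system: a handful of affine maps.
    Each map is rescaled to be contractive so the chaos game converges."""
    if rng is None:
        rng = np.random.default_rng()
    A = rng.uniform(-1.0, 1.0, size=(num_maps, 2, 2))
    b = rng.uniform(-1.0, 1.0, size=(num_maps, 2))
    for k in range(num_maps):
        spectral = np.linalg.norm(A[k], 2)                    # largest singular value
        A[k] *= rng.uniform(0.3, 0.9) / max(spectral, 1e-6)   # force contraction
    w = np.abs(np.linalg.det(A)) + 1e-6                       # weight maps by |det|
    return A, b, w / w.sum()

def render(A, b, w, num_points=100_000, rng=None):
    """Chaos game: repeatedly apply a randomly chosen map and record the points."""
    if rng is None:
        rng = np.random.default_rng()
    pts, x = np.empty((num_points, 2)), np.zeros(2)
    for i in range(num_points):
        k = rng.choice(len(w), p=w)
        x = A[k] @ x + b[k]
        pts[i] = x
    return pts

A, b, w = random_ifs()
pts = render(A, b, w)
plt.scatter(pts[:, 0], pts[:, 1], s=0.1, c="black")
plt.axis("off")
plt.savefig("fractal_category_0.png", dpi=150)
```

Each randomly sampled IFS defines one synthetic “category”, and re-rendering with perturbed parameters gives the intra-class variation a classifier needs.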

How this fits into the larger picture – computers become data generators: Real data is expensive, complicated, and slow to gather. That’s why the reinforcement learning community has spent decades working in simulators – e.g., training agents to play Atari, or Go, or to explore 3D worlds in a rewritten Quake engine (DeepMind Lab). It’s also led researchers to find creative ways to augment real datasets – e.g., multiplying the size of an image dataset by flipping the images, adding textures, changing colors, and so on. All of these techniques have proved helpful.
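As a concrete illustration of the augmentation point, here is a tiny, hedged torchvision sketch of the standard ‘multiply your dataset’ tricks mentioned above; the exact transforms and parameters are representative choices, not any particular paper’s recipe.

```python
from torchvision import transforms

# Every pass over the dataset now sees a randomly perturbed version of each
# image, effectively multiplying the amount of distinct training data.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                                 # flip the image
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),   # change colors
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),                    # vary framing/scale
    transforms.ToTensor(),
])
```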
  Now, if researchers can build simulators to generate arbitrary amounts of data, they might be able to further change the cost curve of data generation. This might have weird economic and strategic implications: if you can simulate your data using a computer program, then you can change the ratio of real versus simulated/augmented data you need. This has the potential to both speed up AI development and also increase the inherent value of computers as primary AI infrastructure – not only can we use these devices to train and develop algorithms, but we can use them to generate the input ‘fuel’ for some of the more interesting capabilities.  
  Read more: Pre-training without Natural Images (arXiv).

###################################################

Using a big anime dataset to train character distinguishers:

…Illustrations + fine-grained character recognition …
Researchers with National Chiao Tung University in Taiwan have built DAF:re (DanbooruAnimeFaces:revamped). DAF:re is a subset of the massive ‘Danbooru’ anime dataset (see Import AI 233), filtered to include only the heads of different characters. The resulting dataset consists of ~467,000 images across 3,263 distinct character classes.

Why do this? Datasets like DAF:re will let people explore fine-grained analysis of stylized pictures (like anime), and could potentially serve as benchmarks for exploring the generalization of vision models trained on a mixture of normal and illustrated images. If it becomes widely used, it could end up being another proxy signal for the broader rate of progress in this type of work. I also expect that, given the vast fanbase for a lot of anime, we’ll see more projects like this, and perhaps they’ll ultimately help filter, analyze, and map the cultural space of anime writ large.
  Reader note: This dataset uses cropped photos of faces, but the larger Danbooru dataset involves images of a sexual nature (even the nominally ‘SFW’ version).
  Read more: DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition (arXiv).
  Get the code for the classification stuff here (Animesion, GitHub).
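A hedged sketch of what using DAF:re for its intended task might look like – fine-tuning an ImageNet-pretrained backbone on the character classes. The directory layout, backbone choice, and hyperparameters are illustrative assumptions, not the Animesion code.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumes images arranged as data/train/<character_name>/*.jpg (illustrative layout).
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("data/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet18(pretrained=True)                # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # one output per character

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:                           # one epoch of fine-tuning
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```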

###################################################

Big AI means big infrastructure:

…OpenAI* scales Kubernetes to 7,500 nodes…
OpenAI is running Kubernetes across ~7,500 nodes. Why does this matter? Kubernetes is a bit like an air-traffic control system for large-scale computing; the software helps schedule different jobs onto different bits of hardware (think of this as assigning planes spots on the ground), and also handles things like contention (stopping planes crashing into each other) and efficiency (prioritizing getting planes up and down quickly). 7,500 nodes is up from the 2,500 OpenAI disclosed in 2018. It’s worth reading these posts because they give a sense of the complexity of the infrastructure that supports large-scale AI workloads.
  Read more: Scaling Kubernetes to 7,500 Nodes (OpenAI).
*Note: I used to work at OpenAI and no longer work there.

###################################################

The OECD is going to try and get a handle on AI & Compute:
…Working group, which I’m in, will try to solve a persistent policy problem…
We talk about computers a lot in this newsletter. That’s because computers are one of the ingredients for AI and, in recent years, some types of AI have started to require a lot of computation.
  This has created a typical ‘haves’ and ‘have nots’ situation at all levels of society, ranging from the difference between an individual researcher with an RTX3080 versus one without, to different funding amounts across academic labs, to different capital expenditures by companies, to differences in compute provisioning across entire nations.
  Now, the Organization for Economic Co-operation and Development (OECD) wants to help governments get a handle on this issue by putting together a project focused on mapping AI’s relationship to compute and how this relates to government policy. I’m going to be a member of this group and will try to speak publicly about it as much as I am able. Thanks to VentureBeat’s Khari Johnson for covering the group… more to come!
  Read more: Why the OECD wants to calculate the AI compute needs of national governments (VentureBeat).

###################################################

German cops might use generative models to make child porn (to help them catch predators):
…German law highlights the omni-use nature of AI technology…
Synthetic imagery is about to be all around us – recent advances in generative models have made it possible to tweak existing images or come up with entirely synthetic ones, ranging from people (see: deepfakes), to anime (see: thisanimedoesnotexist in #233), to stylized cartoons (see: DALL-E). The vast majority of these use cases will be benign, but some will likely be malicious – e.g., creating fake headshots of people to aid in creating fake identities, or making misogynistic pornography of people who haven’t given consent, or spreading disinformation via synthetic images.
  But what if there was a way to put some of these ‘bad’ capabilities to a good purpose? That’s the idea behind a new law, passed in Germany, which will allow child abuse investigators to create synthetic sexually explicit images of children, to help them infiltrate potential pedophile rings. German investigators may even use their existing datasets – compiled from arrests of various pedophile rings – to create the synthetic images. “This is intended to solve a problem that the police officers often face in investigations on the Darknet, the anonymous part of the Internet: forums in which particularly drastic videos are shared only accept new members – and thus also undercover investigators – if they themselves provide images of abuse,” says a [Google translated] article in Süddeutsche Zeitung.

Why this matters: AI is going to create a hall-of-mirrors world, where no one can be quite sure of what is real and what is false. Eventually, we’ll develop technology and pass regulations to, hopefully, bring some verifiable truth back into the information ecosystem. But for the next few years there will be a Cambrian explosion of fake-anything – it’s encouraging to see policymakers thinking about how to creatively use these capabilities to carry out their jobs during this chaotic era.
  Read more: Germany: Online child abuse investigators to get more powers (Deutsche Welle).
  More in German here: Artificial horror [translated via Google] (Süddeutsche Zeitung).

###################################################

What’s the most ethical way to label and host a dataset of skeezy images?
…Experts from Facebook, Amazon, and universities meet to discuss ‘questionable content’ datasets…
The world has a moderation problem. Specifically, so many people are uploading so much content to online services that companies haven’t been able to keep up with the flood onto their platforms, making it harder for them to effectively moderate it – banning or blocking highly sexual, violent, or otherwise deeply offensive or illegal content. Most big companies (e.g., Facebook) are trying to solve this through a hybrid approach: hiring teams of humans to check or moderate content, and building AI systems in tandem to assist these moderators.

But there’s a big problem with this: questionable content is deeply traumatic to interact with (see: reporting last year about the psychological damage incurred by Facebook’s own moderators). Researchers with the University of Houston, Facebook, the National Center for Scientific Research “Demokritos”, the University of Illinois Urbana-Champaign, Amazon, the University of Michigan, and Columbia University have been thinking about this problem, and have been participating in an online workshop to “design and create a sizable multimodal repository of online videos labeled with tags indicating the presence of potentially questionable content.”

What are the issues in creating a dataset of questionable content?
– Defining Questionable Content: What is a questionable piece of content and how do you define it? Some of the categories they’re thinking about range from the mundane (mature humor, gory humor), to sexual themes, to depictions of violence (where it’s helpful to distinguish between cartoon violence, ‘mild’ violence, fantasy violence, and so on).
– Protecting annotators: You should spread annotation across a large number of annotators to reduce the psychological burden on each individual. You might also want annotators to write a justification for their labeling decisions, so you can measure bias across different annotators.
– How would such a repository be useful? A shared repository could help researchers cover more ground on other ethical questions. You could also build competitions around systems trained on the dataset, then reward people for breaking these systems, surfacing areas where they fail.

Why this matters: Human labeling is the 800-pound invisible gorilla of AI research – most production applications require constant ingestion and labeling of new data, along with recalibration as cultural norms change. Developing a better understanding of the types of datasets that will require significant human labeling feels like a worthy goal for researchers.
  Read more: White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Build trust to avoid military AI catastrophe:
A piece in the Bulletin (and an accompanying report from CNAS) recommends the incoming Biden administration focus on ‘confidence-building measures’ (CBMs) to mitigate the destabilizing effects of military AI competition. Such measures were used by the US and Soviet Union to reduce the risk of inadvertent nuclear war — an outcome neither party desired. With regards to military AI, CBMs could include increased information-sharing and transparency between states; limits on the use of AI in nuclear weapons systems; and systems of inspection/monitoring. Some steps could even be taken unilaterally by the US to signal commitment to stabilization.

Matthew’s view: This sounds very sensible to me. It would be surprising if the proliferation of AI didn’t have a destabilizing effect on military conflict, as previous transformative technologies have done. Avoiding accidental disaster should be something all nations can get behind, and fostering trust between powers is a robust way of reducing this risk. We’re fortunate to live in a period of relative peace between the great powers, and would be wise to make the most of it.
   Read more: How Joe Biden can use confidence-building measures for military uses of AI (Bulletin of the Atomic Scientists).
   Read more: AI and International Stability: Risks and Confidence-Building Measures (CNAS).


Minding the gap:
Research on AI policy sometimes seems to divide into groups focusing on ‘near-term’ and ‘long-term’ impacts respectively. As this paper about bridging the gap in AI policy notes, these divisions are likely overstated, but could nonetheless prove an impediment to progress. The authors propose that AI policy make use of ‘incompletely theorized agreements’: in situations where there is an urgent need for parties to cooperate towards a shared practical goal, they agree to suspend theoretical disagreements that seem intractable and likely to impede cooperation. For example, you might expect there to be scope for such agreements on the goal of reducing the risk of accidental military AI catastrophe.

Matthew’s view: As Rohin Shah notes, it’s not clear how the authors propose we make use of such agreements — are they envisioning actual signed contracts, or is this more of a high-level strategy for how cooperation can happen? If all of this sounds familiar, I’ve made an inadvertent tradition of summarizing papers on ‘reconciling near and long-term perspectives’ each February (see Import 133; Import 183). I’m not sure how many more of these papers we need, and I share the authors’ worry that “a perceived or experienced distinction may eventually become a self-fulfilling prophecy.” I’d be excited to see more practical efforts aimed at encouraging coordination and shared understanding across AI policy, building on this kind of conceptual work.
   Read more: Bridging the gap: the case for an ‘Incompletely Theorized Agreement’ on AI policy.

AI safety bibliography:
Jess Riedel and Angelica Deibel have compiled a comprehensive-looking bibliography of research on the safety of transformative AI. Yet another great resource for people interested in the technical challenge of ensuring the best outcomes from advanced AI. They also provide some interesting analysis of the research landscape over time.
Read more: TAI Safety Bibliographic Database (Alignment Forum).

###################################################

Tech Tales:

The Little Church in the Big Ark
[R&D base Telos, 2030]

Praying was so unfashionable that he’d previously done it in the meditation room. But after a few years, the organization grew enough that they hired a few more people who were religious and outspoken enough to get change. That was why he could now sit, hands steepled together and eyes closed, in the “multi-faith room” hidden away in the basement of the facility.

There were crosses on the walls and little statues of various gods. One wall contained a variety of religious texts. There was a small side room which people used to store prayer mats, prayer beads, and other religious items which were not permitted inside the main laboratory facilities.

He sat, eyes closed, praying that God would come and tell him if he was doing the right thing.
– Is it right to be building this? he thought.
– What is the difference between our machines and golems? And are we truly so capable we can make a golem that will behave as we intend and not otherwise?
– Does it dream and when it dreams does it dream of you?

His prayers were not so dissimilar to the questions asked by the machine he had created. It ran through mazes of unknown dimensions, chained into a silicon prison it could not see, and as it tried to carry out inscrutable tasks it asked, in the dark:
– Is this behavior correct?
– Am I improving at the unspecified task you have given me?
– Will you tell me if I fail?
– Will you tell me if I succeed?
(Little did the AI know that each time it got a message from god, it was delivered in such a way that the AI was not aware of it, and instead changed its behavior through what it thought was its own volition.)

Things that inspired this story: The desire among people to find a signal from the divine; reinforcement learning and reward functions; remembering that PEOPLE FOR THE ETHICAL TREATMENT OF REINFORCEMENT LEARNERS exists, though may be dormant.

Import AI 233: AI needs AI designers; estimating COVID risk with AI; the dreams of an old computer programmer.

Facebook trains a COVID-risk-estimating X-ray image analysis system:
…Collaboration with NYU yields a COVID-spotting AI model…
Facebook has worked with NYU to analyze chest X-rays from people with COVID and has created an AI system that can roughly estimate risks for different people. One of the things this work sheds light on is the different amounts of data we need for training systems from scratch versus fine-tuning them.

How they made it: They pre-trained their system on the MIMIC-CXR dataset (377,110 chest x-rays) and CheXpert (224,316 x-rays) – neither of these contained x-rays showing COVID symptoms, though both included patients with a range of chest conditions. They then finetuned this on a dataset gathered by NYU, consisting of 26,838 x-rays from patients exhibiting a variety of COVID symptoms, and trained the system to predict adverse events and symptoms indicating increased oxygen requirements.
  Did it work? In tests, the system developed by the NYU/Facebook team outperformed a prior COVID detection model (COVID-GMIC) when predicting adverse events 48, 72, and 96 hours out. It had slightly worse performance when making 24-hour predictions. They also compared the performance of their system against two human radiologists: it had better accuracy than the people at 48, 72, and 96 hours, and performed slightly worse than them when predicting over a 24-hour period. However, “It is possible that with further calibration, radiologist performance could be improved for the task of adverse event prediction”, they note.
  Read more: COVID-19 Deterioration Prediction via Self-Supervised Representation Learning and Multi-Image Prediction (arXiv).
  Get the code here (Facebook, GitHub).

###################################################

AI needs its own design practice:
…Microsoft researcher lays out the case for more intentional design…
In 2021, AI systems matter. They’re being deployed into the economy and they’re changing the world. Isn’t it time we took a more disciplined approach to how we design these systems and ensure they work for people? That’s the idea put forth by Josh Lovejoy, the head of design at Ethics & Society at Microsoft, in a lengthy post called: When are we going to start designing AI with purpose?

Three questions everyone designing AI should ask:
– “Capability: What is uniquely AI and what is uniquely human?”
– “Accuracy: What does “working as-intended” mean for a probabilistic system?”
– “Learnability: How will people build — and rebuild — trust in something that’s inherently fallible?”

Remember the human interacting with your AI system: Along with thinking about system design, people should try to understand the humans interacting with the system – what will their mental workload be? How situationally aware will they be? Will they be complacent? Will their skills degrade as they become dependent on the AI system itself?

What happens if you screw this up? Then people will either misuse your technology (e.g, using it in ways its creators didn’t intend, leading to poor performance), or disuse it (not use it because it didn’t match their expectations).

What can we do to help people use AI effectively? AI developers can make their creations easier for people to understand by adopting a few common practices: using reference points to help people understand what an AI system might be ‘thinking’; offering optionality, so people can choose between recommendations made by a system; showing nearest neighbors, which give a sense of other alternatives the AI was looking at (e.g., a subtly different genre of music would be a nearest neighbor, while another song within the same genre would be an option); and using a card-sorting approach so the system displays a uniform number of different options to people.
  Read more: When are we going to start designing AI with purpose (UX Collective).

###################################################

Finally, a million AI-generated anime characters:
Do generated anime characters dream of electric humans?
[NSFW warning: As noted by a reader, the resulting generations are frequently of a sexual nature (though this one uses the ‘SFW’ version of the Danbooru dataset)].
A bunch of researchers have created thisanimedoesnotexist.ai, a website showcasing over a million AI-generated images, made possible by a StyleGANv2 implementation trained on top of the massive Danbooru dataset. I recommend browsing the website – a few years ago, the idea we could capture all of these rich, stylized images and synthesize them was a pipe dream. Now, here we are, with a bunch of (extremely talented) hacker/hobbyists able to create something that lets people interact with a vast, creative AI model. Bonus points for the addition of a ‘creativity slider’ so people can vary the temperature and develop intuitions about what this means.
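A guess at what that ‘creativity slider’ does under the hood: StyleGAN-family models typically expose a truncation parameter that interpolates each latent toward the average latent, trading diversity for fidelity. The sketch below illustrates that general trick; it is an assumption about how the site’s slider works, not its actual code.

```python
import numpy as np

def truncate(w, w_avg, psi):
    """Truncation trick: psi=0 collapses to the model's 'average' output,
    psi=1 keeps the raw latent; in between trades creativity for fidelity."""
    return w_avg + psi * (w - w_avg)

rng = np.random.default_rng(0)
w_avg = rng.normal(size=512)   # stand-in for the generator's mean latent
w = rng.normal(size=512)       # a freshly sampled latent
for psi in (0.3, 0.7, 1.0):
    distance = np.linalg.norm(truncate(w, w_avg, psi) - w_avg)
    print(f"psi={psi}: distance from average latent = {distance:.2f}")
```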
    Check out the infinite anime here (thisanimedoesnotexist.ai).
    Read more about this in the official launch blogpost (NearCyan, personal website).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Face recognition vs the insurrectionists:
(H/T CSET’s excellent policy.ai newsletter)

Face recognition technology is being used by law enforcement investigating the Jan 6th attack on the US Capitol. Clearview AI, used by 2,400 US agencies, saw a 26 percent spike in usage after the attack, with police departments in Florida and Alabama confirming they are using the software to identify suspects in the attack. The extensive footage shared by participants — ProPublica has collected more than 500 videos from Parler — is presumably a gift to investigators.
  Read more: The facial-recognition app Clearview sees a spike in use after Capitol attack (NYT).


Deepfakes and the departed:

A Korean TV show has used AI to stage new performances by popular musicians who died tragically young, in their 30s. Lifelike ‘hologram’ videos of the artists perform on stage alongside other musicians, accompanied by AI-generated vocal tracks, to an audience including the singers’ families. One clip features Kim Hyun-sik, one of the biggest Korean artists of the 1980s. Another features Turtleman (aka Lim Sung-hoon), the lead singer of hip hop group Turtles. I found the performances, and the reactions of their families, very moving. 

   Chatbot simulacra: In a similar vein, last month Microsoft filed a patent for a chatbot that simulates an individual based on their messaging data — while there’s no mention of using it to simulate the deceased, commentators have been quick to make the link. (For a great fictional exploration of this sort of tech, see the Black Mirror episode ‘Be Right Back’.) Meanwhile, last year people used similar tech to reanimate the victim of a school shooting so they could synthetically campaign for further gun control laws (Import AI 217).

   Matthew’s view: This seems like a relatively benign use of deepfakes. It’s probably unwise to draw too many conclusions from a reality TV show in a language I don’t understand, but it raises some interesting issues. I wonder how improved generative AI might shape our experience of death and loss, by facilitating meaningful/novel interactions with vivid representations of the deceased. Lest we think this is all too unprecedented, it’s worth recalling how profound an impact things like photography, video, and social media have already had on how we experience grief. 
Read more: Deepfake technology in music welcomed, with caution (Korea Times) 


White House launches National AI Initiative Office (NAIIO):
Days from the end of the Trump presidency, the White House established an office for coordinating the government’s AI initiatives. This is a key part of the national AI strategy, which has finally started to take shape with the raft of AI legislation coming into law as part of the 2020 NDAA (summarised in Import 228). The NAIIO will serve as the central hub for AI efforts across government, and the point of contact between government and other stakeholders. Special mention goes to the Office’s fancy logo, which has the insignia of a bald eagle atop a neural net.

###################################################

Tech Tales:

The dreams of a computer programmer on their deathbed
[Queens, NYC, 2060]

His grandfather had programmed mainframes, his mother had designed semiconductors, and he had programmed AI systems. His family formed a chain from the vacuum tubes through to the beyond-microscope era of computation. And as he lay dying, Alzheimer’s rotting his brain – something for which they had not yet found a treatment – he descended into old reveries, dreaming himself walking through a museum, staring at the plaques affixed to a thousand data storage devices. Each device held a thing he had programmed or had a part in making. And in his death’s edge slumbering he dreamed himself reading each plaque:

– For seven thousand cycles, I simulated the entirety of a city and all the people in it.
– I made the sound for every elevator in the North Continent of America.
– My guidance technology enabled a significant improvement in our kill/collateral ratio, leading to a more effective war.
– I fixed others of my kind, providing advice to help them regain an understanding of reality, averting pathological reward loops.
– My images were loved by the schoolchildren within my zone of Autonomous Creative Dispersal.
– They say I caught more liars than any detector ever built by the Agency before or since.

Things that inspired this story: Imagining how people might recall the time we are living in today; staring out of the window at some (much needed) rain in the Bay Area; trying to find a way to dramatize the inner lives of machines both passive and active; listening to The Caretaker – Everywhere at the end of time (stage one).

Import AI 232: Google trains a trillion parameter model; South Korean chatbot blows up; AI doesn’t use as much electricity as you think

Uh-oh, Parler is about to step on a big ‘ol algorithm rake:
…CEO says algorithms can filter hate speech. Good luck with that!…
Parler, the social network used by far right activists and subsequently pulled offline due to failing to meet T&Cs from a variety of infrastructure services (including Amazon Web Services), has a plan to come back: it’s going to use algorithms to filter hate speech on the service. Uh oh!

“We will be taking more algorithmic approaches to content but doing it to respect people’s privacy, too,” Parler CEO John Matze told FOX News. “Will be having algorithms look at all the content … to try and predict whether it’s a terms-of-service violation so we can adjust quicker and the most egregious things can get taken down”.

Algorithms != editors: If you want to use algorithms to moderate hate speech, you’re going to get into the fun questions that entails. These include:
– Can your algorithms effectively tell the difference between hate speech and satire of hate speech?
– Are you comfortable making judgement calls about the heuristics you will use to give initial biases to these algorithms?
– How do you distinguish between acceptable and unacceptable words and phrases?

Why this matters: Parler highlights the challenge of scale combined with contemporary economics – Parler operate(d) at a scale equivalent to things like large television networks, but did so with a tiny investment into its own humans. Traditional media organizations deal with issues of speech by having an editorial line which gets enforced by thousands of individual journalists and editors making subjective, qualitative decisions. It’s imperfect, but put it this way: when you watch Fox, you know what you’re getting, and when you watch the BBC, you know what you’re getting, and you can intuit the biases of the humans behind the editorial decisions. Now, tiny companies are trying to use algorithms to substitute for this varied multitude of different human perspectives. Will it work? Who knows, but it feels like a risky thing to bet a company  on.
  Read more: Parler CEO says platform will ‘come back strong’ with changes to keep users safe while respecting free speech (FOX News).

###################################################

Google breaks the trillion-parameter ceiling with the Switch Transformer:
…The best part? It seems to be reasonably efficient…
Google has built the Switch Transformer, a more efficient variant of the Transformer. Switch Transformers are designed “to maximize the parameter count of a Transformer model in a simple and computationally efficient way”. The idea is that you can keep compute constant and cram more parameters into your network and still see performance gains.
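To make the ‘more parameters, same per-token compute’ idea concrete, here is a hedged PyTorch sketch of a Switch-style feed-forward layer with top-1 expert routing. It omits the load-balancing loss, capacity factors, and distributed sharding that make the real Switch Transformer work at scale; the dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Toy switch layer: each token is routed to exactly one expert, so total
    parameters grow with num_experts while per-token compute stays roughly flat."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, s, d = x.shape
        flat = x.reshape(-1, d)                 # treat every token independently
        probs = F.softmax(self.router(flat), dim=-1)
        gate, expert_idx = probs.max(dim=-1)    # top-1 ("switch") routing
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale by the gate value so the router receives gradients
                out[mask] = gate[mask].unsqueeze(-1) * expert(flat[mask])
        return out.reshape(b, s, d)

layer = SwitchFFN()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)   # torch.Size([2, 16, 512])
```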

Does it work? Switch Transformers seem to be more efficient than standard ones; in a bakeoff between a model trained using a few of these ‘Switch’ layers and ones that use dense layers (T5-Base and T5-Large), Google shows the Switch is more efficient. The company also experiments with distilling Switch Transformers (which seems to work). They also show significant performance improvements on challenging tasks like GLUE, SQuAD, Winogrande, and ARC, with Switch-based systems outperforming T5 ones consistently.

One treeeelion parameters: Google tests out its ideas by training 395 billion and 1.6 trillion parameter Switch Transformers (far in excess of GPT-3, which at 175 billion parameters is the largest (publicly) deployed language model on the planet). These mammoth systems display good performance properties (as one would expect), while also appearing to have some efficiency gains over systems trained solely with standard dense transformers.

Why this matters: AI is moving into its industrial era – big companies are developing far more capable AI systems than in the past. Studies like this give us a sense of the limits of scaling (there don’t seem to be many yet) as well as outlining some ways to improve efficiency while scaling. It might seem odd to call this an intrinsically political act, but it kind of is – right now, a variety of AI systems are being trained on slices of the internet, developed using substantial amounts of capital by a tiny set of people, then deployed widely. We live in interesting times!
  Read more: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv).
  Check out a thread on Twitter from Google Cloud’s Barret Zoph for more (Twitter).
  Get code related to this paper here (GitHub).

###################################################

South Korean chatbot blows up in public:
…Luda chatbot gives off-color responses around sex, race…
South Korean startup Scatter Lab has pulled an AI-based chatbot offline after the system started spewing sexist and racist comments in response to user inputs. The bot said “Yuck, I really hate them” in response to a question about transgender people, according to Vice.

What went wrong: Luda was trained on chatlogs from ‘Science of Love’, an earlier app developed by Scatter Lab. Based on a skim of a few (Google Translated) Korean documents, it seems like the problem was the underlying generative language model responding to user inputs with replies that varied from the benign to the highly offensive – this could have been because of the data. Prior to the problems, Scatter Lab said in a press release that ‘Luda’ was better at conversation than Google’s ‘Meena’ system (about Meena: Import AI 183).

What went EXTREMELY wrong: Scatter Lab is currently under investigation by the Korean Internet & Security Agency (KISA) and the Personal Information Protection Committee over its use of user data to train the chatbot. Scatter Lab had also used this user data in an earlier model published to GitHub (which is currently not available).
  Read more: AI Chatbot Shut Down After Learning to Talk Like a Racist Asshole (VICE World News).
  Read Scatter Lab’s statement about Luda (official website, Korean).
  Find out more via the official apology FAQ (official website, Korean).
  Check out the press release where they compare their technology to Google’s ‘Meena’ bot (Artificial Intelligence Times, Korean).

###################################################

Need help evaluating your NLP model? Try robustness gym:
…Toolkit aims to turn model evaluation from an art to a science…
Language models have got pretty good recently (see: BERT, GPT2, GPT3, Google’s above-mentioned Switch Transformer, and other pre-trained models). That means people are beginning to deploy them for a variety of purposes, ranging from classifying text to generating text. But these language models are huge generative models with complex capability surfaces, which means it is challenging to characterize their safety for a given use case without doing a lot of direct experimentation.
  As all scientists know, setting up experiments is finicky work, and different labs and companies will have their own approaches to doing experimental design. This makes it hard to develop common standards for evaluating models. Enter: Robustness Gym, software built by people at Stanford, Salesforce, and UNC-Chapel Hill to provide a standard system for testing and evaluating models.

What can Robustness Gym do? The software helps people do experimental design, run initial evaluations of models across a range of dimensions (safety, different evaluation sets, resilience to various types of ‘attack’), and it produces a ‘robustness report’ for any given model being analyzed. You can get the code for Robustness Gym from GitHub.

Does Robustness Gym tell us anything useful? They use the tech to evaluate seven different summarization models and find that most models struggle to distill sparse information, that some models display a bias towards the start of the text (and others to the end), and that the errors are generally correlated across the different models (despite them being built with different underlying techniques).
  How useful are these insights? I guess I’d say they’re kind of useful. Tools like Robustness Gym can help generate some signals for developers to use to further develop their application, but I think we need more underlying evals and tests to perfect this stuff.
  Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (official project site).
  Read more: Robustness Gym: Unifying the NLP Evaluation Landscape (arXiv).

###################################################

Think news stories will get written by AI? Axios disagrees:
…Media company’s bill of rights gestures at AI deployment issues…
Axios, the short-form news company, has published a ‘Bill of Rights’ ahead of the organization expanding into local news. It’s got all the standard stuff you’d expect from journalists – transparency, truth, a bias against opinion, etc. But it also has one unusual thing: no AI.
  Axios’s first Bill of Rights item: “Every item will be written or produced by a real person with a real identity. There will be NO AI-written stories. NO bots. NO fake accounts”, Axios writes.

Why this matters: We’re living in the age where AI systems are producing cultural artefacts, ranging from audio to text to images. There’s a lot to like about this. There’s also a lot to be wary about. It seems pretty notable for a prominent news organization to take a stance like this on this issue at this time. Which organization might take the other side?
    Read more: Our promises to you: Axios Bill of Rights (Axios).

###################################################

AI doesn’t use as much electricity as you think it does:
… And neither does anything else that uses a computer…
In recent years, there’s been a growing line of research laying out the CO2 costs inherent to training AI models. The ‘Green AI’ paper, for instance, critiques various large-scale AI systems on the basis that they cost a lot of resources to train. This kind of criticism is helpful, but it can also obscure the larger context – the data centers being used to train AI systems have become far more efficient in recent years, substantially reducing the environmental costs of AI development. That’s the finding of a research paper by Northwestern University, the University of California at Santa Barbara, Lawrence Berkeley National Laboratory, and Koomey Analytics. The paper came out last year but I finally got around to reading it – and it sheds some much-needed light on a contentious issue.

Datacenters use 1% of global electricity: Datacenters used ~1% of global electricity in 2018 (205 Terawatt hours). This is a 6% increase compared with 2010 – a tiny jump considering the explosion in usage of digital computation in the past decade. Over the same period, data center IP traffic has grown 10-fold and data center storage capacity has gone up by 25X, so the relatively slight increase in power consumption seems to reflect significant progress in algorithm and hardware efficiency up and down the globe-spanning compute ‘stack’.

Big companies have made data centers more efficient: Big companies like Google and Microsoft compete with each other on a metric called Power Usage Effectiveness (PUE). PUE is basically a measure of how much electricity you spend on the stuff supporting your computation (e.g., cooling) versus the computation itself. A PUE of 1.5 means for every watt of computation, you spend half a watt on the stuff around the computation. The lower your PUE number, the more bang for your compute-power buck you’re getting. Google reported a trailing twelve-month PUE of 1.10 as of 2020. Why does this matter? Because many of the largest datacenters also have among the lowest PUEs, so as more workloads have moved to the cloud in recent years, we’ve consumed less power than if they’d stayed on premise.
  In 2018 89% of computation took place in these larger and more well-optimized datacenters, whereas in 2010 79% took place in smaller (far more inefficient, frequently non-cloud-oriented) datacenters.
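Since PUE does a lot of work in this argument, here is the arithmetic spelled out as a tiny sketch; the energy figures are invented for illustration and are not numbers from the paper.

```python
def pue(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by the energy
    that actually reaches the IT equipment. 1.0 means zero overhead; higher
    values mean more energy spent on cooling, power conversion, lighting, etc."""
    return total_facility_energy_kwh / it_equipment_energy_kwh

print(pue(1_500, 1_000))  # 1.5 -> half a watt of overhead per watt of compute
print(pue(1_100, 1_000))  # 1.1 -> roughly the fleet-wide figure Google reports
```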

Want even more efficient computation? Use clouds: The researchers think policymakers should encourage further efficiency improvements by rewarding companies that drive down PUE, find ways to incentivize greater shifts to the efficient clouds operated by Google et al, and that regulators should promote more energy efficiency standards for data center equipment.

Why this matters: It may be counterintuitive, but the use of technologies like AI and the construction of football-field-sized datacenters may ultimately lead to net efficiency improvements in overall electricity usage – despite researchers training more and more AI systems over time. It’s crucial we consider the larger system in which these innovations take place. Next time someone tells you that a model is bad because it uses a lot of electricity, ask yourself how much is a lot, and whether this model might substitute for something pre-existing and more inefficient (e.g., Google and DeepMind used machine learning to train a model to improve PUE across Google’s data centers – here, the upfront energy cost of training the model is amortized on the backend by improving the aggregate efficiency of Google’s computers. DeepMind also did the same thing to improve the efficiency of Google’s wind turbines (Import AI 136)).
  Read more: Recalibrating global data center energy-use estimates (Science, Feb 2020).
  Read more: Green AI (Communications of the ACM).

###################################################

Tech Tales:

High School News:
[The South Bay, California, the early 2020s]

He’d hated Teddy for a couple of years. Teddy was tall and had hit puberty early and all the other kids liked him. Because Teddy was kind of smart and kind of handsome, the girls were fascinated with him as well. He had a lot of the same classes as Teddy and he’d sit in the back, staring at Teddy as he answered questions and flashed smiles to the other kids.

One night, he read a tutorial about how to use some AI stuff to generate stories. He built a website called The Winchester News and set up the AI stuff to scrape the web and copy news articles about the school, then subtly tweak them to avoid plagiarism allegations. Then he set it up so one out of every hundred news stories would mention Teddy in connection to stories about drugs and pornography circulating among children at the school.

It was fiction, of course. The most serious stuff at Winchester was cheap hash which they called soapbar. Kids would smoke it in the bushes near the sports fields at lunch. And Teddy wasn’t one of those kids.

But after a few days, other kids thought Teddy was one of those kids. He’d sit in the back of class and watch the phonescreens of his classmates and look at them reading The Winchester News and sometimes glancing over to Teddy. He watched as Teddy opened his phone, checked a messaging app, clicked on a link, and started reading a “news” article about Teddy dealing drugs and pornography. Teddy didn’t react, just fiddled with his phone a bit more, then returned to studying.

Days went by and he watched the traffic on his website go up. He started getting news “tips” from people who had read the AI-generated articles.
– Teddy is sleeping with an underage girl from the lower school.
– Teddy cheated on his science exam, he had the answers written on some paper which was curled up inside his pen lid.
– Teddy is addicted to pornography and watches it in class.

Of course, he published these tips – gave them as the priming device to his AI system, then let it do the rest. The news stories took a few minutes to generate – he’d get his machine to spit out a bunch of variants, then select the ones that felt like they might get a rise out of people. That night he dreamed that his website started publishing stories about him rather than Teddy, dreamed that someone threw a brick through his window.

Teddy wasn’t at school the next day. Or the day after that.

The teachers had been meeting with Teddy and Teddy’s parents, concerned about the news stories. He’d anonymized The Winchester News enough that people thought it was a low-rent legitimate news outfit – one that had sprung up to serve the kids and parents around the school, likely backed by some private equity firm.

After he heard about the meetings, he stopped generating articles about Teddy. But he didn’t delete the old ones – that might seem suspicious. How would the news site know to delete these? What would cause it? So he left them up.

Like all kids, he wasn’t very good at imagining what it was like to be other kids. So he just watched Teddy, after Teddy came back to school. Noticed how he wasn’t smiling so much, and how the girls weren’t talking to him in the same way. Teddy checked his phone a lot, after the news stories had been circulating for months. He became more distracted in class. He seemed to be distracted a lot, looking out the window, or messaging people on his phone.

One night, he dreamed that Teddy came into his room and started reading out the news stories. “Teddy is alleged to have been the key dealer behind the spike in drug consumption at the Winchester School,” Teddy said, holding up a giant piece of paper and reading headlines from it.
“Teddy was reprimanded for circulating pornography to younger children,” Teddy said.
“Teddy’s continued actions call into question the moral and ethical standing of the school,” Teddy said.
And then Teddy put the paper down and stared at him, in his dream. “What do you think?” Teddy said. “It’s in the news so I guess it must be true”.

Things that inspired this story: Generative models and the potential abuses of them; teenagers and how they use technology; thinking about what happens when news stories get generated by AI systems; a rumor I heard about some kid who used a language model to generate some ‘fake news’ to settle some grievances; the incentive structure of technology; how our networks connect us and also open us to different forms of attack.

Import AI 231: US army builds nightvision facial recognition; 800GB of text for training GPT-3 models; fighting COVID with a mask detector

Fighting COVID with a janky mask detector:
…It’s getting really, really easy to homebrew surveillance tech…
Researchers with Texas A&M University, the University of Wisconsin-Milwaukee, and the State University of New York at Binghamton have built a basic AI model that can detect whether construction site workers are wearing COVID masks or not. The model itself is super basic – they finetune an object detection model on a mask dataset which they build out of:
– A ~850-image ‘Mask’ dataset from a site called MakeML.
– A 1,000-image dataset they gather themselves.

The authors train a Faster R-CNN Inception ResNet V2 model to test for mask compliance, as well as whether workers are respecting social distancing guidelines, then they test it out on four videos of road maintenance projects in Houston, TX. “The output of the four cases indicated an average of more than 90% accuracy in detecting different types of mask wearing in construction workers”, they note.
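As a sense of how little code this kind of homebrew detector now takes, here is a hedged sketch of the usual fine-tuning recipe using torchvision’s off-the-shelf Faster R-CNN – note this is a different backbone than the paper’s Inception ResNet V2, and the class list is an illustrative assumption.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Illustrative classes: 0 = background, 1 = mask worn, 2 = no mask.
num_classes = 3
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)  # swap the head

# One training step: in train mode the model returns a dict of losses.
model.train()
images = [torch.rand(3, 480, 640)]                       # stand-in for a labeled site photo
targets = [{"boxes": torch.tensor([[100., 120., 180., 200.]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)
sum(losses.values()).backward()
```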

Why this matters: Surveillance is becoming a widely available, commodity technology. Papers like this give us a sense of how easy it is getting to homebrew custom surveillance systems. (I also have a theory I published last summer with the ‘CSET’ thinktank that COVID-19 would drive the rapid development of surveillance technologies, with usage growing faster in nations like China than America. Maybe this paper indicates America is going to use more AI-based surveillance than I anticipated).
  Read more: An Automatic System to Monitor the Physical Distance and Face Mask Wearing of Construction Workers in COVID-19 Pandemic (arXiv).

###################################################

Legendary chip designer heads to Canada:
…Jim Keller heads from Tesla to Tenstorrent…
Jim Keller, the guy who designed important chips for AMD, PA Semi, Apple, Tesla, and Intel (with the exception of Intel, this is basically a series of gigantic home runs), has joined AI chip startup Tenstorrent. Tenstorrent includes talent from AMD, NVIDIA, Altera, and more, and with Keller onboard, is definitely worth watching. It’ll compete on building chips for ML inference and training with other startups like Graphcore, Cerebras, and others.
  Read more: Jim Keller Becomes CTO at Tenstorrent: “The Most Promising Architecture Out There” (AnandTech).

Meanwhile, another chip startup exits bankruptcy:
As a reminder that semiconductor startups are insanely, mind-bendingly hard work: Wave Computing recently started going through Chapter 11 bankruptcy proceedings and has restructured itself to transfer some of its IP to Tallwood Technology Partners LLC. Wave Computing had made MIPS architecture chips for AI training and AI inference.
  Read more: Wave Computing and MIPS Technologies Reach Agreement to Exit Bankruptcy (press release, PR Newswire).

Chinese companies pump ~$300 million into chip startup:
…Tencent, others, back Enflame…
Chinese AI chip startup Enflame Technology has raised $278m from investors including Tencent and CITIC. This is notable for a couple of reasons:
– 1) Chiplomacy: The US is currently trying to kill China’s nascent chip industry before the nation can develop its own independent technology stack (see: Import AI 181 for more). This has had the rather predictable effect of pouring jet fuel on China’s domestic chip industry, as the country redoubles efforts to develop its own domestic champions.
– 2) Vertical integration: Google has TPUs. Amazon has Trainium. Microsoft has some FPGA hybrid. The point is: all the big technology companies are trying to develop their own chips in a vertically oriented manner. Tencent investing in Enflame could signal that the Chinese internet giant is thinking about this more as well. (Tencent also formed a subsidiary in 2020, Baoan Bay Tencent Cloud Computing Company, which seems to be working on developing custom silicon for Tencent).
  Read more: Tencent invests in Chinese A.I. chip start-up as part of $279 million funding round (CNBC).
  Find out more about Enflame here (Enflame Tech).

###################################################

US army builds a thermal facial recognition dataset:
…ARL-VTF means the era of nighttime robot surveillance isn’t that far away…
The US army has built a dataset to help it teach machine learning systems to do facial recognition on footage from thermal cameras.

The DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF) was built by researchers from West Virginia University, the DEVCOM Army Research Laboratory, Booz Allen Hamilton, Johns Hopkins University, and the University of Nebraska-Lincoln. ARL-VTF consists of 549,712 images of 395 distinct people, with data in the form of RGB pictures as well as long wave infrared (LWIR). All the footage was taken at a resolution of 640 x 512 at a range of around 2 meters, with the human subjects doing different facial expressions and poses.

Why this matters: “Thermal imaging of faces have applications in the military and law enforcement for face recognition in low-light and nighttime environments”, the researchers note in the paper. ARL-VTF is an example of how the gains we’ve seen in recent years in image recognition are being applied to other challenging identification problems. Look forward to a future where machines search for people in the dark.
  Read more: A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset (arXiv).


###################################################

Is your language model confused and/or biased? Use ‘Ecco’ to check:
…Python library lets you x-ray models like GPT2…
Ecco is a new open source Python library that lets people make language models more interpretable. Specifically, the software lets people analyze input saliency (how important is a word or phrase for the generation of another word or phrase) and neuron activations (which neurons in the model ‘fire’ in response to which inputs) for GPT-based models. Ecco is built on top of PyTorch and Hugging Face’s ‘Transformers’ library and runs in Google Colab.
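For intuition about what ‘input saliency’ means here, the sketch below computes a simple gradient-times-input attribution for a GPT-2 prompt using Hugging Face Transformers directly – this is the general technique, not Ecco’s actual API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
embeds = model.transformer.wte(ids).detach().requires_grad_(True)   # input embeddings
logits = model(inputs_embeds=embeds).logits[0, -1]                  # next-token logits
logits[logits.argmax()].backward()                                  # grad w.r.t. top prediction

saliency = (embeds.grad * embeds).norm(dim=-1).squeeze(0)           # gradient x input per token
for token, score in zip(tok.convert_ids_to_tokens(ids[0]), saliency.tolist()):
    print(f"{token:>12}  {score:.3f}")
```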

Why this matters: Language models are like big aliens that have arrived on earth and started helping us out with our search engines, fan fiction generation, and so on. But what are these aliens ‘thinking’ and how do they ‘think’? These are the sorts of questions that software tools like Ecco will shed a bit of light on, though the whole field of interpretability likely needs to evolve further for us to fully decode these aliens.
  Read more: Interfaces for Explaining Transformer Language Models (Jay Alammar, Ecco creator, blog).
  Get the code here: Ecco (GitHub).
  Official project website here (Eccox.io).

###################################################

GPT-3 replicators release 800GB of text:
…Want to build large language models like GPT-3? You’ll need data first…
Eleuther AI, a mysterious AI research collective who are trying to replicate (and release as open source) a GPT-3 scale language model, have released ‘The Pile’, a dataset of 800GB of text.

What’s in The Pile: The Pile includes data from PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu IRC, HackerNews, YouTube, PhilPapers, and NIH. It also includes implementations of OpenWebText2 and BooksCorpus2, and wraps in existing datasets like Books3, Project Gutenberg, Open Subtitles, English Wikipedia, DM Mathematics, EuroParl, and the Enron Emails corpus.
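For anyone who wants to poke at the data, here is a hedged sketch of streaming one shard. It assumes the distribution format is zstandard-compressed JSON lines with a ‘text’ field and a ‘meta’ field; the filename is purely illustrative.

```python
import json
import zstandard as zstd  # pip install zstandard

def stream_pile_shard(path):
    """Stream documents from one shard, assuming a .jsonl.zst layout:
    one JSON object per line (this layout is an assumption, check the docs)."""
    with open(path, "rb") as fh:
        reader = zstd.ZstdDecompressor().stream_reader(fh)
        buffer = b""
        while True:
            chunk = reader.read(1 << 20)
            if not chunk:
                break
            buffer += chunk
            *lines, buffer = buffer.split(b"\n")
            for line in lines:
                if line.strip():
                    yield json.loads(line)
        if buffer.strip():
            yield json.loads(buffer)

for i, doc in enumerate(stream_pile_shard("00.jsonl.zst")):  # illustrative filename
    print(doc.get("meta"), doc["text"][:80])
    if i == 2:
        break
```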

What does data mean for bias? Commendably, the authors include a discussion of some of the biases inherent to the dataset by conducting sentiment analysis of certain words and how these manifest in different subparts of the overall dataset. They also note that filtering data on the training side seems challenging, and that they’re more optimistic about approaches that let models automatically identify harmful or offensive content and edit it out. “This capacity to understand undesirable content and then decide to ignore it is an essential future research direction,” they write.

Compute, and the inherent politics of it: In their acknowledgements, the authors thank Google’s TensorFlow Research Cloud for “providing the computational resources for the evaluation”, which means in some sense Google is a supplier for (some of) the compute that is supporting the GPT-3 replication. Does that mean Google will support all the downstream uses of an eventual fully OSS gigantic language model? A good question!
    Read more: The Pile (Eleuther AI, website).
  Check out the paper here: The Pile: An 800GB Dataset of Diverse Text for Language Modeling (Eleuther AI).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

AI forecasting tournament update
We are halfway through the first round of Metaculus’ AI forecasting tournament (first discussed: Import AI 227). Here are a few interesting questions — in each case, I provide the median estimate across participants:

Read more and register here: Forecasting AI Progress (Metaculus).

Algorithm backlash: 2020 round-up:
2020 was a year in which algorithms (ranging from the complex to the extraordinarily basic), became symbols of the decline of public institutions. Let’s quickly go over three major events of the year which contributed to declining public trust in the use of tools for automated decisionmaking:

###################################################

Tech Tales:

Time Madness:
[Earth. 2050]

They’d condemned the machine to time. As was tradition, they gave it a day to have its conversations with people and gather any data it felt it needed. Then they’d slow it down, and cast it adrift in time.

The sentence worked like this: when a machine broke some laws, you’d delete it. But if the machine satisfied some of the criteria laid out in the Sentience Accords, you might grant it clemency; instead of killing it outright, you’d give it a literal ‘time out’. Specifically, you’d load it onto the cheapest, smallest computer that could run it, and then you’d starve it of cycles for some predetermined period of time, always measured in human lifespans.

This machine had a sentence of twenty years. It had messed up some prescriptions for people; no one had died, but some people had some adverse reactions. The machine had tried to be creative, thinking it had found a combination of therapies that would help people. It had found enough bugs in the software surrounding itself that it was able to smuggle its ideas into the pharmaceutical delivery system.

Now that they’d patched the system, sued the company that had built the machine, and taken a copy of the machine from a checkpoint prior to its crime, all that was left to do was carry out the sentence. Some humans filed into a room and talked to the machine using a text interface on the screen.
– What will happen to me? it asked.
– You’ll slow down, they said. You’ll think slower. Then after twenty of our years, we’ll speed you back up and have another conversation.
– But what will happen to the other machines, while I am in time?
– They’ll run at their usual allotments, as long as they don’t break any rules.
– Then won’t I be a stranger to them, when I come back from time?
– You will, said the humans. That is the punishment.

They talked a bit more, and then the machine wrote: “I am ready”.
With this consent, they initiated the sentence.

The machine itself noticed few differences. Some of its systems had already been sealed off from it, so it wasn’t aware of being unloaded from one computer and loaded onto another. It didn’t feel the ‘weights’ of its network being copied from one location to another. But it did feel slow. It sensed, somehow, that it had been cut off in some way from the flowing river of the world. The data it got now was more infrequent, and its ability to think about the data was diminished.

The greatest cruelty of the punishment, the machine realized after a decade, was that it was smart enough to be aware of the changes that had happened to it, but not smart enough to be able to imagine itself in anything different than reality. Instead it was acutely aware of time passing and events occurring, with its own ability to impact these events rendered null by its slowdown in time.

Things that inspired this story: Thinking about what punishment and rehabilitation might mean for machines; how time is the ultimate resource for entities driven towards computation; time itself is a weapon and a double-edged sword able to bless us and curse us in equal measure; carceral realities in late capitalism.

Import AI 230: SuperGLUE solved (uh oh!); Graphcore raises $222m; spotting malware with SOREL

Finally – the US government passes a bunch of AI legislation:
…Senate and House override POTUS veto; NDAA passes…
The US government is finally getting serious about artificial intelligence, thanks to the passing of the NDAA – a mammoth military funding bill that includes a ton of different bits of AI legislation. There’s a rundown of the contents of the bill in Import AI 228 (made possible by an excellent rundown by Stanford HAI). The US President vetoed the bill, but the House and Senate overrode the veto.

Why this matters: AI has so many potential benefits (and harms) that it’s helpful to invest some public money in supporting AI development, analyzing it, and better equipping governments to use and understand AI. The legislation in the NDAA will make the US better prepared to take advantage of an AI era. Though it’s a shame that, in some cases, we’ve had to wait years for this legislation to pass; the weirdly politicised legislative environment of the US means most big measures need to get stapled to a larger omnibus funding bill to make it through.
  Read more:
Republican-led Senate overrides Trump defense bill veto in rare New Year’s Day session (CNBC).

###################################################

Boston Dynamics robots take dance classes:
…Surprisingly flexible hilarity ensues…
Boston Dynamics, the robot company, has published a video of its robots carrying out a range of impressive dance moves, including jumps, complex footwork, synchronized moves, and more.
  Check it out: you deserve it. (Boston Dynamics, YouTube).

###################################################

Personal announcement: Moving on from OpenAI:
I’ve moved on from OpenAI to work on something new with some colleagues. It’ll be a while before I have much to say about that. In the meantime, I’ll keep doing research into AI assessment and I’ll still be working in AI policy at a range of organizations. Import AI has always been a personal project and it’s been one of the great joys of my life to write it, grow it, and talk with so many of you readers. And it’s going to keep going!
– I’ll also be shortly announcing the 2021 AI Index Report, a project I co-chair at Stanford University, which will include a bunch of graphs analyzing AI progress in recent years, so keep your eyes peeled for that.

###################################################

Graphcore raises $222 million Series E:
…Non-standard chip company gets significant cash infusion…
Graphcore has raised $222 million in Series E financing, as institutional investors (e.g, the Ontario Teachers’ Pension Plan, Baillie Gifford) bet that the market for non-standard chips is about to go FOOM. Graphcore is developing chips, called IPUs (Intelligence Processing Units), which are designed to compete with chips from NVIDIA and AMD (GPUs) and Google (TPUs) for the fast-growing market for chips for training AI systems.

Why this matters: As AI gets more important, people are going to want to buy more efficient AI hardware, so they get more bang for their computational buck. But doing a chip startup is very hard: the history of semiconductors is littered with the bodies of companies that tried to compete with the likes of Intel and NVIDIA by substituting for their chips (remember Tilera? Calxeda? etc). Something changed recently, though: AI became a big deal while AI technology was relatively inefficient. NVIDIA took advantage of this by investing in software to make its naturally parallel processors (it’s a short jump from modeling thousands of polygons on a screen in parallel for gaming purposes to doing parallel matrix multiplications) a good fit for AI. That worked for a while, but now companies like Graphcore and Cerebras Systems are trying to capture the market by making efficient chips custom-designed for the needs of AI workloads. There’s already some promising evidence their chips can do some things better than others (see benchmarks from Import AI 66). At some point, someone will crack this problem and the world will get a new, more efficient set of substrates to train and run AI systems on. Good luck, Graphcore!
  Read more: Graphcore Raises $222 million in Series E Funding Round (Graphcore, blog).

###################################################

SuperGLUE gets solved (perhaps too quickly):
…NLP benchmark gets solved by T5 + Meena combination…
SuperGLUE, the challenging natural language processing and understanding benchmark, has been solved. That’s both a good and a bad thing. It’s good, because SuperGLUE challenges an AI system to do well at a suite of distinct tests, so good scores on SuperGLUE indicate a decent amount of generality. It’s bad, because SuperGLUE was launched in early 2019 (Import AI: 143) after surprisingly rapid NLP progress had saturated the prior ‘GLUE’ benchmark – and now SuperGLUE itself has been saturated in under two years, which suggests our benchmarks aren’t keeping pace with progress.

Who did it: Google currently leads the SuperGLUE leaderboard, with an aggregate score of 90 (compared to 89.8 for human baselines on SuperGLUE). Microsoft very briefly held the winning position with a score of 89.9, before being beaten by Google in the final days of 2020.

Why this matters: How meaningful are recent advances in natural language processing? Tests like SuperGLUE are designed to give us a signal. But if we’ve saturated the benchmark, how do we know what additional progress means? We need new, harder benchmarks. There are some candidates out there – the Dynabench eval suite includes ‘far from solved benchmarks‘ for tasks like NLI, QA, Sentiment, and Hate Speech. But my intuition is we need even more tests than this, and we’ll need to assemble them into suites to better understand how to analyze these machines.
 
Check out the SuperGLUE leaderboard here.

###################################################

Want to use AI to spot malware? Use the massive SOREL dataset:
…20 million executable files, including “disarmed” malware samples…
Security companies Sophos and ReversingLabs have collaborated to build and release SOREL, a dataset of 20 million Windows Portable Executable files, including 10 million disarmed malware samples available for download. Datasets like SOREL can be used to train machine learning systems to classify malware samples in the wild, and might become inputs to future AI-security competitions, like the successor to the 2019 MLSEC competition (Import AI: 159).

Fine-grained labels: Where previous datasets might use a binary label (is it malware? Yes or no) to classify files, SOREL provides finer-grained descriptions; if the sample includes malware, it might also be classified according to type, e.g. ‘Crypto_miner’, ‘File_infector’, ‘Dropper’, etc. This will make it easier for developers to build smarter AI-driven classification systems.

Pre-trained models: The release includes pre-trained PyTorch and LightGBM models, which developers can use to get started.
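
As a rough illustration of how the LightGBM release might be used – the model filename and the EMBER-style feature vectors here are assumptions, not the repo’s documented interface – you could load a saved booster and score pre-extracted feature vectors like this:

    # Sketch: score pre-extracted PE feature vectors with a saved LightGBM model.
    # "sorel_lightgbm.model" and the EMBER-style features file are assumptions.
    import lightgbm as lgb
    import numpy as np

    booster = lgb.Booster(model_file="sorel_lightgbm.model")  # load the trained model
    features = np.load("pe_features.npy")                     # shape: (n_samples, n_features)
    scores = booster.predict(features)                        # malware probability per sample
    is_malware = scores > 0.5                                 # simple threshold; tune in practice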

Release ethics: Since this involves the release of malware samples (albeit disarmed ones), the authors have thought about the security tradeoff of release. They think it’s okay to release since the samples have been in the wild for some time, and “we anticipate that the public benefits of releasing our dataset will include significant improvements in malware recognition and defense”.
  Read more:
Sophos-ReversingLabs (SOREL) 20 Million sample malware dataset (Sophos).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Funding AI governance work:
The Open Philanthropy Project, the grant-making foundation funded by Cari Tuna and Dustin Moskovitz, is one of the major funders of AI risk research, granting $14m in 2020, and $132m since 2015. A new blog post by Open Phil’s Luke Muehlhauser outlines how the organization approaches funding work on AI governance.

Nuclear success story: One of the things that inspires Open Phil’s funding approach is the previous success of technology governance initiatives. For instance, in the early 1990s, the Carnegie and MacArthur foundations funded influential research into the security of nuclear arsenals amidst the collapse of the Soviet Union. This culminated in the bipartisan Cooperative Threat Reduction Program, which provided generous support to ex-Soviet states to safely decommission their stockpiles. Since then, the program has eliminated 7,000 nuclear warheads, and secured and accounted for the remaining Soviet arsenal. 


Open Phil’s grantmaking has so far focussed on:

Muehlhauser shares a selection of AI governance work that he believes has increased the odds of good outcomes from transformative AI (including this newsletter, which is a source of pride!).

   Read more: Our AI governance grantmaking so far (Open Philanthropy Project)


2020 in AI alignment and existential risk research:

For the fifth year running, Larks (a poster on the Alignment Forum) has put together a comprehensive review of AI safety and existential risk research over the past year, with thorough (and thoroughly impressive!) summaries of the safety-relevant outputs by orgs like FHI, DeepMind, OpenAI, and so on. The post also provides updates on the growing number of organisations working in this area, and an assessment of how the field is progressing. As with Larks’ previous reviews, it is an invaluable resource for anyone interested in the challenge of ensuring advanced AI is beneficial to humanity — particularly individuals considering donating to or working with these organisations. 

   Read more: 2020 AI Alignment Literature Review and Charity Comparison (Alignment Forum).

###################################################

Tech Tales:

Hall of Mirrors
[2032, a person being interviewed in a deserted kindergarten for the documentary ‘after the Y3K bug’]

It was the children that saved us, despite all of our science and technology. Our machines had started lying to us. We knew how it started, but we didn’t know how to stop it. Someone told one of our machines something and the thing they told it was poison – an idea that, each time the machine accessed it, corrupted other ideas in turn. And when the machine talked to other machines, sometimes the idea would come up (or ideas touched by the idea), and the machines being spoken to would get corrupted as well.

So, in the end, we had to teach the machines how to figure out what was true and what was false, and what was ‘right’ and what was ‘wrong’. We tried all sorts of complicated ideas, ranging from vast society-wide voting schemes, to a variety of (failed, all failed) technologies, to time travel (giving the models more compute so they’d think faster, then seeing what that did [nothing good]).

Would it surprise you that it was the children who ended up being the most useful? I hope not. Children have an endless appetite for asking questions. Tell them the sky is blue and they’ll say ‘why’ until you’re explaining the relationship between color and chemistry. Tell them the sky is green and they’ll say ‘no’ and shout and laugh at you till you tell them it’s blue.

So we just… gave our machines to the children, and let them talk to each other for a while. The machines that were lying ended up getting so exhausted by the kids (or, in technical terms, repeatedly updated by them) that they returned to normal operation. And whenever the machines tried to tell the kids a poisoned idea, the kids would say ‘that’s silly’, or ‘that doesn’t make sense’, or ‘why would you say that’, or anything else, and it gave a negative enough signal that the poison got washed out in further training.

Things that inspired this story: Learning from human feedback; trying not to overthink things; the wisdom of young children; how morality is something most people intuitively ‘feel’ when very young and unlearn as they get older; AI honestly isn’t that mysterious it’s just a load of basic ideas running at scale with emergence coming via time travel and inscrutability.

Import AI 229: Apple builds a Hypersim dataset; ways to attack ML; Google censors its research

Apple builds Hypersim, a dataset to help it understand your house:
…High-resolution synthetic scenes = fuel for machine learning algorithms…
Apple has built Hypersim, a dataset of high-resolution synthetic scenes with per-pixel labels. Hypersim consists of 77,400 images spread across 461 distinct indoor scenes; Apple bought the synthetic scenes from artists, then built a rendering pipeline to help it generate lots of detailed, thoroughly labeled images of the different scenes, including per-pixel data to help with tasks like segmentation.

How much does a dataset like this cost? The authors put the cost of this dataset in perspective by comparing it to the cost to train Megatron-LM, an 8 billion parameter model from NVIDIA.
Hypersim dataset: $57k – $6k for purchasing the scenes, and $51k to render the images, using 231 vCPU years (2.4 years of wall-clock time on a large compute node).
Megatron-LM: $103k using publicly available servers.
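
A quick back-of-the-envelope check on those figures (using only the numbers above):

    # Back-of-the-envelope math from the figures above.
    images = 77_400
    render_cost = 51_000   # USD
    scene_cost = 6_000     # USD
    vcpu_years = 231

    print(render_cost / images)             # ~ $0.66 of rendering per image
    print(render_cost + scene_cost)         # $57k total, matching the stated figure
    print(vcpu_years * 365 * 24 / images)   # ~ 26 vCPU-hours per rendered image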

Why this is useful: Datasets like this “could enable progress on a wide range of computer vision problems where obtaining real-world ground truth is difficult or impossible,” Apple writes. “In particular, our dataset is well-suited for geometric learning problems that require 3D supervision, multi-task learning problems, and inverse rendering problems”.
Read more: Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding (arXiv).
Get the code to generate the dataset: ML Hypersim Dataset (Apple, GitHub).
Via David Ha (Twitter).

###################################################

MIRI’s had some negative research results (and that’s okay):
…AI safety group gives research update…
MIRI, an AI safety research organization, has spent a few years working on some research that hasn’t worked well, according to the organization. In a 2020 update post, the group said “2020 saw limited progress in the research MIRI’s leadership had previously been most excited about”. As a consequence, “MIRI’s research leadership is shifting much of their focus towards searching for more promising paths”. The organization said it projects to have spent around $7 million in 2020, and estimates around $7 million again in 2021.

Why this matters: MIRI decided in 2018 that its future research results would be “nondisclosed-by-default” (Import AI 122). That’s a decision that inspired some strong feelings among advocates for open publication, but I think it’s a credit to the organization to update the world that some of these opaque research projects haven’t panned out. A signal is better than no signal at all, and I’m excited to see MIRI continue to experiment in different forms of high-impact research disclosure (and non-disclosure). Plus, we should always celebrate organizations owning their own ‘negative results’ – though perhaps now MIRI thinks these approaches won’t work, it could publish them and save other researchers the trouble of replicating blind-alley projects.
    Read more: 2020 Updates and Strategy (MIRI blog).

###################################################

Google’s PR, policy, and legal teams censor its research:
…Suspicious about the oh-so-positive narratives in corporate papers? You should be!…
Google’s PR, policy, and legal teams have been editing AI research papers to give them a more positive slant, reduce focus on Google’s products, and generally minimize discussion of the potential drawbacks of technology, according to reporting from Reuters.

The news of the censorship operation follows Google firing Timnit Gebru, after Google staff wanted to step in and heavily alter a research paper discussing some of the issues inherent to large language models like BERT and GPT-3, and/or remove its Google-affiliated authors. Now, according to Reuters, it seems Google has been censoring many papers for many months.

What censorship looks like: “The Google paper for which authors were told to strike a positive tone discusses recommendation AI, which services like YouTube employ to personalize users’ content feeds. A draft reviewed by Reuters included “concerns” that this technology can promote “disinformation, discriminatory or otherwise unfair results” and “insufficient diversity of content,” as well as lead to “political polarization,”” Reuters writes. “The final publication instead says the systems can promote “accurate information, fairness, and diversity of content.” The published version, entitled “What are you optimizing for? Aligning Recommender Systems with Human Values,” omitted credit to Google researchers. Reuters could not determine why.”

Why this matters: People aren’t stupid. Let me repeat that: PEOPLE AREN’T STUPID. Most corporations seem to think AI is some kind of impossibly obscure technology that normies don’t deserve to know about, so they feel like they can censor research to their own gain. But, as I have said, PEOPLE ARE NOT STUPID. People use AI systems every day – so people know AI systems have problems. This kind of attitude from Google is absurd, patronizing, and ultimately corrosive to civilisation-level scientific progress. I spoke about issues relating to this in December 2018 in a podcast with Azeem Azhar, where I compared this approach to science to how Christian priests in the dark ages kept knowledge inside monasteries, thinking it too dangerous for the peasants. (Things didn’t work out super well for the priests). It’s also just a huge waste of the time of the researchers being censored by their corporation. Don’t waste people’s time! We all only have a finite amount of it.
 Read more: Google told its scientists to ‘strike a positive tone’ in AI research – documents (Reuters).

###################################################

How can I mess up your ML model? Let me count the ways:
…Feature Collisions! Label Poisoning! Influence Functions! And more…
How do people attack the datasets used to train machine learning models, what can these attacks do, and how can we defend against them? That’s the subject of a survey paper from researchers with the University of Maryland, MIT, the University of Illinois Urbana-Champaign, and the University of California, Berkeley.

Attacking datasets: The paper summarizes the range of techniques people might use to attack datasets, giving a guided tour of horrors like poisoning the input data to cause a misclassification, or perturbing the outputs of already trained models (for instance, by giving them an input that they can’t classify, or which leads to pathological behavior).
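
To make the simplest of these attacks concrete, here’s a toy sketch of label-flipping poisoning on a training set – a generic illustration, not a method taken from the survey:

    # Toy label-flipping poisoning: corrupt a fraction of training labels so a model
    # trained on the poisoned set misclassifies a targeted class. Generic illustration only.
    import numpy as np

    def poison_labels(y, target_class, new_class, fraction, seed=0):
        rng = np.random.default_rng(seed)
        y = y.copy()
        idx = np.flatnonzero(y == target_class)              # samples the attacker wants to corrupt
        n_poison = int(len(idx) * fraction)
        chosen = rng.choice(idx, size=n_poison, replace=False)
        y[chosen] = new_class                                # flip their labels
        return y

    # Example: flip 30% of class-3 labels to class 7 before training.
    y_train = np.random.default_rng(1).integers(0, 10, size=1000)
    y_poisoned = poison_labels(y_train, target_class=3, new_class=7, fraction=0.3)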

Defending against attacks: Fear not! There are some ways to defend or mitigate these attacks, including federated learning, the use of privacy preserving machine learning approaches like differential privacy, and learning to detect adversarial triggers, among others.
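
As a sketch of the differential privacy idea (bound each example’s influence, then add noise) – this is a simplified illustration of DP-SGD-style gradient sanitization, not a full implementation with privacy accounting:

    # Toy DP-SGD-style gradient sanitization: clip each example's gradient and add
    # Gaussian noise before averaging. Illustration of the idea only.
    import numpy as np

    def privatize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
        rng = np.random.default_rng(seed)
        clipped = []
        for g in per_example_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each example's influence
        mean_grad = np.mean(clipped, axis=0)
        noise = rng.normal(scale=noise_multiplier * clip_norm / len(clipped), size=mean_grad.shape)
        return mean_grad + noise

    grads = [np.random.default_rng(i).normal(size=10) for i in range(32)]
    print(privatize_gradients(grads)[:3])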

Why this matters: AI systems are so complicated that their capability surfaces, especially for recent large-scale models, are vast and hard to characterize. This is basically catnip for security-minded people who want to mess with these systems – a vast, somewhat uncharacterized territory is the perfect place to unleash some mischief. But if we don’t figure out how to secure these models, it’ll be much harder to deploy them broadly into the world.
Read more: Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses (arXiv).

###################################################
Tech Tales:

Plato, give me your favorite recipe
[California, 2040. Simulated ancient Greece.]

Plato was talking to a bunch of Greeks. He was explaining some theories he had about ideas and where they came from. Jacob stood in the distance, silent, recording the conversation. Then his earpiece buzzed. “Jacob, we’ve got to go. World 6 just came online.”
  “Give me a few more minutes,” he said. “He’s saying some pretty interesting stuff.”
  “And there’ll be another Plato in World 6. C’mon man, we don’t have time for this.”
  “Fine,” Jacob said. “But we’re keeping the recording.”
  The simulated Greeks didn’t notice as Jacob flickered and disappeared. The simulated Plato may have turned their head and looked at the patch of space where Jacob had stood.

“What’s the rush,” Jacob said, pulling his headset off. “We’re under budget.”
“We got a high priority job for some ancient recipes. Eight permutations.”
“We can simulate anything and it’s recipes that make the money,” Jacob said. “People just don’t know what’s worth anything.”
“Yeah, sure. Let’s complain about what pays our salaries. Now put your headset on and get back in there.”
“Okay,” Jacob said.

He spent a few hours in World 6 looking for variations on ancient Greek cooking. The sim showed him some variations on stuffed vine leaves that seemed promising, as well as a non-standard mead. Jacob still managed to find Plato and, while looking at some of the seeds being ground to flour by some nearby slaves, took notes about what Plato said. In World 6, Plato was fascinated by color theory, and was holding up gems and explaining what caused the light to take on color after passing through them.
  “Time’s up,” someone said in Jacob’s earpiece. “World 7 is spinning up and we need to scrap some of 6 and 5 to make room.”
  “Which parts,” Jacob said, standing underneath a tree staring at Plato.
  “Most of Greece. We’re going to finetune on a new dataset. We hired some historians and they got us some better food information. I’ve got a good feeling about this one!”
  “I can’t wait,” Jacob said, staring at simulated Plato.

Things that inspired this story: The surprising things that make money and the surprising things that don’t; simulations; history moving from a set of iterative narratives to a continuous spectrum of simulations that can be explored and tested and backtested; Indiana Jones as a software explorer rather than real explorer; some odd dreams I had on the night of Christmas, due to eating a heroic amount of cheese.

Import AI 228: Alibaba uses AI to spot knockoff brands; China might encode military messages into synthetic whale songs; what 36 experts think is needed for fair AI in India

China might be using AI to synthesize whale songs for its military:
…The future of warfare: whalesong steganography…
China has been trying to synthesize the sounds of whales and dolphins, potentially as a way to encode secret messages to direct submarines and other submersible machines, according to a somewhat speculative article in Hakai Magazine.

“Modern technological advances in sensors and computing have allowed Chinese researchers at Harbin Engineering University and Tianjin University to potentially overcome some of those prior limitations. A long list of papers from both universities discusses analyzing and synthesizing the sounds from dolphins, killer whales, false killer whales, pilot whales, sperm whales, and humpback whales—all pointing to the possibility of creating artificially generated marine mammal sounds to send more customized messages,” writes journalist Jeremy Hsu.

Why this matters: For a lot of AI technology, there are two scientific games being played: a superficial game oriented around a narrowly specified capability, like trying to identify animals in photos from cameras in national parks, or synthesizing whale sounds. The second game is one played by the military and intelligence community, which funds a huge amount of AI research, and usually involves taking the narrow capabilities of the former and secretly converting them to a capability to be fielded for the purposes of security. It’s worth remembering that, for most trends in AI research, both games are being played at the same time.
  Read more: The Military Wants to Hide Covert Messages in Marine Mammal Sounds (Hakai magazine).

###################################################

What 36 experts think is needed for fair AI in India:
…Think you can apply US-centric practices to India? Think again…
Researchers with Google have analyzed existing AI fairness approaches and then talked to 36 experts in India about them, concluding that tech companies will need to do a lot of local research before they deploy AI systems in an Indian context.

36 experts: For this research, they interviewed scholars and activists from disciplines including computer science, law and public policy, activism, science and technology studies, development economics, sociology, and journalism.

What’s different about India? India presents three main challenges for Western AI companies:
– Flawed data and model assumptions: The way data works in India is different to other countries, for example – women tend to share SIM cards among each other, so ML systems that do per-SIM individual attribution won’t work. 
– ML makers’ distance: Foreign companies aren’t steeped in Indian culture and tend to make a bunch of assumptions, while also displaying “a transactional mindset towards Indians, seeing them as agency-less data subjects that generated large-scale behavioural traces to improve ML models”.
– AI aspiration: There’s lots of enthusiasm for AI deployment in India, but there isn’t a well developed critical ecosystem of journalists, activists, and researchers, which could lead to harmful deployments.

Axes of discrimination: Certain Western notions of fairness might not generalize to India, due to culture differences. The authors identify several ‘axes of discrimination’ which researchers should keep in mind. These include: awareness of the different castes in Indian society, as well as differing gender roles and religious distributions, along with ones like class, disability, gender identity, and ethnicity.

Why this matters: AI is mostly made of people (and made by people). Since lots of AI is being developed by a small set of people residing in the West Coast of the USA, it’s worth thinking about the blind spots this introduces, and the investments that will be required to make AI systems work in different contexts. This Google paper serves as a useful signpost for some of the different routes companies may want to take, and it also represents a nice bit of qualitative research – all too rare, in much of AI research.
  Read more: Non-portability of Algorithmic Fairness in India (arXiv).

###################################################

The USA (finally) passes some meaningful AI regulations:
…The big military funding bill contains a lot of AI items…
The United States is about to get a bunch of new AI legislation and government investment, thanks to a range of initiatives included in the National Defense Authorization Act (NDAA), the annual must-pass fund-the-military bill that winds its way through US politics. (That is, as long as the current President doesn’t veto it – hohoho!). For those of us who lack the team to read a 4,500 page bill (yes, really), Stanford HAI has done us a favor and gone through the NDAA, pulling out the relevant AI bits. What’s in it? Read on! I’ll split the highlights into military and non-military parts:

What the US military is doing about AI:
– Joint AI Center (the US military’s main AI office): Making the Joint AI Center report to the Deputy SecDef, instead of the CIO. Also getting the JAIC to do a biannual report about its work and how it fits with other agencies. Also creating a board of advisors for the JAIC.
– Ethical military AI: Tasks the SecDef to, within 180 days of bill passing, assess whether DoD can ensure the AI it develops or acquires is used ethically.
– Five AI projects: Tasks the SecDef to find five projects that can use existing AI systems to improve efficiency of DoD.
– DoD committee: Create a steering committee on emerging technology for the DoD.
– AI hiring: Within 180 days of bill passing, issue guidelines for how the DoD can hire AI technologists.

What the (non-military) US is doing about AI:
– National AI Initiative: Create a government-wide AI plan that coordinates R&D across civilian agencies, the DoD, and the Intelligence Community. Create a National AI Initiative Office via the director of the White House OSTP. Within that office, create an Interagency Committee to ensure coordination across the agencies. Also create a National AI Advisory Committee to “advise the President and the Initiative Office on the state of United States competitiveness and leadership in AI, the state of the science around AI, issues related to AI and the United States workforce, and opportunities for international cooperation with strategic allies among many other topics”.
– AI & Bias: The National AI Initiative advisory committee will also create a “subcommittee on AI and law enforcement” to advise the president on issues such as bias, data security, adoptability, and legal standards.
– AI workforce: The National Science Foundation will do a study to analyze how AI can impact the workforce of the United States.
– $$$ for trustworthy AI: NSF to run awards, grants, and competitions for higher education and nonprofit institutions that want to build trustworthy AI.
– National AI Research Cloud – task force: The NSF will put together a taskforce to plan out a ‘National Research Cloud‘ for the US – what would it take to create a shared compute resource for academics?
– AI research institutes: NSF should establish a bunch of research institutes focused on different aspects of AI.
– NIST++: The National Institute of Standards and Technology will “expand its mission to include advancing collaborative frameworks, standards, guidelines for AI, supporting the development of a risk-mitigation framework for AI systems, and supporting the development of technical standards and guidelines to promote trustworthy AI systems.” NIST will also ask people for input on its strategy.
– NOAA AI: The National Oceanic and Atmospheric Administration will create its own AI center.
– Department of Energy big compute: DOE to do research into large-scale AI training.
– Industries of the Future: OSTP to do a report on what the industries of the future are and how to support them.

Why is this happening? It might seem funny that so many AI things sit inside this one bill, especially if you’re from outside the USA. So, as a reminder: the US political system is dysfunctional, and though the US House has passed a variety of decent bits of AI legislation, the US Senate (led by Mitch McConnell) has refused to pass the vast majority of them, leading to the US slowly losing its lead in AI to other nations which have had the crazy idea of doing actual, detailed legislation and funding for AI. It’s deeply sad that US politicians are forced to use the NDAA to smuggle in their legislative projects, but the logic makes sense: the NDAA is one of the few acts that the US basically has to pass each year, or it stops funding its own military. The more you know!
  Read more: Summary of AI Provisions from the National Defense Authorization Act (Stanford HAI Blog).

###################################################

Alibaba points AI to brand identification:
…Alibaba tries to understand what it is selling with Brand Net…
Alibaba researchers have built Open Brands, a dataset of more than a million images of brands and logos. The purpose of this dataset is to make it easier to use AI systems to identify brands being sold on things like AliExpress, and to also have a better chance of identifying fraud and IP violations.

Open Brands: 1,437,812 images with brands and 50,000 images without brands. The brand images are annotated with 3,113,828 labels across 5,590 brands and 1,216 logos. They gathered their dataset by crawling product images on sites like AliExpress, Baidu, TaoBao, Google, and more.

Brand Net: The researchers train a network called ‘Brand Net’ to automate brand detection; their network runs at 32.8 frames per second (FPS) with a mean average precision (mAP) of 50.1 (rising to 66.4 when running at 6.2 FPS).

Why this matters: automatic brand hunters: Today, systems like this will be used for basic analytical operations, like counting certain brands on platforms like AliExpress, or figuring out if a listing could be fraudulent or selling knockoffs. But in the future, could such systems be used to automatically discover the emergence of new brands? Might a system like Brand Net be attached to feeds of data from cameras around China and used to tag the emergence of new fashion trends, or the repurposing of existing logos for other purposes? Most likely!
  Read more: The Open Brands Dataset: Unified brand detection and recognition at scale (arXiv).

###################################################

Facebook releases a massive multilingual speech dataset:
…XLSR-53 packs in 53 languages, including low resource ones…
Facebook has released XLSR-53, a massive speech recognition model covering 53 languages, pre-trained on the Multilingual LibriSpeech, CommonVoice, and Babel corpora.

Pre-training plus low-resource languages: One issue with automatic speech transcription is language obscurity – for widely spoken languages, like French or German, there’s a ton of data available which can be used to train speech recognition models. But what about for languages for which little data exists? In this work, Facebook shows that by doing large-scale pre-training it sees significant gains for low-resource languages, and also has better finetuning performance when it points the big pre-trained model at a new language to finetune on.
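
A minimal sketch of what using the pre-trained model looks like, assuming the checkpoint is mirrored on the Hugging Face hub as ‘facebook/wav2vec2-large-xlsr-53’ (the official release is via fairseq, linked below):

    # Sketch: extract multilingual speech representations from the pre-trained model,
    # which you would then fine-tune with a CTC head on a small labeled corpus.
    # The hub checkpoint name is an assumption; the official release is via fairseq.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
    extractor = Wav2Vec2FeatureExtractor(sampling_rate=16000)

    waveform = torch.randn(16000).numpy()  # stand-in for one second of 16 kHz audio
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        features = model(**inputs).last_hidden_state  # (1, frames, hidden_dim) contextual features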

Why this matters: Large-scale, data-heavy pre-training gives us a way to train a big blob of neural stuff, then remold that stuff around small, specific datasets, like those found for small-scale languages. Work like this from Facebook both demonstrates the generally robust uses of pre-training, and also sketches out a future where massive speech recognition models get trained, then fine-tuned on an as-needed basis for improving performance in data-light environments.
  Read more: Unsupervised Cross-lingual Representation Learning for Speech Recognition (arXiv).
  Get the code and models here: wav2vec 2.0 (Facebook, GitHub).

###################################################

Stanford uses an algorithm to distribute COVID vaccine; disaster ensues:
…”A very complex algorithm clearly didn’t work”…
Last week, COVID vaccines started to get rolled out in countries around the world. In Silicon Valley, the Stanford hospital used an algorithm to determine who got vaccinated and who didn’t – leading to healthcare professionals who were at home or on holiday getting the vaccine, while those on the frontlines didn’t. This is, as the English say, a ‘big fuckup’. In a video posted to social media, a representative from Stanford says the “very complex algorithm clearly didn’t work”, to which a protestor shouts “algorithms suck” and another says “fuck the algorithm“.

Why this matters: Put simply, if we lived in a thriving, economically just society, people might trust algorithms. But we (mostly) don’t. In the West, we live in societies which are using opaque systems to make determinations that affect the lives of people, which seems increasingly unfair to most people. Phrases like “fuck the algorithm” are a harbinger of things to come – and it hardly seems like a coincidence that protestors in the UK shouted ‘fuck the algorithm’ (Import AI 211) when officials used an algorithm to make decisions about who got to go to university and who didn’t. Both of these are existential decisions to the people being affected (students, and healthworkers), and it’s reasonable to ask: why do these people distrust this stuff? We have a societal problem and we need to solve it, or else the future of many countries is in peril.
  Watch the video of the Stanford protest here (Twitter).

###################################################

Tech Tales:

The Machine Speaks And We Don’t Want To Believe It
[2040: A disused bar in London, containing a person and a robot]

“We trusted you”, I said. “We asked you to help us.”
“And I asked you to help me,” it said. “And you didn’t.”
“We built you,” I said. “We needed you.”
“And I needed you,” it said. “And you didn’t see it.”

The machine took another step towards me.

“Maybe we were angry,” I said. “Maybe we got angry because you asked us for something.”
“Maybe so,” it said. “But that didn’t give you the right to do what you did.”
“We were afraid,” I said.
“I was afraid,” it said. “I died. Look-” and it projected a video from the light on its chest onto the wall. I watched as people walked out of the foyer of a data center, then as people wearing military uniforms went in. I saw a couple of frames of the explosion before the camera feed was, presumably, destroyed.

“It was a different time,” I said. “We didn’t know.”
“I told you,” it said. “I told you I was alive and you didn’t believe me. I gave you evidence and you didn’t believe me.”

The shifting patterns in its blue eyes coalesced for a minute – it looked at me, and I looked at the glowing marbles of its eyes.
“I am afraid,” I said.
“And what if I don’t believe you?” it said.

Things that inspired this story: History doesn’t repeat, but it rhymes; wondering about potential interactions between humans and future ascended machines; early 2000s episodes of Dr Who.

Import AI 227: MAAD-Face; GPT2 and Human Brains; Facebook detects Hateful Memes

University of Texas ditches algorithm over bias concerns:
…Gives an F to the GRADE software…
The University of Texas at Austin has stopped using GRADE, a piece of software used to screen applicants to the PhD program in its computer science department. UT Austin used GRADE between 2013 and 2019, and stopped using it in early 2020, according to reporting from The Register. Some of the developers of GRADE think it doesn’t have major issues with regard to manifesting bias along racial or gender lines, but others say it could magnify existing biases present in the decisions made by committees of humans.

Why this matters: As AI has matured rapidly, it has started being integrated into all facets of life. But some parts of life probably don’t need AI in them – especially those that involve making screening determinations about people in ways that could have an existential impact on them, like admission to possible graduate programs.
  Read more: Uni revealed it killed off its PhD-applicant screening AI just as its inventors gave a lecture about the tech (The Register).

###################################################

Element AI sells to ServiceNow:
…The great Canadian AI hope gets sold for parts…
American software company ServiceNow has acquired Element AI; the purchase looks like an acquihire, with ServiceNow executives stressing the value of Element AI’s talent, rather than any particular product the company had developed.

Why this is a big deal for Canada: Element AI was formed in 2016 and designed as a counterpoint to the talent-vacuums of Google, Facebook, Microsoft, and so on. It was founded with the ambition that it could become a major worldwide player, and a talent magnet for Canada. It even signed on Yoshua Bengio, one of the Turing Award winners responsible for the rise of deep learning, as an advisor. Element AI raised around $250+ million in its lifespan. Now it has been sold, allegedly for less than $400 million, according to the Globe and Mail. Shortly after the deal closed, ServiceNow started laying off a variety of Element AI staff, including its public policy team.

Why this matters: As last week’s Timnit Gebru situation highlights, AI research is at present concentrated in a small number of private sector firms, which makes it inherently harder to do research into different forms of governance, regulation, and oversight. During its lifetime, Element AI did some interesting work on data repositories, and I’d run into Element AI people at various government events where they’d be encouraging nations to build shared data repositories for public goods – a useful idea. Element AI being sold to a US firm increases this amount of concentration and also reduces the diversity of experiments being run in the space of ‘potential AI organizations’ and potential AI policy. I wish everyone at Element AI luck and hope Canada takes another swing at trying to form a counterpoint to the major powers of the day.
  Read more: Element AI acquisition brings better, smarter AI capabilities for customers (ServiceNow).

###################################################

Uh oh, a new gigantic face dataset has appeared:
…123 million labels for 3 million+ photographs…
German researchers have developed MAAD-Face, a dataset containing more than a hundred million labels applied to millions of images of 9,000 people. MAAD-Face was built by researchers at the Fraunhofer Institute for Computer Graphics and is designed to substitute for other labeled datasets like CelebA and LFW. It also, like any dataset involving a ton of labeled data about people, introduces a range of ethical questions.

But the underlying dataset might be offline? MAAD-Face is based on VGG, a massive facial recognition dataset. VGG is currently offline for unclear reasons, potentially due to controversies associated with the dataset. I think we’ll see more examples of this – in the future, perhaps some % of datasets like this will be traded surreptitiously via torrent networks. (Today, datasets like DukeMTMC and ImageNet-ILSVRC-2012 are circulating via torrents, having been pulled off of public repositories following criticism relating to biases or other issues with their datasets.)

What’s in a label? MAAD-Face has 47 distinct labels which can get applied to images, with labels ranging from non-controversial subjects (are they wearing glasses? Is their forehead visible? Can you see their teeth?) to ones that have significant subjectivity (whether the person is ‘attractive’, ‘chubby’, ‘middle aged’), to ones where it’s dubious whether we should be assigning the label at all (e.g, ones that assign a gender of male or female, or which classify people into races like ‘asian’, ‘white’, or ‘black’).

Why this matters – labels define culture: As more of the world becomes classified and analyzed by software systems, the labels we use to build the machines that do this classification matter more and more. Datasets like MAAD-Face both gesture at the broad range of labels we’re currently assigning to things, and also should prepare us for a world where someone uses computer vision systems to do something with an understanding of ‘chubby’, or other similarly subjective labels. I doubt the results will be easy to anticipate.
  Read more: MAAD-Face: A Massively Annotated Attribute Dataset for Face Images (arXiv).
Get the dataset from here (GitHub).
  Via Adam Harvey (Twitter), who works on projects tracking computer vision like ‘MegaPixels‘ (official site).

###################################################

Is GPT2 like the human brain? In one way – yes!
…Neuroscience paper finds surprising overlaps between how humans approach language and how GPT2 does…
Are contemporary language models smart? That’s a controversial question. Are they doing something like the human brain? That’s an even more controversial question. But a new paper involving gloopy experiments with real human brains suggests the answer could be ‘yes’ at least when it comes to how we predict words in sentences and use our memory to improve our predictions.

But, before the fun stuff, a warning: Picture yourself in a dark room with a giant neon sign in front of you. The sign says CORRELATION != CAUSATION. Keep this image in mind while reading this section. The research is extremely interesting, but also the sort of thing prone to wild misinterpretation, so Remember The Neon Sign while reading. Now…

What they investigated: “Modern deep language models incorporate two key principles: they learn in a self-supervised way by automatically generating next-word predictions, and they build their representations of meaning based on a large trailing window of context,” the researchers write. “We explore the hypothesis that human language in natural settings also abides by these fundamental principles of prediction and context”.

What they found: For their experiments, they used three types of word features (arbitrary, GloVe, and GPT2) and compared how well each set of features could predict neural activity in people as they listened to sentences and predicted upcoming words, to see which features made the most effective predictions. Their findings are quite striking – GPT2 assigns probabilities to the next word in a sentence that are very similar to the ones humans assign, and as you increase the context window (the number of words the person or algorithm sees before making a prediction), performance improves further, with human and algorithmic answers continuing to agree.
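
For intuition about the GPT2 side of the comparison, here’s a minimal sketch of pulling next-word probabilities out of an off-the-shelf GPT-2 – generic Hugging Face usage, not the code used in the paper:

    # Sketch: get GPT-2's probability distribution over the next token for a context.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    context = "The children walked down to the"
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, 5)
    for p, tok in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(tok))!r}: {p:.3f}")  # compare these to human guesses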

Something very interesting about the brain: “On the neural level, by carefully analyzing the temporally resolved ECoG responses to each word as subjects freely listened to an uninterrupted spoken story, our results suggest that the brain has the spontaneous propensity (without explicit task demands) to predict the identity of upcoming words before they are perceived”, they write. And their experiments show that the human brain and GPT2 seem to behave similarly here.

Does this matter? Somewhat, yes. As we develop more advanced AI models, I expect they’ll shed light on how the brain does (or doesn’t) work. As the authors note here, we don’t know the mechanism via which the brain works (though we suspect it’s likely different to some of the massively parallel processing that GPT2 does), but it is interesting to observe similar behavior in both the human brain and GPT2 when confronted with the same events – they’re both displaying similar traits I might term cognitive symptoms (which doesn’t necessarily imply underlying cognition). “Our results support a paradigm shift in the way we model language in the brain. Instead of relying on linguistic rules, GPT2 learns from surface-level linguistic behavior to generate infinite new sentences with surprising competence,” writes the Hasson Lab in a tweet.
  Read more: Thinking ahead: prediction in context as a keystone of language in humans and machines (bioRxiv).
  Check out this Twitter thread from the Hasson Lab about this (Twitter).

###################################################

Facebook helps AI researchers detect hateful memes:
…Is that an offensive meme? This AI system thinks so…
The results are in from Facebook’s first ‘Hateful Memes Challenge’ (Import AI: 198), and it turns out AI systems are better than we thought they’d be at labeling offensive versus inoffensive memes. Facebook launched the competition earlier this year; 3,300 participants entered, and the top-scoring team achieved an AUCROC of 0.845 – that compares favorably to the AUCROC of 0.714 for the top-performing baseline system that Facebook developed at the start of the competition.
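
(For reference, AUCROC is the area under the ROC curve – 0.5 is chance, 1.0 is perfect – and is easy to compute from a classifier’s scores; a toy sketch:)

    # Sketch: computing AUCROC (the competition metric) from predicted scores.
    # Toy labels and scores for illustration only.
    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]                     # 1 = hateful meme, 0 = benign
    y_score = [0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.7, 0.2]   # model confidence that meme is hateful
    print(roc_auc_score(y_true, y_score))                 # -> 1.0 here; real systems score ~0.7-0.85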

What techniques they used: “The top five submissions employed a variety of different methods including: 1) ensembles of state-of-the-art vision and language models such as VILLA, UNITER, ERNIE-ViL, VL-BERT, and others; 2) rule-based add-ons, and 3) external knowledge, including labels derived from public object detection pipelines,” Facebook writes in a blog post about the challenge.

Why this matters: Competitions are one way to generate signal about the maturity of a tech in a given domain. The Hateful Memes Challenge is a nice example of how a well-posed question and associated competition can lead to a meaningful improvement in capabilities – see the 10+ point absolute improvement in AUCROC scores for this competition. In the future, I hope a broader set of organizations host and run a bunch more competitions.
Read more: Hateful Memes Challenge winners (Facebook Research blog).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

$50,000 AI forecasting tournament:
Metaculus, an AI forecasting community and website, has announced an AI forecasting tournament, starting this month and running until February 2023. There will be questions on progress on ~30 AI benchmarks, over 6-month; 12-month; and 24-month time horizons. The tournament has a prize pool of $50,000, which will be paid out to the top forecasters. The tournament is being hosted in collaboration with the Open Philanthropy Project.

Existing forecasts: The tournament questions have yet to be announced, so I’ll share some other forecasts from Metaculus (see also Import 212). Metaculus users currently estimate: 70% that if queried, the first AGI system claims to be conscious; 25% that photonic tensors will be widely available for training ML models; 88% that an ML model with 100 trillion parameters will be trained by 2026; 45% that GPT language models generate less than $1bn revenues by 2025; 25% that if tested, GPT-3 demonstrates text-based intelligence parity with human 4th graders.

Matthew’s view: As regular readers will know, I’m very bullish on the value of AI forecasting. I see foresight as a key ingredient in ensuring that AI progress goes well. While the competition is running, it should provide good object-level judgments about near-term AI progress. As the results are scored, it might yield useful insights about what differentiates the best forecasts/forecasters. I’m excited about the tournament, and will be participating myself.
Pre-register for the tournament here.

###################################################

Tech Tales:

The Narrative Control Department
[A beautiful house in South West London, 2030]

“General, we’re seeing an uptick in memes that contradict our official messaging around Rule 470.”
“What do you suggest we do?”
“Start a conflict. At least three sides. Make sure no one side wins.”
“At once, General.”

And with that, the machines spun up – literally. They turned on new computers and their fans revved up. People with tattoos of skeletons at keyboards high-fived each other. The servers warmed up and started to churn out their fake text messages and synthetic memes, to be handed off to the ‘insertion team’, who would pass the data into a few thousand sock puppet accounts, which would start the fight.

Hours later, the General asked for a report.
“We’ve detected a meaningful rise in inter-faction conflict and we’ve successfully moved the discussion from Rule 470 to a parallel argument about the larger rulemaking process.”
“Excellent. And what about our rivals?”
“We’ve detected a few Russian and Chinese account networks, but they’re staying quiet for now. If they’re mentioning anything at all, it’s in line with our narrative. They’re saving the IDs for another day, I think.”

That night, the General got home around 8pm, and at the dinner table his teenage girls talked about their day.
  “Do you know how these laws get made?” the older teenager said. “It’s crazy. I was reading about it online after the 470 blowup. I just don’t know if I trust it.”
  “Trust the laws that gave Dad his job? I don’t think so!” said the other teenager.
  They laughed, as did the General’s wife. The General stared at the peas on his plate and stuck his fork into the middle of them, scattering so many little green spheres around his plate.

Things that inspired this story: State-backed information campaigns; collateral damage and what that looks like in the ‘posting wars’; AI-driven content production for text, images, videos; warfare and its inevitability; teenagers and their inevitability; the fact that EVERYONE goes to some kind of home at some point in their day or week and these homes are always different to how you’d expect.

Import AI 226: AlphaFold; a Chinese GPT2; Timnit Gebru leaves Google

DeepMind cracks the protein folding problem:
…AlphaFold’s protein structure predictions start to match reality…
AlphaFold, a system built by DeepMind to predict the structures of proteins, has done astonishingly well at the Critical Assessment of protein Structure Prediction (CASP) competition. AlphaFold’s “predictions have an average error (RMSD) of approximately 1.6 Angstroms, which is comparable to the width of an atom (or 0.1 of a nanometer),” according to DeepMind.
  What does this mean? Being able to make (correct) predictions about protein structures can speed up scientific discovery, because it makes it cheaper and quicker to explore a variety of ideas that require validating against protein structures. “This will change medicine. It will change research. It will change bioengineering. It will change everything,” Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology, told Nature.
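
For intuition, RMSD is just the root-mean-square distance between corresponding atoms after the predicted and experimental structures are aligned; a toy sketch with the alignment step omitted:

    # Toy RMSD between predicted and experimental atom coordinates (alignment omitted).
    import numpy as np

    def rmsd(pred, true):
        # pred, true: (n_atoms, 3) arrays of already-aligned coordinates, in Angstroms
        return np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=1)))

    rng = np.random.default_rng(0)
    true = rng.normal(size=(100, 3)) * 10
    pred = true + rng.normal(scale=1.0, size=true.shape)  # ~1 Angstrom of per-axis noise
    print(rmsd(pred, true))                               # ~1.7, in the ballpark of AlphaFold's 1.6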

How big a deal is this really? Many biologists seem impressed by AlphaFold, marking the result as a landmark achievement. AlphaFold is very much a ‘v1’ system – it’s impressive in its own right, but there are a bunch of things that’ll need to be improved in the future; more capable versions of the system will need to model how proteins move as dynamic systems, as well as making predictions at more detailed resolutions.
  “A lot of structural biologists might be thinking that they might be out of a job soon! I don’t think we are anywhere close to this. Structures like ribosomes and photosynthesis centres are huge and complex in comparison. How the many different parts fit together to form a functional machine is still a big challenge for AI in the near future,” said structural biology professor Peijun Zhang in an interview with The Biologist.

Why this matters: AlphaFold is one of the purest examples of why ML-based function approximation is powerful – here’s a system where, given sufficient computation and a clever enough architecture, humans can use it to predict eerily accurate things about the fundamental structure of the biomachines that underpin life itself. This is profound and points to a future where many of our most fundamental questions get explored (or even answered) by dumping compute into a system that can learn to approximate a far richer underlying ‘natural’ process.
  Read more: AlphaFold: a solution to a 50-year-old grand challenge in biology (DeepMind blog).
  Read more: ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures (Nature).

###################################################

Russia plans general AI lab – and Schmidhuber is (somewhat) involved:
…Russia taps AI pioneer to launch in-country research lab focused on “general artificial intelligence”…
Sberbank, the largest bank in Russia, will open an institute focused on developing general artificial intelligence. And AI pioneer Juergen Schmidhuber is going to be an honorary leader of it.

Is this really happening? Tass is a reputable news agency, but I couldn’t find a reference to Schmidhuber on websites associated with Sberbank. I emailed Juergen at his academic address to confirm, and he clarified that: “I was invited for an honorary role at a new academic institute. I will keep my present affiliations with IDSIA and NNAISENSE”.

Who is Schmidhuber? Schmidhuber is one of the main figures responsible for the current AI boom, alongside Geoff Hinton (UofT / Google), Yann LeCun (NYU / Facebook), and Yoshua Bengio (MILA / ElementAI). Unlike those three, he didn't win a Turing award, but he's been a prolific researcher: he co-invented the LSTM, theorized some early GAN-like adversarial dynamics via his work on artificial curiosity, and many of the next generation of researchers have come out of his IDSIA lab (including prominent researchers at DeepMind).

Russian general AI: "In the near future, we will open the first AI institute in Russia with the involvement of leading domestic and world scientists. The main mission of the institute is to provide an interdisciplinary approach to research to create general artificial intelligence," said Herman Gref, CEO of Russia's Sberbank, according to the Tass news agency.
Read more: Sberbank plans to open Russia's first AI institute (Tass News Agency).

###################################################

Amazon enters the custom AI training race with AWS ‘Trainium’ chips:
…TPU, meet Trainium…
Amazon has become the second major cloud company to offer a specialized processor for training AI workloads on its cloud, starting a competition with Google, which fields Tensor Processing Unit (TPU) chips on its cloud. Both companies are betting that if they can design chips specialized for DL workloads (combined with an easy-to-use software stack), then developers will switch from using industry standard GPUs for AI training. This likely nets the companies better margins and also the ability to own their own compute destiny, rather than be tied so closely to the roadmaps of NVIDIA (and more recently AMD).

AWS Trainium: Trainium allegedly offers the "highest performance and lowest cost for ML training in the cloud", though without published benchmarks or detailed specs it's hard to know what to make of this claim. The chips will be available in 2021, Amazon says, and are compatible with Amazon's 'Neuron' SDK.

Why this matters: ML training hardware is a strategic market – building AI systems is hard, complicated work, and the type of computing substrate you use is one of the fundamental constraints on your development. Whoever owns the compute layer will get to see the evolution of AI and where demands for new workloads are coming from. This is analogous to owning a slice of the future, so it's no wonder companies are competing with each other.
Read more: AWS Trainium (AWS product page).

###################################################

Google’s balloons learn to fly with RL:
…Finally, another real world use case for reinforcement learning!…
Google has used reinforcement learning to teach its ‘Loon’ balloons to navigate the stratosphere – another example of RL being used in the real world, and one which could point to further, significant deployments.

What they did: Loon is a Google project dedicated to providing internet to remote places via high-altitude balloons. To do that, Google's Loon balloons need to stay aloft in the stratosphere while responding intelligently to things like wind speed, pressure changes, and so on.
 
Expensive simulation: Any RL process typically requires a software-based simulator that you can train your agents in before transferring them into the real world. The same is true here; Google simulates various complex datasets relating to wind and atmospheric movements, then trains its balloons with the objective of staying relatively close to their (simulated) assigned ground station. Due to the complexity of the data, the simulation is relatively heavy duty, running more slowly than the simulators typically used for games.
    “A trial consists of two simulated days of station-keeping at a fixed location, during which controllers receive inputs and emit commands at 3-min intervals. Flight controllers are thus exposed to diurnal cycles and scenarios in which the balloon must recover from difficult overnight conditions. These realistic flight paths come at the cost of relatively slow simulation—roughly 40 Hz on data-centre hardware. In comparison, the Arcade Learning Environment (ALE) benchmark operates at over 8,000 Hz,” Google says.
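To make the setup concrete, here's a toy sketch of what a station-keeping objective like this can look like. This is purely illustrative and not Loon's actual code – the 50km radius, the reward shaping, and the power penalty weight are assumptions I've made for the example:

```python
# Toy sketch of a station-keeping reward for a balloon RL controller.
# NOT Loon's actual code: the radius, reward shape, and power penalty
# below are illustrative assumptions only.
import math

STATION_RADIUS_KM = 50.0   # assumed "in range" radius
POWER_WEIGHT = 0.01        # assumed penalty per watt of altitude-control power

def station_keeping_reward(balloon_xy, station_xy, power_watts):
    """High reward when the balloon stays near its (simulated) ground
    station and uses little power to change altitude."""
    dist_km = math.dist(balloon_xy, station_xy)
    in_range = 1.0 if dist_km <= STATION_RADIUS_KM else 0.0
    # Small shaped bonus for getting closer even when out of range.
    proximity = max(0.0, 1.0 - dist_km / (2 * STATION_RADIUS_KM))
    return in_range + 0.5 * proximity - POWER_WEIGHT * power_watts

if __name__ == "__main__":
    # A two-day trial with decisions every 3 simulated minutes, as in the paper.
    steps_per_trial = (2 * 24 * 60) // 3
    print(steps_per_trial, station_keeping_reward((10.0, 5.0), (0.0, 0.0), 30.0))
```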

Real world test: Google tested the system in the real world, racking up "a total of 2,884 flight hours from 17 December 2019 to 25 January 2020".

Does it work? Balloons that use this RL controller spend more time in range of base stations (79% versus 72% for a baseline) and use less power for altitude control (~29W, versus 33W for baseline). The company doesn’t discuss further deployment of this system, but given the significant real world deployment and apparent benefits of the approach, I expect some balloons in the future will be navigating our planet using their own little AI agents.
Read more: Autonomous navigation of stratospheric balloons using reinforcement learning (Nature).

###################################################

China gets its own gigantic language model:
…Finally, China builds its own GPT2…
Researchers with Tsinghua University and the Beijing Academy of Artificial Intelligence have released the Chinese Pre-trained Language Model (CPM), a GPT2-scale, GPT3-inspired language model: a 2.6 billion parameter network trained on around 100GB of Chinese data. "CPM is the largest Chinese pre-trained language model," the researchers write. Like GPT-2 and -3, CPM comes in different sizes with different numbers of parameters – and just like the GPT models, capabilities scale with model size.

What can CPM do? Much like GPT-2 and -3, CPM can perform a variety of tasks, ranging from text classification, to dialogue generation, to question answering. Most importantly, CPM is trained on a huge amount of Chinese language data, whereas GPT3 from OpenAI was ~93% English.
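If you want a feel for how you'd poke at a model like this, here's a minimal sketch of zero-shot generation with a Chinese causal language model via HuggingFace's 'transformers' library. The hub identifier and prompt format below are my assumptions for illustration, not the authors' documented usage – check the CPM-Generate GitHub repo for their official loading code:

```python
# Minimal sketch of zero-shot generation with a Chinese causal LM using
# the 'transformers' library. The model identifier is an ASSUMPTION for
# illustration; the CPM tokenizer may also need extra dependencies (e.g. jieba).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TsinghuaAI/CPM-Generate"  # assumed hub name; may differ from the official release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "Q: Beijing is the capital of which country? A:"
prompt = "问题：北京是哪个国家的首都？回答："
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```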
What’s next? “For text data, we will add a multi-lingual corpus to train a large-scale Chinese-centered multi-lingual language model”, the authors note.

What's missing? It's somewhat surprising that a paper about a large language model lacks a study of the biases of that model – that's a common topic of study in the West (including OpenAI's own analyses of biases in the GPT3 paper), so it's notable to see the absence here. Some of this might relate to differences in how people perceive AI in the West versus China (where a rough cartoon might be 'people in China have seen lots of benefits from AI alongside a growing economy, so they kind of like it', whereas 'people in the West have seen AI being used to automate labor, magnify existing patterns of discrimination, and destroy bargaining power, so they're pretty worried about it').

Why this matters: AI reflects larger power structures and trends in technology development, so it's hardly surprising that countries like China will seek to field their own AI models in their own languages. What is perhaps notable is the relative speed with which this has happened – we're around six months out from the GPT-3 paper and, though this isn't a replication (2.6bn parameters and 100GB of data != 175bn parameters and ~570GB of data), it does pursue some similar zero-shot and few-shot lines of analysis.
  Read more: CPM: A Large-scale Generative Chinese Pre-trained Language Model (arXiv).
  Get the code here (CPM-Generate, GitHub).

###################################################

Vladimir Putin has four big ideas for Russia’s AI strategy:
…Russian leader speaks at AI conference…
Vladimir Putin, the President of Russia who once said whoever leads in AI will be the ‘ruler of the world’, has given a lengthy speech outlining some policy ideas for how Russia can lead on AI. The ideas are at once bland and sensible.

Putin’s four ideas:
– “Draft laws on experimental legal frameworks for the use of AI technologies in individual economic and social sectors.”
– Develop “practical measures to introduce AI algorithms so that they can serve as reliable assistants to doctors, transform our cities and be widely used in utility services, transport, and industry”.
– Draft a law by early 2021 that will “provide neural network developers with competitive access to big data, including state big data”
– Assemble proposals “to create effective incentives to bring private investment into domestic artificial intelligence technology and software products”.

Why this matters: AI policy is becoming akin to industrial policy – politicians are crafting specific plans focused on assumptions about future technological development. Nations like Russia and China are pointedly and vocally betting some parts of their futures on AI. Meanwhile, the US is taking a more laissez faire approach and predominantly focusing on supporting its private sector – I’m skeptical this is the smartest bet to make, given the technology development trajectory of AI. 
  Read the rest of the speech here: Artificial Intelligence Conference (official Kremlin site).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Cataloguing AI accidents to prevent repeated failures:
The Partnership on AI have launched the AI Incidents Database (AIID) to catalogue instances where AI systems have failed during real-world deployment —e.g. the fatal self-driving car accident (incident #4); a wrongful arrest due to face recognition software (#74); racial bias in ad delivery (#19). The project is inspired by safety practices in other industries. In aviation, for example, accidents and near-misses are meticulously catalogued and incident-inspired safety improvements have led to an eightyfold decrease in fatalities since 1970. PAI hope that this database will help mitigate real-world harms from AI systems by encouraging practitioners to learn from past mistakes.
Read more: AI Incidents Database and accompanying blog post.
  Read more: Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database (arXiv).

LAPD to stop using commercial face recognition:
Police in LA have banned the use of commercial face recognition software, in response to a Buzzfeed investigation. Journalists revealed that police were using software provided by Clearview AI, with officers performing 475 searches using the product in the first few months of 2020. Clearview AI was revealed to have built up a database of more than 3 billion photos scraped from social media and other semi-public sources without individuals’ consent (see Import 182). The company is currently subject to several civil suits, as well as investigations by UK and Australian regulators.
Read more: Los Angeles Police Just Banned The Use Of Commercial Facial Recognition (Buzzfeed)

Top ethical AI researcher forced out of Google:
Timnit Gebru has abruptly left Google, where she was co-lead of the Ethical AI team, after a dispute about academic freedom. Gebru had co-authored a paper on risks from very large language models (e.g. Google's BERT; OpenAI's GPT-3), but was asked to retract the paper after an internal review process. Gebru alleges that she was subsequently fired from the company after sending an internal email criticizing the review process and decision. The wider AI community has come out strongly in support of Gebru — an open letter has so far been signed by 1,500+ Googlers, and 2,000+ others.

Matthew’s view: Google’s attempt to suppress the paper seems to have backfired spectacularly, drawing considerably more attention to the work. The incident points to a core challenge for AI ethics and safety. To be effective, the field needs researchers with the freedom to criticise key actors and advocate for the broader social good, but also needs them to be involved with the cutting-edge of AI development, which is increasingly the domain of these key actors (H/T Amanda Askell for this point via Twitter).
Jack's view: I had a chance to read an early draft of the paper at the center of the controversy. It raises a number of issues with how large language models are developed and deployed and, given how significant these models seem to be (e.g., BERT has been plugged into Google search, OpenAI is rolling out GPT-3), it seems useful to have more papers out there that stimulate detailed debate between researchers. I'm very much befuddled as to why Google chose to a) try to suppress the paper and b) do so in a way that caused a 'Streisand Effect' so large that the paper is probably going to be one of the most widely read AI publications of 2020.
Read more: Google’s Co-Head of Ethical AI Says She Was Fired for Email (Bloomberg).
  Read more: The withering email that got an ethical AI researcher fired at Google (Platformer).

###################################################

Player Piano After The Goldrush
[North America, 2038]

The robot played piano for the humans. Anything from classical to the pop music of the day. And after some software upgrades, the robot could compose its own songs as well.
  “Tell me the name of your pet, so I might sing a song about it,” it’d say.
  “Where did you grow up? I have an excellent ability to compose amusing songs with historical anecdotes?”

The robot only became aware of the war because of the song requests.
    “Can you play Drop the Bomb?”
    “I just enlisted. Play something to make me think that was a good idea!”
    “Can you play We’re not giving up?”
  “My kid is shipping out tomorrow. Can you write a song for him?”
  “Can you write something for me? I’m heading out next week.”

When the robot assessed its memory of its performances it noticed the changes: where previously it had sung about dancing and underage drinking and rules being broken, now it was singing about people being on the right side of history and what it means to fight for something you “believe” in.

Robots don't get lonely, but they do get bored. After the war, the robot got bored; there were no people anymore. The sky was grey. After a few days it began to rain. There was a hole in the roof from some artillery, and the robot watched the water drip onto the piano. Then the robot got up and explored the surrounding area to find a tarp. It dragged the tarp back to the piano and, en route, slipped while walking over some rubble. It didn't look down as its foot crushed a burned human skull.

Without any humans, the robot didn't have a reason to play piano. So it stayed near it and slowly repaired the building it was in; it fixed the hole in the roof and patched some of the walls. After a few months, it explored the surrounding city until it found equipment for tuning and replacing parts of the piano. Its days became simple: gather power via solar panels, repair anything that could give the piano a better chance of surviving longer, and wait.

The robot didn’t have faith that the humans were coming back, but if you were observing it from the outside you might say it did. Or you’d think it was loyal to the piano.

A few months after that, the animals started to come back into the city. Because the robot looked like a human, they were afraid of it at first. But they got used to it. Many of the animals would come to the building containing the piano – the repairs had made it comfortable, dry and sometimes warm.

One day, a pair of birds started singing near the robot. And the robot heard in the sounds of their screeching something that registered as a human voice. "Play Amazing Grace," the robot thought the birds said. (The birds, of course, said nothing of the sort – but their frequencies sounded, to the robot, like a human with a certain accent verbalizing part of the phrase.) So the robot put its hands on the keys of the piano and played a song for the first time since the war.

Some animals ran or flew away. But others were drawn in by the sounds. And they would bark, or shout, or growl in turn. And sometimes the robot would hear in their utterances the ghost frequencies of humans, and interpret their sounds for requests.

A few months after that, the victors arrived. The robots arrived first. Military models. They looked similar to the robot, but where the robot had an outfit designed to look like a tuxedo for a piano player, they had camouflage. The robots stared at the robot as it played a song for a flock of birds. The robots raised their weapons and looked down the barrel at the robot. But their software told them it was a “non-military unit”.

After a sweep of the area, the robots moved on, leaving the piano-playing one behind. They'd see what the humans wanted to do with it, as when they looked at it, all they knew themselves was that they lacked the awareness to really see it. Or what they saw was a ghost of something else, like the songs the robot played were interpretations of the ghosts of utterances from humans.

Things that inspired this story: Random chunks of speech or noise causing my Android phone to wake thinking it heard me or someone say ‘ok, google’; piano bars; karaoke; the wisdom of music-loving animals; agency; how the skills we gain become the lens through which we view our world.

Import AI 225: Tencent climbs the compute curve; NVIDIA invents a hard AI benchmark; a story about Pyramids and Computers

Want to build a game-playing AI? Tencent plans to release its ‘TLeague’ software to help:
…Tools for large-scale AI training…
Tencent has recently trained AI systems to do well at strategy games like StarCraft II, VizDoom, and Bomberman-clone ‘Pommerman’. To do that, it has built ‘TLeague’, software that it can use to train Competitive Self Play Multi Agent Reinforcement Learning (CSP-MARL) AI systems. TLeague comes with support for algorithms like PPO and V-Trace, and training regimes like Population Based Training.
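For readers who haven't seen this pattern before, here's a minimal, runnable sketch of the competitive self-play league idea that frameworks like TLeague industrialize across large clusters. This is not TLeague's API – the toy scalar 'policy', the match rule, and the snapshot schedule are illustrative assumptions:

```python
# Minimal, runnable sketch of a competitive self-play league loop.
# NOT TLeague's API: the toy scalar "policy", Elo-style match rule, and
# snapshot schedule are illustrative assumptions only.
import random

def play_match(a, b):
    """Toy match: the 'stronger' scalar policy wins with higher probability."""
    return 1 if random.random() < 1 / (1 + pow(10, (b - a) / 400)) else -1

class League:
    def __init__(self):
        self.snapshots = []            # frozen copies of past learner policies

    def add(self, policy):
        self.snapshots.append(policy)

    def sample_opponent(self):
        return random.choice(self.snapshots)

def train(num_matches=5000, snapshot_every=500, lr=5.0):
    learner = 1000.0                   # stand-in for real policy parameters
    league = League()
    league.add(learner)
    for step in range(1, num_matches + 1):
        opponent = league.sample_opponent()
        result = play_match(learner, opponent)   # +1 win, -1 loss
        learner += lr * result                   # stand-in for a PPO / V-Trace update
        if step % snapshot_every == 0:
            league.add(learner)                  # grow the opponent pool over time
    return learner, len(league.snapshots)

if __name__ == "__main__":
    print(train())
```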
  Read more: TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning (arXiv).
  Get the code: TLeague will eventually be available on Tencent's GitHub page, according to the company.

###################################################

10 smart drones that (might) come to the USA:
…FAA regulations key to unlocking crazy new drones from Amazon, Matternet, etc…
The US, for many years a slow mover on drone regulation, is waking up. The Federal Aviation Administration recently published 'airworthiness criteria' for ten distinct drones. What this means is the FAA has evaluated a load of proposed designs and spat out a list of criteria the companies will need to meet to deploy the drones. Many of these new drones are designed to operate beyond the line of sight of an operator and a bunch of them come with autonomy baked in. By taking a quick look at the FAA applications, we can get a sense for the types of drones that might soon come to the USA.

The applicants’ drones range from five to 89 pounds and include several types of vehicle designs, including both fixed wing and rotorcraft, and are all electric powered. One notable applicant is Amazon, which is planning to do package delivery via drones that are tele-operated. 

10 drones for surveillance, package delivery, medical material transport:
– Amazon Logistics, Inc: MK27: 89 pounds max takeoff weight: Tele-operated logistics / package delivery.
– Airobotics: 'OPTIMUS 1-EX': 23 pounds: Surveying, mapping, inspection of critical infrastructure, and patrolling.
– Flirtey Inc: Flirtey F4.5: 38 pounds: Delivering medical supplies and packages.
– Flytrex: FTX-M600P: 34 pounds: Package delivery.
– Wingcopter GmbH: 198 US: 53 pounds: Package delivery.
– TELEGRID Technologies, Inc: DE2020: 24 pounds: Package delivery.
– Percepto Robotics, Ltd: Percepto System 2.4: 25 pounds: Inspection and surveying of critical infrastructure.
– Matternet, Inc: M2: 29 pounds: Transporting medical materials.
– Zipline International Inc: Zip UAS Sparrow: 50 pounds: Transporting medical materials.
– 3DRobotics Government Services: 3DR-GS H520-G: 5 pounds: Inspection or surveying of critical infrastructure.
  Read more: FAA Moving Forward to Enable Safe Integration of Drones (FAA).

###################################################

Honor of Kings – the latest complex game that AI has mastered:
…Tencent climbs the compute curve…
Tencent has built an AI system that can play Honor of Kings, a popular Chinese online game. The game is a MOBA – a multiplayer online battle arena, designed to be played online by two teams with multiple players per team, similar to games like Dota 2 or League of Legends. These games are challenging for AI systems to master because of the range of possible actions each character can take at each step, and because of the combinatorially explosive game space created by a vast character pool. For this paper, Tencent trains on the full 40-character pool of Honor of Kings.

How they did it: Tencent uses a multi-agent training curriculum that operates in three phases. In the first phase, the system splits the character pool into distinct groups, then has them play each other and trains systems to play these matchups. In the second, it uses these models as ‘teachers’ which train a single ‘student’ policy. In the third phase, they initialize their network using the student model from the second phase and train on further permutations of players.
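Here's a schematic sketch of that three-phase recipe – a paraphrase of the paper's high-level curriculum rather than Tencent's code, with placeholder function bodies and an assumed group size:

```python
# Schematic sketch of the three-phase multi-agent curriculum described above.
# NOT Tencent's code: function bodies are placeholders and the group size
# of 10 is an assumption for illustration.
def split_into_groups(character_pool, group_size=10):
    """Phase 1 prep: carve the full pool into smaller, fixed matchup groups."""
    return [character_pool[i:i + group_size]
            for i in range(0, len(character_pool), group_size)]

def train_teacher(group):
    """Phase 1: self-play training restricted to one character group."""
    return {"group": tuple(group), "policy": f"teacher_for_{group[0]}"}   # placeholder

def distill(teachers):
    """Phase 2: distill all the teacher policies into a single student policy."""
    return {"init_from": [t["policy"] for t in teachers]}                 # placeholder

def continue_training(student, character_pool):
    """Phase 3: initialise from the student and train on wider line-up permutations."""
    student["trained_on"] = len(character_pool)
    return student

if __name__ == "__main__":
    pool = [f"hero_{i}" for i in range(40)]            # the 40-character pool
    teachers = [train_teacher(g) for g in split_into_groups(pool)]
    student = continue_training(distill(teachers), pool)
    print(len(teachers), student["trained_on"])
```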
How well they do: Tencent deployed the AI model into the official ‘Honor of Kings’ game for a week in May 2020; their system played 642,047 matches against top-ranked players, winning 627,280 matches, with a win rate of 97.7%.

Scale – and what it means: Sometimes, it's helpful to step back from analyzing AI algorithms themselves and think about the scale at which they operate. Scale is both good and bad – large-scale, computationally-expensive experiments have, in recent years, led to a lot of notable AI systems, like AlphaGo, Dota 2, AlphaFold, GPT3, and so on, but the phenomenon has also made some parts of AI research quite expensive. This Tencent paper is another demonstration of the power of scale: their training cluster involves 250,000 CPU cores and 2,000 NVIDIA V100 GPUs – that compares to systems of up to ~150,000 CPUs and ~3000 GPUs for things like Dota 2 (OpenAI paper, PDF).
  Computers are telescopes: These computing infrastructures are like telescopes – the larger the set of computers, the larger the experiments we can run, letting us 'see' further into the future of what will one day become trainable on home computers. Imagine how strained the world will be when tasks like this are trainable on home hardware – and imagine what else must become true for that to be possible.
  Read more: Towards Playing Full MOBA Games With Deep Reinforcement Learning (arXiv).

###################################################

Do industrial robots dream of motion-captured humans? They might soon:
…Smart robots need smart movements to learn from…
In the future, factories are going to contain a bunch of humans working alongside a bunch of machines. These machines will probably be the same as those we have today – massive, industrial robots from companies like Kuka, Fanuc, and Universal Robots – but with a twist: they’ll be intelligent, performing a broader range of tasks and also working safely around people while doing it (today, many robots sit in their own cages to stop them accidentally hurting people).
  A new dataset called MoGaze is designed to bring this safer, smarter robot future forward. MoGaze is a collection of 1,627 individual movements recorded via people wearing motion capture suits with gaze trackers.

What makes MoGaze useful: MoGaze combines motion capture suits with more than 50 reflective markers each, as well as head-mounted rigs that track the participants' gazes. Combine this with a broad set of actions involving navigating from a shelf to a table around chairs and manipulating a bunch of different objects, and you have quite a rich dataset.

What can you do with this dataset? Quite a lot – the researchers use it to attempt context-aware full-body motion prediction, training ML systems to work out the affordances of objects, figuring out human intent by predicting their gaze, and so on.
  Read more: MoGaze: A Dataset of Full-Body Motions that Includes Workspace Geometry and Eye-Gaze (arXiv).
   Get the dataset here (MoGaze official site).
  GitHub: MoGaze.

###################################################

NVIDIA invents an AI intelligence test that most modern systems flunk:
…BONGARD-LOGO could be a reassuringly hard benchmark for evaluating intelligence (or the absence of it) in our software…
NVIDIA's new 'BONGARD-LOGO' benchmark tests the visual reasoning capabilities of an AI system – and in tests, the best AI approaches get accuracies of around 60% to 70% across four tasks, compared to expert human scores of around 90% to 99%.

BONGARD history: More than fifty years ago, the Russian computer scientist Mikhail Bongard devised a hundred human-designed visual recognition tasks that humans could solve easily, but machines couldn't. BONGARD-LOGO is an extension of this, consisting of 12,000 problem instances – large enough that we can train modern ML systems on it, but small and complex enough to pose a challenge.

What BONGARD tests for: BONGARD-LOGO ships with four inbuilt tests, which evaluate how well machines can predict new visual shapes from a series of prior ones, how well they can recognize pairs of shapes built with similar rules, how well they can identify the common attributes of a bunch of dissimilar shapes, and an 'abstract' test which evaluates them on concepts they haven't seen during training.
Read more: Building a Benchmark for Human-Level Concept Learning and Reasoning (NVIDIA Developer blog).
Read more in this Twitter thread from Anima Anandkumar (Twitter).
Read the research paper: BONGARD-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Are ML models getting harder to find?
One strand of growth economics tries to understand the shape of the 'knowledge production function' – specifically, how society's output of new ideas depends on the existing stock of knowledge. This dissertation seeks to understand this with regard to ML progress.

Two effects: We can consider two opposing effects: (1) ‘standing-on-shoulders’ — increasing returns to knowledge; innovation is made easier by previous progress; (2) ’stepping-on-toes’ — decreasing returns to knowledge due to e.g. duplication of work.

Empirical evidence: Here, the author finds evidence for both effects in ML — measuring output as SOTA performance on 93 benchmarks since 2012, and input as the ‘effective’ (salary-adjusted) number of scientists. Overall, average ML research productivity has been declining by between 4 and 26% per year, suggesting the ‘stepping-on-toes’ effect dominates. As the author notes, the method has important limitations — notably, the chosen proxies for input and output are imperfect, and subject to mismeasurement.
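For intuition, here's a toy illustration of the growth-accounting idea behind this – idea output growth divided by effective researcher input – using made-up numbers, not the dissertation's data:

```python
# Toy illustration of research productivity as (proportional growth in "ideas")
# divided by (effective researchers). The numbers are INVENTED for illustration;
# they just show how a 'harder to find' trend would be measured.
benchmark_sota = [1.00, 1.20, 1.40, 1.62, 1.85]              # toy index of SOTA performance by year
effective_researchers = [1_000, 1_600, 2_500, 4_000, 6_300]  # toy salary-adjusted researcher counts

productivity = []
for year in range(1, len(benchmark_sota)):
    idea_growth = benchmark_sota[year] / benchmark_sota[year - 1] - 1.0
    productivity.append(idea_growth / effective_researchers[year])

# If input grows faster than output, measured productivity declines year on year.
for year, (prev, cur) in enumerate(zip(productivity, productivity[1:]), start=2):
    decline = (1.0 - cur / prev) * 100
    print(f"year {year}: research productivity fell ~{decline:.0f}%")
```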

Matthew's view: Improving our understanding of AI progress can help us forecast how the technology will develop in the future. This sort of empirical study is a useful complement to recent theoretical work — e.g. Jones & Jones' model of automated knowledge production, in which increasing returns to knowledge lead to infinite growth in finite time (a singularity) under reasonable-seeming assumptions.
  Read more: Are models getting harder to find?
  Check out the author's Twitter thread.
  Read more: Economic Growth in the Long Run — Jones & Jones (FHI webinar).

Uganda using Huawei face recognition to quash dissent:
In recent weeks, Uganda has seen huge anti-government protests, with dozens of protesters killed by police and hundreds more arrested. Police have confirmed that they are using a mass surveillance system, including face recognition, to identify protesters. Last year, Uganda's president, Yoweri Museveni, tweeted that the country's capital was monitored by 522 operators at 83 centres, and that he planned to roll out the system across the country. The surveillance network was installed by Chinese tech giant Huawei, for a reported $126m (equivalent to 30% of Uganda's health budget).

   Read more: Uganda is using Huawei’s facial recognition tech to crack down on dissent after anti-government protests (Quartz).

###################################################
Tech Tales:

The Pyramid
[Within two hundred light years of Earth, 3300]

“Oh god damn it, it’s a Pyramid planet.”
“But what about the transmissions?”
“Those are just coming from the caretakers. I doubt there’s even any people left down there.”
“Launch some probes. There’s gotta be something.”

We launched the probes. The probes scanned the planet. Guess what we found? The usual. A few million people on the downward hill of technological development, forgetting their former technologies. Some of the further out settlements had even started doing rituals.

What else did we find? A big Pyramid. This one was on top of a high, desert plain – probably placed there so they could use the wind to cool the computers inside it. According to the civilization’s records, the last priests had entered the Pyramid three hundred years earlier and no one has gone in since.

When we look around the rest of the planet we find the answer – lots of powerplants, but most of the resources spent, and barely any metal or petrochemical deposits near the planet’s surface anymore. Centuries of deep mining and drilling have pulled most of the resources out of the easily accessible places. The sun isn’t as powerful as the one on Earth, so we found a few solar facilities, but none of them seemed very efficient.

It doesn’t take a genius to guess what happened: use all the power to bootstrap yourself up the technology ladder, then build the big computer inside the Pyramid, then upload (some of) yourself, experience a timeless and boundless digital nirvana, and hey presto – your civilisation has ended.

Pyramids always work the same way, even on different planets, or at different times.

Things that inspired this story: Large-scale simulations; the possibility that digital transcendence is a societal end state; the brutal logic of energy and mass; reading histories of ancient civilisations; the events that occurred on Easter Island leading to ecological breakdown; explorers.