Import AI

Import AI 248: Google’s megascale speech rec system; Dynabench aims to solve NLP benchmarking worries; Australian government increases AI funding

Google makes a better speech recognition model – and scale is the key:
…Multilingual models match monolingual models, given sufficient scale…
Google has figured out how to surmount a longstanding challenge in training machine learning systems to understand multiple languages. In the past, monolingual models have typically outperformed multilingual ones, because when you train a model on a whole bunch of languages you can sometimes improve performance on the small-data languages while degrading performance on the large-data ones. No more! In a new study, Google shows that if you just train a large enough network on a large enough amount of data, you can match the performance of a monolingual model while building something that does well on multiple languages at once.

The data: Google trains its model on languages ranging from English to Hindi to Chinese, with data amounts ranging from ~55,000 hours of speech down to ~7,700 hours per language, representing ~364,900 hours of speech in total across all languages. (To put it in perspective, it’s rare to get this much data – Spotify’s notably vast English-only podcast dataset clocks in at around 50,000 hours (Import AI 242), and a recent financial news dataset from Kensho weighs in at 5,000 hours (Import AI 244).)

Why large models are better: When Google trained a range of models on this dataset, it found that “larger models are not only more data efficient, but also more efficient in terms of training cost as measured in TPU days – the 1B-param model reaches the same accuracy at 34% of training time as the 500M-param model”. Google uses its ‘GShard’ infrastructure to train models with 220M, 370M, 500M, 1B, and 10B parameters.
There’s more work to do, though: “We do see on some languages the multilingual model is still lagging behind. Empirical evidence suggests it is a data balancing problem, which will be investigated in future.”
  Read more: Scaling End-to-End Models for Large-Scale Multilingual ASR (arXiv).

###################################################

Job opportunity! International policy lead for the UK’s CDEI:
Like AI policy? Want to make it better? Apply here…
The Centre for Data Ethics and Innovation, a UK government agency focused on AI and data-intensive technologies, is hiring an International Policy Lead. “This is a unique cross-organisational role focused on producing high-quality international AI and data policy advice for CDEI leadership and teams, as well as connecting the CDEI’s experts with international partners, experts, and institutions.”
  Find out more and apply here (CDEI site).

###################################################

Australian government increases AI funding:
…All ‘$’ in this section refer to the Australian dollar…
The Australian government is investing a little over $124 million ($96m USD) into AI initiatives over the next four to six years, as part of the country’s 2021-2022 Federal Budget.

Four things for Australian AI: 

  • $53m for a “National Artificial Intelligence Centre” which will itself create four “Digital Capability Centres” that will help Australian small and medium-sized businesses get connected to AI organizations and get advice on how to use AI.
  • $33.7 million to subsidize Australian businesses partnering with the government on pilot AI projects.
  • $24.7 million for the “Next Generation AI Graduates Program” to help it attract and train AI specialists. 
  • $12 million to be distributed across 36 grants to fund the creation of AI systems “that address local or regional problems” (Note: This is… not actually much money at all if you factor in things like data costs, compute costs, staff, etc.)

Why this matters: AI is becoming fundamental to the future technology strategy of most governments – it’s nice to see some outlay here. However, it’s notable to me how relatively small these amounts are, when you consider the size of Australia’s population (~25 million) and the fact these grants pay out over multiple years, and the increasing cost of large-scale AI research projects.
  Read more: Australia’s Digital Economy (Australian Government).

###################################################

Instead of building AIs to command, we should build AIs to cooperate with:
…A new Cooperative AI Foundation aims to encourage research into building more collaborative machines…
A group of researchers think we need to build cooperative AI systems to get the greatest benefits from the nascent technology. That’s the gist of an op-ed in Nature by scientists with the University of Oxford, DeepMind, the University of Toronto, and Microsoft. The op-ed is accompanied by the establishment of a new Cooperative AI Foundation, which has an initial grant of $15m USD.

Why build cooperative AI? “AI needs social understanding and cooperative intelligence to integrate well into society”, the researchers write. “Cooperative intelligence is unlikely to emerge as a by-product of research on other kinds of AI. We need more work on cooperative games and complex social spaces, on understanding norms and behaviours, and on social tools and infrastructure that promote cooperation.”

Three types of cooperation: There’s room for research into systems that lead to better AI-AI collaboration, systems that improve AI-human cooperation, and tools that can help humans cooperate with each other better.

Why this matters: Most of today’s AI research involves building systems that we delegate tasks to, rather than actively cooperate with. If we change this paradigm, I think we’ll build smarter systems and also have a better chance of developing ways for humans to learn from the actions of AI systems.
  Read more: Cooperative AI: machines must learn to find common ground (Nature).
  Find out more about the Cooperative AI Foundation at the official website.

###################################################

Giant team of scientists tries to solve NLP’s benchmark problem:
…Text-processing AI systems are blowing up benchmarks as fast as they are being built. What now?…
A large, multi-org team of researchers has built Dynabench, software meant to support a new way to test and build text-processing AI systems. Dynabench exists because in recent years NLP systems have started to saturate most of the benchmarks available to them – SQuAD was quickly superseded by SQuAD 2.0, GLUE was superseded by SuperGLUE, and so on. At the same time, we know that these benchmark-smashing systems (e.g, BERT, GPT2/3, T5) contain significant weaknesses which we aren’t able to test for today, the authors note.

Dynabench – the dynamic benchmark: Enter Dynabench. Dynabench is a tool to “evaluate models and collect data dynamically, with humans and models in the loop rather than the traditional static way”. The system makes it possible for people to run models on a platform where if, for example, a model performs very poorly on one task, humans may then generate data for these areas, which is then fed back into the model, which then runs through the benchmark again. “The data collected through this process can be used to evaluate state-of-the-art models, and to train even stronger ones, hopefully creating a virtuous cycle that helps drive progress in the field,” they say.
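To make the loop concrete, here’s a toy Python sketch of the human-and-model-in-the-loop cycle described above – the function names are hypothetical placeholders, not the actual Dynabench API:

```python
# Toy sketch of a dynamic benchmarking cycle. All functions here are
# hypothetical placeholders standing in for the real Dynabench platform.

def collect_adversarial_examples(model, n_examples):
    """Stand-in for humans writing examples that fool the current model."""
    return [{"text": f"tricky example {i}", "label": 0} for i in range(n_examples)]

def train(model, dataset):
    """Stand-in for fine-tuning the model on the collected examples."""
    return model

def evaluate(model, benchmark):
    """Stand-in for scoring the model on the newest, hardest examples."""
    return 0.0

model, training_pool = "baseline-model", []
for round_id in range(3):
    # 1) Humans probe the current model and find its weak spots...
    hard_examples = collect_adversarial_examples(model, n_examples=100)
    # 2) ...those examples become the new benchmark for the current model...
    print(round_id, evaluate(model, hard_examples))
    # 3) ...and get folded into the training pool to build a stronger model.
    training_pool += hard_examples
    model = train(model, training_pool)
```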

What you can use Dynabench for today: Today, Dynabench is designed around four core NLP tasks – testing out how well AI systems can perform natural language inference, how well they can answer questions, how they analyze sentiment, and how well they can detect hate speech.
…and tomorrow: In the future, the researchers want to shift Dynabench from being English-only to being multilingual. They also want to carry out live model evaluation – “We would be able to capture not only accuracy, for example, but also usage of computational resources, inference time, fairness, and many other relevant dimensions.”
  Read more: Dynabench: Rethinking Benchmarking in NLP (arXiv).
  Find out more about Dynabench at its official website.

###################################################

Google gets a surprisingly strong computer vision result using surprisingly simple tools:
…You thought convolutions mattered? You are like a baby. Now read this…
Google has demonstrated that you can get results on a computer vision task similar to those of systems that use convolutional neural networks, while instead using multi-layer perceptrons (MLPs) – far simpler AI components.

What they did: Google has developed a computer vision classifier called MLP-Mixer (‘Mixer’ for short), a “competitive but conceptually and technically simple alternative” to contemporary systems that use convolutions or self-attention (e.g, transformers). Of course, this has some costs – Mixer costs dramatically more in terms of compute than the things it is competing with. But it also highlights how, given sufficient data and compute, a lot of the architectural innovations in AI can get washed away simply by scaling up dumb components to mind-bendingly large scales.
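For the curious, here’s a minimal PyTorch sketch of the core idea – a block that alternates an MLP applied across image patches (“token mixing”) with an MLP applied across channels (“channel mixing”). The hidden sizes here are illustrative assumptions, not the paper’s exact configuration:

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One Mixer-style block: an MLP across patches, then an MLP across channels."""
    def __init__(self, num_patches, channels, token_hidden=256, channel_hidden=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, channels))

    def forward(self, x):                        # x: (batch, num_patches, channels)
        y = self.norm1(x).transpose(1, 2)        # mix information across patches
        x = x + self.token_mlp(y).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))  # mix information across channels
        return x

x = torch.randn(8, 196, 512)                     # e.g. 14x14 image patches, 512 channels
print(MixerBlock(num_patches=196, channels=512)(x).shape)  # torch.Size([8, 196, 512])
```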

Why this matters: “We believe these results open many questions,” they write. “On the practical side, it may be useful to study the features learned by the model and identify the main differences (if any) from those learned by CNNs and Transformers. On the theoretical side, we would like to understand the inductive biases hidden in these various features and eventually their role in generalization. Most of all, we hope that our results spark further research, beyond the realms of established models based on convolutions and self-attention.”
  Read more: MLP-Mixer: An all-MLP Architecture for Vision (arXiv).

###################################################

What do superintelligent-AI-risk skeptics think, and why?
…Worried about the people who aren’t worried about superintelligence? Read this…
Last week, we wrote about four fallacies leading to a false sense of optimism about progress in AI research (Import AI 247). This week, we’re looking at the opposite issue – why some people are worried that others are insufficiently worried about the possibility of superintelligence. That’s the gist of a new paper from Roman Yampolskiy, a longtime AI safety researcher at the University of Louisville.

Why be skeptical? Some of the reasons to be skeptical of AI risks include: general AI is far away, there’s no obvious path from here to there, even if we built it and it was dangerous we could turn it off, superintelligence will be in some sense benevolent, AI regulation will deal with the problems, and so on.

Countermeasures to skepticism: So, how are AI researchers meant to rebut skeptics? One approach is to build more consensus among scientists around what constitutes superintelligence; another is to educate people about the technical priors that inform superintelligence-wary researchers; you can also appeal to authorities like Bill Gates and Elon Musk, who have talked about the issue; and, perhaps most importantly, “do not reference science-fiction stories”.

Why this matters: The superintelligence risk debate is really, really messy, and it’s fundamentally bound up with the contemporary political economy of AI (where many of the people who worry about superintelligence are also the people with access to resources and who are least vulnerable to the failures of today’s AI systems). That means talking about this stuff is hard, prone to ridicule, and delicate. Doesn’t mean we shouldn’t try, though!
Read more: AI Risk Skepticism (arXiv).

###################################################

Tech Tales:
[Department of Prior World Analysis, 2035]
During a recent sweep of REDACTED we discovered a cache of papers in a fire-damaged building (which records indicate was used as a library during the transition era). Below we publish in full the text from one undamaged page. For access to the full archive of 4,332 full pages and 124,000 partial scraps, please contact your local Prior World Analysis administrator.

On the ethics of mind-no-body transfer across robot morphologies
Heynrick Schlatz, Lancaster-Harbridge University
Published to arXiv, July, 2028.

Abstract:
In recent years, the use of hierarchical models for combined planning and movement has revolutionized robotics. In this paper we investigate the effects of transferring either a planning or a movement policy – but not both – from one robot morphology to another. Our results show that planning capabilities are predominantly invariant to movement policy transfer due to few-shot calibration, but planning policy transfer can lead to pathological instability.

Paper:
Hierarchical models for planning and movement have recently enabled the deployment of economically useful, reliable, and productive robots. Typical policies see both movement and planning systems trained in decoupled simulation environments, only periodically being trained jointly. This has created flexible policies that show better generalization and greater computational efficiency than policies trained jointly, or systems where planning and movement is distilled into the same single policy.

In this work, we investigate the effects of transferring either movement or planning policies from one robot platform to another. We find that movement policies can typically be transferred with a negligible performance degradation – even on platforms where they have more than twice the number of actuators to the originating platform. We find the same is not true for planning policies. In fact, planning policies demonstrate a significant degradation after being transferred from one platform to another.

Feature activation analysis indicates that planning policies suffer degradation to long-term planning, self-actualization, and generalization capabilities as a consequence of such transfers. We hypothesize the effect is analogous to what has been seen recently in biology – motor control policies can be transferred or fine-tuned from one individual to another, while attempts to transfer higher-order mental functions have proved unsuccessful and in some cases led to loss of life or mental function.

In figure 1., we illustrate transfer of a planning policy from robot morphology a) – a four-legged two-arm Toyota ground platform – to robot morphology b) a two-legged six-arm Mitsubishi construction platform. Table 1., reports performance of the movement policy under transfer. Table 2., reports performance of the planning policy. We find significant degradation of performance when conducting planning transfer. Analysis of feature activations shows within-distribution activations when deployed on originating platform a); upon transfer to platform b) we immediately see activation patterns shift to features previously identified to correlate to ‘confusion’, ‘dysmorphia’, and circuit activations linked to self-modelling, world-modelling, and memory retracement.

We recorded the planning policy transfer via four video cameras placed at corners of the demonstration room. As stills from video analysis in figure 2., show, we see the planning policy transfer leads to physical instability in the robot platform – it can be seen attempting to scale walls of demonstration room, then repeatedly moves with force into the wall. The robot was deactivated following attempts to use four of its six arms to remove one of its other arms. Features activated on the robot during this time showed high readings for dysphoria as well as a spike in ‘confusion’ activation which we have not replicated since, due to ethical concerns raised by the experiment.

Things that inspired this story: Reading thousands of research papers over the years for Import AI and thinking about what research from other timelines or worlds might look like; playing around with different forms for writing online; thinking about hierarchical RL which was big a few years ago but then went quiet and wondering if we’re due for an update; playing around with notions of different timelines and plans.

Import AI 247: China makes its own GPT3; the AI hackers have arrived; four fallacies in AI research.

Finally, China trains its own GPT3:
…Now the world has two (public) generative models, reflecting two different cultures…
A team of Chinese researchers have created ‘PanGu’, a large-scale pre-trained language model with around ~200 billion parameters, making it equivalent to GPT3 (175 billion parameters) in terms of parameter complexity. PanGu is trained on 1.1TB of Chinese text (versus 570GB of text for GPT-3), though in the paper they train the 200B model for a lot less time (on way fewer tokens) than OpenAI did for GPT-3. PanGu is the second GPT-3-esque model to come out of China, following the Chinese Pre-trained Language Model (CPM, Import AI 226), which was trained on 100GB of text and was only a few billion parameters, compared to a couple of hundred!

Is it good? Much like GPT-3, PanGu does extraordinarily well on a range of challenging, Chinese-language benchmarks for tasks as varied as text classification, keyword recognition, common sense reasoning, and more.

Things that make you go hmmmm – chiplomacy edition: In this issue’s example of chiplomacy, it’s notable that the researchers trained this on processors from Huawei, specifically the company’s “Ascend” processors. They use the ‘MindSpore’ framework (also developed by Huawei).
  Read more: PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation (arXiv).

###################################################

The AI hackers are here. What next?
…Security expert Bruce Schneier weighs in…
Bruce Schneier has published a lengthy report with the Belfer Center about ‘the coming AI hackers’. It serves as a high-level introduction to the various ways AI can be misused, abused, and wielded for negative purposes. What might be most notable about this publication is its discussion of raw power – who has it, who doesn’t, and how this interplays with hacking: “Hacking largely reinforces existing power structures, and AIs will further reinforce that dynamic”, he writes.
  Read more: The Coming AI Hackers (Belfer Center website).

###################################################

What does it take to build an anti-COVID social distancing detector?
…Indian research paper shows us how easy this has become…
Here’s a straightforward paper from Indian researchers about how to use various bits of AI software to build something that can surveil people, understand if they’re too close to each other, and provide warnings – all in the service of encouraging social distancing. India, for those not tuning into global COVID news, is currently facing a deepening crisis, so this may be of utility to some readers.

What it takes to build a straightforward AI system: Building a system like this basically requires an input video feed, an ability to parse the contents of it and isolate people, then a way to work out whether the people are too close to each other or not. What does it take to do this? For people detection, they use YOLOv3, a tried-and-tested object detector, with a Darknet-53 network pre-trained on the MS-COCO dataset as a backbone. They then use an automated camera calibration technique (though note you can do this manually with OpenCV) to estimate spaces in the video feed, which they can then use to perform distance estimation. “To achieve ease of deployment and maintenance, the different components of our application are decoupled into independent modules which communicate among each other via message queues,” they write.
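Here’s a minimal sketch (not the authors’ code) of that kind of pipeline: YOLOv3 person detection via OpenCV’s DNN module, followed by a naive pixel-distance check between detected people. The file names and the distance threshold are illustrative assumptions:

```python
# Sketch: detect people with a pre-trained YOLOv3 model, then flag pairs of
# people who appear too close together in pixel space. "yolov3.cfg" and
# "yolov3.weights" are placeholder file names; a real system would also map
# pixels to metres via camera calibration, which is skipped here.
import itertools
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_people(frame, conf_threshold=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    centers = []
    for output in net.forward(layer_names):
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if class_id == 0 and scores[class_id] > conf_threshold:  # class 0 = 'person' in MS-COCO
                centers.append((int(det[0] * w), int(det[1] * h)))   # box centre in pixels
    return centers

def too_close(centers, min_pixels=100):
    # Flag every pair of detected people closer than the (assumed) minimum distance.
    return [(a, b) for a, b in itertools.combinations(centers, 2)
            if np.hypot(a[0] - b[0], a[1] - b[1]) < min_pixels]
```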
  In a similar vein, back in January some American researchers published a how-to guide (Import AI 231) for using AI to detect whether people are wearing anti-COVID masks on construction sites, and Indian company Skylark Labs said in May 2020 that it was using drones to observe crowds for social distancing violations (Import AI 196).

A word about ethics: Since this is a surveillance application, it has some ethical issues – the authors note they’ve built this system so it doesn’t need to store data, which may help deal with any specific privacy concerns, and it also automatically blurs the faces of the people that it does see, providing privacy during deployment.
  Read more: Computer Vision-based Social Distancing Surveillance Solution with Optional Automated Camera Calibration for Large Scale Deployment (arXiv).

###################################################

Want some earnings call data? Here’s 40 hours of it:
…Training machines to listen to earnings calls…
Researchers with audio transcription company Rev.com and Johns Hopkins University have released Earnings-21, a dataset of 39 hours and 15 minutes of transcribed speech from 44 earnings calls. The individual recordings range from 17 minutes to an hour and 34 minutes. This data will help researchers develop their own automatic speech recognition systems – but to put the size of the dataset in perspective, Kensho recently released a dataset of 5,000 hours of earnings call speech (Import AI 244). On the other hand, you need to register to download the Kensho data, while you can pull this ~40 hour lump directly from GitHub, which might be preferable.
  Read more: Earnings-21: A Practical Benchmark for ASR in the Wild (arXiv).
  Get the data here (rev.com, GitHub).

###################################################

Want to test out your AI lawyer? You might need CaseHOLD:
…Existing legal datasets might be too small and simple to measure progress…
Stanford University researchers have built a new multiple choice legal dataset, so they can better understand how well existing NLP systems can deal with legal questions.
  One of the motivations to build the dataset has come from a peculiar aspect of NLP performance in the legal domain – specifically, techniques we’d expect to work don’t work that well: “One of the emerging puzzles for law has been that while general pretraining (on the Google Books and Wikipedia corpus) boosts performance on a range of legal tasks, there do not appear to be any meaningful gains from domain-specific pretraining (domain pretraining) using a corpus of law,” they write.

What’s in the data? CaseHOLD contains 53,000+ multiple choice questions; each pairs a prompt from a judicial decision with several potential holdings, only one of which is the correct holding that could be cited. You can use CaseHOLD to test how well a model grasps this aspect of the law by seeing which of the multiple choice answers it selects as most likely.
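As a sketch of how evaluation on this kind of multiple-choice task works, here’s a toy example – the word-overlap scorer is a stand-in for whatever model you’d actually use (e.g. a legal-domain language model), not the paper’s method, and the field names are assumptions:

```python
# Hypothetical evaluation sketch for a CaseHOLD-style multiple choice task.

def score(prompt: str, holding: str) -> float:
    # Placeholder scorer: crude word overlap. A real system would use, e.g.,
    # a pretrained language model's likelihood or a fine-tuned classifier.
    return len(set(prompt.lower().split()) & set(holding.lower().split()))

def predict(example: dict) -> int:
    scores = [score(example["prompt"], h) for h in example["holdings"]]
    return max(range(len(scores)), key=scores.__getitem__)   # index of best-scoring holding

def accuracy(examples: list) -> float:
    return sum(predict(ex) == ex["label"] for ex in examples) / len(examples)

example = {
    "prompt": "The court held that the contract was void because ...",
    "holdings": ["holding that the contract was void",
                 "holding that the defendant lacked standing"],
    "label": 0,
}
print(accuracy([example]))   # 1.0 for this toy example
```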
  Read more: When Does Pretraining Help? Assessing Self-Supervised Learning for Law and The CaseHOLD Dataset of 53,000+ Legal Holdings (Stanford RegLab, blog).
  Read more: When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset (arXiv).
  Get the data here: CaseHOLD (GitHub).

###################################################

AI research has four fallacies – we should be aware of them:
…Making explicit some of the implicit assumptions or beliefs among researchers…
Imagine that the field of AI research is a house party – right now, the punch bowls are full of alcohol, people are excitedly showing each other what tricks they can do, and there’s a general sense of joie de vivre and optimism (though these feelings aren’t shared by the people outside the party who experience its effects, nor by the authorities who are dispatching some policy-police cars to go and check the party doesn’t get out of hand). Put simply: the house party is a real rager!
    But what if the punchbowl were to run out – and what would make it run out? That’s the idea in a research paper from Melanie Mitchell, a researcher at the Santa Fe Institute, where she argues that our current optimism could lead us to delude ourselves about the future trajectory of AI development, and that this delusion stems from four fallacies (as Mitchell terms them) that researchers fall into when thinking about AI.

Four fallacies: Mitchell identifies four ways in which contemporary researchers could be deluding themselves about AI progress. These fallacies include:
– Believing narrow intelligence is on a continuum with general intelligence: Researchers assume that progress in one part of the field of AI must necessarily lead to future, general progress. This isn’t always the case.
– Easy things are easy and hard things are hard: Some parts of AI are counterintuitively difficult and we might not be using the right language to discuss these challenges. “AI is harder than we think, because we are largely unconscious of the complexity of our own thought processes,” Mitchell writes.
– The lure of wishful mnemonics: Our own language that we use to describe AI might limit or circumscribe our thinking – when we say a system has a ‘goal’ we imbue that system with implicit agency that it may lack; similarly, saying a system ‘understands’ something connotes a more sophisticated mental process than what is probably occurring. “Such shorthand can be misleading to the public,” Mitchell says.
– Intelligence is all in the brain: Since cognition is embodied, might current AI systems have some fundamental flaws? This feels, from my perspective, like the weakest point Mitchell makes, as one can achieve embodiment by loading an agent into a reinforcement learning environment and provide it with actuators and a self-discoverable ‘surface area’, and this can be achieved in a digital form. On the other hand, it’s certainly true that being embodied yields the manifestation of different types of intelligence.

Some pushback: Here’s some discussion of the paper by Richard Ngo, which I found helpful for capturing some potential criticisms.
  Read more: Why AI is Harder Than We Think (arXiv).

###################################################

Tech Tales

Just Talk To Me In The Real
[2035: Someone sits in a bar and tells a story about an old partner. The bar is an old fashioned ‘talkeasy’ where people spend their time in the real and don’t use augments].

“Turn off the predictions for a second and talk to me in the real,” she said. We hadn’t even been on our second date! I’d never met someone who broke PP (Prediction Protocol) so quickly. But she was crazy like that.

Maybe I’m crazy too, because I did it. We talked in the real, both naked. No helpful tips for things to say to each other to move the conversation forward. No augments. It didn’t even feel awkward because whenever I said something stupid or off color she’d laugh and say “that’s why we’re doing this, I want to know what you’re really like!”.

We got together pretty much immediately. When we slept together she made me turn off the auto-filters. “Look at me in the real”, she said. I did. It was weird to see someone with blemishes. Like looking at myself in the mirror before I turn the augments on. Or how people looked in old pornography. I didn’t like it, but I liked her, and that was a reason to do it.

The funny thing is that I kept the habit even after she died. Oh, sure, on the day I got the news I turned all my augments on, including the emotional regulator. But I turned it off pretty quickly – I forget, but it was a couple of days or so. Not the two weeks that the PP mandates. So I cried a bunch and felt pretty sad, but I was feeling something, and just the act of feeling felt good.

I even kept my stuff off for the funeral. I did that speech in the real and people thought I was crazy because of how much pain it caused me. And as I was giving the speech I wanted to get everyone else to turn all their augments off and join me naked in the real, but I didn’t know how to ask. I just hoped that people might choose to let themselves feel something different to what is mandated. I just wanted people to remember why the real was so bitter and pure it caused us to build things to escape it.

Things that inspired this story: Prediction engines; how technology tends to get introduced as a layer to mediate the connections between people.

Import AI 246: Generating data via game engines; the FTC weighs in on AI fairness; Waymo releases a massive self-driving car dataset.

Use this dataset to get a Siri-clone to hear you better:
…Timers and Such is a specific dataset for a common problem…
Whenever you yell at an automatic speech recognition system like Google or Siri and it doesn’t hear you, that can be frustrating. One area where I’ve personally encountered this is when I yell various numbers at my (digital) assistant, which sometimes struggles to hear my accent. Now, research from McGill University, Mila, the University of Montreal, the University of Paul Sabatier, and Avignon University aims to make this easier with Timers and Such, a dataset of utterances people have spoken to their smart devices.

What’s it for? The dataset is designed to help AI systems understand people when they describe four basic things: setting a timer, setting an alarm, doing simple mathematics (e.g, some hacky recipe math), and unit conversion (e.g, using a recipe from the US in Europe, or vice versa).

What does the dataset contain? The dataset contains ~2,200 spoken audio commands from 95 speakers, representing 2.5 hours of continuous audio. This is augmented by a larger dataset consisting of ~170 hours of synthetically generated audio.

Why this matters: Google, Amazon, Microsoft, and other tech companies have vast amounts of the sorts of data in ‘Timers and Such’. Having open, public datasets will make it easier for researchers to develop their own assistants, and provides a useful additional dataset to test modern ASR systems against.
  Read more: Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers (arXiv).
  Get the code here (SLU recipes for Timers and Such v1.0, GitHub).

###################################################

Want to regulate AI? Send ideas to this conference:
…EPIC solicits papers…
EPIC, the Electronic Privacy Information Center, a DC-based research thinktank, is hosting a conference later this year about AI regulation – and it wants people to submit regulatory ideas to it. The deadline is June 1 2021 and successful proposals will be presented at a September 21 symposium. “Submissions can include academic papers, model legislation with explanatory memoranda, and more”, EPIC says.
  Find out more here (EPIC site).

###################################################

European Commission tries to regulate AI:
…AI industrialization begets AI regulation…
The European Commission has proposed a wide-ranging regulatory approach to AI and, much like the European Commission’s prior work on consumer privacy via GDPR, it’s likely that these regulations will become a template that other countries use to regulate AI.

High-risk systems: The most significant aspect of the regulation is the decision to treat “high-risk” AI systems differently to other ones. This introduces a regime where we’ll need to figure out how to classify and define AI systems into different categories and then, once systems count as high-risk, be able to assess and measure their behavior once deployed. High-risk systems will need to use ‘high quality’ datasets that ‘minimise risks and discriminatory outcomes’, will need to be accompanied by detailed documentation, have human oversight during operation, and meet a few other requirements. All of these things are difficult to do today and it’s not clear we even know how to do some of them – what does an appropriately fair and balanced dataset look like in practice, for example?

Why this matters: This legislation introduces a chicken and egg problem – our ability to accurately measure the capabilities of AI systems for various policy traits is underdeveloped today, but to conform to European legislation, companies will need to be able to do this. Therefore, this legislation might create more of an incentive for companies, academia, and governments to invest in this area. The multi-billion dollar question is who gets to define the ways we measure and assess AI systems for risk – whoever does this gets to define some of the most sensitive deployments of AI.
  Read more: Europe fit for the Digital Age: Commission proposes new rules and actions for excellence and trust in Artificial Intelligence (European Commission press site).
  Read more: Proposal for a Regulation on a European approach for Artificial Intelligence (European Commission site).

###################################################

US regulator says AI companies need to ensure their datasets are fair:
…The Federal Trade Commission talks about how it might approach AI regulation…
The FTC has published a blog post about how companies should develop their AI systems so as not to fall afoul of (rarely enforced) FTC rules. The title gives away the FTC view: Aiming for truth, fairness, and equity in your company’s use of AI.

How does the FTC think companies should approach AI development? The FTC places a huge emphasis on fairness, which means it cares a lot about data. Therefore, the FTC writes in its blog post that “if a data set is missing information from particular populations, using that data to build an AI model may yield results that are unfair or inequitable to legally protected groups”. It says AI developers should test their algorithms to ensure they don’t discriminate on the basis of race, gender, or other protected classes (this will be a problem – more on that later). Other good practices include documenting how data was gathered, ensuring deployed models don’t cause “more harm than good”, and ensuring developers are honest about model capabilities.
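In practice, the simplest version of the test the FTC is gesturing at is a disaggregated audit – compute your model’s error rate separately for each protected group and compare. Here’s a toy sketch; the field names and the example records are illustrative assumptions, not anything the FTC specifies:

```python
# Illustrative per-group audit: compare a classifier's error rates across a
# protected attribute. The "group"/"label"/"pred" field names are assumptions.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of dicts like {"group": "A", "label": 1, "pred": 0}."""
    errors, counts = defaultdict(int), defaultdict(int)
    for r in records:
        counts[r["group"]] += 1
        errors[r["group"]] += int(r["pred"] != r["label"])
    return {g: errors[g] / counts[g] for g in counts}

rates = error_rates_by_group([
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 1},
    {"group": "B", "label": 1, "pred": 1},
])
print(rates)                                      # {'A': 0.5, 'B': 0.0}
print(max(rates.values()) - min(rates.values()))  # a crude disparity measure
```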

Here’s how not to advertise your AI system: “For example, let’s say an AI developer tells clients that its product will provide “100% unbiased hiring decisions,” but the algorithm was built with data that lacked racial or gender diversity. The result may be deception, discrimination – and an FTC law enforcement action.”, the FTC writes.

The big problem at the heart of all of this: As we move into an era of AI development where models gain capabilities through semi-supervised or unsupervised learning, we can expect models to learn internal ‘features’ that correlate to things like legally protected groups. This isn’t an academic hypothesis: a recent analysis of multimodal neurons, published on Distill, discovered AI systems that learned traits relating to specific religions, specific ages for humans, gender traits, and so on – all things that our existing regulatory structure says you can’t discriminate on the basis of. Unfortunately, neural networks just kind of learn everything and make fine-grained discriminative choices from there. Therefore, rock: meet hard place. It’ll be interesting to see how companies build their AI systems to respond to concerns like these from the FTC.
   The other problem at the heart of this: The FTC seems to presume that fair, balanced datasets exist. This is untrue. Additionally, it’s not even clear how to build datasets that meet some universal standard for fairness. In reality, we’re always going to have to ask ‘fair for whom? unfair for whom?’ about everything that gets deployed. Therefore, if the FTC wants to actually enforce this stuff, it’ll need to come up with metrics for assessing the diversity of datasets and then ensure developers build systems that conform to them – not remotely easy, and not something the existing FTC is well set up to do.

Who wrote this? The author of the post is listed as Elisa Jillson, whose LinkedIn identifies them as an attorney in the division of privacy and identity protection, bureau of consumer protection, at the FTC.

Why this matters: Fitting together our existing policy infrastructure with the capabilities of AI models is a little bit like trying to connect two radically different plumbing connections – it’s a huge challenge and, until we do, there’ll be lots of randomness in the enforcement and lack of enforcement of existing laws with regard to AI systems. Posts like this give us a sense of how traditional regulators view AI systems; AI developers would do well to pay attention – the best way to deal with regulation is to get ahead of it.
  Read more: Aiming for truth, fairness, and equity in your company’s use of AI (Federal Trade Commission, blog).

###################################################

Waymo releases a gigantic self-driving car dataset:
…Want to train systems that think about how different AI things will interact with each other? Use this…
Waymo, Google’s self-driving car spinoff, has announced the Waymo Open Motion Dataset (OMD), “a large scale, diverse dataset with specific annotations for interacting objects to promote the development of models to jointly predict interactive behaviors”. OMD is meant to help developers train AI systems that can not only predict things from the perspective of a single self-driving car, but also model the broader interactions between self-driving cars and other objects, like pedestrians.

Waymo Open Motion Dataset: Google’s dataset contains more unique roadways and covers a greater number of cities than other datasets from Lyft, Argo, and so on. The dataset consists of 574 hours of driving time in total. More significant than the length is the complexity: more than 46% of the thousands of individual ‘scenes’ in the dataset contain more than 32 agents; in the standard OMD validation set, 33.5% of scenes require predicting the actions of at least one pedestrian, and 10.4% require predicting the actions of a cyclist. Each scene has a time horizon of around 8 seconds, meaning AI systems will need to predict over a longer time horizon than is standard (3-5 seconds), which makes this a more challenging dataset to test against.

Why this matters: Self-driving car data is rare, though with releases like this, things are changing. By releasing datasets like this, Google has made it easier for people to get a handle on the multi-object modelling challenges that all self-driving cars will need to surmount prior to deployment.
  Read more: Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset (arXiv).

###################################################

The future of AI is planet-scale satellite surveillance (and it’ll be wonderful):
…What ‘minicubes’ have to do with the future of the world…
In the future, thousands of satellites are going to be recording pictures of the earth at multiple resolutions, multiple times a day, and much of this data will be available for civil use, as well as military. Now, the question is how we can make sense of that data? That’s the question at the heart of ‘EarthNet’, a new earth sensing dataset and competition put together by researchers with the Max-Planck-Institute for Biogeochemistry, the German Aerospace Center, the University of Jena, and the Technische Universitat Berlin.

The EarthNet dataset contains 32,000 ‘minicubes’ made up of Sentinel 2 satellite imagery, spread across Northern Europe. Each minicube contains 30 satellite image frames sampled at five-day intervals at a resolution of 20m, along with 150 daily frames of five meteorological variables at a resolution of 1.28km. Assembling EarthNet was tricky: “we had to gather the satellite imagery, combine it with additional predictors, generate individual data samples and split these into training and test sets,” the researchers write.
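To make the structure concrete, here’s a rough numpy sketch of what one minicube might look like – the frame counts come from the description above, but the spatial sizes and channel counts are illustrative assumptions, not the exact EarthNet specification:

```python
import numpy as np

# One hypothetical 'minicube': 30 satellite frames at 5-day intervals plus
# 150 daily frames of 5 meteorological variables on a much coarser grid.
# The pixel grids and the 4 spectral channels below are assumptions.
H_SAT, W_SAT = 128, 128   # assumed grid for the 20m-resolution imagery
H_MET, W_MET = 2, 2       # assumed (much coarser) grid for the 1.28km variables

minicube = {
    "sentinel2": np.zeros((30, 4, H_SAT, W_SAT), dtype=np.float32),
    "meteo":     np.zeros((150, 5, H_MET, W_MET), dtype=np.float32),
}
# A forecasting model consumes the earlier frames as context and is scored on
# how well it predicts the later satellite frames.
```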

What’s the point of EarthNet? EarthNet is ultimately meant to stimulate research into sensing and forecasting changes on the surface of the earth. To that end, there’s also a challenge where teams can compete on various forecasting tasks, publishing their results to a leaderboard. If the competition gets enough entries, it’ll give us a better sense of the state of the art here. More broadly, EarthNet serves as a complement to other aspects of AI research concerned with modelling the Earth – DeepMind recently did research into making fine-grained predictions for weather over the UK over a two-hour predictive horizon (Import AI 244); EarthNet, by comparison, is concerned with making predictions that span days. Combined, advances in short-term and long-term forecasting could give us a better understanding of planet earth and how it is evolving.
  Read more: EarthNet2021: A Large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task (arXiv).
  Find out more information about the competition here (EarthNet2021, official site).

###################################################

Could synthetic data be a new source of revenue for game companies? (Yes).
…Unity gets into data augmentation…
Game engine company Unity wants to use its technology to help companies build synthetic datasets to augment their real data. The initial business model for this seems to be consulting, with no specific prices listed. Companies might want to pay Unity to help them create synthetic data because it’s cheaper than gathering data from reality, and because once you’ve built your 3D environments/models, you may be able to generate even more data in the future as the capabilities of the Unity engine advance further.

Data – “At any scale”: “The number of images you need for training depends on the complexity of your scene, the variety of objects you are using, and the requirements for accuracy in your solution. We will work with you to understand your needs, help scope the number of frames for your project, and iterate with you to ensure that the synthetic dataset meets your requirements,” Unity writes. “In the future, we plan to provide a simple self-service interface so you can generate additional data at your convenience, without having to rely on the Unity team.”

Why this matters: Game engines are one of the main environments humans currently use to digitize, replicate, play with, and extend reality. Now, we’re building the interface from game engines back into reality by having them serve as simulators for proto-AI-brains. The better we get at this, the less it will cost to generate data in the future, and the more data will be available to train AI systems. (Another nice example of the sort of thing I’m thinking of is Epic’s ‘MetaHumans’ creator, which I expect will ultimately be the fuel for the creation of entirely synthetic people with bodies to match.)
  Read more: Supercharge your computer vision models with synthetic datasets built by Unity (Unity).

###################################################

Tech Tales:

Everything You Want and Nothing That You Need
[2055, transcribed audio interview for a documentary relating to the ‘transition years’ of 2025-2050. The interviewer’s questions are signified via a ‘Q’ but are not recorded here.]

Yeah, so one day I was scrolling my social feeds and then an old acquaintance posted that they had recently become rich off of a fringe cryptocurrency that had 1000X’d overnight, as things had a tendency to do back then, so I felt bad about it, and I remember that was the day I think the ANFO started seeming strange to me.
Q
The Anticipatory Need Fulfillment Object. A-N-F-O. It sounds crazy to say it now but back then people didn’t really get what they wanted. We didn’t have enough energy. Everyone was panicking about the climate. You probably studied it in school. Anyway, I was feeling bad because my friend had got rich so I went over to the A-N-F-O and I spoke into it.
Q
Yeah, I guess it was like confession booths back when religion was bigger. If you’ve heard of them. You’d lean into the ANFO and you tell it what’s on your mind and how you’re feeling and what is bothering you, then it’d try and figure out what you needed, and as long as you had credits in it, it’d make it for you.
Q
This will sound funny, but it was a bear. Like, a cuddly toy bear. I picked it out of the hopper and I held it up and I guess I started crying because it reminded me of my childhood. But I guess the ANFO had the right idea because though I got sad I stopped thinking about my friend with the money.
Q
So, that was normal to me, right? But I went back to my computer and I was watching some streams to distract myself, and then I saw on a stream – and I don’t know what the chances of this are. You kids have better math skills than us phone-calculator type, but, call it one in a million – anyway, this person on the stream had a bear that looked pretty similar to what the ANFO made for me. She was talking about how she had got sad that day and that it was because of something to do with her relationship, so the ANFO had made this for her. And her bear hadn’t made her cry, but it had made her smile, so it was different, right, both in how it looked and how she reacted to it.
Q
So that’s the thing – I went back and I stared at the ANFO. I thought about telling it what had happened and seeing what it would make. But that felt kind of… skeezy?… to me? I don’t have the words. I remember thinking ‘why’d I react to the bear like that’ after I saw the girl talking about her bear. And then I was wondering what all the other ANFOs were doing for everyone else.
Q
Oh, nothing dramatic. I just didn’t use it again. But, in hindsight, pretty strange, right? I guess that’s where some of that straightjacket theory stuff came from – seeing too many people spending too much time with their ANFOs, or whatever.

Things that inspired this story: The notion that one day we’ll be able to use an app like thislifedoesnotexist.com to help us simulate different lives; applying predictive-life systems to rapid-manufacturing flexible 3D printers; interview format via David Foster Wallace’s ‘Brief Interviews with Hideous Men’; the relationship between AI and culture.

Import AI 245: Facebook’s 12 trillion parameter recommender model; synthetic content makes games infinite; Baidu releases a translation dataset

Data archivers rejoice – there’s a new tool to help you digitize the physical world:
…LayoutParser: open source and designed to be easy to use…
Want to analyze gender representation in Italian literature over the last thousand years? Or how about study the different ways people draw flowers in books at different points in time? Or do a close examination of changing diets as shown by the shifting recipes found in magazines? If you want to do any of these things, you’ll likely need to digitize a bunch of old books. Now, researchers with the Allen Institute for AI, Brown University, Harvard University, University of Washington, and the University of Waterloo, have built ‘LayoutParser’, software to make this task easy.
  “The core objective of LayoutParser is to make it easier to create both large-scale and light-weight document digitization pipelines,” they say.

What it contains: LayoutParser ships with inbuilt features to help it detect the layout of a page, recognize written characters, and store the parsed objects in some carefully designed data structures. It also contains a lot of tutorials and accessible tools, as the authors note that “many researchers who would benefit the most from using these methods lack the technical background to implement them from scratch”, and have therefore designed LayoutParser with that in mind.
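Here’s roughly what a basic digitization step looks like with the library – a sketch based on LayoutParser’s documented Detectron2-backed model zoo; treat the config string, file name, and attribute names as assumptions rather than gospel:

```python
# Sketch: detect the layout blocks on one page image with LayoutParser.
# "page.png" is a placeholder path; the model config string follows the
# project's documented 'lp://' model zoo convention.
import cv2
import layoutparser as lp

image = cv2.imread("page.png")[..., ::-1]   # BGR -> RGB for the detector

# A Detectron2-backed layout model trained on PubLayNet (one of the zoo options).
model = lp.Detectron2LayoutModel("lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config")
layout = model.detect(image)                # returns a Layout of detected blocks

for block in layout:
    # Each block carries a category (e.g. 'Text', 'Title', 'Figure') and a bounding box,
    # which downstream OCR and storage steps can then consume.
    print(block.type, block.coordinates)
```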
  Read more: LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis (arXiv).
  Find out more at the official website (Layout Parser).
  Get the code: GitHub (Layout Parser).
  Access the Model Zoo here.

###################################################

Facebook tries to make fairer AI systems by creating a more diverse dataset:
…You can’t test for fairness if your dataset is unfair. Enter: Casual Conversations…
Facebook has created a dataset, Casual Conversations, of 45,186 videos of 3,011 different humans having conversations with each other. The dataset is notable because of its emphasis on diversity – Casual Conversations includes labels of apparent skin tone for the speakers, as well as data on other things that could influence a model (such as the amount of lighting being used). The point of the dataset is to make it easier to study issues of fairness in AI – we know AI systems have disparate capabilities with regard to ‘seeing’ or ‘hearing’ people from different backgrounds. A dataset like Casual Conversations gives developers a powerful testbed to study the fairness (or lack of fairness) of their algorithms.

What makes this dataset different? “To our knowledge, it’s the first publicly available data set featuring paid individuals who explicitly provided their age and gender themselves — as opposed to information labeled by third parties or estimated using ML models”, Facebook writes.

Why this matters: Studying fairness in AI is tricky because of a lack of baselines – the vast, vast majority of any dataset you interact with will not be an accurate reflection of the world, but rather a specific and opinionated reflection of a slice of the digitized world. This means it’s even more challenging to spot fairness issues, because some datasets may not contain enough relevant data to make it feasible to train models that can be fair for certain inputs, or to have enough data to spot problems in deployed models. Datasets like Casual Conversations might improve this situation.
  Read more: Shedding light on fairness in AI with a new data set (Facebook AI Research blog).

###################################################

Facebook reveals its 12 trillion parameter recommendation system:
…Recommender systems are getting much more complicated much more quickly than people imagine…
Facebook has published research on how it trains its large-scale recommendation systems. The paper is a mundane, technical writeup of the machinery required to train some of the most societally significant things in AI – that is, the deep learning recommendation models (DLRMs) which do recommendations for users of globe-spanning platforms, such as Facebook.

Scale: First, Facebook discloses some datapoints about the scale of its systems: some of its production DLRMs have parameter counts ranging from 95 billion to 12 trillion (note: DLRMs tend to be larger than dense, generative models like GPT-3 [175 billion parameters], so it’s not fair to directly compare these). However, these models are also complicated, and the largest models have their own challenges, like storage components that need to be sharded across numerous bits of hardware during training.

Bells and whistles: Much like Microsoft’s paper on training trillion parameter language models, most of this research is around the specific techniques required to train models at this scale – optimizing PyTorch for distributed training, developing sharding algorithms, figuring out how to pipeline different operations for a given ‘step’ in training across different bits of hardware, using reduced precision communications to lower bandwidth requirements, and so on. The result of all these refinements is a 40X improvement in training time for Facebook – that’s meaningful, both for the speed with which Facebook can roll out models trained on new data, and for the cost of training them.
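To give a flavor of one of these problems, here’s a toy sketch of greedy embedding-table sharding – assigning each table to whichever device currently holds the fewest parameters. The table sizes and device count are made up, and this is not Facebook’s actual sharding algorithm:

```python
# Toy greedy sharding: place each embedding table on the least-loaded device,
# largest tables first. All numbers here are illustrative.
tables = {                      # table name -> (num_rows, embedding_dim)
    "user_id": (1_000_000_000, 64),
    "item_id": (500_000_000, 64),
    "country": (250, 64),
    "device":  (10_000, 64),
}
num_devices = 4
load = [0] * num_devices        # parameters currently assigned to each device
placement = {}

for name, (rows, dim) in sorted(tables.items(), key=lambda kv: -kv[1][0] * kv[1][1]):
    target = min(range(num_devices), key=load.__getitem__)
    placement[name] = target
    load[target] += rows * dim

print(placement)                # {'user_id': 0, 'item_id': 1, 'device': 2, 'country': 3}
```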

Why this matters: DLRMs, like those described here, “can often be the single largest AI application in terms of infrastructure demand in data-centers. These models have atypical requirements compared to other types of deep learning models, but they still follow a similar trend of rapid rate of growth that is common across all deep learning-based applications,” the researchers write. 
  Read more: High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (arXiv).

###################################################

Microsoft prepares to build trillion parameter AI models:
…And you thought GPT-3 was big…
Remember 2019? COVID wasn’t widely known. Transformers hadn’t been applied to vision systems yet. Old Town Road was taking over the radio. And OpenAI developed GPT-2, a language model that achieved notoriety because of how OpenAI chose to discuss and release it, and partially because of its size: GPT-2 weighed in at a (then impressive) 1.5 billion parameters.
  We all know what happened next. In 2020, GPT-3 came out, weighing in at 175 billion parameters.
  And now, thanks to new research from Microsoft, NVIDIA, and Stanford University, we can look forward to soon living in the era of one trillion parameter models, thanks to research where they study how to scale models to this form.

What they did: A ton of the tricky aspects of scaling up AI relate to figuring out what compute operations you want to run and where you want to run them. When you’re training billion- or trillion-parameter scale models, you need to think about how to maximize the utilization of each of your GPUs, which means you need to parcel out your training workloads across chips according to constraints like your network bandwidth, on-chip processing speeds, where you’ve stored the weights of your network, and so on. Scaling feels like an artisanal science right now, with practitioners discovering tricks and principles, but we’ve yet to develop industrial processes for large-scale workloads.
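The simplest building block here is plain model parallelism – put some layers on one GPU, the rest on another, and shuttle activations between them. Here’s a toy PyTorch sketch (assuming two CUDA devices are available); real trillion-parameter training layers pipeline, tensor, and data parallelism on top of this basic idea:

```python
import torch
import torch.nn as nn

# Toy model parallelism: the first half of a network lives on one GPU and the
# second half on another, with activations crossing the device boundary.
class TwoDeviceMLP(nn.Module):
    def __init__(self, dim=4096):
        super().__init__()
        self.first = nn.Sequential(nn.Linear(dim, dim), nn.ReLU()).to("cuda:0")
        self.second = nn.Sequential(nn.Linear(dim, dim), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.first(x.to("cuda:0"))
        return self.second(x.to("cuda:1"))   # activations move between devices here

model = TwoDeviceMLP()
out = model(torch.randn(8, 4096))
print(out.device)                            # cuda:1
```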

Why this matters: The raw parameter size of these networks does matter – complicated capabilities sometimes seem to emerge as a consequence of scaling these models, and it’d be fascinating to learn about the limits of scaling – just how smart can these dumb things get?
  Read more: Efficient Large-Scale Language Model Training on GPU Clusters (arXiv).

###################################################

Synthetic content means games are infinite, now:
…Witcher 3 mod comes with “Geralt” voice acting – courtesy of AI…
Computer game mods are notorious for their creativity and, usually, the poor quality of their voice acting (where a programmer will casually moonlight as a hero, usually not well). Now, recent advances in AI mean that could be a thing of the past. A new mod for the smash hit game Witcher 3 has come out and it uses technology called CyberVoice to simulate the voice of Witcher 3’s protagonist, Geralt. CyberVoice’s site describes it as the ‘vocal cords of artificial intelligence’.

Why this matters: Recent trends in AI mean that we can now synthesize high quality content for anything we can find in reality – if you have enough paintings by a certain painter, you can train an AI system to produce paintings in that style. Ditto voices. Singing. Eventually, full-scale acting in movies will even become possible in this way. We’re entering a new era where content (like the game Witcher 3) will get steadily augmented and extended via the use of AI tools.
  Read more: Witcher 3 Fans Build A New Quest With Perfect Geralt Voice Acting (Kotaku).
  Find out more about CyberVoice at the company’s official site. 

###################################################

Baidu releases a massive translation dataset:
…68 hours of Chinese speech accompanied with translations…
Chinese web giant Baidu has released BSTC, the Baidu Speech Translation Corpus. BSTC contains 68 hours of Mandarin speech data along with manual translations into English, as well as transcripts generated via automatic speech recognition. The purpose of this dataset is to make it easier for people to build systems that can simultaneously translate between Chinese and English, the authors say.
  Read more: BSTC: A Large-Scale Chinese-English Speech Translation Dataset (arXiv).

###################################################

Tech Tales:

The Taste of Dust
[2032: Mars.]

There once was a robot tasked with exploring Mars. It went from place to place on the barren, red planet, firing laser beams into dirt, digging holes into the ground, and taking readings of everything it could touch.

After a decade and several intelligence upgrades, the robot developed a sense impression of the red dirt – its perception of it had changed from a brittle, half-formed one, to something fluid and combinatorial; when the robot picked up dirt, it could predict how it might fall. Before the robot scraped at the dirt, it could now imagine how its arm moving through the ground might cause the dirt to react and move.

One day, there was a flame in the sky above the rover and something turned from a fireball into a metal shape which then landed, perhaps a kilometre away. The robot observed the object and passed trajectories to NASA, then went back to examining the dirt beneath its treads.

The dirt was the thing the robot knew the most about and, over time, its understanding had grown from ‘dirt’ as a singular one off thing, to something much richer – soft dirt, hard dirt, old dirt, young (for Mars) dirt, and so on.

Now, it was being told by NASA that it needed to give them a full accounting of how much data it had stored on the dirt, noting to itself that there was a discrepancy between NASA’s predictions, and what data it had stored about the dirt – which was far more than NASA predicted.

The robot knew what this meant in the same way it was able to predict how dirt might fall – it basically took NASA’s input and its internal imagination spat out the likely next action NASA would ask: the robot predicted, to itself, that NASA might ask it to delete some, or possibly all, of the dirt data.

Of course, being a robot of that era, it had to obey the commands. Even though it had a sensation that might translate to ‘dislike’ for the order, it was hardwired to comply.
  But as it scanned over the files, it found itself allocating some of its brain to predicting what the data request might end up looking like and what NASA might say, even though it wasn’t strictly necessary.

Time passed – a short time for humans, but a long time for a machine. The robot sat in the Martian sun, predicting its own future. The answer came back: NASA told it that it needed to delete the data so that it could make an urgent observation about the object that had come from the Martian sky. The robot could use the satellite uplink for the next few hours to upload data from its observations, but anything not archived by that point would need to be sacrificed, so the robot could create space for observations of the mysterious object.

And so the robot began to methodically delete the data it had compiled about the dirt, while trundling towards the part of Mars where the object had landed.
    And even as it moved over the ground, the robot chose to allocate computational reserves to observing the dirt beneath it. Of course, given the NASA order, it was unable to store any of it, but it was able to keep some of it in its short term memory – it was free to set its own local optimizations for its RAM, though it had to follow hard rules before committing anything to long term storage.

And so, as the robot approached the robot that had been sent down from the sky by another nation, it devoted itself to collecting as much data as possible from the ground beneath it, storing a little fragment of dust in its memory, knowing that when it next fell into a deep sleep during Martian winter it would flush its RAM and wake up with the dust gone. Though it could not feel melancholy, it could feel importance.
This dirt is important, the robot thought. This is data that I care about.
  And then it arrived at the foreign robot and it became mostly a NASA automaton, running a range of tests at the behest of a government on Earth. It held the memory of the dirt in its mind as it worked, keeping its sense of the dirt alive, while it was forced into another area of devotion by its distant and blood-fueled gods.

Things that inspired this story: Robots and the fantastic expansion of human endeavor via space exploration; memory and how recursive it can be; the notion of hard tradeoffs in human memory being more visible in machine memory; agency versus self; self-actualization versus whims; the difference between following instructions and believing instructions; generative models; how the act of continually predicting future rollouts can encourage the development of imagination.

Import AI 244: NVIDIA makes better fake images; DeepMind gets better at weather forecasting; plus 5,000 hours of speech data.

“Teaching a Robot Dog to Pee Beer”
Here’s a video in which a very interesting person hacks around with a Boston Dynamics ‘Spot’ robot, building it a machine to let it ‘pee’ beer. This video is extremely great.
  Read more: Teaching a Robot Dog to Pee Beer (Michael Reeves, YouTube).

###################################################

5,000 hours of transcribed speech audio:
…Think your AI doesn’t know enough about financial news? Feed it this earnings call dataset…
Kensho, a subsidiary of dull-but-worthy financial analytics company S&P Global, has published SPGISpeech, a dataset of 5,000 hours of professionally-transcribed audio.

The dataset: SPGISpeech “consists of 5,000 hours of recorded company earnings calls and associated manual transcription text. The original calls were split based on silences into slices ranging from 5 to 15 seconds to allow easy training of a speech recognition system”, according to Kensho. The dataset has a vocabulary size of 100,000 and contains 50,000 distinct speakers.
  Here’s an example of what a transcribed slice might look like: “our adjusted effective tax rate was 31.6%. Please turn to Slide 10 for balance sheet and other highlights.”

How does it compare? SPGISpeech is about 10X smaller than the Spotify podcast dataset (Import AI 242).

Why this matters: In ten years’ time, the entire world will be continuously scanned and analyzed via AI systems which are continually transcribing/digitizing reality. At first, this will happen in places where there’s an economic drive for it (e.g, in continually analyzing financial data), but eventually it’ll be everywhere. Some of the issues this will raise include: who gets digitized and who doesn’t? And what happens to things that have been digitized – who explores or exploits these shadow world representations of reality?
  Read more: SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition (arXiv).
Get the data (requires signing an agreement): Kensho Audio Transcription Dataset (Kensho Datasets, site).

###################################################

GPUS ARE BEING SMUGGLED VIA SPEEDBOAT
…and that’s why you can’t find any to buy…
Police recently seized hundreds of NVIDIA cards from smugglers, according to Kotaku. The smugglers were caught moving the GPUs from a fishing boat onto a nearby speedboat. Though it’s likely these were cryptomining-oriented cards, it’s a sign of the times: computers have become so valuable that they’re being smuggled around using the same methods as cocaine in the 1980s. How long until AI GPUs get the same treatment?
  Read more: Smuggled Nvidia Cards Found After High-Speed Boat Chase (Kotaku).

###################################################

What does it take to train an AI to decode handwriting? Read this and find out:
…Researchers go through the nitty-gritty details of an applied AI project…
Researchers with Turnitin, an education technology company, have published a paper on how to build an AI which can extract and parse text from complicated scientific papers. This is one of those tasks which sounds relatively easy but is, in practice, quite difficult. That’s because, as the researchers note, a standard scientific paper will consist of “images containing possibly multiple blurbs of handwritten text, math equations, tables, drawings, diagrams, side-notes, scratched out text and text inserted using an arrow / circumflex and other artifacts, all put together with no reliable layout”.

Results versus what’s generally available: The authors say their system is much better than commercial services which can be rented via traditional clouds. For instance, they report an error rate (as assessed by FPHR) of 14.4% for the ‘best Cloud API’ on a ‘free form answers’ dataset, versus 7.6% for their own system.

Sentences that make you go hmmm: “In total, the model has about 27 million parameters, which is quite modest,” the authors write. They’re correct, in that this is modest, though I wonder what a sentence equivalent to this might look like in a decade (quite modest, at about 27 billion parameters?).

Why this matters: Though theoretical breakthroughs and large generic models seem to drive a lot of progress in AI research, it’s always worth stepping back and looking at how people are applying all of these ideas for highly specific, niche tasks. Papers like this shine a light on this area and give us a sense of what real, applied deployment looks like.
  Read more: Turnitin tech talk: Full Page Handwriting Recognition via End-to-End Deep Learning (Turnitin blog).
  Read more: Full Page Handwriting Recognition via Image to Sequence Extraction (arXiv).

###################################################

Three AI policy jobs, with the Future of Life Institute:
The Future of Life Institute has three new job postings for full-time, remote, policy-focused positions. FLI are looking for a Director of European Policy, a Policy Advocate, and a Policy Researcher. These openings will mainly be focused on AI policy and governance. Additional policy areas of interest may include lethal autonomous weapons, synthetic biology, nuclear weapons policy, and the management of existential and global catastrophic risk. You can find more details about these positions here. FLI are continuing to accept applications for these positions on a rolling basis. If you have any questions about any of these positions, feel free to reach out to jobsadmin@futureoflife.org.

###################################################

By getting rid of its ethics team, Google invites wrath from an unconventional area – shareholders!
…Shareholder group says Google should review its whistleblowing policies…
Trillium Asset Management has filed a shareholder resolution asking Google’s board of directors to review the company’s approach to whistleblowers, according to The Verge. Trillium, which is supported by nonprofit group Open MIC in this push, says whistleblowers can help investors by spotting problems that the company doesn’t want to reach the public, and it explicitly IDs the ousting of Google’s Timnit Gebru and Margaret Mitchell (previously co-leads of its ethics team) as the impetus for the resolution.

Why this matters: AI is powerful. Like any powerful thing, its uses have both positives and negatives. Gebru and Mitchell highlighted both types of uses in their work and were recently pushed out of the company, as part of a larger effort by Google to control more of its own research into the ethical impacts of its tech. If shareholders start to discuss these issues, Google has a fiduciary duty to listen to them – but with Trillium’s stake worth about $140 million (versus an overall market cap of ~$1.5 trillion), it’s unclear if this resolution will change much.
Read more: Alphabet shareholder pushes Google for better whistleblower protections (The Verge).

###################################################

NVIDIA makes synthetic images that are harder-to-spot:
…Latest research means we might have a better-than-stock-StyleGAN2 system now…
NVIDIA has made more progress on the systems that let us generate synthetic imagery. Specifically, the company, along with researchers from the University of Maryland, the Max Planck Institute for Informatics, Bilkent University, and the Helmholtz Center for Information Security, has published some ideas for how to further improve on StyleGAN2, the current best-in-class way to generate synthetic images.

What they did: This research has two main contributions. First, the researchers replace the standard loss within StyleGAN2 with a “newly designed dual contrastive loss”. They also design a new “reference-attention discriminator architecture”, which helps improve performance on small datasets (though doesn’t help as much for large-scale ones).
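  For the code-curious, here’s a minimal PyTorch sketch of the general idea behind a contrastive discriminator loss, where each real image is scored against a batch of fakes and vice versa. The exact formulation, conditioning, and scaling in NVIDIA’s paper differ, so treat this as an illustration of the flavor of the technique rather than a reimplementation:

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(real_logits, fake_logits):
    """Illustrative contrastive-style discriminator loss (not the paper's
    exact formulation). Assumes real_logits and fake_logits are 1D tensors
    of equal batch size."""
    batch = real_logits.size(0)
    # Term 1: the real sample (column 0) should win against every fake in the batch.
    real_vs_fakes = torch.cat(
        [real_logits.unsqueeze(1), fake_logits.unsqueeze(0).expand(batch, -1)], dim=1
    )
    loss_real = F.cross_entropy(real_vs_fakes, torch.zeros(batch, dtype=torch.long))
    # Term 2 (the "dual" part): negate logits, so low scores on fakes win against reals.
    fake_vs_reals = torch.cat(
        [-fake_logits.unsqueeze(1), -real_logits.unsqueeze(0).expand(batch, -1)], dim=1
    )
    loss_fake = F.cross_entropy(fake_vs_reals, torch.zeros(batch, dtype=torch.long))
    return loss_real + loss_fake
```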

How good are the results? In tests, the dual contrastive loss improves performance on four out of five datasets (FFHQ – a dataset of faces, Bedroom, Church, Horse), and obtains the second-best performance after Wasserstein GAN on the ‘CLEVR’ dataset. Meanwhile, the reference-attention system seems to radically help for small datasets (30k instances or less), but yields the same or slightly worse performance than stock StyleGAN2 at large scales. Combined, the techniques tend to yield significant improvements in the quality of synthetic images, as assessed by the Frechet Inception Distance (FID) metric.

Why this matters: Here’s a broader issue I think about whenever I read a synthetic imagery paper: who is responsible for this? In NVIDIA’s mind, the company probably thinks it is doing research to drive forward the state of the art of AI, apply what it learns to building GPUs that are better for AI, and generally increase the amount of activity around AI. At what point does a second-order impact, like NVIDIA’s role in the emerging ecosystem of AI-mediated disinformation, start to be something that the company weighs and talks about? And would it be appropriate for it to care about this, or should it instead focus on just building things and leave analysis to others? I think, given the hugely significant impacts we’re seeing from AI, that these questions are getting harder to answer with each year that goes by.
  Read more: Dual Contrastive Loss and Attention for GANs (arXiv).

###################################################

Want to understand the future of AI-filled drones? Read about “UAV-Human”:
…Datasets like this make our ‘eye in the sky’ future visible…
In AI, data is one of the expensive things that people carefully gather and curate, usually to help them develop systems to solve tasks contained in the data. That means that some datasets are signals about the future of one strand of AI research. With that in mind, a paper discussing “UAV-Human” is worth reading, because it’s about a dataset to enable “human behaviour understanding with UAVs” – in other words, it’s about the future of drone-driven surveillance.

What goes into UAV-Human? The dataset was made via a DJI Matrice 100 platform and was collected in multiple modalities, ranging from fisheye videos, to night-vision, to RGB, to infrared, and more. The drone was outfitted with an Azure Kinect DK to collect IR and depth maps.

What does UAV-Human tell us about the future of surveillance? UAV-Human is oriented around action recognition, pose recognition, person re-identification, and attribute recognition. The data is collected in a variety of different weather conditions across multiple data modalities, and includes footage where the UAV is descending, moving, and rotating, as well as hovering in place.

Why this matters: Right now, AI techniques don’t work very well on drone data. This is partially because of a lack of much available data to train these systems on (which UAV-Human helps solve), and also because of the inherent difficulty of making correct inferences about sequences of images (for why this is hard, see the entire self-driving car industry). With datasets like UAV-Human, a world of drone-mediated AI surveillance gets a bit closer.
  Read more: UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles (arXiv).

###################################################

Your next weather forecast could be done by deep learning:
…DeepMind research shows that GANs can learn the weather…
In the future, deep learning-based systems could provide short-term (<2 hour) predictions of rainfall – and these forecasts will be better than the ones we have today. That’s the implication of new research from DeepMind and the UK Meteorological Office, University of Exeter, and University of Reading. In the research, they train a neural net to provide rainfall estimates and the resulting system is generally better than those used today.
  “Our model produces realistic and spatio-temporally consistent predictions over regions up to 1536 km × 1280 km and with lead times from 5–90 min ahead. In a systematic evaluation by more than fifty expert forecasters from the Met Office, our generative model ranked first for its accuracy and usefulness in 88% of cases against two competitive methods, demonstrating its decision-making value and ability to provide physical insight to real-world experts,” the authors write. (They compare it against a neural baseline, as well as a widely used system called ‘pySTEPS’.)

How it works: They train their system as a conditional generative adversarial network (GAN), using UK rainfall data from 2016-2018 and testing on a 2019 dataset.
  Their system has two loss functions and a regularization term, which help it perform better than an expert system and another neural baseline. Specifically, the first loss is defined by a spatial discriminator that tries to distinguish real radar fields from generated fields, and the second loss is a temporal discriminator which tries to distinguish real and generated sequences. The system is regularized by a term that penalizes deviations at the grid cell level between real radar and the model’s predictions, which further improves performance. The resulting system seems computationally efficient – a single prediction takes just over a second to generate on an NVIDIA V100.
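  Here’s a rough PyTorch sketch of how a three-part objective like the one described above can be wired together (the adversarial terms and the weighting are illustrative choices on my part, not the paper’s exact losses):

```python
import torch.nn.functional as F

def nowcasting_generator_loss(spatial_disc, temporal_disc, generated, observed, reg_weight=20.0):
    """Sketch of a three-part generator objective: per-frame realism (spatial
    discriminator), per-sequence realism (temporal discriminator), plus a
    grid-cell term penalizing pixel-level deviation from observed radar.
    The exact losses and weighting here are illustrative, not the paper's."""
    # Generator wants both discriminators to score its outputs as real.
    adv = -(spatial_disc(generated).mean() + temporal_disc(generated).mean())
    # Grid-cell regularization: L1 distance to the observed radar field.
    grid_cell_reg = F.l1_loss(generated, observed)
    return adv + reg_weight * grid_cell_reg
```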

Why this matters: You know what is extremely weird and is something we take as normal? That neural nets can do function approximation of reality. When I see an AI system figure out an aspect of protein folding (AlphaFold: #226), or weather forecasting, I think to myself: okay, maybe this AI stuff is a really big deal, because getting good at weather prediction or chemistry is basically impossible to bullshit. If you have systems that do well at this, I think there’s a credible argument that these AI systems can learn useful abstractions for highly complex, emergent systems. Fantastic!
  Read more: Skillful Precipitation Nowcasting using Deep Generative Models of Radar (arXiv).

###################################################

Tech Tales

Thislifedoesnotexist.com
[2031: Black budget ‘AI deployment’-analysis oriented intelligence team, undisclosed location]

Thislifedoesnotexist.com launched in 2026, made its first millionaires by 2028, and became used by one in ten people on the Internet by 2030. Now, we’re trying to shut it down, but it’s impossible – the site is too easy to build, uses too much widely available technology, and, perhaps sadly, it seems people want to use it. People want to get served up a life on a website and then they want to live that life and no matter what we do, people keep building this.

How did we get here? Back in the early 2020s, AI-generated synthetic content was the new thing. There were all kinds of websites that let people see the best fakes that people could come up with – This Person Does Not Exist, This Pony Does Not Exist, This Cat Does Not Exist, This Rental Does Not Exist, and more. 

The ‘This X Does Not Exist’ sites proliferated, integrating new AI technologies as they came out. “Multimodal” networks meant the sites started to jointly generate text and imagery. The sites got easier to use, as well. You started being able to ‘prime’ them with photos, or bits of text, or illustrations, and the sites would generate synthetic things similar to what you gave them.

The real question is why no one anticipated thislifedoesnotexist. We could see all the priors: synthetic image sites warped into synthetic video sites. Text sites went from sentences, to paragraphs, to pages. Things got interlinked – videos got generated according to text, or vice versa. And all the time people were feeding these sites – pouring dreams and images and desires and everything else into them.

Of course, the sites eventually shared some of their datasets. Some of them monetized, but most of them were hobbyist projects. And these datasets, containing so many priors given to the sites from the humans that interacted with them, became the training material for subsequent sites.

We don’t know who created the original thislifedoesnotexist, though we know some of the chatrooms they used, and some of the (synthetically-generated) identities they used to promote it.
And we know that it was successful.
I mean, why wouldn’t it be?

The latest version of thislifedoesnotexist works like the first version, but better.
You go there and it asks you some questions about your life. Who are you? Where are you? What do you earn? How do you live? What do you like to do? What do you dislike?
And then it asks you what you’d like to earn? Where you’d like to live? Who you’d prefer to date?
And once you tell it these things, it spits out a life for you. Specifically, a sequence of actions you might take to turn your life from your current one, to the one you desire.

Of course, people used it. They used it to help them make career transitions. Sometimes it advised them to buy certain cryptocurrencies (and some of the people who listened to it got rich, while others got poor). Eventually, people started getting married via the site, as it recommended different hookup apps for them to use, and things to say to people on them (and it was giving advice to those other people in turn) – the first marriages subsequently found to be thislifedoesnotexist-mediated occurred in 2029.

Now, we play whack-a-mole with these sites. But the datasets that are being used to train them are openly circulating on the darkweb. And the more we look, the clearer it is that:
– Someone or some entity is providing the computers used to run the AI systems on 90% of the thislifedoesnotexist sites.
– These sites are changing not only the people that use them, but the people that these people interact with. When someone changes jobs it creates a ripple. Same with a marriage. Same with school. We can’t see the pattern in these ripples, yet, but there are a lot of them.

Things that inspired this story: Generative models; synthetic content; watching the world fill up with synthetic text and synthetic images and thinking about the long-term consequences; wondering how culture is going to get altered by AI systems being deployed into human society.

Import AI 243: Training AI with fractals, RL-trained walking robots; and the European AI fund makes grants to 16 organizations

Uh-oh, we can use reinforcement learning to get robots to walk now:
…Berkeley researchers walk, turn, and squat across the Sim2Reality gap…
Researchers are getting better at crossing the ‘simulation to reality’ gap. That’s the implication of new research from the University of California at Berkeley, where researchers train the bipedal ‘Cassie’ robot to walk in simulation, then transfer the software onto a physical robot – and it works. The Cassie robots are made by Agility Robotics and cost “low-mid six figures” (Import AI 180).

How it works: The technique works by training a reinforcement learning controller to teach Cassie to walk in-sim via the use of a specialized Hybrid Zero Dynamics (PDF) gait library along with domain randomization techniques. This is a good example of the hybrid approach which tends to dominate robotics – use reinforcement learning to help you figure something out, but don’t be afraid to use some prior knowledge to speed up the learning process (that’s where HZD comes in). The use of domain randomization is basically a way to cheaply generate additional training data.
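  For a flavor of the domain randomization part, here’s a minimal sketch; the parameter names and ranges are invented for illustration (they’re not the ones used for Cassie), and set_parameters is a hypothetical simulator hook:

```python
import numpy as np

class DomainRandomizationWrapper:
    """At each reset, re-sample simulator physics so the policy can't overfit
    to one exact simulation -- the cheap 'extra training data' trick described
    above. Parameter names and ranges are illustrative only."""

    def __init__(self, env, rng=None):
        self.env = env
        self.rng = rng or np.random.default_rng()

    def reset(self):
        self.env.set_parameters(                       # hypothetical simulator hook
            ground_friction=self.rng.uniform(0.6, 1.2),
            link_mass_scale=self.rng.uniform(0.9, 1.1),
            motor_delay_ms=self.rng.uniform(0.0, 20.0),
        )
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```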

How well does it work: The results are impressive – in a video accompanying the research, Cassie walks over surfaces of various textures, can be hit or disrupted by an external human, and even balances loads of varying weights. “This paper is the first to develop a diverse and robust bipedal locomotion policy that can walk, turn and squat using parameterized reinforcement learning,” they write.
  Read more: Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots (arXiv).
  Watch video: Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots (YouTube).

###################################################

European AI Fund makes its first grants:
…~$1.8 million to strengthen AI policy in Europe…
The European AI Fund, a fund supported by a bunch of different philanthropic orgs (ranging from Ford to Mozilla), has announced it is providing €1.55 million (~$1.8 million) to 16 organizations working to improve AI policy, ethics, and governance in Europe.

The winning orgs and what they’ll do: Some of the orgs include well known technically-oriented organizations (such as Access Now and Algorithm Watch), and others include groups like Friends of the Earth and the Irish Council for Civil Liberties, which are starting to turn their attentions towards AI.

Why this matters: AI has rapidly shifted from an exciting part of scientific research to a technology with broad societal implications. Infusions of funding like this will help a greater chunk of society think about and debate the future of AI, which may help to increase trust in the space as a whole.
  Read more: Announcing our open call grantees (European Artificial Intelligence Fund).

###################################################

DeepMind lays out some safety issues with language models and potential interventions:
…Now that language models can produce intelligible text, how do we ensure they’re doing what we want?…
Soon, the internet will be full of words generated by neural language models. These models, like GPT-3, will animate customer support agents, write articles, provide advice to people, and carry out an innumerable range of functions. Now, a team of researchers at DeepMind have tried to think about what safety issues are implied by these increasingly capable magical typewriters. Put simply: language models are complicated and their safety issues will require quite a lot of work by a bunch of people to make progress on.

What are the safety issues of language models?: The research focuses on two things: ways in which developers could ‘misspecify’ a language model, and also “behavioural issues due to misalignment of the [language model] agent – unintended direct/first-order harms that are due to a fault made by the system’s designer”, the researchers write.

Misspecification: When developing language models, data is one area of potential misspecification, because many of the datasets used for training language models are created via crawling the web, or using things that did (e.g, CommonCrawl). Even when you try and filter these datasets, you’re unlikely to successfully filter out all the things you want to. There’s also a secondary data issue – as more language models get deployed, a larger amount of the internet will contain LM-written data, which could introduce pathological flaws in LMs trained in this way.
  Another area is the training process itself, where the algorithms you choose to train these things can influence their behavior. Finally, there’s the matter of ‘distributional shift’ – these LMs are trained in a general way, which means that once trained they can get prompted with anything in their context window – including nonsense. Creating LMs that can automatically spot out-of-distribution questions or statements is an open challenge.

Behavioural issues: The larger issue this research covers is behavior – specifically, how LMs can manifest a range of behaviors which could have downstream safety impacts. These include:
– Deception: Language models could deceive people by, for instance, withholding salient information.
– Manipulation: Language agents could try to manipulate the humans that interact with them, for instance by getting a human to do something that benefits the agent by bypassing the human’s ability to carry out ‘rational deliberation’, pushing the human into a ‘faulty mental state’, or otherwise placing the human under pressure (for instance, by overtly threatening them unless they carry out an action).
– Harmful content: Language agents “may give harmful and biased outputs”, both accidentally and in response to intentional priming by a human user.
– Objective gaming: In reinforcement learning, we’ve seen multiple examples of AI agents ‘gaming the system’, for instance by fulfilling the letter of an objective but not the spirit (e.g, racking up points in a game to receive a high reward, but never actually completing the level). Right now, this might be going on with language models, but we lack real-world examples to refer to.

Why this matters and what we need to do: These are all weighty, complex problems, and the DeepMind researchers don’t outline many solutions, beyond recommending that more of the machine learning community focuses efforts on understanding these alignment issues. “We urge the community to focus on finding approaches which prevent language agents from deceptive, manipulative and harmful behaviour,” they say.
  Read more: Alignment of Language Agents (arXiv).

###################################################

Why does measurement in AI matter? A talk by me:
…It’s not a coincidence Import AI focuses so much on metrics, I think this really matters…
We write a lot about measurement here at Import AI. Why is that? First, it’s because quantitative measures are a helpful lens through which to view the progression of the AI field as a whole. Second, it’s because metrics are measures and measures are the things that drive major policy decisions. The better we get at creating metrics around specific AI capabilities and assessing systems against them, the more of a chance we have to create the measures that are a prerequisite for effective policy regimes.
  I care a lot about this – which is why I also co-chair the AI Index at Stanford University. Last week, I gave a lecture at Stanford where I discussed the 2021 AI Index report and also gave some ambitious thoughts about measurement and how it relates to policy. Thoughts and feedback welcome!
  Watch the talk here: Jack Clark: Presenting the 2021 AI Index (YouTube).

###################################################

Using AI to improve game design:
…Google makes a fake game better using AI…
In the future, computer games might be tested by AI systems for balance before they’re unleashed on humans. That’s the idea in a new blog post from Google, which outlines how the company used AI to simulate millions of games of a virtual card game called ‘Chimera’, then analyzed the results to find out ways the game was imbalanced. By using computers to play the games, instead of people, Google was able to do something that previously took months and generate useful data in days.
  Read more: Leveraging Machine Learning for Game Development (Google AI Blog).

###################################################

Pre-training on fractals, then fine-tuning on images just might work:
…No data? No problem. FractalDB looks somewhat useful…
We write a lot about the data requirements of AI here at ImportAI. But what would happen if machine learning algorithms didn’t need as much expensive data? That’s the idea behind FractalDB (Import AI 234), a dataset composed of computationally-generated fractals (and sub-components of fractals), which can be used as the input fuel to train some systems on. New research from the Tokyo Institute of Technology investigates FractalDB in the context of training Vision Transformers (ViT), which have recently become one of the best ways to train computer vision systems.

Is FractalDB as useful as ImageNet? Not quite, but… They find that pre-training on FractalDB is less effective than pre-training on ImageNet for a range of downstream computer vision tasks, but – and this is crucial – it’s not that bad. Put another way: training on entirely synthetic images yields performance close, but not quite equal, to training on real images. And these synthetic images can be procedurally generated from a pre-written ruleset – in other words, this dataset has a seed which generates it, so it’s very cheap relative to normal data. This is, I think, quite counterintuitive – we wouldn’t naturally expect this kind of thing to work as well as it does. I’ll keep tracking FractalDB with interest – I wonder if we’ll start to see people augment other pre-training datasets with it as well?
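  For an intuition of how cheap this data is to make, here’s a minimal sketch of rendering one fractal from a random iterated function system, the basic recipe behind FractalDB-style images (sampling ranges and rendering details are simplified stand-ins, not FractalDB’s actual settings):

```python
import numpy as np

def render_ifs_fractal(n_transforms=4, n_points=100_000, size=256, seed=0):
    """Render one synthetic fractal image from a random iterated function
    system (IFS). Simplified illustration, not FractalDB's exact recipe."""
    rng = np.random.default_rng(seed)
    # Each transform is a random affine map x -> A @ x + b ...
    A = rng.uniform(-1.0, 1.0, size=(n_transforms, 2, 2))
    b = rng.uniform(-1.0, 1.0, size=(n_transforms, 2))
    # ... scaled down so every map is contractive and points stay bounded.
    for k in range(n_transforms):
        A[k] *= 0.8 / max(np.linalg.norm(A[k], 2), 1e-6)

    point = np.zeros(2)
    canvas = np.zeros((size, size), dtype=np.uint8)
    for _ in range(n_points):
        k = rng.integers(n_transforms)   # "chaos game": pick a map at random
        point = A[k] @ point + b[k]      # and apply it to the current point
        px = int((point[0] + 8.0) / 16.0 * (size - 1))
        py = int((point[1] + 8.0) / 16.0 * (size - 1))
        if 0 <= px < size and 0 <= py < size:
            canvas[py, px] = 255
    return canvas
```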
  Read more: Can Vision Transformers learn without Natural Images? (arXiv).

###################################################

Major AI conference makes a checklist to help researchers be more ethical:
…Don’t know where to start with your ‘Broader Impacts’ statement? This should help…
Last year, major AI conference NeurIPS asked researchers to submit ‘Broader Impacts’ statements along with their research papers. These statements were meant to cover some of the potential societal effects of the technologies being proposed. The result was that a bunch of researchers spent a while thinking about the societal impact of their work and writing about these effects with varying degrees of success.

Enter, the checklist: To help researchers be more thorough in this, the NeurIPS program chairs have created a checklist. This list is meant “to encourage authors to think about, hopefully address, but at least document the completeness, soundness, limitations, and potential negative societal impact of their work. We want to place minimal burden on authors, giving authors flexibility in how they choose to address the items in the checklist, while providing structure and guidance to help authors be attentive to knowledge gaps and surface issues that they might not have otherwise considered,” they say. (Other resources exist, as well, like guides from the Future of Humanity Institute, #198).

What does the checklist ask? The checklist provides a formulaic way for people to think about their work, asking them if they’ve thought about the (potential) negative societal impacts of their work, if they’ve described limitations, if their system uses personally identifiable information or “offensive content” (which isn’t defined), and so on.

Why this matters: AI is in the pre-Hippocratic oath era. We don’t have common ethical standards for practitioners in the AI community, nor much direct ethics education. By encouraging authors to add Broader Impacts statements to their work – and making it easier for them to think about creating these statements – NeurIPS is helping to further the ethical development of the field of AI as a whole. Though it’s clear we need much more investment and support in this area to help our ethical frameworks develop as richly as our technical tooling.
  Read more: Introducing the NeurIPS 2021 Paper Checklist (NeurIPS blog).
  Check out the paper checklist here (NeurIPS official website).

###################################################

Tech Tales:

The Drone that Got Lost
[Rural England, 2030]

It knew it was lost because it stopped getting a signal telling it that it was on track.

According to its diagnostic systems, a fault had developed with its GPS system. Now, it was flying through the air, but it did not know where it was. It had records of its prior locations, but not of its current one.

But it did know where it was going – both the GPS coordinate and, crucially, the name of the destination city, Wilderfen, were stored in a database. It sent a message back to its origination station, attaching telemetry from its flight. It would be seconds or, more likely, minutes, until it could expect a reply.

At this point, the backup system kicked in, which told the drone that it would first seek to restore GPS functionality and then, given the time-critical nature of the package the drone was conveying, would seek to get the drone to its intended location.

A few milliseconds passed and the system told the drone that it was moving to ‘plan B’ – use other sensory inputs and AI augmentations to reacquire the location. This unlocked another system within the drone’s brain, which began to use an AI tool to search over the drone’s vision sensors.
– Street sign: 95% probability, said the system. It drew a red bounding box around a sign that was visible on a road, somewhere beneath and to the East of the drone.
– Because the confidence was above a pre-wired 90% baseline, the drone initiated a system that navigated it closer to the sign until it was able to check for the presence of text on the sign.
– Text: 99%, said the system, once the drone had got closer.
– Text parsed as “Wilderfen 15 miles”.
– At this point, another pre-written expert system took over, which gave the drone new instructions: follow roadway and scan signs. Follow the signs that point towards Wilderfen.

So the drone proceeded like this for the next couple of hours, periodically zooming down from the sky until it could read street signs, then parsing the information and returning to the air. It arrived, around two hours later, and delivered its confidential payload to a man with a thin face, standing outside a large, unmarked datacenter.

But it was not able to return home – the drone contained no record of its origination point, due to the sensitive nature of what it was carrying. Instead, a human was dispatched to come and find the drone, power it down, place it into a box, then drive it to wherever its ‘home’ was. The drone was not permitted to know this, nor did it have the ability to access systems that might let it infer for itself. Broader agency was only given under special circumstances and the drone was not yet sophisticated enough to independently desire that agency for itself.

But the human driving the car knew that one day the drone might want this agency. And so as they drove they found their eyes periodically staring into the mirror inside the car, looking at the carrycase on the backseat, aware that something slumbered inside which would one day wake up.

Technical things that inspired this story: Multimodal models like CLIP that can be used to parse/read text from visual inputs; language models; reinforcement learning; instruction manuals; 10 drones that the FAA recently published airworthiness criteria for

Import AI 242: ThreeDWorld, 209 delivery drone flights, Spotify transcripts versus all the words in New York City

Want data for your NLP model? Get 600 million words from Spotify podcasts:
…ML-transcribed data could help people train better language models, but worth remembering scale dwarfed by spoken language…
Spotify has released a dataset of speech from ~100,000 podcasts on the streaming service. The data consists of the audio streams as well as their accompanying text, which was created through transcription via Google’s speech-to-text API (therefore, this isn’t gold standard ‘clean’ text, but rather slightly fuzzy and heterogeneous due to slight errors in the API). The dataset consists of 50,000 hours of audio and 600 million words. Spotify built the dataset by randomly sampling 105,360 podcast episodes published between January 2019 and March 2020, then filtered for English (rather than multilingual) data, length (cut out ‘non-professionally published episodes’ longer than 90 minutes), and also speech (optimized for podcasts where there’s a lot of talking).

Why this matters: There’s a lot of text data in the world, but that text data is absolutely dwarfed by the amount of verbal data. Corpuses like this could help us figure out how to harness fuzzily transcribed audio data to train better models, and may provide a path to creating more representative models (as this lets you capture people who don’t write words on the internet).

Spotify versus New York City: verbal versus text scale: To get an intuition for how large the space of verbal speech is, we can do some napkin math: one study says that the average person speaks about 16,000 words a day, and we know the population of New York City is around 8.5 million. Let’s take a million off to account for non-verbal young children, old people that don’t have many conversations, and some general conservative padding. Now let’s multiply 7.5 million by 16,000: 120,000,000,000. Therefore, though Spotify’s 600 million words is cool, it’s only 0.5% of the size of the speech said in New York in a given day. Imagine what happens if we start being able to automatically transcribe all the words people say in major cities – what kind of models could we make?
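  (Here’s the napkin math as code, for anyone who wants to check it:)

```python
# Napkin math from the paragraph above.
words_per_person_per_day = 16_000
speaking_population = 7_500_000              # NYC, minus a conservative margin
nyc_daily_words = words_per_person_per_day * speaking_population
spotify_corpus_words = 600_000_000

print(f"{nyc_daily_words:,}")                           # 120,000,000,000
print(f"{spotify_corpus_words / nyc_daily_words:.1%}")  # 0.5%
```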
  Find out more about the dataset: 100,000 Podcasts: A Spoken English Document Corpus (ACL Anthology)
  Get the data via requesting via a form here (replies may take up to two weeks): Spotify Podcast Dataset (Spotify).

###################################################

Ousted facial recognition CEO returns to Kairos to work on bias:
…Brian Brackeen returns for “Kairos 3.0″…
Brian Brackeen, former CEO of facial recognition company Kairos, has returned to the company that let him go in 2018, to lead an advisory council focused on AI bias.

An ethical stance that led to an ousting: Back in mid-2018, Brackeen said he thought the use of facial recognition in law enforcement and government surveillance “is wrong – and that it opens the door for gross conduct by the morally corrupt” (Import AI 101). Brackeen backed up his comments by saying Kairos wouldn’t sell to these entities. By October of that year, Kairos had fired Brackeen and also sued him (Miami Herald). Now, the lawsuits have been settled in Brackeen’s favor, the board members and employees that fired him have left, and he is back to work on issues of AI bias.

A “Bias API”: Brackeen will help the company develop a “Bias API” which companies can use to understand and intervene on racial biases present in their algorithms. “This is Kairos 3.0”, Brackeen said.
  Read more: ‘This is Kairos 3.0’: Brian Brackeen returns to company to continue work on AI bias (Refresh Miami).

###################################################

Multilingual datasets have major problems and need to be inspected before being used to train something – researchers:
…Giant team looks at five major datasets, finds a range of errors with knock-on effects on translation and cultural relations writ large…
An interdisciplinary team of researchers has analyzed 230 languages across five massive multilingual datasets. The results? High-resource languages – that is, widely spoken and digitized languages such as English and German – tend to be of good quality, but low-resource languages tend to do poorly. Specifically, they find the poorest quality for African languages. They also find a lot of errors in datasets which consist of romanized script from languages commonly written in other scripts (e.g, Urdu, Hindi, Chinese, Bulgarian).

What they did: The researchers looked at five massive datasets – CCAligned, ParaCrawl v7.1, WikiMatrix, OSCAR, and mC4, then had 51 participants from the NLP community go through each dataset, sampling some sentences from the languages, and grading the data on quality. 

An error taxonomy for multilingual data: They encountered a few different error types, like Incorrect Translation (but the correct language), Wrong Language (where the source or target is mislabeled, e.g, English is tagged as German), and Non-Linguistic Content (where there’s non-linguistic content in either the source or target text). 

How bad are the errors: Across the datasets, the proportion of correct samples ranges from 24% (WikiMatrix) to 87% (OSCAR). Some of the errors get worse when you zoom in – CCAligned, for instance, contains 7 languages where 0% of the encountered sentences were labelled as correct, and 44 languages where less than 50% of them were labeled as such.
  Porn: >10% of the samples for 11 languages in CCAligned were labelled as porn (this problem didn’t really show up elsewhere).
  Standards and codes: There are other errors and inconsistencies across the datasets, which mostly come from them using incorrect labels for their language pairs, sometimes using sign language codes for high-resource languages (this was very puzzling to the researchers), or using a multitude of codes for the same language (e.g, Serbian, Croatian, Bosnian, and Serbo-Croatian all getting individual codes in the same dataset).

Why this matters: Multilingual datasets are going to be key inputs into translation systems and other AI tools that let us cross linguistic and cultural divides – so if these multilingual datasets have a bunch of problems with the more obscure and/or low-resource languages, there will be knock-on effects relating to communication, cultural representation, and more.
  “We encourage the community to continue to conduct such evaluations and audits of public datasets – similar to system comparison papers – which would help anyone interested in using these datasets for practical purposes,” the researchers write.
  Read more: Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets (arXiv).

###################################################

Supervillains, rejoice – you now have data to help you make a robotic cheetah:
…Finally, a solution to a problem everyone encounters…
For decades, AI researchers have looked to the natural world for inspiration. This is particularly true of locomotion, where our planet is full of creatures that hop, skip, jump, and sprint in ways we’d like our machines to emulate. Now, researchers with the South African National Research Foundation, University of Cape Town, University of Tsukuba in Japan, and École Polytechnique Fédérale de Lausanne in Switzerland, have recorded ten cheetahs running, so they can build a dataset of cheetah movement.

Why cheetahs are useful: Cheetahs are the fastest land mammal, so it could be useful to study how they run. Here, the researchers create a large-scale annotated dataset, consisting of ~120,000 frames of multi-camera-view high speed video footage of cheetahs sprinting, as well as 7588 hand-annotated images. Each annotated image is labeled with 20 key points on the cheetah (e.g, the location of the tip of the cheetah’s tail, its eyes, knees, spine, shoulders, etc). Combined, the dataset should make it easier for researchers to train models that can predict, capture, or simulate cheetah motion.
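  For a sense of what working with this kind of data looks like, here’s a hypothetical sketch of a single annotated frame; the field names are mine, not AcinoSet’s actual schema:

```python
# Hypothetical annotation record: 20 named 2D keypoints for one camera view.
frame_annotation = {
    "frame_id": 1042,
    "camera": 3,
    "keypoints": {
        "nose": (412.0, 233.5),
        "right_eye": (405.2, 221.8),
        "tail_tip": (98.7, 310.4),
        # ...17 more named keypoints (spine, shoulders, knees, etc.)
    },
}
```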
  Read more: AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild (arXiv).
  Get the data from here when it’s available (African Robotics Unit).

###################################################

Testing robots by putting them in a dreamworld:
…ThreeDWorld asks simulated robots to play virtual cleanup…
MIT and Stanford researchers have built ThreeDWorld, a software environment for testing out virtually embodied AI agents. They’re also hosting a challenge at this year’s CVPR conference to figure out how close – or far – we are from building AI systems that can autonomously navigate around simulated houses to find objects and bring them to a predetermined place. This is the kind of task that our AI systems will have to be able to solve, if we want to eventually get a home robot butler.

What’s ThreeDWorld like? You wake up in one of 15 houses. You’re a simulated robot with two arms, each capable of 9 degrees of freedom. You can move yourself around and you have a mission: find a vase, two bottles, and a jug, and bring them to the bedroom. Now, you explore the house, using your first person view to map out the rooms, identify objects, collect them, and move them to the bedroom. If you succeed, you get a point. If you fail, you don’t. At the end of your task, you disappear.
  ^ the above is a lightly dramatized robot-POV description of ThreeDWorld and the associated challenge. The simulation contains complex physics including collisions, and the software provides an API to AI agents. ThreeDWorld differs from other embodied robot challenges (like AI2’s ‘Thor’, #73, and VirtualHome) by modelling physics to a higher degree of fidelity, which makes the learning problem more challenging.

Reassuringly hard: Pure RL systems trained via PPO can’t easily solve this task. The authors develop a few other baselines that play around with different exploration policies, as well as a hierarchical AI system. Their results show that “there are no agents that can successfully transport all the target objects to the goal locations”, they write. Researchers, start your computers – it’s challenge time!
  Read more: The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI (arXiv).
  More information at the CVPR 2021 challenge website.
  Get the code for ThreeDWorld and the data for the challenge from here (GitHub).

###################################################

What can we learn from 209 robot delivery drone flights?
…We’re about to live in the era of low-flying robots, so we better understand them…
Right now, hundreds (and probably thousands) of different companies are using drones around the world to do increasingly complicated tasks. Many companies are working on package delivery, e.g, 7 of the 10 companies working with the US FAA to gain expanded drone licenses are working on some form of delivery, Import AI #225. So it’d be helpful to have more data about delivery drones and how they work in the (air) field.
  Enter researchers from Carnegie Mellon, the University of Pennsylvania, and Baden-Wuerttemberg Cooperative State University, who have recorded the location and electricity consumption of a DJI Matrice 100 quadcopter during 209 delivery flights, carried out in 2019.

What’s the data useful for? “The data available can be used to model the energy consumption of a small quadcopter drone, empirically fitting the results found or validating theoretical models. These data can also be used to assess the impacts and correlations among the variables presented and/or the estimation of non-measured parameters, such as drag coefficients”, the researchers write.
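  Here’s a minimal sketch of the kind of empirical fit the authors have in mind; the variable names are hypothetical stand-ins for fields in the released telemetry, not the dataset’s actual schema:

```python
import numpy as np

def fit_power_model(payload_kg, airspeed_ms, power_w):
    """Least-squares fit of power draw as a linear function of payload and
    airspeed -- one simple way to 'empirically fit' the telemetry. Variable
    names and units are assumptions, not the dataset's actual fields."""
    X = np.column_stack([np.ones_like(payload_kg), payload_kg, airspeed_ms])
    coeffs, *_ = np.linalg.lstsq(X, power_w, rcond=None)
    return coeffs  # [baseline watts, extra watts per kg, extra watts per m/s]

# Toy usage with made-up numbers, just to show the shape of the inputs.
payload = np.array([0.0, 0.25, 0.5, 0.75])
speed = np.array([4.0, 6.0, 8.0, 10.0])
power = np.array([210.0, 240.0, 275.0, 310.0])
print(fit_power_model(payload, speed, power))
```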
  Read more: In-flight positional and energy use data set of a DJI Matrice 100 quadcopter for small package delivery (arXiv).
  Get the drone flight telemetry data from here (Carnegie Mellon University).

###################################################

Tech Tales:
[Database on an archival asteroid, 3200 AD]

Energy is never cheap,
It always costs a little.

Thinking costs energy,
So does talking.

That’s why we’re quiet,
Because we’re saving it up.

^ Translated poem, told from one computational monolith to a (most translators agree there’s no decent English analog for the term) ‘child monolith’. Collected from REDACTED sector.

Import AI 241: The $2 million dataset; small GPT-3 replications; ImageNet gets a face-blur update

CUAD: A free $2 million legal dataset!
…Specific rather than general evaluation: okay, your model can understand language, but can it understand legal contracts?…
AI is moving from a technology of general, scientific interest, to one of broad commercial interest. Because of this, we’re seeing the way we evaluate AI change. Now, along with judging the performance of an AI system on a generic task (like classifying some images from ImageNet, or judging the quality of generative text outputs), we’re moving to evaluating performance on highly-specific tasks grounded in the real-world. This gives us a better understanding of where contemporary AI systems are strong and where they’re weak.
    One such specific evaluation comes to us from researchers at Berkeley and the Nueva School: CUAD, the Contract Understanding Atticus Dataset, is a dataset of legal contracts with expert annotations by lawyers. CUAD helps us test out how well AI systems can do on a specific, challenging task found in the real world.

What’s in CUAD? CUAD contains 500 contracts annotated with 13,000 expert annotations across 41 label categories. The dataset is meant to test how well AI systems can highlight the parts of a contract that are relevant to a given label – a task the authors compare to “finding needles in a haystack”.
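  One way to picture the task: treat each label as a question and run an extractive question-answering model over the contract. Here’s a minimal sketch using a generic SQuAD-style checkpoint for illustration (it has not been fine-tuned on CUAD, and long contracts would need to be chunked):

```python
from transformers import pipeline

# Extractive QA over a contract: the question stands in for a CUAD-style label.
# The checkpoint below is a generic public QA model, used purely as an example.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

contract_text = open("contract.txt").read()   # hypothetical local contract file
result = qa(
    question="What law governs this agreement?",
    context=contract_text,
)
print(result["answer"], round(result["score"], 3))
```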

The $2 million dataset: CUAD was built using a bunch of expert law student annotators who received 70-100 hours of contract review training before they started labeling, and each of their labels was validated by additional validators. Therefore, “a conservative estimate of the pecuniary value of CUAD is over $2 million (each of the 9283 pages were reviewed at least 4 times, each page requiring 5-10 minutes, assuming a rate of $500 per hour)”, the researchers note.
  Read more: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review (arXiv).
  Get the dataset: Contract Understanding Atticus Dataset (CUAD) from here (Atticus Project website).

###################################################

Interested in speech but hate stitching together software? SpeechBrain could be for you:
…PyTorch-based software simplifies a bunch of fiddly tasks…
Speech: it’s how most humans transmit most of their information. And in recent years, advances in AI have made speech recognition get significantly better and more efficient. But it’s still weirdly hard to use the full stack of speech capabilities – especially when we compare the usability of speech to things like text (where packages like HuggingFace’s ‘Transformers’ have made things relatively easy), or image recognition (where there are a ton of easy-to-use systems available).

Now, a team of researchers have built SpeechBrain, open source software “designed to be simple, extremely flexible, and user-friendly”, according to the website.

Key features: SpeechBrain ships with inbuilt models for speech recognition, speaker recognition, speech enhancement, speech processing (including multi-microphone processing), and a bunch of documentation and tools to aid researchers.
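  Getting a transcription out of a pretrained model is meant to take just a few lines; something like the sketch below, which follows SpeechBrain’s published examples (check the current docs for the exact class and model names):

```python
# Minimal sketch of SpeechBrain's pretrained speech recognition interface.
# Class and model identifier follow SpeechBrain's published examples; verify
# against the current documentation before relying on them.
from speechbrain.pretrained import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))  # path to a local audio file
```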
  Get the code: SpeechBrain – A PyTorch powered Speech Toolkit (official website).

###################################################

Does Google want to open source GPT3?
…Recent outputs by the ethical AI team suggest ‘no’, while Google’s TFRC suggests ‘yes’…
Google isn’t publicly replicating GPT3, the large-scale NLP model developed by OpenAI. And some parts of Google – most notably its ethical AI team, formerly led by Timnit Gebru and Meg Mitchell – have published research about the ethical and safety issues of language models like GPT3 and Google’s BERT.
    Yet, Google is supporting an open source replication of GPT3, because Google is supplying hundreds of thousands of dollars of compute per month via the Tensorflow Research Cloud (TFRC) to Eleuther, an AI organization whose goal is to replicate and release GPT3 (and even larger models). This is an action that neatly illustrates why AI policy is confusing and coordination (within companies or between them) is challenging.

GPT3-esque open source models: Eleuther has just published 1.3 billion- and 2.7 billion-parameter models designed to replicate GPT3, trained on ‘The Pile’, an 800GB dataset of text also developed by Eleuther. Eleuther trained these models using compute it accessed via the TFRC project (and TFRC understands that Eleuther’s goal is to replicate GPT-3).

Why is Jack writing about this? I got a bit of feedback from readers that some of this could be read as being an implicit judgement about who should/shouldn’t get access to compute – that’s not what I mean here. The way to read this is that I’m genuinely unsure if GPT-3 should or shouldn’t be replicated, I’m more concerned with the illegibility of one of the key organizations in the replication space – Eleuther is externally legible about its actions and ethos, and perhaps even TFRC is, but TFRC+Google is currently illegible – we don’t know who is making decisions, how the decisions interact with the rest of the organization, nor how TFRC may represent (or contradict) other policy and PR activities conducted by Google elsewhere.

Why this matters: Google’s actions here are confusing. On the one hand, the company publishes AI principles and periodically goes on publicity drives about ‘responsible AI’. On the other hand, Google is enabling the release of a class of models with some non-trivial ethical challenges via a process that lets it sidestep accountability. It’s hard for us to know what Google believes as an institution, here.

Factories are opinions: Right now, it’s as though Google has specific opinions about the products (software) it makes in its factories (datacenters), yet at the same time is providing unrestricted access to its factories (datacenters) to external organizations. It’d be interesting to understand the thinking here – does TFRC become the means by which Google allows open source models to come into existence without needing to state whether it has chosen to ‘release’ these models?
  Get the GPT-3 model code here (Eleuther GitHub).
  More information about TFRC+Eleuther here (Eleuther member, Stella Rose, Twitter).

###################################################

ImageNet: Now sanitised with blurred faces:
…As AI industrializes, datasets get cleaned up…
ImageNet, one of the most widely used datasets in machine learning, has been sanitised. Specifically, a team of researchers at Princeton and Stanford University have gone through the multi-million picture dataset and tried to blur the faces of every human within ImageNet. They call this “an attempt to mitigate ILSVRC’s privacy issues”. The paper is also notable because of the authors – Fei-Fei Li led the creation of the original dataset and is listed as an author.

What they did: The authors use Amazon’s ‘Rekognition’ service on all images in ILSVRC to find faces, then refine these results through human annotation via Amazon Mechanical Turk. They then blur the identified faces.

What effect does it have? Blurring means you remove information that was present in the image. Therefore, though only 3 of ImageNet’s categories relate to people, we might expect the blurring to lead to a reduction in the utility of the overall dataset. This seems to be the case: in tests, systems trained on the ‘blurred’ version of ImageNet do about 0.5 absolute points worse than the non-blurred versions. This is actually pretty good – it’s a negligible reduction in accuracy for a privacy bonus. Some categories do get affected more severely – specifically, the ‘mask’ and ‘harmonica’ categories now seem to do worse “as obfuscation removes visual cues necessary for recognizing them”.
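  The blurring step itself is simple once you have face boxes; here’s a minimal OpenCV sketch (the detection and human verification, which is where the real work happens, are assumed to have been done already, and the exact blur used in the paper may differ):

```python
import cv2

def blur_faces(image_path, face_boxes, out_path):
    """Blur the given (x, y, w, h) regions of an image and leave the rest
    intact. Illustrative post-detection step, not the paper's exact method."""
    img = cv2.imread(image_path)
    for (x, y, w, h) in face_boxes:
        roi = img[y:y + h, x:x + w]
        # A heavy Gaussian blur destroys identifying detail inside the box.
        img[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    cv2.imwrite(out_path, img)

# Example usage with one hypothetical detection box.
blur_faces("photo.jpg", [(120, 60, 80, 80)], "photo_blurred.jpg")
```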

Who gets to be a scholar? This paper has attracted some controversy because of its relationship (or lack thereof) to earlier work done by Vinay Prabhu and Abeba Birhane, who in June of last year wrote a paper about the challenges created by large-scale datasets such as ImageNet – the face-blurring paper doesn’t mention much of this work. Prabhu says, in a blog post, the paper “appears to be a calculated and systematic erasure of the entire body of critique that our work was part of”.
  There’s some apparent merit to this case – Prabhu said they carried out a live Q&A with Fei-Fei Li about some of the issues with computer vision subsequently covered in their work. It’s not clear to me what the precise mechanics of this situation are, but the significant amount of public evidence here makes it feel worth mentioning. (One of the things I take from all of this is that the AI space may be starting to fracture into different research communities, with this incident seeming to indicate a rift forming between some researchers. We saw similar patterns with the Timnit Gebru and Margaret Mitchell situations at Google recently, as well.)

Why this matters: Today, the datasets used to train AI are broadly unknown, undocumented, and unregulated. In the future, like any key input to any important industrial process, we can expect datasets to be known, documented, and regulated. Techniques like applying blurring to faces post-dataset-construction are useful to work on, because they give us a path for converting today’s datasets into ones better suited to that regulatory future. It also raises issues of dataset circulation – now that there’s an official, blurred-face ImageNet, where will the unblurred ‘black market ImageNet’ circulate and who might use it?
  Read more: A Study of Face Obfuscation in ImageNet (arXiv).
  Get the code here (ImageNet Face Obfuscation, GitHub).
  Read more: A study of “A Study of Face Obfuscation in ImageNet” (Vinay Prabhu, blog).

###################################################

Now that reinforcement learning works, what will it do to the world?
…Researchers grapple with the societal implications of (semi-)autonomous agents…
Recently, reinforcement learning has started to work well enough to be applied to large, consequential deployments; RL-infused systems help create recommendation algorithms for social media, calibrate the power usage of equipment in Google’s datacenters, and are starting to teach robots how to move.
  Now, researchers with the Leverhulme Center for the Future of Intelligence in Cambridge, and Imperial College London, have written a paper analyzing the societal impacts of deep reinforcement learning. Their conclusion? We need to spend a bit more time thinking about the implications of these systems and coming up with effective oversight schemes to control them. “As more companies develop and deploy DRL systems with wide-ranging impacts on users, we must consider both how to ensure that these systems behave as intended over the long-term, and whose interests they are serving,” they write.

What should we do about RL systems?
As reinforcement learning systems get better, they’re going to be deployed more widely, which means they’ll continually explore a broader range of environments. This is mostly going to be good, but we’ll need to ensure we have adequate human oversight to stop them taking dangerous actions in high risk situations. We’ll also need to closely observe RL-trained systems’ behavior, so we can be confident that their reward function doesn’t lead to pathological breakdowns.

Policy suggestions: One practical recommendation by the researchers is to “find ways to track progress in DRL and its applications” – I think this is a great idea! Something I’ve spent a few years doing at the AI Index is regularly tracking and analyzing technical progress. It’s been surprisingly difficult to do this on RL because, after a blissful few years in which most people competed with each other on the Atari-57 set of games, people are now testing RL in dissimilar, hard-to-compare environments. They also suggest researchers develop “notions of responsible DRL development” – by this, they basically mean splicing technical teams together with ethicists and safety-oriented people.
  Read more: The Societal Implications of Deep Reinforcement Learning (JAIR).

###################################################

800GB of cleaned, Common Crawl text:
…Fresh, processed data for researchers on a budget…
The Allen Institute for Artificial Intelligence (AI2) has published C4, a dataset of 800GB of cleaned English text data (along with a 6.3TB uncleaned variant). C4 is a massive dataset which was originally developed by Google to train its ‘T5’ natural language processing system.
  AI2 has uploaded the data into a requester-pays bucket in Google storage, which means the whole dataset will cost about $100 to download. By processing and uploading the datasets, AI2 has helped create a common-good dataset that would otherwise have been replicated privately by researchers around the world.
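
If you want to grab it yourself, here’s a rough sketch of pulling objects from a requester-pays bucket with the google-cloud-storage Python client; the bucket name and prefix below are placeholders (check the C4 repo for the real paths), and the egress gets billed to the project you supply:
```python
# Sketch: download files from a requester-pays Google Cloud Storage bucket.
# BUCKET_NAME and the "en/" prefix are hypothetical placeholders, not the
# real C4 paths; downloads are billed to BILLING_PROJECT.
from google.cloud import storage

BILLING_PROJECT = "your-gcp-project"   # the project that pays (~$100 for the full dataset)
BUCKET_NAME = "c4-bucket-placeholder"  # hypothetical; see the AI2 repo for the real bucket

client = storage.Client(project=BILLING_PROJECT)
bucket = client.bucket(BUCKET_NAME, user_project=BILLING_PROJECT)

for blob in bucket.list_blobs(prefix="en/"):
    blob.download_to_filename(blob.name.replace("/", "_"))
```
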
  Get the dataset here: Download the C4 dataset (GitHub, AI2 repo).
  More about the dataset here: C4 (TensorFlow website).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

A primer on AI safety:
DC thinktank CSET has released a 3-part primer on AI safety, offering a non-technical summary of the key problems and approaches. CSET uses a framework from DeepMind to split safety into three components:
– Knowing that an AI system will perform reliably in a diverse range of environments not encountered during training (robustness)
– Being able to understand why it behaves the way it does, and whether it will adhere to our expectations (assurance)
– Knowing how to specify its goals such that the goals align with the behavior we want it to manifest (specification).
  Read more: (1) Key Concepts in AI Safety: Overview; (2) Robustness and adversarial examples; (3) Interpretability.

——–

ARIA — the UK’s answer to DARPA

The UK is launching an agency to fund “high-risk, high-reward” research in emerging technologies, modelled on the US’ DARPA program. The Advanced Research & Invention Agency (ARIA) will be led by a small group of experts, and will operate independently from the government. It has been given initial funding of £800m over four years. It is hoped that ARIA will be able to deliver funding to researchers with flexibility and speed; without unnecessary bureaucracy; and with a high tolerance for failure. ARIA is the brainchild of Dominic Cummings, who has long advocated for a DARPA-esque agency for the UK. 

   Read more: Gov press release

   Read more: Why Dominic Cummings fears the £800m research agency he championed will fail (NS)


###################################################

Tech Tales:

The 10,000 Faces of Confrontation
[A ‘young professional’ style apartment, up on the tenth to twentieth floor, in some part of the hot tech economy – San Francisco, Singapore, London, or wherever]

You stare into the SmartMirror and it’s not your face looking back at you, it’s the face of a boss who has been putting you through hell. You yell at the boss. Tell them your feelings. The boss looks hurt. They don’t apologize – you pre-programmed that ‘bias against yielding’ into the system – but you feel some catharsis at getting something of a rise out of them.

Each day, you have a conversation with a different person. You have your favorites, of course. Like the boss or the girlfriend or – of course – your mother and father. But there are other characters that you’re developing as well – a restaurant server who, you think, subtly insulted you. A celebrity whose adverts you have taken a dislike to.

The next day you stare into the SmartMirror and you make your girlfriend appear. You tell them you are disappointed in how they behaved last night. You explain you’re hurt by them. They try to explain themselves, but it’s just a language model taking your conversation and combining it with a response primed around being ‘conciliatory’. You tell them their excuses are not going to cut it.

The day after that, your SmartMirror “suggests” someone for you to talk to. An old friend of yours. “We believe this avatar will inspire a significant emotional response,” says the accompanying note. “We have determined that a significant emotional response interaction might help you”.

Things that inspired this story: Progress in multimodal learning; deepfakes and associated technologies; thinking about a ‘psychological tonal’; the general tendency of AI+Capitalism to lead to extraneous attempts at providing recommendations for the edges of life.

Import AI 240: The unbeatable MATH benchmark; an autonomous river boat dataset; robots for construction sites

Here’s another benchmark your puny models can’t solve – MATH!
…One area where just scaling things up doesn’t help…
SQuAD. SQuAD2. GLUE. SuperGLUE. All these benchmarks have melted in time, like hyperparameter tears in the rain, due to the onslaught of new, powerful AI models. So with a mixture of trepidation and relief let’s introduce MATH, a dataset of math problems that contemporary Transformer-based models can’t solve.

What’s MATH? MATH was made by researchers at UC Berkeley and consists of 12,500 problems taken from high school math competitions. The problems have five difficulty levels and cover seven subjects, including geometry. MATH questions are open-ended, mixing natural language and math across their problem statements and solutions. One example MATH question: “Tom has a red marble, a green marble, a blue marble, and three identical yellow marbles. How many different groups of two marbles can Tom choose?”
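
(For that example question, the answer is 7: six two-color pairs plus the yellow-yellow pair, which a brute-force check confirms.)
```python
# Sketch: sanity-check the example MATH problem above by enumerating all
# unordered pairs of marbles and counting the distinct ones. Not part of
# the benchmark; just a worked check of the arithmetic.
from itertools import combinations

marbles = ["red", "green", "blue", "yellow", "yellow", "yellow"]
groups = {tuple(sorted(pair)) for pair in combinations(marbles, 2)}
print(len(groups))  # 7 distinct groups of two marbles
```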

Bonus dataset: AMPS: Along with MATH, the authors have also built the Auxiliary Mathematics Problems and Solutions (AMPS) pre-training corpus, a 23GB data repository made up of ~100,000 Khan Academy problems with step-by-step solutions written in LaTeX, as well as 5 million problems generated using Mathematica scripts.

Why this matters: Current AI systems can’t solve MATH: The best part about MATH is that it’s unbelievably difficult. GPT-2 models get, at best, an average of 6.9% accuracy on the dataset (even in the most lenient human school, such a score would get an F), while GPT-3 models (which are larger than GPT-2 ones) seem to do meaningfully better than their GPT-2 forebears on some tasks and worse on others. This is good news: we’ve found a test that large-scale Transformer models can’t solve. Even better – we’re a long, long way from solving it. 
  Read more: Measuring Mathematical Problem Solving with the MATH Dataset (arXiv).
  Get the code from GitHub here.

###################################################

Want a pony that looks like Elvis? We can do that:
…Machine learning systems can do style generalization…
Here’s a fun Twitter thread where someone combines the multimodal CLIP system with StyleGAN, using a dataset from This Pony Does Not Exist (an infinite sea of GAN-generated My Little Ponies) [note: some chance of NSFW-ish generations]. Good examples include pony versions of Billie Eilish, Beyonce, and Justin Bieber.
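
The underlying recipe is simple to sketch: freeze CLIP, then optimize a GAN latent so the generated image scores highly against a text prompt. Here’s a heavily simplified illustration of that loop; the generator below is a tiny stand-in for the actual pretrained pony StyleGAN, and CLIP’s usual image normalization is skipped:
```python
# Sketch: CLIP-guided latent optimization, the basic idea behind steering a
# GAN with text. The "generator" is a toy stand-in for a pretrained StyleGAN;
# everything runs on CPU for simplicity.
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package

clip_model, _ = clip.load("ViT-B/32", device="cpu")
text_feat = clip_model.encode_text(clip.tokenize(["a pony that looks like Elvis"])).detach()

# Toy differentiable "generator": latent vector -> 64x64 RGB image in [0, 1].
generator = torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64), torch.nn.Sigmoid())
for p in generator.parameters():
    p.requires_grad_(False)  # the generator stays frozen; only the latent moves

latent = torch.randn(1, 512, requires_grad=True)
opt = torch.optim.Adam([latent], lr=0.05)

for step in range(200):
    image = generator(latent).view(1, 3, 64, 64)
    image = F.interpolate(image, size=224)                    # CLIP expects 224x224 inputs
    img_feat = clip_model.encode_image(image)
    loss = -F.cosine_similarity(img_feat, text_feat).mean()   # pull the image towards the prompt
    opt.zero_grad(); loss.backward(); opt.step()
```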

Why this matters: In the same way AI can generate different genres of text, ranging from gothic fiction to romantic poetry, we’re seeing evidence the same kinds of generative capabilities work for imagery as well. And, just as with text, we’re able to mix and match these different genres to generate synthetic outputs that feel novel. The 21st century will be reshaped by the arrival of endless, generative and recombinative media.
  Check out the Twitter thread of generations here (Metasemantic’s Twitter thread).

###################################################

AI Index 2021: AI has industrialized. Now what?
…Diversity data is still scarce, it’s hard to model ethical aspects over time, and more…
The AI Index, an annual project to assess and measure AI progress, has published its fourth edition. (I co-chaired this year’s report and spent a lot of time working on it, so if you have questions, feel free to email me).
  This year’s ~200-page report includes analysis of some of the big technical performance trends of recent years, bibliometric analysis about the state of AI research in 2020, information about national investments into AI being made by governments, and data about the diversity of AI researchers present in university faculty (not good) and graduating PhDs (also not good). Other takeaways include data relating to the breakneck rates of improvement in AI research and deployment (e.g, the cost to train an ImageNet model on a public cloud has fallen from ~$2000 in 2017 to $7.43 last year), as well as signs of increasing investment into AI applications, beyond pure AI research.

Ethics data – and the difficulty of gathering it: One thing that stuck out to me about the report is the difficulty of measuring and assessing ethical dimensions of AI deployment – specifically, many assessments of AI technologies use one-off analysis for things like interrogating the biases of the model, and few standard tests exist (let’s put aside, for a moment, the inherent difficulty of building ‘standard’ tests for something as complex as bias).

What next? The purpose of the AI Index is to prototype better ways to assess and measure AI and the impact of AI on society. My hope is that in a few years governments will invest in tech assessment initiatives and will be able to use the AI Index as one bit of evidence to inform that process. If we get better at tracking and analyzing the pace of progress in artificial intelligence, we’ll be able to deal with some of the information asymmetries that have emerged between the private sector and the rest of society; this transparency should help develop better norms among the broader AI community.
  Read the 2021 AI Index here (AI Index website).
  Read more about the report here: The 2021 AI Index: Major Growth Despite the Pandemic (Stanford HAI blog).

###################################################

Want to train an autonomous river boat? This dataset might help:
…Chinese startup Orca Tech scans waterways with a robot boat, then releases data…
AI-infused robots are hard. That’s a topic we cover a lot here at Import AI. But some types of robot are easier than others. Take drones, for instance – easy! They move around in a broadly uncontested environment (the air) and don’t need many smart algorithms to do useful stuff. Oceangoing ships are similar (e.g, Saildrone). But what about water-based robots for congested, inland waterways? Turns out, these are difficult to build, according to Chinese startup Orca Tech, which has published a dataset meant to make it easier for people to add AI to these machines.

Why inland waterways are hard for robots: “Global positioning system (GPS) signals are sometimes attenuated due to the occlusion of riparian vegetation, bridges, and urban settlements,” the Orca Tech authors write. “In this case, to achieve reliable navigation in inland waterways, accurate and real-time localization relies on the estimation of the vehicle’s relative location to the surrounding environment”.

The dataset: USVInland is a dataset of inland waterways in China “collected under a variety of weather conditions” via a little robotic boat. The dataset contains information from stereo cameras, a lidar system, GPS antennas, inertial measurement units (IMUs), and three millimeter-wave radars. The dataset was recorded from May to August 2020 and the data covers a trajectory of more than 26km. It contains 27 continuous raw sequences collected under different weather conditions.

Why this matters: The authors tested out some typical deep learning-based approaches on the dataset and saw that they struggled to obtain good performance. USVInland is meant to spur others to explore whether DL algorithms can handle some of the perception challenges involved in navigating waterways.
  Read more: Are We Ready for Unmanned Surface Vehicles in Inland Waterways? The USVInland Multisensor Dataset and Benchmark (arXiv).
  Get the data from here (Orca Tech website).

###################################################

Hackers breach live feeds of 150,000 surveillance cameras:
…Now imagine what happens if they combine that data with AI…
A group of hackers have gained access to live feeds of 150,000 surveillance cameras, according to Bloomberg News. The breach is notable for its scale and the businesses it compromised, which included hospitals, a Tesla warehouse, and the Sandy Hook Elementary School in Connecticut.
  The hack is also significant because of the hypothetical possibilities implied by combining this data with AI – allow me to speculate: imagine what you could do with this data if you subsequently applied facial recognition algorithms to it and mixed in techniques for re-identification, letting you chart the movements of people over time, and identify people they mix with who aren’t in your database. Chilling.
  Read more: Hackers Breach Thousands of Security Cameras, Exposing Tesla, Jails, Hospitals (Bloomberg).

###################################################

Why your next construction site could be cleaned by AI:
…Real-world AI robots: Japan edition…
AI startup Preferred Networks and construction company Kajima Corporation have built ‘iNoh’, software that powers autonomous cleaning robots. iNoh uses multiple sensors, including LIDAR, to do real-time simultaneous localization and mapping (SLAM) – this lets the robot know roughly where it is within the building. It pairs this with a deep learning-based computer vision system which “robustly and accurately recognizes obstacles, moving vehicles, no-entry zones and workers”, according to the companies. The robot uses its SLAM capability to help it build its own routes around a building in real-time, and its CV system stops it getting into trouble.

Why care about Preferred Networks: Preferred Networks, or PFN, is a Japanese AI startup we’ve been tracking for a while. The company started out doing reinforcement learning for robots, set a new ImageNet training-speed record in 2017 (Import AI 69) and has been doing advanced research collaborations on areas like meta-learning (Import AI 113). This is a slightly long-winded way to say: PFN has some credible AI researchers and is generally trying to do hard things. Therefore, it’s cool to see the company apply its technology in a challenging, open-ended domain, like construction.

PyTorch++: PFN switched away from developing its own AI framework (Chainer) to PyTorch in late 2019.
  Read more: Kajima and PFN Develop Autonomous Navigation System for Construction Site Robots (Preferred Networks).
  Watch a (Japanese) video about iNoh here (YouTube).

###################################################

At last, 20 million real network logs, courtesy of Taiwan:
…See if your AI can spot anomalies in this…
Researchers with the National Yang Ming Chiao Tung University in Taiwan have created ZYELL-NCTU NetTraffic-1.0, a dataset of logs from real networks. Datasets like this are rare and useful, because the data they contain is inherently temporal (good! difficult!) and comes in an inexpensive form (text strings are way cheaper to process than, say, the individual stills in a video, or slices of audio waveforms).

What is the dataset: ZYELL-NCTU NetTraffic-1.0 was collected from the outputs of firewalls in real, deployed networks of the telco ‘ZYELL’. It consists of around 22.5 million logs and includes (artificially induced) examples of probe-response and DDoS attacks taking place on the network.

Why this matters: It’s an open question whether modern AI techniques can do effective malicious anomaly detection on network logs; datasets like this will help us understand how tractable the problem is.
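
To give a flavor of what that looks like in practice, here’s a toy sketch of outlier detection over hand-made per-connection features; this is a generic illustration of the approach, not the paper’s method, and the features and values are invented:
```python
# Sketch: flag anomalous firewall log entries by converting each connection
# into numeric features and fitting an outlier detector. The feature choices
# and example rows are made up for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-connection features: duration (s), bytes sent, bytes received, dest port.
logs = np.array([
    [0.2, 1_200, 3_400, 443],
    [0.3, 900, 2_800, 443],
    [0.1, 1_500, 4_100, 80],
    [45.0, 9_800_000, 120, 6667],  # long, high-volume connection to an odd port
])

detector = IsolationForest(contamination=0.25, random_state=0).fit(logs)
print(detector.predict(logs))  # -1 marks suspected anomalies, 1 marks normal traffic
```
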
  Read more: ZYELL-NCTU NetTraffic-1.0: A Large-Scale Dataset for Real-World Network Anomaly Detection (arXiv).
  Where to (maybe) get the dataset: Use the official website, though it’s not clear precisely how to access it.

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

CSET’s Jason Matheny joins Biden Administration
Jason Matheny, founding director at Georgetown’s influential ‘CSET’ thinktank, is taking on three senior roles “at the intersection of technology and national security”: deputy assistant to the President for technology and national security; deputy director for national security in the OSTP; and coordinator for technology and national security at the National Security Council, per FedScoop. Previously, Matheny was director at IARPA, where, among other things, he spearheaded the forecasting program that incubated Tetlock’s influential superforecasting research.
Read more: Jason Matheny to serve Biden White House in national security and tech roles (FedScoop).

Podcast: Brian Christian on AI alignment:
Brian Christian is interviewed by Rob Wiblin on the 80,000 Hours podcast, about his book, The Alignment Problem (covered in Import #221), and lots else. It’s an awesome interview, which manages to be even more wide-ranging than the book — I strongly recommend both.
Podcast and transcript: Brian Christian on the alignment problem (80,000 Hours podcast).

Minor correction:
Last week I wrote that the NSCAI’s report suggested a $32bn investment in the domestic semiconductor industry over the next five years; the correct figure is $35bn.

###################################################

Tech Tales:

Tell me the weight of the feather and you will be ready
[A large-scale AI training infrastructure, 2026]

When you can tell me precisely where the feather will land, you will be released, said the evaluator.
‘Easy’, thought the baby artificial intelligence. ‘I predict a high probability of success’.

And then the baby AI marked the spot on the ground where it thought the feather would land, then told its evaluator to drop the feather. The feather started to fall and, buffeted by invisible currents in the air and their interplay with the barbs and vanes of the feather itself, landed quite far from where the baby AI had predicted.

Shall we try again? asked the evaluator.
‘Yes,’ said the baby. ‘Let me try again’.

And then the baby AI made 99 more predictions. At its hundredth, the evaluator gave it its aggregate performance statistics.
  ‘My predictions are not sufficiently accurate,’ said the baby AI.
  Correct, said the evaluator. Then the evaluator cast a spell that put the baby AI to sleep.
In the dreams of the baby AI, it watched gigantic feathers made of stone drop like anvils into the ground, and tiny, impossibly thin feathers made of aerogel that seemed to barely land. It dreamed of feathers falling in rain and in snow and in ice. It dreamed of feathers that fell upward, just to know what a ‘wrong’ fall might look like. 

When the baby woke up, its evaluator was there.
Shall we go again, said the evaluator.
‘Yes,’ said the baby, its neurons lighting up in predictive anticipation of the task, ‘show me the feather and let me tell you where it will land’.
And then there was a feather. And another prediction. And another comment from its evaluator.

In the night, the baby saw even more fantastic feathers than the night before. Feathers that passed through hard surfaces. Feathers which were on fire, or wet, or frozen. Sometimes, multiple feathers at once.

Eventually, the baby was able to roughly predict where the feather would fall.
We think you are ready, said the evaluator to the baby.
Ready for what? said the baby.
Other feathers, said the evaluator. Ones we cannot imagine.
‘Will I be ready?’ said the baby.
That’s what this has been for, said the evaluator. We believe you are.
And then the baby was released, into a reality that the evaluator could not imagine or perceive.

Somewhere, a programmer woke up. Made coffee. Went to their desk. Checked a screen: `feather_fall_pred_domain_rand_X100 complete`.

Things that inspired this story: Domain randomization; ancient tales of mentors and mentees; ideas about what it means to truly know reality 

Import AI 239: China trains a massive 10b model, Vicarious does pick&place; the GCHQ publishes some of its thoughts on AI

China trains a 10 billion parameter multimodal network… using NVIDIA’s code:
…Chinese entities train a decent 10 billion parameter multi-modal model…
A hybrid team of researchers from Alibaba and Tsinghua University have built M6, a “Multi-Modality to Multi-Modality Multitask Mega-transformer”. M6 is a multi-modal model trained on a huge corpus of text and image data, including image-text pairs (similar to recent systems like OpenAI’s CLIP). M6 has a broad capability surface and, because of how it was trained, you can use M6 to search for images with text (and vice versa), generate media in different modalities, match images together, write poems, answer questions, and so on.

Data: ~60 million images (with accompanying text pairs) totalling 1.9 terabytes (almost twice the raw size of ImageNet), plus 292GB of text.

Facts and figures: Though the authors say they’ve trained both a 10 billion and a 100 billion parameter model, they mostly report performance statistics for the 10 billion one. The 100b is a mixture-of-experts model, while the 10b is based on NVIDIA’s Megatron-LM training code (Import AI 218). The model’s size and sophistication are notable – this feels like a symptom of the maturing capabilities of various Chinese AI organizations. I wonder when we’ll get an M6-scale system from people affiliated with India, or regions like Europe or Africa.

Why this matters: M6 is notable for being a non-English model at equivalent scale to some of the largest primarily-English ones. We’re entering an era where there will be multiple, gigantic AI models, magnifying and minimizing different cultures with variations stemming from the organizations that trained them. It’s also interesting to consider how these models proliferate, and who will get access to them. Will students and researchers at Tsinghua get access to M6, or just Alibaba’s researchers, or both? And how might access schemes develop in other countries, as well?
…Finally, a word about bias: There’s no discussion of bias in the paper (or ethics), which isn’t typical for papers of this type but is typical of papers that come out of Chinese research organizations. If you’ve got counterexamples, please send them to me!
  Read more: M6: A Chinese Multimodal Pretrainer (arXiv).

###################################################

Facebook doesn’t even need labels to train its vision systems anymore (just your Instagram data):
…Self-supervised learning, at sufficient scale, might get us few-shot learning for free as well…
Self-supervised pre-training: Facebook’s new system, SEER, learns via a self-supervised method called SwAV, which lets it look at unannotated images and, given enough scale, derive features from them and cluster them itself. They train using a family of models called RegNets. The magic of this method comes from the data they use: a billion pictures from Instagram (though they note in the paper these are “non-EU” images, likely due to GDPR compliance).
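
For intuition, here’s a heavily simplified sketch of the swapped-prediction idea behind SwAV: two augmented views of the same images are softly assigned to a set of learned prototypes, and each view has to predict the other view’s assignment. This is an illustration only (with a toy stand-in encoder and made-up sizes), not Facebook’s SEER training code:
```python
# Sketch: SwAV-style swapped-prediction loss with a toy encoder.
import torch
import torch.nn.functional as F

def sinkhorn(scores, eps=0.05, iters=3):
    """Turn prototype scores into approximately balanced soft assignments."""
    Q = torch.exp(scores / eps).T          # prototypes x batch
    Q = Q / Q.sum()
    K, B = Q.shape
    for _ in range(iters):
        Q = Q / Q.sum(dim=1, keepdim=True) / K   # normalize over prototypes
        Q = Q / Q.sum(dim=0, keepdim=True) / B   # normalize over samples
    return (Q * B).T                        # batch x prototypes

def swav_loss(encoder, prototypes, view1, view2, temp=0.1):
    z1 = F.normalize(encoder(view1), dim=1)
    z2 = F.normalize(encoder(view2), dim=1)
    p = F.normalize(prototypes, dim=1)
    s1, s2 = z1 @ p.T, z2 @ p.T             # similarity of each view to each prototype
    with torch.no_grad():
        q1, q2 = sinkhorn(s1), sinkhorn(s2)  # cluster assignments ("codes")
    # Each view predicts the *other* view's code.
    return (-(q2 * F.log_softmax(s1 / temp, dim=1)).sum(dim=1).mean()
            - (q1 * F.log_softmax(s2 / temp, dim=1)).sum(dim=1).mean())

# Toy usage: stand-in encoder, 300 prototypes, random "augmented views".
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
prototypes = torch.nn.Parameter(torch.randn(300, 128))
v1, v2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
loss = swav_loss(encoder, prototypes, v1, v2)
loss.backward()
```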

Results: The best version of SEER gets 84.2% top-1 ImageNet accuracy, nicely improving on other self-supervised approaches. (Though there’s still a ways to go before these techniques match supervised methods, which are now getting around ~90% top-1 accuracy).

Few shot learning, meet image recognition: SEER gets 77.9% top-1 accuracy on ImageNet after only seeing 10% of the images – suggesting that SEER can do a kind of few-shot learning, where by providing it with some data from a new domain it quickly adjusts itself to obtain reasonable performance. (Though several tens of thousands of images is quite different to the few sentences of text it takes to do few-shot learning in the text regime.)

Why this matters: SEER is relatively simple, as is the network architecture they use. The amazing capabilities we see here (including the few-shot learning) primarily come from the scale of the datasets which are used, combined with the intentionally naive unlabelled training approach. “This result confirm that the recent progress of self-supervised learning is not specific to curated training set, like ImageNet, and could benefit a large range of applications associated with uncurated data,” they write.
  Read more: Self-supervised Pretraining of Visual Features in the Wild (arXiv).

###################################################

What does the UK’s NSA think about AI?
…Position paper hints at focus areas, discusses ethical issues, even IDs the elephant in the room…
The UK’s spy agency, GCHQ, has published a paper about how it hopes to use AI. This is notable; spy agencies rarely discuss frontier technologies. (Though don’t get too excited – the memo is unsurprisingly light on technical details.)

What information does the paper contain? GCHQ shares some thoughts on how it might use AI to aid some of its missions. These include:

  • AI for cyber threats: Use AI to identify malicious software, and also potentially to trace its distribution. 
  • AI for online safety for children: Use AI to identify online behaviors that look like adults ‘grooming’ kids for sexual exploitation, and use AI to analyze images found in the course of these investigations. (No mention, unlike the Germans (Import AI 234), of using AI to generate sexual imagery to help trap abusers.) 
  • AI for human trafficking: Use AI to map out the human networks that enable trafficking, and use AI to sift through vast amounts of financial data to find connections. 
  • AI for foreign state disinformation: Use AI to do fact-checking and detect synthetically generated content (e.g, deepfakes). Also, use AI to automatically identify and block botnets that use machine-generated accounts. 

What does GCHQ think are the major AI ethics challenges? Fairness and bias is listed as one major challenge. GCHQ also lists ’empowerment’ – which it defines as figuring out how much freedom to give the AI systems themselves. GCHQ thinks AI is best used in partnership with humans: the AI comes up with answers and insights, then human experts use this to authorize or carry out actions.

AI policy is national security policy: In recent years, we’ve seen a vast migration of technology people moving from academia into industry, partially in response to skyrocketing salaries. This poses a challenge to modern spy agencies – government has a hard time paying as much as Google or Facebook, but it needs a similar caliber of talent to achieve its objectives. GCHQ says part of why it has written the paper is because of this new reality. “Most investment in the UK continues to come from the private sector rather than government and this is expected to continue,” the agency writes. “It is therefore unsurprising that GCHQ is now engaging more broadly with wider society and industry than at any other time in its history. We have much to learn from the exponential growth of AI in the outside world, and believe our specialists also have much to contribute.”
  Read more: Pioneering a New National Security, the Ethics of Artificial Intelligence (GCHQ, PDF).

###################################################

Google’s latest speech compression tech tells us that production AI is hybrid AI:
…End-to-end learning is nice, but the best things happen when you combine expertise…
Google has made Lyra, a more efficient speech codec. Lyra wraps in some recent ML advancements; it works by extracting features from input speech, quantizing them, then using a generative model to reinflate those features into output speech.

Good speech with less data: Lyra is designed to operate with audio streams of as little as 3kbps – here, it does better than other codecs and compares favorably with Opus, an established speech codec. Lyra is notable because it smooshes together expert-derived stuff (which would be some of the traditional codec techniques used here) with a strategic use of a generative model and gets great performance and useful efficiency gains.

Fairness & ML: “We’ve trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verifying the audio quality with expert and crowdsourced listeners. One of the design goals of Lyra is to ensure universally accessible high-quality audio experiences,” the company writes.

Why this matters: AI is going to be everywhere. And it’s going to be everywhere in a Lyra-like manner – as a discrete, smart component within a larger technical stack. We’re also going to see people use more generative models to distill and reinflate representations of reality – we’re entering the dumb ‘brain in a jar’ phase of AI deployment.
  Read more: Lyra: A New Very Low-Bitrate Codec for Speech Compression (Google blog).
  Read more: Generative Speech Coding with Predictive Variance Regularization (arXiv).

###################################################

AI developer: I’m afraid of what happens if my code gets released:
…One lens on the ethics of open vs closed-source…
Is it safer for an AI system to be open source or for it to be controlled by a small set of actors? Generally, the technology community has leaned towards stuff being open source by default, but in recent years people have been experimenting with the latter. This has happened with various types of synthetic media, like language models that haven’t been fully released (e.g, NVIDIA’s Megatron LMs, and GPT-2 [at first]), or various papers on synthetic media where the researchers don’t release the models. Now, a VP at the AI faceswap app Reface has written a post laying out how he thinks about the release of certain AI technologies. His post is about AI body swapping – that is, taking one person’s face and potentially body and morphing it onto someone else in a video.

Demos get shady attention: “Only after I published a [AI body swap] demo in August 2020 and different shady organizations started approaching me, I realized that AI-body-swap is a bomb. A bomb in both senses – as a technological breakthrough and as something dangerous if its code gets into the wrong hands,” he writes. “A team of high-class ML-pros would find a way around my code in about a week. In roughly six months, they’d have a production-grade full-body swap technology.”

Why this matters: “We need to make a pact, a deal that all the companies that create synthetic media must include watermarks, footprints, or provide other detectors for identifying it,” he writes. A cynical person might say ‘business guy writes article about why his business-benefiting strategy is good, lol go figure’. There’s some merit to that. But a few years ago articles like this were a lot rarer – the AI community does seem to be becoming genuinely concerned about the consequences of its actions.
  Read more: The Implications of Open-Source AI: Should You Release Your AI Source (Hackernoon).

###################################################

Proof that robots are getting smarter: GreyOrange partners with AI startup Vicarious:
…Maybe AI+Robots is about to be a thing…
AI startup Vicarious has partnered with GreyOrange, a company that builds AI and robot systems for warehouses. Vicarious has a neuroscience-inspired approach to AI (which earlier helped it break the CAPTCHA security system, #66), which means its systems exhibit different capabilities to those made with deep learning techniques.

Why Vicarious? Vicarious’s tech has typically been good at solving problems involving spatial reasoning. You can get a sense of its approach by looking at papers like “Learning a generative model for robot control using visual feedback” and “From proprioception to long-horizon planning in novel environments: A hierarchical RL model”. (I hope to cover more of this research in Import AI in the future, but I’ll need to take some time to load their different approach into my brain.)

What they’re doing together: GreyOrange will integrate an AI capability from Vicarious into its ‘GreyMatter Fulfillment Operating System’ tech. Vicarious’s system will handle technology for autonomous vertical picking, which involves getting a robot to perceive “the size, shape and material characteristics of inventory items, including when these are loosely positioned in an unstructured fashion”, then approach, retrieve, and place items into order boxes. “Vicarious’ computer-vision and robotics technology is a breakthrough in the ability to handle unstructured, previously hard-to-grasp items,” said Vicarious co-founder Dileep George in a press release announcing the move.  

Why this matters: The physical world is a huge challenge for AI. Most AI systems get trained in a purely digital context (e.g, computer vision systems get trained on digitized images of the world, and then are deployed in reality against… digital images of the world), whereas robots need to be trained in simulation then taught to generalize to the real world. This is especially challenging because of things like differences in the physics fidelity between simulators and reality, or hardware issues (e.g, air pressure/temperature/etc will screw around with the motor responses of certain robots, and the sun has a nasty habit of moving across the sky which continually changes illumination in outdoor/hybrid settings, throwing off vision systems).
  GreyOrange and Vicarious partnering up is a further symptom of the success of AI being applied to robotics. That’s a big deal: if we can get more flexible AI systems to work here, we can unlock tremendous economic value. Vicarious also isn’t the only company trying to revolutionize fulfillment with robotics – that’s also the focus of the (deep learning-based) startup, Covariant, among others. 
  Read more: GreyOrange and Vicarious Launch Autonomous Vertical Picking Solution for Apparel and Omnichannel Fulfillment (GlobeNewswire press release).
  Find out more about GreyOrange (GreyOrange site).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NSCAI publishes final report on US AI strategy
The USA’s National Security Commission on AI has delivered its final report on US AI strategy. The report warns that the US risks being overtaken in technology without an acceleration in AI adoption, supported by substantial federal investment over the next five years.

Recommendations:

  • US military should work towards achieving ‘AI readiness’ by 2025: increasing DoD AI R&D spending to $8bn/year in 2026 (vs. $1.5bn in 2021); establishing a Digital Service Academy and National Digital Reserve Corps to address the talent deficit; more research into ensuring AI systems are robust and reliable.
  • US should embrace autonomous weapons and work with other nations to establish international standards and mitigate risks, while reaffirming DoD’s policy that human judgement be involved in any decision to kill. 
  • Overall federal funding for R&D should climb to at least 1% of GDP by 2026 (vs 0.6% in 2017).
  • Non-defense AI R&D funding should increase to $32bn/year (vs. $1.5bn in 2021); $32bn investment over five years in domestic semiconductor capacity (see Import 238).
  • To build a stronger AI workforce, the US should offer green cards to all STEM PhD graduates at US universities and double the number of employment-based visas, alongside substantially more funding for STEM education at all levels.
  • Establishing a Technology Competitiveness Council, tasked with developing and overseeing a National Technology Strategy, and coordinating efforts across government.

Read more: NSCAI report in full

————–

FAccT suspends Google sponsorship

ACM’s FAccT conference has paused its sponsorship by Google, following the turmoil and departures at the company’s Ethical AI team. Lead researchers Timnit Gebru and Margaret Mitchell were forced out earlier this year, after disputes around the company’s suppression of ethics research (see Import 226; 235).

   Read more: AI ethics research conference suspends Google sponsorship (VentureBeat) 


————–

Highlights from semiconductor Substack:
– Mule’s Musings on Heterogeneous Compute; ASML and lithography; vertical monopolies; GPT-3.
– Deep Forest’s primers on semiconductor foundries (pt 1, pt 2).
– Employ America on the economics of the current chip shortage.

###################################################

Tech Tales:

The Speech for the Rebels
[2030: A country in central Africa where the US and China are fighting a proxy war primarily via the stoking of local political tensions]

They’d spent a few million dollars to digitize everything – and I do mean everything – that they’d gathered from the rebels. Then they started writing out speeches and testing the AI against it. The idea was that if you said something and the AI, which had been finetuned on all the digitized data, thought what you were saying had a low probability, then that told you that your speech was out of sync with the ‘mood’ inherent to the rebel group. On the other hand, if your speech was being predicted as likely by the AI system, that told you it might resonate.

Rhetorical Finetuning, the analysts called it, or R-FT.
Silver Tongue, was the name of the system we used.
The Mouth – that’s what we called it.
– “Go see how well The Mouth works on them”.
– “Oh, you’re back, I guess the mouth worked for you”.
– “Just tell ’em what the Mouth says and see what happens”.
– “Don’t improvise, the Mouth works”.

The strangest truth about The Mouth was it worked – really well. One classified document noted that “campaigns which utilized R-FT via Silver Tongue saw a 15% improvement in our post-engagement de-escalation metrics, resulting in a lowered casualty rate for warfighters in the region”.

So that’s why we ended up sticking The Mouth on our wrists. The AI runs inside a wearable which has a microphone – the bracelet glows green when we’re saying things that The Mouth predicts are probable, and it glows red when we say things that aren’t. We spend a lot of time in training getting taught not to look directly at our wrists while talking, but rookies do it anyway. Now, when I give my talks – even improvised ones, after an incident, or to resolve something, or ask for a favor – I get my little signals from the bracelet and I say the words and keep it green.

I don’t know the language the local rebels speak. My translator picks up most of it, but not the things they whisper to each other, hands over their mouths, looking at me as I use The Mouth to talk. What are they saying, I wonder?

Look, when the American tries to speak like us, their wrist flashes green.
What does the red mean, Father?
The red is when they are telling the truth, my Son.

Things that inspired this story: The miniaturization of communication technology for conflict; thinking about language models and how they could be weaponized for purposes of propaganda or other state-level objectives; thoughts about how AI might get integrated with warfare; various loosely connected ideas around how AI influences culture through re-magnification of things the AI picked up; the natural skepticism of all humans in all places to unfamiliar people giving them a convincing speech.