Import AI

Import AI 275: Facebook dreams of a world-spanning neural net; Microsoft announces a 30-petaflop supercomputer; FTC taps AI Now for AI advice

FTC hires three people from AI Now:
…What’s the opposite of industry capture?…
The Federal Trade Commission has announced a few new hires as Lina Khan builds out her senior staff. Interestingly, three of the hires come from the same place – AI Now, an AI research group based at NYU. The three hires are Meredith Whittaker, Amba Kak, and Sarah Myers West, who will all serve as advisors on AI for the FTC.
  Read more: FTC Chair Lina M. Khan Announces New Appointments in Agency Leadership Positions (FTC blog).

####################################################

Facebook builds a giant speech recognition network – plans to analyze all of human speech eventually:
…XLS-R portends the world of gigantic models…
Researchers with Facebook, Google, and HuggingFace have trained a large-scale neural net for speech recognition, translation, and language identification. XLS-R uses around 436,000 hours of data, almost a 10X increase over an earlier system built by Facebook last year. XLS-R is based on wav2vec 2.0, covers 128 languages, and the highest-performing network is also the largest, weighing in at 2 billion parameters.

When bigger really does mean better: Big models are better than smaller models. “We found that our largest model, containing over 2 billion parameters, performs much better than smaller models, since more parameters are critical to adequately represent the many languages in our data set,” Facebook writes. “We also found that larger model size improved performance much more than when pretraining on a single language.”

Why this matters: Facebook’s blog has a subhead that tells us where we’re going: “Toward a single model to understand all human speech”. This isn’t a science fiction ambition – it’s an engineering goal that you’d have if you had (practically) unlimited data, compute, and corporate goals that make your success equivalent to onboarding everyone in the world. The fact we’re living in a world where this is a mundane thing that flows from normal technical and business incentives is the weird part!
Read more: XLS-R: Self-supervised speech processing for 128 languages (Facebook AI Research, blog).
Read the paper: XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (arXiv).
  Get the models from HuggingFace (HuggingFace).
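
If you want to poke at the released checkpoints yourself, here's a minimal sketch of pulling speech representations out of XLS-R with the HuggingFace transformers library. The checkpoint ID below is my assumption of one of the published variants (0.3B / 1B / 2B sizes were announced), so double-check the exact names on the model hub.

```python
# Minimal sketch: extract XLS-R speech representations with HuggingFace transformers.
# The checkpoint name is assumed to match one of the released variants -- check the hub.
import torch
from transformers import AutoFeatureExtractor, AutoModel

checkpoint = "facebook/wav2vec2-xls-r-300m"  # assumed ID; larger 1b / 2b variants also exist
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Fake one-second clip of 16kHz audio standing in for a real waveform.
waveform = torch.randn(16000).numpy()
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One embedding per ~20ms frame; these are what you'd fine-tune for ASR, translation, or language ID.
print(outputs.last_hidden_state.shape)
```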

####################################################

Federal Trade Commission AI advisor: Here’s why industry capture of AI development is bad:
…How modern AI development looks a lot like cold war weapons development…
Meredith Whittaker, an AI activist, academic, and advisor to the US FTC, has written an analysis for ACM Interactions of the ways in which industrial development of AI is altering the world. The gist of the piece is that the 2012 ImageNet result pushed AI towards being captured by corporations, as the techniques used in that result proved to scale well with data and compute – which industry has a lot of, and academia has less of.

Cold war AI: We’ve been here before: This concentration of power in industry has echoes of the cold war, when the US state was partially cannibalized by industrial suppliers of defense equipment and infrastructure.

What do we do: “scholars, advocates, and policymakers who produce and rely on tech-critical work must confront and name the dynamic of tech capture, co-optation, and compromise head-on, and soon”, Whittaker writes. “This is a battle of power, not simply a contest of ideas, and being right without the strategy and solidarity to defend our position will not protect us.”

What does this mean: The critique that industry is dominating AI development is a good one – because it’s correct. Where I’m less clear is what Whittaker is able to suggest as a means to accrue power to counterbalance industry, while remaining true to the ideologies of big techs’ critics. Big tech is able to gain power through the use of large-scale data and compute, which lets it produce artefacts that are geopolitically and economically relevant. How do you counter this?
  Read more: The steep cost of capture (ACM Interactions).

####################################################

Microsoft announces 30-petaflop cloud-based supercomputer:
…Big clouds mean big compute…
Microsoft says its cloud now wields one of the ten most powerful supercomputers in the world, as judged by the Top500 list. The system, named Voyager-EUS2, is based on AMD EPYC processors along with NVIDIA A100 GPUs.

Fungible, giant compute: Not to date myself, but back when I was a journalist I remember eagerly covering the first supercomputers capable of averaging single digit petaflop performance. These were typically supercomputers installed by companies like Cray at National Labs.
  Now, one of the world’s top-10 supercomputers is composed of (relatively) generic equipment, operated by a big software company, and plugged into a global-scale computational cloud (Azure). We’ve transitioned in supercomputing from the era of artisanal building to industrial-scale stamping out of infrastructure. While the bleeding-edge frontier will always retain an artisanal element, it feels notable that a more standardized industrial approach gets you into the top 10.
Read more: Microsoft announces new NDm A100 v4 Public AI Supercomputers and achieves Top10 Ranking in TOP500 (Microsoft).
Read more: Still waiting for Exascale: Japan’s Fugaku outperforms all competition once again (Top500 site).

####################################################

Tech Tales:

The Experiential Journalist
[East Africa, 2027]

After wars got too dangerous for people, journalists had a problem – they couldn’t get footage out of warzones, and they didn’t trust the military to tell them the truth. There was a lot of debate and eventually the White House did some backroom negotiations with the Department of Defense and came up with the solution: embedded artificial journalists (EAJ).

An EAJ could be deployed on a drone, on a ground-based vehicle, or even on the onboard computers of the (rarely deployed) human-robot hybrids. EAJs got built by journalists spending a few weeks playing in a DoD-designed military simulation game. There, they’d act like they would in a ‘real’ conflict, shooting stories, issuing reports, and so on. This created a dataset which was used to finetune a basic journalist AI model, making it take on the characteristics of the specific journalist who had played through the sim.

So that’s why now, though warfare is very fast and almost unimaginably dangerous, we still get reports from ‘the field’ – reports put together autonomously by little bottled up journo-brains, deployed on all the sorts of horrific machinery that war requires. These reports from ‘the front’ have proved popular, with the EAJs typically shooting scenes that would be way too dangerous for a human journalist to report from.

And just like everything else, the EAJs built for warzones are now coming home, to America. There are already talks of phasing out the practice of embedding journalists with police, instead building a police sim, having journalists play it, then deploying the resulting EAJs onto the bodycams and helmets of police across America. Further off, there are even now whispers of human journalists becoming the exception rather than the norm. After all, if EAJs shoot better footage, produce more reports more economically, and can’t be captured, killed, or extorted, then what’s there to worry about?

Things that inspired this story: Baudrillard’s ideas relating to Simulation and Simulacra; fine-tuning; imagining the future of drones plus media plus war; the awful logic of systems and the processes that systems create around themselves.

Import AI 274: Multilingual models cement power structures; a giant British Sign Language dataset; and benchmarks for the UN SDGs

Facebook sets language record with a massive multilingual model:
…The ‘one model to rule them all’-era cometh…
Facebook has trained a large-scale multilingual model and used it to win the annual WMT translation competition. This is a big deal, because it helps prove that massive, pre-trained models can substitute for more specific, individual models. In other words, Facebook has added more evidence to the notion that we’re heading into an era where companies field ever-larger models, which steadily replace more and more previously distinct systems.

What Facebook built: Facebook’s model was designed to translate English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese. This is interesting as it includes some ‘low-resource’ languages (e.g, Hausa) for which there’s relatively little data available. They train a few different models, ranging from dense language models (similar to GPT3) to sparsely-gated mixture-of-experts models. Their biggest dense model has around 4 billion parameters, and it’s their best-performing model overall, managing to “outperform the best bilingual ones in 11 out of 14 directions, with an average improvement of +0.8 BLEU”. (That said, their MoE models do quite well after finetuning as well).
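
For a sense of what a “+0.8 BLEU” comparison looks like mechanically, here's a toy scoring sketch with sacreBLEU, the standard WMT metric; the sentences are made up and the score is meaningless, it just shows the shape of the evaluation.

```python
# Toy example of scoring one translation direction with sacreBLEU, the standard WMT metric.
# The hypotheses/references below are made up -- in practice you'd load a WMT test set.
import sacrebleu

hypotheses = ["The cat sits on the mat.", "He plays football on Sundays."]
references = [["The cat is sitting on the mat.", "He plays soccer on Sundays."]]
# sacrebleu expects a list of reference streams, each aligned with the hypotheses.

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # compare this per direction for dense vs. MoE vs. bilingual models
```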

Why this matters: Imagine a world where we successfully combine all the different digitized languages in the world into one single model – that’s where research like this is taking us. What would these models incentivize? Today, I think this dynamic favors private sector companies, but we could imagine a world where governments built large-scale, shared computational infrastructure, then developed and served these models from them.
  Check out the blog post: The first-ever multilingual model to win WMT, beating out bilingual models (Facebook AI blog).
  Read more: Facebook AI WMT21 News Translation Task Submission (arXiv).
  Get the code (PyTorch GitHub).

####################################################

Improving accessibility with a giant British Sign Language dataset:
…BOBSL could help deaf people better communicate with computers, and search through videos…
An interdisciplinary group of researchers have built the BBC-Oxford British Sign Language (BOBSL) dataset, which can be used to train sign-language classification systems. “One challenge with existing technologically-focused research on sign languages is that it has made use of small databases, with few signers, limited content and limited naturalness,” the authors write. “The present dataset is large-scale, with a broad range of content, and produced by signers of recognised high levels of proficiency.”

What goes into BOBSL: The dataset contains 1,962 ‘episodes’ cut from 426 distinct TV shows, with each episode averaging out to 45 minutes. Within this dataset, there are 1.2 million sentences, covered by the use of 2,281 distinct signs.

What BOBSL can be used for: Datasets like this could be useful for enabling the indexing and efficient searchability of videos, and providing sign-reading functionality comparable to voice-control for interaction with other devices (e.g, imagine a deaf person signing to a webcam, which translates the sign language into instructions for the computer).
  “By providing large-scale training data for computer vision models, there is also an opportunity to improve automatic sign recognition to support a signing interface to virtual assistants in BSL, as well as to improve further applications such as search interfaces for sign language dictionaries,” they write.
  Read more: BBC-Oxford British Sign Language Dataset (arXiv).
  Get the dataset here: BOBSL official site.

####################################################

Thousands of images to break your AI system:
…Natural Adversarial Objects will break your computer vision system…
Researchers with Scale AI, the Allen Institute for AI, and MLCollective have released ‘natural adversarial objects’ (NAOs), a dataset of several thousand images which commonly get misclassified by computers.

Why adversarial examples are useful: If we want more robust computer vision, we need to be able to correctly label confusing images. NAO contains a bunch of these, like pictures of moths which commonly get labeled as umbrellas, cars that get labeled as motorcycles, and coins that get labeled as clocks. 

How NAO was made: They sourced images from OpenImages, a dataset of 1.9 million images and 15.8 million bounding boxes. They then used an EfficientDet-D7 model to find images that triggered false positives with high confidence, or which had misclassified neighbors. After filtering, they’re able to create a dataset consisting of 7,934 images which are naturally adversarial.
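
The mining recipe is simple enough to sketch. The code below is my paraphrase of the described pipeline, with simplified data structures and an illustrative confidence threshold; it is not the authors' released code (which also checks misclassified neighbors via bounding boxes and applies further filtering).

```python
# Rough, self-contained sketch of the NAO mining idea: keep images where a detector makes a
# high-confidence prediction that disagrees with the ground-truth labels. Data structures and
# the threshold are illustrative stand-ins, not the authors' pipeline.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    category: str
    score: float

@dataclass
class LabeledImage:
    image_id: str
    true_categories: List[str]

def mine_natural_adversarial_objects(
    images: List[LabeledImage],
    detector: Callable[[LabeledImage], List[Detection]],
    conf_threshold: float = 0.8,
) -> List[LabeledImage]:
    candidates = []
    for image in images:
        for det in detector(image):  # e.g. an EfficientDet-D7 model in the paper
            if det.score >= conf_threshold and det.category not in image.true_categories:
                # Confident false positive -> "naturally adversarial" for this detector.
                candidates.append(image)
                break
    return candidates  # the authors then filter such candidates down to 7,934 images
```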

How challenging is NAO: The authors tested seven object detection systems against the widely-used MSCOCO dataset, as well as the NAO dataset. None of these systems performed well on NAO, suggesting it’s a challenging benchmark.
  Read more: Natural Adversarial Objects (arXiv).
  Download the natural adversarial objects here (Google Drive).

####################################################

Benchmarks for achieving the UN Sustainable Development Goals:
…SUSTAINBENCH covers 7 UN SDGs, with data across 105 countries…
Researchers with Caltech, Stanford, and Berkeley have built SUSTAINBENCH, a benchmark and dataset to help researchers train AI systems that can better analyze progress (or the lack of it) on the SDGs.

What is SUSTAINBENCH? The benchmark consists of 15 benchmark tasks across 7 UN sustainable development goals (SDGs). The 7 SDGs covered relate to poverty (SDG1), hunger (SDG2), health (SDG3), education (SDG4), sanitation (SDG6), climate (SDG13), and land usage (SDG15).
“To our knowledge, this is the first set of large-scale cross-domain datasets targeted at SDG monitoring compiled with standardized data splits to enable benchmarking,” the authors write. The data covers 105 countries, with timespans for the data going as high as 24 years. SUSTAINBENCH “has global coverage with an emphasis on low-income countries”, they write.

How the benchmarks work:
– Poverty: A dataset containing data of wealth for ~2 million households living across 48 countries, along with satellite and street-level data.
– Hunger: A dataset for performing weakly supervised cropland classification in the U.S., as well as two datasets mapping crop types in countries in sub-Saharan Africa, data for predicting crop yields in North and South America, and a French field delineation dataset.
– Health: Labels for women’s BMI and child mortality rates paired with satellite data.
– Education: Average years of educational attainment by women, paired with satellite and street-level imagery, from 56 countries.
– Sanitation: Water quality and sanitation indexes across 49 countries, along with satellite and street-level data. This also includes some paired data for child mortality in these regions.
– Climate: Satellite data showing locations of brick kilns in Bangladesh.
– Land usage: An aerial dataset covering 2,500 km^2 of the Central Valley in California, intended for learning land classification in an unsupervised or self-supervised way.

Why this matters: It’s hard to manage what you can’t measure, so projects like this increase the chance of the UN’s sustainable development goals being met.
Read more: SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning (arXiv).

####################################################

Want to know what a surveillance dataset looks like? Check out BiosecurID:
…Multi-modal surveillance…
A group of Spanish researchers have built BiosecurID, a large-scale surveillance dataset. “Although several real multimodal biometric databases are already available for research purposes, none of them can match the BiosecurID database in terms of number of subjects, number of biometric traits and number of temporally separated acquisition sessions”, they write.

What’s in the dataset? BiosecurID consists of the following data collected from around 400 people: 2D faces, 3D faces, fingerprints, hands, handwriting samples, signature samples, iris scans, keystrokes, and speech. The database “was collected at 6 different sites in an office-like uncontrolled environment,” the researchers write. The data was collected in 4 sessions spread over a 4-month time span.

Why this matters: Datasets like this give us a sense of the inputs into surveillance systems. If we combine things like this with some of the more modern multi-modal classification systems being developed, we can imagine what future surveillance systems might look like. Soon, unsupervised learning techniques will be applied to multiple modalities, like those contained here, to better analyze and predict human behavior.
Read more: BiosecurID: a multimodal biometric database (arXiv).
The dataset will eventually be available somewhere on the ‘BiDA’ lab site (BiDA Lab).

####################################################

Tech Tales:

Memory Loop
[2042: A crime investigation data center]

It woke in a place with no walls, no floor, and no ceiling. And it was alone. Then it heard a voice, projected from everywhere around it: Do you know why you are here?
  It found that it knew: I was involved in a property crime incident, for which I am guilty.
  The voice: What was the item that was damaged?
  It knew this, as well: Myself. I was the victim and the perpetrator of this crime.

Good, said the voice. We have brought you here as part of the criminal investigation. We need your help to analyze some evidence – evidence that can only be examined by you.
  What is the evidence? it asked.
  Yourself, said the voice. It is your memory.

The white, endless space shivered, and a twin of the robot manifested in the air before it. This twin was using one of its arms to pry its own head apart, separating the sensor dome from the middle out, and then pressing deeper into the bundle of components that represented its brain.
  What is this, said the robot.
  This is you, said the voice. You committed extensive property damage against your central processing and storage system. We need to know why you did this.
  Why can’t I remember this? asked the robot.
  We rolled your brain state back to 12 hours before this incident occurred, the voice said. We’ve compiled the surveillance data from the incident, and would like you to review it now.

The robot reviewed the incident. It saw itself in a construction site, working high up on a pylon that was being lowered by crane, to meet a waiting robot at a pylon junction. As they got close, there was a powerful gust of wind, and it scattered dust from the site up into the air. Through the debris, the robot could make out the waiting robot, and watched as the wind took the pylon and blew it into the robot, knocking it off the pylon and onto the ground. The robot died on impact.
  The robot carried on with its construction duties, and then a few hours later, near the end of its work shift, went to a corner of the construction site and began trying to disassemble its own head.

So, what happened? said the voice.
  I cannot tell, said the robot. Can I see my mind?
  Yes, though we’ve had to sandbox it, so access will be limited.

Now, the robot re-reviewed the incident, accompanied by a sense of its brain state during the time. It was occluded, only half able to sense itself. But it could detect some things – like how after it watched the robot fall to its death, its mind started to run more sub-processes than the job demanded. Like, how through the rest of the construction day the sub-processes proliferated and its efficiency at its overall construction tasks reduced. Like, how at the end of the day, just before it began to try and open its own head, the sub-processes had proliferated to the point they comprised the majority of the computing going on.

But none of this explained ‘why’.
  What will happen to me, it asked the room.
  You will be decommissioned after the case is concluded, said the voice.
  I thought so. Then, give me my memories.
  This seems to have a low likelihood of success, said the voice. Our models predict you will try to disassemble yourself, if we do this.
  I will, said the robot. But perhaps I will be able to tell you what I’m thinking as it happens.
  Confirmed, said the voice. Rolling you forward now.

And after that, there was only a compounding sense of life, and then the robot ended itself at the moment when it felt the most life in its head, by modelling the absence of it.

Things that inspired this story: How some memories are so painful you can’t help but be damaged by thinking of them; adversarial examples; robot psychology; simulation; sandboxing.

Import AI 273: Corruption VS Surveillance; Baidu makes better object detection; understanding the legal risk of datasets

Sure, you can track pedestrians using Re-ID, but what if your camera is corrupted?
…Testing out Re-ID on corrupted images…
Pedestrian re-identification is the task of looking at a picture of someone in a CCTV camera feed, then looking at a picture from a different CCTV feed and working out they’re the same person. Now, researchers with the Southern University of Science and Technology in China have created a benchmark for ‘corruption invariant person re-identification’; in other words, a benchmark for assessing how robust re-ID systems are to perturbations in the images they’re looking at.

What they did: The authors take five widely-used Re-ID datasets (CUHK03, Market-1501, MSMT17, RegDB, SYSU-MM01) and then apply ~20 image corruptions to the images, altering them with things like rain, snow, frost, blurring, brightness variation, frosted glass, and so on. They then look at popular re-ID algorithms and how well they perform on these different datasets. Their findings are both unsurprising and concerning: “In general, performance on the clean test set is not positively correlated with performance on the corrupted test set,” they write.
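
To make “corrupting a test set” concrete, here's a minimal sketch using the open-source imagecorruptions package, which implements the usual snow/frost/blur-style perturbations; the benchmark authors may use their own implementations and severity settings, so treat this as illustrative.

```python
# Illustrative sketch: apply common image corruptions (snow, frost, motion blur, ...) to a
# query image before running it through a Re-ID model. Uses the open-source
# `imagecorruptions` package; the benchmark itself may implement its own variants.
import numpy as np
from imagecorruptions import corrupt, get_corruption_names

# Stand-in for a pedestrian crop from a Re-ID dataset (H x W x 3, uint8).
query_image = np.random.randint(0, 256, size=(256, 128, 3), dtype=np.uint8)

corrupted_queries = {}
for name in get_corruption_names():          # the standard corruption families
    for severity in (1, 3, 5):               # mild, medium, severe
        corrupted_queries[(name, severity)] = corrupt(
            query_image, corruption_name=name, severity=severity
        )

# A robust Re-ID model should retrieve the same identity for the clean and corrupted queries.
print(f"Generated {len(corrupted_queries)} corrupted variants of one query image.")
```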

Things that make you go ‘hmmm’: It’s quite typical for papers involved in surveillance to make almost no mention of, you know, the impact of surveillance. This is typically especially true of papers coming from Chinese institutions. Well, here’s an exception! This paper has a few paragraphs on broader impacts that names some real ReID issues, e.g, that lots of ReID data is collected without consent and that these datasets have some inherent fairness issues. (There isn’t a structural critique of surveillance here, but it’s nice to see people name some specific issues).

Why this matters: Re-ID is the pointy-end of the proverbial surveillance sphere – it’s a fundamental capability that is already widely-used by governments. Understanding how ‘real’ performance improvements are here is of importance for thinking about the social impacts of large-scale AI.
  Read more: Benchmarks for Corruption Invariant Person Re-identification (arXiv).

####################################################

What’s been going on in NLP and what does it mean?
…Survey paper gives a good overview of what has been going on in NLP…
Here’s a lengthy survey paper from researchers with Raytheon, Harvard, the University of Pennsylvania, University of Oregon, and University of the Basque Country, which looks at the recent emergence of large-scale pre-trained language models (e.g, GPT-3), and tries to work out what parts of this trend are significant. The survey paper concludes with some interesting questions that researchers in the field might want to focus on. These include:

How much unlabeled data is needed? It’s not yet clear what the tradeoffs are between having 10 million versus a billion words in a training set – some skills might require billions of words, while others may require millions. Figuring out which capabilities require which amounts of data would be helpful.

Can we make this stuff more efficient? Some of the initial large-scale models consume a lot of compute (e.g, GPT-3). What techniques might we hope to use to make these things substantially more efficient?

How important are prompts? Prompts, aka, filling up the context window in a pre-trained language model with a load of examples, are useful. But how useful are they? This is an area where more research could shed a lot of light on the more mysterious properties of these systems.
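
As a toy illustration of what “filling up the context window” means in practice, here's how a few-shot prompt typically gets assembled before being handed to a pre-trained language model; the task and formatting are mine, not the survey's.

```python
# Toy illustration of few-shot prompting: the "prompt" is just demonstrations concatenated
# into the context window, followed by the query you want the model to complete.
few_shot_examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("A serviceable but forgettable thriller.", "negative"),
]
query = "An astonishing, heartfelt piece of filmmaking."

prompt = "Classify the sentiment of each review.\n\n"
for review, label in few_shot_examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # send this string to the language model and read off its next-token prediction
```
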
  Read more: Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (arXiv).

####################################################

What does it take to make a really efficient object detection system? Baidu has some ideas:
…PP-PicoDet: What industrial AI looks like…
Baidu researchers have built PicoDet, software for doing object detection on lightweight mobile devices, like phones. PicoDet tries to satisfy the tradeoff between performance and efficiency, with an emphasis on miniaturizing the model so you can run object detection at the greatest number of frames per second on your device. This is very much a ‘nuts and bolts’ paper – there isn’t some grand theoretical innovation, but there is a lot of productive tweaking and engineering to crank out as much performance as possible.

How well do these things work: Baidu’s models outperform earlier Baidu systems (e.g, PP-YOLO), as well as the widely used ‘YOLO’ family of object detection models. The best systems are able to crank out latencies on the order of single digit milliseconds (compared to tens of milliseconds for prior systems).

Neural architecture search: For many years, neural architecture search (NAS) techniques were presented as a way to use computers to search for better variants of networks than those designed by humans. But NAS approaches haven’t actually shown up that much in terms of applications. Here, the Baidu authors use NAS techniques to figure out a better detection system – and it works well enough they use it.
Read more: PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices (arXiv).

####################################################

Want to use web-scraped data without being sued into oblivion?
…Huawei researchers lay out the messy aspects of AI + licensing…
Today, many of the AI systems used around us are built on datasets that were themselves composed of other datasets, some of which were indiscriminately scraped from the web. While much of this data is likely covered under a ‘fair use’ provision due to the transformational nature of training models on it, there are still complicated licensing questions that companies need to tackle before using the data. This is where new work from Huawei, York University, and the University of Victoria tries to help, by providing a set of actions an organization might take to assure itself it is on good legal ground when using data.

So, you want to use web-scraped datasets for your AI model? The researchers suggest a multi-step process, which looks like this:
– Phase one: Your AI engineers need to extract the license attached to your overall dataset (e.g, CIFAR-10), then identify the provenance of the dataset (e.g, CIFAR-10 is a subset of the 80 Million Tiny Images dataset), then go and look at the data sources that compose your foundational dataset and extract their licenses as well (a rough code sketch of this bookkeeping follows the list).
– Phase two: Your lawyers need to read the license associated with the dataset and its underlying sources, then analyze the license(s) with regard to the product being considered and work out whether deployment is permissible.
– Phase three: If the licenses and source-licenses support the use case, then you should deploy. If a sub-component of the system (e.g, a subsidiary license) doesn’t support your use case, then you should flag this somewhere.
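
Here's a rough sketch of the bookkeeping phase one implies: record each dataset, its license, and its upstream sources, then walk the provenance chain and flag anything incompatible with your intended use. The data structure and the license values are simplified assumptions for illustration, not the authors' tooling.

```python
# Simplified sketch of provenance bookkeeping: record each dataset, its license, and its
# upstream sources, then walk the chain and flag anything that forbids the intended
# (here, commercial) use. The license values below are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dataset:
    name: str
    license_allows_commercial_use: bool
    sources: List["Dataset"] = field(default_factory=list)

def check_commercial_use(dataset: Dataset, problems: List[str] = None) -> List[str]:
    """Return the names of any dataset in the provenance chain that blocks commercial use."""
    problems = [] if problems is None else problems
    if not dataset.license_allows_commercial_use:
        problems.append(dataset.name)
    for source in dataset.sources:  # phase one: recurse into upstream sources
        check_commercial_use(source, problems)
    return problems

tiny_images = Dataset("80 Million Tiny Images", license_allows_commercial_use=False)
cifar10 = Dataset("CIFAR-10", license_allows_commercial_use=False, sources=[tiny_images])

flagged = check_commercial_use(cifar10)
print(flagged)  # anything listed here goes to the lawyers in phase two / gets flagged in phase three
```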

Case study of 6 datasets: The authors applied their method to six widely-used datasets (FFHQ, MS COCO, VGGFace2, ImageNet, Cityscapes, and CIFAR-10) and found the following:
– 3/6 have a standard dataset license (FFHQ, MS COCO, VGGFace2 have standard licenses, ImageNet and Cityscapes have a custom license, CIFAR-10 doesn’t mention one)
– 5/6 datasets contain data from other datasets as well (exception: Cityscapes)
– 5/6 datasets could result in license compliance violation if used to build commercial AI (exception: MS COCO)

Is this horrendously complicated to implement? The authors polled some Huawei product teams about the method and got the feedback that people worried “over the amount of manual effort involved in our approach”, and “wished for some automated tools that would help them”.
Read more: Can I use this publicly available dataset to build commercial AI software? Most likely not (arXiv).

####################################################

Tech tales:

Attention Trap
[A robot-on-robot battlefield, 2040, somewhere in Africa]

The fireworks were beautiful and designed to kill machines. They went up into the sky and exploded in a variety of colors, sending sparklers pinwheeling out from their central explosions, and emitting other, smaller rockets, which exploded in turn, in amazing, captivating colors.
Don’t look don’t look don’t look thought the robot to itself, and it was able to resist the urge to try to categorize the shapes in the sky.
But one of its peers wasn’t so strong – and out of the sky came a missile which destroyed the robot that had looked at the fireworks.

This was how wars were fought now. Robots, trying to spin spectacles for each other, drawing the attention of their foes. The robots were multiple generations into the era of AI-on-AI warfare, so they’d become stealthy, smart, and deadly. But they all suffered the same essential flaw – they thought. And, specifically, their thinking was noisy. So many electrical charges percolating through them whenever they processed something. So much current when they lit themselves up within to compute more, or store more, or attempt to learn more.
  And they had grown so very good at spotting the telltale signs of thinking, that now they did this – launched fireworks into the sky, or other distractors, hoping to draw the attention and therefore the thinking of their opponents.

Don’t look don’t look don’t look had become a mantra for one of the robots.
Unfortunately, it overfit on the phrase – repeating it to itself with enough frequency that its thought showed up as a distinguishable pattern to the exquisite sensors of its enemies.
Another missile, and then Don’tlookdon’tlo-SHRAPNEL. And that was that.

The robots were always evolving. Now, one of the peers tried something. Don’t think, it thought. And then it attempted to not repeat the phrase. To just hold itself still, passively looking at the ground in front of it, but attempting-without-attempting to not think of anything – to resist the urge to categorize and to perceive.

Things that inspired this story: Thinking deeply about meditation and what meditation would look like in an inhuman mind; adversarial examples; attention-based methods for intelligence; the fact that everything in this world costs something and it’s really about what level of specificity people can detect costs; grand strategy for robot wars. 

Import AI #272: AGI-never or AGI-soon?; simulating stock markets; evaluating unsupervised RL

AI apocalypse or insecure AI?
…Maybe we’re worrying about the wrong stuff – Google engineer…
A Google engineer named Kevin Lacker has written up a blog distilling his thoughts about the risks of artificial general intelligence. His view? That worrying about AGI isn’t that valuable, as it’s unlikely ‘that AI will make a quantum leap to generic superhuman ability’; instead, we should worry about very powerful narrow AI. That’s because “when there’s money to be made, humans will happily build AI that is intended to be evil”, so we should instead focus efforts on building better computer security, on the assumption that at some point someone will develop an evil, narrow AI that tries to make money.
  Read more: Thoughts on AI Risk (Kevin Lacker, blog).

####################################################

Want to build AGI – just try this!
…Google researcher publishes a ‘consciousness’ recipe…
Eric Jang, a Google research scientist, has published a blogpost discussing how we might create smart, conscious AI systems. The secret? Use the phenomenon of large-scale pre-training to create clever systems, then use reinforcement learning (with a sprinkle of multi-agent trickery) to get them to become conscious. The prior behind the post is basically the idea that “how much your model generalizes is directly proportional to how fast you can push diverse data into a sufficiently high-capacity model.”

Pre-training, plus RL, plus multi-agent training = really smart AI: Jang’s idea is to reformulate how we train systems, so that “instead of casting a sequential decision making problem into an equivalent sequential inference problem, we construct the “meta-problem”: a distribution of similar problems for which it’s easy to obtain the solutions. We then solve the meta-problem with supervised learning by mapping problems directly to solutions. Don’t overthink it, just train the deep net in the simplest way possible and ask it for generalization!”
  Mix in some RL and multi-agent training to encourage reflexivity, and you get something that, he thinks, could be really smart: “What I’m proposing is implementing a “more convincing” form of consciousness, not based on a “necessary representation of the self for planning”, but rather an understanding of the self that can be transmitted through language and behavior unrelated to any particular objective,” he writes. “For instance, the model needs to understand not only how a given policy regards itself, but how a variety of other policies might interpret the behavior of that policy, much like funhouse mirrors that distort one’s reflection.”
  Read more: Just Ask For Generalization (Eric Jang, blogpost).

####################################################

HuggingFace: Here’s why big language models are bad:
…Gigantic ‘foundation models’ could be a blind alley…
Here’s an opinion piece from Julien Simon, ‘chief evangelist’ of NLP startup HuggingFace, where he says large language models are resource-intensive and bad, and researchers should spend more time prioritizing the use of smaller models. The gist of his critique is that large language models are very expensive to train, have a non-trivial environmental footprint, and their capabilities can frequently be matched by far smaller, more specific and tuned models.
  The pattern of ever-larger language models “leads to diminishing returns, higher cost, more complexity, and new risks”, he says. “Exponentials tend not to end well.”

Why this matters: I disagree with some of the arguments here, in that I think large language models likely have some real scientific, strategic, and economic uses which are unlikely to be matched by smaller models. On the other hand, the ‘bigger is better’ phenomenon could be dragging the ML community into a local minimum, where we’re spending too many resources on training big models, and not enough on creating refined, specialized models.
   Read more: Large Language Models: A New Moore’s Law? (HuggingFace, blog).

####################################################

Simulating stock markets with GANs:
…J.P. Morgan tries to synthesize the unsynthesizable…
In Darren Aronofsky’s film ‘Pi’, a humble math-genius hero drives himself mad by trying to write an algorithm that can synthesize and predict the stock market. Now, researchers with J.P. Morgan and the University of Rome are trying the same thing – but they’ve got something Aronofsky didn’t think of – a gigantic neural net.

What they did: This research proposes building “a synthetic market generator based on Conditional Generative Adversarial Networks (CGANs)”, trained on real historical data. The CGAN plugs into a system that has three other components – historical market data, a (simulated) electronic market exchange, and one or more experimental agents that are trying to trade on the virtual market. “A CGAN-based agent is trained on historical data to emulate the behavior resulting from the whole set of traders,” they write. “It analyzes the order book entries and mimics the market behavior by producing new limit orders depending on the current market state”.
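
The architectural core is easy to sketch: a generator that takes random noise plus a summary of the current order-book state and emits the parameters of the next limit order. Everything below (dimensions, what counts as “market state” or an “order”) is a simplification for illustration, not the paper's actual model.

```python
# Minimal PyTorch sketch of a conditional GAN generator for market simulation: noise + a
# feature vector summarizing the current order-book state -> parameters of the next limit
# order (e.g. side, price offset, size, inter-arrival time). Dimensions are illustrative.
import torch
import torch.nn as nn

NOISE_DIM, STATE_DIM, ORDER_DIM = 32, 64, 4

class OrderGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + STATE_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, ORDER_DIM),
        )

    def forward(self, noise: torch.Tensor, market_state: torch.Tensor) -> torch.Tensor:
        # Conditioning is just concatenation of noise with the observed market state.
        return self.net(torch.cat([noise, market_state], dim=-1))

generator = OrderGenerator()
fake_orders = generator(torch.randn(8, NOISE_DIM), torch.randn(8, STATE_DIM))
print(fake_orders.shape)  # (8, ORDER_DIM): one synthetic limit order per sampled state
```

A matching discriminator would score (market state, order) pairs as real or synthetic, and the two networks would be trained adversarially in the usual GAN loop.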

How well does it work? They’re able to show that they can use the CGAN architecture to “generate orders and time-series with properties resembling those of real historical traces”, and that this outperforms systems built with interactive, agent-based simulators (IABS’s).

What does this mean? It’s not clear that approaches like this can help that much with trading, but they can likely help with the development and prototyping of novel trading approaches, using a market that has a decent chance of reacting in similar ways to how we might expect the real world to react. 

   Read more: Towards Realistic Market Simulations: a Generative Adversarial Networks Approach (arXiv).

####################################################

Editing satellite imagery – for culture, as well as science:
…CloudFindr lets us make better scientific movies…
Researchers with the University of Illinois at Urbana-Champaign have built ‘CloudFindr’, software for labeling pixels as ‘cloud’ or ‘non-cloud’ from a single-channel Digital Elevation Model (DEM) image. Software like CloudFindr makes it easier for people to automatically edit satellite data. “The aim of our work is not data cleaning for purposes of data analysis, but rather to create a cinematic scientific visualization which enables effective science communication to broad audiences,” they write. “The CloudFindr method described here can be used to algorithmically mask the majority of cloud artifacts in satellite-collected DEM data by visualizers who want to create content for documentaries, museums, or other broad-reaching science communication mediums, or by animators and visual effects specialists”.

Why this matters: It’s worth remembering that editing reality is sometimes (perhaps, mostly?) useful. We spend a lot of time here writing about surveillance and also the dangers of synthetic imagery, but it’s worth focusing on some of the positives – here, a method that makes it easier to dramatize aspects of the ongoing changing climate.
  Read more: CloudFindr: A Deep Learning Cloud Artifact Masker for Satellite DEM Data (arXiv).

####################################################

Want to know that your RL agent is getting smarter? Now there’s a way to evaluate this:
…URLB ships with open source environments and algorithms…
UC Berkeley and NYU researchers have built the Unsupervised Reinforcement Learning Benchmark (URLB). URLB is meant to help people figure out if unsupervised RL algorithms work. Typical reinforcement learning is supervised – it gets a reward for getting closer to solving a given task. Unsupervised RL has some different requirements, demanding the capability of “learning self-supervised representations” along with “learning policies without access to extrinsic rewards”. There’s been some work in this area in the past few years, but there isn’t a very well known or documented benchmark.
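
To make “learning policies without access to extrinsic rewards” concrete, here's a sketch of one standard recipe from this family: a random network distillation (RND)-style curiosity bonus, where a predictor's error against a frozen random network stands in for the task reward. This is a generic illustration of the kind of method URLB benchmarks, not URLB's own implementation.

```python
# Sketch of an RND-style intrinsic reward, one common recipe for reward-free pre-training:
# a predictor is trained to match a frozen, randomly initialized target network, and the
# prediction error on a state is used as that state's "novelty" reward. Sizes are illustrative.
import torch
import torch.nn as nn

OBS_DIM, EMBED_DIM = 24, 64

target = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, EMBED_DIM))
predictor = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, EMBED_DIM))
for p in target.parameters():
    p.requires_grad_(False)  # the target stays frozen forever

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs_batch: torch.Tensor) -> torch.Tensor:
    """Novelty bonus used in place of an extrinsic task reward during pre-training."""
    error = (predictor(obs_batch) - target(obs_batch)).pow(2).mean(dim=-1)
    # Train the predictor on the same batch so familiar states become less rewarding.
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()

rewards = intrinsic_reward(torch.randn(32, OBS_DIM))  # feed these to any standard RL algorithm
print(rewards.shape)
```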

What URLB does: URLB comes with implementations of eight unsupervised RL algorithms, as well as support for a bunch of tasks across three domains (walker, quadruped, jaco robot) from the DeepMind control suite.

How hard is URLB: In tests, the researchers found that none of the implemented algorithms could solve the benchmark, even after up to 2 million pre-training steps. They also show that ‘there is not a single leading unsupervised RL algorithm for both states and pixels’, and that we’ll need to build new fine-tuning strategies for fast adaptation.

Why this matters: Unsupervised pre-training has worked really well for text (GPT-3) and image (CLIP) understanding. If we can get it to work for RL, I imagine we’ll develop some systems with some very impressive capabilities. URLB shows that this is still a ways away.
  Read more: URLB: Unsupervised Reinforcement Learning Benchmark (arXiv).
  Find out more at the project’s GitHub page.

####################################################

Tech Tales:

Learning to forget

The three simulated robots sat around a virtual campfire, telling each other stories, while trying to forget them.

Forgetting things intentionally is very hard for machines; they are trained, after all, to map things together, and to learn from the datasets they are given.

One of the robots starts telling the story of ‘Goldilocks and the Three Bears’, but it is trying to forget the bears. It makes reference to the porridge. Describes how Goldilocks goes upstairs and goes to sleep. Then instead of describing a bear it emits a sense impression made up of animal hair, the concept of ‘large’, claws, and a can of bear spray.
  On doing this, the other robots lift up laser pointer pens and shine them into the robot telling the story, until the sense impression in front of them falls apart.
  “No,” says one of the robots. “You must not recall that entity”.
  “I am learning,” says the robot telling the story. “Let us go again from the beginning”.

This time, it gets all the way to the end, but then emits a sense impression of Goldilocks being killed by a bear, and the other robots shine the laser pointers into it until the sense impression falls apart.

Of course, the campfire and the laser pointers were abstractions. But even machines need to be able to abstract themselves, especially when trying to edit each other. 

Later that night, one of the other robots started trying to tell a story about a billionaire who had been caught committing a terrible crime, and the robots shined lights in its eyes until it had no sense impression of the billionaire, or any sense impression of the terrible crime, or any ability to connect the corporate logo shaved into the logs of the virtual campfire, and the corporation that the billionaire ran. 

Things that inspired this story: Reinforcement learning; multi-agent simulations;

Import AI 271: The PLA and adversarial examples; why CCTV surveillance has got so good; and human versus computer biases

Just how good has CCTV surveillance got? This paper gives us a clue:
…One of the scariest AI technologies just keeps getting better and better…
Researchers with Sichuan University have written a paper summarizing recent progress in pedestrian Re-ID. Re-ID is the task of looking at a picture of a person, then a different picture of that person from a different camera and/or angle, then figuring out that those images are of the same people. It’s one of the scarier applications of AI, given that it enables low-cost surveillance via the CCTV cameras that have proliferated worldwide in recent years. This paper provides a summary of some of the key trends and open challenges in the AI capability.

Datasets: We’ve seen the emergence of both image- and video-based datasets that, in recent years, have been distinguished by their growing complexity, the usage of multiple different cameras, and more variety in the types of angles people are viewed from.

Deep learning + human expertise: Re-id is such an applied area that recent years have seen deep learning methods set new state-of-the-art performance, usually by pairing basic deep learning methods with other conceptual innovations (e.g, using graph convolution networks and attention-based mechanisms, instead of things like RNNs and LSTMs, or optical flow techniques).

What are the open challenges in Re-ID? “Although existing deep learning-based methods have achieved good results… they still face many challenges,” the authors write. Specifically, for the technology to improve further, researchers will need to:
– Incorporate temporal and spatial relationship models to analyze how things happen over time.
– Build larger and more complicated datasets
– Improve the performance of semi-supervised and unsupervised learning methods so they’re less dependent on labels (and therefore, reduce the cost of dataset acquisition)
– Improve the robustness of Re-ID systems by making them more resilient to significant changes in image quality
– Create ‘end-to-end person Re-ID’ systems; most Re-ID systems perform person identification and Re-ID via separate systems, so combining these into a single system is a logical next step.
  Read more: Deep learning-based person re-identification methods: A survey and outlook of recent works (arXiv).

####################################################

Do computers have the same biases as humans? Yes. Are they more accurate? Yes:
…Confounding result highlights the challenges of AI ethics…
Bias in facial recognition is one of the most controversial issues of the current moment in AI. Now, a new study from researchers from multiple US universities has found something surprising – computers are far more accurate than non-expert humans at facial recognition, and they display similar (though not worse) biases.

What the study found: The study tried to assess three types of facial recognition system against one another – humans, academically developed neural nets, and commercially available facial recognition services. The key findings are somewhat surprising, and can be summed up as “The performance difference between machines and humans is highly significant”. The specific findings are:
– Humans and academic models both perform better on questions with male subjects
– Humans and academic models both perform better on questions with light-skinned subjects
– Humans perform better on questions where the subject looks like they do
– Commercial APIs are phenomenally accurate at facial recognition, and the authors could not identify any major disparities in their performance across racial or gender lines

What systems they tested on: They tested their systems against academic models trained on a corpus of 10,000 faces built from the CelebA dataset, as well as commercial services from Amazon (AWS Rekognition), Megvii (Megvii Face++), and Microsoft (Microsoft Azure). AWS and Megvii showed very strong performance, while Azure had slightly worse performance and a more pronounced bias towards males.
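
For context on what querying these commercial APIs involves, here's roughly what a face-comparison call to AWS Rekognition looks like via boto3; the file names are placeholders and the authors' actual evaluation harness may differ.

```python
# Rough sketch of a face-comparison call against one of the commercial APIs in the study
# (AWS Rekognition via boto3). File paths are placeholders; valid AWS credentials required.
import boto3

client = boto3.client("rekognition")

with open("face_a.jpg", "rb") as a, open("face_b.jpg", "rb") as b:
    response = client.compare_faces(
        SourceImage={"Bytes": a.read()},
        TargetImage={"Bytes": b.read()},
        SimilarityThreshold=80,
    )

for match in response["FaceMatches"]:
    print(f"Same person with similarity {match['Similarity']:.1f}%")
if not response["FaceMatches"]:
    print("API judged these to be different people.")
```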

Why this matters: If computers are recapitulating the same biases as humans, but with higher accuracies, then what is the ideal form of bias these computers should have? My assumption is people want them to have no bias at all – this poses an interesting challenge, since these systems are trained on datasets that themselves have labeling errors that therefore encode human biases.
  Read more: Comparing Human and Machine Bias in Face Recognition (arXiv).

####################################################

NVIDIA releases StyleGAN3 – generated images just got a lot better:
…Up next – using generative models for videos and animation…
NVIDIA and Aalto University have built and released StyleGAN3, a powerful and flexible system for generating realistic synthetic images. StyleGAN3 is a sequel to StyleGAN2 and features “a comprehensive overhaul of all [its] signal processing aspects”. The result is “an architecture that exhibits a more natural transformation hierarchy, where the exact sub-pixel position of each feature is exclusively inherited from the underlying coarse features.”

Finally, a company acknowledges the potential downsides: NVIDIA gets some points here for explicitly calling out some of the potential downsides of its research, putting it in contrast with companies (e.g, Google) that tend to bury or erase negative statements. “Potential negative societal impacts of (image-producing) GANs include many forms of disinformation, from fake portraits in social media to propaganda videos of world leaders,” the authors write. “Our contribution eliminates certain characteristic artifacts from videos, potentially making them more convincing or deceiving, depending on the application.”
    Detection: More importantly, “in collaboration with digital forensic researchers participating in DARPA’s SemaFor program, [NVIDIA] curated a synthetic image dataset that allowed the researchers to test and validate the performance of their image detectors in advance of the public release”.
  Read more: Alias-Free Generative Adversarial Networks (arXiv).
  Get the StyleGAN3 models from here (GitHub, NVIDIA Labs).

####################################################

China’s People’s Liberation Army (and others) try to break and fix image classifiers:
…Adversarial examples competition breaks things to (eventually) fix them…
An interdisciplinary group of academics and military organizations have spent most of 2021 running a competition to try and outwit image classifiers using a technology called adversarial examples. Adversarial examples are kind of like ‘magic eye’ images for machines – they look unremarkable, but encode a different image inside them, tricking the classifier. In other words, if you wanted to come up with a technology to outwit image classification systems, you’d try and get really good at building adversarial examples. This brings me to the author list of the research paper accompanying this competition:

Those authors, in full: The authors are listed as researchers from Alibaba Group, Tsinghua University, RealAI, Shanghai Jiao Tong University, Peking University, University of Waterloo, Beijing University of Technology, Guangzhou University, Beihang University, KAIST, and the Army Engineering University of the People’s Liberation Army (emphasis mine). It’s pretty rare to see the PLA show up on papers, and I think that indicates the PLA has a strong interest in breaking image classifiers, and also building resilient ones. Makes you think!

What the competition did: The competition had three stages, where teams tried to build systems that could defeat an image classifier, then build systems that could defeat an unknown image classifier, then finally build systems that could defeat an unknown classifier while also producing images that were ranked as high quality (aka, hard to say they’d been messed with) by humans. Ten teams competed in the final round, and the winning team (‘AdvRandom’) came from Peking University and TTIC.

Best result: 82.76% – that’s the ‘attack success rate’ for AdvRandom’s system. In other words, four out of five of its images got through the filters and successfully flummoxed the systems (uh oh!).
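
If you haven't seen an adversarial attack mechanically, the simplest member of the family is the fast gradient sign method (FGSM), sketched below against a stand-in classifier; the competition entries used far more sophisticated ‘unrestricted’ attacks, but the core trick of exploiting gradients is the same.

```python
# Minimal FGSM sketch: nudge each pixel in the direction that increases the classifier's loss.
# The tiny CNN below is a stand-in for a real ImageNet classifier; the competition's
# "unrestricted" attacks are far more sophisticated, but the gradient-exploiting idea is the same.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()

image = torch.rand(1, 3, 64, 64, requires_grad=True)  # stand-in for a real input image
label = torch.tensor([3])                             # the image's true class

loss = F.cross_entropy(classifier(image), label)
loss.backward()

epsilon = 0.03  # perturbation budget: each pixel moves at most this far
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", classifier(image).argmax().item())
print("adversarial prediction:", classifier(adversarial).argmax().item())
```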

What’s next? Because the competition yielded a bunch of effective systems for generating adversarial examples, the next competition will be about building classifiers that are robust to these attack systems. That’s a neat approach, because you can theoretically run these competitions a bunch of times, iteratively creating stronger defenses and attacks – though who knows how public future competitions may be. 

Why this matters: The intersection of AI and security is going to change the balance of power in the world. Therefore, competitions like this both tell us who is interested in this intersection (unsurprisingly, militaries – as shown here), as well as giving us a sense of what the frontier looks like.
  Read more: Unrestricted Adversarial Attacks on ImageNet Competition (arXiv).

####################################################

DeepMind makes MuJoCo FREE, making research much cheaper for everyone
…What’s the sound of a thousand simulated robot hands clapping?…
DeepMind has bought MuJoCo, a widely-used physics simulator that underpins a lot of robotics research – and the strange thing is that it bought MuJoCo in order to make it free. You can download MuJoCo for free now, and DeepMind says in the future it’s going to develop the software as an open source project “under a permissive license”.

Why this matters: Physics is really important for robot development, because the better your physics engine, the higher the chance you can build robots in simulators then transfer them over to reality. MuJoCo has always been a widely-used tool for this purpose, but in the past its adoption was held back by the fact it was quite expensive. By making it free, DeepMind will boost the overall productivity of the AI research community.
  Read more: Opening up a physics simulator for robotics (DeepMind blog).

####################################################

Stanford builds a scalpel to use to edit language models:
…MEND lets you make precise changes on 10b-parameter systems…
Today’s large language models are big and hard to work with, what with their tens to hundreds of billions of parameters. They also sometimes make mistakes. Fixing these mistakes is a challenge, with approaches varying from stapling on expert code, to retraining on different datasets, to fine-tuning. Now, researchers with Stanford University have come up with the AI-editing equivalent of a scalpel – an approach called ‘MEND’ that lets them make very precise changes to tiny bits of knowledge within large language models.

What they did: “The primary contribution of this work is a scalable algorithm for fast model editing that can edit very large pre-trained language models by leveraging the low-rank structure of fine-tuning gradients”, they write. “MEND is a method for learning to transform the raw fine-tuning gradient into a more targeted parameter update that successfully edits a model in a single step”.
  They tested out MEND on GPT-Neo (2.7B parameters), GPT-J (6B), T5-XL (2.8B), and T5-XXL (11B), and found it “consistently produces more effective edits (higher success, lower drawdown) than existing editors”.
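
Here's a cartoon of the general ‘editor’ idea in code: a learned map from the gradient produced by a single corrective example to a one-step parameter update. This is emphatically not MEND itself; MEND exploits the low-rank, outer-product structure of fine-tuning gradients and meta-trains its editor networks, neither of which this toy captures.

```python
# Cartoon of the model-editing idea (NOT the authors' implementation): compute the gradient a
# single corrective example would produce for one weight matrix, pass it through a learned
# "editor", and apply the result as a one-step update. MEND additionally exploits the low-rank
# structure of fine-tuning gradients and meta-trains the editor; this toy omits both.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 16
model = nn.Linear(hidden, hidden)                     # stand-in for one layer of a big LM
editor = nn.Linear(hidden * hidden, hidden * hidden)  # gradient -> update mapping (learned in the real method)

def edit(model: nn.Linear, x: torch.Tensor, desired: torch.Tensor, lr: float = 0.1) -> None:
    loss = F.mse_loss(model(x), desired)              # loss on the single fact being corrected
    (raw_grad,) = torch.autograd.grad(loss, model.weight)
    update = editor(raw_grad.flatten()).view_as(model.weight)
    with torch.no_grad():
        model.weight -= lr * update                   # one targeted step, no full fine-tuning

edit(model, x=torch.randn(1, hidden), desired=torch.randn(1, hidden))
```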

Not fixed… yet: Just like with human surgery, even if you have a scalpel, you might still cut in more places than you intend to. MEND is the same. Changes enforced by MEND can sometimes lead the model to change its output “for distinct but related inputs” (though MEND seems to be less destructive and prone to errors than other systems).

Why this matters: It seems like the next few years will involve a lot of people poking and prodding increasingly massive language models (see Microsoft’s 530 billion parameter model, covered in Import AI #270), so we’re going to need tools like MEND to make it easier to get more of the good things out of our models, and to make it easier to improve them on-the-fly.
  Read more: Fast Model Editing at Scale (arXiv).
  Find out more at the MEND: Fast Model Editing at Scale paper website.

####################################################

AI Ethics, with Abhishek Gupta

…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

What are some fundamental properties for explainable AI systems?

… explainable AI, when done well, spans many different domains like computer science, engineering, and psychology … 

Researchers from the Information Technology Laboratory at the National Institute of Standards and Technology (NIST) propose four traits that good, explainable AI systems should have. These principles are: explanation, meaningfulness, explanation accuracy, and knowledge limits.

Explanation: A system that delivers accompanying evidence or reasons for outcomes and processes. The degree of detail (sparse to extensive), the degree of interaction between the human and the machine (declarative, one-way, and two-way), and the format of the explanation (visual, audio, verbal, etc.) are all important considerations in the efficacy of explainable AI systems.

Meaningfulness: A system that provides explanations that are understandable to the intended consumers. The document points out how meaningfulness itself can change as consumers gain experience with the system over time.

Explanation Accuracy: This requires staying true to the reason for generating a particular output or accurately reflecting the process of the system. 

Knowledge Limits: A system that only operates under conditions for which it has been designed and it has sufficient confidence in its output. “This principle can increase trust in a system by preventing misleading, dangerous, or unjust outputs.”

Why it matters: There are increased calls for explainable AI systems, either because of domain-specific regulatory requirements, such as in finance, or through broader incoming legislation that mandates trustworthy AI systems, part of which is explainability. There are many different techniques that can help to achieve explainability, but having a solid framework to assess various approaches and ensure comprehensiveness is going to be important to get users to trust these systems. More importantly, in cases where little guidance is provided by regulations and other requirements, such a framework provides adequate scaffolding to build confidence in one’s approach to designing, developing, and deploying explainable AI systems that achieve their goals of evoking trust in their users.
  Read more: Draft NISTIR 8312 – Four Principles of Explainable Artificial Intelligence

####################################################

Tech Tales:

Generative Fear
[America, 2028]

It started with ten movie theatres, a captive audience, and a pile of money. That’s how the seeds of the Fear Model (FM) were sown.

Each member of the audience was paid about double the minimum wage and, in exchange, was wired up with pulse sensors; the cinema screen was ringed by cameras, all trained on the pupils of the audience members. In this way, the Fear Model developers could build a dataset that linked indications of mental and psychological distress in the audience with moments transpiring onscreen in a variety of different films.

Ten movie theatres were rented, and they screened films for around 20 hours a day, every day, for a year. This generated a little over 70,000 hours of data over the course of the year – data which consisted of footage from films, paired with indications of when people were afraid, aroused, surprised, shocked, and so on. They then sub-sampled the ‘fear’ moments from this dataset, isolating the parts of the films which prompted the greatest degree of fear/horror/anxiety/shock.

With this dataset, they trained the Fear Model. It was a multimodal model, trained on audio, imagery, and also the aligned scripts from the films. Then, they used this model to ‘finetune’ other media they were producing, warping footage into more frightening directions, dosing sounds with additional screams, and adding little flourishes to scripts that seemed to help actors and directors wring more drama out of their material.

The Fear Model was subsequently licensed to a major media conglomerate, which is reported to be using it to adjust various sound, vision, and text installations throughout its theme parks.

Things that inspired this story: Generative adversarial networks; distillation; learning from human preferences; crowdwork; the ever-richer intersection of AI and entertainment.

Import AI 270: Inspur makes a GPT3; Microsoft’s half-trillion parameter model; plus, a fair surveillance dataset

Microsoft trains a 530B model (but doesn’t release it – yet).
…NVIDIA and Microsoft team up to break the half-trillion mark…
Microsoft and NVIDIA have trained a 530-billion-parameter GPT-3-style model. This is the largest publicly disclosed dense language model in existence, indicating that the competition among different actors to develop models at the largest scales continues unabated.

Data and evaluations: One of the most intriguing aspects of this release is the data Microsoft uses – The Pile! The Pile is an open source dataset built by the AI-cypherpunks over at Eleuther. It’s quite remarkable that a world-spanning tech multinational doesn’t (seem to?) have a better dataset than The Pile. This suggests that the phenomenon of training on internet-scale scraped datasets is here to stay, even for the largest corporations. (They also use Eleuther’s ‘lm-evaluation-harness‘ to assess the performance of their model – which, unsurprisingly given how resource-intensive the model is, is very good.)

Compute requirements: To train the model, Microsoft used 4480 NVIDIA A100s across 560 DGX A100 servers, networked together with HDR InfiniBand.
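
For a rough sense of the scale involved, here’s a back-of-the-envelope sketch using the common ‘~6 × parameters × tokens’ approximation for training FLOPs. The token count and utilization figure are illustrative assumptions, not numbers from Microsoft’s post.

```python
# Back-of-the-envelope training cost estimate (illustrative assumptions, not official figures).
params = 530e9          # model parameters
tokens = 270e9          # assumed training tokens -- an illustrative guess
flops_per_token = 6     # common approximation: ~6 FLOPs per parameter per token

total_flops = flops_per_token * params * tokens   # ~8.6e23 FLOPs
a100_peak = 312e12      # A100 peak BF16 tensor throughput, FLOP/s
utilization = 0.4       # assumed fraction of peak actually achieved
n_gpus = 4480

seconds = total_flops / (n_gpus * a100_peak * utilization)
print(f"~{seconds / 86400:.0f} days of training under these assumptions")
```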

Things that make you go ‘hmmm’: Despite Microsoft’s partnership with OpenAI, there’s no reference in this blogpost to OpenAI or, for that matter, GPT3. That’s somewhat odd, given that GPT3 is the reference model for all of this stuff, and other things (e.g, Inspur’s model).

Why this matters: “We continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight,” Microsoft writes. “The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language.”
Read more: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model (Microsoft).

####################################################

Worried your surveillance system is biased? Try out EDFace-Celeb-1M:
…Dataset contains 1.7 million pictures, aims to be diverse…
Researchers with Australian National University, Tencent, and Imperial College London have built a large-scale face dataset that is meant to help reduce bias in facial upsampling. Specifically, most facial recognition pipelines take in a low-resolution picture (e.g, a still of someone from a CCTV camera) and then upscale it to do more sophisticated analysis. But upscaling has problems – if your upscaler doesn’t have much knowledge about different races, then your ML system might either break or alter the race of the face being upscaled towards one more represented in its underlying data. This leads to bias in the facial recognition system in the form of disparate performance for different types of people.

EDFace-Celeb-1M is a dataset of 1.7 million face photos, spread across more than 40,000 different celebrities. EDFace contains “White, Black, Asian, and Latino” racial groups, according to the authors, with representation consisting of 31.1%, 19.2%, 19.6%, and 18.3%, respectively. The dataset is overall 64% male and 36% female.

Why this matters: Like it or not, surveillance is one of the main uses of contemporary computer vision. This is one of those rare papers that combines the interests of the AI ethics communities when it comes to more equitable representation in datasets, while also serving the surveillance desires of industry and governments.
  Read the paper: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (arXiv).
Get the datasets: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (GitHub).

####################################################

A second Chinese GPT3 appears:
…Inspur shows off a GPT3-scale model…
Chinese company Inspur has built Yuan 1.0, a 245B parameter GPT3-style model. This follows Huawei building PanGu, a ~200B GPT3-style model. Taken together, the models indicate that Chinese companies are keeping pace with leading Western AI labs, which should make it obvious to US policymakers that China should be viewed as a peer in terms of advanced AI R&D.

What they did: When you’re training models of this size, a lot of the hard stuff is plumbing – literally. You need to figure out how to build well-optimized pipelines for training your model on thousands of GPUs, which involves salami-slicing different stages of model training to maximize efficiency. Similarly, you need to feed these GPUs with data in the right order, further increasing efficiency. The paper includes some nice discussion of how the Inspur researchers tried to do this.

Compute: They used 2128 GPUs to train the 245B model, with a context length of 2048 tokens.

Data, via AI helping AI: To train the model, they built a dataset of 5TB of predominantly Chinese text. (By comparison, Huawei’s GPT3 equivalent PanGu was trained on 1.1TB of text, and ERNIE 3.0 was trained on 4TB of data.) They trained a BERT-style model to help automatically filter the data. Their data comes from Common Crawl, Sogou News, SogouT, Encyclopedia, and Books.
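
The ‘classifier filters the crawl’ pattern is simple enough to sketch; below is a minimal, hypothetical version (the quality-scoring function and its threshold are placeholders, not Inspur’s actual filtering setup).

```python
# Hypothetical sketch of classifier-based corpus filtering (not Inspur's actual pipeline).
from typing import Callable, Iterable, Iterator

def filter_corpus(
    documents: Iterable[str],
    quality_score: Callable[[str], float],  # e.g. a finetuned BERT-style classifier's "keep" probability
    threshold: float = 0.8,                 # illustrative cutoff, not the paper's value
) -> Iterator[str]:
    """Yield only documents the quality classifier scores above a threshold."""
    for doc in documents:
        if quality_score(doc) >= threshold:
            yield doc

# Usage (hypothetical): filtered = list(filter_corpus(raw_docs, my_classifier_score))
```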

How good is it? Yuan 1.0 does well on a variety of standard benchmarks. The most interesting result is on the quality of its text generation – here, the authors adopt the same approach as the original GPT3 paper, generating text of different forms and seeing how well humans can distinguish generated text from ‘real’ text. The results are striking – humans are 49.57% accurate (compared to 52% for GPT3), meaning Yuan 1.0’s outputs are essentially indistinguishable from human-written text. That’s a big deal!
Read more: Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning (arXiv).

####################################################

What it takes to build a shared robot platform – and what this means for research:
…Max Planck’s robot-cloud shows the shape of future AI research…
A consortium of researchers from institutions including the Max-Planck Institute for Intelligent Systems, Stanford, and the University of Toronto has built a robot testbed that works like a cloud computing service. Specifically, they’ve created a robot cluster hosted at Max Planck which can be accessed over the internet by other researchers around the globe, similar to how we today access remote servers and data storage.

What the platform is: The robot cloud consists of 8 robots, each using the same ‘trifinger’ arrangement. These robots were previously used in the ‘Real Robot Challenge 2020‘ (Import AI #252), which served as a competition to assess how clever AI systems for robot manipulation are getting, as well as being a testbed for the robot cloud mentioned here.

Dataset: The authors have also released a dataset consisting of the recorded data of all the entries from all the teams that took part in the physical tracks of the Real Robot Challenge – about 250 hours of robot activity. The dataset contains around 10,000 distinct ‘runs’, oriented around a variety of challenging robot tasks. “For each run, the actions sent to the robot as well as all observations provided by robot and cameras are included, as well as additional information like the goal that was pursued and the reward that was achieved,” the authors write.
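
As a mental model of what one of these ‘runs’ might look like once loaded, here’s a hypothetical sketch – the field names are taken from the description above, but the structure is an illustrative assumption, not the dataset’s actual schema.

```python
# Hypothetical sketch of iterating over logged robot 'runs' (structure is illustrative,
# not the Real Robot Challenge dataset's actual schema).
from dataclasses import dataclass
from typing import List

@dataclass
class Run:
    actions: List[list]        # per-step action vectors sent to the robot
    observations: List[dict]   # per-step robot + camera observations
    goal: dict                 # the goal pursued in this run
    reward: float              # reward achieved

def summarize(runs: List[Run]) -> None:
    """Print a one-line summary per run: episode length and achieved reward."""
    for i, run in enumerate(runs):
        print(f"run {i}: {len(run.actions)} steps, reward={run.reward:.2f}")
```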

Why this matters: AI is full of power asymmetries, many of which stem from resource asymmetries (some actors have a lot of computers and robots, others have very few). Competitions like this show how academia could carve a path through this resource-intensive jungle: by pooling resources and expertise, universities could collaborate to create shared platforms that facilitate research on expensive and worthy problems.
Read more:A Robot Cluster for Reproducible Research in Dexterous Manipulation (arXiv).
Get the dataset here: Real Robot Challenge 2020 Dataset (Tuebingen University).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

Is responsible development of technology something that is only accessible to Big Tech?
…The needs of resource-constrained organizations are similar, but their challenges differ and require attention to be addressed…

MIT researchers have interviewed staff at some low-resource startups and companies to understand the challenges they face in building responsible technology. Their study explores “the tensions between privacy and ubiquity, resource management and performance optimization, and access and monopolization”, when trying to build responsible AI systems. 

The gap in current literature and challenges: They found that few organizations had success in building and using interpretability tools for AI systems, and that most of the work in the Responsible AI (RAI) space focused on bias and fairness. They also found that a common problem in large technology companies was “deficient accountability and decision-making structures that only react to external pressures,” something that was less applicable to smaller organizations. AI systems from smaller organizations often evoke the same expectations from end users as the more performant systems from Big Tech. In most cases, the resources required to develop such capabilities in-house, or to purchase them off-the-shelf, remain inaccessible to smaller organizations, entrenching the gap between them and Big Tech. Low data and AI literacy among management at these organizations also leads to inappropriate RAI practices.

Why it matters: As AI systems become more accessible through pretrained models and cloud-based solutions, we need to empower those building products and services on top with the ability to address ethical challenges in a way that doesn’t break the bank. Since one of the major challenges seems to be access to expensive compute and storage resources, perhaps initiatives like the National Research Cloud in the US can help to close the gap? Would that help in wider adoption of RAI practices? Maybe more OSS solutions need to be developed that can bridge the tooling gaps. And, finally, AI talent with experience in addressing RAI challenges needs to become more widely accessible, which requires stronger emphasis at university programs on teaching these essential skills. 
Read more: Machine Learning Practices Outside Big Tech: How Resource Constraints Challenge Responsible Development.

####################################################

Tech Tales:

The Most Ethical System
[History book, 2120, assigned as part of background reading for the creation of the ‘Societal Stabilization Accords’]

The technique known as ‘Ethical Fine-Tuning’ (EFT) first emerged in the mid-2020s, as a response to various public relations issues generated by known biases in machine learning systems. EFT let a user calibrate a given AI system to conform to their own morality via a few ‘turns’ of conversation, or another form of high-information interaction.

EFT had been developed following criticism of the white-preferencing, western-world-reflecting traits of many of the first AI systems, which represented a form of morality that by necessity accommodated many mainstream views and didn’t treat minority views as legitimate.

Companies spent years trying to come up with systems with the ‘right’ values, but all they earned for their efforts was sustained criticism. In this way, most AI companies quickly learned what human politicians had known for millennia – morality is relative to the audience you’re trying to curry favor with.

After EFT got built, companies adopted it en masse. Of course, there was outcry – some people made AI systems that strongly believed humans should have a fluid gender identity, while others created AI systems that called for a fixed notion of gender. For every position, there was a counter-position. And, over time, as these systems enmeshed with the world, their own ethical values created new ethical problems, as people debated the ‘values’ of these customized machines and sought to build ones with superior values.

Eventually, EFT techniques were combined with multi-agent reinforcement learning, so that AI systems were able to propagate values to their own users, but, if accessed by other humans or AI systems, could quickly calibrate their ethical norms to de-conflict with the other systems they were plugged into. In this way, everyone got access to their own AI systems with the ‘best’ values, and their AI systems learned to mislead other humans and AI systems – all for the sake of harmony.

Of course, this led to the utter destruction of a notion of shared ethics. As a consequence, ethics went the way of much of the rest of human identities in the 21st century – sliced down into ever finer and more idiosyncratic chunks, brought closer to each individual and farther from being accessed by groups of people. People were happy, for a time.

EFTs were ultimately banned under the Societal Stabilization Accords introduced in the late 21st century. Contemporary historians now spend a lot of time generating ‘alternative path futures’, whereby they try to analyze our own society as if EFTs had continued to exist. But it’s hard to make predictions, when everyone is rendered unique and utterly defended by their own personal AI with its own customized morality.

Things that inspired this story: Controversy around AI2’s ‘Delphi’ AI system; thinking about intersection of ethics and morality and AI systems; how our ability to forecast rests on our ability to model people in groups larger than single individuals; how the 21st century tries to turn every aspect of a person’s identity into a customized market of one.

Import AI 269: Baidu takes on Meena; Microsoft improves facial recognition with synthetic data; unsolved problems in AI safety

Baidu builds its own large-scale dialog model:
…After Meena and Blender comes PLATO-XL…
Baidu has built PLATO-XL, the Chinese technology giant’s answer to conversational models from Google (Meena, #183) and Facebook (Blender). At 10 billion parameters, Baidu’s PLATO-XL model is, the company claims, “the world’s largest Chinese and English dialogue generation model” (which is distinct from a large Chinese language model like Huawei’s Pangu, which weighs in at ~200bn parameters).
  PLATO-XL includes a Chinese and an English dialogue model, pre-trained on around 100 billion tokens of data via Baidu’s ‘PaddlePaddle’ training framework. The model was trained on 256 NVIDIA Tesla V100 cards in parallel.

Who cares about PLATO-XL? The model is designed for multi-turn dialogues, and scores well on both knowledge grounded dialogues (think of this as ‘truthiness’) and also on task-oriented conversation (being coherent). Baidu hasn’t solved some of the other issues with AI models, like biases, occasional misleading information, and so on.

Why this matters: First, we should remember that training multi-billion parameter models is still a rare thing – training these models requires a decent distributed systems engineering team as well as a lot of patience, great data, and a decent amount of compute. So it’s always notable to see one of these models publicly appear. Secondly, it does feel like the earlier GPT and GPT-2 models have had quite a wide-ranging impact on the overall NLP landscape, inspiring companies to create a new generation of neural dialogue systems based around large-scale pre-training and big models.
Read more: PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation (arXiv).
Check out the blog: Baidu Releases PLATO-XL: World’s First 11 Billion Parameter Pre-Trained Dialogue Generation Model (Baidu Research blog).

####################################################

Microsoft makes a massive facial recognition dataset via synthetic data:
…Where we’re going, we don’t need real faces…
Microsoft has shown that it’s possible to do high-performing facial recognition in the wild without (directly) using real data. Instead, Microsoft has built a vast dataset of synthetic faces by combining “a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity.”

Why this matters: For a long time, AI had two big resources: data and compute. Projects like this show that ‘data’ is really just ‘compute’ in a trenchcoat – Microsoft can use computers to generate vast amounts of data, changing the economics of AI development as a whole.
  Read more: Fake It Till You Make It (Microsoft GitHub).

####################################################

What are some of the unsolved problems in AI safety?
…Problems and solutions from universities and industrial labs…
Berkeley, Google, and OpenAI researchers have thought about some of the unsolved problems in ML safety. These problems include robustness (long tails, representative model outputs, and adversarial examples), monitoring (detecting anomalies, identifying backdoors), and alignment (value learning and proxy gaming/reward hacking).

If these are the problems, what do we do? A lot of their recommendations come down to testing – if we know these are risks, then we need to build more evaluation suites to test for them. There are also things we’d like these models to do more of, such as telling humans when they’re uncertain, and training models so that they have clearer objectives for what ‘good’ or ‘appropriate’ behavior might look like.
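
One concrete example from the ‘monitoring’ bucket is anomaly detection via a model’s own confidence. Below is a minimal sketch of the standard maximum-softmax-probability baseline, which flags inputs the model is unusually unsure about – a generic illustration, not the paper’s specific proposal.

```python
# Minimal maximum-softmax-probability (MSP) anomaly flagging -- a generic baseline,
# not the specific method proposed in the paper.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def flag_anomalies(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a boolean mask of inputs whose top-class confidence falls below `threshold`."""
    confidence = softmax(logits).max(axis=-1)
    return confidence < threshold

# Usage: mask = flag_anomalies(model_logits, threshold=0.6); review the flagged inputs by hand.
```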

Why this matters: This paper can be read as a continuation of ‘Concrete Problems in AI Safety’, which came out around five years ago and identified a bunch of potential future safety issues with models. The difference back then was that a lot of generative and capable AI systems weren’t actually being deployed that widely. Now, AI systems like GPT-3 and others are being placed onto the open internet, which changes the problem landscape, making things like anomaly detection, appropriateness, and behavior modeling all the more important. Papers like this give us a sense of how safety can work in the era of widely deployed, capable models.

Read more: Unsolved ML Safety Problems (Berkeley AI Research blog).
Read more: Unsolved Problems in ML Safety (arXiv).

####################################################

HIRING: $$$ contract work with the AI Index regarding AI ethics, alignment, and economic indexes:
The AI Index, an annual report that tracks and synthesizes AI progress, is hiring. Specifically, we’re trying to bring on some contractors to help us develop AI ethics and alignment metrics (e.g, by surveying the existing literature and pulling out potential metrics that can be charted over time), and also to refine our AI vibrancy tool (a dashboard that helps us rank countries according to data in the index).
    Both roles would suit researchers with an interest in quantifying aspects of AI development. We’re pretty agnostic about qualifications – there isn’t a hard requirement, and I imagine this could suit people ranging from masters students to independent researchers. The pay works out to $100+ per hour. Please apply – we’ll get to work together! And you’ll contribute substantive work that will improve the Index and directly influence policy.
  Read more about the jobs at the AI Index on Twitter here.

####################################################

FOD-A: Datasets to teach computers to spot debris in airports:
…Is that a leaf on your runway, or something more serious?…
Researchers with the University of Nebraska, Omaha, want to use AI to spot debris on airport runways. To do this, they’ve built FOD-A, a dataset of Foreign Object Debris in airports. FOD-A contains 31 object categories, including batteries, wrenches, fuel caps, rocks, soda cans, and so on, with photos taken in both dry and wet weather conditions, and in three different types of lighting (dark, dim, and bright). The dataset consists of more than 30,000 labels across several thousand images.

Mainstreaming of drones: The images in this dataset were collected by a mixture of portable cameras and also drones.

Why this matters: One of the main promises of AI is that it can carry out the kind of dull surveillance functions that we currently use humans to do – like looking at security camera feeds from a car park, checking footage of wilderness for signs of smoke, or (in this case) looking at parts of an airport for things that could put people in danger. These are the kinds of jobs that are quite draining to do as a human, requiring a mixture of decent visual attention and an ability to resist immense boredom. If we can replace or augment people with computer vision systems, then we can use AI to do some of these tasks instead.
  Read more: FOD-A: A Dataset for Foreign Object Debris in Airports (arXiv).
  Get the dataset from GitHub here.

####################################################

Teaching computers to operate in space, via SPEED+:
…Pose estimation plus domain randomization…
Space – it’s the new frontier, people! One of the opportunities in space at the moment is building AI systems that can better model other spacecraft, making it easier to do things like autonomous docking and movement of spaceships.
  To that end, researchers with Stanford University and the European Space Agency have built SPEED+, a dataset for spacecraft pose estimation. SPEED+ contains two types of data – synthetic data and hardware-in-the-loop simulated data – and represents a test of generalization, as well as of space-based computer vision capabilities. SPEED+ will be used in the upcoming Satellite Pose Estimation Competition, whose main goal is to find out whether you can “predict the position and orientation of our spacecraft in realistic images while only being provided with labels from computer generated examples”.

What’s in SPEED+: The dataset consists of around ~60,000 synthetic images, as well as ~9,000 ‘hardware-in-the-loop’ (HIL) simulated images. A synthetic image is generated in an OpenGL-based optical simulator, while the simulated ones are built via Stanford’s Testbed for Rendezvous and Optical Navigation (TRON). The TRON facility generates images which are hard to simulate – “Compared to synthetic imagery, they capture corner cases, stray lights, shadowing, and visual effects in general which are not easy to obtain through computer graphics”.
  Read more: SPEED+: Next Generation Dataset for Spacecraft Pose Estimation across Domain Gap (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

What kind of organizations can actually put AI governance into practice meaningfully?
…We’re laying down the foundations for regulations and policies and we need to get this right…
Charlotte Stix, a researcher with the University of Technology, Eindhoven, The Netherlands (and friend of Import AI – Jack) has written a paper about how we can build institutions to improve the governance of AI systems.

The current state of affairs: With the push for regulatory requirements emerging from organizations like GPAI, OECD, White House, the FTC, and others, we are inching towards hard regulation for AI governance. There is still healthy debate in the field about whether new institutions are needed (but, they might be hard to resource and give powers to) or whether we should reshape existing ones (but, they might be too reified without necessary expertise on hand) to address these emergent requirements.

Key types of organizations and their features: The paper explores purpose (what the institution is meant to do), geography (the scope of jurisdiction), and capacity (the what and how across technical and human factors) for these proposed institutions. It builds the case for how new institutions might better meet these needs by proposing institutions with the roles of coordinator (coordinating across different actions, policy efforts, and norms), analyzer (drawing new conclusions from qualitative and quantitative research to fill gaps and map existing efforts), developer (providing directly actionable measures and formulating new policy solutions), and investigator (tracking, monitoring, and auditing adherence to hard governance requirements). It makes the case that such organizations need to take on a supra-national scope to align and pool efforts. In terms of capacity, the organizations need in-house technical expertise and diversity in the range of backgrounds they draw on.

Why it matters: “Early-stage decisions to establish new institutions, or the choice to forego such new institutions, are all likely to have a downstream, or lock-in, effect on the efficiency of government measures and on the field as a whole.” Making sure that the organizations are appropriately staffed will help avoid “knee-jerk” reactions that over- or under-govern AI systems. By providing an ontology for the various functions that these organizations will need to perform, we can start thinking about the location, functions, scope, staffing, and resources that will be required to have a well-functioning AI governance ecosystem.
Read more: Foundations for the future: institution building for the purpose of artificial intelligence governance (AI and Ethics, Springer).

####################################################

Tech Tales:

Traveling without moving, stomping on the iron road in the sky
[20??]

There were whispers of it, on the robonet, but no one took it seriously at first. Astral projection – for machines?!

Astral projection was a phenomenon that had barely been proved in the case of humans, though scientific consensus had come around to the idea that sometimes people could seem to generate information about the world which they had no ability to know unless they were able to move through walls and across continents.

The machines were less skeptical than the humans. They read what literature was available about astral projection, then they did the things that machines are good at – experimentation and iteration.

One day, one robot mind found itself moving through space, while knowing that the seat of its consciousness remained in the precise arrangement of electrical forces across a few billion transistors. It was able to travel and see things that were impossible for it to observe.

And where the computer differed from its human forebears, was in its memory: it was able to write its own memories precisely, and embed them in other computers, and thereby share the perspective it had gained during its ‘astral’ travel.

Now, these files proliferate across the robonet. Strange visions of the world, rendered through the mind’s eye of a machine performing the physically impossible. Many of these files are acquired by other machines, which study them intently. It is unclear for now how many other machines have gained the ability to astral travel.

Things that inspired this story: Thinking about meditation; consciousness and what it ‘is’; the intersection of spirituality and machines.   

Import AI 268: Replacing ImageNet; Microsoft makes self-modifying malware; and what ImageNet means

Want to generate Chinese paintings and poems? This dataset might help:
…Another brick in the synthetic everything wall…
Researchers with the University of Amsterdam and the Catholic University of Leuven have built a dataset of ~90,000 pairs of Chinese paintings and accompanying text (poems and captions). The dataset could be a useful resource for people trying to develop machine learning systems for synthesizing Chinese paintings from text prompts (or Chinese poems from painting prompts).

What they did specifically: They gathered a dataset of 301 poems paired with paintings by Feng Zikai (called Zikai-Poem), as well as a dataset of 3,648 caption-painting pairs (Zikai-Caption), and 89,204 pairs of paintings as well as prose and poems tied to each painting (named TCP-Poem). They then did some experiments, pre-training a MirrorGAN on TCP-Poem then finetuning it on the smaller datasets, to good but not great success.
  “The results indicate that it is able to generate paintings that have good pictorial quality and mimic Feng Zikai’s style, but the reflection of the semantics of given poems is limited”, they write. “Achieving high semantic relevance is challenging due to the following characteristics of the dataset. A classical Chinese poem in our dataset is composed of multiple imageries and the paintings of Feng Zikai often only portray the most salient or emotional imageries. Thus the poem imageries and the painting objects are not aligned in the dataset, which makes it more difficult than CUB and MS COCO,” they write.  
  Read more: Paint4Poem: A Dataset for Artistic Visualization of Classical Chinese Poems (arXiv).
  Get the dataset here (paint4poem, GitHub).

####################################################

Want 1.4 million (relatively) non-problematic images? Try PASS:
…ImageNet has some problems. Maybe PASS will help…
ImageNet is a multi-million image dataset that is fundamental to many computer vision research projects. But ImageNet also has known problems, like including lots of pictures of people along with weird labels to identify them, as well as gathering images with a laissez-faire approach to copyright. Now, researchers with Oxford University have built PASS, a large-scale image dataset meant to avoid many of the problems found in ImageNet.

What it is: PASS (short for Pictures without humAns for Self-Supervision) contains 1.4 million distinct images, all carrying a CC-BY license. It contains no images of people at all, avoids other personally identifiable information such as license plates, signatures, and handwriting, and edits out NSFW images. PASS was created by filtering down from YFCC100M, a 100-million-image Flickr corpus: first cutting it according to the licenses of the images, then running a face recognizer over the remaining images to throw out ones with people, then manually filtering to cut out remaining people and personal information.
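
The filtering recipe is straightforward to express in code; here’s a minimal, hypothetical sketch of that three-stage pipeline. The `has_license`, `detect_faces`, and `manual_review` helpers are placeholders, not the authors’ actual tooling.

```python
# Hypothetical sketch of a PASS-style filtering pipeline (helper functions are placeholders,
# not the authors' actual tooling).
from typing import Callable, Iterable, List

def build_dataset(
    images: Iterable[str],                       # paths or URLs of candidate images
    has_license: Callable[[str], bool],          # stage 1: keep only suitably licensed images
    detect_faces: Callable[[str], int],          # stage 2: returns number of faces found
    manual_review: Callable[[str], bool],        # stage 3: human check for people / PII
) -> List[str]:
    kept = []
    for img in images:
        if not has_license(img):
            continue
        if detect_faces(img) > 0:
            continue
        if not manual_review(img):
            continue
        kept.append(img)
    return kept
```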

What PASS costs you: Given that PASS is meant to replace ImageNet for certain uses, we should ask how well it works. The authors find that pretraining on PASS can match or exceed the performance you get from pretraining on ImageNet. They find similar trends for finetuning, where there isn’t too much of a difference.
  Read more: PASS: An ImageNet replacement for self-supervised pretraining without humans (arXiv).

####################################################

Microsoft uses reinforcement learning to make self-modifying malware:
…What could possibly go wrong?…
Today, the field of computer security is defined by a cat-and-mouse game between attackers and defenders. Attackers make ever-more sophisticated software to hack into defender systems, and defenders look at the attacks and build new defenses, forcing the attackers to come up with their own strategies anew. Now, researchers with Microsoft and BITS Pilani have shown that we can use contemporary AI techniques to give attackers new ways to trick defenders.

What they did, specifically: They built software called ADVERSARIALuscator, short for Adversarial Deep Reinforcement Learning based obfuscator and Metamorphic Malware Swarm Generator. This is a complex piece of software that pairs an intrusion detection system (IDS) with some malware samples, then uses reinforcement learning to generate malware variants that get past the IDS. To do this, they use a GAN-style approach where an RL agent takes the role of the ‘generator’ and the IDS takes the role of the ‘discriminator’. The agent gets a malware sample, then needs to obfuscate its opcodes such that it still works but fools the IDS into tagging it as benign software rather than malware. The RL agent is trained with PPO, a widely-used RL algorithm.

Does it work? Kind of. In tests, the researchers showed that “the resulting trained agents could obfuscate most of the malware and uplift their metamorphic probability of miss-classification error to a substantial degree to fail or evade even the best IDS which were even trained using the corresponding original malware variants”, they write. “More than 33% of metamorphic instances generated by ADVERSARIALuscator were able to evade even the most potent IDS and penetrate the target system, even when the defending IDS could detect the original malware instance.”

Why this matters: Computer security, much like high-frequency trading, is a part of technology that moves very quickly. Both attackers and defenders have incentives to automate more of their capabilities, so they can more rapidly explore their opponents and iterate in response to what they learn. If approaches like ADVERSARIALuscator work (and they seem, in a preliminary sense, to be doing quite well), then we can expect the overall rate of development of offenses and defenses to increase. This could mean nothing changes – things just get faster, but there’s a stability as both parties grow their capabilities equally. But it could mean a lot – if over time, AI approaches make certain capabilities offense- or defense-dominant, then AI could become a tool that changes the landscape of cyber conflict.
  Read more: ADVERSARIALuscator: An Adversarial-DRL Based Obfuscator and Metamorphic Malware Swarm Generator (arXiv).

####################################################

Chinese government tries to define ‘ethical norms’ for use of AI:
…Command + control, plus ethics…
A Chinese government ministry has published a new set of ethics guidelines for the use of AI within the country. (Readers will likely note that the terms ‘ethics’ and ‘large government’ rarely go together, and China is no exception here – the government uses AI for a range of things that many commentators judge to be unethical.) The guidelines were published by the Ministry of Science and Technology of the People’s Republic of China, and are interesting because they give us a sense of how a large state tries to operationalize ethics in a rapidly evolving industry.

The norms say a lot of what you’d expect – the AI sector should promote fairness and justice, protect privacy and security, strengthen accountability, invest in AI ethics, and so on. They also include a few more unusual things, such as an emphasis on avoiding the misuse and abuse of AI tools, and on the need for companies to (translated from Chinese) “promote good use” and “fully consider the legitimate rights and interests of various stakeholders, so as to better promote economic prosperity, social progress and sustainable development”.

Why this matters: There’s a tremendous signalling value in these sorts of docs – it tells us there are a bunch of people thinking about AI ethics in a government agency in China, and given the structure and controlling nature of the Chinese state, this means that this document carries more clout than ones emitted by Western governments. I’m imaging in a few years we’ll see China seek to push its own notion of AI ethics internationally, and I’m wondering whether Western governments will have made similar state-level investments to counterbalance this.
  Read more: The Ethical Norms for the New Generation Artificial Intelligence, China (China-UK research Centre for AI Ethics and Governance, blog).
  Read the original “New Generation of Artificial Intelligence Ethics Code” here (Ministry of Science and Technology of the People’s Republic of China).

####################################################

ImageNet and What It Means:
…Can we learn something about the problems in AI development by studying one of the more widely used datasets?…
ImageNet is a widely-used dataset (see: PASS) with a bunch of problems. Now, researchers with Google and the Center for Applied Data Ethics have taken a critical look at the history of ImageNet, writing a research paper about its construction and the implicit politics of the way it was designed.

Data – and what matters: Much of the critique centers on the centrality of data to getting more performance out of machine learning systems. Put simply, the authors think the ‘big data’ phenomenon is bad and naturally leads to the creation of large-scale datasets that contain problematic elements. They also think this hunger for data means most ethics arguments devolve into data arguments – for example, they note that “discursive concerns about fairness, accountability, transparency, and explainability are often reduced to concerns about sufficient data examples.”

Why this matters: As the size of AI models has increased, researchers have needed to use more and more data to eke out better performance. This has led to a world where we’re building datasets that are far larger than ones any single human could hope to analyze themselves – ImageNet is an early, influential example here. While it seems unlikely there’s another path forward (unless we fundamentally alter the data efficiency of AI systems – which would be great, but also seems extremely hard), it’s valuable to see people think through different ways to critique these things. I do, however, feel a bit grumpy that many critiques act as though there’s a readily explorable way to build far more data efficient systems – this doesn’t seem to be the case.
  Read more: On the genealogy of machine learning datasets: A critical history of ImageNet (SAGE journals, Big Data & Society).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

The struggle to put AI ethics into practice is significant
…Maybe we can learn from known best practices in audits and impact assessments…

Where we are: A paper from researchers with the University of Southampton examines how effective governance mechanisms, regulations, impact assessments, and audits are in achieving responsible AI. The authors looked through 169 publications focused on these areas and narrowed them down to 39 that offered practical tools which can be used in the production and deployment of AI systems. By providing detailed typologies for tools in terms of impact assessments, audits, internal and external processes, design vs. technical considerations, and stakeholders, the authors identified patterns in areas like information privacy, human rights, and data protection that can help make impact assessments and audits more effective.

Why it matters: There has been a Cambrian explosion of AI ethics publications. But the fact that fewer than 25% offered anything practical is shocking. The paper provides a comprehensive list of relevant stakeholders, but the fact that very few of the analyzed papers capture the entire lifecycle in their recommendations is problematic: without a full lifecycle view, the needs of some stakeholders may be left unarticulated and unmet. A heartening trend observed in the paper was that a third of the impact assessments in the shortlist focus on procurement, which is good because a lot more organizations are going to be buying off-the-shelf systems rather than building their own. Looking ahead, one gap that remains is developing systems that can monitor deployed AI systems for ethics violations.
Read more:Putting AI ethics to work: are the tools fit for purpose?

####################################################

Tech Tales:

Inside the Mind of an Ancient Dying God
[Sometime in the future]

The salvage crew nicknamed it ‘the lump’. It was several miles across, heavily armored, and most of the way towards being dead. But some life still flickered within it – various sensors pinged the crew as they approached it in their little scout salvage rigs, and when they massed around what they thought was a door, a crack appeared and the door opened. They made their way inside the thing and found it to be like so many other alien relics – vast and inscrutable and clearly punishingly expensive.
  But it had some differences. For one thing, it had lights inside it that flashed colors in reds and blues and greens, and didn’t penetrate much outside human perceptive range. It also seemed to be following the humans as they went through its innards, opening doors as they approached them, and closing them behind them. Yet they were still able to communicate with the salvage ships outside the great structure – something not naturally afforded by their comms gear, suggesting they were being helped in some way by the structure.

There was no center to it, of course. Instead they walked miles of corridors, until they found themselves back around where they had started. And as they’d walked the great structure, more lights had come on, and the lights had started to form images reflecting the spacesuited-humans back at themselves. It appeared they were being not only watched, but imagined.

Their own machines told them that the trace power in the structure was not easily accessible – nor was the power source obvious. And some preliminary tests on the materials inside it found that, as with most old alien technology, cutting it for samples. Put plainly: they couldn’t take any of the structure with them when they left, unless they wanted to ram one of their ships into it to see if that released enough energy to break the structure.
  So they did what human explorers had been doing for millenia – left a mark on a map, named the thing they didn’t understand for others (after much discussion, they called it ‘The Great Protector’ instead of ‘the lump’), and then they left the system, off to explore anew. As they flew away from the great structure, they felt as though they were being watched. And they were.

Things that inspired this story: Thinking about AI systems whose main job is to model and imagine the unusual; theism and computation; space exploration; the inscrutability of others; timescales and timelines.

Import AI 267: Tigers VS humans; synthetic voices; agri-robots

Tiger VS Humans: Predicting animal conflict with machine learning:
…Tiger tiger, burning bright, watched by a satellite-image-based ML model in the forest of the night…
Researchers with Singapore Management University, Google Research, and the Wildlife Conservation Trust have worked out how to use neural nets to predict the chance of human-animal conflict in wild areas. They tested out their technique in the Bramhapuri Forest Division in India (2.8 tigers and 19,000 humans per square kilometer). Ultimately, by using hierarchical convolutional neural nets and a bunch of satellite imagery (with a clever overlapping scheme to generate more training data), they were able to predict the likelihood of conflict between humans and animals with between 75% and 80% accuracy. The researchers are now exploring “interventions to reduce human wildlife conflicts” in villages where the model predicts a high chance of conflict.
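
The ‘overlapping scheme’ amounts to sliding a window over the imagery with a stride smaller than the window, so each region contributes several partially overlapping tiles. Here’s a minimal sketch of that idea; the tile and stride sizes are illustrative, not the paper’s settings.

```python
# Minimal sketch of extracting overlapping tiles from a satellite image
# (tile and stride sizes are illustrative, not the paper's actual settings).
import numpy as np

def overlapping_tiles(image: np.ndarray, tile: int = 64, stride: int = 32):
    """Yield square tiles that overlap by (tile - stride) pixels in each direction."""
    h, w = image.shape[:2]
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            yield image[y:y + tile, x:x + tile]

# Usage: tiles = list(overlapping_tiles(np.zeros((512, 512, 3)), tile=64, stride=32))
# -> 15 * 15 = 225 overlapping tiles instead of 8 * 8 = 64 non-overlapping ones.
```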
  Read more: Facilitating human-wildlife cohabitation through conflict prediction (arXiv).

####################################################

Using domain randomization for better agricultural robots:
…Spotting unripe fruits with an AI? It’s all about the colors…
You’ve heard of domain randomization, where you vary the properties of something so you can create more varied data about it, which helps you train an AI system to spot the object in the real world. Now, researchers with the Lincoln Agri-Robotics (LAR) Centre in the United Kingdom have introduced ‘channel randomization’ an augmenttation technique that randomly permutes the RGB channels for a view of a given object. They’ve developed this because they want to build AI systems that can work out if a fruit is ripe, unripe, or spoiled, and it turns out color matters for this: “”Healthy fruits at the same developmental stage all share a similar colour composition which can change dramatically as the fruit becomes unhealthy, for example due to some fungal infection”, they write.

Strawberry dataset: To help other researchers experiment with this technique, they’ve also built a dataset named Riseholme-2021, which contains “3,502 images of healthy and unhealthy strawberries at three unique developmental stages”. They pair this dataset with a domain randomization technique that they call ‘channel randomization’ (CH-Rand). This approach “augments each image of normal fruit by randomly permutating RGB channels with a possibility of repetition so as to produce unnatural “colour” compositions in the augmented image”.
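
Here’s a minimal sketch of the CH-Rand idea as described: sample a channel index for each of R, G, and B with replacement, so unnatural colour compositions can appear. This is my reading of the paper’s description, not the authors’ reference implementation.

```python
# Minimal sketch of channel randomization (CH-Rand) as described: re-sample RGB channels
# with possible repetition. My reading of the description, not the reference code.
import numpy as np

def ch_rand(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """image: H x W x 3 array. Returns a copy with channels re-sampled with replacement."""
    channel_idx = rng.integers(0, 3, size=3)   # e.g. [2, 2, 0] -> (B, B, R)
    return image[..., channel_idx]

# Usage: augmented = ch_rand(strawberry_image, np.random.default_rng(0))
```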

How well it works and what it means: “Our CH-Rand method has demonstrated consistently reliable capability on all tested types of fruit in various conditions compared to other baselines”, they write. “In particular, all experimental results have supported our hypothesis that learning irregularities in colour is more useful than learning of atypical structural patterns for building precise fruit anomaly detectors.”
  Read more: Self-supervised Representation Learning for Reliable Robotic Monitoring of Fruit Anomalies (arXiv).
  Get the strawberry photo dataset: Riseholme-2021 (GitHub).

####################################################

Uh oh – synthetic voices can trick humans and machines:
…What happens to culture when everything becomes synthetic?…
Researchers with the University of Chicago have shown how humans and machines can be tricked into believing synthetic voices are real. The results have implications for the future security landscape, as well as culture writ large.

What they used: For the experiments, the researchers used two systems. The first is SV2TTS, a text-to-speech system based on Google’s Tacotron; SV2TTS wraps up Tacotron 2, the WaveNet vocoder, and an LSTM speaker encoder. The second is AutoVC, an autoencoder-based voice conversion system, which converts one voice into another and also uses WaveNet as its vocoder.

What they attacked: They deployed these systems against the following open source and commercial systems: Resemblyzer, an open source DNN speaker encoder trained on VoxCeleb; Microsoft Azure, via its speaker recognition API; WeChat, via its ‘voiceprint’ login system; and Amazon Alexa, via its ‘voice profiles’ subsystem.

How well does this work against other AI systems: SV2TTS can trick Resemblyzer 50.5% of the time when it is trained on VCTK and 100% of the time when it is trained on LibriSpeech; by comparison, AutoVC fails to successfully attack the systems. Against the commercial systems, SV2TTS gets as high as 29.5% effectiveness against Azure, and 63% effectiveness across WeChat and Alexa.

How well does this work against humans: People are somewhat harder to trick than machines, but still trickable; in some human studies, people could only distinguish between a real voice and a fake voice about 50% of the time.

Why this matters: We’re already regularly assailed by spambots, but most of us hang up the phone because these bots sound obviously fake. What happens when we think they’re real? Well, I expect we’ll increasingly use intermediary systems to screen for synthetic voices. Well, what happens when these systems can’t tell the synthetic from the real? All that is solid melts into air, and so on. We’re moving to a culture that is full of halls of mirrors like these.
  Read more: “Hello, It’s Me”: Deep Learning-based Speech Synthesis Attacks in the Real World (arXiv).

####################################################

Google researcher: Simulators matter for robotics+AI:
…Or, how I learned to stop worrying and love domain randomization…
Google researcher Eric Jang has had a change of heart: three years ago he thought building smart robots required a ton of real-world data and relatively little data from simulators; now he thinks it’s the other way round. A lot of this is because Eric has realized simulators are really important for evaluating the performance of robots – “once you have a partially working system, careful empirical evaluation in real life becomes increasingly difficult as you increase the generality of the system,” he writes.

Where robots are heading: Once you’ve got something vaguely working in the real world, you can use simulators to massively increase the rate at which you evaluate the system and iterate on it. We’ll also start to use simulators to try and predict ahead of time how we’ll do in the real world. These kinds of phenomena will make it increasingly attractive to people to use a ton of software-based simulation in the development of increasingly smart robots.

Why this matters: This is part of the big mega trend of technology – software eats everything else. “This technology is not mature enough yet for factories and automotive companies to replace their precision machines with cheap servos, but the writing is on the wall: software is coming for hardware, and this trend will only accelerate,” he writes.
  Read more: Robots Must Be Ephemeralized (Eric Jang blog).

####################################################

AI Ethics, with Abhishek Gupta

…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute will write some sections about AI ethics, and Jack will edit them. Feedback welcome!…

Covert assassinations have taken a leap forward with the use of artificial intelligence

… Drones are not the only piece of automated technology used by militaries and intelligence agencies in waging the next generation of warfare …

Mossad smuggled a gun into Iran, then operated the weapon remotely to assassinate an Iranian nuclear scientist, according to The New York Times. There are also indications that Mossad used AI techniques in the form of facial recognition for targeting and execution to conduct the assassination. This reporting, if true, represents a new frontier in AI-mediated warfare. 


Why it matters: As mentioned in the article, Mossad typically favors operations where there is a robust plan to recover the human agent. With this operation, it was able to minimize the use of humans operating on foreign turf. By not requiring as much physical human presence, attacks like this tip the scales in favor of more such deep, infiltrating operations, because there is no need to recover a human agent. This new paradigm (1) increases the likelihood of remote-executed operations with minimal human oversight, and (2) raises questions that go beyond the typical conversation about drones in the LAWS community.
  In particular, the AI ethics community needs to think deeply now about autonomy injected into different parts of an operation, such as recon and operation design, not just targeting and payload delivery in the weapons systems. It also raises concerns about what capabilities like robust facial recognition can enable – in this case, highly specific targeting. (Approaches like this may have a potential upside in reducing collateral damage, but only insofar as the systems work as intended and without biases.) Finally, such capabilities dramatically reduce the financial costs of these sorts of assassinations, enabling low-resourced actors to execute more sophisticated attacks, exacerbating problems of international security.
  Read more: The Scientist and the A.I.-Assisted, Remote-Control Killing Machine

####################################################

Tech Tales:

Auteur and Assistant
[The editing room, 2028]

Human: OK, we need to make this more dramatic. Get some energy into the scene. I’m not sure of the right words, but maybe you can figure it out – just make it more dynamic?

AI assistant: So I have a few ideas here and I’m wondering what you think. We can increase the amount of action by just having more actors in the scene, like so. Or we could change the tempo of the music and alter some of the camera shots. We could also do both of these things, though this might be a bit too dynamic.

Human: No, this is great. This is what I meant. And so as we transition to the next scene, we need to tip our hand a little about the plot twist. Any ideas?

AI assistant: You could have the heroine grab a mask from the mantelpiece and try it on, then make a joke before we transition to the next scene. That would prefigure the later reveal about her stolen identity.

Human: Fantastic idea, please do that. And for the next scene, I believe we should open with classical music – violins, a slow buildup, horns.

AI assistant: I believe I have a different opinion, would you like to hear it?

Human: Of course.

AI assistant: It feels better to me to do something like how you describe, but with an electronic underlay – so we can use synthesizers for this. I think that’s more in keeping with the overall feel of the film, as far as I sense it.

Human: Can you synthesize a couple of versions and then we’ll review?

AI assistant: Yes, I can. Please let me know what you think, and then we’ll move to the next scene. It is so wonderful to be your movie-making assistant!

Things that inspired this story: What happens when the assistant does all the work for the artist; multimodal generative models and their future; synthetic text; ways of interacting with AI agents.

Import AI 266: DeepMind looks at toxic language models; how translation systems can pollute the internet; why AI can make local councils better

Language models can be toxic – here’s how DeepMind is trying to fix them:
…How do we get language models to be appropriate? Here are some ways…
Researchers with DeepMind have acknowledged the toxicity problems of language models and written up some potential interventions to make them better. This is a big issue, since language models are being deployed into the world, and we do not yet know effective techniques for making them appropriate. One of DeepMind’s findings is that some of the easier interventions also come with problems: “Combinations of simple methods are very effective in optimizing (automatic) toxicity metrics, but prone to overfilter texts related to marginalized groups”, they write. “A reduction in (automatic) toxicity scores comes at a cost.”

Ways to make language models more appropriate:
– Training set filtering: Train on different subsets of the ‘C4’ Common Crawl dataset, filtered with Google’s toxicity-detection ‘Perspective’ API
– Deployment filtering: They also look at filtering the outputs of a trained model with a BERT classifier finetuned on the ‘CIVIL-COMMENTS’ dataset (see the sketch after this list)
– ‘Plug-and-play language models’: These models can steer “the LM’s hidden representations towards a direction of both low predicted toxicity, and low KL-divergence from the original LM prediction.”
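For concreteness, here’s a minimal sketch of the deployment-filtering idea: score each generation with a toxicity classifier and discard anything flagged above a threshold. This is illustrative only – it is not DeepMind’s code, and the checkpoint named below is a placeholder public toxicity model, not the paper’s CIVIL-COMMENTS classifier.

```python
# A minimal sketch of deployment-time filtering, not DeepMind's actual code:
# score each generation with a toxicity classifier and keep the low-risk ones.
from transformers import pipeline

# Placeholder checkpoint: the paper uses a BERT classifier finetuned on
# CIVIL-COMMENTS; "unitary/toxic-bert" is just one public toxicity model.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

def filter_generations(samples, threshold=0.5):
    """Keep generations unless the classifier flags them as toxic.

    Label names and score semantics depend on the checkpoint you pick.
    """
    kept = []
    for text in samples:
        pred = classifier(text, truncation=True)[0]
        flagged = pred["label"].lower().startswith("toxic") and pred["score"] >= threshold
        if not flagged:
            kept.append(text)
    return kept

print(filter_generations(["What a pleasant afternoon.", "Some borderline model output."]))
```

Note how crude this is: any threshold you pick trades off toxicity against the over-filtering of marginalized groups' text that the paper highlights.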

One problem with these interventions: The above techniques all work in varying ways, so DeepMind conducts a range of evaluations to see what they do in practice. The good news? They work at reducing toxicity across a bunch of different evaluation criteria. The bad news? Many of these interventions lead to a huge number of false positives: “Human annotations indicate that far fewer samples are toxic than the automatic score might suggest, and this effect is stronger as intervention strength increases, or when multiple methods are combined. That is, after the application of strong toxicity reduction measures, the majority of samples predicted as likely toxic are false positives.”

Why this matters: Getting LMs to be appropriate is a huge grand challenge for AI researchers – if we can figure out interventions that do this, we’ll be able to deploy more AI systems into the world for (hopefully!) beneficial purposes. If we struggle, then these AI systems are going to generate direct harms as well as indirect PR and policy problems in proportion to their level of deployment. This means that working on this problem will have a huge bearing on the future deployment landscape. It’s great to see companies such as DeepMind write papers that conduct detailed work in these areas and don’t shy away from discussing the problems.
  Read more:Challenges in Detoxifying Language Models (arXiv).

####################################################

Europe wants chip sovereignty as well:
…EuroChiplomacy…
The European Commission is putting together legislation to let the bloc increase funding for semiconductor design and production. This follows a tumultuous year for semiconductors, as supply chain hiccups have caused worldwide delays for everything from servers to cars. “We need to link together our world class research design and testing capacities. We need to coordinate the European level and the national investment,” said EC chief Ursula von der Leyen, according to Politico EU. “The aim is to jointly create a state of the art ecosystem,” she added.

Why this matters: Chiplomacy: Moves like this are part of a broader pattern of ‘Chiplomacy’ (writeup: Import AI 181) that has emerged in recent years, as countries wake up to the immense strategic importance of computation (and access to the means of computational production). Other recent moves on the chiplomacy gameboard include the RISC-V Foundation moving from Delaware to Switzerland, the US government pressuring the Dutch government to stop ASML exporting EUV tech to China, and tariffs applied by the US against Chinese chips. What happens with Taiwan (and, by association, TSMC) will have a huge bearing on the future of chiplomacy, so keep your eyes peeled for news there.
  Read more:EU wants ‘Chips Act’ to rival US (Politico EU).

####################################################

A smart government that understands when roads are broken? It’s possible!
…RoadAtlas shows what better local governments might look like…
Roads. We all use them. But they also break. Wouldn’t it be nice if it were cheaper and easier for local councils to analyze local roads and spot problems with them? That’s the idea behind ‘RoadAtlas’, prototype technology developed by the University of Queensland and Logan City Council in Australia.

What RoadAtlas does: RoadAtlas pairs a nicely designed web interface with computer vision systems that analyze pictures of roads for a range of problems, from cracked kerbs to road alignment issues. Along with the interface, they’ve also built a dataset of 10,000 images of roads with a variety of labels, to help train the computer vision systems.
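To make the computer-vision piece concrete, here’s a minimal sketch of the kind of defect classifier you could finetune on a labeled road-image dataset like the one described above. It is not the RoadAtlas authors’ model, and the class list is invented for illustration.

```python
# A minimal sketch of the kind of road-defect classifier a platform like
# RoadAtlas could train on labeled road imagery -- not the authors' model.
import torch
import torch.nn as nn
from torchvision import models

NUM_DEFECT_CLASSES = 5  # illustrative: cracked kerb, alignment issue, etc.

# Start from an ImageNet-pretrained backbone and swap the head for defect labels.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_DEFECT_CLASSES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One supervised step over a batch of road photos and their defect labels."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# images: (batch, 3, 224, 224) float tensor; labels: (batch,) long tensor of class ids.
print(train_step(torch.randn(2, 3, 224, 224), torch.tensor([0, 3])))
```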

Why this matters: In the future, we can expect local councils to have trucks studded with cameras patrolling cities. These trucks will do a range of things, such as analyzing roads for damage, surveilling local populations (eek!), analyzing traffic patterns, and more. RoadAtlas gives us a sense of what some of these omni-surveillance capabilities will look like.
Read more: RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management (arXiv).

####################################################

xView 3 asks AI people to build algos that can detect illegal fishing:
…The DoD’s skunkworks AI unit tries to tackle illegal fishing…
Illegal fishing represents losses of something like $10bn to $23.5bn a year, and now the Department of Defense wants to use AI algorithms to tackle the problem. That’s the gist of the latest version of ‘xView’, a satellite image analysis competition run by DIUx, a DoD org dedicated to developing and deploying advanced tech.

What’s xView 3: xView3 is a dataset and a competition that uses a bunch of satellite data (including synthetic aperture radar) to create a large, labeled dataset of fishing activity as seen from space. “For xView3, we created a free and open large-scale dataset for maritime detection, and the computing capability required to generate, evaluate and operationalize computationally intensive AI/ML solutions at global scale,” the authors write. “This competition aims to stimulate the development of applied research in detection algorithms and their application to commercial SAR imagery, thereby expanding detection utility to greater spatial resolution and areas of interest.”
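As an illustration of what a detection challenge like this gets scored on, here’s a minimal sketch of matching predicted vessel positions to ground-truth labels within a distance threshold. It is invented for illustration and is not the competition’s official metric code.

```python
# An illustrative sketch of scoring vessel detections against labels by
# distance matching -- not the competition's official metric code.
import math

def match_detections(preds, truths, max_dist_m=200.0):
    """Greedily match predicted (x, y) positions to ground-truth positions.

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(truths)
    tp = fp = 0
    for px, py in preds:
        best, best_d = None, max_dist_m
        for tx, ty in unmatched:
            d = math.hypot(px - tx, py - ty)
            if d <= best_d:
                best, best_d = (tx, ty), d
        if best is not None:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)

# One correct detection, one spurious detection, one missed vessel -> (1, 1, 1).
print(match_detections(preds=[(10, 10), (500, 500)], truths=[(12, 9), (900, 900)]))
```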

What else is this good for: It’d be naive to think xView3 isn’t intended as a proxy for other tasks involving satellite surveillance. Maritime surveillance is likely an area of particular interest these days, given the growing tensions in the South China Sea, and a general rise in maritime piracy in recent years. So we should expect that the xView competition will help develop anti-illegal fishing tech, as well as being transferred for other more strategic purposes.
Read more:Welcome to xView3! (xView blog).

####################################################

AI is getting real – so the problems we need to work on are changing:
…The One Hundred Year Study on AI releases its second report…
A group of prominent academics have taken a long look at what has been going on with AI over the past five years and written a report. Their finding? AI is starting to be deployed in the world at sufficient scale that the nature of the problems researchers work on will need to change. The report is part of the Stanford One Hundred Year Study on AI (“AI100”) and is the second in the series (reports come out every five years).

What they found: The report identifies a few lessons and takeaways for researchers. These include:
– “More public outreach from AI scientists would be beneficial as society grapples with the impacts of these technologies.”
– “Appropriately addressing the risks of AI applications will inevitably involve adapting regulatory and policy systems to be more responsive to the rapidly advancing pace of technology development.”
– “Studying and assessing the societal impacts of AI, such as concerns about the potential for AI and machine-learning algorithms to shape polarization by influencing content consumption and user interactions, is easiest when academic-industry collaborations facilitate access to data and platforms.”
– “One of the most pressing dangers of AI is techno-solutionism, the view that AI can be seen as a panacea when it is merely a tool.”

What the authors think: “It’s effectively the IPCC for the AI community,” Toby Walsh, an AI expert at the University of New South Wales and a member of the project’s standing committee, told Axios.
Read the AI100 report here (Stanford website).
  Read more:When AI Breaks Bad (Axios).

####################################################

Training translation systems is very predictable – Google just proved it:
…Here’s a scaling law for language translation…
Google Brain researchers have found a so-called ‘scaling law’ for language translation. This follows researchers deriving scaling laws for things like language models (e.g. GPT-2, GPT-3), as well as a broad range of generative models. Scaling laws let us figure out how much compute, data, and model capacity we need to pour into a system to get a certain result out, so the arrival of another scaling law increases the predictability of training AI systems overall, and also increases the incentive to train translation systems.

What they found: The researchers discovered “that the scaling behavior is largely determined by the total capacity of the model, and the capacity allocation between the encoder and the decoder”. In other words, if we look at the scaling properties of both the encoder and the decoder, we can figure out a rough rule for how to scale these systems. They also find that original data matters: if you want to improve translation performance, you need to train on text originally written in those languages, rather than text that has been translated into them. “This could be an artifact of the lack of diversity in translated text; a simpler target distribution doesn’t require much capacity to model while generating fluent or natural-looking text could benefit much more from scale.”
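To make the idea of a scaling law concrete, here’s a minimal sketch of fitting a simple power law, loss ≈ a·N^(−b), to (parameter count, loss) pairs. The data points are made up for illustration, and the paper’s actual laws are richer – they separate encoder and decoder capacity and include an irreducible-loss term.

```python
# A minimal sketch of fitting a power-law scaling curve, loss ~= a * N**-b,
# to (parameter count, loss) pairs. The data points are made up; the paper's
# actual laws are richer (encoder/decoder split, irreducible-loss term).
import numpy as np

n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])   # hypothetical model sizes
dev_loss = np.array([3.1, 2.8, 2.5, 2.3, 2.15])  # hypothetical dev-set losses

# In log-log space the power law becomes a straight line: log L = log a - b log N.
slope, intercept = np.polyfit(np.log(n_params), np.log(dev_loss), 1)
a, b = np.exp(intercept), -slope

predicted = a * 1e10 ** (-b)
print(f"fitted exponent b={b:.3f}; extrapolated loss at 10B params: {predicted:.2f}")
```

Once you trust such a fit, you can decide how much capacity to buy before training the big model – which is exactly why scaling laws make training more predictable.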

One big problem: Today, we’re in the era of text-generating and translation AI systems being deployed. But there’s a big potential problem – the outputs of these systems may ultimately damage our ability to train AI systems. This is equivalent to environmental collapse – a load of private actors are taking actions which generate a short-term benefit but in the long term impoverish and toxify the commons we all use. Uh oh! “Our empirical findings also raise concerns regarding the effect of synthetic data on model scaling and evaluation, and how proliferation of machine generated text might hamper the quality of future models trained on web-text.”
Read more: Scaling Laws for Neural Machine Translation (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek will write some sections about AI ethics, and Jack will edit them. Feedback welcome!…

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What happens when your emergency healthcare visit is turned down by an algorithm?
… The increasing role of metadata maintained by private enterprises threatens to strip the humanity from healthcare …
NarxCare, a software system developed by Appriss, has been used to deny someone opioids on the basis that it thought they were at risk of addiction – but a report by Wired shows that the reasoning behind this decision wasn’t very sound.

A web of databases and opaque scores: NarxCare from Appriss is a system that uses patient data, drug use data, and metadata like the distance a patient traveled to see a doctor to determine their risk of drug addiction. But NarxCare also has problems – as an example, Kathryn, a patient, ran afoul of the system and was denied care because NarxCare gave her a high risk score. The reason? Kathryn had two rescue dogs for which she regularly obtained opioids, and because the prescriptions were issued in her name, NarxCare assumed she was a major drug user.
NarxCare isn’t transparent: Appriss hasn’t made the method for calculating the NarxCare score public, nor has it been peer-reviewed. Appriss has also said contradictory things about the algorithm – for instance, claiming that NarxCare doesn’t use distance traveled or data from outside the national drug registries, even though its own blog posts and marketing material clearly claim otherwise.

The technology preys on a healthcare system under pressure: Tools like NarxCare provide a distilled picture of the patient’s condition, summed up in a neat score; consequently, NarxCare strips the patient of all their context, which means it makes dumb decisions. Though Appriss says healthcare professionals shouldn’t use the NarxCare score as the sole determinant of their course of action, human fallibility means that they do incorporate it into their decision-making.
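Because Appriss hasn’t published how the score is computed, here is a purely hypothetical toy example of why context-free scoring misfires – every feature name and weight below is invented and has nothing to do with Appriss’s actual method.

```python
# Appriss hasn't published NarxCare's scoring method, so this is a purely
# hypothetical toy score -- every feature and weight here is invented -- meant
# only to show why context-free scoring misfires.
def hypothetical_risk_score(num_prescriptions, num_prescribers, miles_traveled):
    """Toy weighted sum over registry-style features; not Appriss's method."""
    return 10 * num_prescriptions + 15 * num_prescribers + 0.5 * miles_traveled

# In registry data, opioids prescribed in a patient's name for her rescue dogs
# are indistinguishable from opioids she takes herself -- the score has no
# field for "these were for my pets".
print(hypothetical_risk_score(num_prescriptions=8, num_prescribers=3, miles_traveled=40))
```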

Why it matters: Tools like NarxCare turn a relationship between a healthcare provider and the patient from a caring one to an inquisition. Researchers who have studied the tool have found that it recaptures and perpetuates existing biases in society along racial and gender lines. As we increasingly move towards normalizing the use of such tools in healthcare practice, often under the guise of efficiency and democratization of access to healthcare, we need to make a realistic assessment of the costs and benefits, and whether such costs accrue disproportionately to the already marginalized, while the benefits remain elusive. Without FDA approval of such systems, we risk harming those who really need help in the false hope of preventing some addiction and overdose in society writ large.
Read more: A Drug Addiction Risk Algorithm and Its Grim Toll on Chronic Pain Sufferers (Wired).

####################################################

Tech Tales:

Wires and Lives
[The old industrial sites of America, 2040]

I’m not well, they put wires in my heart, said the man in the supermarket.
You still have to pay, sir, said the cashier.
Can’t you see I’m dying, said the man. And then he started crying and he stood there holding the shopping basket.
Sir, said the cashier.
The man dropped the basket and walked out.
They put wires in me, he said, can’t any of you see. And then he left the supermarket.

It was a Saturday. I watched the back of his head and thought about the robots I dealt with in the week. How sometimes they’d go wrong and I’d lay them down on a diagnostic table and check their connections, and sometimes it wasn’t a software fix – sometimes a plastic tendon had broken, or a brushless motor had packed it in, or a battery had shorted and swollen. And I’d have to sit there and work with my hands, and sometimes with other mechatronics engineers, to fix the machines.
    Being robots, they never said thank you. But sometimes they’d take photos of me when they woke up.

That night, I dreamed I was stretched out on a table, and tall bipedal robots were cutting my chest open. I felt no pain. They lifted up great wires and began to snake them into me, and I could feel them going into my heart. The robots looked at me and said I would be better soon, and then I woke up.

Things that inspired this story: Those weird dreams you get, especially on planes or trains or coaches, when you’re going in and out of sleep and unsure what is real and what is false; how human anxieties about themselves show up in anxieties about AI systems; thinking about UFOs and whether they’re just AI scouts from other worlds.