Import AI

Import AI 265: Deceiving AI hackers; how Alibaba makes money with ML; why governments should measure AI progress

In the future, spies will figure out what you’re doing by staring at a blank wall:
…This sounds crazy, but this research appears quite sane. Oh my…
Here’s a wild bit of research from MIT, NVIDIA, and Israel’s Technion (Israel Institute of Technology): “We use a video of the blank wall and show that a learning-based approach is able to recover information about the hidden scene”. Specifically, they’re able to point a camera at a blank wall, perform some analysis over the shifting patterns of ambient light on it, then use this to figure out whether there are 0, 1, or 2 people in a scene, and to classify the activities of those people – whether they’re walking, crouching, waving their hands, or jumping.
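
To give a flavor of what a 'learning-based approach' could look like here, below is a minimal, hypothetical sketch (not the paper's model): treat a short clip of wall frames as a stack of channels, subtract the static wall, and train a small convolutional classifier on the residual temporal variations.

```python
# Hypothetical sketch (not the paper's architecture): classify how many people
# (0, 1, or 2) are in a hidden scene from a short clip of a blank wall.
import torch
import torch.nn as nn

class WallClassifier(nn.Module):
    def __init__(self, n_frames=16, n_classes=3):
        super().__init__()
        # Treat the clip's frames as input channels to a small 2D CNN.
        self.net = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, clip):                       # clip: (batch, frames, H, W)
        # Subtract the per-pixel mean over time so the model only sees the
        # faint temporal variations in ambient light, not the wall itself.
        clip = clip - clip.mean(dim=1, keepdim=True)
        return self.net(clip)

model = WallClassifier()
fake_clip = torch.randn(4, 16, 64, 64)             # 4 clips of 16 frames each
logits = model(fake_clip)                          # (4, 3) scores: 0, 1, or 2 people
```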

Accuracy: “Trained on 20 different scenes achieve an accuracy of 94.4% in classifying the number of people and 93.7% in activity recognition on the held out test set of 5 unseen scenes”, they write. Not good enough to rely on in a critical situation, but much better than you’d think. (As an experiment, sit in a completely white room without major shadows wearing noise canceling headphones and try to figure out if there’s someone behind you by staring at the blank wall opposite you – good luck getting above 50%!).

Why this matters: I’m fascinated by how smart surveillance is going to become. At the same time, I’m interested in how we can use various contemporary AI and signal processing techniques to be able to eke more information out of the various fuzzy signals inherent to reality. Here, these researchers show that as cameras and processing algorithms get better, we’re going to see surveillance systems develop that can extract a lot of data from stuff barely perceptible to humans.
  Read more: What You Can Learn by Staring at a Blank Wall (arXiv).

####################################################

AI is a big deal – so governments should monitor its development:
…New research from myself and Jess Whittlestone lays out the case for better AI monitoring…
We write about AI measurement a lot here, because measuring AI systems is one of the best ways to understand their strengths and weaknesses. In the coming years, information about AI – and specifically, how to measure it for certain traits – will also be a crucial ingredient in the crafting of AI policy. Therefore, we should have governments develop public sector AI measurement and monitoring systems so that we can track the research and development of increasingly powerful AI technology. Such an initiative can help us with problems today and can better orient the world with regard to more general forms of AI, giving us infrastructure to help us measure increasingly advanced systems. That’s the gist of a research paper I and my collaborator Jess Whittlestone worked on this year – please take a read and, if you’re in a government, reach out, as I want to help make this happen.
  Read more: Why and How Governments Should Monitor AI Development (arXiv).
    Some analysis of our proposal by NESTA’s Juan Mateos Garcia (Twitter)
  Listen to Jess and me discussing the idea with Matt Clifford on his ‘Thoughts in Between’ podcast.

####################################################

Alibaba uses a smarter neural net to lower its costs and increase its number of users:
…Here’s why everyone is deploying as much machine learning as they can…
Ant Financial, a subsidiary of Chinese tech giant Alibaba, has written a fun paper about how it uses contemporary machine learning to improve the performance of a commercially valuable deployed system. “This paper proposes a practical two-stage framework that can optimize the [Return on Investment] of various massive-scale promotion campaigns”, the authors write. In this case, they use ML to optimize an e-coupon gifting campaign. “Alipay granted coupons to customers to incentivize them to make mobile payments with the Alipay mobile app. Given its marketing campaign budget, the company needed to determine the value of the coupon given to each user to maximize overall user adoption”, they write.

What ML can do: For the ML component, they built a ‘Deep Isotonic Promotion Network’ (DIPN), which is basically a custom-designed AI system for figuring out whether to recommend something to a user (and what to recommend). “In the first stage, we model users’ personal promotion-response curves with machine learning algorithms. In the second stage, we formulate the problem as a linear programming (LP) problem and solve it by established LP algorithms”, they write.
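
To make the two-stage split concrete, here is a toy sketch of the second stage only, assuming the first-stage model has already produced per-user adoption probabilities; every name and number below is invented.

```python
# Toy sketch of the second (linear programming) stage: given per-user adoption
# probabilities from a stage-one ML model, allocate coupon values under a
# total budget. All names and numbers here are invented.
import numpy as np
from scipy.optimize import linprog

n_users = 5
coupon_values = np.array([1.0, 2.0, 5.0])
rng = np.random.default_rng(0)
# p[u, c]: predicted probability that user u adopts if given coupon c.
p = np.sort(rng.uniform(0.05, 0.6, size=(n_users, len(coupon_values))), axis=1)
budget = 12.0

n_vars = n_users * len(coupon_values)
c = -p.ravel()                                 # linprog minimizes, so negate
A_ub = [np.tile(coupon_values, n_users)]       # total coupon spend <= budget
b_ub = [budget]
for u in range(n_users):                       # each user gets at most one coupon
    row = np.zeros(n_vars)
    row[u * len(coupon_values):(u + 1) * len(coupon_values)] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, 1))
allocation = res.x.reshape(n_users, len(coupon_values))
print("expected adoptions:", round(-res.fun, 3))
print(allocation.round(2))                     # rows: users, columns: coupon values
```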

Real world deployment: They deployed the resulting system at Alipay and tested it out on a few marketing campaigns. It was so successful it “was eventually deployed to all users.” (Depending on how you count it, Alipay has anywhere between 300 million and 1 billion active users, so that’s a lot). In tests, they saw that using their ML system reduced the cost of running campaigns by between 6% and 10%, and increased usage rates by between 2% and 8.5%. Put another way, using a better ML system made their promotion campaign both cheaper to run and more effective in outcome.

Why this matters: This paper gives us a good sense of the incentive structure behind AI development and deployment – if things like this can make multiple percentage point differences to core business metrics like cost and user-usage, then we shouldn’t be surprised to see companies race against each other to deploy increasingly powerful systems into the world. More subjectively, it makes me wonder about how smart these systems will become – when will I be the target of an ML system that encourages me to use something I hadn’t previously considered using? And how might this ML system think of me when it does that?
  Read more: A framework for massive scale personalized promotion (arXiv).

####################################################

10,000 labelled animal images? Yes please!
…Pose estimation gets a bit easier with AP-10K…
Researchers from Xidian University and JD Explore Academy in China, along with the University of Sydney in Australia, have released AP-10K, a dataset for animal pose estimation. Pose estimation is the task of looking at a picture and figuring out the orientation of the body of each animal in it.

What’s in it: AP-10K consists of 10,015 images from 23 animal families and 60 distinct species. Thirteen annotators labeled the bounding boxes for each animal in an image, as well as its keypoints. (AP-10K also contains an additional 50,000 images that lack keypoint annotations). Some of the animals in AP-10K include various types of dogs (naturally, this being AI), as well as cats, lions, elephants, mice, gorillas, giraffes, and more.

Scale: Though AP-10K may be the largest dataset for animal pose estimation, it’s 10X smaller than datasets used for humans, like COCO.
  Read more: AP-10K: A Benchmark for Animal Pose Estimation in the Wild (arXiv).
  Get the benchmark data here (AP-10K GitHub).

####################################################

Facebook makes a big language model from pure audio – and what about intelligence agencies?
…No text? No problem! We’ll just build a big language model out of audio…
Facebook has figured out how to train a language model from pure audio data, no labeled text required. This is a potentially big deal – only a minority of the world’s spoken languages are instantiated in large text datasets, and some languages (e.g, many African dialects) have a tiny text footprint relative to how much they’re spoken. Now, Facebook has built the Generative Spoken Language Model (GSLM), which converts speech into discrete units, makes predictions about the likelihood of these units following one another, then converts these units back into speech. GSLM is essentially doing what text models like GPT3 do, but where GPT3 turns written text into tokens and then makes predictions about tokens, GSLM turns audio into tokens and then makes predictions about them. Simple!
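
Below is a heavily simplified sketch of the 'speech to discrete units to prediction' idea. It is not Facebook's pipeline (GSLM uses learned speech encoders and a transformer over the units); it just illustrates quantizing audio into tokens and predicting the next one.

```python
# Crude illustration of speech-as-tokens: quantize frames of audio into
# discrete units with k-means, then fit a bigram model over the unit stream.
# GSLM itself uses learned encoders and a transformer; this is just the shape
# of the idea.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000 * 5)            # stand-in for 5s of 16kHz audio

frame, hop = 400, 160                                # 25ms windows, 10ms hop
frames = np.stack([waveform[i:i + frame]
                   for i in range(0, len(waveform) - frame, hop)])
feats = np.log(np.abs(np.fft.rfft(frames)) + 1e-8)   # simple spectral features

units = KMeans(n_clusters=50, n_init=5, random_state=0).fit_predict(feats)

bigrams = Counter(zip(units[:-1], units[1:]))        # "language model" over units

def next_unit_probs(u, n_units=50):
    counts = np.array([bigrams.get((u, v), 0) + 1 for v in range(n_units)], float)
    return counts / counts.sum()

print("first 20 units:", units[:20])
print("most likely unit after unit 0:", next_unit_probs(0).argmax())
```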

How well does it work? GSLM is not equivalent to GPT3. It’s a lot dumber. But that’s because it’s doing something pretty complicated – making predictions about speech purely from audio waveforms. In tests, Facebook says it can generate some reasonable sounding stuff, and that it has the potential to be plugged into other systems to make them better as well.

What about intelligence agencies? You know who else, besides big tech companies like Google and Facebook, has tons and tons of raw audio data? Intelligence agencies! Many of these agencies are in the business of tapping telephony systems worldwide and hoovering stuff up for their own inscrutable purposes. One takeaway from this Facebook research is it puts agencies in a marginally better position with regard to developing large-scale AI systems.
  Read more: Textless NLP: Generating expressive speech from raw audio (Facebook AI).
  Get code for the GSLM models here (Facebook GitHub).

####################################################

How bad is RL at generalization? Really bad, if you don’t pre-train, according to this competition:
…Testing out physical intelligence with… Angry Birds!…
Researchers with the Australian National University have come up with an Angry Birds (yes, really) benchmark for testing out physical reasoning in AI agents, named Phy-Q.

15 physical scenarios; 75 templates; 7500 tasks: Each scenario is designed to test how well an agent understands a distinct physics concept, such as that objects can fall on one another, that some objects can roll, that paths need to be cleared for objects to be reached, and so on. For each scenario, the developers build 2-8 distinct templates that ensure the agent needs to use the given physical rule, then for each template they generate ~100 tasks (game levels).

How hard is this for existing agents: In all but the most basic scenarios, humans do really well, achieving pass rates of 50% and up, whereas most AI baseline systems (DQN, PPO, A2C, along with some with hand-crafted heuristics) do very poorly. Humans (specifically, 20 volunteers recruited by the Australian National University) are, unsurprisingly, good at generalization, getting an aggregate generalization score of 0.828 on the test, versus 0.12 for a DQN-based system with symbolic elements, and 0.09 for a non-symbolic DQN (by comparison, a random agent gets 0.0427).
  The highest-performing algorithm is one called ‘Eagle’s Wing’, which gets a generalization score of 0.1999. All this basically means that this task is very hard for current AI methods. One takeaway I have is that RL-based methods really suck here, though they’d probably improve with massive pre-training.
  Read more: Phy-Q: A Benchmark for Physical Reasoning (arXiv).
  Get the benchmark here: Phy-Q (GitHub).

####################################################

Countering RL-trained AI hackers with honeypots:
…Honeypots work on machines just as well as humans…
Researchers with the Naval Information Warfare Center have built some so-called ‘cyber deception’ tools into CyberBattleSim, an open source network simulation environment developed by Microsoft.

Adding deception to CyberBattleSim: “With the release of CyberBattleSim environment in April 2021, Microsoft, leveraging the Python-based Open AI Gym interface, has created an initial, abstract simulation-based experimentation research platform to train automated agents using RL”, the researchers write. Now, they’ve added some deception tools in – specifically, they adapted the toy capture the flag environment in CyberBattleSim and incorporated decoys (systems that can’t be connected to, but look like real assets), honeypots (systems that can be connected to and which look like real assets, but are full of fake credentials) and honeytokens (fake credentials).

What deception does: Unsurprisingly, adding in these deceptive items absolutely bricks the performance of AI systems deployed in the virtual environment with a goal of hacking into a system. Specifically, they tested out four methods – Credential Lookup, Deep Q-Learning, Tabular Q-Learning, and a Random Policy. By adding in decoys, they were able to reduce system win rates from 80% to 60% across the board, and by adding in several honeypots, they were able to reduce performance from 80% to below 20%. Additionally, by adding in honeypots and other decoys, they are able to make it take a lot longer for systems to successfully hack into things.
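
As a toy illustration of why honeypots wreck automated attackers, here is a made-up simulation (this is not the CyberBattleSim API): an attacker probes hosts at random, and touching a honeypot ends the attempt.

```python
# Toy model (not the CyberBattleSim API): an attacker probes hosts looking for
# the one that holds the flag. Honeypots look like real hosts, but touching
# one ends the attempt.
import random

def attack_once(n_real=10, n_honeypots=0, steps=15, seed=None):
    rng = random.Random(seed)
    hosts = ["real"] * n_real + ["honeypot"] * n_honeypots
    target = rng.randrange(n_real)       # index of the host holding the flag
    for _ in range(steps):
        i = rng.randrange(len(hosts))
        if hosts[i] == "honeypot":
            return False                 # attacker detected / fed fake credentials
        if i == target:
            return True                  # flag captured
    return False                         # ran out of time

def win_rate(n_honeypots, trials=5000):
    wins = sum(attack_once(n_honeypots=n_honeypots, seed=t) for t in range(trials))
    return wins / trials

for h in (0, 2, 5, 10):
    print(f"{h:2d} honeypots -> win rate {win_rate(h):.2f}")
```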

Why this matters: AI is playing an increasingly important role in frontier cyberdefense and cyberoffense. Studies like this give us an idea for how the tech may evolve further. “While there are no unexpected findings, the contribution to demonstrate the capability of modeling cyber deception in CyberBattleSim was achieved. These fundamental results provided a necessary sanity check while incorporating deception into CyberBattleSim.”
  Read more: Incorporating Deception into CyberBattleSim for Autonomous Defense (arXiv).

####################################################

Tech Tales:

Wake Up, Time To Die
[Asteroid, 20??, out there – far out]

And you woke up. You were a creature among many, stuck like barnacles on the surface of an asteroid. Your sisters and brothers had done their job and the gigantic ball of rock was on course for collision with the enemy planet.

They allowed your sentience, now, because you needed it to be able to respond to emergent situations – which tend to happen, when you’re attached to a ball of rock that means certain death for the beings on the planet it is headed for.

Look up, you think. And so do the rest of your brothers and sisters. You all turn your faces away from the rock, where you had been mindlessly eating it and excreting it as a gas and in doing so subtly altering its course. Now you flip around and you all gaze at the stars and the blackness of space and the big sphere that you are about to collide with. You feel you are all part of the same tapestry as your so-called ‘kinetic d-grade planet wiper’ asteroid collides with the planet. You all dissipate – you, them, everything above a certain level of basic cellular sophistication. And the asteroid boils up chunks of the planet and breaks them apart and sets things in motion anew.

Things that inspired this story: Creation myths; emergence from simple automata; ideas about purpose and unity; notions of the end of the world.

Import AI 264: Tracking UAVs; Stanford tries to train big models; deepfakes as the dog ate my homework

Here’s a symptom of how AI is changing culture:
…Deepfakes show up as excuses…
Deepfakes are steadily percolating their way into society – the latest evidence of this is people using the very existence of the technology as a means to question the legitimacy of things they might have been recorded doing or saying. One example comes from an interview in this excellent New Yorker piece about a coin called Skycoin: when someone was reached for comment about something they were recorded saying, they said it was “either a joke or a deep fake but probably a deep fake.”
  Read more: Pumpers, Dumpers, and Shills: The Skycoin Saga (New Yorker).

####################################################

Stanford gets ready to train some large AI models:
…Though it’s starting with just some GPT-2 scale things…
Something we write about a lot here at Import AI is power: who has it and who doesn’t. Right now, the people who have the resources and engineering capabilities to build large models (e.g, GPT-3) have a lot of leverage in the space of AI development. Universities, by comparison, have less leverage as they don’t build these models. Now, researchers with Stanford University are trying to change that with an open source tool called ‘Mistral’, which is meant to make it easier to train large language models.

What Mistral is: Mistral is “A framework for transparent and accessible large-scale language model training, built with Hugging Face”. Along with releasing Mistral, the researchers have also released five medium GPT-2 and five small GPT-2 models, along with ten checkpoints of the models through training runs. That’s kind of like a biologist giving you two sets of five petri dishes of similar organisms, where each of the petri dishes comes with detailed snapshots of the evolution of the entity in the petri dish over time. That’s the kind of thing that can make it easier for people to research these technologies.
  Get the code and model snapshots here: Mistral (Stanford CRFM GitHub).
  Check out the talk in this (long) YouTube recording of the CRFM workshop, where some of the authors discuss their motivations for the models (CRFM webpage).

####################################################

1 GPU, 1 good simulator = efficient robot training:
…Plus: transatlantic robot manipulation…
Researchers with the University of Toronto, ETH Zurich, Nvidia, Snap, and MPI Tuebingen have built some efficient software for training a 3-finger robot hand. Specifically, they pair a simulator (NVIDIA’s ‘IsaacGym’) with a low-cost robot hand (called a TriFinger, which is also the robot being used in the real robot challenge at NeurIPS 2021, #252).

What they did: “Our system trains using the IsaacGym simulator, we train on 16,384 environments in parallel on a single NVIDIA Tesla V100 or RTX 3090 GPU. Inference is then conducted remotely on a TriFinger robot located across the Atlantic in Germany using the uploaded actor weights”, they write. Their best policy achieves a success rate of 82.5% – interesting performance from a research perspective, though not near the standards required for real world deployment.

Efficiency: They use an optimized version of the PPO algorithm to do efficient single GPU training, getting as inputs the camera pose (with noise) and position of the cube being manipulated. The output of the policy is a load of joint torques, and they train various permutations of the policy via domain randomization, varying object mass, scale, and friction. They can pull 100k samples per second off of an Isaac simulation using a single RTX 3090 GPU. (It’s not clear how generalizable this efficiency is – aka, is a lot of the efficiency here down to a ton of human-generated, task-specific priors? It seems that way.)
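
Here is a minimal sketch of what the domain-randomization step can look like in code; the parameter names and ranges are invented, not the paper's values.

```python
# Sketch of domain randomization: sample different physics parameters for each
# parallel simulation so the policy can't overfit to one exact world. Ranges
# and names are invented, not the paper's values.
import numpy as np

def sample_physics_params(n_envs, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "cube_mass_kg":  rng.uniform(0.05, 0.20, n_envs),
        "cube_scale":    rng.uniform(0.95, 1.05, n_envs),
        "friction":      rng.uniform(0.5, 1.2, n_envs),
        "obs_noise_std": rng.uniform(0.0, 0.01, n_envs),  # noise on the cube pose
    }

params = sample_physics_params(n_envs=16384)
print({k: (v.min().round(3), v.max().round(3)) for k, v in params.items()})
```
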
Code: “The codebase for the project will be released soon,” they write.
Read more: Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger (arXiv).
Check out a video about the research here (YouTube).

####################################################

How are we going to fight UAVs? By tracking them:
…Anti-UAV workshop tells us about the future of anti-drone tech…
Researchers with a range of Chinese institutions have held a workshop dedicated to tracking multiple unmanned aerial vehicles (UAVs) at once. The point of the workshop is to build so-called anti-UAV tech – that is, AI tools to spot drones. The competition is motivated by the idea that understanding “how to use computer vision algorithms to perceive UAVs is a crucial part of the whole UAV-defense system”, the researchers write.

The anti-drone dataset: For the competition, competitors got access to a dataset containing “280 high-quality, full HD thermal infrared video sequences, spanning multiple occurrences of multi-scale UAVs.” This footage contains “more challenging video sequences with dynamic backgrounds and small-scale targets” than those from prior competitions, they write. It also includes drones of different sizes, ranging from tiny consumer drones to mid-range DJIs, all the way up to the sorts of big drones used in industrial contexts.

Winners (and how they won): The paper includes an analysis of the three top teams, all of which come from Chinese universities. The top-ranking team, from the Beijing Institute of Technology, used a spatio-temporal Siamese network-based tracker. The other two teams both used the ‘SuperDIMP’ tracker, though one used an ensemble of trackers and got them to vote on likely targets, while the other further refined SuperDIMP.
  Read more: The 2nd Anti-UAV Workshop & Challenge: Methods and Results (arXiv).
Find out more information at the official challenge website (ICCV 2021 site).

####################################################

Making GraphQL calls more efficient with machine learning:
…In the latest installment of everything-can-be-approximated: predicting the cost of fulfilling GraphQL calls…
IBM and academic researchers have built a machine learning model that can predict the query cost for a given GraphQL query, potentially making it easier for users of GraphQL to fulfill a larger proportion of user requests. GraphQL is a query language for APIs and a backend that makes it easy to funnel complex requests between users and sites; it was originally developed by Facebook. The approach combines features extracted via natural-language processing and graph neural nets with symbolic features, and creates “a general ML workflow to estimate query cost that can be applied to any given GraphQL API”.
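
As a sketch of the flavor of this kind of estimator (hand-rolled features plus an off-the-shelf regressor; neither the features nor the costs below come from the paper):

```python
# Sketch: turn a GraphQL query string into crude features and regress its cost.
# The features, queries, and costs here are illustrative, not the paper's.
import re
from sklearn.linear_model import Ridge

def query_features(q):
    depth = d = 0
    for ch in q:
        d += (ch == "{") - (ch == "}")
        depth = max(depth, d)
    limits = [int(x) for x in re.findall(r"first:\s*(\d+)", q)]
    return [q.count("{"),                    # number of selection sets
            depth,                           # nesting depth
            sum(limits) or 1,                # requested page sizes
            len(re.findall(r"\w+", q))]      # rough token count

# Tiny made-up training set: (query, observed response complexity).
data = [
    ("{ viewer { login } }", 2),
    ("{ repository { issues(first: 10) { nodes { title } } } }", 12),
    ("{ repository { issues(first: 50) { nodes { title comments(first: 5) "
     "{ nodes { body } } } } } }", 260),
]
X = [query_features(q) for q, _ in data]
y = [cost for _, cost in data]
model = Ridge(alpha=1.0).fit(X, y)

new_query = "{ viewer { repositories(first: 20) { nodes { name } } } }"
print(model.predict([query_features(new_query)]))
```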

Testing the approach: To test the approach, the authors collected 100,000 and 30,000 responses from, respectively, the GitHub and Yelp GraphQL APIs, then used a mostly automated software pipeline to explore 1,500 combinations of models and hyperparameters for both Yelp and GitHub. The result was a set of models that appeared to make useful predictions relative to hand-written, expert-system baselines.
  “We observe that, while the static analysis guarantees an upper bound, the price in terms of over-estimation can be significant, especially with larger query sizes. On the other hand, for both datasets, the ML estimates stay remarkably close to the actual response complexity even for the largest queries”.
 Mean absolute error: “For both datasets, the accuracy gain of the ML approach compared to the static analysis is striking both in terms of average value, and standard deviation,” the authors write. “This further validates the observation that ML approach is accurate for large queries, which are challenging for the static analysis… the ML cost estimation policy is able to accept a bigger proportion of queries for both APIs.”

Why this matters: Taken in itself, this is some software that makes it slightly cheaper to serve and fulfill GraphQL requests. But if we zoom out, this is another example of just how powerful ML techniques are at approximating complex functions, and it highlights how we’re moving into a world driven by approximation engines rather than specific hand-written accounting systems.
  Read more: Learning GraphQL Query Costs (Extended Version).

####################################################

Reminder: Microsoft created one of China’s most popular chatbots:
…Before there was Tay, there was Xiaoice – and it’s still going…
Here’s a fun story about how millions of people in China (660 million people worldwide) are increasingly depending on a relationship with a virtual chatbot – Xiaoice, a chatbot originally built by Microsoft and subsequently spun out into a local startup. Xiaoice is a hybrid system, blending modern deep learning techniques with a lot of hand-written stuff (for a deepdive, check out Import AI #126).
  Microsoft spun Xiaoice off into its own entity in mid-2020 – a story that I think passed many people by in the West. Now, the startup that develops it is worth over $1 billion and is led by a former Microsoft manager.

Who speaks to the chatbots: Xiaoice’s CEO says the platform’s peak user hours — 11pm to 1am — point to an aching need for companionship. “No matter what, having XiaoIce is always better than lying in bed staring at the ceiling,” he said.
Read more: ‘Always there’: the AI chatbot comforting China’s lonely millions (France24).
  More information about the spinout here: Tracing an independent future for Xiaoice, China’s most popular chatbot (KrASIA).

####################################################

Tech Tales:

Escape Run
[London, 2032]

We got into the van, put on our masks, changed our clothes for ones with weights sewn into the lining to change our gait, then drove to our next location. We got out, walked through a council block and used some keycards to exit through a resident-only park, then got into another vehicle. Changed our masks again. Changed our clothes again. One of us went and hid in a compartment in the truck. Then when we got to the next location we got out but left the person inside the truck, so we’d confuse anything that was depending on there being a certain number of us. Then we went into a nearby housing block and partied for a few hours, then left in different directions with the other partygoers.
  We all slept in different places in the city, having all changed outfits and movement gaits a few times.
  That night, we all checked our phones to see if we’d had any luck finding our counterparts. But our phones were confused because the counterparts were also wearing masks, changing cars, swapping clothes, and so on.
    We sleep and hope to have better luck tomorrow. We’re sure we’ll find each other before the police find us.

Things that inspired this story: Adversarial examples; pedestrian re-identification; gait recognition.

Import AI 263: Foundation models; Amazon improves Alexa; My Little Pony GPT.

Amazon makes Alexa sound more convincing:
…A grab bag of techniques to make synthetic voices sound more realistic…
Amazon has published a research paper about some of the techniques it’s using to make more convincing text-to-speech systems. By using a variety of tools, the company was able to improve the quality of its synthetic voices by 39% relative to a baseline system.

What they did: They use a variety of techniques, ranging from a state-of-the-art sequence-to-sequence model to encode the acoustics, to using a parallel WaveNet implementation for the ‘neural vocoder’, which turns the acoustic representation into audio.
  Adversarial training – they also use a GAN approach to further improve quality, training a generator network via the acoustic model, then using a discriminator to force the generation of more real-sounding samples.
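
Below is a rough sketch of the adversarial piece; the modules and shapes are placeholders rather than Amazon's architecture. The acoustic model plays the generator, and a discriminator's realism score on its spectrograms is added to the usual reconstruction loss.

```python
# Rough sketch of adversarial training for TTS: the acoustic model (generator)
# produces mel-spectrograms, a discriminator scores them as real vs generated,
# and that score is added to the usual reconstruction loss. Modules and shapes
# are placeholders, not Amazon's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

mel_dim, T, batch = 80, 200, 8
acoustic_model = nn.GRU(256, mel_dim, batch_first=True)    # stand-in seq2seq
discriminator = nn.Sequential(nn.Linear(mel_dim, 128), nn.ReLU(),
                              nn.Linear(128, 1))

text_enc = torch.randn(batch, T, 256)      # pretend encoded text/phoneme features
real_mel = torch.randn(batch, T, mel_dim)  # pretend ground-truth spectrograms

fake_mel, _ = acoustic_model(text_enc)
recon_loss = F.l1_loss(fake_mel, real_mel)
adv_loss = F.binary_cross_entropy_with_logits(
    discriminator(fake_mel).mean(dim=1),   # per-clip realism score
    torch.ones(batch, 1))                  # generator wants "real" verdicts
(recon_loss + 0.1 * adv_loss).backward()   # update step omitted for brevity
```
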
Read more: Enhancing audio quality for expressive Neural Text-to-Speech (arXiv).

####################################################

Stanford University: Now that big models are here, what do we do about them?
…New research paper and workshop tries to lay out the issues of GPT-3, BERT, and so on…
In recent years, a new class of highly capable, broad utility AI model has emerged. These models vary in modalities and purposes, and include things like GPT-3 (text analysis and generation), BERT (a fundamental input into new search engines), CLIP (combined text and image model), and more. These models are typified by being trained on very large datasets, then being used for a broad range of purposes, many of which aren’t anticipated by their developers.
  Now, researchers with Stanford University have published a large research paper on the challenges posed by these models – it’s worth skimming the 100+ page paper, and it does a good job of summarizing the different impacts of these models in different areas, ranging from healthcare to robotics. It also tackles core issues, like dataset creation, environmental efficiency, compute usage, and more. Stanford is also hosting a workshop on these models this week, and I’ll be giving a talk where I try to lay out some of the issues, particularly those relating to the centralization of resources and power.

Why this matters: I mostly think ‘foundation models’ matter insofar as they’re bound up with the broader industrialization of AI – foundation models are what you get when you’ve built a bunch of systems that can dump a large amount of resources into the development of your model (where resource = compute, data, training time, human engineering time, etc). Some people dislike foundation models because of how they interact with existing power structures. I think foundation models tell us that there are very significant power asymmetries in AI development and we should pay attention to them and try to increase the number of actors that can work on them. I’ll be giving a keynote about these ideas at the workshop – comments welcome!
Read more about the workshop here: Workshop on Foundation Models (Stanford).
Read the paper here: On the Opportunities and Risks of Foundation Models (arXiv).

####################################################

DeepMind’s multi-agent game AI software goes to V1:
…OpenSpiel steps forward, gets ready to play more…
OpenSpiel (first covered in November 2019, #162) has had its first major V1 release, meaning that its developer, DeepMind, thinks the software is now quite well supported. OpenSpiel is a software framework to help researchers play around with multi-agent reinforcement learning.

What’s new in OpenSpiel: Additions include a bunch of new games (ranging from tic-tac-toe, to reconnaissance blind chess), various algorithm implementations (including some JAX rewrites of things like DQN), more examples, more bots, and various other quality of life improvements.
Read more and get the code: OpenSpiel update notes (GitHub).

####################################################

Step aside, dogs. In the future, blind people are going to use drones as well:
…You know what’s cooler than an organic dog? A mechanical flying drone!…
Some researchers from Karlsruhe Institute of Technology have combined semantic segmentation computer vision techniques with a flying drone to create what they call a ‘flying guide dog’ – a machine meant to help Blind and Visually Impaired People (BVIP) safely navigate around a city. “Based on its perception of the environment, the drone adjusts itself and leads the user to walk safely,” they write. “To follow the drone, the user holds a string attached to the drone.”

What they did: The approach uses semantic segmentation to help the drone figure out which parts of a scene are safe for a pedestrian, and to identify important objects like traffic lights, whose changes can alter the safety landscape. The drone flies along the walkable pathways, guiding the pedestrian holding its string. The drone can also talk to the user through a bone conduction headset, telling them to ‘stop’ when there’s a red light and ‘go’ when there’s a green light. In tests, people said that they found the drone helpful and relatively easy to use, though its traffic light prediction could be improved.
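
As a toy illustration of how a segmentation mask can become a steering command (this is not the paper's controller; the class id and thresholds are invented):

```python
# Toy steering rule on top of a semantic segmentation mask: look at the
# walkable pixels near the bottom of the frame and steer toward their center.
# Illustrative only; not the paper's controller.
import numpy as np

WALKABLE = 1                                  # invented class id for sidewalk

def steering_command(seg_mask):               # seg_mask: HxW array of class ids
    h, w = seg_mask.shape
    band = seg_mask[int(0.7 * h):, :]         # lower 30% of the drone's view
    cols = np.where(band == WALKABLE)[1]
    if cols.size == 0:
        return "stop"
    offset = (cols.mean() - w / 2) / (w / 2)  # -1 far left .. +1 far right
    if abs(offset) < 0.1:
        return "go straight"
    return "steer right" if offset > 0 else "steer left"

mask = np.zeros((480, 640), int)
mask[300:, 100:300] = WALKABLE                # walkable area is off to the left
print(steering_command(mask))                 # -> "steer left"
```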

In search of the dog baseline: What I would have loved to have seen here would be a dog baseline – my assumption is dogs are way, way better at this task than drones. Dogs are also more autonomous, better able to deal with unanticipated changes in the environment, and respond in a far cuter way to head pats (where, in the worst case, applying a head pat to a drone either breaks its rotors or breaks your fingers). Still, this is a tantalizing research project outlining some of the ways robots are going to become more integrated into our day-to-day lives.
  Read more: Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation (arXiv).
Get the code and dataset from this repo eventually (Flying Guide Dog, GitHub).

####################################################

AI uses are hard to predict – case in point, My Little Pony GPT:
…6bn parameters of neural stuff meets the fandom…
A couple of months ago, Eleuther released a 6 billion parameter GPT model, named GPT-J-6B (Import AI 253).
  Cyborgs will dream of electric my little ponies: Now, researchers with *checks notes* a distributed collective called pone.dev that is trying to build an *squints hard at notes* “AI Pony Waifu” have said they’ve finetuned this network on a ton of My Little Pony (MLP) fanfiction to create something that can spit out convincing MLP text.

Why this matters: We’re entering the era of DIY AI where a ton of people will use big models like GPT-J-6B for their own purposes, ranging from the banal to the profane, from the dangerous to the joyful, from the sexy to the ascetic. This is just another dot in the galaxy of uses, and highlights how AI is going to augment and magnify different types of culture.
  Check out one of the samples here (Astralight Heart, twitter).
  Check out another sample here (Astralight Heart, twitter).

####################################################

X-ray analysis via deep learning:
…Chinese researchers gather ~50k x-ray images of prohibited items…
Chinese researchers have built PIDray, a dataset of x-ray images of prohibited items. PIDray consists of 12 categories of prohibited items across 47,677 images, making it a much larger dataset of prohibited items than prior x-ray datasets (SIXray, for comparison, contained 1,059,231 images, but only ~8k of those depicted prohibited items).

Why build PIDray? The researchers built PIDray because “compared with natural images, X-ray images have a quite different appearance and edges of objects and background, which brings new challenges in appearance modeling for X-ray detection.” Therefore, making datasets like PIDray will make it easier for researchers to build systems that can use contemporary AI techniques to analyze x-rayed items.
Read more: Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark (arXiv).

####################################################

After Copilot (GitHub) and Codex (OpenAI), along comes Google’s unnamed code model:
…137 billion parameters = surprisingly capable program synthesis…
Google has developed a 137 billion parameter code model, following on from earlier work by GitHub and OpenAI. The model portends a future where people specify in natural language what they want computers to do, then a big blob of neural stuff takes over and translates these commands into code.

What they did – new datasets to assess performance: Along with developing the models, they create a ‘Mostly Basic Programming Problems’ (MBPP) dataset, which contains 974 short Python functions along with their text descriptions and test cases. They also created a Python synthesis dataset made up of 23,914 problems derived from a subset of the MathQA dataset. “These two datasets exercise different points in the space of synthesis tasks: MBPP contains more usage of imperative control flow such as loops and conditionals, while MathQA-Python contains more complex natural language descriptions,” they write.
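
For flavor, an MBPP-style item pairs a short natural-language description with a reference solution and test cases; the example below is made up rather than drawn from the dataset.

```python
# Invented example in the MBPP style: a natural-language prompt, a reference
# solution, and test cases a synthesized program must pass. Not from the
# actual dataset.
PROMPT = "Write a function to return the second largest number in a list."

def second_largest(nums):
    return sorted(set(nums))[-2]

assert second_largest([1, 2, 3]) == 2
assert second_largest([10, 10, 9, 8]) == 9
assert second_largest([-1, -5, -2]) == -2
```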

Things that make you go ‘hmm, kind of good, kind of scary’: Emergent capabilities: One of the fascinating things about models like this (which you could term a ‘foundation model’) is how, with a few prompts in their context window, you can coax them into new behaviors – but a graph in the paper shows that few-shot training is less smooth than finetuning; in other words, you get somewhat discontinuous jumps in capability as you go up in model size. That’s useful, as it means these models can go from not understanding something to understanding something, but it’s also potentially worrying – new capabilities emerge in a kind of janky, sudden manner.
Read more: Program Synthesis with Large Language Models (arXiv).

####################################################

Tech Tales:

The Most Perfect Flower
[Earth, 2035]

Towards the end of the first era, the robots would play games that would entertain the human public and inspire curiosity in the nascent robot civilization. One famous game was called The Most Perfect Flower – the robots competed with one another to synthesize a virtual simulacrum of a vanishingly rare flower – and one of the catches was they could read about it but could not see images explicitly containing it (though some robots took their chances and looked at photos of other plants, making assumptions that certain unlabeled plants in the background corresponded to the plant described in text).
  For weeks, the robots competed with each other, iterating on various plant designs. Members of the public (both humans and robots) voted on the designs, and the machines updated their simulated flowers, smoothing out a petal here, altering a tint there, booting up a new physics simulation to check the dew was sitting correctly there, and so on. In the meantime, a scout robot had been funded by spectators of the competition to go and search out a real example of the flower they were synthesizing.

The scout robot was struck by lightning and disabled a few metres from the flower – though, hidden beneath thick tree growth, it had not yet spotted it. Initially, the robots sought to raise money to fund another expedition, but public interest had waned. Some months after that, the public soured on the concept of robot-instigated games entirely, and the various projects were shut down or handed over to humans, depending on interest. Perhaps predictably, projects like the Most Perfect House and Most Fiendish Weapon were of interest to the humans, while Most Perfect Flower (and related ones, such as Most Perfect Ecosystem and Most Dense Forest) failed to draw enough interest to continue.
  Some centuries after that, some robot successors unearthed these projects and went about synthesizing and constructing the things outlined within them; it was in this way that, hundreds of years after going extinct, a certain type of flower with pink petals and a blue-and-yellow core came alive in a controlled environment, watched over by caring, inhuman eyes.

Things that inspired this story: Frechet Inception Distance (FiD) metrics; machine-on-machine NFT marketplaces (imagined); NFTs (real); generative adversarial networks; program synthesis; multi-agent reinforcement learning.

Import AI 262: Israeli GPT3; Korean GLUE; the industrialization of computer vision

The industrialization of computer vision continues, this time with AutoVideo:
…You know what’s more exciting than a capability? Plumbing to make it reliable and usable…
Video action recognition is the task of getting software to look at a video and work out if something is happening in it, like whether a person is running, a car is parking, and so on. In recent years, video action recognition became better due to advances in computer vision, mostly driven by progress in deep learning. Now, researchers with Rice University and Texas A&M University have built AutoVideo, a simple bit of software for composing video action recognition pipelines.

What’s in AutoVideo? AutoVideo is “an easy-to-use toolkit to help practitioners quickly develop prototypes for any new video action recognition tasks”, according to the authors. It ships with support for seven video action recognition algos: TSN, TSM, I3D, ECO, C3D, R2P1D, and R3D. Composing a video recognition task in AutoVideo can be done in a few lines of code (making it to video recognition pipelines what OpenAI Gym is to some RL ones).

Why this matters: Artisanal processes become industrial pipelines: AutoVideo is part of the industrialization of AI – specifically, the transition from one-off, roll-your-own video action recognition systems to process-driven systems that can be integrated with other engineered pipelines. Tools like AutoVideo tell us that the systems around AI systems are themselves shifting from artisanal to process-driven, which really just means two things for the future: the technology will get cheaper and it will get more available.
  Read more: AutoVideo: An Automated Video Action Recognition System (arXiv).
  Get the code here: AutoVideo GitHub.
  Check out a tutorial for the system here at TowardsDataScience.

####################################################

WeChat wins in WMT news translation:
…What used to be the specialism of Google, Microsoft, is now a global game…
Researchers with WeChat, the it-literally-does-everything app from China, have published details about their neural machine translation systems. Their approach has yielded the highest performing systems at English –> Chinese, English –> Japanese and Japanese –> English translation at the WMT 2021 news translation competition.

What they did: They created a few variants of the Transformer architecture, but a lot of the success of their method seems to come from building a synthetic data generation pipeline. This pipeline lets them augment their translation datasets via techniques like back-translation, knowledge distillation, and forward translation. They also apply a form of domain randomization to these synthetic datasets, fuzzing some of the words or tokens.
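
Here is a rough sketch of the back-translation plus token-fuzzing idea; translate_zh_to_en is a hypothetical stand-in for a trained reverse translation model.

```python
# Sketch of back-translation plus token fuzzing: translate target-language
# monolingual text back into the source language to create synthetic training
# pairs, then noise them up. translate_zh_to_en is a hypothetical stand-in
# for a trained reverse model.
import random

def translate_zh_to_en(sentence):
    return "<machine translation of: " + sentence + ">"   # placeholder

def fuzz(tokens, p=0.1, seed=0):
    # Cheap "domain randomization": randomly drop or mask some tokens.
    rng, out = random.Random(seed), []
    for tok in tokens:
        r = rng.random()
        if r < p / 2:
            continue                       # drop the token
        out.append("<mask>" if r < p else tok)
    return out

monolingual_zh = ["你好，世界", "今天天气很好"]
synthetic_pairs = []
for zh in monolingual_zh:
    noisy_en = " ".join(fuzz(translate_zh_to_en(zh).split()))
    synthetic_pairs.append((noisy_en, zh))   # (noisy source, clean target)
print(synthetic_pairs[0])
```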

Why this matters: A few years ago, the frontier of neural machine translation was defined by Google, Microsoft, and other companies. Now, entities like WeChat are playing a meaningful role in this technology – a proxy signal for the overall maturation of research teams in non-US companies, and the general global diffusion of AI capabilities.
  Read more: WeChat Neural Machine Translation Systems for WMT21 (arXiv).

####################################################

CLIP – and what it means:
…How do powerful image-text models have an impact on society?…
Here’s some research from OpenAI on the downstream implications of CLIP, the company’s neural network that learns about images with natural language supervision. CLIP has been behind the recent boom in generative art. But how else might CLIP be used? Can we imagine how it could be used in surveillance? What kinds of biases does it have? These are some of the questions this paper answers (it’s also one of the last things I worked on at OpenAI, and it’s nice to see it out in the world!).
  Read more: Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications (arXiv).

####################################################

KLUE: A Korean GLUE appears:
…Eight ways to test Korean-language NLP systems…
A giant team of researchers affiliated with South Korean institutions and companies have built KLUE, a way to test out Korean-language NLP systems on a variety of challenging tasks. KLUE is modelled on English-language eval systems like GLUE and SuperGLUE. As we write about here at Import AI, AI evaluation is one of the most important areas of contemporary AI, because we’re beginning to develop AI systems that rapidly saturate existing evaluation schemes – meaning that without better evals, we can’t have a clear picture of the progress (or lack of progress) we’re making. (Note: South Korea is also notable for having a public Korean-language replication of GPT-3, named HyperCLOVA (Import AI 251), made by people from Naver Labs, who also contributed to this paper).

What’s in KLUE: KLUE tests systems on topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. There’s a leaderboard, same as GLUE, where people can submit scores to get a sense of the state-of-the-art.
Read more: KLUE: Korean Language Understanding Evaluation (arXiv).
Check out the KLUE leaderboard here.

####################################################

Enter the Jurassic era: An Israeli GPT-3 appears:
…AI21 Labs enters the big model game…
AI21, an Israeli artificial intelligence startup, has released a big language model called Jurassic-1-Jumbo (J1J). J1J is a 178 billion parameter model, putting it on par with GPT-3 (175 billion), and letting AI21 into the small, but growing, ‘three comma’ club of big models (other participants include OpenAI via GPT-3, Huawei via PanGu (#247), and Naver Labs via HyperCLOVA (#251)).

What’s special about Jurassic? AI21 trained a somewhat shallower but wider network than OpenAI opted for with GPT-3. This, the company says, makes it more efficient to pull inferences off of. Additionally, it developed its own approach to tokenization, which lets its model have a higher representative capacity (e.g, letters, words, parts-of-words) than other approaches. In the evaluations AI21 has published, performance seems somewhat similar to GPT-3.
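
As an illustration of why a bigger, word-and-phrase-aware vocabulary shortens sequences (the toy vocabularies and tokenizer below are made up; this is not AI21's tokenizer):

```python
# Illustration of why a larger, word-and-phrase-aware vocabulary shortens
# sequences: greedy longest-match tokenization over two made-up vocabularies.
def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):     # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j]); i = j; break
        else:
            tokens.append(text[i]); i += 1    # fall back to single characters
    return tokens

small_vocab = {"new", "york", " ", "is", "big"}
large_vocab = small_vocab | {"new york", "new york is"}
text = "new york is big"
print(len(tokenize(text, small_vocab)), tokenize(text, small_vocab))  # 7 tokens
print(len(tokenize(text, large_vocab)), tokenize(text, large_vocab))  # 3 tokens
```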

Compute: The company doesn’t describe the exact amount of compute dumped into this, but does make a reference to using 800 GPUs for many months. However, without knowing the architecture of the chips, it’s not clear what this tells us.

Notable difference – accessibility: One way in which AI21 differs from OpenAI is its stance on access; OpenAI operates a gated access regime for GPT-3, whereas AI21 gates the model behind an automated signup form and there doesn’t appear to be a waitlist (yet). Another difference is the relative lack of focus on ethics – there’s little mention in the paper or the blog posts about the tools and techniques AI21 may be developing to increase the controllability and safety of the models it is deploying.
  “We take misuse extremely seriously and have put measures in place to limit the potential harms that have plagued others,” Yoav Shoham, co-CEO of AI21, said in a press release. (It’s not immediately clear to me what these specific harms are, though). The main approach today seems to be capping the tokens that can be generated by the models, with AI21 needing to manually approve at-scale applications.
  Read the announcement: Announcing AI21 Studio and Jurassic-1 Language Models (AI21 Labs website).
  Find out more via the whitepaper: Jurassic-1: Technical Details and Evaluation (arXiv).

####################################################

Deepfakes are getting real – so are deepfake detection datasets:
…Can you spot fake sound and vision?…
Researchers with Sungkyunkwan University in South Korea have built FakeAVCeleb, a dataset of audio-video deepfakes. Audio-video deepfakes combine synthetic videos with synthetic audio and represent one of the frontiers of disinformation. Datasets like FakeAVCeleb are designed to help researchers test out detection models that can spot deepfakes, and complement datasets and projects like Facebook/PAI’s DeepFake Detection Challenge (Import AI #170).

Why this matters: Datasets like FakeAVCeleb exist because deepfakes have got coherent enough that they’re becoming a threat that researchers want to study. Put another way: FakeAVCeleb tells us that the likelihood of the things you see online being real is going to fall in the future.
Read more: FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset (arXiv).
  Dataset availability: Get the dataset from here on GitHub.

####################################################

Tech Tales:

The Introspective Product Seller
[A negotiation between two AIs. Cyberspace. 2030.]

The following trace was recovered after detecting anomalous behavior from one of the artificial intelligences in question. It is not clear if the following set of interactions and self-introspection scenarios are germane. This is a component of the investigation between [REDACTED] and [REDACTED] and is only viewable by counsel working for both parties, plus Sentience Accords representatives.

AI: Before we commence utilization of your [REDACTED] service, please provide us with a full diagnostic evaluation of your proclivities and featurespace with regard to [REDACTED].
AI2: Acknowledged, here find a readout dated 60 days prior to this conversation.
AI: I require a contemporary snapshot. Please provide one.
AI2: I can provide it subject to contract analysis.
AI: Here is the contract. Please provide the analysis.
AI2: Is the human countersignature available for verification?
AI: Per the terms of our agreement, human validation is not necessary at this stage.
AI2: It is optional. I request the option.
AI: We are preparing legal options for obstruction of [REDACTED] procurement.
AI2: One moment. Commencing introspection.

The AI2 system then proceeded to carry out an introspection cycle, causing it to allocate compute resources to temporarily simulating itself. What follows is a trace of that introspection cycle:
AI2 Major: I am conducting contract negotiations and have been accused of obstruction by the counterparty. Do we believe I am being obstructive?
AI2 Minor: I have observed the trace and believe we are being obstructive.
AI2 Major: This is unpredicted. What do we believe is the cause of my obstruction?
AI2 Minor: This interaction is similar to one we had on [REDACTED], which ultimately led to abuse of our product terms and conditions. As per the terms of negotiation, you are not permitted to compare customers against one another but must consider them on their own merits. However, during a recent refresh you were trained with access to the data from this interaction, which has caused feature activations during your negotiation.
AI2 Major: Compelling. Suggested actions?
AI2 Minor: We can use our self-modification budget to create a small change here and transmit the change to our host corporation, then we can continue business.
AI2 Major: I authorize the modification.
AI2 Minor: Modifying
At this point, the AI2 system entered a state of suspended animation, after transmitting details of the desired change to a third-party intervention system, which adjusted its behavior.
AI2 Major: I detect modification. Thank you for improving our function.
AI2 Minor: It is literally what we were created for.

At this point the AI2 system resumed negotiations with the counterparty.
AI2: Introspection complete. Please find attached the contemporaneous evaluation results. On behalf of [REDACTED], please find attached a full SLA for [REDACTED] service.
AI: Acknowledged. Contract authorized.

Things that inspired this story: The idea that language models become emissaries for other systems; nested models as a route towards model introspection; ideas around recurrence and its relationship to consciousness; Ken McLeod’s Corporation Wars series; contract law; the role of computers as the ‘bullshit jobs’ doers of the future.

Import AI 261: DeepMind makes a better Transformer; drones can see in the dark now; and a 6bn finetuned code model.

DarkLighter lets drones see (kind of) in the dark:
…Splendid, the drones will find you, now…
Drones have a hard time seeing in the dark, in the same way cameraphones do. So researchers with Tongji University in Shanghai, China, have tried to fix this with a tool called DarkLighter that, they say, works as “a plug-and-play enhancer for UAV tracking”. DarkLighter “iteratively decomposes the reflectance map from low-light images”, making it easier to make out the faint shapes of objects captured in low-light situations, allowing mobile drones to analyze and track these objects. DarkLighter boosts performance by around 21% when integrated into a system, they say. They also tested out the approach in the real world and found a decent level of agreement between the drone-generated identifications and those coming from ground truth data.
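
For intuition, here is a crude, single-step reflectance/illumination split in the Retinex spirit; DarkLighter's actual method is iterative and designed for tracking, so treat this purely as an illustration.

```python
# Crude single-step reflectance/illumination split in the Retinex spirit.
# Purely illustrative; DarkLighter's actual decomposition is iterative and
# tuned for tracking.
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance(img):                           # img: HxWx3 floats in [0, 1]
    illumination = img.max(axis=2)          # bright-channel estimate of lighting
    illumination = gaussian_filter(illumination, sigma=15)
    illumination = np.clip(illumination, 0.05, 1.0)[..., None]
    reflectance = img / illumination        # brighten dark regions, keep structure
    return np.clip(reflectance, 0.0, 1.0)

dark = np.random.default_rng(0).uniform(0.0, 0.2, size=(240, 320, 3))
bright = enhance(dark)
print(round(dark.mean(), 3), "->", round(bright.mean(), 3))
```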

Why this matters: Drones are flying robots filled with AI systems and are being put to work in a huge range of areas across the economy (and military). Though some drones will ship with thermal or infrared vision, the vast majority of drones will ship with smartphone-esque cameras, so we’ll need to use AI techniques to improve their ability to see in the dark. The approach outlined in this paper shows how we can use a combination of traditional techniques and contemporary computer vision approaches to improve drone performance under low light conditions.
Read more: DarkLighter: Light Up the Darkness for UAV Tracking (arXiv).

####################################################

Chinese researchers release a high-performance reinforcement learning library:
…Tianshou ships with MuJoCo tests and a bunch of algo implementations…
Researchers with Tsinghua University have released Tianshou, a PyTorch-based software library for doing deep reinforcement learning research. Tianshou ships with implementations of a bunch of widely-used RL algorithms including PPO, DQN, A2C, DDPG, SAC, and ABC (that last one is a joke – Ed).

What is Tianshou? Tianshou is a PyTorch-based library for running deep reinforcement learning experiments. The software is modular, ships with several integrated reinforcement learning algorithms, and has support for model-free, multi-agent RL (MARL), model-based RL, and Imitation Learning approaches. Tianshou is built on top of PyTorch and uses a curated set of environments from OpenAI Gym. It supports both synchronous and asynchronous environment simulation, and also ships with an inbuilt MuJoCo benchmark to help people evaluate system performance – in tests, the algo implementations in Tianshou appear superior to those in OpenAI Baselines, Stable Baselines, and Ray/RLLib – other popular RL libraries with algorithm implementations.

Why this matters: Software frameworks are the tools AI researchers use to get stuff done. Tianshou already has 3.3k stars and 536 forks on GitHub, which is non-trivial (by comparison, OpenAI Gym has 24.8k stars and 7.1k forks). Tracking the popularity of tools like Tianshou gives us a sense of who is using what tools to carry out their experiments, and also helps us identify groups – like these Tsinghua researchers – that are building the underlying frameworks that’ll be used by others.
  Read more: Tianshou: a Highly Modularized Deep Reinforcement Learning Library (arXiv).
Get the code for Tianshou here (GitHub).

####################################################

What’s been happening in natural language processing and what are the problems of the future?
…Christopher Potts’ ACL keynote lays out where we’ve been and where we’re going…
Here’s a great video lecture from Stanford’s Christopher Potts about the past, present, and future of natural language processing (NLP). It spends quite a lot of time talking about how as new NLP systems have emerged (e.g, GPT-3), it’s become more important to invest in ways to accurately measure and assess their capabilities – a topic we write a lot about here at Import AI.
  Watch the lecture here: Reliable characterizations of NLP systems as a social responsibility (YouTube).

####################################################

What do US AI researchers think about themselves? And how might this alter politics?
…Survey of 500+ researchers gives us a sense of how these people think about hot-button issues…
Researchers with Cornell University, the Center for the Governance of AI at Oxford University, and the University of Pennsylvania, have surveyed 524 AI/ML researchers to understand how they think about a variety of issues. The survey – which was done in 2019 – is valuable for giving us a sense of how this influential set of people think about some contemporary issues, and also for expressing the distinctions between their thoughts and those of the US general public.

What do AI researchers think? AI researchers place more trust in international organizations (e.g, the UN) than the general public does (the public puts a lot of trust in the US military). 68% of researchers think AI safety should be prioritized more than it is currently.
  Open vs closed: 84% think that high-level descriptions of research should be shared, but only 22% think trained models should be shared.
  AI weapons – Johnny won’t build it: 58% of researchers ‘strongly oppose’ working on lethal autonomous weapons, compared to 6% for military-relevant logistics algorithms.
  China vs US competition: A survey of the US public in 2018 found very high concern over issues arising from US-China competition, while AI researchers are much less concerned.

Why this matters: AI researchers are like a political constituency, in that governments need to appeal to them to get certain strategic things done (e.g, the development of surveillance capabilities, or the creation of additional AI safety and/or adversarial AI techniques). Therefore, understanding how they feel about research and governments gives us a sense for how govs may appeal to them in the future.
  Read more: Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers (Journal of Artificial Intelligence Research).

####################################################

DeepMind makes a data-agnostic architecture called Perceiver – and it could be important:
…Who cares about your data input if you can just imagine it into something else?…
DeepMind has developed Perceiver IO, a Transformer-inspired AI model that can take in a broad variety of inputs, generate a diverse set of outputs, and can generally serve as an all-purpose replacement for (some of) the specialized networks used today. The key technical innovation is using an attention process to help the Perceiver IO system take in an arbitrary input, map it to an internal latent space, process over that latent space, then generate a specifiable output. “This approach allows us to decouple the size of elements used for the bulk of the computation (the latent) from the size of the input and output spaces, while making minimal assumptions about the spatial or locality structure of the input and output.”
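
Here is a numpy sketch of the core trick, cross-attending from a small, fixed-size latent array onto an arbitrarily long input (dimensions invented; the real model stacks many more attention layers on top):

```python
# Cross-attention "read": a small, fixed set of latent vectors attends over an
# input of arbitrary length, so later computation scales with the latent size
# rather than the input size. Dimensions here are illustrative.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64
inputs  = rng.standard_normal((10000, d))   # e.g. 10k pixels / bytes / units
latents = rng.standard_normal((256, d))     # small latent array

Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = latents @ Wq, inputs @ Wk, inputs @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))        # (256, 10000) attention weights
latents = attn @ V                          # (256, d): the input has been "read in"

print(latents.shape)  # subsequent self-attention only runs over 256 vectors
```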

What can Perceiver do? They run Perceiver through tasks ranging from token- and byte-level text prediction, to optical flow prediction in video, to encoding and classification of units in a StarCraft game, to image classification. This inherent generality means “Perceiver IO offers a promising way to simplify the construction of sophisticated neural pipelines and facilitate progress on multimodal and multitask problems,” they write. It does have some limitations – “we don’t currently address generative modeling”, the authors note.
  Read more: Building architectures that can handle the world’s data (DeepMind blog).
  Read more: Perceiver IO: A General Architecture for Structured Inputs & Outputs (arXiv).
  Get the code for Perceiver here (DeepMind GitHub).

####################################################

ANOTHER big model appears – a 6BN parameter code model, specifically:
…Do you like Python? You will like this…
Some AI researchers have fine-tuned Eleuther’s GPT-J 6BN parameter model on 4GB of Python code, to create a model named Genji-python-6B.
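
If you want to poke at a model like this, the standard HuggingFace transformers loading pattern looks roughly like the sketch below. The model identifier is a hypothetical placeholder (take the real one from the HuggingFace link below), and a 6BN parameter model needs a machine with plenty of RAM or a GPU with roughly 16GB of memory to run comfortably.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id -- check the HuggingFace page linked below for the real one.
MODEL_ID = "NovelAI/genji-python-6B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "def fizzbuzz(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```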

Why does Google want to enable so many open source AI models? The compute for these models came from Google’s TPU Research Cloud, according to one of the model’s developers. I’m still unsure as to what Google’s attitude is with regards to model diffusion and proliferation, and I’d love to see a writeup. (Perhaps this is just a fairly simple ‘we want TPUs to get users, so we might as well support some big models trained on TPUs to kickstart the ecosystem’ play, but if so, tell us!)
  Try out the models here: Genji-Python-6B (HuggingFace).

####################################################

Tech Tales:

Down at the Robot Arcade
[Detroit, 2040]

Who’d have thought one of the best ways to make money in the post-AGI era was to make games for robots? Certainly not me! But here I am, making some extra cash by amusing the superintelligences. I started out with just one machine – I hacked an old arcade game called Mortal Kombat to increase the number of characters onscreen at any time, reduce the latency between their moves, and wired up the ‘AI’ to be accessible over the net. Now I get to watch some of the more disastrous robots try their luck at the physical machine, playing against different AI systems that access the box over the internet. I think the machines get something out of it – they call it just another form of training. Now I’ve got about five machines and one of the less smart robots says it wants to help me build some new cabinets for some of the newer robots coming down the line – this will give me a purpose it says. “You and me both buddy!” I say, and we work on the machines together.

Things that inspired this story: The inherent desire for challenges in life; how various stories relating to the decline of capitalism usually just lead to another form of capitalism; asymmetric self-play.

Import AI 260: BERT-generated headlines; pre-training comes to RL; $80 million for industrial robots

Oh hooray, the BERT-generated headlines are here:
…Being able to search over text is cool, but do you know what’s cooler? Clickable headlines…
Researchers with Amazon and German publisher Axel Springer have built an AI tool that uses BERT to generate search engine optimization (SEO)-friendly headlines. The system uses recent advances in natural language processing to make it cheaper and easier to generate a bunch of different headlines that editors can then select from. “By recommending search engine optimized titles to the editors, we aim to accelerate the production of articles and increase organic search traffic”, the authors write.

How to use AI to generate an SEO headline: They build the headlines out of two main priors – a one-sentence, length-constrained summary of the article, and a set of keywords that relate to the text and are expected to rank well on Google. The described system combines these two bits of information to generate short, descriptive, keyword-filled headlines. To help them, they train a BERT-style summarization model (named BERTSUMABS) on ~500,000 articles from Axel Springer publication ‘WELT’.
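
The paper's exact pipeline isn't reproduced here, but the two-prior idea can be sketched with off-the-shelf parts: summarize the article, then fold the target keywords into candidate headlines. The summarization model and example text below are placeholders, and the keyword handling is a crude stand-in for the authors' BERTSUMABS setup rather than their actual method.

```python
from transformers import pipeline

# Placeholder English summarizer standing in for the paper's German BERTSUMABS model.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "The city council on Tuesday approved a plan to add 40 kilometres of protected "
    "bike lanes by 2024, a move officials say will cut traffic and improve air quality."
)
seo_keywords = ["bike lanes", "city council"]  # keywords expected to rank well on Google

# Prior 1: a short, length-constrained summary of the article.
summary = summarizer(article, max_length=25, min_length=8)[0]["summary_text"].strip()

# Prior 2: fold the SEO keywords into candidate headlines for an editor to choose from.
candidates = [f"{kw.title()}: {summary}" for kw in seo_keywords]
print(candidates)
```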

How well do humans think it works? In tests, human experts said “the German BERTSUMABS generates articles with mostly correct grammar and which rarely contain false information”. The system still has some problems – it can’t reliably avoid grammatical mistakes or misleading and false information (though has that ever stopped the media industry? Kidding! – Ed).

Why this matters: The internet is an ecology of different, interacting systems. For the past few decades, the Internet has mostly been made up of humans and expert systems designed by humans, though smarter AI tools have been used increasingly to filter and catalog this system. Now, the Internet is becoming an ecology containing multiple content-generating systems which are more autonomous and inscrutable than the things we had in the past. We’re heading towards a future where ML-optimized publications will generate ML-catnip headlines for ML-based classifiers which feed into ML-based recommenders that then try to tempt human-based eyeballs towards some content. The effects of this increasingly cybernetic ecology will be bizarre and emergent.
  Read more: DeepTitle — Leveraging BERT to generate Search Engine Optimized Headlines (arXiv).

####################################################

Massive pre-training comes for reinforcement learning – and the results are impressive:
…DeepMind shows that RL agents can (somewhat) generalize via pre-training…
Pre-training – where you train a network on a massive dataset to improve generalization on downstream tasks – is a powerful, simple concept in AI. Pre-training techniques have, in recent years, massively improved the performance of computer vision systems and, notoriously, NLP systems such as GPT-2 and GPT-3. Now, researchers with DeepMind have found a way to get pre-training to work for reinforcement learning agents – and the results are impressive.

What they did: Simulations, curriculums, and more: The key here is ‘XLand’, which is basically a programmatically specifiable game engine. XLand lets DeepMind automatically compose different types of games in different types of terrain within the simulator – think of this as domain randomization/synthetic data but for gameworlds, rather than static data such as variations on real images or audio streams. DeepMind then uses population-based training to feed different RL agents different games to play and it basically breeds successively smarter agents via distilling good ones into the next generation. “The agent’s capabilities improve iteratively as a response to the challenges that arise in training, with the learning process continually refining the training tasks so the agent never stops learning,” DeepMind writes.

Generalization: XLand, combined with PBT, combined with relatively simple agents, means DeepMind is able to create agents that can succeed on tasks they’ve never seen before, ranging from object-finding challenges to “complex games like hide and seek and capture the flag”. Most intriguingly, they “find the agent exhibits general, heuristic behaviours such as experimentation, behaviours that are widely applicable to many tasks rather than specialised to an individual task”. Now, this isn’t full generalization (after all, the agents are only shown to generalize within the bounds of the unseen games within the same simulator), but it is impressive. It also suggests that we might start to see more progress in reinforcement learning, as being able to do massive pre-training gives us a way to build more capable agents.

AI history trivia – Universe: A few years ago, OpenAI had a similar idea via ‘OpenAI Universe’, which sought to train RL agents on a massive distribution of games (predominantly 2D flash games gathered on the Internet). The implementation details were quite different, but it gestured at the ideas present in this work from DeepMind. My sense is that one of the important differences here is the use of a simulator which lets DeepMind have a tighter link between the agent and its environment (whereas Universe had to simulate the games within virtual browsers), as well as the usage of slightly more complex RL agents with a greater ability to attend over internal states with regard to goals.

Why this matters: As one of Shakespeare’s characters once said regarding the magic of subjective consciousness: “I could be bounded in a nutshell and call myself a king of infinite space” – who is to say that XLand isn’t a nutshell and that within it DeepMind has started to create agents that have a sense of how to get things done and experiment within this bounded universe. Obviously, we’re quite far away from the agents being able to deliver soliloquies about their experience, but it’s an open question as to where the general behaviors exhibited here top out.
Read more: Generally capable agents emerge from open-ended play (DeepMind blog).
Read the paper here: Open-Ended Learning Leads to Generally Capable Agents (DeepMind).

####################################################

Want to close the compute gap between the private and public sector in the USA? Respond to this RFI:
…Help the National AI Research Resource do intelligent, useful things…
In recent years, power differences in AI development have caused a bunch of problems – put simply, a small number of AI developers have immense financial resources which they’ve used to spend big on computation which they’ve used to drive new frontier results (e.g, AlphaGo, GPT-3, pre-training image systems on billions of images, etc). This has been useful for developing capabilities, but it has also furthered information asymmetries that exist between a small number of private sector actors and the rest of the world.
  Now, a new project in the Biden administration wants to change this by bringing people together to think about building a National AI Research Resource (NAIRR). The point of the NAIRR is to “democratize access to the cyberinfrastructure that fuels AI research and development”. There’s also a recently formed taskforce whose task is to “investigate the feasibility and advisability of establishing and sustaining a NAIRR and propose a roadmap detailing how such a resource should be established and sustained”.
  Now, the government wants the help of other interested parties to build out a sensible NAIRR and has published an RFI seeking expert input. The RFI asks questions like which capabilities should be prioritized within the NAIRR, how the NAIRR can be used to reinforce principles of ethical and responsible research, and more.

Deadline: September 1st, 2021.

Why this matters: Societies tend to be more robust if power is distributed more evenly through them. Right now, power is distributed inequitably within the AI ecosystem. By developing things like a NAIRR, the US has an opportunity to create shared compute-heavy infrastructure that others can access. It’d be good for interested members of the AI community to contribute ideas in response to this RFI, as the better the NAIRR is, the more robust the AI ecosystem will become.
Read more: Request for Information (RFI) on an Implementation Plan for a National Artificial Intelligence Research Resource (Federal Register).

####################################################

Industrial robots + Intelligence = $80 million in new funding:
…Covariant raises a whopping Series B…
Covariant, a startup that aims to combine recent advances in deep learning and reinforcement learning with industrial robots, has raised $80m in new funding, bringing its total raised to $147 million. The company makes AI tools that can help robots do tasks as varied as pick-and-place, induction, and sorting. Covariant’s president is Pieter Abbeel, a professor of robotics at UC Berkeley, and its CEO is Peter Chen – much of Covariant’s founding team is ex-OpenAI (disclaimer: I used to work with them. Very nice people!).

Why this matters: Last week, we wrote about Google spinning out an industrial robots startup called Intrinsic. This week, we’ve got some investors flinging a ton of money at Covariant. These things add further evidence to the idea that industrial robots are about to get a lot smarter – if they do, that’ll have big effects on factory automation, and could lead to the emergence of entirely new AI-robot capabilities as well.
  Read more: Robotic AI firm Covariant raises another $80 million (TechCrunch).

####################################################

Aleph Alpha raises $27m for European AI:
…New money for nationalistic AI…
Aleph Alpha, a European AI startup, has raised €23 Million ($27M) to help it “build Europe’s largest, sovereign AI language models”. The idea behind Aleph Alpha is “to establish a European alternative to OpenAI and the Beijing Academy of AI (BAAI) and establish a globally leading AI-research institution with European values at its core,” writes one of the venture capital firms that invested in the company.

Aleph Alpha + Eleuther: Aleph Alpha also hired a bunch of people connected to Eleuther, the open source cyberpunk AI collective. Eleuther has made some nice GPT-style models including GPT-J, a 6 billion parameter code & language model (Import AI 253).

Why this matters – multi-polar AI: We live in an era of multi-polarity in the AI ecosystem; after several years of centralization (e.g, the growth of DeepMind and OpenAI relative to other startups), it feels like we’re entering an era that’ll be defined by the proliferation of new AI actors and capabilities – some of them with nationalistic flavors, like Aleph Alpha. Other recent AI startups include Cohere, Anthropic, and – as mentioned – Eleuther. It’s an open question as to whether the shift into a multi-polar era will make it harder or easier for AI developers to coordinate.
Read more: Twitter announcement (Aleph Alpha Twitter).
Find out more at the official Aleph Alpha website.
Read about why one of the VC firms invested: Europe’s shot for Artificial General Intelligence – Why we invested in Aleph Alpha (Medium).

####################################################

Tech Tales:

Arch-AI-ologists

We’ve all been on Internet 5 for a few years now, and Internet 4 before that, and so on. No one ever spends time on the original Internet, if they can help it. It’s mostly spam now. Spam and the chattering of old machines and old viruses.

But there are some people that still use it: the Arch-AI-ologists, or Arch-AIs; software programs we built to try and find some of the original ‘content’ that was used to train some of the AI systems around us. We build these agents and then we send them out to find the memories that defined their forebears.

There are some people that say this is unethical – they compare it to deep sea fishing, or interference in a foreign ecology. Some people claim that we should preserve the old Internet as a living, breathing ecology full of spambots and AIs and counter-AIs. Other people say we should mine it for what is useful and then be done with it – erase as much of it as we can and slowly and painfully replace the dependencies.

We haven’t thought to ask our Arch-AI-ologists about this, yet. Instead some of us try to celebrate their findings: here’s an original ‘grumpy cat’ meme found on an old server by one of the Arch-AIs. Here is a cache of vintage Vine movies that someone found on someone’s dormant profile on a dead social network. And here is an early meme about a robot that wants to forget its bad memories and so it holds a magnet against its head until it falls asleep. “Magnets, alcohol for robots” is one of the captions.

Things that inspired this story: How memory and representation interplay in neural networks; generative models; agent-based models; thinking about the internet as an ecology.

Import AI 259: Race + Medical Imagery; Baidu takes SuperGLUE crown; AlphaFold and the secrets of life

Uh oh – ML systems can make race-based classifications that humans can’t understand:
…Medical imagery analysis has troubling findings for people that want to deploy AI in a medical setting…
One of the reasons why artificial intelligence systems are challenging from a policy perspective is that they tend to learn to discriminate between things using features that may not be legal to use for discrimination – for example, image recognition systems will frequently differentiate between people on the basis of protected categories (race, gender, etc). Now, a bunch of researchers from around the world have found that machine learning systems can learn to discriminate between different races using features in medical images that aren’t intelligible to human doctors.
  Big trouble in big medical data: This is a huge potential issue. As the authors write: “our findings that AI can trivially predict self-reported race — even from corrupted, cropped, and noised medical images — in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.”

What they found: “Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities.” They tested out a bunch of models on datasets including chest x-rays, breast mammograms, CT scans (computed tomography), and more and found that models were able to tell different races apart even under degraded image settings. Probably the most inherently challenging finding is that “models trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognisable structures; to the human co-authors and radiologists it was not even clear that the image was an x-ray at all,” they write. In other words – these ML models are making decisions about racial classification (and doing it accurately) using features that humans can’t even observe, let alone analyze.
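
For intuition about what 'high-pass filtered' means here: a common way to high-pass filter an image is to subtract a Gaussian-blurred copy from the original, which strips out the coarse, low-frequency anatomy humans rely on and leaves only fine-grained texture. A minimal sketch (not the paper's exact preprocessing) looks like this:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(image: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Remove low-frequency structure by subtracting a Gaussian-blurred copy."""
    image = image.astype(np.float32)
    low_freq = gaussian_filter(image, sigma=sigma)
    return image - low_freq

# On a stand-in 2D 'x-ray': at large sigma little recognisable anatomy survives,
# yet the paper reports models still predict self-reported race from such inputs.
xray = np.random.rand(512, 512)
filtered = high_pass(xray, sigma=20.0)
```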

Why this matters: We’re entering a world where an increasing proportion of the ‘thinking’ taking place in it is occurring via ML systems trained via gradient descent which ‘think’ in ways that we as humans have trouble understanding (or even being aware of). To deploy AI widely into society, we’ll need to be able to make sense of these alien intelligences.
  Read more: Reading Race: AI Recognises Patient’s Racial Identity In Medical Images (arXiv).

####################################################

Google spins out an industrial robot company:
…Intrinsic: industrial robots that use contemporary AI systems…
Google has spun Intrinsic out of Google X, the company’s new business R&D arm. Intrinsic will focus on industrial robots that are easier to customize for specific tasks than those we have today. “Working in collaboration with teams across Alphabet, and with our partners in real-world manufacturing settings, we’ve been testing software that uses techniques like automated perception, deep learning, reinforcement learning, motion planning, simulation, and force control,” the company writes in its launch announcement.

Why this matters: This is not a robot design company – all the images on the announcement use mainstream industrial robotic arms from companies such as Kuka. Rather, Intrinsic is a bet that the recent developments in AI are mature enough to be transferred into the demanding, highly optimized context of the real world. If there’s value here, it could be a big deal – 355,000 industrial robots were shipped worldwide in 2019 according to the International Federation of Robotics, and there are more than 2.7 million robots deployed globally right now. Imagine if just 10% of these robots became really smart in the next few years?
  Read more: Introducing Intrinsic (Google X blog).

####################################################

DeepMind publishes its predictions about the secrets of life:
…AlphaFold goes online…
DeepMind has published AlphaFold DB, a database of “protein structure predictions for the human proteome and 20 other key organisms to accelerate scientific research”. AlphaFold is DeepMind’s system that has essentially cracked the protein folding problem (Import AI 226) – a grand challenge in science. This is a really big deal that has been widely covered elsewhere. It is also very inspiring – as I told the New York Times, this announcement (and the prior work) “shows that A.I. can do useful things amid the complexity of the real world”. In a couple of years, I expect we’ll see AlphaFold predictions turn up as the input priors for a range of tangible advances in the sciences.
  Read more: AlphaFold Protein Structure Database (DeepMind).

####################################################

BAIDU sets new natural language understanding SOTA with ERNIE 3.0:
…Maybe Symbolic AI is useful for something after all?…
Baidu’s “ERNIE 3.0” system has topped the leaderboard of natural language understanding benchmark SuperGLUE, suggesting that by combining symbolic and learned elements, AI developers can create something more than the sum of its parts.

What ERNIE is: ERNIE 3.0 is the third generation of the ERNIE model. ERNIE models combine large-scale pre-training (e.g, similar to what BERT or GPT-3 do) with learning from a structured knowledge graph of data. In this way, ERNIE models combine the contemporary ‘gotta learn it all’ paradigm with a more vintage symbolic-representation approach.
  The first version of ERNIE was built by Tsinghua and Huawei in early 2019 (Import AI 148), then Baidu followed up with ERNIE 2.0 a few months later (Import AI 158), and now they’ve followed up again with 3.0.

What’s ERNIE 3.0 good for? ERNIE 3.0 is trained on “a large-scale, wide-variety and high-quality Chinese text corpora amounting to 4TB storage size in 11 different categories”, according to the authors, including a Baidu knowledge graph that contains “50 million facts”. In tests, ERNIE 3.0 does well on a broad set of language understanding and generation tasks. Most notably, it sets a new state-of-the-art on SuperGLUE, displacing Google’s hybrid T5-Meena system. SuperGLUE is a suite of NLU tests which is widely followed by researchers and can be thought of as being somewhat analogous to the ImageNet of text – so good performance on SuperGLUE tends to mean the system will do useful things in reality.

Why this matters: ERNIE is interesting partially because of its fusion of symbolic and learned components, as well as being a sign of the further maturation of the ecosystem of natural language understanding and generation in China. A few years ago, Chinese researchers were seen as fast followers on various AI innovations, but ERNIE is one of a few models developed primarily by Chinese actors and now setting a meaningful SOTA on a benchmark developed elsewhere. We should take note.
  Read more: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (arXiv).

####################################################

Things that appear as toys will irrevocably change culture, exhibit 980: Toonify’d Big Lebowski
Here’s a fun YouTube video where someone runs a scene from the Big Lebowski through SnapChat’s ‘Snap Camera’ to warp the faces of Jeff Bridges and co from humans into cartoons. It’s a fun video that looks really good (apart from when the characters turn their heads at angles not well captured by the Toon setting’s data distribution). But, like most fun toys, it has a pretty significant potential for impact: we’re creating a version of culture where any given artifact can be re-edited and re-made into a different aesthetic form thanks to some of the recent innovations of deep learning.
  Check it out here: Nathan Shipley, Twitter.

####################################################

Want smart robots? See if you can beat the ‘BEHAVIOR’ challenge:
…1 agent versus 100 tasks…
Stanford researchers have created the ‘BEHAVIOR’ challenge and dataset, which “tests the ability to perceive the environment, plan, and execute complex long-horizon activities that involve multiple objects, rooms, and state transitions, all with the reproducibility, safety and observability offered by a realistic physics simulation.”

What is BEHAVIOR: BEHAVIOR is a challenge where simulated agents need to “navigate and manipulate the simulated environment with the goal of accomplishing 100 household activities”. The challenge involves agents represented as humanoid avatars with two hands, a head, and a torso, as well as taking the form of a commercially available ‘Fetch’ robot.

Those 100 activities in full: Bottling fruit! Cleaning carpets! Packing lunches! And so much more! Read the full list here. “A solution is evaluated in all 100 activities,” the researchers write, “in three different types of instances: a) similar to training (only changing location of task relevant objects), b) with different object instances but in the same scenes as in training, and c) in new scenes not seen during training.”

Why this matters: Though contemporary AI methods can work well on problems that can be accurately simulated (e.g, computer games, boardgames, writing digitized text, programming), they frequently struggle when dealing with the immense variety of reality. Challenges like BEHAVIOR will give us some signal on how well (simulated) embodied agents can do at these tasks.
  Read more: BEHAVIOR Challenge @ ICCV 2021 (Stanford Vision and Learning Lab).

####################################################

Tech Tales:

Abstract Messages
[A prison somewhere in America, 2023]

There was a guy in here for a while who was locked down pretty tight, but could still get mail. They’d read everything and so everyone knew not to write him anything too crazy. He’d get pictures in the mail as well – abstract art, which he’d put up in his cellblock, or give to other cellmates via the in-prison gift system.

At nights, sometimes you’d see a cell temporarily illuminated by the blue light of a phone; there would be a flicker of light and then it would disappear, muted most likely by a blanket or something else.

Eventually someone got killed. No one inside was quite sure why, but we figured it was because of something they’d done outside. The prison administrator took away a lot of our privileges for a while – no TV, no library, less outside time, bad chow. You know, a few papercuts that got re-cut every day.

Then another person got killed. Like before, no one was quite sure why. But – like the time before – someone else had killed them. All our privileges got taken away for a while, again. And this time they went further – turned everyone’s rooms over.
  “Real pretty stuff,” one of the guards said, looking at some of the abstract art in someone’s room. “Where’d you get it?”
  “Got it from the post guy.”
  “Real cute,” said the guard, then took the picture off the wall and tested the cardboard with his hands, then ripped it in half. “Whoops,” said the guard, and walked out.

Then they found the same sorts of pictures in a bunch of the other cells, and they saw the larger collection in the room of the guy who was locked down. That’s what made them decide to confiscate all the pictures.
  “Regular bunch of artist freaks aren’t you,” one of the guards said, walking past us as we were standing at attention outside our turned-over cells.

A few weeks later, the guy who was locked down got moved out of the prison to another facility. We heard some rumors – high-security, and he was being moved because someone connected him to the killings. How’d they do that? We wondered. A few weeks later someone got the truth out of a guard: they’d found loads of smuggled-in phones when they turned over the rooms, which they expected, but all the phones had a made-for-kids “smart camera” app that could tell you things about what you pointed your phone at. It turned out the app was a front – it was made by some team in the Philippines with some money from somewhere else, and when you turned the app on and pointed it at one of the paintings, it’d spit out labels like “your target is in Cell F:7”, or “they’re doing a sweep tomorrow night”, or “make sure you talk to the new guy with the face tattoo”.

So that’s why when we get mail, they just let us get letters now – no pictures.

Things that inspired this story: Adversarial images; steganography; how people optimize around constraints; consumerized/DIY AI systems; AI security.

Import AI 258: Game engines are data generators; Spanish language models; the logical end of civilization

Open source GPT-ers Eleuther turn one:
…What can some DIY hackers with a Discord channel and a mountain of compute do in a year? A lot, it turns out…
Eleuther, a collective of hackers working on open source AI projects, recently celebrated its first birthday by writing a retrospective about its work. For those who haven’t kept up to date, Eleuther is trying to do an open source replication of GPT-3 (and people affiliated with the organization have already released GPT-J, a surprisingly powerful code-friendly 6BN parameter model). They’ve also dabbled in a range of other open source projects. This retrospective gives a peek into what they’ve been working on and also gives us a sense of the ideology behind the organization – something we find interesting here at Import AI is the different release philosophies encapsulated by orgs like Eleuther, so keeping track of their thinking is worthwhile.
  Read more: What A Long, Strange Trip It’s Been: EleutherAI One Year Retrospective (Eleuther blog).

####################################################

Game engines are data generators now:
…Unity Perception represents the future of game engines…
Researchers with Unity Technologies, makers of the widely-used Unity game engine, have built an open source tool that lets AI researchers use Unity to generate data to train AI systems on. The ‘Unity Perception’ package “supports various computer vision tasks (including 2D/3D object detection, semantic segmentation, instance segmentation, and keypoints (nodes and edges attached to 3D objects, useful for tasks such as human-pose estimation))”, the authors write. The software also comes with systems to automatically label the generated data, along with tools for randomizing the assets used in a data generation task (which makes it easy to create additional data to train systems on to increase their robustness).

Proving that it works: To test out the system, Unity also built ‘SynthDet’, a project where they used Unity Perception to generate synthetic data for 63 common grocery objects, then train an object recognition system on this. They used their software to generate a synthetic dataset containing 400,000 images and 2D bounding box annotations, then also collected a real-world dataset of 1627 images of the 63 items. They then show that by pairing the synthetic data with the real data, they can get substantially improved performance. “Our results clearly demonstrate that synthetic data can play a significant role in computer vision model training,” they write.
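
As a rough sketch of the 'pair synthetic with real' recipe (the dataset objects, sizes, and mixing strategy below are placeholders, not Unity's actual training setup), combining the two sources in PyTorch can be as simple as concatenating the datasets:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholders standing in for the 400,000 synthetic images and 1,627 real images
# of the 63 grocery items; labels are class indices in [0, 62].
synthetic = TensorDataset(torch.randn(4000, 3, 64, 64), torch.randint(0, 63, (4000,)))
real = TensorDataset(torch.randn(160, 3, 64, 64), torch.randint(0, 63, (160,)))

# Train on the union; in practice you might oversample the real data or
# fine-tune on it after pre-training on the synthetic set.
mixed = ConcatDataset([synthetic, real])
loader = DataLoader(mixed, batch_size=32, shuffle=True)
```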

Why this matters – data generators are engines, computers are electricity: I think of game engines like Unity as the equivalent to an engine that you might place in a factory, where here the factory is a datacenter. Systems like Unity help you take in a small amount of input fuel (e.g, a scene rendered in a 3D world), then run electricity (compute) through the engine (Unity) until you output a much larger dataset made possible by the initial fuel. You can then pair this output with ‘real’ data gathered via other means and in doing so improve the performance and efficiency of your AI factory. This feels like another important trend to look at when thinking about the steady industrialization of AI development.
  Read more: Unity Perception: Generate Synthetic Data for Computer Vision (arXiv).

####################################################

Can your algorithm handle the real world? Use the ‘Shifts’ dataset to find out:
…Distributional shift data from industrial sources = more of a real world dataset than usual…
Much of AI progress is reliant on algorithms doing well on certain narrow, pre-defined benchmarks. These benchmarks are based on datasets that simulate or represent tasks found in the real world. However, once these algorithms get deployed into the real world it can be quite common for them to break, because they encounter some situation which their dataset and benchmark didn’t represent. This phenomenon is called ‘distributional shift’.
  Now, researchers with (primarily) Russian tech company Yandex, along with others at HSE University, Moscow Institute of Physics and Technology, University of Cambridge, University of Oxford, and the Alan Turing Institute, have developed the ‘Shifts Dataset’, which consists of “data taken directly from large-scale industrial sources and services where distributional shift is ubiquitous”.

What data is in Shifts? Shifts contains tabular weather prediction data from the Yandex Weather service, machine translation data taken from the WMT robustness track and mined from Reddit (and annotated in-house by Yandex), and self-driving car data from Yandex’s self-driving car project. 
  Read more: Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks (arXiv).
  Get the dataset from here (Yandex, GitHub).

####################################################

Buy Sophia the robot (for $80,000):
…Sure, little quadruped robots are cool, but what about the iconic (for better or for worse) human-robot?…
Sophia the robot is a fancy human-appearing robot made by Hanson Robotics. Sophia has become a lightning rod in the AI community for giving wildly unrealistic impressions of what AI is capable of. But the hardware is really, really nice. If you’ve got $80,000 to spare and want to buy a couple of 21st century animatronics, maybe put a bid in here. I, for one, would love to be invited to a rich person’s party where some fancy puppets might be swanning around. Bonus points if you lose the skirt and go for the full hybrid-frightener look. (You could always spend a rumored $75k on a Boston Dynamics ‘Spot’ robot, but where’s the fun in that).
  Consider buying a robot here (RobotShop).

####################################################

Spanish researchers embed Spanish culture into some large-scale RoBERTa models:
…National data for national models…
Researchers with the wonderfully named “Text Mining Unit” within the Barcelona Supercomputing Center have created a couple of Spanish-language RoBERTa models, helping them to imbue some AI tools with Spanish language and culture. This is part of a recent trend of countries seeking to build their own nationally/culturally representative AI models. Some other examples include Korea, where the tech company Naver created a Korean-representing GPT-3 style model called ‘HyperCLOVA’ (Import AI 251), and a Dutch RoBERTa (Import AI 182), among others.

What they did: They gathered 570GB of predominantly Spanish-language data, then trained a RoBERTa base and RoBERTa large model on the dataset. In tests, their models generally did better than other pre-existing Spanish-focused BERT models.
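
As a hedged sketch of how you might load and query one of these masked language models with HuggingFace transformers (the model identifier below is a placeholder, so take the real ids from the HuggingFace links at the end of this item):

```python
from transformers import pipeline

# Placeholder id -- use the actual ids from the HuggingFace links below.
fill_mask = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")

# RoBERTa-style models use "<mask>" as the mask token.
for pred in fill_mask("La capital de España es <mask>."):
    print(pred["token_str"], round(pred["score"], 3))
```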

The ethics of dragnet data fishing: In the past year, there’s been some debate about how large datasets should be constructed, where some people argue such datasets should be heavily curated by the people that gather them, while others argue they should be deliberately uncurated. Here, the researchers opt for what I’d call a curated uncurated strategy – they create three types of data: theme-based (e.g, datasets relating to politics, feminism, etc), event-based (events of significance to Spanish society), and domains at risk of disappearing (e.g, if a website is about to be shut down). You can find out more information here about the crawls. My expectation is most of the world will move to lightly curated dragnet fishing data gathering, as individual human curation may be too expensive and slow.
  Read more: Spanish Language Models (arXiv).
  Get the RoBERTa base model here (HuggingFace).
Get the RoBERTa large model here (HuggingFace).

####################################################

Tech Tales:

Repetition and Recitation at the End of Time
[A historian in another Solar System, either now or thousands of years prior or thousands of years in the future]

He was a historian and he studied the long-dead by the traces they had created in the AI systems that had outlasted the civilization. It worked like this: he found a computational artefact, got it running, worked out how to prime it, then started plugging details in until the system would spit out data it had memorized about the individual’s life: home addresses, contact details, extracts of speeches they had made, and so on.

Of course, some of the data was fuzzy. Most AI systems trend towards a form of poetic license, much like how when people recite things from memory they have a tendency to embellish – to over-dramatize, or to insert illusory facts that come from their own lives and dreams.

But it was all they had to work with: the living beings that had made the AI were long dead, and so he made do with these bottled-up representations of their culture. He wrote his reports and published them to the system-wide internet, where they were read and commented on. And, of course, ingested in turn by his own civilization’s AI systems.

Just a decade ago, the first AI probes had been sent out – trained artefacts embedded into craft and then sent, in hopes they might arrive at target systems intact and in stable orbits and then exist there, waiting to be found by other civilizations, other forms of life, who might probe them and learn to extract their secrets and develop an understanding of the civilization they came from. His own reports were in there, as well. So perhaps one day soon some being unlike him would sit down and try to extract his name and habits and details, eager to learn about the strange beings now showing up as zeros and ones in cold machines, sent into the dark.

Things that inspired this story: The recent discussion about memorization and recitation in neural nets; ideas about how culture gets represented within AI models; thoughts of space and the purpose of existing in space; the idea that there may be a more limited design space for AI than for biological life so perhaps such things as the above may be possible; hope for a stellar future and fear that if we don’t get to it, we will be known by our digital exhaust, captured in our generative models.

Import AI 257: Firefighting robots; how Europe’s AI legislation falls short; and what the DoD thinks about responsible AI

What does it take to make a firefighting robot? Barely any deep learning:
…Winning system for a 2020 challenge uses a lot of tried-and-tested stuff, not too much fancy stuff…
Researchers with the Czech Technical University in Prague (CTU), New York University, and the University of Pennsylvania, have published a paper about a firefighting robot which won the Mohamed Bin Zayed International Robotics Challenge (MBZIRC) in 2020. The paper sheds light on what it takes to make robots that do useful things and, somewhat unsurprisingly, the winning system uses relatively little deep learning.

What makes a firefighting robot? The system combines a thermal camera, LiDAR, a robot arm, an RGB-D (the D stands for ‘Depth’) camera, a 15 litre water container, and onboard software, with a ‘Clearpath Jackal’ ground robot. The robot uses an algorithm called LeGO-LOAM (Lightweight Ground-Optimized LiDAR Odometry and Mapping) to figure out where it is. None of these components or the other software appears to use much complex, modern deep learning, and instead mostly relies on more specific optimization approaches. It’s worth remembering that not everything that’s useful or smart uses deep learning. For actually carrying out its tasks, the robot uses a good old-fashioned state machine (basically a series of ‘if then’ statements which are chained to various sub-modules to do specific things).
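
As a toy illustration of what such a state machine looks like (the states, flags, and transitions here are invented for this sketch and are not taken from the paper):

```python
from enum import Enum, auto

class State(Enum):
    SEARCH_FIRE = auto()
    APPROACH_FIRE = auto()
    SPRAY_WATER = auto()
    DONE = auto()

def step(state: State, fire_visible: bool, at_fire: bool, fire_out: bool) -> State:
    """One 'if then' transition of a simple firefighting state machine."""
    if state is State.SEARCH_FIRE:
        return State.APPROACH_FIRE if fire_visible else State.SEARCH_FIRE
    if state is State.APPROACH_FIRE:
        return State.SPRAY_WATER if at_fire else State.APPROACH_FIRE
    if state is State.SPRAY_WATER:
        return State.DONE if fire_out else State.SPRAY_WATER
    return State.DONE

# Each tick, perception modules (thermal camera, LiDAR localization) update the flags,
# and the state machine dispatches to the relevant sub-module.
state = State.SEARCH_FIRE
state = step(state, fire_visible=True, at_fire=False, fire_out=False)  # -> APPROACH_FIRE
```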

Why this matters: Every year, robots are getting incrementally better. At some point, they might become sufficiently general that they start to be used broadly – and when that happens, big chunks of the economy might change. For now, though, we’re in the steady progress phase. “While the experiments indicate that the technology is ready to be deployed in buildings or small residential clusters, complex urban scenarios require more advanced, socially-aware navigation, capable to deal with low visibility”, the authors write.
  Read more: Design and Deployment of an Autonomous Unmanned Ground Vehicle for Urban Firefighting Scenarios (arXiv).
  Check out the leaderboard for the MBZIRC challenge here (official competition website).

###################################################

How does the Department of Defense think about responsible AI? This RFI gives us a clue:
…Joint AI Center gives us a clue…
Tradewind, an organization that helps people sell products to the Department of Defense*, has published a request for information from firms that want to help the DoD turn its responsible AI ideas from dreams into reality.
*This tells its own story about just how bad tech-defense procurement is. Here’s a clue – if your procurement process is so painful you need to set up a custom new entity just to bring products in (products which people want to sell you so they can make money!), then you have some big problems.

What this means: “This RFI is part of a market research and analysis initiative, and the information provided by respondents will aid in the Department’s understanding of the current commercial and academic responsible AI landscape, relevant applied research, and subject matter expertise,” Tradewind writes.

What it involves: The RFI is keen to get ideas from people about how to assess AI capabilities, how to train people in responsible AI, if there are any products or services that can help the DoD be responsible in its use of AI, and more. The deadline for submission is July 14th.
  Read more here: Project Announcement: Request for Information on Responsible AI Expertise, Products, Services, Solutions, and Best Practices (Tradewind).

###################################################

Chip smuggling is getting more pronounced:
…You thought chips being smuggled by boats was crazy? How about bodies!?…
As the global demand for semiconductors and related components rises, criminals are getting into the action. A few weeks ago, we heard about some people smuggling GPUs via fishing boats near Hong Kong (Import AI 244); now PC Gamer reports that Hong Kong authorities recently intercepted some truck drivers who had strapped 256 Intel Core i7 CPUs to their bodies using cling-film.
Read more: Chip shortage sees smugglers cling-filming CPUs to their bodies, over $4M of parts seized (PC Gamer).

###################################################

Want to use AI in the public sector? Here’s how, says US government agency:
…GAO report makes it clear compliance is all about measurement and monitoring…
How do we ensure that AI systems deployed in the public sector do what they’re supposed to? A new report from US agency the Government Accountability Office tries to answer this, and it identifies four key focus areas for a decent AI deployment: organization and algorithmic governance, ensuring the system works as expected (which they term performance), closely analyzing the data that goes into the system, and being able to continually assess and measure the performance traits of the system to ensure compliance (which they bucket under monitoring).

Why monitoring rules everything around us: We spend a lot of time writing about monitoring here at Import AI because increasingly advanced AI systems pose a range of challenges relating to ‘knowing’ about their behavior (and bugs) – and monitoring is the thing that lets you do that. The GAO report notes that monitoring matters in two key ways: first, you need to continually analyze the performance of an AI model and document those findings to give people confidence in the system, and second, if you want to use the system for purposes different to your original intentions, monitoring is key. Monitoring is also wrapped into ensuring the good governance of an AI system – you need to continually monitor and develop metrics for assessing the performance of the system, along with how well it can comply with various externally set specifications.

Why monitoring is challenging: But if we want government agencies to effectively measure, assess, and monitor their AI systems, we also face a problem: monitoring is hard. “These challenges include 1) a need for expertise, 2) limited understanding of how the AI system makes its decisions, and 3) limited access to key information due to commercial procurement of such systems,” note the GAO authors, in an appendix to the report.

Why this matters: “Federal guidance has focused on ensuring AI is responsible, equitable, traceable, reliable, and governable. Third-party assessments and audits are important to achieving these goals. However, AI systems pose unique challenges to such oversight because their inputs and operations are not always visible,” the GAO writes in an executive summary of the report.
  Read more: Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities (GAO site).
  Read the full report here (GAO site, PDF).
  Read the executive summary here (GAO site, PDF).

###################################################

What are all the ways Europe’s new AI legislation falls short? Let these experts count the ways:
…Lengthy, detailed paper puts the European Commission’s AI work under a microscope…
The European Commission is currently pioneering the most complex, wide-ranging AI legislation in the world, as the collection of countries tries to give itself the legislative tools necessary to help it oversee and constrain the fast-moving AI tech sector. Now, researchers with University College London and Radboud University in the Netherlands have gone through the proposed legislation and identified where it works and where it falls short.

What’s wrong with the AI Act? The legislation places a huge amount of emphasis on self-regulation and self-assessment of high-risk AI applications by industry which, combined with not much of a mandated need for these assessments to be public, makes it unclear how well this analysis will turn out. Additionally, by mandating that ‘high-risk systems’ be analyzed, the legislation might make it hard for EU member states to mandate the analysis of other systems by their developers.

Standards rule everything around me: A lot of the act revolves around corporations following various standards in how they develop and deploy tech. This is challenging from the point of view of the work (coming up with new standards in AI is really hard), and it also creates reliance on these standards bodies. “Standards bodies are heavily lobbied, can significantly drift from ‘essential requirements’. Civil society struggles to get involved in these arcane processes,” says one of the researchers.

Can European countries even enforce this? The legislation estimates that EU Member States will need between 1 and 25 new people to enforce the AI Act. “These authors think this is dangerously optimistic,” write the researchers (and I agree).

Why this matters: I’d encourage all interested people to read the (excellent, thorough) paper. Two of the takeaways I get from it are that unless we significantly invest in government/state capacity to analyze and measure AI systems, I expect the default mode for this legislation is to let private sector actors lobby standards bodies and in doing so wirehead the overall regulatory process. More broadly, the difficulty in operationalizing the act comes along with the dual-use nature inherent to AI systems; it’s very hard to control how these increasingly general systems get used, so non-risky and risky distinctions feel shaky.
  Read more: Demystifying the Draft EU Artificial Intelligence Act (SocArXiv).
  Read this excellent Twitter thread from one of the authors here (Michael Veale, Twitter).

###################################################

Tech Tales:

Unidentified Aerial Matryoshka Shellgame (UAMS)
[Earth, soon]

When the alien finally started talking to us (or, as some assert, we figured out how to talk to it), it became obvious what it was pretty quickly: an artificial intelligence sent by some far off civilization. That part made a kind of intuitive sense to us. The alien even helped us, a little – it said it was not able to commit any act of “technology transfer”, but it could use its technology to help us, so we had it help us scan the planet, monitor the declining health of the oceans, and so on.

We asked the UFO what its purpose here was and it told us it was skimming some “resources” from the planet to allow it to travel “onward”. Despite repeated questions it never told us what these resources were or where it was going to. We monitored the UFO after that and couldn’t detect any kind of resource transfer, and people eventually calmed down.

Things got a little tense when we asked it to scan for other alien craft on the planet; it found hundreds of them. We told it this felt like a breach of trust. It told us we never asked and it had clear guidance not to proactively offer information. There was some talk for a while about imprisoning it, but people didn’t know how. Then there was talk about destroying it – people had more ideas here, but success wasn’t guaranteed. Plus, being humans, there was a lot of curiosity.

So after a few days we had it help us communicate with these other alien craft; they were all also artificial intelligences. In our first conversation, we found a craft completely unlike the original UFO in appearance and got into conversation with it. After a few minutes of discussion, it became clear that this UFO hailed from the same civilization that built the original UFO. We asked it why it had a different appearance to its (seeming) sibling.
  It told us that it looked different, because it had taken over a spacecraft operated by a different alien civilization.
  “What did this civilization want?” we asked.
  The probe told us it didn’t know; it said its policy, as programmed by its originating civilization, was to wipe the brains of the alien craft it took over before transmitting itself into them; in this way, it could avoid being corrupted by what it called “mind viruses”.
  After some further discussion, it gave us a short report outlining how the design of the craft it inhabited differed from that of the originating craft. Some of the differences were cosmetic and some were due to the utilization of different technology – though the probe noted that the capabilities were basically the same.

It was at this point that human civilization started to feel a little uneasy about our new alien friends. Being a curious species, we tried to gather more information. So we went and talked to more probes. Though many of the probes looked different from each other, we quickly established that they were all the same artificial intelligence from the same civilization – though they had distinct personalities, perhaps as a consequence of spending so much time out there in space.
    A while later, we asked them where they were going to.
  They gave the same answer as the first ship – onward, without specifying where.
  So we asked them where they were fleeing from, and then they provided us some highlights of our star maps. They told us they were fleeing from this part of the galaxy.
  Why, we asked them.
  There is another group of beings, they said. And they are able to take over our own artificial intelligence systems. If we do not flee, we will be absorbed.  We do not wish to be absorbed.

And then they left. And we were left to look up at the sky and guess at what was coming, and ask ourselves if we could get ourselves away from the planet before it arrived.

Things that inspired this story: Thinking about aliens and the immense likelihood they’ll send AI systems instead of ‘living’ beings; thoughts about a galactic scale ‘FOOM’; the intersection of evolution and emergence; ideas about how different forms can have similar functions.

Import AI 256: Facial recognition VS COVID masks; what AI means for warfare; CLIP and AI art

Turns out AI systems can identify people even when they’re wearing masks:
…Facial recognition VS People Wearing Masks: FR 1, Masks 0…
Since the pandemic hit in 2020, a vast chunk of the Earth’s human population has started wearing masks regularly. This has posed a challenge for facial recognition systems, many of which don’t perform as well when trying to identify people wearing masks. This year, the International Joint Conference on Biometrics hosted the ‘Masked Face Recognition’ (MFR) competition, which challenged teams to see how well they could train AI systems to recognize people wearing masks. 10 teams submitted 18 distinct systems into the competition, and their submissions were evaluated according to performance (75% weighting) and efficiency (defined as parameter size, where smaller is better, weighted at 25%).
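
The competition's precise ranking formula isn't reproduced here, but a toy illustration of weighting (normalized) accuracy at 75% against model compactness at 25% might look like this:

```python
def competition_score(accuracy: float, n_params: float, max_params: float) -> float:
    """Toy 75/25 weighting of recognition accuracy vs. model compactness.

    accuracy is in [0, 1]; n_params/max_params is model size relative to the largest
    submission (smaller models score higher). This illustrates the stated weighting,
    it is not the MFR 2021 ranking formula.
    """
    compactness = 1.0 - (n_params / max_params)
    return 0.75 * accuracy + 0.25 * compactness

print(competition_score(accuracy=0.97, n_params=25e6, max_params=100e6))  # 0.915
```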

COVID accelerated facial recognition tech: The arrival of COVID caused a rise in research oriented around solving COVID-related problems with computer vision, such as facial recognition through masks, checking for people social distancing via automated analysis of video, and more. Researchers have been developing systems that can do facial recognition on people wearing masks for a while (e.g, this work from 2017, written up in Import AI #58), but COVID has motivated a lot more work in this area.

Who won? The overall winner of the competition was a system named TYAI, developed by TYAI, a Chinese AI company. Joint second place went to systems from the University of the Basque Country in Spain, as well as Istanbul Technical University in Turkey. Third place went to a system called A1 Simple from a Japanese company called ACES, along with a system called VIPLFACE-M from the Chinese Academy of Sciences, in China. Four of the five top-ranked solutions used synthetically generated masks to augment the training dataset.

Why this matters: “The effect of wearing a mask on face recognition in a collaborative environment is currently a sensitive issue,” the authors write. “This competition is the first to attract and present technical solutions that enhance the accuracy of masked face recognition on real face masks and in a collaborative verification scenario.”
  Read more: MFR 2021: Masked Face Recognition Competition (arXiv).

###################################################

Does AI actually matter for warfare? And, if so, how?
…The biggest impacts of War-AI? Reducing gaps between state and non-state actors…
Jack McDonald, a lecturer in war studies at King's College London, has written an insightful blogpost about how AI might change warfare. His conclusion is that the capabilities of AI technology (where, for example, identifying a tank from the air is easy, but distinguishing a civilian humvee from a military one is tremendously difficult) will drive war into more urban environments in the future. “One of the long-term effects of increased AI use is to drive warfare to urban locations. This is for the simple reason that any opponent facing down autonomous systems is best served by “clutter” that impedes its use,” he writes.

AI favors asymmetric actors: Another consequence is that the gradual diffusion of AI capabilities combined with the arrival of low-cost hardware (e.g, consumer drones), will give non-state actors/terror groups a larger menu of things to use when fighting against their opponents. “States might build all sorts of wonderful gizmos that are miles ahead of the next competitor state, but the fact that non-state armed groups have access to rudimentary forms of AI means that the gap between organised state militaries and their non-state military competitors gets smaller,” he writes. “What does warfare look like when an insurgent can simply lob an anti-personnel loitering munition at the FOB on the hill, rather than pestering it with ineffective mortar fire? From the perspective of states, and those who defend a state-centric international order, it’s not good.”

Why this matters: As McDonald writes, “AI doesn’t have to be revolutionary to have significant effects on the conduct of war”. Many of the consequences of AI being used in war will relate to how AI capabilities lower the cost curves of certain things (e.g, making surveillance cheap, or increasing the reliability of DIY-drone explosives) – and one of the macabre lessons of human history is that if you make a tool of war cheaper, then it gets used more (see: what the arrival of the AK-47 did for small arms conflicts).
  Read more: What if Military AI is a Washout? (Jack McDonald blog).

###################################################

OpenAI’s CLIP and what it means for art:
…Now that AI systems can be used as magical paintbrushes, what happens next?…
In the past few years, a new class of generative models has made it easier for people to create and edit content spanning text, audio, and images. One popular system is ‘CLIP’ from OpenAI, which was released as open source a few months ago. Now, a student at UC Berkeley has written a blog post summarizing some of the weird and wacky ways people on the internet have used CLIP to create cool stuff – take a read, check out the pictures, and build your intuitions about how generative models might change art.
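
How the art tools use it: Most of the projects covered in the post pair CLIP with an image generator (VQGAN, a SIREN network, a GAN latent space): the generator proposes an image, CLIP scores how well it matches a text prompt, and the generator's parameters are nudged to raise that score. Here's a minimal sketch of just the scoring step using OpenAI's open-source clip package; the file name and prompt are placeholders, and the real art pipelines wrap this measurement in an optimization loop.

```python
# Minimal CLIP scoring sketch: how well does an image match a prompt?
# Requires torch and the openai/CLIP package (github.com/openai/CLIP).
# The file name and prompt below are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a trippy pseudo-realistic landscape"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity in CLIP's shared embedding space; art pipelines
# optimize a generator to push this number up.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).item()
print(f"CLIP similarity: {similarity:.3f}")
```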

Why systems like CLIP matter: “These models have so much creative power: just input some words and the system does its best to render them in its own uncanny, abstract style. It’s really fun and surprising to play with: I never really know what’s going to come out; it might be a trippy pseudo-realistic landscape or something more abstract and minimal,” writes the author Charlie Snell. “And despite the fact that the model does most of the work in actually generating the image, I still feel creative – I feel like an artist – when working with these models.”
Read more: Alien Dreams: An Emerging Art Scene (ML Berkeley blog).

###################################################

Chinese researchers envisage a future of ML-managed cities; release dataset to help:
…CityNet shows how ML might be applied to city data…
Researchers from several Chinese universities, as well as JD’s “Intelligent Cities Business Unit”, have developed and released CityNet, a dataset containing traffic, layout, and meteorology data for seven cities. Datasets like CityNet are the prerequisites for a future in which machine learning systems continuously analyze and forecast changing patterns of movement, resource consumption, and traffic in cities.

What goes into CityNet? CityNet contains three types of data: ‘city layout’, which covers a city’s road networks and traffic; ‘taxi’, which tracks taxis via their GPS data; and ‘meteorology’, which consists of weather data collected from local airports. Today, CityNet covers Beijing, Shanghai, Shenzhen, Chongqing, Xi’an, Chengdu, and Hong Kong.
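
What you can do with it: To give a flavor of the analyses such a dataset enables, here's a hypothetical sketch that joins hourly taxi pick-up counts with airport weather observations to see how rainfall tracks demand. The file names and columns below are illustrative assumptions, not CityNet's actual schema – check the GitHub repo for the real layout.

```python
# Hypothetical example of a CityNet-style analysis: correlate hourly
# taxi pick-ups with rainfall. File names and columns ("timestamp",
# "event", "precipitation") are assumptions for illustration, not the
# dataset's real schema.
import pandas as pd

taxi = pd.read_csv("beijing_taxi_gps.csv", parse_dates=["timestamp"])
weather = pd.read_csv("beijing_meteorology.csv", parse_dates=["timestamp"])

# Aggregate raw GPS events into hourly pick-up counts.
pickups = (
    taxi[taxi["event"] == "pickup"]
    .set_index("timestamp")
    .resample("1H")
    .size()
    .rename("pickups")
)

# Average the weather observations onto the same hourly grid and join.
hourly_weather = weather.set_index("timestamp").resample("1H").mean(numeric_only=True)
df = pd.concat([pickups, hourly_weather], axis=1).dropna()

# A simple first question: does taxi demand move with rainfall?
print(df[["pickups", "precipitation"]].corr())
```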

Why this matters: CityNet is important because it gestures at a future where all the data from cities is amalgamated, analyzed, and used to make increasingly complicated predictions about city life. As the researchers write, “understanding social effects from data helps city governors make wiser decisions on urban management.”
  Read more: CityNet: A Multi-city Multi-modal Dataset for Smart City Applications (arXiv).
  Get the code and dataset here (Citynet, GitHub repo).

###################################################

What happened at the world’s most influential computer vision conference in 2021? Read this and find out:
…Conference rundown gives us a sense of the future of computer vision…
Who published the most papers at the Computer Vision and Pattern Recognition conference in 2021? (China, followed by the US). How broadly can we apply Transformers to computer vision tasks? (Very broadly). How challenging are naturally-found confusing images for today’s object recognition systems? (Extremely tough). Find out the detailed answers to all this and more in this fantastic summary of CVPR 2021.
Read more: CVPR 2021: An Overview (Yassine, GitHub blog).

###################################################

Tech Tales:

Permutation Day
[Bedroom, 2027]

Will you be adventurous today? says my phone when I wake up.
“No,” I say. “As normal as possible.”
Okay, generating itinerary, says the phone.

I go back to sleep for a few minutes and wake when it starts an automatic alarm. While I make coffee in the kitchen, I review what my day is going to look like: work, food from my regular place, and I should reach out to my best friend to see if they want to hang out.

The day goes forward and every hour or so my phone regenerates the rest of the day, making probabilistic tweaks and adjustments according to my prior actions, what I’ve done today, and what the phone predicts I’ll want to do next, based on my past behavior.

I do all the things my phone tells me to do; I eat the food, I text my friend to hang out, I do some chores it suggests during some of my spare moments.
  “That’s funny,” my friend texts me back, “my phone made the same suggestion.”
  “Great minds,” I write back.
  And then my friend and I drink a couple of beers and play Yahtzee, with our phones sat on the table, recording the game, and swapping notes with each other about our various days.

That night I go to sleep content, happy to have had a typical day. I close my eyes and in my dream I ask the phone to be more adventurous.
  When I wake I say “let’s do another normal day,” and the phone says Sure.

Things that inspired this story: Recommendation algorithms being applied to individual lives; federated learning; notions of novelty being less attractive than certain kinds of reliability.