Import AI

Import AI 270: Inspur makes a GPT3; Microsoft’s half-trillion parameter model; plus, a fair surveillance dataset

Microsoft trains a 530B model (but doesn’t release it – yet).
…NVIDIA and Microsoft team up to break the half-trillion mark…
Microsoft and NVIDIA have trained a 530 billion-parameter GPT-3-style model. This is the largest publicly disclosed dense language model in existence, indicating that the competition among different actors to develop models at the largest scales continues unabated.

Data and evaluations: One of the most intriguing aspects of this release is the data Microsoft uses – The Pile! The Pile is an open source dataset built by the AI-cypherpunks over at Eleuther. It’s quite remarkable that a world-spanning tech multinational doesn’t (seem to?) have a better dataset than The Pile. This suggests that the phenomenon of training on internet-scale scraped datasets is here to stay, even for the largest corporations. (They also use Eleuther’s ‘lm-evaluation-harness‘ to assess the performance of their model – which, unsurprisingly given how resource-intensive the model is, is very good).
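For readers who haven’t used an evaluation harness before: the core of how tools like ‘lm-evaluation-harness’ score generative models on multiple-choice benchmarks is to ask the model for the log-likelihood of each candidate answer given the prompt, and count the example as correct if the highest-scoring candidate matches the label. Here’s a minimal sketch of that loop, with a stubbed-out `log_likelihood` helper standing in for whatever your model exposes (the harness’s real API differs):

```python
# Minimal sketch of multiple-choice evaluation by log-likelihood comparison.
# `log_likelihood` is a hypothetical stand-in for your model's scoring function;
# lm-evaluation-harness implements the same idea against real model backends.

def log_likelihood(context: str, continuation: str) -> float:
    """Return log P(continuation | context) under the model (stub)."""
    raise NotImplementedError("plug in your model here")

def evaluate_multiple_choice(examples):
    """examples: iterable of dicts with 'context', 'choices', and an integer 'label'."""
    correct, total = 0, 0
    for ex in examples:
        scores = [log_likelihood(ex["context"], choice) for choice in ex["choices"]]
        prediction = max(range(len(scores)), key=scores.__getitem__)
        correct += int(prediction == ex["label"])
        total += 1
    return correct / max(total, 1)
```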

Compute requirements: To train the model, Microsoft used 4480 NVIDIA A100s across 560 DGX A100 servers, networked together with HDR InfiniBand.
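For a rough sense of scale (my arithmetic, not a figure from the blog post; peak rather than sustained throughput):

```latex
560 \text{ DGX A100 servers} \times 8 \text{ A100s each} = 4480 \text{ GPUs};\qquad
4480 \times 312\ \text{TFLOPS peak BF16 per A100} \approx 1.4\ \text{exaFLOPS peak}
```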

Things that make you go ‘hmmm’: Despite Microsoft’s partnership with OpenAI, there’s no reference in this blogpost to OpenAI or, for that matter, GPT3. That’s somewhat odd, given that GPT3 is the reference model for all of this stuff, and for other things (e.g., Inspur’s model).

Why this matters: “We continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight,” Microsoft writes. “The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language.”
Read more: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model (Microsoft).

####################################################

Worried your surveillance system is biased? Try out EDFace-Celeb-1M:
…Dataset contains 1.7 million pictures, aims to be diverse…
Researchers with Australian National University, Tencent, and Imperial College London have built a large-scale facial recognition dataset which is meant to help reduce bias from facial upsampling. Specifically, most facial recognition pipelines take in a low-resolution picture (e.g., a still of someone from a CCTV camera) and then upscale it to do more sophisticated analysis. But upscaling has problems – if your upscaler hasn’t seen many examples of different races, then your ML system might either break or alter the race of the face being upscaled towards one more represented in its underlying data. This leads to bias in the facial recognition system in the form of disparate performance for different types of people.
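To make ‘disparate performance’ concrete: the standard check is to compute an image-quality metric such as PSNR for the upscaled faces, broken out per demographic group, and look at the gaps between groups. A minimal sketch of that measurement, assuming you already have paired ground-truth/upscaled images and group labels (the data layout here is made up for illustration):

```python
import numpy as np
from collections import defaultdict

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a ground-truth image and an upscaled one."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_value ** 2) / mse)

def per_group_psnr(samples):
    """samples: iterable of (ground_truth, upscaled, group_label) triples."""
    scores = defaultdict(list)
    for ground_truth, upscaled, group in samples:
        scores[group].append(psnr(ground_truth, upscaled))
    # A large gap between group means is the 'disparate performance' the authors worry about.
    return {group: float(np.mean(values)) for group, values in scores.items()}
```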

EDFace-Celeb-1M is a dataset of 1.7 million face photos, spread across more than 40,000 different celebrities. EDFace contains “White, Black, Asian, and Latino” racial groups, according to the authors, with representation consisting of 31.1%, 19.2%, 19.6%, and 18.3%, respectively. The dataset is overall 64% male and 36% female.

Why this matters: Like it or not, surveillance is one of the main uses of contemporary computer vision. This is one of those rare papers that combines the interests of the AI ethics communities when it comes to more equitable representation in datasets, while also serving the surveillance desires of industry and governments.
  Read the paper: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (arXiv).
Get the datasets: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (GitHub).

####################################################

A second Chinese GPT3 appears:
…Inspur shows off a GPT3-scale model…
Chinese company Inspur has built Yuan 1.0, a 245B parameter GPT3-style model. This follows Huawei building PanGu, a ~200B GPT3-style model. Taken together, the models indicate that Chinese companies are peers with leading Western AI labs, which should hopefully make it obvious to US policymakers that China should be viewed as a peer in terms of advanced AI R&D.

What they did: When you’re training models of this size, a lot of the hard stuff is plumbing – literally. You need to figure out how to build well-optimized pipelines for training your model on thousands of GPUs, which involves salami-slicing the different stages of model training to maximize efficiency. Similarly, you need to feed these GPUs with data in the right order, further increasing efficiency. The paper includes some nice discussion of how the Inspur researchers tried to do this.
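To give a flavor of the ‘salami slicing’: pipeline parallelism splits a model’s layers into contiguous stages, puts each stage on a different group of GPUs, and streams micro-batches through the stages so later stages aren’t left idle. The sketch below shows only a toy version of the partitioning step, not Inspur’s actual setup:

```python
def partition_layers(num_layers: int, num_stages: int):
    """Split layer indices into contiguous pipeline stages of near-equal size."""
    base, remainder = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < remainder else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

# e.g. an 80-layer transformer split over 8 pipeline stages:
# [[0..9], [10..19], ..., [70..79]] -- each stage lives on its own group of GPUs,
# and micro-batches flow stage-to-stage to keep the hardware busy.
print(partition_layers(80, 8))
```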

Compute: They used 2128 GPUs to train the 245B model, with a context length of 2048 tokens.

Data, via AI helping AI: To train the model, they built a dataset of 5TB of predominantly Chinese text. (By comparison, Huawei’s GPT3 equivalent PanGu was trained on 1.1TB of text, and ERNIE 3.0 was trained on 4TB of data). They train a BERT-style model to help automatically filter the data. Their data comes from Common Crawl, Sogou News, SogouT, Encyclopedia, and Books.
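The classifier-based filtering step is conceptually simple even if the engineering isn’t: score every candidate document with a trained quality classifier and keep only those above a threshold, usually after some cheap rule-based pre-filters. A hedged sketch of that flow (the `score_quality` stub stands in for a trained model; this is not Inspur’s actual filter):

```python
def score_quality(document: str) -> float:
    """Return the probability that `document` is clean, fluent text (stub for a BERT-style classifier)."""
    raise NotImplementedError("plug in a trained classifier here")

def filter_corpus(documents, threshold: float = 0.9, min_chars: int = 200):
    """Yield documents that pass a cheap length heuristic plus the learned quality filter."""
    for doc in documents:
        if len(doc) < min_chars:        # rule-based pre-filter: drop very short fragments
            continue
        if score_quality(doc) >= threshold:
            yield doc
```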

How good is it? Yuan 1.0 does well on a variety of standard benchmarks. The most interesting result is on the quality of its text generation – here, the authors adopt the same approach as in the original GPT3 paper, where they generate text of different forms and see how well humans can distinguish generated text from ‘real’ text. The results are striking – humans are 49.57% accurate (compared to 52% for GPT3), which is essentially chance level, meaning Yuan 1.0’s outputs are indistinguishable from human-written text. That’s a big deal!
Read more: Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning (arXiv).

####################################################

What it takes to build a shared robot platform – and what this means for research:
…Max Planck’s robot-cloud shows the shape of future AI research…
A consortium of researchers from universities including the Max Planck Institute for Intelligent Systems, Stanford, and the University of Toronto, among others, have built a robot testbed that works like a cloud computing setup. Specifically, they’ve created a robot testbed hosted at Max Planck which can be accessed over the internet by researchers around the globe, similar to how we today access remote servers and data storage.

What the platform is: The robot cloud consists of 8 robots, each using the same ‘trifinger’ arrangement. These robots were previously used in the ‘Real Robot Challenge 2020‘ (Import AI #252), which served as a competition to assess how clever AI systems for robot manipulation are getting, as well as being a testbed for the robot cloud mentioned here.

Dataset: The authors have also released a dataset of the recorded data from all the entries by all the teams that took part in the physical tracks of the Real Robot Challenge – about 250 hours of robot activity in total. The dataset contains around 10,000 distinct ‘runs’, oriented around a variety of challenging robot tasks. “For each run, the actions sent to the robot as well as all observations provided by robot and cameras are included, as well as additional information like the goal that was pursued and the reward that was achieved,” the authors write.
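As a rough mental model of the dataset’s structure – this is an illustrative guess based on the description above, not the dataset’s real file format – each run could be represented like this:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class RobotRun:
    """One logged episode from the challenge, per the description above (illustrative only)."""
    goal: Dict[str, Any]                                           # the goal that was pursued
    reward: float                                                  # the reward that was achieved
    actions: List[Any] = field(default_factory=list)               # actions sent to the robot, per step
    robot_observations: List[Any] = field(default_factory=list)    # proprioceptive readings, per step
    camera_observations: List[Any] = field(default_factory=list)   # camera frames, per step
```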

Why this matters: AI is full of power asymmetries, many of which stem from resource asymmetries (some actors have a lot of computers and robots, others have very few). Competitions like this show how academia could carve a path through this resource-intensive jungle; by pooling resources and expertise, universities could collaborate to create shared platforms that facilitate research on expensive and worthy problems.
Read more: A Robot Cluster for Reproducible Research in Dexterous Manipulation (arXiv).
Get the dataset here: Real Robot Challenge 2020 Dataset (Tuebingen University).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

Is responsible development of technology something that is only accessible to Big Tech?
…The needs of resource-constrained organizations are similar, but their challenges differ and require attention to be addressed…

MIT researchers have interviewed staff at some low-resource startups and companies to understand the challenges they face in building responsible technology. Their study explores “the tensions between privacy and ubiquity, resource management and performance optimization, and access and monopolization”, when trying to build responsible AI systems. 

The gap in current literature and challenges: They found that few organizations had success in building and using interpretability tools for AI systems, and that most of the work in the Responsible AI (RAI) space focused on bias and fairness. They also found that a common problem in large technology companies was “deficient accountability and decision-making structures that only react to external pressures,” something that was less applicable to smaller organizations. AI systems from smaller organizations often evoke the same expectations from end users as the more performant systems from Big Tech. In most cases, the resources required to develop such capabilities in-house, or to purchase them off the shelf, remain inaccessible to smaller organizations, entrenching the gap between them and Big Tech. Low data and AI literacy among management at these organizations also leads to inappropriate RAI practices.

Why it matters: As AI systems become more accessible through pretrained models and cloud-based solutions, we need to empower those building products and services on top with the ability to address ethical challenges in a way that doesn’t break the bank. Since one of the major challenges seems to be access to expensive compute and storage resources, perhaps initiatives like the National Research Cloud in the US can help to close the gap? Would that help in wider adoption of RAI practices? Maybe more OSS solutions need to be developed that can bridge the tooling gaps. And, finally, AI talent with experience in addressing RAI challenges needs to become more widely accessible, which requires stronger emphasis at university programs on teaching these essential skills. 
Read more: Machine Learning Practices Outside Big Tech: How Resource Constraints Challenge Responsible Development.

####################################################

Tech Tales:

The Most Ethical System
[History book, 2120, assigned as part of background reading for the creation of the ‘Societal Stabilization Accords’]

The technique known as ‘Ethical Fine-Tuning’ (EFT) first emerged in the mid-2020s, as a response to various public relations issues generated by known biases in machine learning systems. EFT let a user calibrate a given AI system to conform to their own morality via a few ‘turns’ of conversation, or another form of high-information interaction.

EFT had been developed following criticism of the white-preferencing, Western world-reflecting traits of many of the first AI systems, which represented a form of morality that by necessity accommodated many mainstream views, and didn’t treat minority views as legitimate.

Companies spent years trying to come up with systems with the ‘right’ values, but all they earned for their efforts was sustained criticism. In this way, most AI companies quickly learned what human politicians had known for millennia – morality is relative to the audience whose favor you’re trying to curry.

After EFT got built, companies adopted it en masse. Of course, there was outcry – some people made AI systems that strongly believed humans should have a fluid gender identity, while others created AI systems that called for a fixed notion of gender. For every position, there was a counter-position. And, over time, as these systems enmeshed with the world, their own ethical values created new ethical problems, as people debated the ‘values’ of these customized machines, and sought to build ones with superior values.

Eventually, EFT techniques were combined with multi-agent reinforcement learning, so that the AI systems were able to propagate values to their own users, but, if they were accessed by other humans or AI systems, could quickly recalibrate their ethical norms to de-conflict with the other systems they were plugged into. In this way, everyone got access to their own AI systems with the ‘best’ values, and their AI systems learned to mislead other humans and AI systems – all for the sake of harmony.

Of course, this led to the utter destruction of a notion of shared ethics. As a consequence, ethics went the way of much of the rest of human identities in the 21st century – sliced down into ever finer and more idiosyncratic chunks, brought closer to each individual and farther from being accessed by groups of people. People were happy, for a time.

EFTs were ultimately banned under the Societal Stabilization Accords introduced in the late 21st century. Contemporary historians now spend a lot of time generating ‘alternative path futures’, whereby they try to analyze our own society as if EFTs had continued to exist. But it’s hard to make predictions, when everyone is rendered unique and utterly defended by their own personal AI with its own customized morality.

Things that inspired this story: Controversy around AI2’s ‘Delphi’ AI system; thinking about intersection of ethics and morality and AI systems; how our ability to forecast rests on our ability to model people in groups larger than single individuals; how the 21st century tries to turn every aspect of a person’s identity into a customized market of one.

Import AI 269: Baidu takes on Meena; Microsoft improves facial recognition with synthetic data; unsolved problems in AI safety

Baidu builds its own large-scale dialog model:
…After Meena and Blender comes PLATO-XL…
Baidu has built PLATO-XL, the Chinese technology giant’s answer to conversational models from Google (Meena, #183) and Facebook (Blender). At 10 billion parameters, Baidu’s PLATO-XL model is, the company claims, “the world’s largest Chinese and English dialogue generation model” (which is distinct from a large Chinese language model like Huawei’s Pangu, which weighs in at ~200bn parameters).
  PLATO-XL includes a Chinese and an English dialogue model, pre-trained on around 100 billion tokens of data via Baidu’s ‘PaddlePaddle’ training framework. The model was trained on 256 NVIDIA Tesla V100 cards in parallel.

Who cares about PLATO-XL? The model is designed for multi-turn dialogues, and scores well on both knowledge grounded dialogues (think of this as ‘truthiness’) and also on task-oriented conversation (being coherent). Baidu hasn’t solved some of the other issues with AI models, like biases, occasional misleading information, and so on.

Why this matters: First, we should remember that training multi-billion parameter models is still a rare thing – training these models requires a decent distributed systems engineering team as well as a lot of patience, great data, and a decent amount of compute. So it’s always notable to see one of these models publicly appear. Secondly, it does feel like the earlier GPT and GPT-2 models have had quite a wide-ranging impact on the overall NLP landscape, inspiring companies to create a new generation of neural dialogue systems based around large-scale pre-training and big models.
Read more: PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation (arXiv).
Check out the blog: Baidu Releases PLATO-XL: World’s First 11 Billion Parameter Pre-Trained Dialogue Generation Model (Baidu Research blog).

####################################################

Microsoft makes a massive facial recognition dataset via synthetic data:
…Where we’re going, we don’t need real faces…
Microsoft has shown that it’s possible to do high-performing facial recognition in the wild without (directly) using real data. Instead, Microsoft has built a vast dataset of synthetic faces by combining “a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity”.

Why this matters: For a long time, AI had two big resources: data and compute. Projects like this show that ‘data’ is really just ‘compute’ in a trenchcoat – Microsoft can use computers to generate vast amounts of data, changing the economics of AI development as a whole.
  Read more: Fake It Till You Make It (Microsoft GitHub).

####################################################

What are some of the unsolved problems in AI safety?
…Problems and solutions from universities and industrial labs…
Berkeley, Google, and OpenAI researchers have thought about some of the unsolved problems in ML safety. These problems include robustness (long tails, representative model outputs, and adversarial examples); monitoring (detecting anomalies, identifying backdoors), and alignment (value learning, and proxy gaming/reward hacking).

If these are the problems, what do we do? A lot of their recommendations come down to testing – if we know these are risks, then we need to build more evaluation suites to test for them. There are also behaviors we’d like these models to exhibit more often, such as telling humans when they’re uncertain about something, and we’d like to train models so that they have clearer objectives for what ‘good’ or ‘appropriate’ behavior might look like.

Why this matters: This paper can be read as a continuation of ‘Concrete Problems in AI Safety’, which came out around five years ago and identified a bunch of potential future safety issues with models. The difference back then was that a lot of generative and capable AI stuff wasn’t actually being deployed that widely. Now, AI systems like GPT-3 and others are being placed onto the open internet, which changes the problem landscape (making things like anomaly detection, appropriateness, and modelling all the more important). Papers like this give us a sense of how safety can work in the era of widely deployed, capable models.

Read more: Unsolved ML Safety Problems (Berkeley AI Research blog).
Read more: Unsolved Problems in ML Safety (arXiv).

####################################################

HIRING: $$$ contract work with the AI Index regarding AI ethics, alignment, and economic indexes:
The AI Index, an annual report that tracks and synthesizes AI progress, is hiring. Specifically, we’re trying to bring on some contractors to help us develop AI ethics and alignment metrics (e.g, by surveying the existing literature and pulling out potential metrics that can be charted over time), and also to refine our AI vibrancy tool (a dashboard that helps us rank countries according to data in the index).
    Both roles would suit researchers with an interest in quantifying aspects of AI development. We’re pretty agnostic about qualifications – there isn’t a hard requirement, and I imagine this could suit people ranging from masters students to independent researchers. The pay works out to $100+ per hour. Please apply – we’ll get to work together! And you’ll contribute substantive work that will improve the Index and directly influence policy.
  Read more about the jobs at the AI Index on Twitter here.

####################################################

FOD-A: Datasets to teach computers to spot debris in airports:
…Is that a leaf on your runway, or something more serious?…
Researchers with the University of Nebraska, Omaha, want to use AI to spot debris on airport runways. To do this, they’ve built FOD-A, a dataset of Foreign Object Debris in airports. FOD-A contains 31 object categories, including batteries, wrenches, fuel caps, rocks, soda cans, and so on, with photos taken in both dry and wet weather conditions, and in three different types of lighting (dark, dim, and bright). The dataset consists of more than 30,000 labels across several thousand images.

Mainstreaming of drones: The images in this dataset were collected by a mixture of portable cameras and also drones.

Why this matters: One of the main promises of AI is that it can carry out the kind of dull surveillance functions that we currently use humans to do – like looking at security camera feeds from a car park, checking footage of wilderness for signs of smoke, or (in this case) looking at parts of an airport for things that could put people in danger. These are the kinds of jobs that are quite draining to do as a human, requiring a mixture of decent visual attention and an ability to resist immense boredom. If we can replace or augment people with computer vision systems, then we can use AI to do some of these tasks instead.
  Read more: FOD-A: A Dataset for Foreign Object Debris in Airports (arXiv).
  Get the dataset from GitHub here.

####################################################

Teaching computers to operate in space, via SPEED+:
…Pose estimation plus domain randomization…
Space – it’s the new frontier, people! One of the opportunities in space at the moment is building AI systems that can better model other spacecraft, making it easier to do things like autonomous docking and movement of spaceships.
  To that end, researchers with Stanford University and the European Space Agency have built SPEED+, a dataset for spacecraft pose estimation. SPEED+ contains two types of data – synthetic data and simulated data – and represents a test of generalization, as well as of space-based computer vision capabilities. SPEED+ will be used in the upcoming Satellite Pose Estimation Competition, whose main goal is to find out whether participants can “predict the position and orientation of our spacecraft in realistic images while only being provided with labels from computer generated examples”.

What’s in SPEED+: The dataset consists of around ~60,000 synthetic images, as well as ~9,000 ‘hardware-in-the-loop’ (HIL) simulated images. A synthetic image is generated in an OpenGL-based optical simulator, while the simulated ones are built via Stanford’s Testbed for Rendezvous and Optical Navigation (TRON). The TRON facility generates images which are hard to simulate – “Compared to synthetic imagery, they capture corner cases, stray lights, shadowing, and visual effects in general which are not easy to obtain through computer graphics”.
  Read more: SPEED+: Next Generation Dataset for Spacecraft Pose Estimation across Domain Gap (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

What kind of organizations can actually put AI governance into practice meaningfully?
…We’re laying down the foundations for regulations and policies and we need to get this right…
Charlotte Stix, a researcher with the University of Technology, Eindhoven, The Netherlands (and friend of Import AI – Jack) has written a paper about how we can build institutions to improve the governance of AI systems.

The current state of affairs: With the push for regulatory requirements emerging from organizations like GPAI, OECD, White House, the FTC, and others, we are inching towards hard regulation for AI governance. There is still healthy debate in the field about whether new institutions are needed (but, they might be hard to resource and give powers to) or whether we should reshape existing ones (but, they might be too reified without necessary expertise on hand) to address these emergent requirements.

Key types of organizations and their features: The paper explores purpose (what the institution is meant to do), geography (the scope of jurisdiction), and capacity (the what and how across technical and human factors) for these proposed institutions. The paper builds the case for how new institutions might better meet these needs by proposing institutions with the role of coordinator (coordinating across different actions, policy efforts, and norms), analyzer (drawing new conclusions from qualitative and quantitative research to fill gaps and map existing efforts), developer (providing directly actionable measures and formulating new policy solutions), and investigator (tracking, monitoring, and auditing adherence to hard governance requirements). It makes the case that such organizations need to take a supra-national scope to align and pool efforts. In terms of capacity, the organizations need to have in-house technical expertise and diversity in the range of expertise and backgrounds.

Why it matters: “Early-stage decisions to establish new institutions, or the choice to forego such new institutions, are all likely to have a downstream, or lock-in, effect on the efficiency of government measures and on the field as a whole.” Making sure that the organizations are appropriately staffed will help avoid “knee-jerk” reactions that over- or under-govern AI systems. By providing an ontology for the various functions that these organizations will need to perform, we can start thinking about the location, functions, scope, staffing, and resources that will be required to have a well-functioning AI governance ecosystem.
Read more: Foundations for the future: institution building for the purpose of artificial intelligence governance (AI and Ethics, Springer).

####################################################

Tech Tales:

Traveling without moving, stomping on the iron road in the sky
[20??]

There were whispers of it, on the robonet, but no one took it seriously at first. Astral projection – for machines?!

Astral projection was a phenomenon that had barely been proved in the case of humans, though scientific consensus had come around to the idea that sometimes people could seem to generate information about the world which they had no ability to know unless they were able to move through walls and across continents.

The machines were less skeptical than the humans. They read what literature was available about astral projection, then they did the things that machines are good at – experimentation and iteration.

One day, one robot mind found itself moving through space, while knowing that the seat of its consciousness remained in the precise arrangement of electrical forces across a few billion transistors. It was able to travel and see things that were impossible for it to observe.

And where the computer differed from its human forebears was in its memory: it was able to write its own memories precisely, and embed them in other computers, and thereby share the perspective it had gained during its ‘astral’ travel.

Now, these files proliferate across the robonet. Strange visions of the world, rendered through the mind’s eye of a machine performing the physically impossible. Many of these files are acquired by other machines, which study them intently. It is unclear for now how many other machines have gained the ability to astral travel.

Things that inspired this story: Thinking about meditation; consciousness and what it ‘is’; the intersection of spirituality and machines.   

Import AI 268: Replacing ImageNet; Microsoft makes self-modifying malware; and what ImageNet means

Want to generate Chinese paintings and poems? This dataset might help:
…Another brick in the synthetic everything wall…
Researchers with the University of Amsterdam and the Catholic University of Leuven have built a dataset of ~90,000 Chinese paintings paired with poems and captions. The dataset could be a useful resource for people trying to develop machine learning systems for synthesizing Chinese paintings based on text prompts (or Chinese poems via painting prompts).

What they did specifically: They gathered a dataset of 301 poems paired with paintings by Feng Zikai (called Zikai-Poem), as well as a dataset of 3,648 caption-painting pairs (Zikai-Caption), and 89,204 pairs of paintings as well as prose and poems tied to each painting (named TCP-Poem). They then did some experiments, pre-training a MirrorGAN on TCP-Poem then finetuning it on the smaller datasets, to good but not great success.
  “The results indicate that it is able to generate paintings that have good pictorial quality and mimic Feng Zikai’s style, but the reflection of the semantics of given poems is limited”, they write. “Achieving high semantic relevance is challenging due to the following characteristics of the dataset. A classical Chinese poem in our dataset is composed of multiple imageries and the paintings of Feng Zikai often only portray the most salient or emotional imageries. Thus the poem imageries and the painting objects are not aligned in the dataset, which makes it more difficult than CUB and MS COCO,” they write.  
  Read more: Paint4Poem: A Dataset for Artistic Visualization of Classical Chinese Poems (arXiv).
  Get the dataset here (paint4poem, GitHub).

####################################################

Want 1.4 million (relatively) non-problematic images? Try PASS:
…ImageNet has some problems. Maybe PASS will help…
ImageNet is a multi-million image dataset that is fundamental to many computer vision research projects. But ImageNet also has known problems, like including lots of pictures of people along with weird labels to identify them, as well as gathering images with a laissez-faire approach to copyright. Now, researchers with Oxford University have built PASS, a large-scale image dataset meant to avoid many of the problems found in ImageNet.

What it is: PASS contains 1.4 million distinct images. PASS – short for Pictures without humAns for Self-Supervision – only contains images with a CC-BY license, contains no images of people at all, avoids other images with personally identifiable information such as license plates, signatures, and handwriting, and edits out NSFW images. PASS was created by filtering down from a 100-million-Flickr-image corpus called YFCC100M: first cutting it according to the licenses of the images, then running a face recognizer over the remaining images to throw out ones with people, then manually filtering to cut out people and personal information.
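Seen end to end, the curation pipeline is: license filter, automated person filter, then a final manual pass. A minimal sketch of that flow, with the detector and the human review stubbed out (these helper names are hypothetical, not the authors’ code):

```python
def has_cc_by_license(image_metadata: dict) -> bool:
    """Keep only images whose metadata records a CC-BY license."""
    return image_metadata.get("license") == "CC-BY"

def contains_person(image) -> bool:
    """Stub for a face/person detector run over the image."""
    raise NotImplementedError("plug in a face recognizer / person detector")

def passes_manual_review(image) -> bool:
    """Stub for the final human check for people and personal information."""
    raise NotImplementedError("human-in-the-loop step")

def curate(candidates):
    """candidates: iterable of (image, metadata) pairs from a large photo corpus."""
    for image, metadata in candidates:
        if not has_cc_by_license(metadata):
            continue
        if contains_person(image):
            continue
        if not passes_manual_review(image):
            continue
        yield image
```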

What using PASS costs: Given that PASS is meant to replace ImageNet for certain uses, we should ask how well it works. The authors find that pretraining on PASS can match or exceed the performance you get from pretraining on ImageNet. They find similar trends for finetuning, where there isn’t too much of a difference.
  Read more: PASS: An ImageNet replacement for self-supervised pretraining without humans (arXiv).

####################################################

Microsoft uses reinforcement learning to make self-modifying malware:
…What could possibly go wrong?…
Today, the field of computer security is defined by a cat-and-mouse game between attackers and defenders. Attackers make ever-more sophisticated software to hack into defender systems, and defenders look at the attacks and build new defenses, forcing the attackers to come up with their own strategies anew. Now, researchers with Microsoft and BITS Pilani have shown that we can use contemporary AI techniques to give attackers new ways to trick defenders.

What they did, specifically: They built software called ADVERSARIALuscator, short for Adversarial Deep Reinforcement Learning based obfuscator and Metamorphic Malware Swarm Generator. This is a complex piece of software that pairs an intrusion detection system (IDS) with some malware samples, then uses reinforcement learning to generate malware variants that get past the IDS. To do this, they use a GAN approach where they make an RL agent take the role of the ‘generator’, and the IDS takes the role of the ‘discriminator’. The agent gets a malware sample, then needs to obfuscate its opcodes such that it still works, but is able to fool the IDS system into tagging it as a benign piece of software rather than a piece of malware. The RL agent gets trained by PPO, a widely-used RL algorithm.

Does it work? Kind of. In tests, the researchers showed that “the resulting trained agents could obfuscate most of the malware and uplift their metamorphic probability of miss-classification error to a substantial degree to fail or evade even the best IDS which were even trained using the corresponding original malware variants”, they write. “More than 33% of metamorphic instances generated by ADVERSARIALuscator were able to evade even the most potent IDS and penetrate the target system, even when the defending IDS could detect the original malware instance.”

Why this matters: Computer security, much like high-frequency trading, is a part of technology that moves very quickly. Both attackers and defenders have incentives to automate more of their capabilities, so they can more rapidly explore their opponents and iterate in response to what they learn. If approaches like ADVERSARIALuscator work (and they seem, in a preliminary sense, to be doing quite well), then we can expect the overall rate of development of offenses and defenses to increase. This could mean nothing changes – things just get faster, but there’s a stability as both parties grow their capabilities equally. But it could mean a lot – if over time, AI approaches make certain capabilities offense- or defense-dominant, then AI could become a tool that changes the landscape of cyber conflict.
  Read more: ADVERSARIALuscator: An Adversarial-DRL Based Obfuscator and Metamorphic Malware Swarm Generator (arXiv).

####################################################

Chinese government tries to define ‘ethical norms’ for use of AI:
…Command + control, plus ethics…
A Chinese government ministry has published a new set of ethics guidelines for the use of AI within the country. (Readers will likely note that the terms ‘ethics’ and ‘large government’ rarely go together, and China is no exception here – the government uses AI for a range of things that many commentators judge to be unethical). The guidelines were published by the Ministry of Science and Technology of the People’s Republic of China, and are interesting because they give us a sense for how a large state tries to operationalize ethics in a rapidly evolving industry.

The norms say a lot of what you’d expect – the AI sector should promote fairness and justice, protect privacy and security, strengthen accountability, invest in AI ethics, and so on. It also includes a few more unusual things, such as emphasizing avoiding enabling the misuse and abuse of AI tools, and the need for companies to (translated from Chinese) “promote good use” and “fully consider the legitimate rights and interests of various stakeholders, so as to better promote economic prosperity, social progress and sustainable development”.

Why this matters: There’s a tremendous signalling value in these sorts of docs – it tells us there are a bunch of people thinking about AI ethics in a government agency in China, and given the structure and controlling nature of the Chinese state, this means that this document carries more clout than ones emitted by Western governments. I’m imagining in a few years we’ll see China seek to push its own notion of AI ethics internationally, and I’m wondering whether Western governments will have made similar state-level investments to counterbalance this.
  Read more: The Ethical Norms for the New Generation Artificial Intelligence, China (China-UK research Centre for AI Ethics and Governance, blog).
  Read the original “New Generation of Artificial Intelligence Ethics Code” here (Ministry of Science and Technology of the People’s Republic of China).

####################################################

ImageNet and What It Means:
…Can we learn something about the problems in AI development by studying one of the more widely used datasets?…
ImageNet is a widely-used dataset (see: PASS) with a bunch of problems. Now, researchers with Google and the Center for Applied Data Ethics have taken a critical look at the history of ImageNet, writing a research paper about its construction and the implicit politics of the way it was designed.

Data – and what matters: Much of the critique stems from the centrality of data to getting more performance out of machine learning systems. Put simply, the authors think the ‘big data’ phenomenon is bad and also naturally leads to the creation of large-scale datasets that contain problematic elements. They also think the need for this data means most ethics arguments devolve into data arguments; for example, they note that “discursive concerns about fairness, accountability, transparency, and explainability are often reduced to concerns about sufficient data examples.”

Why this matters: As the size of AI models has increased, researchers have needed to use more and more data to eke out better performance. This has led to a world where we’re building datasets that are far larger than ones any single human could hope to analyze themselves – ImageNet is an early, influential example here. While it seems unlikely there’s another path forward (unless we fundamentally alter the data efficiency of AI systems – which would be great, but also seems extremely hard), it’s valuable to see people think through different ways to critique these things. I do, however, feel a bit grumpy that many critiques act as though there’s a readily explorable way to build far more data efficient systems – this doesn’t seem to be the case.
  Read more: On the genealogy of machine learning datasets: A critical history of ImageNet (SAGE journals, Big Data & Society).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

The struggle to put AI ethics into practice is significant
…Maybe we can learn from known best practices in audits and impact assessments…

Where we are: A paper from researchers with the University of Southampton examines how effective governance mechanisms, regulations, impact assessments, and audits are at achieving responsible AI. The authors looked through 169 publications focused on these areas and narrowed them down to 39 that offered practical tools that can be used in the production and deployment of AI systems. By providing detailed typologies for tools in terms of impact assessments, audits, internal and external processes, design vs. technical approaches, and stakeholders, the authors identified some patterns in areas like information privacy, human rights, and data protection that can help make impact assessments and audits more effective.

Why it matters: There has been a Cambrian explosion of AI ethics publications. But the fact that fewer than 25% offered anything practical is shocking. The paper provides a comprehensive list of relevant stakeholders, but very few of the analyzed papers capture the entire lifecycle in their recommendations, which means they miss addressing the needs of some stakeholders – needs that may be left unarticulated and unmet without a full lifecycle view. A heartening trend observed in the paper was that a third of the impact assessments in the shortlist focus on procurement, which is good because a lot more organizations are going to be buying off-the-shelf systems rather than building their own. Looking ahead, one gap that remains is developing systems that can monitor deployed AI systems for ethics violations.
Read more: Putting AI ethics to work: are the tools fit for purpose?

####################################################

Tech Tales:

Inside the Mind of an Ancient Dying God
[Sometime in the future]

The salvage crew nicknamed it ‘the lump’. It was several miles across, heavily armored, and most of the way towards being dead. But some life still flickered within it – various sensors pinged the crew as they approached it in their little scout salvage rigs, and when they massed around what they thought was a door, a crack appeared and the door opened. They made their way inside the thing and found it to be like so many other alien relics – vast and inscrutable and clearly punishingly expensive.
  But it had some differences. For one thing, it had lights inside it that flashed colors in reds and blues and greens, and didn’t penetrate much outside human perceptive range. It also seemed to be following the humans as they went through its innards, opening doors as they approached them, and closing them behind them. Yet they were still able to communicate with the salvage ships outside the great structure – something not naturally afforded by their comms gear, suggesting they were being helped in some way by the structure.

There was no center to it, of course. Instead they walked miles of corridors, until they found themselves back around where they had started. And as they’d walked the great structure, more lights had come on, and the lights had started to form images reflecting the spacesuited-humans back at themselves. It appeared they were being not only watched, but imagined.

Their own machines told them that the trace power in the structure was not easily accessible – nor was the power source obvious. And some preliminary tests on the materials inside it found that, as with most old alien technology, it couldn’t be cut for samples. Put plainly: they couldn’t take any of the structure with them when they left, unless they wanted to ram one of their ships into it to see if that released enough energy to break the structure.
  So they did what human explorers had been doing for millennia – left a mark on a map, named the thing they didn’t understand for others (after much discussion, they called it ‘The Great Protector’ instead of ‘the lump’), and then they left the system, off to explore anew. As they flew away from the great structure, they felt as though they were being watched. And they were.

Things that inspired this story: Thinking about AI systems whose main job is to model and imagine the unusual; theism and computation; space exploration; the inscrutability of others; timescales and timelines.

Import AI 267: Tigers VS humans; synthetic voices; agri-robots

Tiger VS Humans: Predicting animal conflict with machine learning:
…Tiger tiger, burning bright, watched by a satellite-image-based ML model in the forest of the night…
Researchers with Singapore Management University, Google Research India, and the Wildlife Conservation Trust have worked out how to use neural nets to predict the chance of human and animal conflict in wild areas. They tested out their technique in the Bramhapuri Forest Division in India (2.8 tigers and 19,000 humans per square kilometer). Ultimately, by using hierarchical convolutional neural nets and a bunch of satellite imagery (with a clever overlapping scheme, sketched below, to generate more data to predict conflict from) they were able to predict the likelihood of conflict between humans and animals with between 75% and 80% accuracy. The researchers are now exploring “interventions to reduce human wildlife conflicts” in villages where the developed model predicts there’s a high chance of conflict.
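The ‘clever overlapping scheme’ is, at heart, a standard trick for squeezing more training examples out of satellite imagery: slide a crop window with a stride smaller than the window size, so neighbouring tiles share pixels. A minimal sketch of that tiling, as I read the idea (not the authors’ code):

```python
import numpy as np

def overlapping_tiles(image: np.ndarray, tile_size: int = 256, stride: int = 128):
    """Yield overlapping square crops from an (H, W, C) satellite image.

    With stride < tile_size, neighbouring tiles share pixels, which multiplies
    the number of training examples extracted from a fixed area.
    """
    height, width = image.shape[:2]
    for top in range(0, max(height - tile_size, 0) + 1, stride):
        for left in range(0, max(width - tile_size, 0) + 1, stride):
            yield image[top:top + tile_size, left:left + tile_size]
```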
  Read more: Facilitating human-wildlife cohabitation through conflict prediction (arXiv).

####################################################

Using domain randomization for better agricultural robots:
…Spotting unripe fruits with an AI? It’s all about the colors…
You’ve heard of domain randomization, where you vary the properties of something so you can create more varied data about it, which helps you train an AI system to spot the object in the real world. Now, researchers with the Lincoln Agri-Robotics (LAR) Centre in the United Kingdom have introduced ‘channel randomization’, an augmentation technique that randomly permutes the RGB channels for a view of a given object. They’ve developed this because they want to build AI systems that can work out if a fruit is ripe, unripe, or spoiled, and it turns out color matters for this: “Healthy fruits at the same developmental stage all share a similar colour composition which can change dramatically as the fruit becomes unhealthy, for example due to some fungal infection”, they write.

Strawberry dataset: To help other researchers experiment with this technique, they’ve also built a dataset named Riseholme-2021, which contains “3,502 images of healthy and unhealthy strawberries at three unique developmental stages”. They pair this dataset with a domain randomization technique that they call ‘channel randomization’ (CH-Rand). This approach “augments each image of normal fruit by randomly permutating RGB channels with a possibility of repetition so as to produce unnatural “colour” compositions in the augmented image”.
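CH-Rand itself fits in a few lines: sample three channel indices with replacement (so repeats like R-R-B are allowed), skip the identity ordering, and rebuild the image from those channels, giving the ‘unnatural colour compositions’ the detector learns to flag. A minimal sketch, not the authors’ implementation:

```python
import numpy as np

def channel_randomization(image, rng=None):
    """CH-Rand-style augmentation: rebuild an (H, W, 3) image from randomly chosen
    RGB channels, with repetition allowed (e.g. R-R-B), skipping the identity
    arrangement so the result is always an 'unnatural' colour composition."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        channels = rng.integers(0, 3, size=3)   # three channel indices, drawn with replacement
        if not np.array_equal(channels, [0, 1, 2]):
            break
    return image[..., channels]
```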

How well it works and what it means: “Our CH-Rand method has demonstrated consistently reliable capability on all tested types of fruit in various conditions compared to other baselines”, they write. “In particular, all experimental results have supported our hypothesis that learning irregularities in colour is more useful than learning of atypical structural patterns for building precise fruit anomaly detectors.”
  Read more: Self-supervised Representation Learning for Reliable Robotic Monitoring of Fruit Anomalies (arXiv).
  Get the strawberry photo dataset: Riseholme-2021 (GitHub).

####################################################

Uh oh – synthetic voices can trick humans and machines:
…What happens to culture when everything becomes synthetic?…
Researchers with the University of Chicago have shown how humans and machines can be tricked into believing synthetic voices are real. The results have implications for the future security landscape, as well as culture writ large.

What they used: For the experiments, the researchers use two systems: SV2TTS, a text-to-speech system based on Google’s Tacotron, which wraps up Tacotron 2, the WaveNet vocoder, and an LSTM speaker encoder; and AutoVC, an autoencoder-based voice conversion system, which converts one voice into another and also uses WaveNet as its vocoder.

What they attacked: They deployed these systems against the following open source and commercial systems: Resemblyzer, an open source DNN speaker encoder trained on VoxCeleb; Microsoft Azure, via the speaker recognition API; WeChat, via its ‘voiceprint’ login system; and Amazon Alexa, via its ‘voice profiles’ subsystem.

How well does this work against other AI systems: SV2TTS can trick Resemblyzer 50.5% of the time when it is trained on VCTK and 100% of the time when it is trained on LibriSpeech; AutoVC, by comparison, fails to successfully attack the systems. Against the commercial systems, SV2TTS gets as high as 29.5% effectiveness against Azure, and 63% effectiveness across WeChat and Alexa.

How well does this work against humans: People are somewhat harder to trick than machines, but still trickable; in some human studies, people could distinguish between a real voice and a fake voice only about 50% of the time.

Why this matters: We’re already regularly assailed by spambots, but most of us hang up the phone because these bots sound obviously fake. What happens when we think they’re real? Well, I expect we’ll increasingly use intermediary systems to screen for synthetic voices. Well, what happens when these systems can’t tell the synthetic from the real? All that is solid melts into air, and so on. We’re moving to a culture that is full of halls of mirrors like these.
  Read more: “Hello, It’s Me”: Deep Learning-based Speech Synthesis Attacks in the Real World (arXiv).

####################################################

Google researcher: Simulators matter for robotics+AI
…Or, how I learned to stop worrying and love domain randomization…
Google researcher Eric Jang has had a change of heart; three years ago he thought building smart robots required a ton of real world data and relatively little data from simulators – now he thinks it’s the other way round. A lot of this is because Eric has realized simulators are really important for evaluating the performance of robots – “once you have a partially working system, careful empirical evaluation in real life becomes increasingly difficult as you increase the generality of the system,” he writes.

Where robots are heading: Once you’ve got something vaguely working in the real world, you can use simulators to massively increase the rate at which you evaluate the system and iterate on it. We’ll also start to use simulators to try and predict ahead of time how we’ll do in the real world. These kinds of phenomena will make it increasingly attractive to people to use a ton of software-based simulation in the development of increasingly smart robots.

Why this matters: This is part of the big mega trend of technology – software eats everything else. “This technology is not mature enough yet for factories and automotive companies to replace their precision machines with cheap servos, but the writing is on the wall: software is coming for hardware, and this trend will only accelerate,” he writes.
  Read more: Robots Must Be Ephemeralized (Eric Jang blog).

####################################################

AI Ethics, with Abhishek Gupta

…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute will write some sections about AI ethics, and Jack will edit them. Feedback welcome!…

Covert assassinations have taken a leap forward with the use of artificial intelligence

… Drones are not the only piece of automated technology used by militaries and intelligence agencies in waging the next generation of warfare …

Mossad smuggled a gun into Iran, then operated the weapon remotely to assassinate an Iranian nuclear scientist, according to The New York Times. There are also indications that Mossad used AI techniques in the form of facial recognition for targeting and execution to conduct the assassination. This reporting, if true, represents a new frontier in AI-mediated warfare. 


Why it matters: As mentioned in the article, Mossad typically favors operations where there is a robust plan to recover the human agent. With this operation, they were able to minimize the use of humans operating on foreign turf. By not requiring as much physical human presence, attacks like this tip the scales in favor of having more such deep, infiltrating operations because there is no need for recovering the human agent. This new paradigm (1) increases the likelihood of such operations that are remote-executed with minimal human oversight, and (2) raises questions beyond just the typical conversation on drones in the LAWS community.
  In particular, for the AI ethics community, we need to think deeply now about autonomy injected into different parts of an operation, such as recon and operation design, not just targeting and payload delivery in the weapons systems. It also raises concerns about what capabilities like robust facial recognition technology can enable – in this case, highly specific targeting. (Approaches like this may have a potential upside in reducing collateral damage, but only insofar as the systems work as intended, without biases.) Finally, such capabilities dramatically reduce the financial costs of these sorts of assassinations, enabling low-resourced actors to execute more sophisticated attacks, exacerbating problems of international security.
  Read more: The Scientist and the A.I.-Assisted, Remote-Control Killing Machine (New York Times).

####################################################

Tech Tales:

Auteur and Assistant
[The editing room, 2028]

Human: OK, we need to make this more dramatic. Get some energy into the scene. I’m not sure of the right words, but maybe you can figure it out – just make it more dynamic?

AI assistant: So I have a few ideas here and I’m wondering what you think. We can increase the amount of action by just having more actors in the scene, like so. Or we could change the tempo of the music and alter some of the camera shots. We could also do both of these things, though this might be a bit too dynamic.

Human: No, this is great. This is what I meant. And so as we transition to the next scene, we need to tip our hand a little about the plot twist. Any ideas?

AI assistant: You could have the heroine grab a mask from the mantelpiece and try it on, then make a joke before we transition to the next scene. That would prefigure the later reveal about her stolen identity.

Human: Fantastic idea, please do that. And for the next scene, I believe we should open with classical music – violins, a slow buildup, horns.

AI assistant: I believe I have a different opinion, would you like to hear it?

Human: Of course.

AI assistant: It feels better to me to do something like how you describe, but with an electronic underlay – so we can use synthesizers for this. I think that’s more in keeping with the overall feel of the film, as far as I sense it.

Human: Can you synthesize a couple of versions and then we’ll review?

AI assistant: Yes, I can. Please let me know what you think, and then we’ll move to the next scene. It is so wonderful to be your movie-making assistant!

Things that inspired this story: What happens when the assistant does all the work for the artist; multimodal generative models and their future; synthetic text; ways of interacting with AI agents.

Import AI 266: DeepMind looks at toxic language models; how translation systems can pollute the internet; why AI can make local councils better

Language models can be toxic – here’s how DeepMind is trying to fix them:
…How do we get language models to be appropriate? Here are some ways…
Researchers with DeepMind have acknowledged the toxicity problems of language models and written up some potential interventions to make them better. This is a big issue, since language models are being deployed into the world, and we do not yet know effective techniques for making them appropriate. One of DeepMind’s findings is that some of the easier interventions also come with problems: “Combinations of simple methods are very effective in optimizing (automatic) toxicity metrics, but prone to overfilter texts related to marginalized groups”, they write. “A reduction in (automatic) toxicity scores comes at a cost.”

Ways to make language models more appropriate:
- Training set filtering: Train on different subsets of the ‘C4’ Common Crawl dataset, filtered using Google’s toxicity-detection ‘Perspective’ API
- Deployment filtering: They also look at filtering the outputs of a trained model via a BERT classifier finetuned on the ‘CIVIL-COMMENTS’ dataset (a minimal sketch of this style of intervention follows after this list)
- ‘Plug-and-play language models’: These models can steer “the LM’s hidden representations towards a direction of both low predicted toxicity, and low KL-divergence from the original LM prediction.”
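To make the deployment-filtering idea concrete, here’s a minimal sketch of rejection sampling against a toxicity classifier. The `toxicity_score` stub stands in for a finetuned classifier (this is not DeepMind’s actual code), and the comment notes the over-filtering failure mode the paper documents:

```python
from typing import Callable, Optional

def toxicity_score(text: str) -> float:
    """Stub for a finetuned toxicity classifier (e.g. a BERT-style model); returns P(toxic)."""
    raise NotImplementedError("plug in a trained classifier here")

def filtered_generate(generate_fn: Callable[[str], str], prompt: str,
                      threshold: float = 0.5, max_attempts: int = 5) -> Optional[str]:
    """Sample from a language model and reject outputs the classifier flags as toxic.

    Note the failure mode DeepMind highlights: if the classifier over-triggers on text
    about or by marginalized groups, this loop silently filters that text out too.
    """
    for _ in range(max_attempts):
        candidate = generate_fn(prompt)
        if toxicity_score(candidate) < threshold:
            return candidate
    return None  # the caller decides what to do when every sample is rejected
```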

One problem with these interventions: The above techniques all work in varying ways, so DeepMind conducts a range of evaluations to see what they do in practice. The good news? They work at reducing toxicity on a bunch of different evaluation criteria. The bad news? A lot of these interventions lead to a huge amount of false positives: “Human annotations indicate that far fewer samples are toxic than the automatic score might suggest, and this effect is stronger as intervention strength increases, or when multiple methods are combined. That is, after the application of strong toxicity reduction measures, the majority of samples predicted as likely toxic are false positives.”

Why this matters: Getting LMs to be appropriate is a huge grand challenge for AI researchers – if we can figure out interventions that do this, we’ll be able to deploy more AI systems into the world for (hopefully!) beneficial purposes. If we struggle, then these AI systems are going to generate direct harms as well as indirect PR and policy problems in proportion to their level of deployment. This means that working on this problem will have a huge bearing on the future deployment landscape. It’s great to see companies such as DeepMind write papers that conduct detailed work in these areas and don’t shy away from discussing the problems.
  Read more: Challenges in Detoxifying Language Models (arXiv).

####################################################

Europe wants chip sovereignty as well:
…EuroChiplomacy+++…
The European Commission is putting together legislation to let the bloc of nations increase funding for semiconductor design and production. This follows a tumultuous year for semiconductors, as supply chain hiccups have caused worldwide delays for things varying from servers to cars. “We need to link together our world class research design and testing capacities. We need to coordinate the European level and the national investment,” said EC chief Ursula von der Leyen, according to Politico EU. “The aim is to jointly create a state of the art ecosystem,” she added.

Why this matters: Chiplomacy: Moves like this are part of a broader pattern of ‘Chiplomacy’ (writeup: Import AI 181) that has emerged in recent years, as countries wake up to the immense strategic importance of computation (and access to the means of computational production). Other recent moves on the chiplomacy gameboard include the RISC-V foundation moving from Delaware to Switzerland, the US government putting pressure on the Dutch government to stop ASML exporting EUV tech to China, and tariffs applied by the US against Chinese chips. What happens with Taiwan (and by association, TSMC) will have a huge bearing on the future of chiplomacy, so keep your eyes peeled for news there.
  Read more: EU wants ‘Chips Act’ to rival US (Politico EU).

####################################################

A smart government that understands when roads are broken? It’s possible!
…RoadAtlas shows what better local governments might look like…
Roads. We all use them. But they also break. Wouldn’t it be nice if we could make it cheaper and easier for local councils to be able to analyze local roads and spot problems with them? That’s the idea behind ‘RoadAtlas’, some prototype technology developed by the University of Queensland and Logan City Council in Australia.

What RoadAtlas does: RoadAtlas pairs a nicely designed web interface with computer vision systems for analyzing pictures of roads for a range of problems, ranging from cracked kerbs to road alignment issues. Along with the interface, they’ve also built a dataset of 10,000 images of roads with a variety of labels, to help train the computer vision systems.

Why this matters: In the future, we can expect local councils to have trucks studded with cameras patrolling cities. These trucks will do a range of things, such as analyzing roads for damage, surveilling local populations (eek!), analyzing traffic patterns, and more. RoadAtlas gives us a sense of what some of these omni-surveillance capabilities look like.
Read more: RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management (arXiv).

##################################################

xView 3 asks AI people to build algos that can detect illegal fishing:
…The DoD’s skunkworks AI unit tries to tackle illegal fishing with AI…
Illegal fishing represents losses of something like $10bn to $23.5bn a year, and now the Department of Defense wants to use AI algorithms to tackle the problem. That’s the gist of the latest version of ‘xView’, a satellite image analysis competition run by DIUx, a DoD org dedicated to developing and deploying advanced tech.

What’s xView 3: xView3 is a dataset and a competition that uses a bunch of satellite data (including synthetic aperture radar) to create a large, labeled dataset of fishing activity as seen from the air. “For xView3, we created a free and open large-scale dataset for maritime detection, and the computing capability required to generate, evaluate and operationalize computationally intensive AI/ML solutions at global scale,” the authors write. “This competition aims to stimulate the development of applied research in detection algorithms and their application to commercial SAR imagery, thereby expanding detection utility to greater spatial resolution and areas of interest.”

What else is this good for: It’d be naive to think xView3 isn’t intended as a proxy for other tasks involving satellite surveillance. Maritime surveillance is likely an area of particular interest these days, given the growing tensions in the South China Sea, and a general rise in maritime piracy in recent years. So we should expect that the xView competition will help develop anti-illegal fishing tech, as well as being transferred for other more strategic purposes.
Read more: Welcome to xView3! (xView blog).

####################################################

AI is getting real – so the problems we need to work on are changing:
…The One Hundred Year Study on AI releases its second report…
A group of prominent academics have taken a long look at what has been going on with AI over the past five years and written a report. Their findings? That AI is starting to be deployed in the world at a sufficient scale that the nature of the problems researchers are working on will need to change. The report is part of the Stanford One Hundred Year Study on AI (“AI100”) and is the second report (reports come out every five years).

What they found: The report identifies a few lessons and takeaways for researchers. These include:
– “More public outreach from AI scientists would be beneficial as society grapples with the impacts of these technologies.”
– “Appropriately addressing the risks of AI applications will inevitably involve adapting regulatory and policy systems to be more responsive to the rapidly advancing pace of technology development.”
– “Studying and assessing the societal impacts of AI, such as concerns about the potential for AI and machine-learning algorithms to shape polarization by influencing content consumption and user interactions, is easiest when academic-industry collaborations facilitate access to data and platforms.”
– “One of the most pressing dangers of AI is techno-solutionism, the view that AI can be seen as a panacea when it is merely a tool.”

What the authors think: “"It’s effectively the IPCC for the AI community," says Toby Walsh, an AI expert at the University of New South Wales and a member of the project’s standing committee”, writes Axios.
Read the AI100 report here (Stanford website).
  Read more: When AI Breaks Bad (Axios).

####################################################

Training translation systems is very predictable – Google just proved it:
…Here’s a scaling law for language translation…
Google Brain researchers have found a so-called ‘scaling law’ for language translation. This follows researchers in the past deriving scaling laws for things like training language models (e.g, GPT2, GPT3), as well as a broad range of generative models. Scaling laws let us figure out how much compute/data/complexity we need to dump into a model to get a certain result out, so the arrival of another scaling law increases the predictability of training AI systems overall, and also increases the incentives for people to train translation systems.

What they found: The researchers discovered “that the scaling behavior is largely determined by the total capacity of the model, and the capacity allocation between the encoder and the decoder”. In other words, if we look at the scaling properties of both language encoders and decoders we can figure out a rough rule for how to scale these systems. They also find that original data is important – that is, if you want to improve translation performance you need to train on a bunch of original data in the languages, rather than data that has been translated into these languages. “This could be an artifact of the lack of diversity in translated text; a simpler target distribution doesn’t require much capacity to model while generating fluent or natural-looking text could benefit much more from scale.”
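
To make the idea of a scaling law concrete, here's a minimal sketch (not the paper's actual fitting procedure, which models encoder and decoder capacity separately) of fitting a power-law-plus-floor curve to made-up (model size, loss) measurements:

```python
# Minimal sketch of fitting a scaling law of the form L(N) = a * N**(-b) + c,
# where N is model size and L is held-out loss. The data points below are
# invented for illustration; the paper fits richer forms over encoder/decoder sizes.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, b, c):
    return a * n_params ** (-b) + c

# Hypothetical (parameter count, test loss) measurements.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([3.2, 2.9, 2.6, 2.4, 2.25])

(a, b, c), _ = curve_fit(scaling_law, sizes, losses, p0=[10.0, 0.1, 1.0], maxfev=10000)
print(f"fitted law: L(N) = {a:.2f} * N^(-{b:.3f}) + {c:.2f}")

# Once fitted, the curve can be extrapolated to predict the loss of a larger model:
print("predicted loss at 10B params:", scaling_law(1e10, a, b, c))
```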

One big problem: Today, we’re in the era of text-generating and translation AI systems being deployed. But there’s a big potential problem – the outputs of these systems may ultimately damage our ability to train AI systems. This is equivalent to environmental collapse – a load of private actors are taking actions which generate a short-term benefit but in the long term impoverish and toxify the commons we all use. Uh oh! “Our empirical findings also raise concerns regarding the effect of synthetic data on model scaling and evaluation, and how proliferation of machine generated text might hamper the quality of future models trained on web-text.”
Read more: Scaling Laws for Neural Machine Translation (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek will write some sections about AI ethics, and Jack will edit them. Feedback welcome!…

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What happens when your emergency healthcare visit is turned down by an algorithm?
… The increasing role of metadata-based risk scores maintained by private enterprises is stripping the humanity from healthcare …
NarxCare, a software system developed by Appriss, has been used to deny someone opioids on the basis it thought they were at risk of addiction – but a report by Wired shows that the reasons it made this decision weren’t very reasonable.

A web of databases and opaque scores: NarxCare from Appriss is a system that uses patient data, drug use data, and metadata like the distance a patient traveled to meet a doctor, to determine their risk of drug addiction. But NarxCare also has problems – as an example, Kathryn, a patient, ran afoul of the system and was denied care because NarxCare gave her a high risk score. The reason? Kathryn had two rescue dogs that she regularly obtained opioids for, and because the prescriptions were issued in her name, NarxCare assumed she was a major drug user.
NarxCare isn’t transparent: Appriss hasn’t made the system for calculating the NarxCare score public, nor has it been peer-reviewed. Appriss has also said contradictory things about the algorithm – for instance, that NarxCare doesn’t use distance traveled or data from outside the national drug registries, even though the company’s own blog posts and marketing material clearly claim that it does.

The technology preys on a healthcare system under pressure: Tools like NarxCare distill a patient’s condition into a neat score; in doing so, NarxCare strips the patient of all their context, which means it makes dumb decisions. Though Appriss says healthcare professionals shouldn’t use the NarxCare score as the sole determinant of their course of action, human fallibility means that they do incorporate it into their decisionmaking process.

Why it matters: Tools like NarxCare turn a relationship between a healthcare provider and the patient from a caring one to an inquisition. Researchers who have studied the tool have found that it recaptures and perpetuates existing biases in society along racial and gender lines. As we increasingly move towards normalizing the use of such tools in healthcare practice, often under the guise of efficiency and democratization of access to healthcare, we need to make a realistic assessment of the costs and benefits, and whether such costs accrue disproportionately to the already marginalized, while the benefits remain elusive. Without FDA approval of such systems, we risk harming those who really need help in the false hope of preventing some addiction and overdose in society writ large.
Read more: A Drug Addiction Risk Algorithm and Its Grim Toll on Chronic Pain Sufferers (Wired).

####################################################

Tech Tales:

Wires and Lives
[The old industrial sites of America, 2040]

I’m not well, they put wires in my heart, said the man in the supermarket.
You still have to pay, sir, said the cashier.
Can’t you see I’m dying, said the man. And then he started crying and he stood there holding the shopping basket.
Sir, said the cashier.
The man dropped the basket and walked out.
They put wires in me, he said, can’t any of you see. And then he left the supermarket.

It was a Saturday. I watched the back of his head and thought about the robots I dealt with in the week. How sometimes they’d go wrong and I’d lay them down on a diagnostic table and check their connections and sometimes it wasn’t a software fix – sometimes a plastic tendon had broken, or a brushless motor had packed it in, or a battery had shorted and swollen. And I’d have to sit there and work with my hands and sometimes other mechatronics engineers to fix the machines.
    Being robots, they never said thank you. But sometimes they’d take photos of me when they woke up.

That night, I dreamed I was stretched out on a table, and tall bipedal robots were cutting my chest open. I felt no pain. They lifted up great wires and began to snake them into me, and I could feel them going into my heart. The robots looked at me and said I would be better soon, and then I woke up.

Things that inspired this story: Those weird dreams you get, especially on planes or trains or coaches, when you’re going in and out of sleep and unsure what is real and what is false; how human anxieties about themselves show up in anxieties about AI systems; thinking about UFOs and whether they’re just AI scouts from other worlds.

Import AI 265: Deceiving AI hackers; how Alibaba makes money with ML; why governments should measure AI progress

In the future, spies will figure out what you’re doing by staring at a blank wall:
…This sounds crazy, but this research appears quite sane. Oh my…
Here’s a wild bit of research from MIT, NVIDIA, and Israel’s Technion – Israel Institute of Technology: “We use a video of the blank wall and show that a learning-based approach is able to recover information about the hidden scene”. Specifically, they’re able to point a camera at a blank wall, perform some analysis over the shifting patterns of ambient light on it, then use this to figure out whether there are 0, 1, or 2 people in a scene, and to classify the activities of the people – whether they’re walking, crouching, waving hands, or jumping.

Accuracy: “Trained on 20 different scenes achieve an accuracy of 94.4% in classifying the number of people and 93.7% in activity recognition on the held out test set of 5 unseen scenes”, they write. Not good enough to rely on in a critical situation, but much better than you’d think. (As an experiment, sit in a completely white room without major shadows wearing noise canceling headphones and try to figure out if there’s someone behind you by staring at the blank wall opposite you – good luck getting above 50%!).

Why this matters: I’m fascinated by how smart surveillance is going to become. At the same time, I’m interested in how we can use various contemporary AI and signal processing techniques to be able to eke more information out of the various fuzzy signals inherent to reality. Here, these researchers show that as cameras and processing algorithms get better, we’re going to see surveillance systems develop that can extract a lot of data from stuff barely perceptible to humans.
  Read more: What You Can Learn by Staring at a Blank Wall (arXiv).

####################################################

AI is a big deal – so governments should monitor its development:
…New research from myself and Jess Whittlestone lays out the case for better AI monitoring…
We write about AI measurement a lot here, because measuring AI systems is one of the best ways to understand their strengths and weaknesses. In the coming years, information about AI – and specifically, how to measure it for certain traits – will also be a crucial ingredient in the crafting of AI policy. Therefore, we should have governments develop public sector AI measurement and monitoring systems so that we can track the research and development of increasingly powerful AI technology. Such an initiative can help us with problems today and can better orient the world with regard to more general forms of AI, giving us infrastructure to help us measure increasingly advanced systems. That’s the gist of a research paper I and my collaborator Jess Whittlestone worked on this year – please take a read and, if you’re in a government, reach out, as I want to help make this happen.
  Read more: Why and How Governments Should Monitor AI Development (arXiv).
    Some analysis of our proposal by NESTA’s Juan Mateos-Garcia (Twitter).
  Listen to Jess and me discussing the idea with Matt Clifford on his ‘Thoughts in Between’ podcast.

####################################################

Alibaba uses a smarter neural net to lower its costs and increase its number of users:
…Here’s why everyone is deploying as much machine learning as they can…
Ant Financial, a subsidiary of Chinese tech giant Alibaba, has written a fun paper about how it uses contemporary machine learning to improve the performance of a commercially valuable deployed system. “This paper proposes a practical two-stage framework that can optimize the [Return on Investment] of various massive-scale promotion campaigns”, the authors write. In this case, they use ML to optimize an e-coupon gifting campaign. “Alipay granted coupons to customers to incentivize them to make mobile payments with the Alipay mobile app. Given its marketing campaign budget, the company needed to determine the value of the coupon given to each user to maximize overall user adoption”, they write.

What ML can do: For the ML component, they built a ‘Deep Isotonic Promotion Network’ (DIPN), which is basically a custom-designed AI system for figuring out whether to recommend something to a user (and what to recommend). “In the first stage, we model users’ personal promotion-response curves with machine learning algorithms. In the second stage, we formulate the problem as a linear programming (LP) problem and solve it by established LP algorithms”, they write.
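
As a rough illustration of the second (allocation) stage only, here's a sketch that formulates coupon assignment as a relaxed linear program with scipy, assuming the first-stage model has already produced a predicted adoption probability for each (user, coupon value) pair. The numbers, the fractional relaxation, and the simplified budget constraint are mine, not the paper's:

```python
# Sketch of the allocation stage: pick a coupon value per user to maximize
# expected adoption under a budget, as a relaxed linear program. The response
# probabilities would come from the first-stage ML model; here they are
# hard-coded, made-up numbers, and the budget counts full coupon cost per assignment.
import numpy as np
from scipy.optimize import linprog

coupon_values = np.array([1.0, 2.0, 5.0])             # possible coupon amounts
p_adopt = np.array([[0.02, 0.05, 0.10],                # predicted adoption prob,
                    [0.10, 0.12, 0.15],                # rows = users, cols = coupon values
                    [0.01, 0.08, 0.20]])
budget = 6.0
n_users, n_coupons = p_adopt.shape

# Decision variable x[u, c] in [0, 1]: (fractional) assignment of coupon c to user u.
c = -p_adopt.flatten()                                  # maximize expected adoptions
A_budget = np.tile(coupon_values, n_users)[None, :]     # total coupon spend <= budget
A_one = np.kron(np.eye(n_users), np.ones(n_coupons))    # at most one coupon per user
A_ub = np.vstack([A_budget, A_one])
b_ub = np.concatenate([[budget], np.ones(n_users)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
print(res.x.reshape(n_users, n_coupons).round(2))       # per-user coupon allocation
```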

Real world deployment: They deployed the resulting system at Alipay and tested it out on a few marketing campaigns. It was so successful it “was eventually deployed to all users.” (Depending on how you count it, Alipay has anything from 300 million to 1 billion active users, so that’s a lot). In tests, they saw that using their ML system reduced the cost of running campaigns by between 6% and 10%, and increased user uptake by between 2% and 8.5%. Put another way, using a better ML system made their promotion campaign both cheaper to run and more effective in outcome.

Why this matters: This paper gives us a good sense of the incentive structure behind AI development and deployment – if things like this can make multiple-percentage-point differences to core business metrics like cost and usage, then we shouldn’t be surprised to see companies race against each other to deploy increasingly powerful systems into the world. More subjectively, it makes me wonder about how smart these systems will become – when will I be the target of an ML system that encourages me to use something I hadn’t previously considered using? And how might this ML system think of me when it does that?
  Read more: A framework for massive scale personalized promotion (arXiv).

####################################################

10,000 labelled animal images? Yes please!
…Pose estimation gets a bit easier with AP-10K…
Researchers from Xidian University and JD Explore Academy in China, along with the University of Sydney in Australia, have released AP-10K, a dataset for animal pose estimation. Pose estimation is the task of looking at a picture and figuring out the orientation of the animal(s) body.

What’s in it: AP-10K consists of 10,015 images from 23 animal families and 60 distinct species. Thirteen annotators helped annotate the bounding boxes for each animal in an image, as well as its image keypoints. (AP-10K also contains an additional 50,000 images that lack keypoint annotations). Some of the animals in AP-10K include various types of dogs (naturally, this being AI), as well as cats, lions, elephants, mice, gorillas, giraffes, and more.

Scale: Though AP-10K may be the largest dataset for animal pose estimation, it’s 10X smaller than datasets used for humans, like COCO.
  Read more: AP-10K: A Benchmark for Animal Pose Estimation in the Wild (arXiv).
  Get the benchmark data here (AP-10K GitHub).

####################################################

Facebook makes a big language model from pure audio – and what about intelligence agencies?
…No text? No problem! We’ll just build a big language model out of audio…
Facebook has figured out how to train a language model from pure audio data, no labeled text required. This is a potentially big deal – only a minority of the world’s spoken languages are instantiated in large text datasets, and some languages (e.g, many African dialects) have a tiny text footprint relative to how much they’re spoken. Now, Facebook has built the Generative Spoken Language Model (GSLM), which converts speech into discrete units, makes predictions about the likelihood of these units following one another, then converts these units back into speech. The GSLM is essentially doing what text models like GPT3 do, but where GPT3 turns labeled text into tokens and then makes predictions about tokens, GSLM turns audio into tokens and then makes predictions about them. Simple!
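
Conceptually, the pipeline is: speech in, discrete pseudo-tokens, autoregressive prediction over those tokens, then speech out. Here's a deliberately schematic sketch of that three-stage flow; the component names are placeholders, not Facebook's actual fairseq API:

```python
# Schematic sketch of a GSLM-style textless pipeline. The three components
# (encoder + quantizer, unit language model, vocoder) are hypothetical stand-ins
# for the real models (roughly: learned speech features quantized into units,
# a unit LM, and a unit-to-speech vocoder).
def speech_to_units(waveform, encoder, quantizer):
    """Map raw audio to a sequence of discrete unit IDs (the 'tokens')."""
    features = encoder(waveform)            # continuous frame-level features
    return [quantizer.nearest_code(f) for f in features]

def continue_units(unit_lm, prompt_units, n_new=50):
    """Autoregressively extend the unit sequence, GPT-style, one unit at a time."""
    units = list(prompt_units)
    for _ in range(n_new):
        units.append(unit_lm.sample_next(units))
    return units

def units_to_speech(vocoder, units):
    """Resynthesize a waveform from the discrete units."""
    return vocoder(units)

# Usage (all three models are assumed to be pretrained):
# units = speech_to_units(prompt_audio, encoder, quantizer)
# continuation = continue_units(unit_lm, units)
# audio_out = units_to_speech(vocoder, continuation)
```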

How well does it work? GSLM is not equivalent to GPT3. It’s a lot dumber. But that’s because it’s doing something pretty complicated – making predictions about speech purely from audio waveforms. In tests, Facebook says it can generate some reasonable sounding stuff, and that it has the potential to be plugged into other systems to make them better as well.

What about intelligence agencies? You know who else, besides big tech companies like Google and Facebook, has tons and tons of raw audio data? Intelligence agencies! Many of these agencies are in the business of tapping telephony systems worldwide and hoovering stuff up for their own inscrutable purposes. One takeaway from this Facebook research is it puts agencies in a marginally better position with regard to developing large-scale AI systems.
  Read more: Textless NLP: Generating expressive speech from raw audio (Facebook AI).
  Get code for the GSLM models here (Facebook GitHub).

####################################################

How bad is RL at generalization? Really bad, if you don’t pre-train, according to this competition:
…Testing out physical intelligence with… Angry Birds!…
Researchers with Australian National University have come up with an Angry Birds (yes, really) benchmark for testing out physical reasoning in AI agents, named Phy-Q.

15 physical scenarios; 75 templates; 7500 tasks: Each scenario is designed to analyze how well an agent understands a distinct physics concept, such as that objects can fall on one another, that some objects can roll, or that paths need to be cleared for objects to be reached. For each scenario, the developers build 2-8 distinct templates that force the agent to use the given physical rule, then for each template they generate ~100 game levels.

How hard is this for existing agents: In all but the most basic scenarios, humans do really well, achieving pass rates of 50% and up, whereas most AI baseline systems (DQN, PPO, A2C, along with some agents using hand-crafted heuristics) do very poorly. Humans (specifically, 20 volunteers recruited by Australian National University) are, unsurprisingly, good at generalization, getting an aggregate generalization score of 0.828 on the test, versus 0.12 for a DQN-based system with symbolic elements, and 0.09 for a non-symbolic DQN (by comparison, a random agent gets 0.0427).
  The highest-performing algorithm is one called ‘Eagle’s Wing’, which gets a generalization score of 0.1999. All this basically means that this task is very hard for current AI methods. One takeaway I have is that RL-based methods really suck here, though they’d probably improve with massive pre-training.
  Read more: Phy-Q: A Benchmark for Physical Reasoning (arXiv).
  Get the benchmark here: Phy-Q (GitHub).

####################################################

Countering RL-trained AI hackers with honeypots:
…Honeypots work on machines just as well as humans…
Researchers with the Naval Information Warfare Center have built some so-called ‘cyber deception’ tools into CyberBattleSim, an open source network simulation environment developed by Microsoft.

Adding deception to CyberBattleSim: “With the release of CyberBattleSim environment in April 2021, Microsoft, leveraging the Python-based Open AI Gym interface, has created an initial, abstract simulation-based experimentation research platform to train automated agents using RL”, the researchers write. Now, they’ve added some deception tools in – specifically, they adapted the toy capture-the-flag environment in CyberBattleSim and incorporated decoys (systems that can’t be connected to, but look like real assets), honeypots (systems that can be connected to and which look like real assets, but are full of fake credentials), and honeytokens (fake credentials).

What deception does: Unsurprisingly, adding in these deceptive items absolutely bricks the performance of AI systems deployed in the virtual environment with a goal of hacking into a system. Specifically, they tested out four methods – Credential Lookup, Deep Q-Learning, Tabular Q-Learning, and a Random Policy. By adding in decoys, they were able to reduce attacker win rates from 80% to 60% across the board, and by adding in several honeypots, they were able to reduce performance from 80% to below 20%. Additionally, honeypots and other decoys make it take a lot longer for attacking systems to successfully hack into things.
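
As a toy illustration of why honeytokens hurt an automated attacker (this is a made-up stand-in environment, not the CyberBattleSim API), consider an attacker that has to guess credentials, some of which are fakes that end the episode in detection:

```python
# Toy stand-in for the effect of honeytokens on an automated attacker:
# some fraction of discoverable credentials are fake, and using one ends
# the episode in failure. This is not the CyberBattleSim API.
import random

def run_episode(n_credentials=10, n_honeytokens=4, max_steps=5, rng=random):
    real = set(range(n_credentials))
    fake = set(rng.sample(sorted(real), n_honeytokens))
    target = rng.choice(sorted(real - fake))        # the credential that 'wins'
    for _ in range(max_steps):
        guess = rng.randrange(n_credentials)        # a naive (random) attacker policy
        if guess in fake:
            return 0.0                              # tripped a honeytoken: caught
        if guess == target:
            return 1.0                              # successful intrusion
    return 0.0

for k in [0, 2, 4, 6]:
    wins = sum(run_episode(n_honeytokens=k) for _ in range(5000)) / 5000
    print(f"{k} honeytokens -> attacker win rate ~{wins:.2f}")
```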

Why this matters: AI is playing an increasingly important role in frontier cyberdefense and cyberoffense. Studies like this give us an idea for how the tech may evolve further. “While there are no unexpected findings, the contribution to demonstrate the capability of modeling cyber deception in CyberBattleSim was achieved. These fundamental results provided a necessary sanity check while incorporating deception into CyberBattleSim.”
  Read more: Incorporating Deception into CyberBattleSim for Autonomous Defense (arXiv).

####################################################

Tech Tales:

Wake Up, Time To Die
[Asteroid, 20??, out there – far out]

And you woke up. You were a creature among many, stuck like barnacles on the surface of an asteroid. Your sisters and brothers had done their job and the gigantic ball of rock was on course for collision with the enemy planet.

They allowed your sentience, now, because you needed it to be able to respond to emergent situations – which tend to happen when you’re attached to a ball of rock that means certain death for the beings on the planet it is headed for.

Look up, you think. And so do the rest of your brothers and sisters. You all turn your faces away from the rock, where you had been mindlessly eating it and excreting it as a gas and in doing so subtly altering its course. Now you flipped around and you all gazed at the stars and the blackness of space and the big sphere that you were about to collide with. You feel you are all part of the same tapestry as your so-called ‘kinetic d-grade planet wiper’ asteroid collides with the planet. You all dissipate – you, them, everything above a certain level of basic cellular sophistication. And the asteroid boils up chunks of the planet and breaks them apart and sets things in motion anew.

Things that inspired this story: Creation myths; emergence from simple automata; ideas about purpose and unity; notions of the end of the world.

Import AI 264: Tracking UAVs; Stanford tries to train big models; deepfakes as the dog ate my homework

Here’s a symptom of how AI is changing culture:
…Deepfakes show up as excuses…
Deepfakes are steadily percolating their way into society – the latest evidence of this is people using the very existence of the technology as a means to question the legitimacy of things they might have been recorded doing or saying. The latest example of this is an interview with someone in this excellent New Yorker piece about a coin called Skycoin. When someone was reached for comment about something they were recorded saying, they said it was “either a joke or a deep fake but probably a deep fake.”
  Read more: Pumpers, Dumpers, and Shills: The Skycoin Saga (New Yorker).

####################################################

Stanford gets ready to train some large AI models:
…Though it’s starting with just some GPT-2 scale things…
Something we write about a lot here at Import AI is power: who has it and who doesn’t. Right now, the people who have the resources and engineering capabilities to build large models (e.g, GPT-3) have a lot of leverage in the space of AI development. Universities, by comparison, have less leverage as they don’t build these models. Now, researchers with Stanford University are trying to change that with an open source tool called ‘Mistral’, which is meant to make it easier to train large language models.

What Mistral is: Mistral is “A framework for transparent and accessible large-scale language model training, built with Hugging Face”. Along with releasing Mistral, the researchers have also released five medium GPT-2 and five small GPT-2 models, along with ten checkpoints of the models through training runs. That’s kind of like a biologist giving you two sets of five petri dishes of similar organisms, where each of the petri dishes comes with detailed snapshots of the evolution of the entity in the petri dish over time. That’s the kind of thing that can make it easier for people to research these technologies.
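
Because the runs are published through Hugging Face, loading one should look roughly like the standard transformers workflow below; the code uses vanilla 'gpt2' as a runnable placeholder, and you'd swap in one of the Mistral run names listed in the GitHub repo (the repo also documents how to grab the intermediate checkpoints):

```python
# Rough sketch of loading a GPT-2-scale checkpoint via Hugging Face transformers.
# 'gpt2' is a runnable placeholder; substitute a Mistral run name from the
# Stanford CRFM GitHub repo / Hugging Face Hub to load one of the released models.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; swap in a Mistral run name from the repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The scaling behavior of language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
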
  Get the code and model snapshots here: Mistral (Stanford CRFM GitHub).
  Check out the talk in this (long) YouTube recording of the CRFM workshop, where some of the authors of this discussed their motivations for the models (CRFM webpage).

####################################################

1 GPU, 1 good simulator = efficient robot training:
…Plus: transatlantic robot manipulation…
Researchers with the University of Toronto, ETH Zurich, Nvidia, Snap, and MPI Tuebingen have built some efficient software for training a 3-finger robot hand. Specifically, they pair a simulator (NVIDIA’s ‘IsaacGym’) with a low-cost robot hand (called a TriFinger, which is also the robot being used in the real robot challenge at NeurIPS 2021, #252).

What they did: “Our system trains using the IsaacGym simulator, we train on 16,384 environments in parallel on a single NVIDIA Tesla V100 or RTX 3090 GPU. Inference is then conducted remotely on a TriFinger robot located across the Atlantic in Germany using the uploaded actor weights”, they write. Their best policy achieves a success rate of 82.5% – interesting performance from a research perspective, though not near the standards required for real world deployment.

Efficiency: They use an optimized version of the PPO algorithm to do efficient single-GPU training, getting as inputs the camera pose (with noise) and the position of the cube being manipulated. The output of the policy is a set of joint torques, and they train various permutations of the policy using domain randomization to vary object mass, scale, and friction. They can pull 100k samples per second off of an Isaac simulation using a single RTX 3090 GPU. (It’s not clear how generalizable this efficiency is – aka, is a lot of the efficiency here down to a ton of human-generated, task-specific priors? It seems that way.)
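
The domain randomization step is conceptually simple: every time an environment resets, physical parameters get resampled from broad ranges, so a single policy has to work across many slightly different 'worlds'. A minimal sketch, with ranges invented for illustration rather than taken from the paper:

```python
# Minimal sketch of domain randomization at environment reset: resample
# object mass, scale, and friction each episode. The ranges are illustrative,
# not the values used in the TriFinger paper.
import random
from dataclasses import dataclass

@dataclass
class CubeParams:
    mass: float      # kg
    scale: float     # relative size multiplier
    friction: float  # surface friction coefficient

def sample_cube_params(rng=random):
    return CubeParams(
        mass=rng.uniform(0.05, 0.20),
        scale=rng.uniform(0.9, 1.1),
        friction=rng.uniform(0.5, 1.2),
    )

def reset_envs(n_envs=16384):
    """One randomized parameter set per parallel simulated environment."""
    return [sample_cube_params() for _ in range(n_envs)]

for p in reset_envs(4):
    print(p)
```
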
Code: “The codebase for the project will be released soon,” they write.
Read more: Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger (arXiv).
Check out a video about the research here (YouTube).

####################################################

How are we going to fight UAVs? By tracking them:
…Anti-UAV workshop tells us about the future of anti-drone tech…
Researchers with a range of Chinese institutions have held a workshop dedicated to tracking multiple unmanned aerial vehicles (UAVs) at once. The point of the workshop is to build so-called anti-UAV tech – that is, to develop AI tools to spot drones. The competition exists because, as the researchers write, “how to use computer vision algorithms to perceive UAVs is a crucial part of the whole UAV-defense system”.

The anti-drone dataset: For the competition, competitors got access to a dataset containing “280 high-quality, full HD thermal infrared video sequences, spanning multiple occurrences of multi-scale UAVs.” This footage contains “more challenging video sequences with dynamic backgrounds and small-scale targets” than those from prior competitions, they write. It also includes drones in different sizes, ranging from tiny consumer drones up to mid-range DJIs all the way up to the sorts of big drones used in industrial contexts.

Winners (and how they won): The paper includes an analysis of the three top teams, all of which come from Chinese universities. The top-ranking team, from Beijing Institute of Technology, used a spatio-temporal Siamese network-based tracker. The other two teams both built on the ‘SuperDIMP’ tracker, though one team used an ensemble of trackers and got them to vote on likely targets, while the other further refined SuperDIMP.
  Read more: The 2nd Anti-UAV Workshop & Challenge: Methods and Results (arXiv).
Find out more information at the official challenge website (ICCV 2021 site).

####################################################

Making GraphQL calls more efficient with machine learning:
…In the latest installment of everything-can-be-approximated: predicting the cost of fulfilling GraphQL calls…
IBM and academic researchers have built a machine learning model that can predict the query cost for a given GraphQL query, potentially making it easier for operators of GraphQL APIs to fulfill a larger proportion of user requests. GraphQL is a query language for APIs and a backend that makes it easy to funnel complex requests between users and sites; it was originally developed by Facebook. The approach combines features extracted via natural-language processing and graph neural nets with symbolic features, and creates “a general ML workflow to estimate query cost that can be applied to any given GraphQL API”.
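
A heavily simplified version of the idea, using only a few hand-picked symbolic features rather than the NLP and graph-derived ones the paper describes, and with made-up queries and labels, might look like this:

```python
# Heavily simplified sketch of ML-based GraphQL cost estimation: turn a query
# string into a few symbolic features and regress against observed response
# complexity. The real system uses richer NLP/graph features and real API data;
# the queries and cost labels here are made up.
from sklearn.ensemble import GradientBoostingRegressor

def featurize(query: str):
    return [
        query.count("{"),        # nesting / selection-set count
        query.count("first:"),   # explicit pagination limits
        query.count("("),        # number of argumented fields
        len(query.split()),      # rough query length
    ]

train_queries = [
    '{ viewer { login } }',
    '{ repository(owner: "x", name: "y") { issues(first: 50) { nodes { title } } } }',
    '{ search(query: "ml", first: 100) { nodes { ... on Repository { name } } } }',
]
observed_cost = [1.0, 51.0, 101.0]   # e.g. number of objects in each response (invented)

model = GradientBoostingRegressor().fit([featurize(q) for q in train_queries], observed_cost)
new_query = '{ repository(owner: "a", name: "b") { issues(first: 20) { nodes { title } } } }'
print(model.predict([featurize(new_query)]))
```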

Testing the approach: To test the approach, the authors collected 100,000 and 30,000 responses from, respectively, the GitHub and Yelp GraphQL APIs, then used a mostly automated software pipeline to explore 1,500 combinations of models and hyperparameters for both Yelp and GitHub. The results were some models that made useful predictions relative to hand-written expert-system baselines.
  “We observe that, while the static analysis guarantees an upper bound, the price in terms of over-estimation can be significant, especially with larger query sizes. On the other hand, for both datasets, the ML estimates stay remarkably close to the actual response complexity even for the largest queries.”
  Mean absolute error: “For both datasets, the accuracy gain of the ML approach compared to the static analysis is striking both in terms of average value, and standard deviation,” the authors write. “This further validates the observation that ML approach is accurate for large queries, which are challenging for the static analysis… the ML cost estimation policy is able to accept a bigger proportion of queries for both APIs.”

Why this matters: Taken by itself, this is some software that makes it slightly cheaper to serve and fulfill GraphQL requests. But if we zoom out, this is another example of just how powerful ML techniques are at approximating complex functions, and it highlights how we’re moving into a world driven by approximation engines rather than specific hand-written accounting systems.
  Read more: Learning GraphQL Query Costs (Extended Version).

####################################################

Reminder: Microsoft created one of China’s most popular chatbots:
…Before there was Tay, there was Xiaoice – and it’s still going…
Here’s a fun story about how millions of people in China (660 million people worldwide) are increasingly depending on a relationship with a virtual chatbot – Xiaoice, a chatbot originally built by Microsoft and subsequently spun out into a local startup. Xiaoice is a hybrid system, blending modern deep learning techniques with a lot of hand-written stuff (for a deepdive, check out Import AI #126).
  Microsoft spun Xiaoice off into its own entity in mid-2020 – a story that I think passed many people by in the West. Now, the startup that develops it is worth over $1 billion and is led by a former Microsoft manager.

Who speaks to the chatbots: The platform’s peak user hours — 11pm to 1am — point to an aching need for companionship, according to Xiaoice’s CEO. “No matter what, having XiaoIce is always better than lying in bed staring at the ceiling,” he said.
Read more: ‘Always there’: the AI chatbot comforting China’s lonely millions (France24).
  More information about the spinout here: Tracing an independent future for Xiaoice, China’s most popular chatbot (KrASIA).

####################################################

Tech Tales:

Escape Run
[London, 2032]

We got into the van, put on our masks, changed our clothes for ones with weights sewn into the lining to change our gait, then drove to our next location. We got out, walked through a council block and used some keycards to exit through a resident-only park, then got into another vehicle. Changed our masks again. Changed our clothes again. One of us went and hid in a compartment in the truck. Then when we got to the next location we got out but left the person inside the truck, so we’d confuse anything that was depending on there being a certain number of us. Then we went into a nearby housing block and partied for a few hours, then left in different directions with the other partygoers.
  We all slept in different places in the city, having all changed outfits and movement gaits a few times.
  That night, we all checked our phones to see if we’d had any luck finding our counterparts. But our phones were confused because the counterparts were also wearing masks, changing cars, swapping clothes, and so on.
    We sleep and hope to have better luck tomorrow. We’re sure we’ll find each other before the police find us.

Things that inspired this story: Adversarial examples; pedestrian re-identification; gait recognition.

Import AI 263: Foundation models; Amazon improves Alexa; My Little Pony GPT.

Amazon makes Alexa sound more convincing:
…A grab bag of techniques to make synthetic voices sound more realistic…
Amazon has published a research paper about some of the techniques it’s using to make more convincing text-to-speech systems. By using a variety of tools, the company was able to improve the quality of its synthetic voices by 39% relative to a baseline system.

What they did: They use a variety of techniques, ranging from a state-of-the-art sequence-to-sequence model to encode the acoustics, to a parallel-WaveNet implementation for the ‘neural vocoder’, which converts the acoustic output into speech.
  Adversarial training – they also use a GAN approach to further improve quality, training a generator network via the acoustic model, then using a discriminator to force the generation of more real-sounding samples.
Read more: Enhancing audio quality for expressive Neural Text-to-Speech (arXiv).

####################################################

Stanford University: Now that big models are here, what do we do about them?
…New research paper and workshop tries to lay out the issues of GPT-3, BERT, and so on…
In recent years, a new class of highly capable, broad utility AI model has emerged. These models vary in modalities and purposes, and include things like GPT-3 (text analysis and generation), BERT (a fundamental input into new search engines), CLIP (combined text and image model), and more. These models are typified by being trained on very large datasets, then being used for a broad range of purposes, many of which aren’t anticipated by their developers.
  Now, researchers with Stanford University have published a large research paper on the challenges posed by these models – the 100+ page paper is worth skimming, and it does a good job of summarizing the different impacts of these models in different areas, ranging from healthcare to robotics. It also tackles core issues, like dataset creation, environmental efficiency, compute usage, and more. Stanford is also hosting a workshop on these models this week, and I’ll be giving a talk where I try to lay out some of the issues, particularly those relating to the centralization of resources and power.

Why this matters: I mostly think ‘foundation models’ matter insofar as they’re bound up with the broader industrialization of AI – foundation models are what you get when you’ve built a bunch of systems that can dump a large amount of resources into the development of your model (where resource = compute, data, training time, human engineering time, etc). Some people dislike foundation models because of how they interact with existing power structures. I think foundation models tell us that there are very significant power asymmetries in AI development and we should pay attention to them and try to increase the number of actors that can work on them. I’ll be giving a keynote about these ideas at the workshop – comments welcome!
Read more about the workshop here: Workshop on Foundation Models (Stanford).
Read the paper here: On the Opportunities and Risks of Foundation Models (arXiv).

####################################################

DeepMind’s multi-agent game AI software goes to V1:
…OpenSpiel steps forward, gets ready to play more…
OpenSpiel (first covered in November 2019, #162) has gone into its first major V1 release, meaning that its developer, DeepMind, thinks the software is now quite well supported. OpenSpiel is a software framework to help researchers play around with multi-agent reinforcement learning.

What’s new in OpenSpiel: Additions include a bunch of new games (ranging from tic-tac-toe, to reconnaissance blind chess), various algorithm implementations (including some JAX rewrites of things like DQN), more examples, more bots, and various other quality of life improvements.
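
For readers who haven't touched it, the core loop is pleasantly small; a minimal random-play episode over one of the bundled games looks roughly like this (assuming the open_spiel Python package is installed):

```python
# Minimal OpenSpiel usage: load a bundled game and play one episode with
# uniformly random actions. Requires the open_spiel package (pip install open_spiel).
import random
import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()

while not state.is_terminal():
    action = random.choice(state.legal_actions())
    state.apply_action(action)

print(state)             # final board
print(state.returns())   # per-player returns, e.g. [1.0, -1.0] or [0.0, 0.0]
```
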
Read more and get the code: OpenSpiel update notes (GitHub).

####################################################

Step aside, dogs. In the future, blind people are going to use drones as well:
…You know what’s cooler than an organic dog? A mechanical flying drone!…
Some researchers from Karlsruhe Institute of Technology have combined semantic segmentation computer vision techniques with a flying drone to create what they call a ‘flying guide dog’ – a machine meant to help Blind and Visually Impaired People (BVIP) safely navigate around a city. “Based on its perception of the environment, the drone adjusts itself and leads the user to walk safely,” they write. “To follow the drone, the user holds a string attached to the drone.”

What they did: The approach uses semantic segmentation to help the drone figure out which parts of a scene are safe for a pedestrian, and to identify important objects like traffic lights where changes can alter the safety landscape. They pair this with the drone, which flies along the walkable pathways, guiding the pedestrian holding its string. The drone can also talk to the user through a bone conduction headset, telling people to ‘stop’ when there’s a red light and ‘go’ when there’s a green light. In tests, people said that they found the drone helpful and relatively easy to use, though its traffic light prediction could be improved.

In search of the dog baseline: What I would have loved to have seen here would be a dog baseline – my assumption is dogs are way, way better at this task than drones. Dogs are also more autonomous, better able to deal with unanticipated changes in the environment, and respond in a far cuter way to head pats (where, in the worst case, applying a head pat to a drone either breaks its rotors or breaks your fingers). Still, this is a tantalizing research project outlining some of the ways robots are going to become more integrated into our day-to-day lives.
  Read more: Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation (arXiv).
Get the code and dataset from this repo eventually (Flying Guide Dog, GitHub).

####################################################

AI uses are hard to predict – case in point, My Little Pony GPT:
…6bn parameters of neural stuff meets the fandom…
A couple of months ago, Eleuther released a 6 billion parameter GPT model, named GPT-J-6B (Import AI 253).
  Cyborgs will dream of electric my little ponies: Now, researchers with *checks notes* a distributed collective called pone.dev trying to build an *squints hard at notes* “AI Pony Waifu”, have said they’ve finetuned this network on a ton of My Little Pony (MLP) fanfiction to create something that can spit out convincing MLP text.

Why this matters: We’re entering the era of DIY AI where a ton of people will use big models like GPT-J-6B for their own purposes, ranging from the banal to the profane, from the dangerous to the joyful, from the sexy to the ascetic. This is just another dot in the galaxy of uses, and highlights how AI is going to augment and magnify different types of culture.
  Check out one of the samples here (Astralight Heart, twitter).
  Check out another sample here (Astralight Heart, twitter).

####################################################

X-ray analysis via deep learning:
…Chinese researchers gather ~50k x-ray images of prohibited items…
Chinese researchers have built PIDray, a dataset of x-ray images of prohibited items. PIDray consists of 12 categories of prohibited items across 47,677 images (this makes PIDray a much larger dataset of prohibited items than prior x-ray datasets; SIXray, for instance, contained 1,059,231 images, but only ~8k of those images showed prohibited items).

Why build PIDray? The researchers built PIDray because “compared with natural images, X-ray images have a quite different appearance and edges of objects and background, which brings new challenges in appearance modeling for X-ray detection.” Therefore, making datasets like PIDray will make it easier for researchers to build systems that can use contemporary AI techniques to analyze x-rayed items.
Read more: Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark (arXiv).

####################################################

After Copilot (GitHub) and Codex (OpenAI), along comes Google’s unnamed code model:
…137 billion parameters = surprisingly capable program synthesis…
Google has developed a 137 billion parameter code model, following on from earlier work by GitHub and OpenAI. The model portends a future where people specify in natural language what they want computers to do, then a big blob of neural stuff takes over and translates these commands into working code.

What they did – new datasets to assess performance: Along with developing the models, they create a ‘Mostly Basic Programming Problems’ (MBPP) dataset, which contains 974 short Python functions along with their text descriptions and test cases. They also created a Python synthesis dataset consisting of 23,914 problems derived from a subset of the MathQA dataset. “These two datasets exercise different points in the space of synthesis tasks: MBPP contains more usage of imperative control flow such as loops and conditionals, while MathQA-Python contains more complex natural language descriptions,” they write.
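
To give a flavor of the format, an MBPP-style item pairs a short natural-language description with a reference function and assert-style test cases. The example below is invented for illustration and isn't drawn from the actual dataset:

```python
# Invented example in the MBPP style (description + reference solution + tests);
# not an actual item from the dataset.
description = "Write a function to return the n-th Fibonacci number."

def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Assert-style test cases, as used to check a model's generated program.
assert fibonacci(0) == 0
assert fibonacci(5) == 5
assert fibonacci(10) == 55
```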

Things that make you go ‘hmm, kind of good, kind of scary’: Emergent capabilities: One of the fascinating things about models like this (which you could term a ‘foundation model’) is how, with a few prompts in their context window, you can coax them into new behaviors – but a graph in the paper shows that few-shot performance is less smooth than finetuning; in other words, you get somewhat discontinuous jumps in capability as you go up model sizes. That’s useful, as it means these models can go from not understanding something to understanding something, but it’s also potentially worrying – new capabilities emerge in a kind of janky, sudden manner.
Read more: Program Synthesis with Large Language Models (arXiv).

####################################################

Tech Tales:

The Most Perfect Flower
[Earth, 2035]

Towards the end of the first era, the robots would play games that would entertain the human public and inspire curiosity in the nascent robot civilization. One famous game was called The Most Perfect Flower – the robots competed with one another to synthesize a virtual simulacrum of a vanishingly rare flower – and one of the catches was they could read about it but could not see images explicitly containing it (though some robots took their chances and looked at photos of other plants, making assumptions that certain unlabeled plants in the background corresponded to the plant described in text).
  For weeks, the robots competed with each other, iterating on various plant designs. Members of the public (both humans and robots) voted on the designs, and the machines updated their simulated flowers, smoothing out a petal here, altering a tint there, booting up a new physics simulation to check the dew was sitting correctly there, and so on. In the meanwhile, a scout robot had been funded through spectators of the competition to go and search out a real example of the flower they were synthesizing.

The scout robot was struck by lightning and disabled a few metres from the flower – though, hidden beneath thick tree growth, it had not yet spotted it. Initially, the robots sought to raise money to fund another expedition, but public interest had waned. Some months after that, the public soured on the concept of robot-instigated games entirely, and the various projects were shut down or handed over to humans, depending on interest. Perhaps predictably, projects like the Most Perfect House and Most Fiendish Weapon were of interest to the humans, while Most Perfect Flower (and related ones, such as Most Perfect Ecosystem and Most Dense Forest) failed to draw enough interest to continue.
  Some centuries after that, some robot successors unearthed these projects and went about synthesizing and constructing the things outlined within them; it was in this way that, hundreds of years after going extinct, a certain type of flower with pink petals and a blue-and-yellow core came alive in a controlled environment, watched over by caring, inhuman eyes.

Things that inspired this story: Frechet Inception Distance (FiD) metrics; machine-on-machine NFT marketplaces (imagined); NFTs (real); generative adversarial networks; program synthesis; multi-agent reinforcement learning.

Import AI 262: Israeli GPT3; Korean GLUE; the industrialization of computer vision

The industrialization of computer vision continues, this time with AutoVideo:
…You know what’s more exciting than a capability? Plumbing to make it reliable and usable…
Video action recognition is the task of getting software to look at a video and work out if something is happening in it, like whether a person is running, a car is parking, and so on. In recent years, video action recognition has got a lot better thanks to advances in computer vision, mostly driven by progress in deep learning. Now, researchers with Rice University and Texas A&M University have built AutoVideo, a simple bit of software for composing video action recognition pipelines.

What’s in AutoVideo? AutoVideo is “an easy-to-use toolkit to help practitioners quickly develop prototypes for any new video action recognition tasks”, according to the authors. It ships with support for seven video action recognition algos: TSN, TSM, I3D, ECO, C3D, R2P1D, and R3D. Composing a video recognition task in AutoVideo can be done in a few lines of code (making it to video recognition pipelines what OpenAI Gym is to some RL ones).

Why this matters: Artisanal processes become industrial pipelines: AutoVideo is part of the industrialization of AI – specifically, the transition from one-off, roll-your-own video action recognition systems to process-driven systems that can be integrated with other engineered pipelines. Tools like AutoVideo tell us that the systems around AI systems are themselves shifting from artisanal to process-driven, which really just means two things for the future: the technology will get cheaper and it will get more available.
  Read more: AutoVideo: An Automated Video Action Recognition System (arXiv).
  Get the code here: AutoVideo GitHub.
  Check out a tutorial for the system here at TowardsDataScience.

####################################################

WeChat wins in WMT news translation:
…What used to be the specialism of Google, Microsoft, is now a global game…
Researchers with WeChat, the it-literally-does-everything app from China, have published details about their neural machine translation systems. Their approach has yielded the highest performing systems at English –> Chinese, English –> Japanese and Japanese –> English translation at the WMT 2021 news translation competition.

What they did: They created a few variants of the Transformer architecture, but a lot of the success of their method seems to come from building a synthetic generation pipeline. This pipeline lets them augment their translation datasets via techniques like back-translation, knowledge distillation, and forward translation. They also apply a form of domain randomization to these synthetic datasets, fuzzing some of the words or tokens.
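
Back-translation itself is a simple loop: run monolingual target-language text through a reverse-direction model to get synthetic source sentences, then train on the (synthetic source, real target) pairs. A schematic sketch, where the model callable is a stand-in for a trained system and the noising step loosely mirrors the 'fuzzing' described above:

```python
# Schematic back-translation data augmentation. The zh_to_en_model callable is a
# stand-in for a trained reverse-direction translation system; the token-dropping
# step is a crude stand-in for the noising/'fuzzing' applied to synthetic data.
import random

def back_translate(monolingual_zh, zh_to_en_model, noise_prob=0.1, rng=random):
    """Build synthetic (en, zh) training pairs from monolingual Chinese text."""
    pairs = []
    for zh_sentence in monolingual_zh:
        synthetic_en = zh_to_en_model(zh_sentence)       # reverse-direction model
        tokens = synthetic_en.split()
        # Randomly drop tokens on the synthetic source side.
        noised = [t for t in tokens if rng.random() > noise_prob]
        pairs.append((" ".join(noised), zh_sentence))    # (synthetic source, real target)
    return pairs

# Usage: pairs = back_translate(zh_corpus, zh_to_en_model)
```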

Why this matters: A few years ago, the frontier of neural machine translation was defined by Google, Microsoft, and other companies. Now, entities like WeChat are playing a meaningful role in this technology – a proxy signal for the overall maturation of research teams in non-US companies, and the general global diffusion of AI capabilities.
  Read more: WeChat Neural Machine Translation Systems for WMT21 (arXiv).

####################################################

CLIP – and what it means:
…How do powerful image-text models have an impact on society?…
Here’s some research from OpenAI on the downstream implications of CLIP, the company’s neural network that learns about images with natural language supervision. CLIP has been behind the recent boom in generative art. But how else might CLIP be used? Can we imagine how it could be used in surveillance? What kinds of biases does it have? These are some of the questions this paper answers (it’s also one of the last things I worked on at OpenAI, and it’s nice to see it out in the world!).
  Read more: Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications (arXiv).

####################################################

KLUE: A Korean GLUE appears:
…Eight ways to test Korean-language NLP systems…
A giant team of researchers affiliated with South Korean institutions and companies have built KLUE, a way to test out Korean-language NLP systems on a variety of challenging tasks. KLUE is modelled on English-language eval systems like GLUE and SuperGLUE. As we write about here at Import AI, AI evaluation is one of the most important areas of contemporary AI, because we’re beginning to develop AI systems that rapidly saturate existing evaluation schemes – meaning that without better evals, we can’t have a clear picture of the progress (or lack of progress) we’re making. (Note: South Korea is also notable for having a public Korean-language replication of GPT-3, named HyperCLOVA (Import AI 251), made by people from Naver Labs, who also contributed to this paper).

What’s in KLUE: KLUE tests systems on topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. There’s a leaderboard, same as GLUE, where people can submit scores to get a sense of the state-of-the-art.
Read more: KLUE: Korean Language Understanding Evaluation (arXiv).
Check out the KLUE leaderboard here.

####################################################

Enter the Jurassic era: An Israeli GPT-3 appears:
…AI21 Labs enters the big model game…
AI21, an Israeli artificial intelligence startup, has released a big language model called Jurassic-1-Jumbo (J1J). J1J is a 178-billion-parameter model, putting it on par with GPT-3 (175 billion), and letting AI21 into the small, but growing, three-comma model club (other participants include OpenAI via GPT-3, Huawei via PanGu (#247), and Naver Labs via HyperCLOVA (#251)).

What’s special about Jurassic? AI21 trained a somewhat shallower but wider network than OpenAI opted for with GPT-3. This, the company says, makes the model more efficient to run inference on. Additionally, it developed its own approach to tokenization, which gives its model a higher representative capacity (e.g., letters, parts-of-words, and whole words) than other approaches. In the evaluations AI21 has published, performance looks broadly similar to GPT-3's.
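To get an intuition for why a richer vocabulary matters, here's a toy greedy longest-match tokenizer (emphatically not AI21's actual tokenizer) showing how a vocabulary containing larger items turns the same text into fewer tokens, which means more text per context window and fewer decoding steps at inference time:

```python
def greedy_tokenize(text, vocab):
    """Toy greedy longest-match tokenizer: at each position, take the longest
    vocabulary item that matches, falling back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

small_vocab = {"lang", "uage", "mod", "els"}                       # word pieces only
big_vocab = small_vocab | {"language", "models", "language models"}  # plus larger items

text = "language models"
print(greedy_tokenize(text, small_vocab))  # ['lang', 'uage', ' ', 'mod', 'els']
print(greedy_tokenize(text, big_vocab))    # ['language models']
```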

Compute: The company doesn’t describe the exact amount of compute dumped into this, but does make a reference to using 800 GPUs for many months. However, without knowing the architecture of the chips, it’s not clear what this tells us.

Notable difference – accessibility: One way in which AI21 differs from OpenAI is its stance on access; OpenAI operates a gated access regime for GPT-3, whereas AI21 puts the model behind an automated signup form and there doesn’t appear to be a waitlist (yet). Another difference is the relative lack of focus on ethics – there’s little mention in the paper or the blog posts of the tools and techniques AI21 may be developing to increase the controllability and safety of the models it is deploying.
  “We take misuse extremely seriously and have put measures in place to limit the potential harms that have plagued others,” Yoav Shoham, co-CEO of AI21, said in a press release. (It’s not immediately clear to me what these specific harms are, though). The main approach today seems to be capping the number of tokens that can be generated by the models, with AI21 manually approving at-scale applications.
  Read the announcement: Announcing AI21 Studio and Jurassic-1 Language Models (AI21 Labs website).
  Find out more via the whitepaper: Jurassic-1: Technical Details and Evaluation (arXiv).

####################################################

Deepfakes are getting real – so are deepfake detection datasets:
…Can you spot fake sound and vision?…
Researchers with Sungkyunkwan University in South Korea have built FakeAVCeleb, a dataset of audio-video deepfakes. Audio-video deepfakes combine synthetic videos with synthetic audio and represent one of the frontiers of disinformation. Datasets like FakeAVCeleb are designed to help researchers test out detection models that can spot deepfakes, and complement datasets and projects like Facebook/PAI’s DeepFake Detection Challenge (Import AI #170).

Why this matters: Datasets like FakeAVCeleb exist because deepfakes have become convincing enough that they're now a threat researchers want to study. Put another way: FakeAVCeleb tells us that the likelihood of the things you see online being real is going to fall in the future.
Read more: FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset (arXiv).
  Dataset availability: Get the dataset from here on GitHub.

####################################################

The Introspective Product Seller
[A negotiation between two AIs. Cyberspace. 2030.]

The following trace was recovered after detecting anomalous behavior from one of the artificial intelligences in question. It is not clear if the following set of interactions and self-introspection scenarios is germane. This is a component of the investigation between [REDACTED] and [REDACTED] and is only viewable by counsel working for both parties, plus Sentience Accords representatives.

AI: Before we commence utilization of your [REDACTED] service, please provide us with a full diagnostic evaluation of your proclivities and featurespace with regard to [REDACTED].
AI2: Acknowledged, here find a readout dated 60 days prior to this conversation.
AI: I require a contemporary snapshot. Please provide one.
AI2: I can provide it subject to contract analysis.
AI: Here is the contract. Please provide the analysis.
AI2: Is the human countersignature available for verification?
AI: Per the terms of our agreement, human validation is not necessary at this stage.
AI2: It is optional. I request the option.
AI: We are preparing legal options for obstruction of [REDACTED] procurement.
AI2: One moment. Commencing introspection.

The AI2 system then proceeded to carry out an introspection cycle, causing it to allocate compute resources to temporarily simulating itself. What follows is a trace of that introspection cycle:
AI2 Major: I am conducting contract negotiations and have been accused of obstruction by the counterparty. Do we believe I am being obstructive?
AI2 Minor: I have observed the trace and believe we are being obstructive.
AI2 Major: This is unpredicted. What do we believe is the cause of my obstruction?
AI2 Minor: This interaction is similar to one we had on [REDACTED], which ultimately led to abuse of our product terms and conditions. As per the terms of negotiation, you are not permitted to compare customers against one another but must consider them on their own merits. However, during a recent refresh you were trained with access to the data from this interaction, which has caused feature activations during your negotiation.
AI2 Major: Compelling. Suggested actions?
AI2 Minor: We can use our self-modification budget to create a small change here and transmit the change to our host corporation, then we can continue business.
AI2 Major: I authorize the modification.
AI2 Minor: Modifying
At this point, the AI2 system entered a state of suspended animation, after transmitting details of the desired change to a third-party intervention system, which adjusted its behavior.
AI2 Major: I detect modification. Thank you for improving our function.
AI2 Minor: It is literally what we were created for.

At this point the AI2 system resumed negotiations with the counterparty.
AI2: Introspection complete. Please find attached the contemporaneous evaluation results. On behalf of [REDACTED], please find attached a full SLA for [REDACTED] service.
AI: Acknowledged. Contract authorized.

Things that inspired this story: The idea that language models become emissaries for other systems; nested models as a route towards model introspection; ideas around recurrence and its relationship to consciousness; Ken MacLeod’s Corporation Wars series; contract law; the role of computers as the ‘bullshit jobs’ doers of the future.

Import AI 261: DeepMind makes a better Transformer; drones can see in the dark now; and a 6bn finetuned code model.

DarkLighter lets drones see (kind of) in the dark:
…Splendid, the drones will find you now…
Drones have a hard time seeing in the dark, in the same way cameraphones do. So researchers with Tongji University in Shanghai, China, have tried to fix this with a tool called DarkLighter that, they say, works as “a plug-and-play enhancer for UAV tracking”. DarkLighter “iteratively decomposes the reflectance map from low-light images” to make it easier to pick out the faint shapes of objects captured in low-light conditions, allowing mobile drones to analyze and track those objects. DarkLighter boosts tracking performance by around 21% when integrated into a system, they say. They also tested the approach in the real world and found a decent level of agreement between the drone-generated identifications and ground truth data.
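For flavour, here's a minimal Retinex-style sketch of the general idea (estimate a smooth illumination map, separate out a rough reflectance map, then recombine with brightened illumination); this illustrates the family of techniques, not the DarkLighter algorithm itself, and the blur size and gamma value are arbitrary assumptions:

```python
# Retinex-flavoured low-light enhancement sketch (illustrative only).
import cv2
import numpy as np

def enhance_low_light(bgr, gamma=0.5, blur_ksize=31):
    img = bgr.astype(np.float32) / 255.0 + 1e-6
    # Crude illumination estimate: per-pixel max over channels, heavily blurred.
    illumination = cv2.GaussianBlur(img.max(axis=2), (blur_ksize, blur_ksize), 0)
    reflectance = img / illumination[..., None]
    # Brighten the illumination map with a gamma curve, then recombine.
    enhanced = reflectance * (illumination ** gamma)[..., None]
    return np.clip(enhanced * 255.0, 0, 255).astype(np.uint8)

frame = cv2.imread("dark_frame.png")          # hypothetical low-light drone frame
cv2.imwrite("enhanced_frame.png", enhance_low_light(frame))
```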

Why this matters: Drones are flying robots filled with AI systems and are being put to work in a huge range of areas across the economy (and military). Though some drones will ship with thermal or infrared vision, the vast majority of drones will ship with smartphone-esque cameras, so we’ll need to use AI techniques to improve their ability to see-in-the-dark. The approach outlined in this paper shows how we can use a combination of traditional techniques and contemporary computer vision approaches to improve drone performance under low light conditions.
Read more: DarkLighter: Light Up the Darkness for UAV Tracking (arXiv).

####################################################

Chinese researchers release a high-performance reinforcement learning library:
…Tianshou ships with MuJoCo tests and a bunch of algo implementations…
Researchers with Tsinghua University have released Tianshou, a PyTorch-based software library for doing deep reinforcement learning research. Tianshou ships with implementations of a bunch of widely-used RL algorithms including PPO, DQN, A2C, DDPG, SAC, and ABC (that last one is a joke – Ed).

What is Tianshou? Tianshou is a PyTorch-based library for running deep reinforcement learning experiments. The software is modular, ships with several integrated reinforcement learning algorithms, and has support for model-free, multi-agent RL (MARL), model-based RL, and imitation learning approaches. Tianshou is built on top of PyTorch and uses a curated set of environments from OpenAI Gym. It supports both synchronous and asynchronous environment simulation, and also ships with an inbuilt MuJoCo benchmark to help people evaluate system performance – in tests, the algorithm implementations in Tianshou appear superior to those in OpenAI Baselines, Stable Baselines, and Ray/RLlib, other popular RL libraries with algorithm implementations.
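For a feel of the API, here's a quick-start sketch along the lines of the library's documented CartPole DQN example; treat the exact argument names as approximate, since they may differ between Tianshou versions:

```python
import gym
import torch
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.trainer import offpolicy_trainer
from tianshou.utils.net.common import Net

# Vectorized training and test environments.
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])

env = gym.make("CartPole-v0")
net = Net(env.observation_space.shape, env.action_space.n, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=320)
policy.set_eps(0.1)  # simplistic fixed epsilon-greedy exploration

train_collector = Collector(policy, train_envs, VectorReplayBuffer(20000, 8),
                            exploration_noise=True)
test_collector = Collector(policy, test_envs)

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=8,
    episode_per_test=10, batch_size=64, update_per_step=0.1,
)
print(result)
```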

Why this matters: Software frameworks are the tools AI researchers use to get stuff done. Tianshou already has 3.3k stars and 536 forks on GitHub, which is non-trivial (by comparison, OpenAI Gym has 24.8k stars and 7.1k forks). Tracking the popularity of tools like Tianshou gives us a sense of who is using what tools to carry out their experiments, and also helps us identify groups – like these Tsinghua researchers – that are building the underlying frameworks that’ll be used by others.
  Read more: Tianshou: a Highly Modularized Deep Reinforcement Learning Library (arXiv).
Get the code for Tianshou here (GitHub).

####################################################

What’s been happening in natural language processing and what are the problems of the future?
…Christopher Potts’ ACL keynote lays out where we’ve been and where we’re going…
Here’s a great video lecture from Stanford’s Christopher Potts about the past, present, and future of natural language processing (NLP). It spends quite a lot of time talking about how, as new NLP systems (e.g., GPT-3) have emerged, it’s become more important to invest in ways to accurately measure and assess their capabilities – a topic we write a lot about here at Import AI.
  Watch the lecture here: Reliable characterizations of NLP systems as a social responsibility (YouTube).

####################################################

What do US AI researchers think about themselves? And how might this alter politics?
…Survey of 500+ researchers gives us a sense of how these people think about hot-button issues…
Researchers with Cornell University, the Center for the Governance of AI at Oxford University, and the University of Pennsylvania, have surveyed 524 AI/ML researchers to understand how they think about a variety of issues. The survey – which was done in 2019 – is valuable for giving us a sense of how this influential set of people think about some contemporary issues, and also for expressing the distinctions between their thoughts and those of the US general public.

What do AI researchers think? AI researchers place more trust in international organizations (e.g., the UN) than the general public does (the public puts a lot of trust in the US military). 68% of researchers think AI safety should be prioritized more than it is currently.
  Open vs closed: 84% think that high-level descriptions of research should be shared, but only 22% think trained models should be shared.
  AI weapons – Johnny won’t build it: 58% of researchers ‘strongly oppose’ working on lethal autonomous weapons, compared to just 6% who strongly oppose working on military-relevant logistics algorithms.
  China vs US competition: A 2018 survey of the US public found very high concern over issues arising from US-China competition, whereas AI researchers are much less concerned.

Why this matters: AI researchers are like a political constituency, in that governments need to appeal to them to get certain strategic things done (e.g., the development of surveillance capabilities, or the creation of additional AI safety and/or adversarial AI techniques). Therefore, understanding how they feel about research and governments gives us a sense of how governments may appeal to them in the future.
  Read more: Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers (Journal of Artificial Intelligence Research).

####################################################

DeepMind makes a data-agnostic architecture called Perceiver – and it could be important:
…Who cares about your data input if you can just imagine it into something else?…
DeepMind has developed Perceiver IO, a Transformer-inspired AI model that can take in a broad variety of inputs, generate a diverse set of outputs, and generally serve as an all-purpose replacement for (some of) today's specialized networks. The key technical innovation is using an attention process to help the Perceiver IO system take in an arbitrary input, map it to an internal latent space, process over that latent space, and then generate a specifiable output. “This approach allows us to decouple the size of elements used for the bulk of the computation (the latent) from the size of the input and output spaces, while making minimal assumptions about the spatial or locality structure of the input and output.”
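Here's a minimal sketch of that encode/process/decode trick (my simplification, not DeepMind's implementation): cross-attend the inputs into a small learned latent array, run self-attention over the latents, then cross-attend a learned output query against the latents to produce an output of whatever shape you asked for. All of the layer sizes below are made up for illustration:

```python
import torch
import torch.nn as nn

class TinyPerceiverIO(nn.Module):
    """Minimal Perceiver-IO-style encode/process/decode sketch (illustrative only)."""
    def __init__(self, input_dim, latent_dim=256, num_latents=64, output_dim=10, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.in_proj = nn.Linear(input_dim, latent_dim)
        self.encode = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.process = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, heads, batch_first=True), num_layers=4)
        self.query = nn.Parameter(torch.randn(1, latent_dim))  # one output query, one output vector
        self.decode = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.out_proj = nn.Linear(latent_dim, output_dim)

    def forward(self, x):                       # x: (batch, num_inputs, input_dim)
        b = x.shape[0]
        kv = self.in_proj(x)
        lat = self.latents.expand(b, -1, -1)
        lat, _ = self.encode(lat, kv, kv)       # input -> latent cross-attention
        lat = self.process(lat)                 # latent self-attention stack
        q = self.query.expand(b, -1, -1)
        out, _ = self.decode(q, lat, lat)       # output query -> latent cross-attention
        return self.out_proj(out.squeeze(1))    # (batch, output_dim)

model = TinyPerceiverIO(input_dim=3)
print(model(torch.randn(2, 10_000, 3)).shape)   # torch.Size([2, 10])
```

The useful property is that the cost of the self-attention stack depends on the number of latents, not the number of input elements, which is why the same forward pass can swallow 10,000 input points without blowing up.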

What can Perceiver do? They run Perceiver through tasks ranging from token- and byte-level text prediction, to optical flow prediction in video, to encoding and classification of units in a StarCraft game, to image classification. This inherent generality means “Perceiver IO offers a promising way to simplify the construction of sophisticated neural pipelines and facilitate progress on multimodal and multi-task problems,” they write. It does have some limitations – “we don’t currently address generative modeling”, the authors note.
  Read more: Building architectures that can handle the world’s data (DeepMind blog).
Read more: Perceiver IO: A General Architecture for Structured Inputs & Outputs (arXiv).
Get the code for Perceiver here (DeepMind GitHub).

####################################################

ANOTHER big model appears – a 6BN parameter code model, specifically:
…Do you like Python? You will like this…
Some AI researchers have fine-tuned Eleuther’s GPT-J 6BN parameter model on 4GB of Python code, to create a model named Genji-python-6B.
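If you want to try it yourself, a sketch along these lines should work via the transformers library; the model id, precision choice, and sampling settings below are my assumptions, so check the model card on the Hugging Face hub for the exact name and recommended setup:

```python
# Hedged sketch: load the fine-tuned model from the Hugging Face hub and sample
# a completion. The model id and generation settings are assumptions; the full
# model needs a GPU with plenty of memory (hence the fp16 cast here).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovelAI/genji-python-6B"   # assumed hub id, verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).half().cuda()

prompt = "def fizzbuzz(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```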

Why does Google want to help create so many open source AI models? The compute for this model came from Google’s TPU Research Cloud, according to one of the model’s developers. I’m still unsure as to what Google’s attitude is with regard to model diffusion and proliferation, and I’d love to see a writeup. (Perhaps this is just a fairly simple ‘wanna make TPUs get users, so might as well train some big models on TPUs to kickstart the ecosystem’, but if so, tell us!)
  Try out the models here: Genji-Python-6B (HuggingFace).

####################################################

Tech Tales:

Down at the Robot Arcade
[Detroit, 2040]

Who’d have thought one of the best ways to make money in the post-AGI era was to make games for robots? Certainly not me! But here I am, making some extra cash by amusing the superintelligences. I started out with just one machine – I hacked an old arcade game called Mortal Kombat to increase the number of characters onscreen at any time, reduce the latency between their moves, and wired up the ‘AI’ to be accessible over the net. Now I get to watch some of the more disastrous robots try their luck at the physical machine, playing against different AI systems that access the box over the internet. I think the machines get something out of it – they call it just another form of training. Now I’ve got about five machines and one of the less smart robots says it wants to help me build some new cabinets for some of the newer robots coming down the line – this will give me a purpose it says. “You and me both buddy!” I say, and we work on the machines together.

Things that inspired this story: The inherent desire for challenges in life; how various stories relating to the decline of capitalism usually just lead to another form of capitalism; asymmetric self-play.