Import AI 270: Inspur makes a GPT3; Microsoft’s half-trillion parameter model; plus, a fair surveillance dataset

by Jack Clark

Microsoft trains a 530B model (but doesn’t release it – yet).
…NVIDIA and Microsoft team up to break the half-trillion mark…
Microsoft and NVIDIA have trained a 530 billion parameter GPT-3-style model. This is the largest publicly disclosed dense language model in existence, indicating that the competition among different actors to develop models of the largest scales continues unabated.

Data and evaluations: One of the most intriguing aspects of this release is the data Microsoft uses – The Pile! The Pile is an open source dataset built by the AI-cypherpunks over at Eleuther. It’s quite remarkable that a world-spanning tech multinational doesn’t (seem to?) have a better dataset than The Pile. This suggests that the phenomenon of training on internet-scale, internet-scraped datasets is here to stay, even for the largest corporations. (They also use Eleuther’s ‘lm-evaluation-harness’ to assess the performance of their model – which, unsurprisingly given the resource-intensiveness of the model, is very good).
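For the curious, here’s roughly what plugging a model into the harness looks like. This is a minimal sketch assuming a recent version of the library and a HuggingFace-hosted checkpoint; the model name and task list are placeholders, and the exact entry point differs between harness versions:

    # Minimal sketch of scoring a model with EleutherAI's lm-evaluation-harness.
    # The checkpoint and task names are placeholders; exact arguments vary by version.
    from lm_eval import evaluator

    results = evaluator.simple_evaluate(
        model="hf",                                       # HuggingFace backend
        model_args="pretrained=EleutherAI/gpt-neo-1.3B",  # placeholder checkpoint
        tasks=["lambada_openai", "hellaswag", "piqa"],    # a few standard benchmarks
        num_fewshot=0,                                    # zero-shot evaluation
    )
    print(results["results"])                             # per-task metrics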

Compute requirements: To train the model, Microsoft used 4480 NVIDIA A100s across 560 DGX A100 servers, networked together with HDR InfiniBand.
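As a quick sanity check on those numbers, 4480 GPUs across 560 servers works out to 8 GPUs per node, which is exactly how a DGX A100 is populated:

    # Sanity check on the hardware numbers quoted above.
    total_gpus = 4480
    servers = 560
    print(total_gpus // servers)  # 8, i.e. one fully populated DGX A100 (8x A100) per server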

Things that make you go ‘hmmm’: Despite Microsoft’s partnership with OpenAI, there’s no reference in this blogpost to OpenAI or, for that matter, GPT3. That’s somewhat odd, given that GPT3 is the reference model for all of this stuff and for other efforts (e.g., Inspur’s model, covered below).

Why this matters: “We continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight,” Microsoft writes. “The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language.”
Read more: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model (Microsoft).

####################################################

Worried your surveillance system is biased? Try out EDFace-Celeb-1M:
…Dataset contains 1.7 million pictures, aims to be diverse…
Researchers with the Australian National University, Tencent, and Imperial College London have built a large-scale facial recognition dataset which is meant to help reduce bias from facial upsampling. Specifically, most facial recognition systems take in a low-resolution picture (e.g., a still of someone from a CCTV camera) which then needs to be upscaled to do more sophisticated analysis. But upscaling has problems – if your upscaler hasn’t seen many faces from different racial groups, then you might find your ML system either breaks or alters the race of the face being upscaled towards one better represented in its underlying data. This leads to bias in the facial recognition system in the form of disparate performance for different types of people.
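To make the upscaling step concrete, here’s a minimal sketch of the naive baseline (plain bicubic interpolation via Pillow) that learned face-hallucination models are meant to improve on; the file names are placeholders and this isn’t the paper’s method:

    # Naive upscaling baseline: bicubic interpolation of a low-res face crop.
    # File names are placeholders; learned face hallucination models replace this step.
    from PIL import Image

    low_res = Image.open("cctv_crop.png")                  # e.g. a 16x16 face crop
    high_res = low_res.resize((128, 128), Image.BICUBIC)   # upscale for downstream analysis
    high_res.save("cctv_crop_upscaled.png")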

EDFace-Celeb-1M is a dataset of 1.7 million face photos, spread across more than 40,000 different celebrities. EDFace contains “White, Black, Asian, and Latino” racial groups, according to the authors, with representation consisting of 31.1%, 19.2%, 19.6%, and 18.3%, respectively. The dataset is overall 64% male and 36% female.

Why this matters: Like it or not, surveillance is one of the main uses of contemporary computer vision. This is one of those rare papers that combines the interests of the AI ethics communities when it comes to more equitable representation in datasets, while also serving the surveillance desires of industry and governments.
  Read the paper: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (arXiv).
Get the datasets: EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset (GitHub).

####################################################

A second Chinese GPT3 appears:
…Inspur shows off a GPT3-scale model…
Chinese company Inspur has built Yuan 1.0, a 245B parameter GPT3-style model. This follows Huawei building PanGu, a ~200B GPT3-style model. Taken together, the models indicate that Chinese companies are peers with leading Western AI labs, which should hopefully make it obvious to US policymakers that China should be viewed as a peer in terms of advanced AI R&D.

What they did: When you’re training models of this size, a lot of the hard stuff is plumbing – literally. You need to figure out how to build well-optimized pipelines for training your model on thousands of GPUs, which involves salami-slicing the different stages of model training to maximize hardware efficiency. Similarly, you need to feed these GPUs with data in the right order, further increasing efficiency. The paper includes some nice discussion of how the Inspur researchers tried to do this.
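The ‘salami slicing’ here is pipeline parallelism: cut the model into stages, cut each batch into micro-batches, and stream the micro-batches through the stages so no GPU sits idle for long. Here’s a toy, single-process sketch of the scheduling idea (not Inspur’s actual implementation, which spreads the stages across GPUs and overlaps forward and backward passes):

    # Toy sketch of pipeline parallelism: split the model into stages and the
    # batch into micro-batches, then stream micro-batches through the stages.
    # Real systems place each stage on its own GPU and overlap forward/backward work.
    import torch
    import torch.nn as nn

    stage_1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())  # would live on GPU 0
    stage_2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())  # would live on GPU 1

    batch = torch.randn(64, 512)
    micro_batches = batch.chunk(8)        # 8 micro-batches of 8 examples each

    outputs = []
    for mb in micro_batches:
        h = stage_1(mb)                   # in a real pipeline, stage_1 starts the next
        outputs.append(stage_2(h))        # micro-batch while stage_2 works on this one
    result = torch.cat(outputs)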

Compute: They used 2128 GPUs to train the 245B model, with a context length of 2048 tokens.

Data, via AI helping AI: To train the model, they build a dataset of 5TB of predominantly Chinese text. (By comparison, Huawei’s GPT3 equivalent PanGu was trained on 1.1TB of text, and ERNIE 3.0 was trained on 4TB of data). They train a BERT-style model to help do automatic filtering of the data. Their data comes from Common Crawl, Sogou News, SogouT, Encyclopedia, and Books.
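Classifier-based filtering like this is fairly standard; here’s a hedged sketch of what it looks like in practice. The model name is a placeholder (Inspur’s actual BERT-style filter isn’t public), as is the label scheme:

    # Sketch of classifier-based corpus filtering: score each document with a
    # fine-tuned text classifier and keep the ones judged high quality.
    # "path/to/quality-classifier" and the HIGH_QUALITY label are placeholders.
    from transformers import pipeline

    classifier = pipeline("text-classification", model="path/to/quality-classifier")

    def keep(document: str, threshold: float = 0.9) -> bool:
        result = classifier(document[:2000])[0]   # crude truncation to fit the model's context
        return result["label"] == "HIGH_QUALITY" and result["score"] >= threshold

    corpus = ["...raw Common Crawl document...", "...another document..."]
    filtered = [doc for doc in corpus if keep(doc)]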

How good is it? Yuan 1.0 does well on a variety of standard benchmarks. The most interesting result is on the quality of its text generation – here, the authors adopt the same approach as the original GPT3 paper, where they generate text of different forms and see how well humans can distinguish the generated text from ‘real’ text. The results are striking – humans are just 49.57% accurate at spotting the machine-generated text (compared to 52% for GPT3), meaning Yuan 1.0’s outputs are so good they’re essentially indistinguishable from human-written text. That’s a big deal!
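(For context on why numbers around 50% read as ‘indistinguishable’: the detection task is a balanced binary guess, so a rater with no signal lands at 50% accuracy, which a tiny simulation confirms; the numbers below are synthetic, not the paper’s data.)

    # A rater guessing at random on a balanced human-vs-machine detection task
    # converges on ~50% accuracy, the chance baseline that 49.57% is compared against.
    import random

    random.seed(0)
    labels = [random.choice(["human", "machine"]) for _ in range(100_000)]
    guesses = [random.choice(["human", "machine"]) for _ in range(100_000)]
    accuracy = sum(g == l for g, l in zip(labels, guesses)) / len(labels)
    print(f"{accuracy:.4f}")  # ~0.50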
Read more: Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning (arXiv).

####################################################

What it takes to build a shared robot platform – and what this means for research:
…Max Planck’s robot-cloud shows the shape of future AI research…
A consortium of researchers from institutions including the Max Planck Institute for Intelligent Systems, Stanford, and the University of Toronto have built the robotics equivalent of a cloud computing setup. Specifically, they’ve created a robot testbed hosted at Max Planck which can be accessed over the internet by other researchers around the globe, similar to how we access remote servers and data storage today.

What the platform is: The robot cloud consists of 8 robots, each using the same ‘trifinger’ arrangement. These robots were previously used in the ‘Real Robot Challenge 2020’ (Import AI #252), which served as a competition to assess how clever AI systems for robot manipulation are getting, as well as being a testbed for the robot cloud mentioned here.

Dataset: The authors have also released a dataset containing the recorded data of all the entries from all the teams that took part in the physical tracks of the Real Robot Challenge, amounting to about 250 hours of robot activity. The dataset contains around 10,000 distinct ‘runs’, oriented around a variety of challenging robot tasks. “For each run, the actions sent to the robot as well as all observations provided by robot and cameras are included, as well as additional information like the goal that was pursued and the reward that was achieved,” the authors write.
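The description above is all we’re working from here, so the following is a purely hypothetical sketch of iterating over runs with that structure (actions, observations, goal, reward); the file layout and field names are invented, and the real loading code lives with the dataset:

    # Hypothetical sketch of walking the Real Robot Challenge logs as described:
    # each run pairs actions with robot/camera observations, plus goal and reward.
    # File layout and field names are invented for illustration only.
    import pickle
    from pathlib import Path

    total_reward = 0.0
    for run_file in Path("rrc2020_dataset/").glob("*.pkl"):          # placeholder layout
        with open(run_file, "rb") as f:
            run = pickle.load(f)
        for action, observation in zip(run["actions"], run["observations"]):
            pass                                                     # e.g. build (obs, action) pairs
        total_reward += run["reward"]
    print(total_reward)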

Why this matters: AI is full of power asymmetries, many of which stem from resource asymmetries (some actors have a lot of computers and robots, others have very few). Competitions like this show how academia could carve a path through this resource-intensive jungle; by pooling resources and expertise, universities could collaborate to create shared platforms that facilitate research on expensive and worthy problems.
Read more: A Robot Cluster for Reproducible Research in Dexterous Manipulation (arXiv).
Get the dataset here: Real Robot Challenge 2020 Dataset (Tuebingen University).

####################################################

AI Ethics, with Abhishek Gupta
…Here’s a new Import AI experiment, where Abhishek from the Montreal AI Ethics Institute and the AI Ethics Brief writes about AI ethics, and Jack will edit them. Feedback welcome!…

Is responsible development of technology something that is only accessible to Big Tech?
…The needs of resource-constrained organizations are similar to those of Big Tech, but their challenges differ and need their own attention…

MIT researchers have interviewed staff at some low-resource startups and companies to understand the challenges they face in building responsible technology. Their study explores “the tensions between privacy and ubiquity, resource management and performance optimization, and access and monopolization”, when trying to build responsible AI systems. 

The gap in current literature and challenges: They found that few organizations had success in building and using interpretability tools for AI systems, and that most of the work in the Responsible AI (RAI) space focused on bias and fairness. They also found that a common problem in large technology companies was “deficient accountability and decision-making structures that only react to external pressures,” something that was less applicable to smaller organizations. AI systems from smaller organizations often evoke similar expectations from end users as the more performant systems from Big Tech. In most cases, the resources required to develop such capabilities in-house, or to purchase them off the shelf, remain inaccessible to smaller organizations, entrenching the gap between them and Big Tech. Low data and AI literacy among management at these organizations also leads to inappropriate RAI practices.

Why it matters: As AI systems become more accessible through pretrained models and cloud-based solutions, we need to empower those building products and services on top of them with the ability to address ethical challenges in a way that doesn’t break the bank. Since one of the major challenges seems to be access to expensive compute and storage resources, perhaps initiatives like the National Research Cloud in the US can help to close the gap? Would that help in wider adoption of RAI practices? Maybe more OSS solutions need to be developed that can bridge the tooling gaps. And, finally, AI talent with experience in addressing RAI challenges needs to become more widely accessible, which requires a stronger emphasis in university programs on teaching these essential skills.
Read more: Machine Learning Practices Outside Big Tech: How Resource Constraints Challenge Responsible Development.

####################################################

Tech Tales:

The Most Ethical System
[History book, 2120, assigned as part of background reading for the creation of the ‘Societal Stabilization Accords’]

The technique known as ‘Ethical Fine-Tuning’ (EFT) first emerged in the mid-2020s, as a response to various public relations issues generated by known biases in machine learning systems. EFT let a user calibrate a given AI system to conform to their own morality via a few ‘turns’ of conversation, or other form of high-information interaction.

EFT had been developed following criticism of the white-preferencing, Western world-reflecting traits of many of the first AI systems, which represented a form of morality that by necessity accommodated many mainstream views, and didn’t treat minority views as legitimate.

Companies spent years trying to come up with systems with the ‘right’ values, but all they earned for their efforts was sustained criticism. In this way, most AI companies quickly learned what human politicians had known for millennia – morality is relative to the audience whose favor you’re trying to curry.

After EFT got built, companies adopted it en masse. Of course, there was outcry – some people made AI systems that strongly believed humans should have a fluid gender identity, while others created AI systems that called for a fixed notion of gender. For every position, there was a counter-position. And, over time, as these systems enmeshed with the world, their own ethical values created new ethical problems, as people debated the ‘values’ of these customized machines, and sought to build ones with superior values.

Eventually, EFT techniques were combined with multi-agent reinforcement learning, so that the AI systems were able to propagate values to their own users but, if they were accessed by other humans or AI systems, could quickly calibrate their ethical norms to de-conflict with the other systems they were plugged into. In this way, everyone got access to their own AI systems with the ‘best’ values, and their AI systems learned to mislead other humans and AI systems – all for the sake of harmony.

Of course, this led to the utter destruction of a notion of shared ethics. As a consequence, ethics went the way of much of the rest of human identities in the 21st century – sliced down into ever finer and more idiosyncratic chunks, brought closer to each individual and farther from being accessed by groups of people. People were happy, for a time.

EFTs were ultimately banned under the Societal Stabilization Accords introduced in the late 21st century. Contemporary historians now spend a lot of time generating ‘alternative path futures’, whereby they try to analyze our own society as if EFTs had continued to exist. But it’s hard to make predictions, when everyone is rendered unique and utterly defended by their own personal AI with its own customized morality.

Things that inspired this story: Controversy around AI2’s ‘Delphi’ AI system; thinking about intersection of ethics and morality and AI systems; how our ability to forecast rests on our ability to model people in groups larger than single individuals; how the 21st century tries to turn every aspect of a person’s identity into a customized market of one.