Import AI 239: China trains a massive 10b model, Vicarious does pick&place; the GCHQ publishes some of its thoughts on AI

China trains a 10-billion-parameter multimodal network… using NVIDIA’s code:
…Chinese entities train a decent 10 billion parameter multi-modal model…
A hybrid team of researchers from Alibaba and Tsinghua University have built M6, a “Multi-Modality to Multi-Modality Multitask Mega-transformer”. M6 is a multi-modal model trained on a huge corpus of text and image data, including image-text pairs (similar to recent systems like OpenAI’s CLIP). M6 has a broad capability surface and, because of how it was trained, you can use M6 to search for an image using text or vice versa, generate media in different modalities, match images together, write poems, answer questions, and so on.

Data: ~60 million images (with accompanying text pairs) totalling 1.9 terabytes (almost twice the raw size of ImageNet), plus 292GB of text.

Facts and figures: Though the authors say they’ve trained both a 10-billion and a 100-billion parameter model, they mostly report performance statistics for the 10-billion one. The 100b is a mixture-of-experts model, while the 10b is based on NVIDIA’s Megatron-LM training code (Import AI 218). The model’s size and sophistication are notable – this feels like a symptom of the maturing capabilities of various Chinese AI organizations. I wonder when we’ll get an M6-scale system from people affiliated with India, or regions like Europe or Africa.

Why this matters: M6 is notable for being a non-English model at equivalent scale to some of the largest primarily-English ones. We’re entering an era where there will be multiple, gigantic AI models, magnifying and minimizing different cultures with variations stemming from the organizations that trained them. It’s also interesting to consider how these models proliferate, and who will get access to them. Will students and researchers at Tsinghua get access to M6, or just Alibaba’s researchers, or both? And how might access schemes develop in other countries, as well?
…Finally, a word about bias: There’s no discussion of bias in the paper (or ethics), which isn’t typical for papers of this type but is typical of papers that come out of Chinese research organizations. If you’ve got counterexamples, please send them to me!
  Read more: M6: A Chinese Multimodal Pretrainer (arXiv).

###################################################

Facebook doesn’t even need labels to train its vision systems anymore (just your Instagram data):
…Self-supervised learning, at sufficient scale, might get us few-shot learning for free as well…
Self-supervised pre-training: SEER, Facebook’s vision system, learns via a self-supervised method called SwAV, which lets it look at unannotated images and, given enough scale, derive features from them and cluster them itself. The researchers train a family of models called RegNets. The magic of this method comes from the data they use: a billion pictures from Instagram (though they note in the paper these are “non-EU” images, likely due to GDPR compliance).
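
To make the "cluster them itself" idea a bit more concrete, here is a minimal sketch of the swapped-prediction trick as I understand it from the published SwAV method. This is not SEER's code: the function names, the toy score matrices, and the hyperparameter defaults (`eps`, `n_iters`, `temp`) are illustrative assumptions, and the scores are assumed to be cosine-like values in [-1, 1]. Two augmented views of each image are scored against a set of prototypes, a Sinkhorn-style normalization turns the scores into balanced soft cluster codes, and each view is trained to predict the other view's code.

```python
import math

def softmax(row, temp):
    """Softmax over one row of prototype scores, with temperature."""
    m = max(x / temp for x in row)
    exps = [math.exp(x / temp - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Turn B x K prototype scores into soft codes with roughly balanced
    prototype usage. Assumes scores are small (e.g. cosine similarities)."""
    B, K = len(scores), len(scores[0])
    m = max(max(row) for row in scores)
    Q = [[math.exp((s - m) / eps) for s in row] for row in scores]
    total = sum(sum(row) for row in Q)
    Q = [[q / total for q in row] for row in Q]
    for _ in range(n_iters):
        # Give every prototype (column) the same total mass, 1/K.
        col = [sum(Q[b][k] for b in range(B)) for k in range(K)]
        Q = [[Q[b][k] / (col[k] * K) for k in range(K)] for b in range(B)]
        # Give every sample (row) the same total mass, 1/B.
        Q = [[q / (sum(row) * B) for q in row] for row in Q]
    # Rescale so each sample's code sums to 1.
    return [[q * B for q in row] for row in Q]

def swapped_loss(scores_a, scores_b, temp=0.1):
    """Cross-entropy where view A predicts view B's code and vice versa."""
    codes_a, codes_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss = 0.0
    for qa, qb, sa, sb in zip(codes_a, codes_b, scores_a, scores_b):
        pa, pb = softmax(sa, temp), softmax(sb, temp)
        loss -= sum(q * math.log(p) for q, p in zip(qa, pb))
        loss -= sum(q * math.log(p) for q, p in zip(qb, pa))
    return loss / (2 * len(scores_a))

# Toy scores for 4 images x 3 prototypes, two augmented views (made-up numbers).
view_a = [[0.9, 0.1, -0.3], [0.2, 0.8, 0.0], [-0.5, 0.1, 0.7], [0.6, 0.5, -0.2]]
view_b = [[0.8, 0.2, -0.1], [0.1, 0.9, -0.2], [-0.4, 0.0, 0.8], [0.7, 0.3, 0.0]]
print(swapped_loss(view_a, view_b))
```

In a real system the scores would come from a network and the loss would be backpropagated; the point here is only that no labels appear anywhere in the computation.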

Results: The best version of SEER gets 84.2% top-1 ImageNet accuracy, nicely improving on other self-supervised approaches. (Though there’s still a ways to go before these techniques match supervised methods, which are now getting around 90% top-1 accuracy.)

Few shot learning, meet image recognition: SEER gets 77.9% top-1 accuracy on ImageNet after only seeing 10% of the images – suggesting that SEER can do a kind of few-shot learning, where by providing it with some data from a new domain it quickly adjusts itself to obtain reasonable performance. (Though 10% of ImageNet is still more than a hundred thousand images, which is quite different to the few sentences of text it takes to do few-shot learning in the text regime.)

Why this matters: SEER is relatively simple, as is the network architecture they use. The amazing capabilities we see here (including the few-shot learning) primarily come from the scale of the datasets which are used, combined with the intentionally naive unlabelled training approach. “This result confirm that the recent progress of self-supervised learning is not specific to curated training set, like ImageNet, and could benefit a large range of applications associated with uncurated data,” they write.
  Read more: Self-supervised Pretraining of Visual Features in the Wild (arXiv).

###################################################

What does the UK’s NSA think about AI?
…Position paper hints at focus areas, discusses ethical issues, even IDs the elephant in the room…
The UK’s spy agency, GCHQ, has published a paper about how it hopes to use AI. This is notable; spy agencies rarely discuss frontier technologies. (Though don’t get too excited – the memo is unsurprisingly light on technical details.)

What information does the paper contain? GCHQ shares some thoughts on how it might use AI to aid some of its missions. These include:

  • AI for cyber threats: Use AI to identify malicious software, and also potentially to trace its distribution. 
  • AI for online safety for children: Use AI to identify online behaviors that look like adults ‘grooming’ kids for sexual exploitation, and use AI to analyze images found in the course of these investigations. (No mention, unlike the Germans (Import AI 234), of using AI to generate sexual imagery to help trap abusers.)
  • AI for human trafficking: Use AI to map out the human networks that enable trafficking, and use AI to sift through vast amounts of financial data to find connections. 
  • AI for foreign state disinformation: Use AI to do fact-checking and detect synthetically generated content (e.g., deepfakes). Also, use AI to automatically identify and block botnets that use machine-generated accounts.

What does GCHQ think are the major AI ethics challenges? Fairness and bias is listed as one major challenge. GCHQ also lists ‘empowerment’ – which it defines as figuring out how much freedom to give the AI systems themselves. GCHQ thinks AI is best used in partnership with humans: the AI comes up with answers and insights, then human experts use this to authorize or carry out actions.

AI policy is national security policy: In recent years, we’ve seen a vast migration of technology people moving from academia into industry, partially in response to skyrocketing salaries. This poses a challenge to modern spy agencies – government has a hard time paying as much as Google or Facebook, but it needs a similar caliber of talent to achieve its objectives. GCHQ says part of why it has written the paper is because of this new reality. “Most investment in the UK continues to come from the private sector rather than government and this is expected to continue,” the agency writes. “It is therefore unsurprising that GCHQ is now engaging more broadly with wider society and industry than at any other time in its history. We have much to learn from the exponential growth of AI in the outside world, and believe our specialists also have much to contribute.”
  Read more: Pioneering a New National Security, the Ethics of Artificial Intelligence (GCHQ, PDF).

###################################################

Google’s latest speech compression tech tells us that production AI is hybrid AI:
…End-to-end learning is nice, but the best things happen when you combine expertise…
Google has made Lyra, a more efficient speech codec. Lyra wraps in some recent ML advancements; it works by extracting features from input speech, quantizing them, then using a generative model to take these features and reinflate them into output speech.
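
The extract, quantize, reinflate flow can be sketched in a few lines. This is a toy under loud assumptions, not Lyra: the per-frame log-energy feature, the 4-bit uniform quantizer, the frame size, and every name below are made up for illustration, and the generative model is swapped for a crude stand-in that just emits noise at the decoded loudness.

```python
import math
import random

FRAME = 160          # samples per frame (hypothetical choice)
LEVELS = 16          # 4-bit uniform quantizer (hypothetical choice)
LO, HI = -10.0, 0.0  # assumed range of the log-energy feature

def extract_features(samples):
    """One log-energy value per full frame; a stand-in for richer features."""
    feats = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        energy = sum(s * s for s in frame) / FRAME
        feats.append(math.log(energy + 1e-10))
    return feats

def quantize(feats):
    """Map each feature to an integer code in [0, LEVELS - 1]."""
    step = (HI - LO) / (LEVELS - 1)
    return [min(LEVELS - 1, max(0, round((f - LO) / step))) for f in feats]

def dequantize(codes):
    """Recover approximate feature values from the integer codes."""
    step = (HI - LO) / (LEVELS - 1)
    return [LO + c * step for c in codes]

def reinflate(feats, seed=0):
    """Stand-in for the generative model: noise scaled to the decoded energy."""
    rng = random.Random(seed)
    out = []
    for f in feats:
        amp = math.sqrt(math.exp(f))
        out.extend(amp * rng.gauss(0.0, 1.0) for _ in range(FRAME))
    return out

# Encode a short synthetic tone, send only the integer codes, decode them.
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(1600)]
codes = quantize(extract_features(tone))
audio = reinflate(dequantize(codes))
print(codes, len(audio))
```

Only `codes` would travel over the network; everything the listener hears is regenerated on the receiving side, which is why the quality of the generative model matters so much.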

Good speech with less data: Lyra is designed to operate with audio streams of as little as 3kbps – here, it does better than other codecs and compares favorably with Opus, an established speech codec. Lyra is notable because it smooshes together expert-derived stuff (which would be some of the traditional codec techniques used here) with a strategic use of a generative model and gets great performance and useful efficiency gains.
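
For a sense of how tight a 3kbps budget is, here is some back-of-the-envelope arithmetic. The 40ms frame length and the 16kHz, 16-bit mono baseline are assumptions picked for illustration, not figures from this write-up.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bits available to describe one frame of audio at a given bitrate."""
    return bitrate_bps * frame_ms / 1000

def compression_ratio(sample_rate_hz, bits_per_sample, bitrate_bps):
    """How many times smaller the coded stream is than raw PCM."""
    return sample_rate_hz * bits_per_sample / bitrate_bps

# Assumed: 40 ms frames; raw audio at 16 kHz, 16 bits per sample, mono.
print(bits_per_frame(3000, 40))                       # 120.0
print(round(compression_ratio(16000, 16, 3000), 2))   # 85.33
```

Under those assumptions each frame has to be described in about 15 bytes, roughly 85 times less data than the raw waveform.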

Fairness & ML: “We’ve trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verifying the audio quality with expert and crowdsourced listeners. One of the design goals of Lyra is to ensure universally accessible high-quality audio experiences,” the company writes.

Why this matters: AI is going to be everywhere. And it’s going to be everywhere in a Lyra-like manner – as a discrete, smart component within a larger technical stack. We’re also going to see people use more generative models to distill and reinflate representations of reality – we’re entering the dumb ‘brain in a jar’ phase of AI deployment.
  Read more: Lyra: A New Very Low-Bitrate Codec for Speech Compression (Google blog).
  Read more: Generative Speech Coding with Predictive Variance Regularization (arXiv).

###################################################

AI developer: I’m afraid of what happens if my code gets released:
…One lens on the ethics of open vs closed-source…
Is it safer for an AI system to be open source or for it to be controlled by a small set of actors? Generally, the technology community has leaned towards stuff being open source by default, but in recent years, people have been experimenting with the latter. This has happened with various types of synthetic media, like language models that haven’t been fully released (e.g., NVIDIA’s Megatron LMs, GPT-2 [at first]), or various papers on synthetic media where the researchers don’t release the models. Now, a VP of AI at faceswap app Reface has written a post laying out how he thinks about the release of certain AI technologies. His post is about AI body swapping – that is, taking one person’s face and potentially body and morphing it onto someone else in a video.

Demos get shady attention: “Only after I published a [AI body swap] demo in August 2020 and different shady organizations started approaching me, I realized that AI-body-swap is a bomb. A bomb in both senses – as a technological breakthrough and as something dangerous if its code gets into the wrong hands,” he writes. “A team of high-class ML-pros would find a way around my code in about a week. In roughly six months, they’d have a production-grade full-body swap technology.”

Why this matters: “We need to make a pact, a deal that all the companies that create synthetic media must include watermarks, footprints, or provide other detectors for identifying it,” he writes. A cynical person might say ‘business guy writes article about why his business-benefiting strategy is good, lol go figure’. There’s some merit to that. But a few years ago articles like this were a lot rarer – the AI community does seem to be becoming genuinely concerned about the consequences of its actions.
  Read more: The Implications of Open-Source AI: Should You Release Your AI Source (Hackernoon).

###################################################

Proof that robots are getting smarter: GreyOrange partners with AI startup Vicarious:
…Maybe AI+Robots is about to be a thing…
AI startup Vicarious has partnered with GreyOrange, a company that builds AI and robot systems for warehouses. Vicarious has a neuroscience-inspired approach to AI (which earlier helped it break the CAPTCHA security system, #66) which means its systems exhibit different capabilities to those made with deep learning techniques.

Why Vicarious? Vicarious’s tech has typically been good at solving problems involving spatial reasoning. You can get a sense of its approach by looking at papers like “Learning a generative model for robot control using visual feedback” and “From proprioception to long-horizon planning in novel environments: A hierarchical RL model”. (I hope to cover more of this research in Import AI in the future, but I’ll need to take some time to load a different approach into my brain.)

What they’re doing together: GreyOrange will integrate an AI capability from Vicarious into its “GreyMatter Fulfillment Operating System” tech. Vicarious’s system will handle technology for autonomous vertical picking, which involves getting a robot to perceive “the size, shape and material characteristics of inventory items, including when these are loosely positioned in an unstructured fashion”, then approach, retrieve, and place those items into order boxes. “Vicarious’ computer-vision and robotics technology is a breakthrough in the ability to handle unstructured, previously hard-to-grasp items,” said Vicarious co-founder Dileep George in a press release announcing the move.

Why this matters: The physical world is a huge challenge for AI. Most AI systems get trained in a purely digital context (e.g., computer vision systems get trained on digitized images of the world, and then are deployed in reality against… digital images of the world), whereas robots need to be trained in simulation then taught to generalize to the real world. This is especially challenging because of things like differences in the physics fidelity between simulators and reality, or hardware issues (e.g., air pressure/temperature/etc will screw around with the motor responses of certain robots, and the sun has a nasty habit of moving across the sky which continually changes illumination in outdoor/hybrid settings, throwing off vision systems).
  GreyOrange and Vicarious partnering up is a further symptom of the success of AI being applied to robotics. That’s a big deal: if we can get more flexible AI systems to work here, we can unlock tremendous economic value. Vicarious also isn’t the only company trying to revolutionize fulfillment with robotics – that’s also the focus of the (deep learning-based) startup, Covariant, among others. 
  Read more: GreyOrange and Vicarious Launch Autonomous Vertical Picking Solution for Apparel and Omnichannel Fulfillment (GlobeNewswire press release).
  Find out more about GreyOrange (GreyOrange site).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NSCAI publishes final report on US AI strategy
The USA’s National Security Commission on AI has delivered its final report on US AI strategy. The report warns that the US risks being overtaken in technology without an acceleration in AI adoption, supported by substantial federal investment over the next five years.

Recommendations:

  • US military should work towards achieving ‘AI readiness’ by 2025: increasing DoD AI R&D spending to $8bn/year in 2026 (vs. $1.5bn in 2021); establishing a Digital Service Academy and National Digital Reserve Corps to address the talent deficit; more research into ensuring AI systems are robust and reliable.
  • US should embrace autonomous weapons and work with other nations to establish international standards and mitigate risks, while reaffirming DoD’s policy that human judgement be involved in any decision to kill. 
  • Overall federal funding for R&D should climb to at least 1% of GDP by 2026 (vs 0.6% in 2017).
  • Non-defense AI R&D funding should increase to $32bn/year (vs. $1.5bn in 2021); $32bn investment over five years in domestic semiconductor capacity (see Import AI 238).
  • To build a stronger AI workforce, the US should offer green cards to all STEM PhD graduates at US universities and double the number of employment-based visas, alongside substantially more funding for STEM education at all levels.
  • Establishing a Technology Competitiveness Council, tasked with developing and overseeing a National Technology Strategy, and coordinating efforts across government.

Read more: NSCAI report in full

————–

FAccT suspends Google sponsorship

ACM’s FAccT conference has paused its sponsorship by Google, following the turmoil and departures at the company’s Ethical AI team. Lead researchers Timnit Gebru and Margaret Mitchell were forced out earlier this year, after disputes around the company’s suppression of ethics research (see Import AI 226; 235).

   Read more: AI ethics research conference suspends Google sponsorship (VentureBeat) 


————–

Highlights from semiconductor substack
– Mule’s Musings on Heterogeneous Compute; ASML and lithography; vertical monopolies; GPT-3.
– Deep Forest’s primers on semiconductor foundries (pt 1, pt 2).
– Employ America on the economics of the current chip shortage.

###################################################

Tech Tales:

The Speech for the Rebels
[2030: A country in central Africa where the US and China are fighting a proxy war primarily via the stoking of local political tensions]

They’d spent a few million dollars to digitize everything – and I do mean everything – that they’d gathered from the rebels. Then they started writing out speeches and testing the AI against it. The idea was that if you said something and the AI, which had been finetuned on all the digitized data, thought what you were saying had a low probability, then that told you that your speech was out of sync with the ‘mood’ inherent to the rebel group. On the other hand, if your speech was being predicted as likely by the AI system, that told you it might resonate.

Rhetorical Finetuning, the analysts called it, or R-FT.
Silver Tongue, was the name of the system we used.
The Mouth – that’s what we called it.
– “Go see how well The Mouth works on them”.
– “Oh, you’re back, I guess the mouth worked for you”.
– “Just tell ’em what the Mouth says and see what happens”.
– “Don’t improvise, the Mouth works”.

The strangest truth about The Mouth was it worked – really well. One classified document noted that “campaigns which utilized R-FT via Silver Tongue saw a 15% improvement in our post-engagement de-escalation metrics, resulting in a lowered casualty rate for warfighters in the region”.

So that’s why we ended up sticking The Mouth on our wrists. The AI runs inside a wearable which has a microphone – the bracelet glows green when we’re saying things that The Mouth predicts are probable and it glows red when we say things that aren’t. We spend a lot of time in training getting taught to not look directly at our wrists while talking, but rookies do it anyway. Now, when I give my talks – even improvised ones, after an incident, or to resolve something, or ask for a favor – I get my little signals from the bracelet and I say the words and keep it green.

I don’t know the language the local rebels speak. My translator picks up most of it, but not the things they whisper to each other, hands over their mouths, looking at me as I use The Mouth to talk. What are they saying, I wonder?

Look, when the American tries to speak like us, their wrist flashes green.
What does the red mean, Father?
The red is when they are telling the truth, my Son.

Things that inspired this story: The miniaturization of communication technology for conflict; thinking about language models and how they could be weaponized for purposes of propaganda or other state-level objectives; thoughts about how AI might get integrated with warfare; various loosely connected ideas around how AI influences culture through re-magnification of things the AI picked up; the natural skepticism of all humans in all places to unfamiliar people giving them a convincing speech.