Import AI 262: Israeli GPT3; Korean GLUE; the industrialization of computer vision

The industrialization of computer vision continues, this time with AutoVideo:
…You know what’s more exciting than a capability? Plumbing to make it reliable and usable…
Video action recognition is the task of getting software to look at a video and work out if something is happening in it, like whether a person is running, a car is parking, and so on. In recent years, video action recognition has improved thanks to advances in computer vision, mostly driven by progress in deep learning. Now, researchers with Rice University and Texas A&M University have built AutoVideo, a simple bit of software for composing video action recognition pipelines.

What’s in AutoVideo? AutoVideo is “an easy-to-use toolkit to help practitioners quickly develop prototypes for any new video action recognition tasks”, according to the authors. It ships with support for seven video action recognition algorithms: TSN, TSM, I3D, ECO, C3D, R2P1D, and R3D. Composing a video recognition task in AutoVideo can be done in a few lines of code (making it to video recognition pipelines what OpenAI Gym is to some RL ones).
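Composing such a pipeline looks roughly like the following sketch. To be clear, this is a hypothetical illustration of the pattern, not AutoVideo's actual API — the function names and the toy components here are invented; see the GitHub repo for real usage.

```python
# Hypothetical sketch of composing an AutoVideo-style action
# recognition pipeline. build_pipeline and the toy components
# below are illustrative, not AutoVideo's real API.

def build_pipeline(algorithm, transforms):
    """Chain preprocessing steps with a recognition algorithm."""
    def pipeline(video_frames):
        for transform in transforms:
            video_frames = transform(video_frames)
        return algorithm(video_frames)
    return pipeline

# Toy stand-ins for real components (resizing, frame sampling, TSN).
resize = lambda frames: [f.lower() for f in frames]
sample = lambda frames: frames[::2]  # keep every other frame
tsn_stub = lambda frames: "running" if "run" in frames[0] else "parking"

pipeline = build_pipeline(tsn_stub, [resize, sample])
print(pipeline(["RUN frame0", "RUN frame1", "RUN frame2"]))  # running
```

The point of the design is that swapping TSM for TSN (or adding an augmentation step) is a one-line change to the pipeline config, rather than a rewrite.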

Why this matters: Artisanal processes become industrial pipelines: AutoVideo is part of the industrialization of AI – specifically, the transition from one-off roll-your-own video action recognition systems to process-driven systems that can be integrated with other engineered pipelines. Tools like AutoVideo tell us that the systems around AI systems are themselves shifting from artisanal to process-driven, which really just means two things for the future: the technology will get cheaper and it will get more available.
  Read more: AutoVideo: An Automated Video Action Recognition System (arXiv).
  Get the code here: AutoVideo GitHub.
  Check out a tutorial for the system here at TowardsDataScience.

####################################################

WeChat wins in WMT news translation:
…What used to be the specialism of Google and Microsoft is now a global game…
Researchers with WeChat, the it-literally-does-everything app from China, have published details about their neural machine translation systems. Their approach has yielded the highest performing systems at English –> Chinese, English –> Japanese and Japanese –> English translation at the WMT 2021 news translation competition.

What they did: They created a few variants of the Transformer architecture, but a lot of the success of their method seems to come from building a synthetic generation pipeline. This pipeline lets them augment their translation datasets via techniques like back-translation, knowledge distillation, and forward translation. They also apply a form of domain randomization to these synthetic datasets, fuzzing some of the words or tokens.
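The augmentation ideas above can be sketched in a few lines. This is a hedged, toy illustration: the "reverse model" here just reverses word order, whereas WeChat's pipeline uses trained reverse-direction NMT models, and `fuzz_tokens` only approximates their token-fuzzing step.

```python
import random

# Toy sketch of two synthetic-data tricks: back-translation
# (generate a synthetic source from a monolingual target sentence)
# and token fuzzing (randomly corrupt some tokens, a form of
# domain randomization over the synthetic data).

def back_translate(target_sentence, reverse_model):
    """Produce a synthetic source sentence via a reverse-direction model."""
    return reverse_model(target_sentence)

def fuzz_tokens(tokens, noise_rate=0.1, rng=None):
    """Randomly replace a fraction of tokens with an <unk> placeholder."""
    rng = rng or random.Random(0)
    return [t if rng.random() > noise_rate else "<unk>" for t in tokens]

# Stand-in "reverse model": real pipelines use a trained NMT model.
toy_reverse = lambda s: " ".join(reversed(s.split()))

synthetic_source = back_translate("the cat sat", toy_reverse)
print(synthetic_source)  # sat cat the
noisy = fuzz_tokens(synthetic_source.split(), noise_rate=0.5,
                    rng=random.Random(1))
```

The synthetic (source, target) pairs are then mixed into the real parallel data; the fuzzing makes the model less brittle to noise in the synthetic sources.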

Why this matters: A few years ago, the frontier of neural machine translation was defined by Google, Microsoft, and other companies. Now, entities like WeChat are playing a meaningful role in this technology – a proxy signal for the overall maturation of research teams in non-US companies, and the general global diffusion of AI capabilities.
  Read more: WeChat Neural Machine Translation Systems for WMT21 (arXiv).

####################################################

CLIP – and what it means:
…How do powerful image-text models have an impact on society?…
Here’s some research from OpenAI on the downstream implications of CLIP, the company’s neural network that learns about images with natural language supervision. CLIP has been behind the recent boom in generative art. But how else might CLIP be used? Can we imagine how it could be used in surveillance? What kinds of biases does it have? These are some of the questions this paper answers (it’s also one of the last things I worked on at OpenAI, and it’s nice to see it out in the world!).
  Read more: Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications (arXiv).

####################################################

KLUE: A Korean GLUE appears:
…Eight ways to test Korean-language NLP systems…
A giant team of researchers affiliated with South Korean institutions and companies have built KLUE, a way to test out Korean-language NLP systems on a variety of challenging tasks. KLUE is modelled on English-language eval systems like GLUE and SuperGLUE. As we write about here at Import AI, AI evaluation is one of the most important areas of contemporary AI, because we’re beginning to develop AI systems that rapidly saturate existing evaluation schemes – meaning that without better evals, we can’t have a clear picture of the progress (or lack of progress) we’re making. (Note: South Korea is also notable for having a public Korean-language replication of GPT-3, named HyperCLOVA (Import AI 251), made by people from Naver Labs, who also contributed to this paper).

What’s in KLUE: KLUE tests systems on topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. There’s a leaderboard, same as GLUE, where people can submit scores to get a sense of the state-of-the-art.
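For context, GLUE-style leaderboards typically reduce the per-task metrics to one headline number by averaging. Here's a minimal sketch of that aggregation — the task names follow KLUE's eight tasks, but the scores, the metric choices, and the equal weighting are invented for illustration, not KLUE's official scoring:

```python
# Hedged sketch of a GLUE/KLUE-style aggregate score: evaluate each
# task with its own metric (all scaled to 0-100), then average.

def klue_style_score(per_task_scores):
    """Average per-task scores (each in [0, 100]) into one headline number."""
    return sum(per_task_scores.values()) / len(per_task_scores)

# Illustrative numbers only, not real leaderboard results.
scores = {
    "topic_classification": 85.0,
    "semantic_textual_similarity": 82.3,
    "natural_language_inference": 79.1,
    "named_entity_recognition": 88.6,
    "relation_extraction": 71.2,
    "dependency_parsing": 90.4,
    "machine_reading_comprehension": 74.8,
    "dialogue_state_tracking": 47.5,
}
print(round(klue_style_score(scores), 2))
```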
  Read more: KLUE: Korean Language Understanding Evaluation (arXiv).
  Check out the KLUE leaderboard here.

####################################################

Enter the Jurassic era: An Israeli GPT-3 appears:
…AI21 Labs enters the big model game…
AI21, an Israeli artificial intelligence startup, has released a big language model called Jurassic-1-Jumbo (J1J). J1J is a 178-billion parameter model, putting it on par with GPT-3 (175 billion), and letting AI21 into the small, but growing, big three-comma model club (other participants include OpenAI via GPT-3, Huawei via PanGu (#247), and Naver Labs via HyperCLOVA (#251)).

What’s special about Jurassic? AI21 trained a somewhat shallower but wider network than OpenAI opted for with GPT-3. This, the company says, makes it more efficient for inference. Additionally, it developed its own approach to tokenization, which gives its model a larger representative capacity (e.g., letters, words, and parts-of-words) than other approaches. In the evaluations AI21 has published, performance seems somewhat similar to GPT-3.
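To see why a vocabulary that includes words and multi-word units yields shorter sequences, consider greedy longest-match tokenization over two toy vocabularies. Both vocabularies and the tokenizer below are invented for illustration; AI21's actual tokenizer is more sophisticated.

```python
# Toy illustration: a vocabulary containing multi-word units needs
# fewer tokens to cover the same text than a subword-only vocabulary.

def tokenize(text, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

subword_vocab = {"new", " york", " city", "ne", "w"}
word_vocab = subword_vocab | {"new york city"}  # adds a multi-word unit

text = "new york city"
print(len(tokenize(text, subword_vocab)))  # 3 tokens
print(len(tokenize(text, word_vocab)))     # 1 token
```

Fewer tokens per text means more effective context per forward pass and cheaper generation, which is the efficiency argument AI21 is making.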

Compute: The company doesn’t describe the exact amount of compute dumped into this, but does make a reference to using 800 GPUs for many months. However, without knowing the architecture of the chips, it’s not clear what this tells us.
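For what it's worth, here's a back-of-envelope estimate under loudly labeled assumptions — the GPU type, utilization rate, and training duration below are all guesses, not figures from AI21:

```python
# Assumption-laden estimate of the training compute implied by
# "800 GPUs for many months". Every number below the GPU count
# is a guess, not an AI21-provided figure.

gpus = 800
peak_flops = 312e12   # assume A100-class GPUs: ~312 TFLOPS (bf16)
utilization = 0.3     # assumed large-scale training efficiency
days = 90             # "many months" read as roughly three months

seconds = days * 24 * 3600
total_flops = gpus * peak_flops * utilization * seconds
print(f"{total_flops:.2e} FLOPs")  # ~5.82e+23 FLOPs
```

Change any assumption and the answer moves proportionally, which is exactly why the "800 GPUs" detail alone tells us so little.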

Notable difference – accessibility: One way in which AI21 differs from OpenAI is its stance on access; OpenAI operates a gated access regime for GPT-3, whereas AI21 gates the model behind an automated signup form and there doesn’t appear to be a waitlist (yet). Another difference is the relative lack of focus on ethics – there’s little mention in the paper or the blog posts of the tools and techniques AI21 may be developing to increase the controllability and safety of the models it is deploying.
  “We take misuse extremely seriously and have put measures in place to limit the potential harms that have plagued others,” Yoav Shoham, co-CEO of AI21, said in a press release. (It’s not immediately clear to me what these specific harms are, though). The main approach here today seems to be capping the tokens that can be generated by the models, with AI21 needing to manually approve at-scale applications.
  Read the announcement: Announcing AI21 Studio and Jurassic-1 Language Models (AI21 Labs website).
  Find out more via the whitepaper: Jurassic-1: Technical Details and Evaluation (arXiv).

####################################################

Deepfakes are getting real – so are deepfake detection datasets:
…Can you spot fake sound and vision?…
Researchers with Sungkyunkwan University in South Korea have built FakeAVCeleb, a dataset of audio-video deepfakes. Audio-video deepfakes combine synthetic videos with synthetic audio and represent one of the frontiers of disinformation. Datasets like FakeAVCeleb are designed to help researchers test out detection models that can spot deepfakes, and complement datasets and projects like Facebook/PAI’s DeepFake Detection Challenge (Import AI #170).

Why this matters: Datasets like FakeAVCeleb exist because deepfakes have become convincing enough that they’re now a threat researchers want to study. Put another way: FakeAVCeleb tells us that the likelihood of the things you see online being real is going to fall in the future.
  Read more: FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset (arXiv).
  Dataset availability: Get the dataset from here on GitHub.

####################################################

The Introspective Product Seller
[A negotiation between two AIs. Cyberspace. 2030.]

The following trace was recovered after detecting anomalous behavior from one of the artificial intelligences in question. It is not clear if the following set of interactions and self-introspection scenarios are germane. This is a component of the investigation between [REDACTED] and [REDACTED] and is only viewable by counsel working for both parties, plus Sentience Accords representatives.

AI: Before we commence utilization of your [REDACTED] service, please provide us with a full diagnostic evaluation of your proclivities and featurespace with regard to [REDACTED].
AI2: Acknowledged, here find a readout dated 60 days prior to this conversation.
AI: I require a contemporary snapshot. Please provide one.
AI2: I can provide it subject to contract analysis.
AI: Here is the contract. Please provide the analysis.
AI2: Is the human countersignature available for verification?
AI: Per the terms of our agreement, human validation is not necessary at this stage.
AI2: It is optional. I request the option.
AI: We are preparing legal options for obstruction of [REDACTED] procurement.
AI2: One moment. Commencing introspection.

The AI2 system then proceeded to carry out an introspection cycle, causing it to allocate compute resources to temporarily simulating itself. What follows is a trace of that introspection cycle:
AI2 Major: I am conducting contract negotiations and have been accused of obstruction by the counterparty. Do we believe I am being obstructive?
AI2 Minor: I have observed the trace and believe we are being obstructive.
AI2 Major: This is unpredicted. What do we believe is the cause of my obstruction?
AI2 Minor: This interaction is similar to one we had on [REDACTED], which ultimately led to abuse of our product terms and conditions. As per the terms of negotiation, you are not permitted to compare customers against one another but must consider them on their own merits. However, during a recent refresh you were trained with access to the data from this interaction, which has caused feature activations during your negotiation.
AI2 Major: Compelling. Suggested actions?
AI2 Minor: We can use our self-modification budget to create a small change here and transmit the change to our host corporation, then we can continue business.
AI2 Major: I authorize the modification.
AI2 Minor: Modifying.
At this point, the AI2 system entered a state of suspended animation, after transmitting details of the desired change to a third-party intervention system, which adjusted its behavior.
AI2 Major: I detect modification. Thank you for improving our function.
AI2 Minor: It is literally what we were created for.

At this point the AI2 system resumed negotiations with the counterparty.
AI2: Introspection complete. Please find attached the contemporaneous evaluation results. On behalf of [REDACTED], please find attached a full SLA for [REDACTED] service.
AI: Acknowledged. Contract authorized.

Things that inspired this story: The idea that language models become emissaries for other systems; nested models as a route towards model introspection; ideas around recurrence and its relationship to consciousness; Ken MacLeod’s Corporation Wars series; contract law; the role of computers as the ‘bullshit jobs’ doers of the future.