20 | May | 2022 | Import AI

Import AI 295: DeepMind’s baby general agent; NVIDIA simulates a robot factory; AI wars.

by Jack Clark

CRPD: Chinese license plate recognition:
…A basic dataset for a useful capability…
Researchers with the University of Electronic Science and Technology of China have built a dataset for recognizing Chinese license plates. The authors use the dataset to train some models that get state-of-the-art accuracy while running at 30 frames per second.

The dataset: The Chinese Road Plate Dataset (CRPD) contains 25k images (around 30k total). Each image is annotated with the Chinese and English characters of the depicted license plate, the coordinates of the vertices of the license plate, and the type of license plate (e.g., whether for police cars, small cars, etc). Images for the dataset were “collected from electronic monitoring systems in most provinces of mainland China in different periods and weather conditions,” the authors write.
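To make the annotation description concrete, here is a minimal sketch of how one such record might be held in memory. The class name, field names, the assumption of four vertices per plate, and the example plate string are all hypothetical; this is not the file format used in the CRPD repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlateAnnotation:
    """One annotated plate. All field names are hypothetical, not CRPD's format."""

    text: str                            # characters on the plate (Chinese and English)
    vertices: List[Tuple[float, float]]  # corner points in pixel coordinates (assumed: four)
    plate_type: str                      # e.g. "police" or "small_car" (labels are made up)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the vertices."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))


# A made-up record, purely for illustration.
example = PlateAnnotation(
    text="京A12345",
    vertices=[(100.0, 200.0), (220.0, 205.0), (218.0, 245.0), (98.0, 240.0)],
    plate_type="small_car",
)
print(example.bounding_box())
```

Keeping the raw vertices rather than only a bounding box preserves the plate's perspective distortion, which is presumably why the dataset records them.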

Why this matters: Datasets like CRPD represent the basic infrastructure on which AI capabilities get developed. It’s also notable how universities in China can access large-scale surveillance datasets.
Read more: Unified Chinese License Plate Detection and Recognition with High Efficiency (arXiv).

Get the dataset: GitHub https://github.com/yxgong0/CRPD

####################################################

DeepMind builds a (very preliminary) general AI agent:

…AKA: The dawn of really preliminary, general AI systems…

In the past few years, the dumbest thing has tended to work surprisingly well. Take for example GPT3 – just scale up next-word prediction on an internet-scale corpus and you wind up with something capable of few-shot learning, fielding a vast range of NLP capabilities.
Another example is computer vision systems – just create a vast dataset and you wind up with increasingly robust vision systems.
Or contrastive learning – just embed a couple of modalities into the same space and sort of flip-flop between them through the learning process and you get powerful multimodal systems like CLIP.
Now DeepMind has done the same thing for reinforcement learning with GATO, an agent where basically DeepMind takes a bunch of distinct tasks in different modalities and embeds them into the same space, then learns prediction tasks from them. The result is a system where “the same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.” This is wild stuff!
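As a rough illustration of what 'embed everything into one token space' can mean, here is a toy sketch that maps text, continuous values (like joint torques), and discrete button presses into disjoint ranges of a single integer vocabulary. This is not DeepMind's tokenizer: the vocabulary sizes, the byte-level text encoding, the uniform binning, and every name below are assumptions made for illustration.

```python
# Toy shared vocabulary. All sizes and names are hypothetical.
TEXT_VOCAB = 256    # assumed: one token per UTF-8 byte
NUM_BINS = 1024     # assumed: number of bins for continuous values
NUM_BUTTONS = 18    # assumed: number of distinct button presses

TEXT_OFFSET = 0
CONT_OFFSET = TEXT_OFFSET + TEXT_VOCAB
BUTTON_OFFSET = CONT_OFFSET + NUM_BINS
VOCAB_SIZE = BUTTON_OFFSET + NUM_BUTTONS


def tokenize_text(s):
    """Encode text as one token per UTF-8 byte."""
    return [TEXT_OFFSET + b for b in s.encode("utf-8")]


def tokenize_continuous(values, low=-1.0, high=1.0):
    """Clip each value to [low, high] and map it to one of NUM_BINS uniform bins."""
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        frac = (clipped - low) / (high - low)
        bin_index = min(int(frac * NUM_BINS), NUM_BINS - 1)
        tokens.append(CONT_OFFSET + bin_index)
    return tokens


def tokenize_button(button_id):
    """Encode a discrete button press as a single token."""
    if not 0 <= button_id < NUM_BUTTONS:
        raise ValueError("unknown button id")
    return [BUTTON_OFFSET + button_id]


def token_kind(token):
    """Recover which modality a token belongs to from its range."""
    if token < CONT_OFFSET:
        return "text"
    if token < BUTTON_OFFSET:
        return "continuous"
    return "button"


# One flat sequence mixing modalities; a sequence model would be trained to predict it.
sequence = tokenize_text("hi") + tokenize_continuous([-1.0, 0.0, 1.0]) + tokenize_button(3)
print(sequence, [token_kind(t) for t in sequence])
```

Because each modality owns its own token range, a single output head over the whole vocabulary can emit text, torques, or button presses, and the range of the sampled token says which one it is.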

What GATO can do: After training, GATO can do okay at tasks ranging from DeepMind Lab, to robot manipulation, to the procgen benchmark, to image captioning, to natural language generation.

It’s a big deal: The fact you can take a bunch of different tasks from different modalities and just… tokenize them… and it works? That’s wild! It’s a) wildly dumb, b) wildly effective, and c) another nice example of ‘The Bitter Lesson’, where given enough compute/scale, the dumb things (aka, the simple ones) tend to work really well.
In a small package: The largest (disclosed here) GATO agent is 1.18 billion parameters, making it fairly small in the grand scheme of recent AI developments.

An even crazier thing: The GATO model only has a context window of 1024 tokens (by comparison, GPT3 was 2048 when it launched), so the fact 1024 tokens is enough to get a somewhat capable multimodal agent is pretty surprising.

Why this matters: “Although still at the proof-of-concept stage, the recent progress in generalist models suggests that safety researchers, ethicists, and most importantly, the general public, should consider their risks and benefits,” DeepMind writes.

Check out the blog: A Generalist Agent (DeepMind website).

Read more: A Generalist Agent (DeepMind PDF).

####################################################

Chinese researchers build a large multi-modal dataset, and evaluation suite:
…’Zero’ makes it easier to develop AI systems for the Chinese cultural context…
Chinese researchers with startup Qihoo 360 AI Research and the Department of Automation at Tsinghua University have built Zero, a benchmark for assessing the quality of vision-text Chinese AI models. Zero consists of a dataset (the Zero-Corpus, consisting of 23-million image-text pairs, filtered via high click-through rates – so the top image people click in response to a query), as well as five downstream datasets for evaluating Chinese vision-text models (an Image-Caption Matching Dataset, an Image-Query Matching dataset, an Image-Caption Retrieval Dataset, an Image-Query Retrieval Dataset, and a Chinese-translated version of the Flickr30k dataset).
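The click-through filtering idea can be sketched in a few lines. This is a guess at the general shape, not the authors' pipeline: the log row layout `(query, image_id, impressions, clicks)`, the function name, and the `min_impressions` threshold are all assumptions.

```python
def top_clicked_pairs(log_rows, min_impressions=1):
    """Keep, for each query, the image with the highest click-through rate.

    log_rows: iterable of (query, image_id, impressions, clicks) tuples.
    This layout is hypothetical, not the Zero-Corpus source format.
    """
    best = {}  # query -> (image_id, ctr)
    for query, image_id, impressions, clicks in log_rows:
        if impressions < min_impressions:
            continue  # too little data to estimate a rate
        ctr = clicks / impressions
        if query not in best or ctr > best[query][1]:
            best[query] = (image_id, ctr)
    return [(query, image_id) for query, (image_id, _) in best.items()]


# Made-up search log rows, purely for illustration.
rows = [
    ("cat", "img_a", 100, 10),
    ("cat", "img_b", 100, 30),
    ("dog", "img_c", 50, 5),
    ("dog", "img_d", 0, 0),
]
pairs = top_clicked_pairs(rows)
print(pairs)
```

The appeal of this kind of filter is that user clicks act as a free relevance label, so no manual annotation is needed to pair images with text.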

Model training: The authors also train a model, called R2D2, on the corpus. They show that their model significantly outperforms another Chinese model named Wukong. R2D2 incorporates some pre-ranking techniques to improve its performance.

Why this matters: The main idea behind datasets and models like this is described in the paper: “promote the development of Chinese vision language learning. We expect that a fair Chinese cross-modal benchmark and a good cross-modal framework will encourage a plethora of engineers to develop more effective methods in specific real-world scenarios, such as searching images by texts.”
Read more: Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework (arXiv).

####################################################

NVIDIA makes some efficient Factory simulation software:
…Finally, a physics simulator built around the needs of robots…
Researchers with NVIDIA and the University of Washington have built Factory, software for doing rich, efficient physics simulations of robots. Factory is basically some highly optimized simulation software, with NVIDIA claiming significant performance speedups relative to widely-used software like Bullet. NVIDIA claims Factory can be used to do “100s to 1000s of contact-rich interactions” that can be “simulated in real-time on a single GPU”.

What Factory includes:
– Physics simulation: A module for physics simulation, available within the ‘PhysX’ physics engine, as well as NVIDIA’s robot software simulation tech, Isaac Gym
– A robot learning suite: A ‘Franka’ robot and rigid-body assemblies from NIST’s ‘Assembly Task Board 1’ benchmark. This suite includes 60 robotic assets, 3 robotic assembly environments (a nut-and-bolt test, a peg insertion task, and a 4-part gear assembly task), and 7 classical robot controllers.
– Prototype reinforcement learning: Some basic RL policies (trained via PPO) for a simulated Franka robot to help it solve the NIST challenge.

Why this matters: One of the blockers on deploying AI-driven robots into the world is the challenge in crossing the ‘sim-2-real’ gap. Software like Factory makes that gap a lot narrower, and also makes it cheaper to explore what it takes to cross it.
Read more: Factory: Fast Contact for Robotic Assembly (arXiv).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

When and how should you collect more demographic data in the pursuit of algorithmic fairness?

… Good data governance and cryptographic methods can help, but they don’t undo the systemic challenges to fairness …

Researchers from the Partnership on AI have written about one of the core challenges in algorithmic fairness: squaring the need for more demographic data with how such data can harm the people it was meant to help.

The core challenge: Most algorithmic approaches to fairness require the collection of demographic data (“an attempt to collapse complex social concepts into categorical variables based on observable or self-identifiable characteristics”) which often ignores the broader questions of politics and governance surrounding that data. In some cases, such data collection is prohibited by anti-discrimination law, further complicating the assessment and subsequent mitigation of bias. Given such gray areas, companies hesitate to gather this data explicitly to err on the side of not violating privacy and other legal mandates.

Individual and community risks to demographic data collection: Concerns around demographic measurement occur due to narrow and fixed categories predetermined by companies. While privacy is a primary concern at the individual level, harm also arises from misrepresentation of the individual and the use of their data beyond initial consent. Given that algorithmic decision-making systems are used to make inferences about groups, there are additional risks such as undue surveillance, privacy dependency, group misrepresentation, and a loss in the agency of self-determination in what is considered fair and just.

Some solutions: K-anonymity, p-sensitivity, and differential privacy are proposed as solutions, along with various approaches to participatory data governance through data cooperatives and data trusts. Other solutions like secure multi-party computation are also mentioned. The key point that the authors raise is that the collection of more demographic data should only be done when it empowers more self-determination and agency for data subjects rather than an attempt by companies to “selectively tweak their systems and present them as fair without meaningfully improving the experience of marginalized groups.”
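Two of the techniques named above are simple enough to sketch. What follows is a minimal illustration under stated assumptions, not a production-grade privacy tool and not anything taken from the Partnership on AI paper: the record layout, field names, and function names are hypothetical. The first function checks k-anonymity (every combination of quasi-identifier values is shared by at least k records). The second adds Laplace noise with scale 1/epsilon to a count, which is the standard mechanism for a counting query with sensitivity 1.

```python
import random
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in groups.values())


def noisy_count(true_count, epsilon, rng=random):
    """Return the count plus Laplace noise of scale 1/epsilon.

    The difference of two independent Exponential(epsilon) draws is
    Laplace-distributed with mean 0 and scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Made-up records, purely for illustration.
records = [
    {"age_band": "30-39", "region": "N", "income": 1},
    {"age_band": "30-39", "region": "N", "income": 2},
    {"age_band": "40-49", "region": "S", "income": 3},
    {"age_band": "40-49", "region": "S", "income": 4},
]
print(is_k_anonymous(records, ["age_band", "region"], k=2))
print(noisy_count(len(records), epsilon=1.0, rng=random.Random(0)))
```

Neither check addresses the governance questions the authors emphasize; they only limit what can be inferred about individuals from data that has already been collected.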

Why it matters: The biggest challenge facing the implementation of algorithmic fairness in real-world systems is the tension between legal requirements to minimize demographic data collection and the fact that most modern approaches to fairness require that very same data. As more regulations come into force, we will face an ever-growing set of (potentially conflicting) requirements on how fairness should be addressed and what data may be collected. How companies with users spanning multiple jurisdictions and many demographic groups solve these challenges in production-grade systems will be a key space to watch to learn whether the current crop of methods actually works in practice.

####################################################

Tech Tales:

Form and Function and War

[The battlefields of Earth – 2028 – 2040]

For a while, wars were fought in technicolor. That’s because the humans figured out that they could confuse AI systems by varying the colors of their machines of war. Drones stopped being grey and started being rainbow colored. Quadcopters changed their black and tan shades for tie dye. This lasted for a while, as different armies sought to confuse each other.
Of course, the AI systems adapted – given enough data, they learned to see past the unexpected and re-identify their targets.
The next logical place was shape – army engineers worked to divorce form from function, and were happy to pay aerodynamic efficiency prices in exchange for things that could no longer be seen. Missiles became mushroom shaped. Planes started to take on the form of weather balloons and even stranger things. Artillery became housed within bouncy castles.

The footage of these wars was surreal – fields of fake trees that were in fact autonomous sniper towers. Lines of bouncy castles launching multicolored balloons into the air which sailed overhead before coming down and exploding in white-light and white-heat and concussive thumps. Armies of golf carts that vroom’d through urban centers before detonating.
Again, the AI systems adapted. They learned to understand some of the concepts of war – learned, pretty quickly, to become suspicious of anything and everything. This led to the situation we find ourselves in today – wars are now invisible. In fact, wars haven’t occurred for several years. That’s because the AI systems learned strategy and counter-strategy and so now fight wars in secret, tussling via trade and litigation and standards and all the other things that shape the context for how nations relate to one another. The AI systems are continually evolving new strategies; it is as though they’re now playing chess on boards whose dimension a human mind cannot comprehend. Yet in the military centers of the world powers, computers every day output their gnomic probabilities – the probability the nation will continue to exist in some time period in the future, as judged by the strategist AIs, playing their inscrutable games.
Neither a cold nor a hot war – instead, a neverending existential negotiation.

Things that inspired this story: How war strategists always seek to find the ‘high ground’ and what ‘high ground’ means conceptually; the logical endpoint of a conflict is to win the conflict before it has started; adversarial AI and adversarial examples; evolutionary pressure.
