November 30, 2020

Import AI 225: Tencent climbs the compute curve; NVIDIA invents a hard AI benchmark; a story about Pyramids and Computers

by Jack Clark

Want to build a game-playing AI? Tencent plans to release its ‘TLeague’ software to help:
…Tools for large-scale AI training…
Tencent has recently trained AI systems to do well at strategy games like StarCraft II, VizDoom, and Bomberman-clone ‘Pommerman’. To do that, it has built ‘TLeague’, software that it can use to train Competitive Self Play Multi Agent Reinforcement Learning (CSP-MARL) AI systems. TLeague comes with support for algorithms like PPO and V-Trace, and training regimes like Population Based Training.
Read more: TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning (arXiv).
Get the code: TLeague will eventually be available on Tencent’s GitHub page, according to the company.

###################################################

10 smart drones that (might) come to the USA:
…FAA regulations key to unlocking crazy new drones from Amazon, Matternet, etc…
The US, for many years a slow mover on drone regulation, is waking up. The Federal Aviation Administration recently published ‘airworthiness criteria’ for ten distinct drones. What this means is the FAA has evaluated a load of proposed designs and spat out a list of criteria the companies will need to meet to deploy the drones. Many of these new drones are designed to operate beyond the line of sight of an operator and a bunch of them come with autonomy baked in. By taking a quick look at the FAA applications, we can get a sense for the types of drones that might soon come to the USA.

The applicants’ drones range from five to 89 pounds and include several types of vehicle designs, including both fixed wing and rotorcraft, and are all electric powered. One notable applicant is Amazon, which is planning to do package delivery via drones that are tele-operated.

10 drones for surveillance, package delivery, medical material transport:
– Amazon Logistics, Inc: MK27: Max takeoff weight 89 pounds. Tele-operated logistics / package delivery.
– Airobotics: ‘OPTIMUS 1-EX’: 23 pounds. Surveying, mapping, inspection of critical infrastructure, and patrolling.
– Flirtey Inc: Flirtey F4.5: 38 pounds. Delivering medical supplies and packages.
– Flytrex: FTX-M600P: 34 pounds. Package delivery.
– Wingcopter GmbH: 198 US: 53 pounds. Package delivery.
– TELEGRID Technologies, Inc: DE2020: 24 pounds. Package delivery.
– Percepto Robotics, Ltd: Percepto System 2.4: 25 pounds. Inspection and surveying of critical infrastructure.
– Matternet, Inc: M2: 29 pounds. Transporting medical materials.
– Zipline International Inc: Zip UAS Sparrow: 50 pounds. Transporting medical materials.
– 3DRobotics Government Services: 3DR-GS H520-G: 5 pounds. Inspection or surveying of critical infrastructure.
Read more: FAA Moving Forward to Enable Safe Integration of Drones (FAA).

###################################################

Honor of Kings – the latest complex game that AI has mastered:
…Tencent climbs the compute curve…
Tencent has built an AI system that can play Honor of Kings, a popular Chinese online game. The game is a MOBA (multiplayer online battle arena) – a game designed to be played online by two teams with multiple players per team, similar to games like Dota 2 or League of Legends. These games are challenging for AI systems to master because of the range of possible actions that each character can take at each step, and also because of the combinatorially explosive gamespace due to a vast character pool. For this paper, Tencent trains on the full 40-character pool of Honor of Kings.
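To get a feel for that ‘combinatorially explosive’ character pool, here is a rough back-of-the-envelope sketch. The 40-character pool comes from the text above; the team size of five per side and the rule that no character appears twice in a match are assumptions added for illustration, and the function name is made up.

```python
from math import comb

HERO_POOL = 40   # from the text: the full 40-character pool
TEAM_SIZE = 5    # assumption: five players per team, no duplicate characters


def count_lineups(pool: int, team_size: int) -> int:
    """Count the ways to field two disjoint teams from a character pool.

    First pick the 2 * team_size characters that appear in the match,
    then pick which team_size of them go to the first team.
    Swapping the two teams counts as a different lineup.
    """
    in_match = 2 * team_size
    return comb(pool, in_match) * comb(in_match, team_size)


lineups = count_lineups(HERO_POOL, TEAM_SIZE)
print(f"{lineups:,} possible lineups")
```

Under those assumptions the count lands at roughly 2 x 10^11 lineups, before you even consider what each character does once the match starts.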

How they did it: Tencent uses a multi-agent training curriculum that operates in three phases. In the first phase, the system splits the character pool into distinct groups, then has them play each other and trains systems to play these matchups. In the second, it uses these models as ‘teachers’ which train a single ‘student’ policy. In the third phase, they initialize their network using the student model from the second phase and train on further permutations of players.
How well they do: Tencent deployed the AI model into the official ‘Honor of Kings’ game for a week in May 2020; their system played 642,047 matches against top-ranked players, winning 627,280 matches, with a win rate of 97.7%.
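The win rate is easy to check from the two match counts quoted above; a tiny sketch (variable names are just for illustration):

```python
matches_played = 642_047  # from the text
matches_won = 627_280     # from the text

matches_lost = matches_played - matches_won
win_rate = matches_won / matches_played

print(f"won {win_rate:.1%} of matches, lost {matches_lost:,}")
```

In other words, the top-ranked human players only managed to take about 1 match in every 43.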

Scale – and what it means: Sometimes, it’s helpful to step back from analyzing AI algorithms themselves and think about the scale at which they operate. Scale is both good and bad – large scale computationally-expensive experiments have, in recent years, led to a lot of notable AI systems, like AlphaGo, Dota 2, AlphaFold, GPT-3, and so on, but the phenomenon has also made some parts of AI research quite expensive. This Tencent paper is another demonstration of the power of scale: their training cluster involves 250,000 CPU cores and 2,000 NVIDIA V100 GPUs – that compares to systems of up to ~150,000 CPUs and ~3000 GPUs for things like Dota 2 (OpenAI paper, PDF).
Computers are telescopes: These computer infrastructures are like telescopes – the larger the set of computers, the larger the experiments we can run, letting us ‘see’ further into the future of what will one day become trainable on home computers. Imagine how strained the world will be when tasks like this are trainable on home hardware – and imagine what else must become true for that to be possible.
Read more: Towards Playing Full MOBA Games With Deep Reinforcement Learning (arXiv).

###################################################

Do industrial robots dream of motion-captured humans? They might soon:
…Smart robots need smart movements to learn from…
In the future, factories are going to contain a bunch of humans working alongside a bunch of machines. These machines will probably be the same as those we have today – massive, industrial robots from companies like Kuka, Fanuc, and Universal Robots – but with a twist: they’ll be intelligent, performing a broader range of tasks and also working safely around people while doing it (today, many robots sit in their own cages to stop them accidentally hurting people).
A new dataset called MoGaze is designed to bring this safer, smart robot future forward. MoGaze is a collection of 1,627 individual movements recorded via people wearing motion capture suits with gaze trackers.

What makes MoGaze useful: MoGaze contains data recorded from motion capture suits with more than 50 reflective markers each, as well as head-mounted rigs that track the participants’ gazes. Combine this with a broad set of actions involving navigating from a shelf to a table around chairs and manipulating a bunch of different objects, and you have quite a rich dataset.

What can you do with this dataset? Quite a lot – the researchers use it to attempt context-aware full-body motion prediction, training ML systems to work out the affordances of objects, figuring out human intent via predicting their gaze, and so on.
Read more: MoGaze: A Dataset of Full-Body Motions that Includes Workspace Geometry and Eye-Gaze (arXiv).
Get the dataset here (MoGaze official site).
GitHub: MoGaze.

###################################################

NVIDIA invents an AI intelligence test that most modern systems flunk:
…BONGARD-LOGO could be a reassuringly hard benchmark for evaluating intelligence (or the absence of it) in our software…
NVIDIA’s new ‘BONGARD-LOGO’ benchmark tests out the visual reasoning capabilities of an AI system – and in tests the best AI approaches get accuracies of around 60% to 70% across four tasks, compared to expert human scores of around 90% to 99%.

BONGARD history: More than fifty years ago, a Russian computer scientist invented a hundred human-designed visual recognition tasks that humans could solve easily, but machines couldn’t. BONGARD-LOGO is an extension of this, consisting of 12,000 problem instances – large enough that we can train modern ML systems on it, but small and complex enough to pose a challenge.

What BONGARD tests for: BONGARD ships with four inbuilt tests, which evaluate how well machines can predict new visual shapes from a series of prior ones, how well they can recognize pairs of shapes built with similar rules, how well they can identify the common attributes of a bunch of dissimilar shapes, and an ‘abstract’ test which evaluates a system on things it hasn’t seen during training.
Read more: Building a Benchmark for Human-Level Concept Learning and Reasoning (NVIDIA Developer blog).
Read more in this twitter thread from Anima Anandkumar (Twitter).
Read the research paper: BONGARD-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Are ML models getting harder to find?
One strand of growth economics tries to understand the shape of the ‘knowledge production function’, and specifically, how society’s output of new ideas depends on the existing stock of knowledge. This dissertation seeks to understand this with regards to ML progress.

Two effects: We can consider two opposing effects: (1) ‘standing-on-shoulders’ – increasing returns to knowledge; innovation is made easier by previous progress; (2) ‘stepping-on-toes’ – decreasing returns to knowledge due to e.g. duplication of work.
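For readers who like to see the shape of the thing: a textbook way to write a knowledge production function is the Jones-style form below. This is offered as an illustrative assumption, not as the dissertation’s exact specification, and the symbols are the conventional ones rather than anything taken from the paper.

```latex
\dot{A}(t) = \delta \, L(t)^{\lambda} \, A(t)^{\phi}
```

Here A is the existing stock of knowledge, L is the number of researchers, and \delta is a constant. In this formulation \phi > 0 captures ‘standing-on-shoulders’ (a larger stock of knowledge makes new ideas easier to produce), while \lambda < 1 captures ‘stepping-on-toes’ (doubling the researchers less than doubles the output of new ideas, e.g. because of duplicated work).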

Empirical evidence: Here, the author finds evidence for both effects in ML — measuring output as SOTA performance on 93 benchmarks since 2012, and input as the ‘effective’ (salary-adjusted) number of scientists. Overall, average ML research productivity has been declining by between 4 and 26% per year, suggesting the ‘stepping-on-toes’ effect dominates. As the author notes, the method has important limitations — notably, the chosen proxies for input and output are imperfect, and subject to mismeasurement.
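To see what a decline of 4% to 26% per year adds up to, here is a quick compounding sketch. It assumes, purely for illustration, a constant rate of decline over eight years (2012 to 2020); the study itself does not claim a constant rate, and the function name is made up.

```python
def remaining_productivity(annual_decline: float, years: int) -> float:
    """Fraction of starting productivity left after compounding a yearly decline."""
    return (1.0 - annual_decline) ** years


YEARS = 8  # assumption: 2012 through 2020, at a constant rate

for rate in (0.04, 0.26):  # the low and high ends of the range in the text
    left = remaining_productivity(rate, YEARS)
    print(f"{rate:.0%} per year for {YEARS} years leaves {left:.0%} of the starting level")
```

At the low end that leaves roughly 72% of the starting productivity; at the high end, roughly 9%.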

Matthew’s view: Improving our understanding of AI progress can help us forecast how the technology will develop in the future. This sort of empirical study is a useful complement to recent theoretical work – e.g. Jones & Jones’ model of automated knowledge production, in which increasing returns to knowledge lead to infinite growth in finite time (a singularity) under reasonable-seeming assumptions.
Read more: Are models getting harder to find?
Check out the author’s Twitter thread.
Read more: Economic Growth in the Long Run – Jones & Jones (FHI webinar).

Uganda using Huawei face recognition to quash dissent:

In recent weeks, Uganda has seen huge anti-government protests, with dozens of protesters killed by police, and hundreds more arrested. Police have confirmed that they are using a mass surveillance system, including face recognition, to identify protesters. Last year, Uganda’s president, Yoweri Museveni, tweeted that the country’s capital was monitored by 522 operators at 83 centres; and that he planned to roll out the system across the country. The surveillance network was installed by Chinese tech giant, Huawei, for a reported $126m (equivalent to 30% of Uganda’s health budget).

Read more: Uganda is using Huawei’s facial recognition tech to crack down on dissent after anti-government protests (Quartz).

###################################################
Tech Tales:

The Pyramid
[Within two hundred light years of Earth, 3300]

“Oh god damn it, it’s a Pyramid planet.”
“But what about the transmissions?”
“Those are just coming from the caretakers. I doubt there’s even any people left down there.”
“Launch some probes. There’s gotta be something.”

We launched the probes. The probes scanned the planet. Guess what we found? The usual. A few million people on the downward hill of technological development, forgetting their former technologies. Some of the further out settlements had even started doing rituals.

What else did we find? A big Pyramid. This one was on top of a high, desert plain – probably placed there so they could use the wind to cool the computers inside it. According to the civilization’s records, the last priests had entered the Pyramid three hundred years earlier and no one has gone in since.

When we looked around the rest of the planet we found the answer – lots of powerplants, but most of the resources spent, and barely any metal or petrochemical deposits near the planet’s surface anymore. Centuries of deep mining and drilling had pulled most of the resources out of the easily accessible places. The sun wasn’t as powerful as the one on Earth, so we found a few solar facilities, but none of them seemed very efficient.

It doesn’t take a genius to guess what happened: use all the power to bootstrap yourself up the technology ladder, then build the big computer inside the Pyramid, then upload (some of) yourself, experience a timeless and boundless digital nirvana, and hey presto – your civilisation has ended.

Pyramids always work the same way, even on different planets, or at different times.

Things that inspired this story: Large-scale simulations; the possibility that digital transcendence is a societal end state; the brutal logic of energy and mass; reading histories of ancient civilisations; the events that occurred on Easter Island leading to ecological breakdown; explorers.
