Import AI 225: Tencent climbs the compute curve; NVIDIA invents a hard AI benchmark; a story about Pyramids and Computers
by Jack Clark
Want to build a game-playing AI? Tencent plans to release its ‘TLeague’ software to help:
…Tools for large-scale AI training…
Tencent has recently trained AI systems to do well at strategy games like StarCraft II, VizDoom, and Bomberman-clone ‘Pommerman’. To do that, it has built ‘TLeague’, software that it can use to train Competitive Self-Play Multi-Agent Reinforcement Learning (CSP-MARL) AI systems. TLeague comes with support for algorithms like PPO and V-Trace, and training regimes like Population Based Training.
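The core CSP-MARL recipe is a learner that continually trains against opponents sampled from a pool of frozen past checkpoints. Here's a toy sketch of that loop (hypothetical class and function names, not TLeague's actual API):

```python
import random

# Toy sketch of a competitive self-play league (hypothetical names, not
# TLeague's real API): a learner trains against opponents sampled from
# a growing pool of frozen past checkpoints.
class League:
    def __init__(self):
        self.pool = []              # frozen past checkpoints

    def add(self, checkpoint):
        self.pool.append(checkpoint)

    def sample_opponent(self):
        # Uniform sampling; real systems weight opponents by win rate.
        return random.choice(self.pool)

def train_step(learner, opponent):
    # Placeholder for a PPO / V-trace update on rollouts vs. the opponent.
    return learner + 1              # pretend one unit of skill is gained

league = League()
learner = 0
league.add(learner)                 # seed the pool with the initial policy
for step in range(10):
    opponent = league.sample_opponent()
    learner = train_step(learner, opponent)
    if step % 3 == 2:               # periodically freeze a copy of the learner
        league.add(learner)

print(len(league.pool))  # 4 checkpoints: the seed plus three frozen copies
```

In a real system the placeholder update would be PPO or V-Trace on rollouts against the sampled opponent, and opponent sampling would be weighted by win rates, population-based-training style.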
Read more: TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning (arXiv).
Get the code: TLeague will eventually be available on Tencent’s GitHub page, according to the company.
###################################################
10 smart drones that (might) come to the USA:
…FAA regulations key to unlocking crazy new drones from Amazon, Matternet, etc…
The US, for many years a slow mover on drone regulation, is waking up. The Federal Aviation Administration recently published ‘airworthiness criteria’ for ten distinct drones. What this means is the FAA has evaluated a load of proposed designs and spat out a list of criteria the companies will need to meet to deploy the drones. Many of these new drones are designed to operate beyond the line of sight of an operator and a bunch of them come with autonomy baked in. By taking a quick look at the FAA applications, we can get a sense for the types of drones that might soon come to the USA.
The applicants’ drones range from five to 89 pounds, span several vehicle designs, including both fixed-wing and rotorcraft, and are all electric powered. One notable applicant is Amazon, which plans to do package delivery via tele-operated drones.
10 drones for surveillance, package delivery, medical material transport:
– Amazon Logistics, Inc.: MK27: 89 pounds (max takeoff weight): Tele-operated logistics / package delivery.
– Airobotics: OPTIMUS 1-EX: 23 pounds: Surveying, mapping, inspection of critical infrastructure, and patrolling.
– Flirtey Inc.: Flirtey F4.5: 38 pounds: Delivering medical supplies and packages.
– Flytrex: FTX-M600P: 34 pounds: Package delivery.
– Wingcopter GmbH: 198 US: 53 pounds: Package delivery.
– TELEGRID Technologies, Inc.: DE2020: 24 pounds: Package delivery.
– Percepto Robotics, Ltd.: Percepto System 2.4: 25 pounds: Inspection and surveying of critical infrastructure.
– Matternet, Inc.: M2: 29 pounds: Transporting medical materials.
– Zipline International Inc.: Zip UAS Sparrow: 50 pounds: Transporting medical materials.
– 3DRobotics Government Services: 3DR-GS H520-G: 5 pounds: Inspection or surveying of critical infrastructure.
Read more: FAA Moving Forward to Enable Safe Integration of Drones (FAA).
###################################################
Honor of Kings – the latest complex game that AI has mastered:
…Tencent climbs the compute curve…
Tencent has built an AI system that can play Honor of Kings, a popular Chinese MOBA – a game played online by two teams with multiple players per team, similar to Dota 2 or League of Legends. These games are challenging for AI systems to master because of the range of possible actions each character can take at each step, and because a vast character pool makes the gamespace combinatorially explosive. For this paper, Tencent trains on the full 40-character pool of Honor of Kings.
How they did it: Tencent uses a multi-agent training curriculum that operates in three phases. In the first phase, the system splits the character pool into distinct groups, then has them play each other and trains systems to play these matchups. In the second, it uses these models as ‘teachers’ which train a single ‘student’ policy. In the third phase, they initialize their network using the student model from the second phase and train on further permutations of players.
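The second phase is essentially policy distillation: the student is trained to match the action distributions of whichever teacher specializes in the current matchup. Here is a toy numpy sketch of that idea (tiny linear policies and invented shapes; nothing here reflects Tencent's actual implementation):

```python
import numpy as np

# Sketch of the teacher-student phase as policy distillation (toy linear
# policies and invented shapes; this is not Tencent's implementation).
rng = np.random.default_rng(0)
N_ACTIONS, DIM = 4, 8

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Three matchup-specialist "teachers", each a fixed linear policy.
teachers = [rng.normal(size=(DIM, N_ACTIONS)) for _ in range(3)]

def avg_cross_entropy(student, states):
    """Average disagreement between the student and round-robin teachers."""
    total = 0.0
    for i, s in enumerate(states):
        target = softmax(s @ teachers[i % 3])
        pred = softmax(s @ student)
        total += -(target * np.log(pred + 1e-12)).sum()
    return total / len(states)

eval_states = [rng.normal(size=DIM) for _ in range(30)]
student = np.zeros((DIM, N_ACTIONS))
ce_before = avg_cross_entropy(student, eval_states)

lr = 0.1
for step in range(2000):
    s = rng.normal(size=DIM)
    target = softmax(s @ teachers[step % 3])     # teacher's action dist
    pred = softmax(s @ student)
    student -= lr * np.outer(s, pred - target)   # cross-entropy gradient

ce_after = avg_cross_entropy(student, eval_states)
print(ce_after < ce_before)  # the student now better matches its teachers
```

The distilled student then serves as the initialization for the third phase, where further self-play refines it across more matchups.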
How well they do: Tencent deployed the AI model into the official ‘Honor of Kings’ game for a week in May 2020; their system played 642,047 matches against top-ranked players, winning 627,280 matches, with a win rate of 97.7%.
Scale – and what it means: Sometimes, it’s helpful to step back from analyzing AI algorithms themselves and think about the scale at which they operate. Scale is both good and bad – large scale computationally-expensive experiments have, in recent years, led to a lot of notable AI systems, like AlphaGo, OpenAI Five (Dota 2), AlphaFold, GPT-3, and so on, but the phenomenon has also made some parts of AI research quite expensive. This Tencent paper is another demonstration of the power of scale: their training cluster involves 250,000 CPU cores and 2,000 NVIDIA V100 GPUs – that compares to systems of up to ~150,000 CPUs and ~3000 GPUs for things like Dota 2 (OpenAI paper, PDF).
Computers are telescopes: These compute infrastructures are like telescopes – the larger the set of computers, the larger the experiments we can run, letting us ‘see’ further into the future of what will one day become trainable on home computers. Imagine how strange the world will be when tasks like this are trainable on home hardware – and imagine what else must become true for that to be possible.
Read more: Towards Playing Full MOBA Games With Deep Reinforcement Learning (arXiv).
###################################################
Do industrial robots dream of motion-captured humans? They might soon:
…Smart robots need smart movements to learn from…
In the future, factories are going to contain a bunch of humans working alongside a bunch of machines. These machines will probably be the same as those we have today – massive, industrial robots from companies like Kuka, Fanuc, and Universal Robots – but with a twist: they’ll be intelligent, performing a broader range of tasks and also working safely around people while doing it (today, many robots sit in their own cages to stop them accidentally hurting people).
A new dataset called MoGaze is designed to bring this safer, smarter robot future forward. MoGaze is a collection of 1,627 individual movements recorded from people wearing motion capture suits with gaze trackers.
What makes MoGaze useful: MoGaze combines data from motion capture suits with more than 50 reflective markers each, as well as head-mounted rigs that track the participants’ gazes. Combine this with a broad set of actions, involving navigating from a shelf to a table around chairs and manipulating a bunch of different objects, and you have quite a rich dataset.
What can you do with this dataset? Quite a lot – the researchers use it to attempt context-aware full-body motion prediction, training ML systems to work out the affordances of objects, figuring out human intent by predicting their gaze, and so on.
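To make ‘full-body motion prediction’ concrete, the simplest possible baseline on marker data like this is constant-velocity extrapolation of each joint. A toy sketch with fake data (the paper's methods are learned, context-aware models; this is just the shape of the task):

```python
import numpy as np

# Toy full-body motion prediction baseline on MoGaze-style data: given
# past joint positions (frames x joints x 3), extrapolate each joint
# with constant velocity. Illustrative only.
def constant_velocity_predict(past: np.ndarray, horizon: int) -> np.ndarray:
    """past: (T, J, 3) joint positions; returns (horizon, J, 3)."""
    velocity = past[-1] - past[-2]                  # per-frame displacement
    steps = np.arange(1, horizon + 1)[:, None, None]
    return past[-1] + steps * velocity

# Fake clip: 10 frames of 21 "joints" moving at a fixed velocity.
T, J = 10, 21
v = np.full((J, 3), 0.01)
clip = np.cumsum(np.broadcast_to(v, (T, J, 3)), axis=0)

pred = constant_velocity_predict(clip, horizon=5)
print(pred.shape)  # (5, 21, 3)
```

Learned predictors earn their keep precisely where this baseline fails: turns, reaches, and object interactions, which is where gaze and workspace geometry become informative.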
Read more: MoGaze: A Dataset of Full-Body Motions that Includes Workspace Geometry and Eye-Gaze (arXiv).
Get the dataset here (MoGaze official site).
GitHub: MoGaze.
###################################################
NVIDIA invents an AI intelligence test that most modern systems flunk:
…BONGARD-LOGO could be a reassuringly hard benchmark for evaluating intelligence (or the absence of it) in our software…
NVIDIA’s new ‘BONGARD-LOGO’ benchmark tests the visual reasoning capabilities of an AI system – and in tests, the best AI approaches get accuracies of around 60% to 70% across four tasks, compared to expert human scores of around 90% to 99%.
BONGARD history: More than fifty years ago, the Russian computer scientist Mikhail Bongard designed a hundred visual recognition tasks that humans could solve easily, but machines couldn’t. BONGARD-LOGO is an extension of this, consisting of 12,000 problem instances – large enough that we can train modern ML systems on it, but small and complex enough to pose a challenge.
What BONGARD tests for: BONGARD ships with four inbuilt tests, which evaluate how well machines can predict new visual shapes from a series of prior ones, how well they can recognize pairs of shapes built with similar rules, how well they can identify the common attributes of a bunch of dissimilar shapes, and an ‘abstract’ test which evaluates a system on things it hasn’t seen during training.
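Structurally, each Bongard problem is a tiny few-shot classification task: a handful of positives that obey a hidden rule, negatives that violate it, and held-out queries to label. A schematic in Python (illustrative only; this is not the BONGARD-LOGO data format, and strings stand in for images):

```python
from dataclasses import dataclass
from typing import Callable, List

# Schematic of a Bongard-style problem (illustrative only). Each
# problem is a tiny few-shot task defined by a hidden concept.
@dataclass
class BongardProblem:
    positives: List[str]   # examples that satisfy the hidden concept
    negatives: List[str]   # examples that violate it
    queries: List[str]     # held-out examples to classify

def evaluate(problem: BongardProblem,
             classify: Callable[[List[str], List[str], str], bool],
             truth: Callable[[str], bool]) -> float:
    """Fraction of queries the classifier labels correctly."""
    correct = sum(
        classify(problem.positives, problem.negatives, q) == truth(q)
        for q in problem.queries
    )
    return correct / len(problem.queries)

# Toy concept over strings (stand-ins for images): "contains an 'o'".
concept = lambda s: "o" in s
problem = BongardProblem(
    positives=["loop", "oval", "round"],
    negatives=["line", "zigzag", "spike"],
    queries=["dot", "dash"],
)

def overlap_baseline(pos, neg, q):
    # Label a query positive if it shares more characters with the
    # positive set than with the negative set.
    score = lambda examples: sum(len(set(q) & set(e)) for e in examples)
    return score(pos) > score(neg)

print(evaluate(problem, overlap_baseline, concept))  # 1.0 on this toy case
```

What makes the real benchmark hard is that the hidden concepts are abstract rules over stroke sequences, so shallow feature-overlap tricks like this baseline stop working.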
Read more: Building a Benchmark for Human-Level Concept Learning and Reasoning (NVIDIA Developer blog).
Read more in this twitter thread from Anima Anandkumar (Twitter).
Read the research paper: BONGARD-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning (arXiv).
###################################################
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…
Are ML models getting harder to find?
One strand of growth economics tries to understand the shape of the ‘knowledge production function’, and specifically, how society’s output of new ideas depends on the existing stock of knowledge. This dissertation seeks to understand this with regards to ML progress.
Two effects: We can consider two opposing effects: (1) ‘standing-on-shoulders’ — increasing returns to knowledge; innovation is made easier by previous progress; (2) ’stepping-on-toes’ — decreasing returns to knowledge due to e.g. duplication of work.
Empirical evidence: Here, the author finds evidence for both effects in ML — measuring output as SOTA performance on 93 benchmarks since 2012, and input as the ‘effective’ (salary-adjusted) number of scientists. Overall, average ML research productivity has been declining by between 4 and 26% per year, suggesting the ‘stepping-on-toes’ effect dominates. As the author notes, the method has important limitations — notably, the chosen proxies for input and output are imperfect, and subject to mismeasurement.
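To see how such a decline is computed: research productivity in a given year is output growth divided by effective input, so the measured trend depends on both series. A toy calculation with invented numbers (nothing here comes from the dissertation):

```python
# Toy illustration of declining research productivity (invented numbers):
# productivity_t = (benchmark improvement in year t) / (effective researchers).
# If inputs grow faster than output gains, measured productivity falls.
output_gain = [5.0, 5.5, 6.0, 6.4]     # % SOTA improvement per year
researchers = [100, 140, 196, 274]     # effective (salary-adjusted) headcount

productivity = [g / r for g, r in zip(output_gain, researchers)]
yearly_change = [
    (productivity[t] / productivity[t - 1] - 1) * 100
    for t in range(1, len(productivity))
]
print([round(c, 1) for c in yearly_change])  # % change in productivity per year
```

In this made-up example output keeps improving, yet productivity falls over 20% a year because the effective workforce grows faster, which is the ‘stepping-on-toes’ signature the author reports.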
Matthew’s view: Improving our understanding of AI progress can help us forecast how the technology will develop in the future. This sort of empirical study is a useful complement to recent theoretical work — e.g. Jones & Jones’ model of automated knowledge production, in which increasing returns to knowledge lead to infinite growth in finite time (a singularity) under reasonable-seeming assumptions.
Read more: Are models getting harder to find?
Check out the author’s Twitter thread
Read more: Economic Growth in the Long Run — Jones & Jones (FHI webinar)
Uganda using Huawei face recognition to quash dissent:
In recent weeks, Uganda has seen huge anti-government protests, with dozens of protesters killed by police, and hundreds more arrested. Police have confirmed that they are using a mass surveillance system, including face recognition, to identify protesters. Last year, Uganda’s president, Yoweri Museveni, tweeted that the country’s capital was monitored by 522 operators at 83 centres; and that he planned to roll out the system across the country. The surveillance network was installed by Chinese tech giant, Huawei, for a reported $126m (equivalent to 30% of Uganda’s health budget).
Read more: Uganda is using Huawei’s facial recognition tech to crack down on dissent after anti-government protests (Quartz).
###################################################
Tech Tales:
The Pyramid
[Within two hundred light years of Earth, 3300]
“Oh god damn it, it’s a Pyramid planet.”
“But what about the transmissions?”
“Those are just coming from the caretakers. I doubt there’s even any people left down there.”
“Launch some probes. There’s gotta be something.”
We launched the probes. The probes scanned the planet. Guess what we found? The usual. A few million people on the downward hill of technological development, forgetting their former technologies. Some of the further out settlements had even started doing rituals.
What else did we find? A big Pyramid. This one was on top of a high, desert plain – probably placed there so they could use the wind to cool the computers inside it. According to the civilization’s records, the last priests had entered the Pyramid three hundred years earlier and no one had gone in since.
When we looked around the rest of the planet we found the answer – lots of powerplants, but most of the resources spent, and barely any metal or petrochemical deposits left near the planet’s surface. Centuries of deep mining and drilling had pulled most of the resources out of the easily accessible places. The sun wasn’t as powerful as Earth’s, so while we found a few solar facilities, none of them seemed very efficient.
It doesn’t take a genius to guess what happened: use all the power to bootstrap yourself up the technology ladder, then build the big computer inside the Pyramid, then upload (some of) yourself, experience a timeless and boundless digital nirvana, and hey presto – your civilisation has ended.
Pyramids always work the same way, even on different planets, or at different times.
Things that inspired this story: Large-scale simulations; the possibility that digital transcendence is a societal end state; the brutal logic of energy and mass; reading histories of ancient civilisations; the events that occurred on Easter Island leading to ecological breakdown; explorers.