Import AI 113: Why satellites+AI gives us a global eye; industry pays academia to say sorry for strip-mining it; and Kindred researchers seek robot standardization
by Jack Clark
Global eye: Planet and Orbital Insight expand major satellite imagery deal:
…The future of the world is a globe-spanning satellite-intelligence utility service…
Imagine what it’s like to be working in a medium-level intelligence agency in a mid-size country when you read something like this: “Planet, who operates the largest constellation of imaging satellites, and Orbital Insight, the leader in geospatial analytics, announced today a multi-year contract for Orbital Insight to source daily, global, satellite imagery from Planet”. I imagine that you might think: “Wow! That looks a lot like all those deals we have to do secretly with other mid-size countries to access each other’s imagery. And these people get to do it in the open!?” Your next thought might be: how can I buy services from these companies to further my own intelligence capabilities?
AI + Intelligence: The point I’m making is that artificial intelligence is increasingly relevant to the sorts of tasks that intelligence agencies traditionally specialize in, but with the twist that lots of these intelligence-like tasks (say, automatically counting the cars in a set of parking lots across a country, or analyzing congested-versus-non-congested roads in other cities, or homing in on unusual ships in unusual waters) are now available in the private sector as well. This general diffusion of capabilities is creating many commercial and scientific benefits, but it is also narrowing the gap between what people can buy and what people can only access if they are a nuclear-capable power with a significant classified budget and access to a global internet dragnet. Much of the stability of the 20th century derived from there (eventually) being a unipolar world in geopolitical terms, with much of that stemming from inbuilt technological advantages. The ramifications of this diffusion of capability are intimately tied up with issues relating to the ‘dual-use’ nature of AI and to the changing nature of geopolitics. I hope deals like the above provoke further consideration of just how powerful – and how widely available – modern AI systems are.
Read more: Planet and Orbital Insight Expand Satellite Imagery Partnership (Cision PR Newswire).
Robots and Standards are a match made in hell, but Kindred thinks it doesn’t have to be this way:
…New robot benchmarks seek to bring standardization to a tricky area of AI…
Researchers with robotics startup Kindred have built on prior work on robot standardization (Import AI #87), trying to make it easier for researchers to compare the performance of real-world robots against one another by creating a suite of two tasks for each of three commercially available robot platforms.
Robots used: Universal Robots UR5 collaborative arm, Robotis MX-64AT Dynamixel actuators (which are frequently used within other robots), and a hockey-puck-shaped Create2 mobile robot.
Standard tasks: For the UR5 arm the researchers create two reaching tasks of varying difficulty, selectively turning different actuators on the robot on/off to scale complexity. For the DXL actuator they create a reacher task and also a tracking task; tracking requires that the DXL precisely track a moving target. For the Create2 robot they test it in two ways: movement, where it needs to move forward as fast as possible in a closed arena, and docking, in which the task is to dock to a charging station attached to one of the walls of the arena.
Algorithmic baselines: The researchers also use their benchmarking suite to compare multiple widely used AI algorithms against each other, including TRPO, PPO, DDPG, and Soft-Q. By using standard tasks it’s easier for the researchers to compare the effects of things like hyperparameter choices on different algorithms, and by having these tasks take place on real-world robot platforms, it’s possible to get a sense of how well these algorithms deal with the numerous difficulties involved in reality.
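To make the standardized-task idea concrete, here is a minimal sketch of a reaching benchmark as a Gym-style environment. Everything here (class name, two-link kinematics, negative-distance reward) is an illustrative assumption, not Kindred's actual benchmark code:

```python
import math
import random

class SimpleReacherTask:
    """Illustrative sketch of a standardized reaching benchmark:
    a two-joint planar arm is rewarded for moving its fingertip
    toward a randomly placed target."""

    def __init__(self, episode_len=100, seed=0):
        self.episode_len = episode_len
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.angles = [0.0, 0.0]  # joint angles (radians)
        self.target = (self.rng.uniform(-1, 1), self.rng.uniform(-1, 1))
        return self._obs()

    def step(self, action):
        # Clip joint-velocity commands and integrate one timestep.
        self.angles = [a + max(-0.1, min(0.1, u))
                       for a, u in zip(self.angles, action)]
        self.t += 1
        x, y = self._fingertip()
        dist = math.hypot(x - self.target[0], y - self.target[1])
        done = self.t >= self.episode_len
        return self._obs(), -dist, done  # reward: negative distance to target

    def _fingertip(self):
        # Forward kinematics for two unit-length links.
        a1, a2 = self.angles
        return (math.cos(a1) + math.cos(a1 + a2),
                math.sin(a1) + math.sin(a1 + a2))

    def _obs(self):
        return list(self.angles) + list(self.target)

env = SimpleReacherTask()
obs = env.reset()
obs, reward, done = env.step([0.05, -0.05])
```

The value of a fixed interface like this is that any of the baseline algorithms (TRPO, PPO, DDPG, Soft-Q) can be pointed at the same reset/step loop, so differences in results reflect the algorithm and its hyperparameters rather than the task plumbing.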
Drawbacks: One drawback of these tasks is that they’re very simple: OpenAI recently showed how to scale PPO to let us train a robot to perform robust dexterous manipulation of a couple of simple objects, which involved having to learn to control a five-digit robot hand; by comparison, these tasks involve robot platforms with a significantly smaller number of dimensions of movement, making the tasks significantly easier.
Time and robots: One meta-drawback with projects like this is that they involve learning on the robot platform, rather than learning in a mixture of simulated and real environments – this makes everything take an extraordinarily long time. For this paper, the authors “ran more than 450 independent experiments which took over 950 hours of robot usage in total,” they noted.
Why it matters: For AI to substantively change the world it’ll need to be able to not just flip bits, but flip atoms as well. Today, some of that is occurring by connecting up AI-driven systems (for instance, product recommendation algorithms) to e-retail systems (eg Amazon), which let AI play a role in recommending courses of action to systems that ultimately go and move some mass around the world. I think for AI to become even more impactful we need to cut out the middle step and have AI move mass itself – so connecting AI to a system of sensors and actuators like a robot will eventually yield a direct-action platform for AI systems; my hypothesis is that this will dramatically increase the range of situations we can deploy learning algorithms into, and will thus hasten their development.
Read more: Benchmarking Reinforcement-Learning Algorithms on Real-World Robots (Arxiv).
AI endowments at University College London and the University of Toronto:
…Appointments see industry giving back to the sector it is strip-mining (with the best intentions)…
DeepMind is funding an AI professorship as well as two post-doctoral researchers and one PHD student at University College London. “We are delighted by this opportunity to further develop our relationship with DeepMind,” said John Shawe-Taylor, head of UCL’s Department of Computer Science.
Uber is investing “more than $200 million” into Toronto and also its eponymous university. This investment is to fund self-driving car research at the University of Toronto, and for Uber to set up its first-ever engineering facility in Canada.
Meanwhile, LinkedIn co-founder Reid Hoffman has gifted $2.45 million to the University of Toronto’s ‘iSchool’ to “establish a chair to study how the new era of artificial intelligence (AI) will affect our lives”.
Why it matters: Industry is currently strip-mining academia for AI talent, constantly hiring experienced professors and post-docs (and some of the most talented PHD students), leading to a general brain drain from academia. Without action by industry like this to even the balance, there’s a risk of degrading AI education to the point that industry runs into problems.
Read more: New DeepMind professorship at UCL to push frontiers of AI (UCL).
Read more: LinkedIn founder Reid Hoffman makes record-breaking gift to U of T’s Faculty of Information for chair in AI (UofT News).
Learning the task is so last year. Now it’s all about learning the algorithm:
…Model-Based Meta-Policy-Optimization shows sample efficiency of meta-learning (if coaxed along with some clever human-based framing of the problem)…
Researchers with UC Berkeley, OpenAI, Preferred Networks, and Karlsruhe Institute of Technology (KIT) have developed model-based meta-policy-optimization, a meta-learning technique that lets AI agents generalize to unfamiliar contexts. “While traditional model-based RL methods rely on the learned dynamics models to be sufficiently accurate to enable learning a policy that also succeeds in the real world, we forego reliance on such accuracy,” the researchers write. “We are able to do so by learning an ensemble of dynamics models and framing the policy optimization step as a meta-learning problem. Meta-learning, in the context of RL, aims to learn a policy that adapts fast to new tasks or environments”. The technique builds upon model-agnostic meta-learning (MAML).
How it works: MB-MPO works like most meta-learning algorithms: it treats environments as distinct bits of data to learn from. It collects data from the world, uses this data not only to learn to complete the task but also to learn which trajectories yield rapid task completion, then eventually learns a predictive model of good traits of its successful policies and uses this to drive the inner-loop policy-gradient adaptation, which lets it meta-learn adaptation to new environments.
Results: Using MB-MPO the researchers can “learn an optimal policy in high-dimensional and complex quadrupedal locomotion within two hours of real-world data. Note that the amount of data required to learn such policy using model-free methods is 10X – 100X higher, and, to the best knowledge of the authors, no prior model-based method has been able to attain the model-free performance in such tasks.” In tests on a variety of simulated robotic baselines the researchers show that “MB-MPO is able to match the asymptotic performance of model-free methods with two orders of magnitude less samples.” The algorithm also performs better than two model-based approaches it was compared against.
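The overall structure - an ensemble of learned dynamics models, with policy optimization framed as adapting quickly to any one model - can be sketched on a toy problem. Everything below is a deliberate simplification for illustration: a 1-D linear system stands in for the learned neural dynamics models, a single scalar parameterizes the policy, and sign-based finite-difference steps stand in for the paper's MAML-style policy gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ensemble of slightly different learned 1-D dynamics models
# s' = a*s + b*u, standing in for MB-MPO's ensemble of neural nets.
ensemble = [(1.0 + rng.normal(0, 0.05), 1.0 + rng.normal(0, 0.05))
            for _ in range(5)]

def rollout_return(theta, model, horizon=20):
    """Return of the linear policy u = -theta*s under one dynamics model."""
    a, b = model
    s, total = 1.0, 0.0
    for _ in range(horizon):
        s = a * s + b * (-theta * s)
        total += -(s ** 2)  # reward: drive the state to zero
    return total

def grad_sign(f, theta, eps=1e-3):
    """Sign of a finite-difference gradient (keeps the toy example stable)."""
    return 1.0 if f(theta + eps) > f(theta - eps) else -1.0

def adapt(theta, model, step=0.05):
    """Inner loop: one policy-improvement step against a single model."""
    return theta + step * grad_sign(lambda t: rollout_return(t, model), theta)

# Outer loop: meta-optimize theta so that ONE adaptation step
# performs well on average across the whole model ensemble.
theta = 0.0
for _ in range(150):
    meta_g = np.mean([grad_sign(
        lambda t: rollout_return(adapt(t, m), m), theta) for m in ensemble])
    theta += 0.02 * (1.0 if meta_g > 0 else -1.0)

post = np.mean([rollout_return(adapt(theta, m), m) for m in ensemble])
baseline = np.mean([rollout_return(adapt(0.0, m), m) for m in ensemble])
```

The point the sketch captures is the framing: rather than trusting any single learned model to be accurate, the meta-objective only asks that the policy be one quick adaptation step away from working well under each member of the ensemble.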
Why it matters: Meta-learning is part of an evolution within AI of having researchers write fewer and fewer elements of a system. DeepMind’s David Silver has a nice summary of this from a recent presentation, where he describes the difference between deep learning and meta learning as the difference between learning features and predictions end-to-end, and learning the algorithm and features and predictions end-to-end.
Read more: Model-Based Reinforcement Learning via Meta-Policy Optimization (Arxiv).
Check out David Silver’s slides here: Principle 10, Learn to Learn (via Seb Ruder on Twitter).
People are pessimistic about automation and many expect their jobs to be automated:
…Large-scale multi-country Pew Research survey reveals deep, shared anxieties around AI and automation…
A majority of people in ten countries think that it is probable that within 50 years computers will do much of the work currently done by humans. That finding comes from a large-scale survey conducted by Pew to assess attitudes towards automation. In each of the surveyed countries a majority of respondents think that if computers end up doing a bunch of the work that is today done by humans then:
– People will have a hard time finding jobs.
– The inequality between the rich and poor will be much worse than it is today.
Minority views: A minority of those surveyed think the above occurrence would lead to “new, better paying jobs”, and a minority (except in Poland, Japan, and Hungary) believe this would make the economy more efficient.
Notable data: There are some pretty remarkable differences in outlook between countries in the survey; 15% of surveyed Americans think robots and computers will “definitely” do the majority of work within fifty years, compared to 52% of Greeks.
Data quirk: The data for this survey is split across two time periods: the US was surveyed in 2015, while the other nine countries were surveyed between mid-May and mid-August of 2018, so it’s possible the American results may have changed since then.
Read more: In Advanced and Emerging Economies Alike, Worries about Job Automation (Pew Research Center).
Chinese President says AI’s power should motivate international collaboration:
…Xi Jinping, the President of the People’s Republic of China, says AI has high stakes at opening of technology conference…
Chinese President Xi Jinping has said in a letter that AI’s power should motivate international collaboration. “To seize the development opportunity of AI and cope with new issues in fields including law, security, employment, ethics and governance, it requires deeper international cooperation and discussion,” said Xi in the letter, according to official state news service Xinhua.
Read more: China willing to share opportunities in digital economy: Xi (Xinhua).
Tencent researchers take on simplified StarCraft 2 and beat all levels of the in-game AI:
…A few handwritten heuristics go a long way…
Researchers with Tencent have trained an agent to beat the in-game AI at StarCraft 2, a complex real-time strategy game. StarCraft is a game with a long history within AI research – one of the longer-running game AI competitions has been based around StarCraft – and has recently been used by Facebook and DeepMind as a testbed for reinforcement learning algorithms.
What they did: The researchers developed two AI agents, TSTARBOT1 and TSTARBOT2, both of which were able to successfully beat all ten difficulty levels of the in-game AI within SC2 when playing a restricted 1vs1 map (Zerg-v-Zerg, AbyssalReef). This achievement is somewhat significant given that “level 8, level 9, and level 10 are cheating agents with full vision on the whole map, with resource harvest boosting”, and that according to some players the “level 10 built-in AI is estimated to be… equivalent to top 50% – 30% human players”.
How they did it: First, the researchers forked and modified the PySC2 software environment to make greater game state information available to the AI agents, such as information about the location of all units at any point during the game. They also add in some rule-based systems, like building a specific technology tree that telegraphs the precise dependencies of each technology to the AI agents. They then develop two different bots to play the game, which have different attributes: TSTARBOT1 is “based on deep reinforcement learning over flat actions”, and TSTARBOT2 is “based on rule controllers over hierarchical actions”.
How they did it: TSTARBOT1: This bot uses 165 distinct hand-written macro actions to help it play the game. These actions include things like “produce drone”, “build roach warren”, “upgrade tech A”, as well as various combat actions. The purpose of these macros is to bundle together the discrete actions that need to be taken to achieve things (eg, to build something, you need to move the camera, select a worker, select a point on the screen, place the building, etc) so that the AI doesn’t need to learn these sequences itself. This means that some chunks of the bot are rule-based rather than learned (similar to the 2017 1v1 version of OpenAI’s Dota bot). Though this design hides some of the sophistication of the game, it is somewhat ameliorated by the researchers using a sparse reward structure which only delivers a reward to the agent (1 for a win, 0 for a tie, -1 for a loss) at the end of the game. They test this algorithm in the game via implementing two core reinforcement learning algorithms: Proximal Policy Optimization and Dueling Double Deep Q-Learning.
How they did it: TSTARBOT2: This bot extends the work in the original one by creating a hierarchy of two types of actions: macro actions and micro actions. By implementing a hierarchy the researchers make it easier for RL algorithms to discover the appropriate actions to take at different points in time. This hierarchy is further defined via the creation of specific modules, like ones for combat or production, which themselves contain additional sub-modules with sub-behaviors.
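The macro-action idea can be sketched as a thin wrapper that expands each high-level choice into its scripted micro-action sequence. The macro names, module structure, and environment interface below are illustrative assumptions, not Tencent's actual implementation:

```python
# Each macro the agent picks expands into the scripted sequence of
# low-level micro actions a human player would issue, so the RL policy
# chooses among ~165 macros instead of raw camera moves and clicks.
MACROS = {
    "produce_drone": ["select_hatchery", "train_drone"],
    "build_roach_warren": ["move_camera", "select_worker",
                           "select_build_point", "place_building"],
    "attack": ["select_army", "target_enemy_base"],
}

class MacroActionEnv:
    """Wraps a micro-level environment; executes one macro per agent step."""

    def __init__(self, micro_env):
        self.micro_env = micro_env

    def step(self, macro_name):
        total_reward = 0.0
        obs, done = None, False
        for micro in MACROS[macro_name]:
            obs, reward, done = self.micro_env.step(micro)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done

class FakeMicroEnv:
    """Toy stand-in that just logs the micro actions it receives."""
    def __init__(self):
        self.log = []
    def step(self, micro):
        self.log.append(micro)
        return None, 0.0, False

env = MacroActionEnv(FakeMicroEnv())
env.step("build_roach_warren")
```

TSTARBOT2's hierarchy takes this one level further: instead of a single flat dictionary of macros, modules such as combat and production own their own sub-modules and sub-behaviors, narrowing the choices the learner faces at each level.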
Results: The researchers show that TSTARBOT1 can consistently beat levels 1-4 of the in-game AI when using PPO (this drops slightly for DDQN), then has ~99% success on levels 5-8, then ~97% success on level 9, and ~81% success on level 10. TSTARBOT2, by comparison, surpasses these scores, obtaining a win rate of 90% against the L10 AI. They also carried out some qualitative tests against humans and found that their systems were able to win some games against human players, but not convincingly.
Scale: The distributed system used for this research consisted of a single GPU and approximately 3,000 CPUs spread across some 80 distinct machines, demonstrating the significant amounts of hardware required to carry out AI research on environments such as this.
Why it matters: Existing reinforcement learning benchmarks like the Atari corpus are too easy for many algorithms, with modern systems typically able to beat the majority of games on this system. Newer environments, like Dota2 and StarCraft 2, scale up the complexity enough to challenge the capabilities of contemporary algorithms. This research, given all the hand-tuning and rule-based systems required to let the bots learn enough to play at all, shows that SC2 may be too hard for today’s existing algorithms without significant modifications, further motivating research into newer systems.
Read more: TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game (Arxiv).
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: firstname.lastname@example.org…
AI leads to a more multipolar world, says political science professor:
Michael Horowitz, a professor of political science and associate director of Perry World House at the University of Pennsylvania, argues that AI could favor smaller countries, in contrast to the technological developments that have made the US and China the world’s superpowers. Military uses of AI could allow countries to catch up with the US and China, he says, citing the lower barriers to building military AI systems vs traditional military hardware, such as fighter jets.
Why it matters: An AI arms race is a bad outcome for the world, insofar as it encourages countries to prioritize capabilities over safety and robustness. It’s unclear whether a race between many parties would be better than a classic arms race. I’m not convinced of Horowitz’s assessment that the US and China are likely to be overtaken by smaller countries. While AI is certainly different to traditional military systems, the world’s military superpowers have both the resources and incentives to seek to sustain their lead.
Read more: The Algorithms of August (Foreign Policy).
And so we all search for signals given to us by strange machines, hunting rats between buildings, searching for nests underground, operating via inferred maps and the beliefs of something we have built but do not know.
The rats are happy today. I know this because the machine told me. It detected them waking up and, instead of emerging into the city streets, going to a cavern in their underground lair where – it predicts with 85% confidence – they proceeded to copulate and therefore produce more rats. The young rats are believed – 75% confidence – to feed on a mixture of mother-rat-milk, along with pizza and vegetables stolen from the city they live beneath. Tomorrow the rats will emerge (95%) and the likelihood of electrical outages from chewed cables will increase (+10%) as well as the need to contract more street cleaning to deal with their bodies (20%).
One day we’ll go down there, to those rat warrens that the machine has predicted must exist, and we will see what they truly do. But for now we operate our civil services on predictions made by our automated AI systems. There is an apocryphal story we tell of civil workers being led to caverns that contain only particularly large clumps of mold (rat lair likelihood prediction: 70%) or to urban-river-banks that contain a mound of skeletons, gleaming under moonlight (rat breeding ground: 60%; in fact, a place of mourning). But there are also stories of people going to a particular shuttle on a rarely-used steamroller and finding a rat nest (prediction: 80%) and of people going to the roof of one of the tallest buildings in the city and finding there a rat boneyard (prediction: 90%).
Because of the machine’s overall efficiency there are calls for it to be rolled out more widely. We are currently considering adding in other urban vermin, like pigeons and raccoons and, at the coasts, seabirds. But what I worry about is when they might turn such a system on humans. What does AI-augmented human management look like? What might it predict about us?
Things that inspired this story: Rats, social control via AI, glass cages, reinforcement learning, RL-societies, adaptive bureaucracy.