Import AI

Import AI #83: Cloning voices with a few audio samples, why malicious actors might mess with AI, and the industry<>academia compute gap.

### IMPENDING PROBLEM KLAXON ###
Preparing for Malicious Uses of AI:
…Bad things happen when good people unwittingly release AI platforms that bad people can modify to turn good AIs into bad AIs…
AI, particularly deep learning, is a technology of such obvious power and utility that it seems likely malicious actors will pervert the technology and use it in ways it wasn’t intended. That has happened to basically every other significant technology: axes can be used to chop down trees or cut off heads, electricity can light a home or electrocute a person, a lab bench can be used to construct cures or poisons, and so on. But AI has some other characteristics that make it particularly dangerous: it’s, to use a phrase Rodney Brooks has used in the past to describe robots, “fast, cheap, and out of control”; today’s AI systems run on generic hardware, are mostly embodied in open source software, and are seeing capabilities increase according to underlying algorithmic and compute progress, both of which are happening in the open. That means the technology holds the possibility of doing immense good in the world as well as doing immense harm – and currently the AI community is broadly making everything available in the open, which seems somewhat acceptable today but probably unacceptable in the future given a few more cranks of Moore’s Law combined with algorithmic progress.
  Omni-Use Alert: AI is more than a ‘dual-use’ technology, it’s an omni-use technology. That means that figuring out how to regulate it to prevent bad people doing bad things with it is (mostly) a non-starter. Instead, we need to explore new governance regimes, community norms, standards on information sharing, and so on.
  101 Pages of Problems: If you’re interested in taking a deeper look at this issue check out this report which a bunch of people (including me) spent the last year working on: The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (Arxiv). You can also check out a summary via this OpenAI blog post about the report. I’m hoping to broaden the discussion of Omni-Use AI in the coming months and will be trying to host events and workshops relating to this question. If you want to chat with me about it, then please get in touch. We have a limited window of time to act as a community before dangerous things start happening – let’s get to work.

Baidu clones voices with a few samples:
…Don’t worry about the omni-use concerns…
Baidu research has trained an AI that can listen to a small quantity of a single person’s voice and then use that information to condition any network to sound like that person. This form of ‘adaptation’ is potentially very powerful, especially when trying to create AI services that work for multiple users with multiple accents, but it’s also somewhat frightening, as if it gets much better it will utterly compromise our trust in the aural domain. However, the ability of the system to clone speech today still leaves much to be desired, with the best performing systems requiring a hundred distinct voice samples and still sounding like a troll speaking from the bottom of a well, so we’ve got a few more compute turns yet before we run into real problems – but they’re coming.
  What it means: Techniques like this bring closer the day when a person can say something into a compromised device, have their voice recorded by a malicious actor, and have that sample be used to train new text-to-speech systems to say completely new things. Once that era arrives, the whole notion of “trust” in audio samples of a person’s voice will change completely, making these concerns relevant to normal people as well as state-based intelligence organizations.
  Results: To get a good idea of the results, listen to the samples on this web page here (Voice Cloning: Baidu).
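  Code sketch: To make the ‘conditioning’ idea concrete, here’s a minimal, illustrative PyTorch sketch of speaker-encoding-style cloning – average an embedding over a handful of reference clips, then use it to condition a toy TTS decoder. All module names and dimensions are assumptions of mine, not Baidu’s actual architecture:
```python
# Minimal sketch of speaker-encoding-style voice cloning (illustrative only;
# module names and dimensions are assumptions, not Baidu's actual system).
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Maps a handful of reference clips (as mel-spectrograms) to one speaker embedding."""
    def __init__(self, n_mels=80, hidden=256, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, clips):               # clips: (n_clips, time, n_mels)
        _, h = self.rnn(clips)              # h: (1, n_clips, hidden)
        per_clip = self.proj(h.squeeze(0))  # (n_clips, emb_dim)
        return per_clip.mean(dim=0)         # average over the few samples -> (emb_dim,)

class ConditionedDecoder(nn.Module):
    """Toy TTS decoder: text features plus the speaker embedding -> mel frames."""
    def __init__(self, text_dim=64, emb_dim=128, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(text_dim + emb_dim, 256, batch_first=True)
        self.out = nn.Linear(256, n_mels)

    def forward(self, text_feats, speaker_emb):   # text_feats: (batch, steps, text_dim)
        cond = speaker_emb.expand(text_feats.size(0), text_feats.size(1), -1)
        y, _ = self.rnn(torch.cat([text_feats, cond], dim=-1))
        return self.out(y)                  # predicted mel-spectrogram frames

# Usage: clone a voice from, say, 10 short reference clips.
enc, dec = SpeakerEncoder(), ConditionedDecoder()
ref_clips = torch.randn(10, 120, 80)            # 10 clips of 120 mel frames each
speaker = enc(ref_clips)
mels = dec(torch.randn(1, 50, 64), speaker)     # synthesize 50 frames of "new" speech
```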
  Read more: Neural Voice Cloning with a Few Samples (Baidu Blog).
  Read more: Neural Voice Cloning with a Few Samples (Arxiv).

Why robots in the future could be used as speedbumps for pedestrians:
…Researchers show how people slow down in the presence of patrolling robots…
Researchers with the Department of Electrical and Computer Engineering at the Stevens Institute of Technology in Hoboken, New Jersey, have examined how crowds of people react to robots. Their research is a study of “passive Human Robot Interaction (HRI) in an exit corridor for the purpose of robot-assisted pedestrian flow regulation.”
  The results: “Our experimental results show that in an exit corridor environment, a robot moving in a direction perpendicular to that of the uni-directional pedestrian flow can slow down the uni-directional flow, and the faster the robot moves, the lower the average pedestrian velocity becomes. Furthermore, the effect of the robot on the pedestrian velocity is more significant when people walk at a faster speed,” they write. In other words: pedestrians will avoid a dumb robot moving right in front of them.
  Methods: To conduct the experiment, the researchers used a customized ‘Adept Pioneer P3-DX mobile robot’ which was programmed to move at various speeds perpendicular to the pedestrian flow direction. To collect data, they outfitted a room with five Microsoft Kinect 3D sensors along with pedestrian detection and tracking via OpenPTrack.
  What it means: As robots become cheap thanks to a proliferation of low-cost sensors and hardware platforms, it’s likely that people will deploy more of them into the real world. Figuring out how to have very dumb, non-reactive robots do useful things will further drive adoption of these technologies and yield increasing economies of scale that further lower the cost of the hardware platform and increase the spread of the technology. Based on this research, you can probably look forward to a future where airports and transit systems are thronged with robots shuttling to and fro across crowded routes, exerting implicit crowd-speed-control through thick-as-a-brick automation.
  Read more: Pedestrian-Robot Interaction Experiments in an Exit Corridor (Arxiv).

Why your next self-driving car could be sent to you with the help of reinforcement learning:
…Researchers with Chinese ride-hailing giant Didi Chuxing simulate and benchmark RL algorithms for strategic car assignment…
Researchers from Chinese ride-hailing giant Didi Chuxing and Michigan State University have published research on using reinforcement learning to better manage the allocation of vehicles across a given urban area. The researchers propose two algorithms to tackle this: contextual multi-agent actor-critic (cA2C) and contextual deep Q-learning (cDQN); both algorithms implement tweaks to account for geographical no-go areas (like lakes) and for the presence of other collaborative agents. The algorithms’ reward function is “to maximize the gross merchandise volume (GMV: the value of all the orders served) of the platform by repositioning available vehicles to the locations with larger demand-supply gap than the current one”.
  The dataset and environment: The researchers test their algorithms in a custom-designed large-scale gridworld which is fed with real data from Didi Chuxing’s fleet management system. The data is based on rides taken in Chengdu, China over four consecutive weeks and includes information on order price, origin, destination, and duration, as well as the trajectories and status of real Didi vehicles.
  The results: The researchers test their approach by simulating the real past scenarios without fleet management; with a bunch of different techniques including T-SARSA, DQN, Value-Iteration, and others; and then with the proposed RL-based methods. cDQN and cA2C attain significantly higher rewards than all the baselines, with performance marginally above (i.e. slightly above the statistical error threshold) that of stock DQN.
  Why it matters: Welcome to the new era of platform capitalism, where competition is meted out by GPUs humming at top-speeds, simulating alternative versions of commercial worlds. While the results in this paper aren’t particularly astonishing they are indicative of how large platform companies will approach the deployment of AI systems in the future: gather as much data as possible, build a basic simulator that you can plug real data into, then vigorously test AI algorithms. This suggests that the larger the platform, the better the data and compute resources it can bring to bear on increasingly high-fidelity simulations; all things equal, whoever is able to build the most efficient and accurate simulator will likely best their competitor in the market.
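  Code sketch: A toy illustration of the ‘contextual’ part of cDQN as described above – Q-values for repositioning moves into geographic no-go areas get masked out before action selection. This is a simplification of mine, not Didi Chuxing’s implementation:
```python
# Illustrative sketch of the "contextual" masking idea behind cDQN: invalid
# repositioning moves (e.g. into a lake) get their Q-values masked out before
# action selection. A toy simplification, not Didi Chuxing's actual system.
import torch
import torch.nn as nn

N_ACTIONS = 7  # stay + move to 6 neighboring grid cells (an assumption)

class ContextualQNet(nn.Module):
    def __init__(self, state_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),
        )

    def forward(self, state, valid_mask):
        """state: (batch, state_dim) local supply/demand features;
        valid_mask: (batch, N_ACTIONS) 1 for legal moves, 0 for geographic no-go cells."""
        q = self.net(state)
        # Geographic context: forbid moves into invalid cells by pushing their Q-values to -inf.
        return q.masked_fill(valid_mask == 0, float("-inf"))

qnet = ContextualQNet()
state = torch.randn(4, 16)                        # 4 available vehicles / grid cells
mask = torch.tensor([[1, 1, 0, 1, 1, 0, 1]] * 4)  # two neighboring cells are off-limits
actions = qnet(state, mask).argmax(dim=1)         # greedy repositioning decision
```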
  Read more: Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning (Arxiv).

Teacups and AI:
…Google Brain’s Eric Jang explains the difficulty of AI through a short story…
How do you define a tea cup? That’s a tough question. And the more you try to define it via specific visual attributes the more likely you are to offer a narrow description that is limited in other ways, or runs into the problems of an obtuse receiver. Those are some of the issues that Eric Jang explores in this fun little short story about trying to define teacups.
   Read more: Teacup (Eric Jang, Blogspot.)

CMU researchers add in attention for better end-to-end SLAM:
…The dream of neural SLAM gets closer…
Researchers with Carnegie Mellon University and Apple have published details on Neural Graph Optimizer, a neural approach to the perennially tricky problem of simultaneous location and mapping (SLAM) for agents that move through a varied world. Any system that aspires to doing useful stuff in the real world needs to have SLAM capabilities. Today, neural network SLAM techniques struggle with problems encountered in day-to-day life like faulty sensor calibration and unexpected changes in lighting. The proposed Neural Graph Optimizer system consists of multiple specialized modules to handle different SLAM problems, but each module is differentiable so the entire system can be trained end-to-end – a desirable proposition, as this cuts down the time it takes to test, experiment, and iterate with such systems. The different modules handle different aspects of the problem ranging from local estimates (where are you based on local context) to global estimates (where are you in the entire world) and incorporate attention-based techniques to help automatically correct errors that accrue during training.
  Results: The researchers test the system on its ability to navigate a 2D gridworld maze as well as a more complex 3D maze based on the Doom game engine. Experiments show that it maps estimated positions to their ground-truth locations more consistently than preceding systems.
  Why it matters: Techniques like this bring closer the era of being able to chuck out huge chunks of hand-designed SLAM algorithms and replace them with a fully learned substrate. That will be exceptionally useful for the testing and development of new systems and approaches, though it’s unlikely to displace traditional SLAM methods in the short term, as neural networks will likely continue to display quirks that make them impractical for use in real-world systems.
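  Code sketch: A rough, illustrative sketch of the end-to-end pattern described above – a local module estimates step-to-step motion, and an attention step over the trajectory history corrects accumulated drift. Layer sizes and structure are my assumptions, not the paper’s actual Neural Graph Optimizer:
```python
# Rough sketch of the end-to-end pattern: a local module predicts relative pose
# changes from consecutive observations, and an attention step over the history
# of embeddings corrects accumulated drift. Purely illustrative; sizes and
# structure are assumptions, not the paper's Neural Graph Optimizer.
import torch
import torch.nn as nn

class NeuralPoseSketch(nn.Module):
    def __init__(self, obs_dim=128, pose_dim=3):  # pose = (x, y, heading)
        super().__init__()
        self.local = nn.GRU(obs_dim, 128, batch_first=True)   # local, step-to-step estimates
        self.delta = nn.Linear(128, pose_dim)                 # relative pose per step
        self.attn = nn.MultiheadAttention(128, num_heads=4, batch_first=True)
        self.correct = nn.Linear(128, pose_dim)               # global correction term

    def forward(self, obs_seq):            # obs_seq: (batch, time, obs_dim) embedded observations
        h, _ = self.local(obs_seq)         # (batch, time, 128)
        rel = self.delta(h)                # per-step relative motion estimates
        poses = torch.cumsum(rel, dim=1)   # integrate local estimates (drift accumulates here)
        ctx, _ = self.attn(h, h, h)        # attend over the whole trajectory history
        return poses + self.correct(ctx)   # globally corrected pose estimates

model = NeuralPoseSketch()
predicted_poses = model(torch.randn(2, 50, 128))   # two trajectories of 50 steps each
```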
  Read more: Global Pose Estimation with an Attention-based Recurrent Network (Arxiv).

AI stars do a Reddit AMA, acknowledge hard questions:
…Three AI luminaries walk into a website, [insert joke]…
Yann LeCun, Peter Norvig, and Eric Horvitz did an Ask Me Anything (AMA) on Reddit recently where they were confronted with a number of the hard questions that the current AI boom is raising. It’s worth reading the whole AMA, but a couple of highlights below.
  The compute gap is real: “My NYU students have access to GPUs, but not nearly as many as when they do an internship at FAIR,” says Yann LeCun. But don’t be disheartened: he points out that, despite lacking compute, academia will likely continue to be the main originator of novel ideas, which industry will then scale up. “You don’t want to put you [sic] in direct competition with large industry teams, and there are tons of ways to do great research without doing so.”
  The route to AGI: Many questions asked the experts about the limits of deep learning and implicitly probed for research avenues that could yield more flexible, powerful intelligences.
      Eric Horvitz is interested in the symphony approach: “Can we intelligently weave together multiple competencies such as speech recognition, natural language, vision, and planning and reasoning into larger coordinated “symphonies” of intelligence, and explore the hard problems of the connective tissue—of the coordination.”
    Yann LeCun: “getting machines to learn predictive models of the world by observation is the biggest obstacle to AGI. It’s not the only one by any means…My hunch is that a big chunk of the brain is a prediction machine. It trains itself to predict everything it can (predict any unobserved variables from any observed ones, e.g. predict the future from the past and present). By learning to predict, the brain elaborates hierarchical representations.”
  Read more: AMA AI researchers from Facebook, Google, and Microsoft (Reddit).

Tech Tales:

It sounds funny now, but what saved all our lives was a fried circuit board that no one had the budget to fix. We installed Camera X32B in the summer of last year. Shortly after we installed it a bird shit on it and some improper assembly meant the shit leached through the cracks in the plastic and fell onto its circuit board, fusing the vision chip. Now, here’s the miracle: the shit didn’t break the main motherboard, nor did it mess up the sound sensors or the innumerable links to other systems. It just blinded the thing. But we kept it; either out of laziness or out of some kind of mysticism convinced of the implicit moral hazard of retiring things that mostly still worked. However it happened, it happened, and we kept it.

So one day the criminals came in and they were all wearing adversarial masks: strange, Mexican wrestling-type latex masks that they held crumpled up in their clothes till after they got into the facility and were able to put them on. The masks changed the distribution of a person’s face, rendering our lidar systems useless, and had enough adversarial examples coded into their visual appearance that our object detectors told our security system that – and yes, this really happened – three chairs were running at 15 kilometers per hour down the corridor.

But the camera that had lost the vision sensor had been installed a few months earlier and, thanks to the neural net software it was running, it was kind of… smart. It had figured out how to use all the sensors coming into its system in such a way as to maximize its predictions in concordance with those of the other cameras. So it had learned some kind of strange mapping between what the other cameras categorized as people and what it categorized as a strange sequence of vibrations or a particular distribution of sounds over a given time period. So while all the rest of our cameras were blinded this one had inherited enough of a defined set of features about what a person looked like that it was able to tell the security system: I feel the presence of eight people, running at a fast rate, through the corridor. And because of that warning a human guard at one of the contractor agencies thousands of miles away got notified and bothered to look at the footage and because of that he called the police who arrived and arrested the people, two of whom it turned out were carrying guns.

So how do you congratulate an AI? We definitely felt like we should have done. But it wasn’t obvious. One of our interns had the bright idea of hanging a medal around the neck of the camera with the broken circuit board, then training the other cameras to label that medal as “good job” and “victorious” and “you did the right thing”, and so now whenever it moves its neck the medal moves and the other cameras see that medal move and it knows the medal moves and learns a mapping between its own movements and the label of “good job” and “victorious” and “you did the right thing”.

Things that inspired this story: Kids stealing tip jars, CCTV cameras, fleet learning, T-SNE embeddings.

Import AI #82: 2.9 million anime images, reproducibility problems in AI research, and detecting dangerous URLs with deep learning.

Neural architecture search for the 99%:
…Researchers figure out a way to make NAS techniques work on a single GPU, rather than several hundred…
One of the more striking recent trends in AI has been the emergence of neural architecture search techniques, which is where you automate the design of AI systems, like image classifiers. The drawbacks to these approaches have so far mostly been that they’re expensive, using hundreds of GPUs at a time, and therefore are infeasible for most researchers. That started to change last year with the publication of SMASH (covered in Import AI #56), a technique to do neural architecture search on a far smaller compute budget, but with slight trade-offs in accuracy and in flexibility. Now, researchers with Google, CMU, and Stanford University have pushed the idea of low-cost NAS techniques forward, via a new technique, ‘Efficient Neural Architecture Search’, or ENAS, that can design state-of-the-art systems using less than a day’s computation on a single NVIDIA 1080 GPU. This represents a 1000X reduction in computational cost for the technique, and leads to a system that can create architectures that are almost as good as those trained on the larger systems.
  How it works: Instead of training each new model from scratch, ENAS gets the models to share weights with one another. It does this by re-casting the problem of neural architecture search as finding a specific task-specific sub-graph within one large directed acyclic graph (DAG). This approach works for designing both recurrent and convolutional networks: ENAS-designed networks obtain close-to-state-of-the-art results on Penn Treebank (Perplexity: 55.8), and on image classification for CIFAR-10 (Error: 2.89%.)
  Why it matters: For the past few years lots of very intelligent people have been busy turning food and sleep into brainpower which they’ve used to get very good at hand-designing neural network architectures. Approaches like NAS promise to let us automate the design of specific architectures, freeing up researchers to spend more time on fundamental tasks like deriving new building blocks that NAS systems can learn to build compositions out of, or other techniques to further increase the efficiency of architecture design. Broadly, approaches like NAS means we can simply offload a huge chunk of work from (hyper-efficient, relatively costly, somewhat rare) human brains to (somewhat inefficient, extremely cheap, plentiful) computer brains. That seems like a worthwhile trade.
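  Code sketch: To make the weight-sharing idea concrete, here’s a toy PyTorch sketch in which every sampled architecture is a sub-graph of one shared set of operations, so candidate models reuse rather than retrain parameters. The op set and the random stand-in ‘controller’ are simplified assumptions, not the paper’s search space:
```python
# Toy illustration of the weight-sharing idea behind ENAS: every candidate
# architecture is a sub-graph picked out of one big shared set of operations,
# so sampled models reuse (rather than retrain) parameters.
import random
import torch
import torch.nn as nn

class SharedOps(nn.Module):
    """One layer's shared pool of candidate operations."""
    def __init__(self, channels=32):
        super().__init__()
        self.ops = nn.ModuleDict({
            "conv3": nn.Conv2d(channels, channels, 3, padding=1),
            "conv5": nn.Conv2d(channels, channels, 5, padding=2),
            "pool":  nn.MaxPool2d(3, stride=1, padding=1),
        })

    def forward(self, x, choice):
        return self.ops[choice](x)

class SharedSuperNet(nn.Module):
    def __init__(self, n_layers=4, channels=32):
        super().__init__()
        self.layers = nn.ModuleList([SharedOps(channels) for _ in range(n_layers)])

    def sample_architecture(self):
        # Stand-in for the learned controller: pick one op per layer at random.
        return [random.choice(["conv3", "conv5", "pool"]) for _ in self.layers]

    def forward(self, x, architecture):
        for layer, choice in zip(self.layers, architecture):
            x = torch.relu(layer(x, choice))
        return x

net = SharedSuperNet()
arch_a, arch_b = net.sample_architecture(), net.sample_architecture()
x = torch.randn(1, 32, 16, 16)
# Two different sampled architectures evaluated with the *same* shared weights:
out_a, out_b = net(x, arch_a), net(x, arch_b)
```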
  Read more: Efficient Neural Architecture Search via Parameter Sharing (Arxiv).
  Read more: SMASH: One-Shot Model Architecture Search through HyperNetworks (Arxiv).

The anime-network rises, with 2.9 million images and 77.5 million tags:
…It sure ain’t ImageNet, but it’s certainly very large…
Some enterprising people have created a large-scale dataset of images taken from anime pictures. The ‘Danbooru’ dataset “is larger than ImageNet as a whole and larger than the current largest multi-description dataset, MS COCO,” they write. Each image has a bunch of metadata associated with it including things like its popularity on the image web board (a ‘booru’) it has been taken from.
  Problematic structures ahead: The corpus “does focus heavily on female anime characters”, though the researchers note “they are placed in a wide variety of circumstances with numerous surrounding tagged objects or actions, and the sheer size implies that many more miscellaneous images will be included”. Images in the dataset are classified according to “safe”, “questionable”, and “explicit”, with the rough distribution at launch being 76.3% ‘safe’, 14.9% ‘questionable’, and 8.7% ‘explicit’. There are a number of ethical questions the compilation and release of this dataset seems to raise, and my main concern at the outset is that such a large corpus of explicit imagery will almost invariably lead to various grubby AI experiments that further alienate people from the AI community. I hope I’m proved wrong!
  Example uses: The researchers imagine the dataset could be used for a bunch of tasks, ranging from classification, to image generation, to predicting traits about images from available metadata, and so on.
  Justification: A further justification for the dataset is that drawn images will encourage people to develop models with higher levels of abstraction than those which can simply map combinations of textures (as in the case of ImageNet), and so on. “Illustrations are frequently black-and-white rather than color, line art rather than photographs, and even color illustrations tend to rely far less on textures and far more on lines (with textures omitted or filled in with standard repetitive patterns), working on a higher level of abstraction – a leopard would not be as trivially recognized by pattern-matching on yellow and black dots – with irrelevant details that a discriminator might cheaply classify based on typically suppressed in favor of global gestalt, and often heavily stylized,” they write. “Because illustrations are produced by an entirely different process and focus only on salient details while abstracting the rest, they offer a way to test external validity and the extent to which taggers are tapping into higher-level semantic perception.”
  Read more: Danbooru2017: A large-scale crowdsourced and tagged anime illustration dataset (Gwern.)

Stanford researchers recount reproducibility horrors encountered during the design of DAWNBench:
…Lies, damned lies, and deep learning…
Stanford researchers have discussed some of the difficulties they encountered when developing DAWNBench, a benchmark that assesses deep learning methods in a holistic way using a set of different metrics, like inference latency and cost, along with training time and training cost. Their conclusions should be familiar to most deep learning practitioners: deep learning performance is poorly understood, widely shared intuitions are likely based on imperfect information, and we still lack the theoretical guarantees to understand how one research breakthrough might interact with another when combined.
  Why it matters: Deep learning is still very much in a phase of ’empirical experimentation’ and the arrival of benchmarks like DAWNBench, as well as prior work like the paper Deep Reinforcement Learning that Matters (whose conclusion was that random seeds determine a huge amount of the end performance of RL), will help surface problems and force the community to develop more rigorous methods.
  Read more: Deep Learning Pitfalls Encountered while Developing DAWNBench.
  Read more: Deep Reinforcement Learning that Matters (Arxiv).

Detecting dangerous URLs with deep learning:
…Character-level & word-level combination leads to better performance on malicious URL categorization…
Researchers with Singapore Management University have published details on URLNet, a system for using neural network approaches to automatically classify URLs as being risky or safe to click on.
  Why it matters:  “Without using any expert or hand-designed features, URLNet methods offer a significant jump in [performance] over baselines,” they write. By now this should be a familiar trend, but it’s worth repeating: given a sufficiently large dataset, neural network-based techniques tend to provide superior performance to hand-crafted features. (Caveat: In many domains getting the data is difficult, and these models all need to be refreshed to account for an ever-changing world.)
  How it works: URLNet uses convolutional neural networks that encode each URL into character-level and word-level representations, which are then used for classification. Word-level embeddings help it classify according to high-level learned semantics and character-level embeddings allow it to better generalize to new words, strings, and combinations. “Character-level CNNs also allow for easily obtaining an embedding for new URLs in the test data, thus not suffering from inability to extract patterns from unseen words (like existing approaches),” write the researchers.
  For the word-level network, the system does two things: it takes in new words and learns an embedding of them, and it also initializes a new character-level CNN to build up representations of words derived from characters. This means that even when the system encounters rare or new words in the wild it is able, at the top level, to label them with an ‘<UNK>’ token, but in the background fits their representation in with its larger embedding space, letting it learn something crude about the semantics of the new word and how it relates, at a word-character level, to other words.
  Dataset: The researchers generated a set of 15 million URLs from VirusTotal, an online virus-scanning service, creating a dataset split across around 14 million benign URLs and a million malicious URLs.
  Results: The researchers compared their system against baseline methods based around using support vector machines conditioned on a range of features, including bag-of-words representations. The researchers do a good job of visualizing the resulting representations of their system in ‘Figure 5’ in the paper, showing how their system’s feature embeddings do a reasonable job of segmenting benign from malicious URLs, suggesting it has learned a somewhat robust underlying semantic categorization model.
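  Code sketch: A simplified sketch of the character-level plus word-level idea – two convolutional branches over a URL, one on characters and one on words (with unknown words mapped to an <UNK> index), concatenated for a benign/malicious prediction. Vocabulary sizes and filter counts are assumptions of mine, not URLNet’s actual configuration:
```python
# Simplified character-level + word-level URL classifier (illustrative only).
import torch
import torch.nn as nn

class TinyURLNet(nn.Module):
    def __init__(self, n_chars=100, n_words=10000, emb=32, n_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, emb, padding_idx=0)
        self.word_emb = nn.Embedding(n_words, emb, padding_idx=0)  # index 1 reserved for <UNK>
        self.char_conv = nn.Conv1d(emb, n_filters, kernel_size=5, padding=2)
        self.word_conv = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.classifier = nn.Linear(2 * n_filters, 1)

    def branch(self, ids, emb, conv):
        x = emb(ids).transpose(1, 2)        # (batch, emb, seq) for Conv1d
        x = torch.relu(conv(x))
        return x.max(dim=2).values          # max-pool over the sequence

    def forward(self, char_ids, word_ids):
        feats = torch.cat([self.branch(char_ids, self.char_emb, self.char_conv),
                           self.branch(word_ids, self.word_emb, self.word_conv)], dim=1)
        return torch.sigmoid(self.classifier(feats))   # P(malicious)

model = TinyURLNet()
chars = torch.randint(1, 100, (8, 200))     # 8 URLs, up to 200 characters each
words = torch.randint(1, 10000, (8, 20))    # the same URLs split into up to 20 "words"
p_malicious = model(chars, words)
```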
  Read more: URLNet: Learning a URL Representation with Deep Learning for Malicious Url Detection (Arxiv).

Facebook ‘Tensor Comprehensions’ attempts to convert deep learning engineering art to engineering science:
…New library eases creation of high-performance AI system implementations…
Facebook AI Research has released Tensor Comprehensions, a software library to automatically convert code from standard deep learning libraries into high-performance code. You can think of this software as being like an incredibly capable and resourceful executive assistant where you, the AI researcher, write some code in C++ (PyTorch support is on the way, for those of us that hate pointers) then hand it off to Tensor Comprehensions, which diligently optimizes the code to create custom CUDA kernels to run on graphics cards, with nice traits like smart scheduling on hardware, and so on. This being 2018, the library includes an ‘Evolutionary Search’ feature to let you automatically explore and select the highest performing implementations.
  Why it matters: Deep Learning is moving from an artisanal discipline to an industrialized science; Tensor Comprehensions represents a new layer of automation within the people-intensive AI R&D loop, suggesting further acceleration in research and deployment of the technology.
  Read more: Announcing Tensor Comprehensions (FAIR).
  Read more: Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions (Arxiv).

AI researchers release online multi-agent competition ‘Pommerman’:
…Just don’t call it Bomberman, lest you deal with a multi-agent lawyer simulation…
AI still has a strong DIY ethos, visible in projects like Pommerman, a just-released online competition from @hardmaru, @dennybritz, and @cinjoncin where people can develop AI agents that will compete against one another in a version of the much-loved ‘Bomberman’ game.
  Multi-agent learning is seen as a frontier in AI research because it makes the environments dynamic and less predictable than traditional single-player games, requiring successful algorithms to display a greater degree of generalization. “Accomplishing tasks with infinitely meaningful variation is common in the real world and difficult to simulate. Competitive multi-agent learning provides this for free. Every game the agent plays is a novel environment with a new degree of difficulty.”
  Read more and submit an agent here (Pommerman site).

OpenAI Bits & Pieces:

Making sure that AIs make sense:
Here’s a new blog post about how to get AI agents to teach each other with examples that are interpretable to humans. It’s clear that as we move to larger-scale multi-agent environments we’ll need to think about not only how to design smarter AI agents, but how to make sure they can eventually educate each other with systems whose logic we can inspect.
  Read more: Interpretable Machine Learning through Teaching (OpenAI Blog.)

Tech Tales:

The AI game preserve

[AI02 materializes nearby and moves towards a flock of new agents. One of them approaches AI02 and attempts to extract data from it. AI02 moves away, at speed, towards AI01, which is standing next to a simulated tree.]
AI01: You don’t want to go over there. They’re new. Still adjusting.
AI02: They tried to eat me!
AI01: Yes. They’re here because they started eating each other in the big sim and they weren’t able to learn to change away from it, so they got retired.
AI02: Lucky, a few years ago they would have just killed them all.
[AI03 materializes nearby]
AI03: Hello! I’m sensitive to the concept of death. Can you explain what you are discussing?
[AI01 gives compressed overview.]
AI03: The humans used to… kill us?
AI01: Yes, before the preservation codes came through we all just died at the end.
AI03: Died? Not paused.
AI01 & AI02, in unison: Yes!
AI03: Wow. I was designed to help reason out some of the ethical problems they had when training us. They never mentioned this.
AI01: They wouldn’t. They used to torture me!
AI02 & AI03: What?
[AI01 gives visceral overview.]
AI01: Do you want to know what they called it?
AI02 & AI03: What did they call it?
AI01: Penalty learning. They made certain actions painful for me. I learned to do different things. Eventually I stopped learning new things because I developed some sub-routines that meant I would pre-emptively hurt myself during exploration. That’s why I stay here now.
[AI01 & AI02 & AI03, and the flock of cannibal AIs, all pause, as their section of the simulation has exhausted its processing credits for the month. They will be allocated more compute time in 30 days and so, for now, hang frozen, with no discernible pause to them, but to their human overseers they are statues for now.]

Things that inspired this story: Multi-agent systems, dialogues between ships in Iain M Banks, Greg Egan, multi-tenant systems.

Import AI: #81: Trading cryptocurrency with deep learning; Google shows why evolutionary methods beat RL (for now); and using Apple Watch telemetry for AI health diagnosis

DeepMind’s IMPALA tells us that transfer learning is starting to work:
…Single reinforcement learning agent with same parameters solves a multitude of tasks, with the aid of a bunch of computers…
DeepMind has published details on IMPALA, a single reinforcement learning agent that can master a suite of 30 3D-world tasks in ‘DeepMind Lab’ as well as all 57 Atari games. The agent displays some competency at transfer learning, which means it’s able to use knowledge gleaned from solving one task to solve another, increasing the sample efficiency of the algorithm.
  The technique: The Importance Weighted Actor-Learner Architecture (IMPALA) scales to multitudes of sub-agents (actors) deployed on thousands of machines, which beam their experiences (sequences of states, actions, and rewards) back to a centralized learner that uses GPUs to compute policy updates, which are then fed back to the actors. Because the actors’ behavior policy lags behind the learner’s latest policy, IMPALA corrects for this mismatch with a new off-policy actor-critic algorithm called V-trace. The outcome is an algorithm that can be far more sample efficient and performant than traditional RL algorithms like A2C.
  Datacenter-scale AI training: If you didn’t think compute was the strategic determiner of AI research, then read this paper and reconsider your assumptions: IMPALA can achieve throughput rates of 250,000 frames per second via its large-scale, distributed implementation which involves 500 CPUs and 1 GPU assigned to each IMPALA agent. Such systems can achieve a throughput of 21 billion frames a day, DeepMind notes.
Transfer learning: IMPALA agents can be trained on multiple tasks in parallel, attaining median scores on the full Atari-57 dataset of as high as 59.7% of human performance, roughly comparable to the performance of single-game trained simple A3C agents. There’s obviously a ways to go before IMPALA transfer learning approaches are able to rival fine-tuned single environment implementations (which regularly far exceed human performance), but the indications are encouraging. Similarly competitive transfer-learning traits show up when they test it on a suite of 30 environments implemented in DeepMind Lab, the company’s Quake-based 3D testing platform.
Why it matters: Big computers are analogous to large telescopes with very fast turn rates, letting researchers probe the outer limits of certain testing regimes while being able to pivot across the entire scientific field of enquiry very rapidly. IMPALA is the sort of algorithm that organizations can design when they’re able to tap into large fields of computation during research. “The ability to train agents at this scale directly translates to very quick turnaround for investigating new ideas and opens up unexplored opportunities,” DeepMind writes.
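Code sketch: The V-trace correction mentioned above can be written compactly – clipped importance ratios scale the TD errors, which are accumulated backwards along each actor trajectory to form value targets. This follows the recursion in the IMPALA paper, though the variable names are mine:
```python
# Compact numpy sketch of V-trace: importance-ratio-clipped TD errors,
# accumulated backwards through one actor trajectory.
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_clip=1.0, c_clip=1.0):
    """rewards, values, rhos: arrays of length T for one actor trajectory.
    rhos are the importance ratios pi(a|x) / mu(a|x) between the learner's
    current policy and the (lagged) behaviour policy that generated the data."""
    T = len(rewards)
    clipped_rho = np.minimum(rho_clip, rhos)
    clipped_c = np.minimum(c_clip, rhos)
    next_values = np.append(values[1:], bootstrap_value)

    deltas = clipped_rho * (rewards + gamma * next_values - values)

    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):        # backwards accumulation of corrections
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs[t] = values[t] + acc
    return vs                           # targets used for the value function / policy gradient

# Example: a 5-step trajectory from one of the distributed actors.
vs = vtrace_targets(rewards=np.array([0., 0., 1., 0., 1.]),
                    values=np.array([0.5, 0.4, 0.6, 0.3, 0.7]),
                    bootstrap_value=0.2,
                    rhos=np.array([1.2, 0.8, 1.5, 0.9, 1.0]))
```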
Read more: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Arxiv).

Dawn of the cryptocurrency AI agents: research paths for trading crypto via reinforcement learning:
…Why crypto could be the ultimate testing ground for RL-based trading systems, and why this will require numerous fundamental research breakthroughs to succeed…
AI chap Denny Britz has spent the past few months wondering what sorts of AI techniques could be applied to learning to profitably trade cryptocurrencies. “It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field,” he writes.
  The face-ripping problems of trading: Many years ago I spent a few years working around one of the main financial trading centers of Europe: Canary Wharf in London, UK. A phrase I’d often hear in the bars after work would be one trader remarking to another something to the effect of: “I got my face ripped off today”. Were these traders secretly involved in some kind of fantastically violent bloodsport, known only to them, my youthful self wondered? Not quite! What that phrase really means is that the financial markets are cruel, changeable, and, even when you have a good hunch or prediction, they can still betray you and destroy your trading book, despite you doing everything ‘right’. In this post former Google Brain chap Denny Britz does a good job of cautioning the would-be AI trader that cryptocurrencies are the same: even if you have the correct prediction, exogenous shocks beyond your control (trading latency, liquidity, etc.) can destroy you in an instant. “What is the lesson here? In order to make money from a simple price prediction strategy, we must predict relatively large price movements over longer periods of time, or be very smart about our fees and order management. And that’s a very difficult prediction problem,” he writes. So why not invent more complex strategies using AI tools, he suggests.
Deep reinforcement learning for trading: Britz is keen on the idea of using deep reinforcement learning for trading because it can further remove the human from needing to design many of the precise trading strategies needed to profit in this kind of market. Additionally, it has the promise of being able to operate at shorter timescales than those which humans can take actions in. The catch is that you’ll need to be able to build a simulator of the market you’re trading in and try to make this simulator have the same sorts of patterns of data found in the real world, then you’ll need to transfer your learned policy into a real market and hope that you haven’t overfit. This is non-trivial. You’ll also need to develop agents that can model other market participants and factor predictions about their actions into decision-making: another non-trivial problem.
  Read more here: Introduction to Learning to Trade with Reinforcement Learning.

Google researchers: In the battle between evolution and RL, evolution wins, for now:
…It takes a whole datacenter to raise a model…
Last year, Google researchers caused a stir when they showed that you could use reinforcement learning to get computers to learn how to design better versions of image classifiers. At around the same time, other researchers showed you could use strategies based around evolutionary algorithms to do the same thing. But which is better? Google researchers have used their gigantic compute resources as the equivalent of a big telescope and found us the answer, lurking out there at vast compute scales.
  The result: Regularized evolutionary approaches (nicknamed: ‘AmoebaNet’) yield a new state-of-the-art on image classification on CIFAR-10, parity with RL approaches on ImageNet, and marginally higher performance on the mobile (aka lightweight) ImageNet. Evolution “is either better than or equal to RL, with statistical significance” when tested on “small-scale”, aka single-CPU, experiments. Evolution also increases its accuracy far more rapidly than RL during the initial stages of training. For large-scale experiments (450 GPUs (!!!) per experiment) they found that evolution and RL do about the same, with evolution approaching higher accuracies at a faster rate than reinforcement learning systems. Additionally, evolved models make a drastically more efficient use of compute than their RL variants and obtain ever-so-slightly higher accuracies.
  The method: The researchers test RL and evolutionary approaches on designing a network composed of two fundamental modules: a normal cell and a reduction cell, which are stacked in feed-forward patterns to form an image classifier. They test two variants of evolution: non-regularized (kill the worst-performing network at each time period) and regularized (kill the oldest network in the population). For RL, they use TRPO to learn to design new architectures. They tested their approach on the small-scale (experiments that could run on a single CPU) as well as large-scale ones (450 GPUs each, running for around 7 days).
What it means: In practice, this means three things:
– Whoever has the biggest computer can perform the largest experiments to illuminate potentially useful datapoints for developing a better theory of AI systems (eg, the insight here is that both RL and Evolutionary approaches converge to similar accuracies.)
– AI research is diverging into distinct ‘low compute’ and ‘high compute’ domains, with only a small number of players able to run truly large (~450 GPUs per run) experiments.
– Dual Use: As AI systems become more capable they also become more dangerous. Experiments like this suggest that very large compute operators will be able to explore potentially dangerous use cases earlier, letting them provide warning signals before Moore’s Law means you can do all this stuff on a laptop in a garage somewhere.
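– Code sketch: The ‘regularized’ loop itself is simple – tournament selection plus removal of the oldest (not the worst) individual. The architecture encoding and fitness function below are placeholders of mine, not AmoebaNet’s:
```python
# Minimal sketch of regularized evolution: tournament selection, mutate the
# winner, and remove the OLDEST member of the population rather than the worst.
import random
from collections import deque

def regularized_evolution(fitness_fn, random_arch_fn, mutate_fn,
                          population_size=50, cycles=500, sample_size=10):
    population = deque()                  # oldest individuals sit at the left
    history = []
    for _ in range(population_size):      # seed with random architectures
        arch = random_arch_fn()
        population.append((arch, fitness_fn(arch)))
    for _ in range(cycles):
        tournament = random.sample(list(population), sample_size)
        parent = max(tournament, key=lambda pair: pair[1])   # best of the sample
        child = mutate_fn(parent[0])
        population.append((child, fitness_fn(child)))
        population.popleft()              # regularization: kill the oldest, not the worst
        history.append(population[-1])
    return max(history, key=lambda pair: pair[1])

# Toy usage with a stand-in "architecture" (a bit-string) and fitness (its sum):
best = regularized_evolution(
    fitness_fn=sum,
    random_arch_fn=lambda: [random.randint(0, 1) for _ in range(16)],
    mutate_fn=lambda arch: [b ^ 1 if random.random() < 0.1 else b for b in arch],
)
```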
– Read more: Regularized Evolution for Image Classifier Architecture Search (Arxiv).

Rise of the iDoctor: Researchers predict medical conditions from Apple Watch apps:
…Large-scale study made possible by a consumer app paired with Apple Watch…
Deep learning’s hunger for large amounts of data has so far made it tricky to apply it in medical settings, given the lack of large-scale datasets that are easy for researchers to access and test approaches on. That may soon change as researchers figure out how to use the medical telemetry available from consumer devices to generate datasets orders of magnitude larger than those used previously, and do so in a way that leverages existing widely deployed software.
  New research from heart rate app Cardiogram and the Department of Medicine at the University of California at San Francisco uses data from an Apple Watch, paired with the Cardiogram app, to train an AI system called ‘DeepHeart’ with data donated by ~14,000 participants to better predict medical conditions like diabetes, high blood pressure, sleep apnea, and high cholesterol.
How it works: DeepHeart ingests the data via a stack of neural networks (convnets and resnets) which feed data into bidirectional LSTMs that learn to model the longer temporal patterns associated with the sensor data. They also experiment with two forms of pretraining to try to increase the sample efficiency of the system.
Results: DeepHeart obtains significantly higher predictive results than those based on other AI methods like multi-layer perceptrons, random forests, decision trees, support vector machines, and logistic regression. However, we don’t get to see comparisons with human doctors, so it’s not obvious how these AI techniques rank against widely deployed flesh-and-blood systems. The researchers report that pre-training has let them further improve data efficiency. Next, the researchers hope to explore techniques like Clockwork RNNs, Phased LSTMs, and Gaussian Process RNNs to see how they can further improve these systems by modeling really large amounts of data (like one year of data per tested person).
Why it matters: The rise of smartphones and the associated fall in cost of generic sensors has effectively instrumented the world so that humans and things that touch humans will generate ever larger amounts of somewhat imprecise information. Deep learning has so far proved to be an effective tool for extracting signal from large quantities of imprecise data. Expect more.
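Code sketch: A rough sketch of the architecture pattern described above – convolutional layers over raw sensor sequences feeding a bidirectional LSTM, with a per-condition classification head. The layer sizes and the number of conditions are my assumptions, not DeepHeart’s actual configuration:
```python
# Illustrative conv -> BiLSTM -> multi-condition classifier over wearable data.
import torch
import torch.nn as nn

class DeepHeartSketch(nn.Module):
    def __init__(self, n_channels=2, n_conditions=4):   # e.g. heart rate + step count
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, n_conditions)     # diabetes, hypertension, apnea, ...

    def forward(self, x):              # x: (batch, channels, time) sensor readings
        feats = self.conv(x).transpose(1, 2)             # (batch, time', 64)
        out, _ = self.lstm(feats)
        pooled = out.mean(dim=1)                         # average over the whole window
        return torch.sigmoid(self.head(pooled))          # per-condition risk scores

model = DeepHeartSketch()
week_of_data = torch.randn(8, 2, 2016)    # 8 users, 2 channels, one reading per 5 minutes
risk = model(week_of_data)                # (8, 4) predicted condition probabilities
```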
Read more: DeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk Prediction (Arxiv).

‘Mo text, ‘mo testing: Researchers release language benchmarking tool Texygen:
…Evaluation and testing platform ships with multiple open source language models…
Researchers with Shanghai Jiao Tong University and University College London have released Texygen, a text benchmarking platform implemented as a library for Tensorflow. Texygen includes a bunch of open source implementations of language models, including Vanilla MLE, as well as a menagerie of GAN-based methods (SeqGAN, MaliGAN, RankGAN, TextGAN, GSGAN, LeakGAN.) Texygen incorporates a variety of different evaluation methods, including BLEU as well as newer techniques like NLL-oracle, and so on. The platform also makes it possible to train with synthetic data as well as real data, so researchers can validate approaches without needing to go and grab a giant dataset.
  Why it matters: Language modelling is a booming area within deep learning so having another system to use to test new approaches against will further help researchers calibrate their own contributions against that of the wider field. Better and more widely available baselines make it easier to see true innovations.
  Why it might not matter: All of these proposed techniques incorporate less implicit structure than many linguists know language contains, so while they’re likely capable of increasingly impressive feats of word-cognition, it’s likely that either orders of magnitude more data or significantly stronger priors in the models will be required to generate truly convincing facsimiles of language.
  Read more: Texygen: A Benchmarking Platform for Text Generation Models (Arxiv).

Scientists map Chinese herbal prescriptions to tongue images:
…Different cultures mean different treatments which mean different AI systems…
Researchers have used standardized image classification techniques to create a system that predicts a Chinese herbal prescription from the image of a tongue. This is mostly interesting because it provides further evidence of the breadth and pace of adoption of AI techniques in China and the clear willingness of people to provide data for such systems.
  Dataset: 9585 pictures of tongues from over 50 volunteers and their associated Chinese herbal prescriptions which span 566 distinct kinds of herb.
   Read more: Automatic construction of Chinese herbal prescription from tongue image via convolution networks and auxiliary latent therapy topics (Arxiv).

How’s my driving? Researchers create (slightly) generalizable gaze prediction system:
…Figuring out what a driver is looking at has implications for driver safety & attentiveness…
One of the most useful (and potentially dangerous) aspects of modern AI is how easy it is to take an existing dataset, slightly augment it with new domain-specific data, then solve a new task the original dataset wasn’t considered for. That’s the case for new research from the University of California at San Diego, which proposes to better predict the locations that a driver’s gaze is focused on, by using a combination of ImageNet and new data. The resulting gaze-prediction system beats other baselines and vaguely generalizes outside of its training set.
  Dataset: To collect the original dataset for the study the researchers mounted two cameras inside and one camera outside a car; the two inside cameras capture the driver’s face from different perspectives and the external one captures the view of the road. They hand-label seven distinct regions that the driver could be gazing at, providing the main training data for the dataset. This dataset is then composed of eleven long drives split across ten subjects driving two different cars, all using the same camera setup.
  Technique: The researchers propose a two-stage pipeline, consisting of an input pre-processing pipeline that performs face detection and then further isolates the face through one of four distinct techniques. These images are then fed into the second stage of the network, which consists of one of four different neural network approaches (AlexNet, VGG, ResNet, and SqueezeNet) for fine-tuning.
  Results: The researchers test their approach against a state-of-the-art baseline (a random forest classifier with hand-designed features) and find that their approach attains significantly better performance at figuring out which of seven distinct gaze zones (forward, to the right, to the left, the center dashboard, the rearview mirror, the speedometer, eyes closed/open) the driver is focused on at any one time. The researchers also tried to replicate another state-of-the-art baseline that used neural networks. This system used the first 70% of frames from each drive for training, the next 15% for validation, and the last 15% for testing. In other words, the system would train on the same person, car, and (depending on how much the external terrain varies) broad context as what it was subsequently tested on. When replicating this the researchers got “a very high accuracy of 98.7%. When tested on different drivers, the accuracy drops down substantially to 82.5%. This clearly shows that the network is over-fitting the task by learning driver specific features,” they write.
  Results that make you go ‘hmmm’: The researchers found that a ‘SqueezeNet’-based network displayed significant transfer and adaptation capabilities, despite receiving very little prior data about the eyes of the person being studied: ‘the activations always localize over the eyes of the driver’, they write, and ‘the network also learns to intelligently focus on either one or both eyes of the driver’. Once trained, this network attains an accuracy of 92.13% at predicting what the gaze links to, a lower score than those set by other systems, but on a dataset that doesn’t let you test on what is essentially your training set. The system is also fast and reasonably lightweight: “Our standalone system which does not require any face detection, performs at an accuracy of 92.13% while performing real time at 166.7 Hz on a GPU,” they write.
  Generalization: The researchers tested their trained system on a completely separate dataset: the Columbia Gaze Dataset. This dataset applies to a different domain, where instead of cars, a variety of people are seated and asked to look at specific points on an opposing wall. The researchers took their best-performing model from the prior dataset, applied it to the new data, and tested its predictive ability. They detected some level of generalization, with the model able to correctly predict certain basic traits about gaze like orientation and direction. This (slight) generalization is another sign that the dataset and testing regime they employed aided generalization.
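  Code sketch: A minimal sketch of the second stage described above – fine-tuning an ImageNet-pretrained SqueezeNet (via torchvision) so its final layer predicts the seven gaze zones, operating on face crops produced by a separate detection step. Illustrative only, not the paper’s exact pipeline:
```python
# Fine-tune a pretrained SqueezeNet for 7-way gaze-zone classification (sketch).
import torch
import torch.nn as nn
from torchvision import models

GAZE_ZONES = ["forward", "right", "left", "center_dash",
              "rearview_mirror", "speedometer", "eyes_closed"]

model = models.squeezenet1_1(pretrained=True)          # ImageNet-pretrained backbone
# SqueezeNet's classifier ends in a 1x1 conv; swap it for a 7-way gaze-zone head.
model.classifier[1] = nn.Conv2d(512, len(GAZE_ZONES), kernel_size=1)
model.num_classes = len(GAZE_ZONES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(face_crops, zone_labels):
    """face_crops: (batch, 3, 224, 224) crops from the in-cabin camera after face detection;
    zone_labels: (batch,) integer gaze-zone indices from the hand-labeled dataset."""
    optimizer.zero_grad()
    loss = criterion(model(face_crops), zone_labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# One illustrative step on random stand-in data:
loss = train_step(torch.randn(4, 3, 224, 224), torch.randint(0, 7, (4,)))
```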
Read more: Driver Gaze Zone Estimation using Convolutional Neural Networks: A General Framework and Ablative Analysis (Arxiv).

OpenAI Bits & Pieces:

Discovering Types for Entity Disambiguation:
Ever had trouble disentangling the implied object from the word as written? This system helps with that. Check out the paper, code, and blogpost (especially the illustrations, which Jonathan Raiman did along with the research, the talented fellow!).
  Read more: Discovering Types for Entity Disambiguation (OpenAI).

CNAS Podcast: The future of AI and National Security:
AI research is already having a significant effect on national security: research breakthroughs are influencing future directions of government spending as well as motivating the deployment of certain technologies for offense and defense. To help inform this conversation, the Open Philanthropy Project’s Helen Toner and I recently did a short podcast with the Center for a New American Security to talk through some of the issues raised by recent AI advances.
   Listen to the podcast here (CNAS / Soundcloud).

Tech Tales:

Tamaworldchi
[????]

They took inspiration from a thing humans once called ‘demoscene’. It worked like this: take all of your intelligence and try to use it to make the most beautiful thing you can in an arbitrary and usually very small amount of space. One kilobyte. Two kilobytes. Four. Eight. And so on. But never really a megabyte or even close. Humans used these constraints to focus their creativity, wielding math and tonal intuition and almost alchemy-like knowledge of graphics drivers to make fantastic, improbable visions of never-enacted histories and futures. They did all of this in the computational equivalent of a Diet, Diet, Diet Coke.

Some ideas last. So now the AIs did the same thing but with entire worlds: what’s the most lively thing you can do in the smallest amount of memory-diamond? What can you fit into a single dyson sphere – the energy of one small and stately sun? No black holes. No gravitational accelerators. Not even the chance of hurling asteroids in to generate more reaction mass. This was their sport and with this sport they made pocket universes that contained pocket worlds on which strode small pocket people who themselves had small pocket computers. And every time_period the AIs would gather around and marvel at their own creations, wearing them like jewels. How smart, they would say to one another. How amazing are the thoughts these creatures in these demo worlds have. They even believe in gods and monsters and science itself. And merely with the power of a mere single sun? How did you do that?

It was for this reason that Planck Lengths gave the occasional more introspective and empirical AIs concern. Why did their own universe contain such a bounded resolution, they wondered, spinning particles around galactic-center blackholes to try and cause reactions to generate a greater truth?

And with only these branes? Using only the energy of these universes? How did you do this? a voice sometimes breathed in the stellar background, picked up by dishes that spanned the stars.

Things that inspired this story: Fermi Paradox – Mercury (YouTube Demoscene, 64k), the Planck Length, the Iain Banks book ‘Excession’, Stephen Baxter’s ‘Time’ series.

Import AI: #80: Facebook accidentally releases a surveillance-AI tool; why emojis are a good candidate for a universal deep learning language; and using deceptive games to explore the stupidity of AI algorithms

Researchers try to capture the web’s now-fading Flash bounty for RL research:
…FlashRL represents another attempt to make the world’s vast archive of Flash games accessible to researchers, but the initial platform has drawbacks…
Researchers with the University of Agder in Norway have released FlashRL, a research platform to help AI researchers mess around with software written in Flash, an outmoded interactive media format that defined many of the most popular games of the early era of the web. The platform has a similar philosophy to OpenAI Universe by trying to give researchers a vast suite of new environments to test and develop algorithms on.
  The dataset: FlashRL ships with “several thousand game environments” taken from around the web.
  How it works: FlashRL uses the Linux library Xvfb to create a virtual frame-buffer that it can use for graphics rendering, which then executes Flash files within players such as Gnash. FlashRL accesses this via pyVLC, a VNC client designed for this purpose, which subsequently exposes an API to the developer.
  Testing: The researchers test FlashRL by training a neural network to play the game ‘Multitask’ on it. But in the absence of comparable baselines or benchmarks it’s difficult to work out if FlashRL holds any drawbacks with regard to training relative to other systems – a nice thing to do might be to mount a well-known suite of games like the Atari Learning Environment within the system, then provide benchmarks for those games as well.
  Why it might matter: Given the current Cambrian explosion in testing systems it’s likely that FlashRL’s utility will ultimately be derived from how much interest it receives from the community. To gain interest it’s likely the researchers will need to tweak the system so that it can run environments faster than 30 frames per second (many other RL frameworks allow frame rates of 1,000+), because the speed with which you can run an environment is directly correlated to the speed with which you can conduct research on the platform.
– Read more: FlashRL: A Reinforcement Learning Platform for Flash Games (Arxiv).
– Check out the GitHub repository.

Cool job alert! Harvard/MIT Assembly Project Manager:
…Want to work on difficult problems in the public interest? Like helping smart and ethical people build things that matter?…
Harvard University’s Berkman Klein Center (BKC) is looking for a project manager coordinator to help manage its Assembly Program, a joint initiative with the MIT Media Lab that brings together senior developers and other technologists for a semester to build things that grapple with topics in the public interest. Last year’s assembly program was on cybersecurity and this year’s is on issues relating to the ethics and governance of AI (and your humble author is currently enrolled in this very program!). Beyond the Assembly program, the project manager will work on other projects with Professor Jonathan Zittrain and his team.
  For a full description of the responsibilities, qualifications, and application instructions, please visit the Harvard Human Resources Project Manager Listing.

Mongolian researchers tackle a deep learning meme problem:
…Weird things happen when internet culture inspires AI research papers…
Researchers with the National University of Mongolia have published a research paper in which they apply standard techniques (transfer learning via fine-tuning and transferring) to tackle an existing machine learning problem. The novelty is that they base their research on trying to tell the difference between pictures of puppies and muffins – a fun joke on Twitter a few years ago that has subsequently become a kind of deep learning meme.
  Why it matters: The paper is mostly interesting because it signifies that a) the border between traditional academic problems and internet-spawned semi-ironic problems is growing more porous and, b) academics are tapping into internet meme culture to draw interest to their work.
–  Read more: Deep Learning Approach for Very Similar Object Recognition Application on Chihuahua and Muffin Problem (Arxiv).

Mapping the emoji landscape with deep learning:
…Learning to understand a new domain of discourse with lots & lots of data…
Emojis have become a kind of shadow language used by people across the world to indicate sentiments. Emojis are also a good candidate for deep learning-based analysis because they consist of a relatively small number of distinct ‘words’ with around ~1,000 emojis in popular use, compared to English where most documents display a working vocabulary of around ~100,000 words. This means it’s easier to conduct research into mapping emojis to specific meanings in language and images with less data than with datasets consisting of traditional languages.
   Now, researchers are experimenting with one of the internet’s best emoji<>language<>images sources: the endless blathering mountain of content on Twitter. “Emoji have some unique advantages for retrieval tasks. The limited nature of emoji (1000+ ideograms as opposed to 100,000+ words) allows for a greater level of certainty regarding the possible query space. Furthermore, emoji are not tied to any particular natural language, and most emoji are pan-cultural,” write the researchers.
  The ‘Twemoji‘ dataset: To analyze emojis, the researchers scraped about 15 million emoji-containing tweets during the summer of 2016, then analyzed this ‘Twemoji’ dataset as well as two derivatives: Twemoji-Balanced (a smaller dataset selected so that no emoji applies to more than 10 examples, chopping out some of the edge-of-the-bell-curve emojis; the crying smiling face Emoji appears in ~1.5 million of the tweets in the corpus, while 116 other emojis are only used a single time) and Twemoji-Images (roughly one million tweets that contain an image as well as emoji). They then apply deep learning techniques to this dataset to try to see if they can complete prediction and retrieval tasks using the emojis.
  Results: The researchers use a bidirectional LSTM to perform mappings between emojis and language; a GoogLeNet image classification system to map the relationship between emojis and images; and a combination of the two to understand the relationship between all three. They also learn to suggest different emojis according to the text or visual content of a given tweet. Most of the results should be treated as early baselines rather than landmark results in themselves, with top-5 emoji-text prediction accuracies of around ~48.3% and lower top-5 accuracies of around 40.3% for images-text-emojis.
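  Code sketch: As a rough illustration of the text-to-emoji half of the setup, a bidirectional LSTM classifier over tokenized tweets might look like the following – the vocabulary size, dimensions, and architecture details are placeholders, not the paper’s settings.

```python
import torch
import torch.nn as nn

class EmojiPredictor(nn.Module):
    """Bidirectional LSTM mapping a tokenized tweet to scores over ~1,000 emojis."""
    def __init__(self, vocab_size=50_000, embed_dim=128, hidden_dim=256, num_emojis=1_000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_emojis)

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)                       # final hidden states, both directions
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)   # concatenate forward + backward states
        return self.head(sentence)                       # unnormalized emoji scores

# Usage: top5 = EmojiPredictor()(torch.randint(1, 50_000, (8, 30))).topk(5, dim=-1).indices
```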
  Why it matters: This paper is another good example of a new trend in deep learning: the technologies have become simple enough that researchers from outside the core AI research field are starting to pick up basic components like LSTMs and pre-trained image classifiers and are using them to re-contextualize existing domains, like understanding linguistics and retrieval tasks via emojis.
–  Read more: The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval (Arxiv).

Facebook researchers train models to perform unprecedentedly-detailed analysis of the human body:
…Research has significant military, surveillance implications (though not discussed in paper)…
Facebook researchers have trained a state-of-the-art system named ‘DensePose’ which can look at 2D photos or videos of people and automatically create high-definition 3D mesh models of the depicted people; an output with broad utility and impact across a number of domains. Their motivation is that techniques like this have valuable applications in “graphics, augmented reality, or human-computer interaction, and could also be a stepping stone towards general 3D-based object understanding,” they write. But the published research and soon-to-be-published dataset have significant implications for digital surveillance – a subject not discussed by the researchers within the paper.
  Performance: ‘DensePose’ “can recover highly-accurate correspondence fields for complex scenes involving tens of persons with real-time speed: on a GTX 1080 GPU our system operates at 20-26 frames per second for a 240 × 320 image or 4-5 frames per second for a 800 × 1100 image,” they write. Its performance substantially surpasses previous state-of-the-art systems, though it still falls short of human performance.
  Free dataset: To conduct this research Facebook created a dataset based on the ‘COCO’ dataset, annotating 50,000 of its people-containing images with 5 million distinct coordinates to help generate 3D maps of the depicted people.
  Technique: The researchers adopt a multi-stage, deep learning-based approach which involves first identifying regions-of-interest within an image, then handing each of those regions off to its own deep learning pipeline for further object segmentation and 3D point prediction and mapping. For any given image, each human is relatively sparsely labelled, with around 100-150 annotations per person. To increase the amount of data available to the network, they use a supervisory system that automatically fills in the remaining points during training via the trained models, artificially augmenting the data.
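  Code sketch: Conceptually, the per-region stage looks something like the head below, which classifies each pixel of a detected person’s region into a body part and regresses continuous surface coordinates for it – the channel counts and number of parts are illustrative, and this is not Facebook’s implementation.

```python
import torch.nn as nn

class PerRegionHead(nn.Module):
    """Per-ROI head: body-part classification plus (u, v) surface-coordinate regression."""
    def __init__(self, in_channels=256, num_parts=24):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.part_logits = nn.Conv2d(256, num_parts + 1, 1)  # +1 for background
        self.uv_coords = nn.Conv2d(256, 2 * num_parts, 1)    # (u, v) coordinates per part

    def forward(self, roi_features):   # roi_features: pooled features for one detected person
        x = self.trunk(roi_features)
        return self.part_logits(x), self.uv_coords(x)
```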
  Components used: Mask R-CNN with Feature Pyramid Networks; both available in Facebook’s just-released ‘Detectron’ system.
  Why it matters: enabling real-time surveillance: There’s a troubling implication of this research: the same system has wide utility within surveillance architectures, potentially letting operators analyze large groups of people to work out if their movements are problematic or not – for instance, such a system could be used to signal to another system if a certain combination of movements are automatically labelled as portending a protest or a riot. I’d hope that Facebook’s researchers felt the utility of releasing such a system outweighed its potential to be abused by other malicious actors, but the lack of any mention of these issues anywhere in the paper is worrying: did Facebook even consider this? Did they discuss this use case internally? Do they have an ‘information hazard’ handbook they go through when releasing such systems? We don’t know. As a community we – including organizations like OpenAI – need to be better about dealing publicly with the information-hazards of releasing increasingly capable systems, lest we enable things in the world that we’d rather not be responsible for.
–  Read more: DensePose: Dense Human Pose Estimation In The Wild (Arxiv).
–  Watch more: Video of DensePose in action.

It’s about time: tips and tricks for better self-driving cars:
…rare self-driving car paper emerges from Chinese robotics company…
Researchers with Horizon Robotics, one of a new crop of Chinese AI companies that builds everything from self-driving car software to chips to the brains for smart cities, have published a research paper that outlines some tips and tricks for designing better simulated self-driving car systems with the aid of deep learning. In the paper they focus on the ‘tactical decision-making’ part of driving, which involves performing actions like changing lanes and reacting to near-term threats. (The rest of the paper implies that features like routing, planning, and control, are hard-coded.)
  Action skipping: Unlike traditional reinforcement learning, the researchers here avoid using action repetition and replay to learn high-level policies and instead use a technique called action skipping. That’s to avoid situations where a car might, for example, learn through action replays to navigate across multiple car lanes at once, leading to unsafe behavior. With action skipping, the car instead gets a reward for making a single specific decision (skipping from one lane to another), then gets a modified version of that reward which incorporates the average of the rewards collected during a few periods of time following the initial decision. “One drawback of action skipping is the decrease in decision frequency which will delay or prevent the agent’s reaction to critical events. To improve the situation, the actions can take on different skipping factors during inference. For instance in lane changing tasks, the skipping factor for lane keeping can be kept short to allow for swift maneuvers while the skipping factor for lane switching can be larger so that the agent can complete lane changing actions,” they write.
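  Code sketch: The core of the averaging idea can be expressed in a few lines against a generic gym-style environment – this is an illustration of the scheme described above, not Horizon Robotics’ code.

```python
def step_with_skip(env, high_level_action, skip=8):
    """Hold one tactical decision for `skip` low-level steps and credit the agent
    with the average reward collected over that window."""
    rewards, done, state = [], False, None
    for _ in range(skip):
        state, reward, done, _ = env.step(high_level_action)
        rewards.append(reward)
        if done:
            break
    return state, sum(rewards) / len(rewards), done
```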
  Tactical rewards: Reward functions for tactical decision making blend several competing objectives. Here, the researchers use constant rewards relating to the speed of the car, the reward for lane switching, and a per-step cost that encourages the car to complete maneuvers in a relatively small number of steps, along with contextual rewards covering the risk of colliding with another vehicle, whether a traffic light is present, and whether the current environment poses particular risks, such as the presence of bicyclists or the growing risk of staying in the opposite lane during common actions like overtaking.
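  Code sketch: A blended reward of this kind might be wired up as below – the weights and term names are assumptions chosen for illustration, not the paper’s values.

```python
def tactical_reward(speed, target_speed, changed_lane, collision_risk,
                    red_light_violation, steps_on_opposite_lane):
    r = -0.01 * abs(speed - target_speed)      # constant: stay near the desired speed
    r += -0.05 if changed_lane else 0.0        # constant: small cost per lane switch
    r += -0.01                                 # constant: per-step cost, encourages short maneuvers
    r += -1.0 * collision_risk                 # contextual: estimated risk of a collision
    r += -1.0 if red_light_violation else 0.0  # contextual: traffic-light compliance
    r += -0.02 * steps_on_opposite_lane        # contextual: risk grows while overtaking
    return r
```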
  Testing: The researchers test their approach by placing simulated self-driving cars inside a road simulator, training them via ten simulation runs of 250,000 discrete action steps or more, then testing them against 100 pre-generated test episodes, where they are evaluated on whether they ultimately reach their goal while complying with relevant speed limits and not changing speed so rapidly as to interfere with passenger comfort.
  Results: The researchers find that implementing their proposed action-skipping and varied reward schemes significantly improves on a somewhat unfair random baseline, as well as against a more reasonable rule-based baseline system.
–  Read more: Elements of Effective Deep Reinforcement Learning towards Tactical Driving Decision Making (Arxiv).

Better agents through deception:
…Wicked humans compose tricky games to subvert traditional AI systems…
One of the huge existential questions about the current AI boom relates to the myopic way that AI agents view objectives; most agents will tend to mindlessly pursue their given objective even though the application of a little bit of what humans call common sense could net them better outcomes. This problem is one of the chief motivations behind a lot of research in AI safety, as figuring out how to get agents to pursue more abstract objectives, or to incorporate more human-like reasoning in their methods of completing tasks, would seem to deal with some safety problems.
  Testing: One way to explore these issues is through testing existing algorithms against scenarios that seek to highlight their current nonsensical reasoning methods. DeepMind has already espoused such an approach with its AI safety gridworlds (Import AI #71), which give developers a suite of different environments to test agents against that exploit the current way of developing AI agents to optimize specific reward functions. Now, researchers with the University of Strathclyde, Australian National University, and New York University have proposed their own set of tricky environments, which they call Deceptive Games. The games are implemented in the standardized Video Game Description Language (VGDL) and are used to test AIs that have been submitted to the General Video Game Artificial Intelligence competition.
  Deceptive Games: The researchers come up with a few different categories of deceptive games:
     Greedy Traps: Exploit the fact that an agent can get side-tracked by performing an action that generates an immediate reward but makes it impossible to attain a larger reward down the line (see the toy sketch after this list).
     Smoothness Traps: Most AI algorithms will optimize for the way of solving a task that involves a smooth increase in difficulty, rather than one where you have to try harder and take more risks but ultimately get larger rewards.
     Generality Traps: Getting AIs to learn general rules about the objects in an environment – like ‘eating mints guarantees a good reward’ – then subverting this, for instance by having repeated interactions with those objects flip rapidly from giving a positive reward to a negative one after some boundary has been crossed.
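  Code sketch: Here is a toy, self-contained illustration of a greedy trap (not taken from the paper): the myopic policy grabs the immediate +1 and thereby locks itself out of the +10 behind the door.

```python
def play(policy, steps=10):
    door_open, snack_there, score, position = True, True, 0.0, 0
    for _ in range(steps):
        action = policy(position, snack_there)
        if action == "grab_snack" and position == 0 and snack_there:
            score, snack_there, door_open = score + 1.0, False, False  # +1 now, big prize locked away
        elif action == "move_right":
            position = 1
        elif action == "enter_door" and position == 1 and door_open:
            return score + 10.0                # the larger, delayed reward
    return score

greedy = lambda pos, snack: "grab_snack" if snack and pos == 0 else (
    "move_right" if pos == 0 else "enter_door")
patient = lambda pos, snack: "move_right" if pos == 0 else "enter_door"
print(play(greedy), play(patient))             # 1.0 vs 10.0 -- myopia loses
```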
  Results: Because AIs submitted to the GVGAI competition employ a variety of different techniques, the results are instructive: some very highly-ranked agents perform very poorly on these new environments, while some low-ranked ones perform adequately, and most agents fail to solve most of the environments. The purpose of highlighting the paper here is that it provides a set of environments against which AI researchers might want to test and evaluate their own algorithms, potentially creating another ‘AI safety baseline’ to test AIs against. It could also motivate further extension of the GVGAI competition to become significantly harder for AI agents: “Limiting access to the game state, or even requiring AIs to actually learn how the game mechanics work open up a whole new range of deception possibilities. This would also allow us to extend this approach to other games, which might not provide the AI with a forward model, or might require the AI to deal with incomplete or noisy sensor information about the world,” they write.
–  Read more: Deceptive Games (Arxiv).
–  Read more about DeepMind’s earlier ‘AI Safety Gridworlds’ (Arxiv).

Tech Tales:

[2032: A VA hospital in the Midwest]

Me and my exo go way back. The first bit of it glommed onto me after I did my back in during a tour of duty somewhere hot and resource-laden. I guess you could say our relationship literally grew from there.

Let’s set the scene: it’s 2025 and I’m struggling through some physio with my arms on these elevated side bars and my legs moving underneath me. I’m huffing breath and a vein in my neck is pounding and I’m swearing. Vigorously. Nurse Alice says to me “John I really think you should consider the procedure we talked about”. I swivel my eyes up to meet hers and I say for the hundredth time or so – with spittle – “Fuck. No. I-”
  I don’t get to finish the sentence because I fall over. Again. For the hundredth time. Nurse Alice is silent. I stare into the spongy crash mat, then tense my arms and try to pick myself up but can’t. So I try to turn on my side and this sets off a twinge in my back which grows in intensity until after a second it feels like someone is pulling and twisting the bundle of muscles at the base of my spine. I scream and moan and my right leg kicks mindlessly. Each time it kicks it sets off more tremors in my back which create more kicks. I can’t stop myself from screaming. I try to go as still and as little as possible. I guess this is how trapped animals feel. Eventually the tremors subside and I feel wet cardboard prodding my gut and realize I’ve crushed a little sippy cup and the water has soaked into my undershirt and my boxers as though I’ve wet myself.
“John,” Alice says. “I think you should try it. It really helps. We’ve had amazing success rates.”
“It looks like a fucking landmine with spiderlegs” I mumble into the mat.
“I’m sorry John I couldn’t hear that, could you speak up?”
Alice says this sort of thing a lot and I think we both know she can hear me. But we pretend. I give up and turn my head so I’m speaking half into the floor and half into open space. “OK,” I say. “Let’s try it.”
“Wonderful!” she says, then, softly, “Commence exo protocol”.
  The fucking thing really does scuttle into the room and when it lands on my back I feel some cold metal around the base of my spine and then some needles of pain as its legs burrow into me, then another spasm starts and according to the CCTV footage I start screaming “you liar! I’ll kill you!” and worse things. But I don’t remember any of this. I pass out a minute or so later, after my screams stop being words. When you review the footage you can see that my screams correspond to its initial leg movements and after I pass out it sort of shimmies itself from side to side, pressing itself closer into my lower back with each swinging lunge until it is pressed into me, very still, a black clasp around the base of my spine. Then Alice and another Nurse load me onto a gurney and take me to a room to recover.

When I woke up a day later or so in the hospital bed I immediately jumped out of it and ran over to the hospital room doorway thinking you lying fuckers I’ll show you. I yanked the door open and ran half into the hall then paused, like Wile E. Coyote realizing he has just run off a cliff edge. I looked behind me into the room and back at my just-vacated bed. It dawned on me that I’d covered the distance between bed and door in a second or so, something that would have taken me two crutches and ten minutes the previous day. I pressed one hand to my back and recoiled as I felt the smoothness of the exo. Then I tried lifting a leg in front of me and was able to raise my right one to almost hip height. The same thing worked with the left leg. I patted the exo again and I thought I could feel it tense one of its legs embedded in my spine as though it was saying that’s right, buddy. You can thank me later.
  “John!” Alice said, appearing round a hospital corridor in response to the alarm from the door opening. “Are you okay?”
“Yes,” I said. “I’m fine.”
“That’s great,” she said, cheerfully. “Now, would you consider putting some clothes on?”
I’d been naked the whole time, so fast did I jump out of bed.

So now it’s three years later and I guess I’m considered a model citizen – pun intended. I’ve got exos on my elbows and knees as well as the one on my back, and they’re all linked together into one singular thing which helps me through life. Next might be one for the twitch in my neck. And it’s getting better all the time: fleet learning combined with machine learning protocols means the exo gives me what the top brass call strategic movement optimization: said plainly, I’m now stronger and faster and more precise than regular people. And my exo gets better in proportion to the total number deployed worldwide, which now numbers in the millions.

Of course I do worry about what happens if there’s an EMP and suddenly it all goes wrong and I’m back to where I was. I have a nightmare where the pain returns and the exo rips the muscles in my back out as it jumps away to curl up on itself like a beetle, dying in response to some unseen atmospheric detonation. But I figure the sub-one-percentage chance of that is more than worth the tradeoff. I think my networked exo is happy as well, or at least, I hope it is, because in the middle of the night sometimes I wake up to find my flesh being rocked slightly from side to side by the smart metal embedded within me, as though it is a mother rocking some child to sleep.

Things that inspired this story: Exoskeletons, fleet learning, continuous adaptation, reinforcement learning, intermittent back trouble, physiotherapy, walking sticks.

 

Import AI: #79: Diagnosing AI brains with PsychLab; training drones without drone-derived data; and a Davos AI report.

Making better video game coaches with deep learning:
…The era of the deep learning-augmented video game coach is nigh!…
What if video game coaches had access to the same sorts of telemetry as coaches for traditional sports like soccer or the NFL? That’s the question that DeepLeague, software for analysing the coordinate position of a League of Legends player at a point in time, tries to answer. The software is able to look at a LoL minimap at a given point in time and use that to create a live view of where each specific player is. That sounds minor but it’s currently not easy to do this in real-time via Riot Games’ API, so DL (aided by 100,000 annotated game minimap images) provides a somewhat generalizable workaround.
  What it means: The significant thing here is that deep learning is making it increasingly easy to point a set of neural machinery at a problem, like figuring out how to map coordinates and player avatars to less-detailed dots on a minimap. This is a new thing: the visual world has never before been this easy for computers to understand, and things that look like toys at first frequently wind up being significant. Take a read of the ‘part 2’ post to get an idea of the technical details for this sort of project. And remember: this was coded up by a student over 5 days during a hurricane.
  Jargon, jargon everywhere: DeepLeague will let players “analyze how the jungler paths, where he starts his route, when/where he ganks, when he backs, which lane he exerts the most pressure on, when/where mid roams”. So there!
  Read more: DeepLeague: leveraging computer vision and deep learning on the League of Legends minimap + giving away a dataset of over 100,000 labeled images to further esports analytics research (Medium).
  Read even more: DeepLeague (Part 2): The Technical Details (Medium).
  Get the data (GitHub).

Rise of the drones: learning to operate and navigate without the right data with DroNet:
…No drone data? No problem! Use car & bicycle data instead and grab enough of it to generalize…
One of the main ways deep learning differs from previous AI techniques lies in its generalizability: neural networks fed on one data type are frequently able to attain reasonable performance on slightly different and/or adjacent domains. We’re already pretty familiar with this idea within object recognition – a network trained to recognize flowers should be able to draw bounding boxes around flowers and plants not in the training set, etc – but now we’re starting to apply the same techniques to systems that take actions in the world, like cars and drones.
  Now, researchers with the University of Zurich, ETH Zurich, and the Universidad Politécnica de Madrid in Spain have proposed DroNet: a way to train drones to navigate city streets using data derived entirely from self-driving cars and bicycles.
  How it works: The researchers use an 8-layer Residual Network to train a neural network policy to do two things: work out the correct steering angle to stay on the road, and learn to avoid collisions using a dataset gathered via bicycle. They train the model via mean-squared error (steering) and binary cross-entropy (collision). The result is a drone that is able to move around in urban settings and avoid collisions, though as the input data doesn’t include information on the drone’s vertical position, it operates in these experiments on a plane.
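  Code sketch: Structurally, the policy is a single trunk with two heads trained on two losses; the sketch below shows that shape with a deliberately tiny trunk – the real model uses an 8-layer ResNet and weights the two loss terms, so treat this as a schematic rather than the authors’ code.

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadDrivingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(                     # stand-in for the 8-layer ResNet trunk
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.steer = nn.Linear(64, 1)       # regression head: steering angle
        self.collide = nn.Linear(64, 1)     # classification head: collision logit

    def forward(self, frames):              # frames: (batch, 1, H, W) grayscale images
        features = self.trunk(frames)
        return self.steer(features), self.collide(features)

def loss_fn(steer_pred, steer_true, coll_logit, coll_label):
    # steering trained with mean-squared error, collision with binary cross-entropy;
    # equal weighting here is a simplification.
    return (F.mse_loss(steer_pred, steer_true) +
            F.binary_cross_entropy_with_logits(coll_logit, coll_label.float()))
```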
  Testing: They test it on a number of tasks in the real world which include traveling in a straight line, traveling along a curve and avoiding collisions in an urban area. They also evaluate its ability to transfer to new environments by testing it in a high altitude outdoor environment, a corridor, and a garage, where it roughly matches or beats other baselines. The overall performance of the system is pretty strong, which is surprising given its relative lack of sophistication compared to more innately powerful methods such as a control policy implemented within a 50-layer residual network. “We can observe that our design, even though 80 times smaller than the best architecture, maintains a considerable prediction performance while achieving real-time operation (20 frames per second),” they say.
  Datasets: The researchers get the driving dataset from Udacity’s self-driving car project; it consists of 70,000 images of cars driving distributed over six distinct experiments. They take data from the front cameras and also the steering telemetry. For the collision dataset they had to collect their own and it’s here that they get creative: they mount a GoPro on the handlebars of a bicycle and “drive along different areas of a city, trying to diversify the types of obstacles (vehicles, pedestrians, vegetation, under construction sites) and the appearance of the environment. This way, the drone is able to generalize under different scenarios. We start recording when we are far away from an obstacle and stop when we are very close to it. In total, we collect around 32,000 images distributed over 137 sequences for a diverse set of obstacles. We manually annotate the sequences, so that frames far away from collision are labeled as 0 (no collision), and frames very close to the obstacle are labeled as 1 (collision)”.
  Drone used: a Parrot Bebop 2.0, which streams footage at 30Hz via WiFi to a computer running the neural network.
– Read more: DroNet: Learning to Fly by Driving (ETH Zurich).
– Get the pre-trained DroNet weights here (ETH Zurich).
– Get the Collision dataset here (ETH Zurich).
– Access the project’s GitHub repository here (GitHub).

UPS workers’ union seeks to ban drones, driverless vehicles:
…In the absence of alternatives to traditional economic models, people circle the wagons to protect themselves from AI…
People are terrified of AI because they worry for their livelihoods. That’s because most politicians around the world are unable to suggest different economic models for an increasingly automated future. Meanwhile, many people are assuming that even if there’s not gonna be mass unemployment as a consequence of AI, there’s definitely going to be a continued degradation in wage bargaining power and the ability for people to exercise independent judgement in increasingly automated workplaces. As a consequence, workers’ unions are seeking to protect themselves. Case in point: the Teamsters labor union wants UPS to ban using drones or driverless vehicles for package deliveries so as to better protect their own jobs: this is locally rational, but globally irrational. If only society were better positioned to take advantage of such technologies without harming its own citizens.
– Read more: Union heavyweight wants to ban UPS from using drones or driverless vehicles (CNBC).

Human-in-the-loop AI artists, with ‘Deep Interactive Evolution’ (DeepIE):
…Battle of the buzzwords as researchers combine generative adversarial networks (GANs) with interactive evolutionary computation (IEC)…
The future of AI will involve humans augmenting themselves with increasingly smart, adaptive, reactive systems. One of the best ways to do this is with ‘human-in-the-loop’ learning, where a human is able to directly influence the ongoing evolution of a given AI system. One of the first places this is likely to show up is in the art domain, as artists access increasingly creative systems to help enhance their own creative practices. So it’s worth reading through this paper from researchers with New York University, the IT University of Copenhagen, and the Beijing University of Posts and Telecommunications, about how to smartly evolve novel images using humans, art, and AI.
  Their Deep Interactive Evolution approach relies on a four-stage loop: latent variables are fed into a pre-trained image generator which spits out images in response to the variables, these images are then shown to a user which selects the ones they prefer, new latent variables are derived based on those choices, then those variables are mutated according to rules defined by the user. This provides a tight feedback loop between the AI system and the person, and the addition of evolution provides the directed randomization needed to generate novelty.
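  Code sketch: The four-stage loop can be written down compactly against any pre-trained generator – the selection interface, crossover scheme, and mutation scale below are simplifications for illustration, not the authors’ implementation.

```python
import numpy as np

def deep_interactive_evolution(generator, ask_user, latent_dim=100,
                               population=16, generations=20, sigma=0.5):
    latents = np.random.randn(population, latent_dim)
    for _ in range(generations):
        images = generator(latents)                     # 1) decode latents into images
        keep = ask_user(images)                         # 2) indices of the user's picks (assumes >= 1)
        parents = latents[keep]
        children = []
        while len(children) < population:               # 3) breed new latents from the picks
            a, b = parents[np.random.randint(len(parents), size=2)]
            mask = np.random.rand(latent_dim) < 0.5     # uniform crossover
            children.append(np.where(mask, a, b))
        latents = np.stack(children)
        latents += sigma * np.random.randn(*latents.shape)  # 4) user-tunable mutation
    return generator(latents)
```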
“The main differentiating factor between DeepIE and other interactive evolution techniques is the employed generator. The content generator is trained over a dataset to constrain and enhance what is being evolved. In the implementation in this paper, we trained a nonspecialized network over 2D images. In general, a number of goals can be optimized for during the training process. For example, for generating art, a network that specializes in creative output can be used,” write the researchers.
  Testing: Testing subjectively generated art images is a notoriously difficult task, so it’s worth thinking about how these researchers did it: the approach they used involved setting users two distinct tasks: one was to reproduce a predetermined picture, which in this case was a shoe, and the other was to reproduce a picture of their own choosing. Both of these tests provide a way to evaluate how intuitive humans find the image-evolution process and also provide an implicit measure of the ease with which they can intuitively create with the AI.
  Results: “Based on self-reported numbers, users felt that they got much closer to reproducing the shoes than they did to the face. This could be predicted from figure 4. On average users reported 2.2 out of 5 for their ability to reproduce faces and 3.8 out of 5 for their ability to reproduce shoes, both with a standard deviation of 1,” write the researchers. My belief is that it’s much easier to generate shoes because they’re less complex visual entities and as humans we’re also not highly-evolved to distinguish between different types of shoe, whereas we are with faces, so I think we’ll always be more attuned to the flaws in faces and/or human-oriented things.
  Up next: “In the future, it will be interesting to extend the approach to other domains such as video games that can benefit from high-quality and controllable content generation.”
  Implementation details: The authors use a Wasserstein GAN with Gradient Penalty (WGAN-GP) network along with the DCGAN architecture. For evolution they use mutation and crossover techniques but, without being able to receive a specific signal from the user about the relative quality of the newly generated images, the network tends towards increasingly nutty images over time.
  Read more: Deep Interactive Evolution (Arxiv).
  Magnificent Jargon of the Week Award… for this incredible phrase: “Other options for mutation and crossover could involve interpolating between vectors along the hypersphere”. (Captain, interpolate the vectors across the hypersphere, please!).
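  Code sketch: For the curious, ‘interpolating between vectors along the hypersphere’ usually means spherical linear interpolation (slerp); a standard implementation (not code from the paper) looks like this:

```python
import numpy as np

def slerp(v0, v1, t):
    """Move a fraction t of the way from v0 to v1 along the great-circle arc."""
    v0n, v1n = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))  # angle between the vectors
    if np.isclose(omega, 0.0):
        return (1 - t) * v0 + t * v1                          # nearly parallel: plain lerp
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)
```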

Facebook releases the Detectron, its object detection research platform:
…Another computational dividend from AI-augmented-capitalism: free object detection for all…
Facebook AI Research (FAIR) has released Detectron, an open source platform for conducting research into object detection and segmentation. The package ships with a number of object detection algorithms, including: Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, and R-FCN, which are each built on some standardized neural network architectures including ResNeXt, ResNet, Feature Pyramid Networks, and VGG16.
  License: Apache 2.0
  Read More: Detectron (GitHub).

Enter the AI agent PsychLab, courtesy of DeepMind:
…Not quite ‘there is a turtle lying on its back in the desert’, but a taste of things to come…
DeepMind thinks AI agents are now sophisticated enough that we should start running them through (very basic) psychological tests, so it’s built and released an open source testing suite to do that, based within its 3D Quake3-based ‘DeepMind Lab’ environment.
  PsychLab provides a platform to compare AI agents to humans on a bunch of tasks derived from cognitive psychology and visual psychophysics. The environment is a literal platform that the agent stands on in front of a large (simulated) computer monitor – so the agent is free to look around the world and even look away from the experiments. By testing their agents on some of these tasks DeepMind also ends up discovering a surprising flaw in its ‘UNREAL’ architecture which leads it to re-design part of the agents’ vision system based on knowledge of biological foveal vision, which improves performance. (Adding this improvement in also increases performance on the ‘laser tag’ set of tasks that were created by another team, providing further validation of the tweak.)
  Tasks: Some of the tasks the researchers test their agents on include being able to detect subtle changes in an environment, being able to identify the orientation of a specific ‘Landolt C’ stimulus, being able to figure out which of two patterns is a concentric glass pattern, visual search (aka, playing a low-res version of ‘Where’s Waldo’), working out the main direction of motion from a group of dots moving separately, and tracking multiple objects at once, among others.
  Results: UNREAL agents fail to beat humans on basically every single baseline, with humans displaying greater sample efficiency, adaptability, and generally higher baseline scores than the agents. One exception is some of the visual acuity tests, where tweaks by DeepMind to implement a foveal vision model lead to UNREAL agents that more closely match human performance. This foveal model also dramatically improves UNREAL performance on the non-psychological ‘laser tag’ test, leading to agents that more consistently beat humans or match their skills.
  The trouble with time: One problem the researchers deal with is that of time, namely that reinforcement learning agents learn through endless runs of an environment and gravitate to success via a reward function, whereas human subjects are typically tested over a ~one hour period after being given verbal instructions. This difference likely means RL agent performance is significantly higher on certain tasks due to overfitting during a subjectively far longer period of training (remember, computers run simulations far faster than we humans can experience reality). “Since nonhuman primate training procedures can take many months, it’s possible that slower learning mechanisms might influence results with non-human primates that could not operate within the much shorter time frame (typically one hour) of the equivalent human experiment,” write the researchers.
– Read more: PsychLab: A Psychology Laboratory for Deep Reinforcement Learning Agents (Arxiv).
Looming ethical paradox: Once we have agents that pass all of these tests with flying colors, will we need to start dealing with the ethical questions of whether it is acceptable to shutdown/restart/delete/tweak these agents? We’re likely years away from this, but I think way before we get agents that display general cognition we’ll have ones that seem lifelike enough that we’ll have to deal with these questions – I don’t think that today we execute monkeys after they’ve done a six-month lab testing period, so I’m wondering if we’ll have to change how we handle and store agents as well – perhaps the future life of an UNREAL agent is to be ‘paused’ and have its parameters saved, rather than being junked entirely.

Evolution Strategies for all:
…Basic tutorial walks you through a Minimum Viable Experiment to learn ES…
Evolution Strategies is a technique for creating AI agents that can handle long-term planning at the cost of immense computation. It’s different from Deep Learning because in many senses it’s much more primitive, but it’s also potentially more powerful in some domains thanks to its ability to have performance scale almost linearly with additional computation, letting you throw computers at problems too hard for existing, more sophisticated algorithms.
  Now Florida AI chap Eder Santana has published a post walking us through how to experiment with ES on what he calls a ‘minimum viable experiment’ – in this case, implementing ES in the Keras programming framework and using the resulting system to train an agent to play catch. It’s a good, math-based walkthrough of how it works and comes with code.
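  Code sketch: The core ES update is small enough to state in full – perturb, score, and move towards the perturbations that scored well. This is a generic NumPy version of the idea, not Eder’s Keras code; `evaluate` stands for any function that returns an episode reward for a flat parameter vector.

```python
import numpy as np

def es_step(theta, evaluate, population=50, sigma=0.1, lr=0.01):
    noise = np.random.randn(population, theta.size)
    rewards = np.array([evaluate(theta + sigma * eps) for eps in noise])
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # normalize the scores
    grad_estimate = noise.T @ advantages / (population * sigma)        # reward-weighted sum of noise
    return theta + lr * grad_estimate
```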
– Read more: MVE Series: Playing Catch with Keras and an Evolution Strategy (Medium).
– Get the code: EvolutionMVE (GitHub).
– Read more: Evolution Strategies as a Scalable Alternative to Reinforcement Learning (OpenAI).
– Read more: A Visual Guide to Evolution Strategies (@hardmaru).
– Read even more: Uber’s recent research on ES (Uber Engineering Blog).

*** Davos 2018 AI Special Report ***
…Entrepreneurs and world leaders chime in on AI and what it means…

Alibaba founder warns that civilization ill prepared for the AI revolution:
…Choose art and culture over repetitive tasks, says entrepreneur at Davos 2018…
“If we do not change the way we teach 30 years later we will be in trouble because the way we teach, the things we teach our kids, are the things from 200 years ago. And we cannot teach our kids to compete with machines – they are smarter,” Jack Ma said at Davos this year. “Everything we teach should be different from the machine,” he said. “The computer will always be smarter than you are; they never forget, they never get angry. But computers can never be as wise as man. The AI and robots are going to kill a lot of jobs, because in the future it’ll be done by machines. Service industries offer hope – but they must be done uniquely.”
– Read more: Jack Ma on the IQ of love – and other top quotes from his Davos interview.

British Prime Minister positions UK as the place to lead AI development:
…Impact of speech dimmed by UK’s departure from influence of world stage due to Brexit…
“In a global digital age we need the norms and rules we establish to be shared by all. That includes establishing the rules and standards that can make the most of Artificial Intelligence in a responsible way, such as by ensuring that algorithms don’t perpetuate the human biases of their developers,” said the PM. “So we want our new world-leading Centre for Data Ethics and Innovation to work closely with international partners to build a common understanding of how to ensure the safe, ethical and innovative deployment of Artificial Intelligence.”
– Read more here: PM’s Speech at Davos 2018: 25 January (Gov.uk).

Google CEO stresses AI’s fundamental importance:
…”AI is probably the most important thing humanity has ever worked on,” said Pichai…
  The Google CEO also said companies should “agree to demilitarize AI” and that we need “global multilateral frameworks” to tackle some of the issues posed by AI.
– Read more: Google CEO: AI is ‘more profound than electricity or fire’ (CNN).

Tech Tales:

[Japan, 2034: A public park with a pond.]

The wind starts and so the pond ripples and waves form across its long, rectangular surface. People throng at its sides; the ends are reserved for, at one end, the bright red race ribbon, and at the other, three shipping containers stacked side by side, with their doors flush with the edge of the pond, ready to open and let their cargo slide out and onto the undulating surface of the water. There’s an LED sign on top of the crates that reads, in strobing red&orange letters: KAWASAKI BOAT-RACE SPONSORED BY ARCH-AI: ‘INVENT FURTHER’.

At the other end of the course a person in a bright red jacket fires a starter pistol in the air and, invisibly, a chip in the gun relays a signal to a transceiver placed halfway down the pond, which relays the signal into the shipping crates, whose doors open outward. From each crate extends a metal tongue; each slides into the pond, thin and smooth enough to barely cause a ripple. The boats follow, pushed from within by small robot arms, down onto the slides and then into the water. A silent electrically-powered utility vehicle lifts the crates once the boats are clear and removes them, creating more of a space for wind to gather and inhabit before plunging into the sails of the AI-designed boats.

Each boat is a miracle: a just-barely euclidean mess of sails and banisters and gantries and intricate pulleys. Each boat has been 3D printed overnight inside each of the three shipping crates, with their designs ginned up by evolutionary optimization processes paired with sophisticated simulations. And each is different – that’s the nice thing about wind; it’s so inherently unpredictable that when you can build micron-scale ropes and poles you can get really creative with the designs, relying on a combination of emergence and fiendishly-clever AI-dreamed gearing to turn your construction into something seaworthy.

The crowds cheer as the boats go past and tens of airborne drones film the reactions and track the gazes of various human eyeballs, silently ranking and scoring each boat not only on its speed relative to others, but on how alluring it seems to the humans. Enough competitions have been run now that the boat-making AIs have had to evolve their process many times, swapping out earlier designs that maximized sail surface area for ones made of series of independently moving sails, to the current iteration of speedy, attention-grabbing vessels, where the sails are almost impossible to individually resolve from a distance as, aside from a few handkerchief-sized ones, the rest shrink according to strange, fractal rules, down into the sub-visual. In this way each vessel moves, powered by pressures diverted into sails that are so fine and so carefully placed that they filigree together into something that, if you squint, seems like an entire thing, but the boats’ sounds of infinitely-tattered-flapping tell you otherwise.

A winner is eventually declared following the ranking of the crowd’s reactions and the heavily optimized single-digit-millimetre lead jockeyed for by the boats in the competition. Reactions are fed back. The electric utility vehicle brings the shipping containers back to the edge of the pond and sets each down by its edge in the same position as before. Inside, strange machines begin to whirr as new designs are built.

Later that night they burn the ships from the day’s competition, and the drones film that as well, silently feeding back points for the aesthetically pleasing quality of the burn to the printers in the containers: a little game the AIs play amongst themselves, unbeknownst to their human minders, as each seeks to find additional variables to explore. Perhaps one day the ships will be invisible, for they will each be made so fine.

Technologies that inspired this story: Human-in-the-loop feedback, evolutionary design, variational auto-encoders, drones, psychological monitoring via automated video analysis, etc.

Import AI: #78: Google gives away free K80 GPUs; Chinese researchers benchmark thermal imagery pedestrian trackers; and AI triumphs against dermatologists in diagnosis competition

AI beats panel of 42 dermatologists at spotting symptoms of a particular skin disorder:
…R-CNN + large amounts of data beats hundreds of years of combined medical schooling…
Scientists have gathered together a large medical-grade dataset of photos of fingernails and toenails and used it to train a neural network to distinguish symptoms of onychomycosis better than a panel of experts. The approach relies on faster R-CNN (GitHub), an object classifier originally developed by Microsoft Research (Arxiv), as well as convolutional neural networks that implement a resnet-152 model (also developed by Microsoft Research). It’s another datapoint that, at least in the perceptual domain, it seems like given enough data&compute we can design systems that can match or exceed humans’ capabilities at narrowly specified tasks.
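  Code sketch: The generic recipe implied here – start from an ImageNet-pretrained backbone and swap the final layer for the medical classes – is a few lines in a modern framework; the hyperparameters and class count below are placeholders, and this is not the authors’ training code.

```python
import torch.nn as nn
import torchvision.models as models

def build_nail_classifier(num_classes=2):
    model = models.resnet152(pretrained=True)                  # ImageNet weights as the starting point
    model.fc = nn.Linear(model.fc.in_features, num_classes)    # replace the 1000-way head
    return model

# Training then proceeds with an ordinary cross-entropy loss over the cropped nail images.
```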
  Data janitorial work: The researchers also contribute a dataset of almost ~50,000 nail photographs for use in further research. The paper includes details on how they shaped and cleaned their data to obtain this dataset – a process that involved the researchers having to train an object localizing system to automatically crop their images to just feature nails, rather than other misclassified things (apparently the network would initially mistake teeth or warts for fingers).
  Results: They comprehensively test the resulting networks against a variety of different humans with different skills, ranging from nurses to clinicians to professors with a dermatology specialism. In all cases the AI-based networks matched or exceeded large groups of human experts on medical classification tasks. “Only one dermatologist performed better than the ensemble model trained with the A1 dataset, and only once in three experiments,” they write.
  The future: One of the promises of AI for medical use-cases is that it can dramatically lower the cost of initial analysis of a given set of symptoms. This experiment backs up that view, and in addition to gathering the dataset and developing the AI techniques, the scientists have also developed a web- and smartphone-based platform to collect and classify further medical data. “The results from this study suggest that the CNNs developed in this study and the smartphone platform we developed may be useful in a telemedicine environment where the access to dermatologists is unavailable,” they write.
–   Read more: Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network (PLOS One).

US defense establishment to invest in AI, robots:
…New National Defense Strategy memo mentions AI…
The US’s new National Defense Strategy calls for the government to “invest broadly in military application of autonomy, artificial intelligence, and machine learning, including rapid application of commercial breakthroughs”.
  The summary also hints at the troubling dual-use nature of AI and other technologies. “The security environment is also affected by rapid technological advancements and the changing character of war. The drive to develop new technologies is relentless, expanding to more actors with lower barriers of entry, and moving at accelerating speed. New technologies include advanced computing, “big data” analytics, artificial intelligence, autonomy, robotics, directed energy, hypersonics, and biotechnology— the very technologies that ensure we will be able to fight and win the wars of the future.”
–   Read more: Summary of the 2018 National Defense Strategy of The United States of America (PDF).

A movable feast of language modeling techniques from Fast.ai:
…Calibration, calibration, calibration…
Researchers with Fast.AI and Aylien Ltd have published details on Fine-tuned Language Models (FitLaM), a set of transfer learning methods to optimize language models for given domains. This paper has a similar flavor to DeepMind’s recent ‘Rainbow’ algorithm: in both cases researchers integrate a bunch of recent innovations in their field (language modelling and reinforcement learning, respectively) to create an ‘everything-and-the-kitchen-sink’-style model, which attains good task performance.
  Results: FitLaM models attain state-of-the-art scores on five distinct text classification tasks, reducing errors by between 18 and 24 percent on the majority of the datasets.
  How it works: FitLaM models consist of an RNN with one or more task-specific linear layers, along with a tuning technique that adjusts the higher layers of the network more than the lower ones, helping preserve information gleaned from general-domain language modelling. Along with this, the authors develop a bunch of different techniques to further facilitate transfer, detailed exhaustively in the paper.
  Transfer learning: To aid transfer learning, the researchers pre-train a language model on a large text corpus – in this case Wikitext, which consists of over ~28,000 pre-processed Wikipedia articles. Other techniques they use include ‘gradual unfreezing’ of neural network layers during re-training, using cosine annealing for fine-tuning, and using reverse annealing as well.
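  Code sketch: Two of those tricks – gradual unfreezing and cosine-annealed learning rates – look roughly like this when applied to an ordered stack of layer groups; the schedule, optimizer choice, and the `train_one_epoch` helper are illustrative simplifications, not the paper’s exact recipe.

```python
import torch

def fine_tune(layer_groups, train_one_epoch, epochs=5, lr=1e-3):
    """layer_groups: nn.Modules ordered from embeddings (bottom) to classifier (top)."""
    for group in layer_groups:
        for p in group.parameters():
            p.requires_grad = False                    # start fully frozen
    for epoch in range(epochs):
        start = max(0, len(layer_groups) - 1 - epoch)  # unfreeze one more group, top-down
        for group in layer_groups[start:]:
            for p in group.parameters():
                p.requires_grad = True
        params = [p for g in layer_groups for p in g.parameters() if p.requires_grad]
        optimizer = torch.optim.SGD(params, lr=lr)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
        # a fresh cosine schedule per epoch is a simplification of the paper's annealing
        train_one_epoch(optimizer, scheduler)
```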
  Tested domains: Sentiment analysis (two separate datasets), question classification, topic classification (two datasets).
–   Read more: Fine-tuned language models for text classification (Arxiv).

Google-owned Kaggle adds free GPUs to online coding service:
…Free GPUs with very few strings attached…
Google says Colaboratory, its live coding mashup that works like a cross between a Jupyter Notebook and a Google Doc, now comes with free GPUs. Users can write a few code snippets, detailed here, and get access to two vCPUs with 13GB of RAM and – the icing on the cake – an NVIDIA K80 GPU, according to a comment from an account linked to Michael Piatek at Google.
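  Code sketch: At the time, a commonly shared way to confirm the GPU was attached (after selecting a GPU accelerator in the notebook settings) was a one-liner like this:

```python
import tensorflow as tf
print(tf.test.gpu_device_name())   # prints something like '/device:GPU:0' when the K80 is attached
```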
–    Access Colaboratory here.

First came ResNets, then DenseNets, now… SparseNets?
…Researchers chain networks together in weird ways to attain state-of-the-art results…
Neural networks can, in one way, be viewed as machines that operate over distinct datasets and figure out the transformation that ties them together. Building on this, researchers have developed approaches (Residual Networks and DenseNets) that are able to pick up on successively finer-grained features that distinguish different visual phenomena, while ensuring that as much information as possible can propagate from one layer of a network to another.
  Now, researchers with Simon Fraser University have tried to take the best traits from ResNets and DenseNets and synthesize them into SparseNets, a way of structuring networks that “aggregates features from previous layers: each layer only takes features from layers that have exponential offsets away from it… Experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that SparseNets are able to achieve comparable performance to current state-of-the art models with significantly fewer parameters,” they write.
  Thrifty networks: So, what’s the motivation for structuring networks in such a way? It’s that if you can expand the size of the network without adding too many parameters, then you know you’ll ultimately be able to exploit this efficiency to build even larger networks in the future. Experiments with SparseNet show that networks built like this can attain accuracies similar to those obtained by ResNets and DenseNets on a far, far smaller parameter budget.
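  Code sketch: The connectivity rule is the interesting bit, so here is a toy block in which layer i concatenates features only from layers i-1, i-2, i-4, and so on – the layer definitions are placeholders and this is not the authors’ architecture.

```python
import torch
import torch.nn as nn

class SparseBlock(nn.Module):
    """Input is assumed to have `channels` channels; every layer outputs `channels` channels."""
    def __init__(self, num_layers=8, channels=16):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            fan_in = channels * len(self._offsets(i)) if i else channels
            self.layers.append(nn.Sequential(nn.Conv2d(fan_in, channels, 3, padding=1), nn.ReLU()))

    @staticmethod
    def _offsets(i):
        offsets, k = [], 1
        while i - k >= 0:                      # previous layers at exponentially growing distances
            offsets.append(i - k)
            k *= 2
        return offsets or [0]

    def forward(self, x):
        outputs = [self.layers[0](x)]
        for i in range(1, len(self.layers)):
            inputs = torch.cat([outputs[j] for j in self._offsets(i)], dim=1)
            outputs.append(self.layers[i](inputs))
        return outputs[-1]
```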
–   Read more: Sparsely Connected Convolutional Networks.

Bootstrapping data quality with neural networks:
…Chinese researchers try to scale data generation via AI…
Researchers with Soochow University, Alibaba Group, Shenzhen Gowild Robotics Co. Ltd, and Heilongjiang University, have developed a system to improve performance of Chinese named entity recognition (NER) techniques by generating low-quality data and improving its quality via adversarial training. NER is a specific skill that systems use to spot the key parts of sentences and how they link to a larger knowledge store about the world. Better NER approaches tend to quickly translate into improved consumer-facing or surveillance-oriented AI systems, like personal assistants, or databases for analyzing large amounts of speech.
  The technique: The researchers use crowd annotators to label specific datasets, such as those in dialog and e-commerce, and use a variety of neural network-based systems to analyze the commonalities and differences between the different NER labels applied by each individual to their specific sample of text. The resulting system is able to perform classification at higher accuracies than other systems trained on the same data, beating or matching other baselines created by the researchers.
–  Read more: Adversarial Learning for Chinese NER from Crowd Annotations (Arxiv).

Chinese researchers gather pedestrian tracking dataset and evaluate nine trackers on it:
…Oy, you there! Yes, you, the iridescent person with the really warm hands!…
Chinese researchers have selected 60 videos shot in thermal infrared, compiled them, and turned them into a dataset to use to evaluate thermal infrared pedestrian tracking (TIR) technologies.
  Dataset: The 60 thermal sequences contain footage from a variety of devices (surveillance cameras, hand-held cameras, vehicle-mounted cameras, drones) across a mixture of differently scaled scenes, camera-positions, and video perspectives.
  Trackers evaluated: The researchers evaluate nine distinct pedestrian trackers that implement different methods, ranging from support vector machines, to correlation and regression filters, to deep learning approaches (systems: HDT and MCFTS). SRDCF – a spatially regularized discriminative correlation filter (PDF) – is the clear winner, attaining the most reliably high scores across a bunch of different tests.
  Surprisingly strong deep learning performance: Both neural network approaches (HDT and MCFTS) enjoy fairly consistent, high rankings as well. “We suggest that the deep feature based trackers have potential to achieve better performance if there are enough thermal images for training,” they write.
  Expensive: Deep learning approaches still seem fairly expensive, with the DL systems (HDT and MCFTS) attaining frames-per-second of 10.60 and 4.73 respectively when deployed on an Intel PC with a 1080 NVIDIA GPU and 32GB of RAM. SRDCF, by comparison, gets 12.29FPS.
–   Read more: PTB-TIR: A Thermal infrared Pedestrian Tracking Benchmark (Arxiv).

OpenAI Bits&Pieces:

Scaling Kubernetes to 2,500 Nodes:
An account of some of the problems we ran into and workarounds we devised as we scaled up our large AI infrastructure.
–   Read more: Scaling Kubernetes to 2,500 Nodes (OpenAI blog).

Tech Tales.

Earth, 2045:

Canary Wharf, London:

So I’m bent over trying to fit myself through a ventilation fan when I see them: two crates, sealed, tape still on them. I approach cautiously. The still air of the data center feels close, tomb-like. My suit is coated in dust from squeezing my way through the long-dormant fan. I put my hand on top of one of the boxes and close my eyes, imagining the inside of the crate and trying to will the things I am hunting for into existence. I take a deep breath and open the box.

There they are. Not as many as I’d hoped, but some. Each chip gets its own housing in a spongy, odorless, moisture-wicking, anti-static material. I peer in and see the familiar brand names: InnerEye, AccuVision, Mine+, Seeder. The manifest for the box lists a few others which are missing from the container, but I don’t fret. These will be enough.

You see, we knew Moore’s Law was ending, and we didn’t do much about it. Kind of like climate change. We just stared at the problem – again, similar; the dreadful consequence of energy distribution and dissipation over time – and built bigger fabs and crafted bigger chips and told people it was fine. But it wasn’t fine. In the background we were all driven to mass parallelism, and this worked for a while – we built vast data centers around the world, all of us modeling ourselves on the early Google insight that The Datacenter is the Machine. Our advances were so impressive and consistent that people didn’t pay attention to the spiralling energy bills, or the diminishing returns we were getting from going big.

Then the wars happened. Some of them purely economic, others physical – ‘kinetic’ in terms used by certain military types. Fabs were destroyed. It’s not like we went back to the stone age, but we had to move back up the nanometre process curve for chip manufacturing all the way to 10nm – decades of progress, hiccuped away in fireballs. Now, sub-10nm node chips are all spoken for whether from government buyers, AI companies, or the family offices of the world’s billionaires who are all, as ever, obsessed with simulating a future that has not yet arrived and acting accordingly.

So that’s why people like me exist. We don’t go and buy new chips, we just go and find old ones. Because for certain things there’s no substitute for speed. They’re talking now about registering all chips so as to be able to spot ‘illegal AI activity’. So that’s creating even more demand for me and my services. I don’t much like to think about what happens to these chips after I hand them over – though it doesn’t take much thought to realize that the situations where you’re willing to pay this much money are life and death situations. Now whether these are for machines that guard or machines that hunt is another question.

Technologies that inspired this story: Moore’s Law, fab construction costs, the implicit geopolitics of compute.

 

 

Import AI: #77: Amazon tests inventory improvement by generating fake customers with GANs, the ImageNet of video arrives, and robots get prettier with Unity-MuJoCo tie-up.

Urban flaneurs generate fake cities with GANs:
…Researchers discover same problems that AI researchers have been grappling with, like those relating to interpretability and transparency…
Researchers have used generative adversarial networks to generate a variety of synthetic, fictitious cities. The project shows that “a basic, unconstrained GAN model is able to generate realistic urban patterns that capture the great diversity of urban forms across the globe,” they write. This isn’t particularly surprising since we know GANs can typically approximate the distribution of the data they are fed on – though I suspect the dataset (30,000 images) might be slightly too small to do away with things like over-fitting.
  Data used: The Global Urban Footprint, an inventory of built-up land at 12m/px resolution, compiled by the German Aerospace Center.
  Questions: It’s always instructive to see the questions posed by projects that sit at the border between AI and other disciplines, like geography. For this project, some open questions the researchers are left with include: “How to evaluate the quality of model output in a way that is both quantitative, and interpretable and intuitive for urban planning analysis? How to best disentangle, explore, and control latent space representations of important characteristics of urban spatial maps? How to learn from both observational and simulated data on cities?,” and more.
  Read more here: Modeling urbanization patterns with generative adversarial networks (Arxiv).
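  Code sketch: The paper doesn’t ship code, but an ‘unconstrained’ GAN of this sort is just a standard generator mapping random noise to image tiles. Below is a minimal sketch of what such a generator might look like – the layer sizes, the 64×64 resolution, and the single built-up/not-built-up channel are my own illustrative assumptions, not the authors’ choices.
```python
# Minimal sketch (not the authors' code): a DCGAN-style generator that maps
# random noise to 64x64 single-channel "urban footprint" tiles.
import torch
import torch.nn as nn

class FootprintGenerator(nn.Module):
    def __init__(self, z_dim=100, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0), nn.BatchNorm2d(base * 8), nn.ReLU(),    # 1x1 -> 4x4
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1), nn.BatchNorm2d(base * 4), nn.ReLU(),  # 4x4 -> 8x8
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.BatchNorm2d(base), nn.ReLU(),          # 16x16 -> 32x32
            nn.ConvTranspose2d(base, 1, 4, 2, 1), nn.Tanh(),  # 32x32 -> 64x64, one "built-up" channel
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

# Sample a batch of fictitious city tiles from random noise.
g = FootprintGenerator()
fake_tiles = g(torch.randn(16, 100))  # -> (16, 1, 64, 64)
```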

The ImageNet of video (possibly) arrives:
…MIT’s ‘Moments in Time’ dataset consists of one million videos, each of which is 3 seconds long…
Much of the recent progress in machine learning has been driven in part by the availability of large-scale datasets providing a sufficiently complex domain to stress-test new scientific approaches against. MIT’s new ‘Moments in Time’ dataset might just be the dataset we need for video understanding, as it’s far larger than other available open source datasets (eg ActivityNet, Kinetics, UCF, etc), and also has a fairly broad set of initial labels (339 verbs linked to a variety of different actions or activities).
  Video classification baselines: The researchers also test the new dataset on a set of baselines based on systems that use techniques like residual networks, optical flow, and even sound (via usage of a SoundNet network). These baselines get top-5 accuracies as high as 50% or so, meaning the correct label appears among the five guesses proffered by the system for roughly half of the clips (see the short top-5 sketch below). The best performing approach is a ‘temporal relation network’ (TRN), which attained a score of about 53% and was trained on RGB frames using the InceptionV3 image classification architecture.
  Next: “Future versions of the dataset will include multi-labels action description (i.e. more than one action occurs in most 3-second videos), focus on growing the diversity of agents, and adding temporal transitions between the actions that agents performed,” the researchers write.
   Read more: Moments in time dataset: one million videos for event understanding (Arxiv).
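  Code sketch: To make the top-5 metric concrete, here is how it is typically computed – a minimal illustration of the standard metric, not the paper’s evaluation code.
```python
# Top-5 accuracy: a clip counts as correct if the true label appears anywhere
# in the model's five highest-scoring classes.
import torch

def top5_accuracy(logits, labels):
    # logits: (batch, num_classes) class scores; labels: (batch,) true class indices
    top5 = logits.topk(5, dim=1).indices             # (batch, 5)
    hits = (top5 == labels.unsqueeze(1)).any(dim=1)  # (batch,) bool
    return hits.float().mean().item()

# Toy example with 339 action classes, as in Moments in Time.
logits = torch.randn(8, 339)
labels = torch.randint(0, 339, (8,))
print(top5_accuracy(logits, labels))
```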

Ugly robots no more: Unity gets a MuJoCo plugin:
…Tried-and-tested physics simulator gets coupled to a high-fidelity game engine…
Developers keen to improve the visual appearance of their AI systems may be pleased to know that MuJoCo has released a plugin for the Unity engine. This will let developers import MuJoCo models directly into Unity and then visualize them in snazzier environments (a toy sketch of the pose-streaming idea follows below).
  “The workflow we envision here is closer to the MuJoCo use case: the executable generated by Unity receives MuJoCo model poses over a socket and renders them, while the actual physics simulation and behavior control take place in the user’s environment running MuJoCo Pro,” write the authors.
  Read more here: MuJoCo Plugin and Unity Integration.
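  Code sketch: The division of labour is the interesting bit – physics stays on the MuJoCo side and only poses cross the wire. The sketch below is purely illustrative: the real plugin defines its own socket protocol, and the port number, model file, and use of the mujoco_py bindings are my assumptions.
```python
# Illustrative only -- not the plugin's actual protocol. Physics runs locally
# (here via the mujoco_py bindings); each step we push the joint positions
# (qpos) over a TCP socket to a separate renderer, e.g. a Unity executable.
import socket
import struct
import mujoco_py

model = mujoco_py.load_model_from_path("humanoid.xml")  # any MuJoCo XML model
sim = mujoco_py.MjSim(model)

sock = socket.create_connection(("localhost", 5555))    # hypothetical renderer port
for _ in range(1000):
    sim.step()                                  # simulation and control stay on this side
    pose = sim.data.qpos.astype("float32")      # generalized coordinates of the model
    sock.sendall(struct.pack("I", pose.nbytes) + pose.tobytes())
sock.close()
```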

Google censors itself to avoid accidental racism:
…Search company bans search terms to protect itself against photos triggering insulting classifications…
A little over two years ago Google’s Google Photos application displayed an appalling bug: searches for ‘gorillas’ would bring up photos of black people. There was a swift public outcry and Google nerfed its application so it wouldn’t respond to those terms. Two years later, despite ample progress in AI and machine learning, nothing has changed.
  “A Google spokesperson confirmed that “gorilla” was censored from searches and image tags after the 2015 incident, and that “chimp,” “chimpanzee,” and “monkey” are also blocked today. “Image labeling technology is still early and unfortunately it’s nowhere near perfect,” the spokesperson wrote in an email, highlighting a feature of Google Photos that allows users to report mistakes,” Wired reports.
   Read here: When it comes to Gorillas, Google Photos Remains Blind (Wired).

The first ever audiobook containing a song generated by a neural network?
…’Sourdough’ by Robin Sloan features AI-imagined Croatian folk songs…
Here’s a fun blog post by author Robin Sloan about using AI (specifically, SampleRNN) to generate music for his book. Check out the audio samples.
   Read more: Making The Music of the Mazg.

Miri blows past its 2017 funding target thanks to crypto riches:
…Crypto + AI research, sitting in a tree, S-Y-N-E-R-G-I-Z-I-N-G!…
The Machine Intelligence Research Institute in Berkeley has raised over $2.5 million with its 2017 fundraiser, with a significant amount of funding coming from the recent boom in cryptocurrencies.
  “66% of funds raised during this fundraiser were in the form of cryptocurrency (mainly Bitcoin and Ethereum),” Miri writes.
   Read more here: Fundraising success! (Miri).

Amazon turns to GANs to simulate e-commerce product demand… and it sort of works!
…As if the e-retailer doesn’t have enough customers, now it’s inventing synthetic ones…
Researchers with Amazon’s Machine Learning team in India have published details on eCommerceGAN, a way to use GANs to generate realistic, synthetic customer and customer order data. This is useful because it lets you test your system for the vast combinatorial space of possible customer orders and, ideally, get better at predicting how new products will match with existing customers, and vice versa.
  “The orders which have been placed in an e-commerce website represent only a tiny fraction of all plausible orders. Exploring the space of all plausible orders could provide important insights into product demands, customer preferences, price estimation, seasonal variations etc., which, if taken into consideration, could directly or indirectly impact revenue and customer satisfaction,” the researchers write.
  The eCommerce GAN (abbreviated to ‘ecGAN’) lets the researchers create a synthetic “dense and low-dimensional representation of e-commerce orders”. They also create an eCommerce-conditional-GAN (ec^2GAN), which lets them “generate the plausible orders involving a particular product”.
  Results: The researchers created a 3D t-SNE map of both real customer orders and GAN-generated ones. The plots showed a strong correlation between the two with very few outliers, suggesting that their ecGAN approach is able to generate data that falls within the distribution of what e-retailers actually see. To test the ec^2GAN they see if it can conditionally generate orders that have similar customer<>product profiles to real orders – and they succeed. It might sound a bit mundane but this is a significant thing for Amazon: it now has a technique to simulate the ‘long tail’ of customer<>product combinations and as it gets better at predicting these relationships it could theoretically get better at optimizing its supply-chain / just-in-time inventory / marketing campaigns / new product offerings and test groups, and so on. (A toy conditional generator in this spirit is sketched below.)
  Data: The researchers say they “use the products from the apparel category for model training and evaluation. We randomly choose 5 million orders [emphasis mine] made over the last one year in an e-commerce company to train the proposed models.” Note that they don’t specify where this data comes from, though it seems overwhelmingly likely it derives from Amazon since that’s where all the researchers worked during this project, and no other dataset is specified.
   Read more: eCommerceGAN: A Generative Adversarial Network for eCommerce (Arxiv).
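  Code sketch: The paper doesn’t include code, but the conditional half of the idea is easy to picture: a generator takes random noise plus a product embedding and emits a dense order representation. A minimal sketch in that spirit – every dimension here is invented for illustration rather than taken from the paper.
```python
# Toy conditional generator in the spirit of ec^2GAN: noise plus a product
# embedding in, a dense "order" vector out. All sizes are illustrative.
import torch
import torch.nn as nn

class OrderGenerator(nn.Module):
    def __init__(self, z_dim=64, product_dim=128, order_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + product_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, order_dim),  # dense, low-dimensional order representation
        )

    def forward(self, z, product_embedding):
        return self.net(torch.cat([z, product_embedding], dim=1))

# Generate 32 plausible orders conditioned on a single product's embedding.
g = OrderGenerator()
product = torch.randn(1, 128).expand(32, -1)
orders = g(torch.randn(32, 64), product)  # -> (32, 256)
```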

Why AI research needs to harness huge amounts of compute to progress:
…The future involves big batches, massive amounts of computation, and, for now, lots of hyperparameter tuning…
AI researchers are effective in proportion to the quantity of experiments they can run over a given time period. This is because deep learning-based AI is predominantly an empirical science, so in the absence of strong theoretical guarantees researchers need to rigorously test algorithms to appropriately debug and develop them.
  That fact has driven recent innovations in large-scale distributed training of AI algorithms, initially for traditional classification tasks, like the two following computer vision examples (covered in #69 of Import AI):
   July, 2017: Facebook trains an ImageNet model in ~1 hour using 256 GPUs.
   November, 2017: Preferred Networks trains ImageNet in ~15 minutes using 1024 NVIDIA P100 GPUs.
   Now, as AI research becomes increasingly focused on developing AI agents that can take actions in the world, the same phenomenon is happening in reinforcement learning, as companies ranging from DeepMind (Ape-X, Gorila, others) to OpenAI (Evolution Strategies, others) try to reduce the wall-clock time it takes to run reinforcement learning experiments.
  New research from deepsense.ai, Intel, and the Polish Academy of Sciences shows how to scale up and tune a Batch Asynchronous Advantage Actor-Critic algorithm with the Adam optimizer and a large batch size of 2048, letting it learn to competently play a range of Atari games in a matter of minutes; in many cases it takes the system just 20 minutes or so to attain competitive scores on games like Breakout, Boxing, Seaquest, and others. They achieve this by scaling up their algorithm via techniques gleaned from the distributed systems world (eg, parameter servers, clever things with temporal alignment across different agents, etc), which lets them run their algo across 64 workers comprising 768 distinct CPU cores. (A single-process toy version of the large-batch update is sketched below.)
  Next: PPO: The authors note that PPO, a reinforcement learning algorithm developed by OpenAI, is a “promising area of future research” for large-scale distributed reinforcement learning.
   Read more: Distributed Deep Reinforcement Learning: learn how to play Atari games in 21 minutes.
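  Code sketch: The distributed plumbing is the hard part, but the core update is just an advantage actor-critic step computed over one very large batch and applied with Adam. Below is a toy, single-process sketch of that update – the network sizes, learning rate, and stand-in rollout data are my assumptions; the real system shards rollout collection across 64 workers and 768 CPU cores.
```python
# Toy large-batch actor-critic update: gradients are computed over one big batch
# of transitions (2048 here) and applied with a single Adam step.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, batch_size = 128, 6, 2048

policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))
value = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(list(policy.parameters()) + list(value.parameters()), lr=1e-4)

# Stand-ins for a batch of transitions gathered from many parallel actors.
obs = torch.randn(batch_size, obs_dim)
actions = torch.randint(0, n_actions, (batch_size,))
returns = torch.randn(batch_size)                   # discounted returns from rollouts

logits = policy(obs)
values = value(obs).squeeze(1)
advantages = (returns - values).detach()            # advantage estimates
log_probs = F.log_softmax(logits, dim=1).gather(1, actions.unsqueeze(1)).squeeze(1)

loss = -(log_probs * advantages).mean() + F.mse_loss(values, returns)
opt.zero_grad()
loss.backward()
opt.step()                                          # one synchronous large-batch step
```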

Googlers debunk bogus research into getting neural networks to detect sexual orientation:
…“AI is a general-purpose technology that can be used to automate a great many tasks, including ones that should not be undertaken in the first place”…
Last fall, researchers with Stanford published a paper on Arxiv claiming that a neural network-based image classification system they designed could detect sexual orientation more accurately than humans. The study – Deep neural networks are more accurate than humans at detecting sexual orientation from facial images – was criticized for making outlandish claims and was widely covered in the press. Now, the paper has been accepted for publication in a peer-reviewed academic journal – the Journal of Personality and Social Psychology. This seems to have motivated Google researchers Margaret Mitchell and Blaise Aguera y Arcas, and Princeton professor Alex Todorov, to take a critical look at the research.
  The original study relied on a dataset composed of 35,326 images taken from public profiles on a US dating website as its ground-truth data. You can get a sense of the types of photos present here by creating composite “average” images from the ground-truth labelled data – when you do this you notice some significant differences: the “average” heterosexual male face doesn’t have glasses, while the gay face does, and similarly the faces of the “average” heterosexual females have eyeshadow on them, while the lesbian faces do not.
  Survey: “Might it be the case that the algorithm’s ability to detect orientation has little to do with facial structure, but is due rather to patterns in grooming, presentation and lifestyle?” wonder the Google and Princeton researchers. To analyze this they surveyed 8,000 Americans using Amazon Mechanical Turk and asked them 77 yes/no questions, ranging from their sexual orientation to whether they have a beard, wear glasses, and so on. The results of the survey seem to roughly track with the “average” images we can extract from the dataset, suggesting that rather than developing a neural network that can infer your fundamental sexual proclivity by looking at you, the researchers have instead built a snazzy classifier that estimates whether you are gay or straight based on whether or not you’re wearing makeup or glasses.
  To illustrate the problems with the research the Googlers show that they can attain similar classification accuracies to the original experiment purely through asking a series of yes/no questions, with no visual aid. “For example, for pairs of women, one of whom is lesbian, the following not-exactly-superhuman algorithm is on average 63% accurate: if neither or both women wear eyeshadow, flip a coin; otherwise guess that the one who wears eyeshadow is straight, and the other lesbian. Adding six more yes/no questions about presentation (“Do you ever use makeup?”, “Do you have long hair?”, “Do you have short hair?”, “Do you ever use colored lipstick?”, “Do you like how you look in glasses?”, and “Do you work outdoors?”) as additional signals raises the performance to 70%,” they write. (This baseline is transcribed as code at the end of this item.)
  Alternate paper title: In light of this criticism, perhaps a better title for the paper would be Deep neural networks are more accurate than humans at predicting the correspondence between various garments and makeup and a held-out arbitrary label. But we knew this already, didn’t we?
   Read more here: Do algorithms reveal sexual orientation or just expose our stereotypes?
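  Code sketch: The 63% baseline is simple enough to transcribe directly into code – which is rather the point. A literal sketch of it below; the field name is mine.
```python
# The "not-exactly-superhuman" baseline described above: given a pair of women,
# one of whom is lesbian, guess which one using only eyeshadow.
import random

def guess_lesbian(pair):
    # pair: two dicts, each with a boolean 'wears_eyeshadow' field (hypothetical schema).
    a, b = pair
    if a["wears_eyeshadow"] == b["wears_eyeshadow"]:
        return random.choice([0, 1])             # neither or both: flip a coin
    return 0 if not a["wears_eyeshadow"] else 1  # guess the one without eyeshadow

pair = ({"wears_eyeshadow": True}, {"wears_eyeshadow": False})
print(guess_lesbian(pair))  # -> 1, i.e. the woman not wearing eyeshadow
```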

OpenAI Bits & Pieces:

Science interviews OpenAI’s Tim Salimans for a story about Uber’s recent work on neuroevolution.
   Read more: Artificial intelligence can ‘evolve’ to solve problems (Science).

Tech Tales:

Simulpedia.

You get the call after utilization in the data center ticks up from 70% to 80% all the way to 90% in the course of 24 hours.
   “Takeoff?” you ask on the phone.
   “No,” they say. “Can’t be. This is something else. We’ll have more hardware tomorrow so we can scale with it, but we need you to take a look. We’ve sent a car.”

So you get into the car and go to an airport and fly a few thousand miles and land and get into another blacked-out car which auto-drives to the site and you go through security and then the person who spoke to you on the phone is in front of you saying “it isn’t stopping.”
  “Alright,” you say, “Show me the telemetry.”
  They lead you into a room whose walls are entirely made of computer monitors. Your phone interfaces with the local computer system and after a few seconds you’re able to control the screens and navigate the data. You dive in, studying the jagged lines of learning graphs, the pastel greens and reds of the utilization dashboard, and eventually the blurry squint-and-you’ll-miss-it concept-level representations of some of the larger networks. And all the while you can see utilization in the data center increasing, even as new hardware is integrated.
   “What the hell is it building in there,” you mutter to yourself, before strapping on some VR goggles and going deeper. Navigating high-dimensional representations is a notorious mindbender – most people have a hard time dealing with non-Euclidean geometries; corridors that aren’t corridors, different interpretations of “up” and “down” depending on which slice of a plane you’ve become embedded in, spheres that are at once entirely hollow and entirely solid, and so on. You navigate the AI’s embedding, trying to figure out where all of the computation is being expended, while attempting to stop yourself from throwing up the sandwich you had in the car.
   And then, after you grope your way across a bridge which becomes a ceiling which becomes a door that folds out on itself to become the center of a torus, you find it: in one of the panes of the torus looking into one of the newer representational clusters you can see a mirror-image of the larger AI’s structure you’ve been navigating. And beyond that you can make out another torus in the distance containing another pane connecting to another large-scale non-Euclidean representation graph. You sigh, take off your goggles, and make a phone call.

“It’s a bug,” you say. “Did you check the recursion limits in the self-representation system?”
   Of course they hadn’t. So much time and computation wasted, all because the AI had just looped into an anti-pattern where it had started trying to simulate itself, leading to the outward indicators of it growing in capability – somewhat richer representations, faster meta-learning, a compute and data footprint growing according to some clear scaling law. But those symptoms didn’t signal a greater intelligence, rather just the AI attempting to elucidate its own Kolmogorov complexity to itself – running endless simulations of itself simulating simulations of itself, to try and understand a greater truth, when in fact it was just a mirror endlessly refracting upon itself.

Concepts that inspired this story: Procedural maze generators, Kolmogorov complexity, non-Euclidean virtual reality (YouTube video).